Monitoring Service in VyOS

VyOS has built-in support for collecting and exporting performance metrics, system data, and logs through Telegraf and Prometheus exporters, providing end-to-end monitoring of the network infrastructure.

Overview

Monitoring components

VyOS supports two main monitoring mechanisms:

Telegraf - a general-purpose metrics collection agent:

  • Collects system metrics (CPU, memory, disk, network)
  • Exports to many monitoring systems
  • Supports a range of output plugins
  • Processes logs and events

Prometheus Exporters - dedicated metric exporters:

  • Node Exporter - hardware and OS metrics
  • FRR Exporter - routing metrics from FRRouting (Free Range Routing)
  • Blackbox Exporter - availability probing of services

Supported monitoring systems

Telegraf can export metrics to:

  • InfluxDB - time-series database
  • Prometheus - monitoring and alerting system
  • Azure Data Explorer - Microsoft Azure analytics platform
  • Splunk - data analytics platform
  • Loki - log aggregation system from Grafana Labs

Telegraf Configuration

Basic Telegraf setup

# Enable Telegraf
set service monitoring telegraf

commit
save

System metrics

Telegraf collects the following automatically:

  • CPU usage and load average
  • Memory and swap utilization
  • Disk I/O and space usage
  • Network interface statistics
  • System uptime

InfluxDB Integration

InfluxDB output configuration

# Organization and bucket
set service monitoring telegraf influxdb authentication organization 'vyos-monitoring'
set service monitoring telegraf influxdb bucket 'vyos-metrics'

# Authentication token
set service monitoring telegraf influxdb authentication token 'YOUR_INFLUXDB_TOKEN'

# InfluxDB server URL
set service monitoring telegraf influxdb url 'http://influxdb.example.com'
set service monitoring telegraf influxdb port '8086'

commit
save

InfluxDB parameters

Organization - the organization in InfluxDB 2.x:

set service monitoring telegraf influxdb authentication organization 'company-ops'

Bucket - the target bucket for metrics:

set service monitoring telegraf influxdb bucket 'network-metrics'

Token - the authentication token for the InfluxDB API (it needs write access to the bucket):

set service monitoring telegraf influxdb authentication token 'ZAml9Uy5wrhA...=='

URL and Port:

set service monitoring telegraf influxdb url 'https://influxdb.cloud.example.com'
set service monitoring telegraf influxdb port '443'

Complete InfluxDB example

# InfluxDB Cloud configuration
set service monitoring telegraf influxdb authentication organization 'network-team'
set service monitoring telegraf influxdb authentication token 'eyJrIjoiVGVzdCIsIm4iOiJUZXN0...'
set service monitoring telegraf influxdb bucket 'vyos-production'
set service monitoring telegraf influxdb url 'https://eu-central-1-1.aws.cloud2.influxdata.com'
set service monitoring telegraf influxdb port '443'

commit
save
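You can check the token, organization, and bucket without Telegraf by writing a single test point to the InfluxDB 2.x write API. That API is `POST /api/v2/write` with an `Authorization: Token <token>` header.

The sketch below builds that request using only the Python standard library. Its assumptions:

  • The VyOS `url` and `port` values are combined as `url:port`.
  • The helpers `to_line_protocol` and `build_write_request` are hypothetical.
  • The line-protocol formatter handles only float fields and does no escaping.

Sending the request is left commented out.

```python
import urllib.parse
import urllib.request


def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Format one point in InfluxDB line protocol (float fields only, no escaping)."""
    tag_str = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={float(v)}" for k, v in fields.items())
    line = f"{measurement}{tag_str} {field_str}"
    return f"{line} {ts_ns}" if ts_ns is not None else line


def build_write_request(url, port, org, bucket, token, body):
    """Build a POST to the InfluxDB 2.x write endpoint (url without trailing slash)."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "ns"})
    return urllib.request.Request(
        f"{url}:{port}/api/v2/write?{query}",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


line = to_line_protocol("vyos_test", {"host": "vyos-router-01"}, {"value": 1})
req = build_write_request("http://influxdb.example.com", 8086,
                          "vyos-monitoring", "vyos-metrics", "YOUR_INFLUXDB_TOKEN", line)
print(req.full_url)
# urllib.request.urlopen(req)  # uncomment to actually send the point
```

An HTTP 204 response means the point was accepted. A 401 usually indicates a token problem, and a 404 typically a wrong organization or bucket name.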

Prometheus Client Integration

Prometheus output configuration

Telegraf can also expose its metrics in Prometheus format on an HTTP endpoint that a Prometheus server scrapes:

# Enable the Prometheus client
set service monitoring telegraf prometheus-client

# Address and port for scraping
set service monitoring telegraf prometheus-client listen-address '0.0.0.0'
set service monitoring telegraf prometheus-client port '9273'

# Allow access from the Prometheus server
set service monitoring telegraf prometheus-client allow-from '192.168.1.100/32'
set service monitoring telegraf prometheus-client allow-from '10.0.0.0/8'

commit
save

Prometheus Client parameters

Listen address:

set service monitoring telegraf prometheus-client listen-address '192.168.1.1'

Port (default: 9273):

set service monitoring telegraf prometheus-client port '9273'

Allow-from - network ACL:

set service monitoring telegraf prometheus-client allow-from '192.168.1.0/24'
set service monitoring telegraf prometheus-client allow-from '10.0.0.0/8'
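The `allow-from` entries act as a source-address allow list. The sketch below reproduces that check with Python's `ipaddress` module, so you can predict whether a given Prometheus server will be accepted.

It is an illustration, not Telegraf's code. It assumes a request is allowed when its source address falls inside any listed prefix. `is_allowed` is a hypothetical helper.

```python
import ipaddress

ALLOW_FROM = ["192.168.1.0/24", "10.0.0.0/8"]  # values from the example above


def is_allowed(source_ip, allow_from=ALLOW_FROM):
    """Return True if source_ip lies inside any allow-from prefix.

    An IPv4 address never matches an IPv6 prefix and vice versa.
    """
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(prefix, strict=False) for prefix in allow_from)


for ip in ("192.168.1.100", "10.20.30.40", "172.16.0.5"):
    print(ip, is_allowed(ip))
```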

HTTP Authentication (optional):

set service monitoring telegraf prometheus-client authentication username 'prometheus'
set service monitoring telegraf prometheus-client authentication password 'secure_password'

Metric version:

set service monitoring telegraf prometheus-client metric-version 2

Prometheus scrape configuration

Add the following to prometheus.yml on the Prometheus server:

scrape_configs:
  - job_name: 'vyos-telegraf'
    static_configs:
      - targets:
        - '192.168.1.1:9273'
        labels:
          hostname: 'vyos-router-01'
          environment: 'production'

    # Only if authentication is configured on the router
    basic_auth:
      username: 'prometheus'
      password: 'secure_password'

    scrape_interval: 30s
    scrape_timeout: 10s
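With `basic_auth` set, Prometheus sends an HTTP Basic `Authorization` header as defined in RFC 7617. To reproduce a scrape by hand with the same credentials, you can build that header yourself.

The sketch below does so with the standard library. `basic_auth_header` and `build_scrape_request` are hypothetical helpers, and the network call is left commented out.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_scrape_request(target, username, password):
    """Build a GET request for the Telegraf prometheus-client /metrics endpoint."""
    req = urllib.request.Request(f"http://{target}/metrics")
    req.add_header("Authorization", basic_auth_header(username, password))
    return req


req = build_scrape_request("192.168.1.1:9273", "prometheus", "secure_password")
# with urllib.request.urlopen(req, timeout=10) as resp:  # uncomment to scrape
#     print(resp.read().decode()[:500])
```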

Azure Data Explorer Integration

Azure Data Explorer configuration

# Azure AD authentication
set service monitoring telegraf azure-data-explorer authentication client-id 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'
set service monitoring telegraf azure-data-explorer authentication client-secret 'YOUR_CLIENT_SECRET'
set service monitoring telegraf azure-data-explorer authentication tenant-id 'YYYYYYYY-YYYY-YYYY-YYYY-YYYYYYYYYYYY'

# Database configuration
set service monitoring telegraf azure-data-explorer database 'NetworkMetrics'
set service monitoring telegraf azure-data-explorer url 'https://mycluster.region.kusto.windows.net'

# Metrics grouping
set service monitoring telegraf azure-data-explorer metrics-grouping-type 'SingleTable'

commit
save

Azure Data Explorer parameters

Authentication - Service Principal credentials:

set service monitoring telegraf azure-data-explorer authentication client-id 'app-id'
set service monitoring telegraf azure-data-explorer authentication client-secret 'secret'
set service monitoring telegraf azure-data-explorer authentication tenant-id 'tenant-id'

Database:

set service monitoring telegraf azure-data-explorer database 'VyOSMetrics'

URL - Kusto cluster URL:

set service monitoring telegraf azure-data-explorer url 'https://vyoscluster.eastus.kusto.windows.net'

Metrics Grouping:

# SingleTable - all metrics in one table
set service monitoring telegraf azure-data-explorer metrics-grouping-type 'SingleTable'

# TablePerMetric - a separate table for each metric
set service monitoring telegraf azure-data-explorer metrics-grouping-type 'TablePerMetric'

Splunk Integration

Splunk output configuration

# Splunk HEC endpoint
set service monitoring telegraf splunk url 'https://splunk.example.com:8088'

# HEC token
set service monitoring telegraf splunk authentication token 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'

# Source and sourcetype
set service monitoring telegraf splunk source 'vyos-router-01'
set service monitoring telegraf splunk sourcetype 'vyos:metrics'

commit
save

Splunk parameters

URL - HTTP Event Collector endpoint:

set service monitoring telegraf splunk url 'https://splunk-hec.company.com:8088'

Authentication token:

set service monitoring telegraf splunk authentication token 'B5A79AAD-D822-46CC-80D1-819F80D7BFB0'

Source identifier:

set service monitoring telegraf splunk source 'vyos-gateway'

Sourcetype:

set service monitoring telegraf splunk sourcetype 'network:metrics'

Index (optional):

set service monitoring telegraf splunk index 'network_metrics'
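To test the HEC endpoint and token by hand, you can post one metric event to `/services/collector` with the header `Authorization: Splunk <token>`. In Splunk, metric events land in a metrics-type index.

The sketch below builds such a payload in Splunk's single-metric JSON format. Note the following:

  • It is not necessarily byte-for-byte what Telegraf sends.
  • The metric name, value, and host are illustrative.
  • The helpers `hec_metric_event` and `build_hec_request` are hypothetical.

```python
import json
import urllib.request


def hec_metric_event(name, value, source, sourcetype, index=None, host=None, ts=None):
    """Build one Splunk HEC metric event in the single-metric JSON format."""
    event = {
        "event": "metric",
        "source": source,
        "sourcetype": sourcetype,
        "fields": {"metric_name": name, "_value": value},
    }
    # Optional top-level keys are only included when given
    for key, val in (("time", ts), ("index", index), ("host", host)):
        if val is not None:
            event[key] = val
    return event


def build_hec_request(base_url, token, event):
    """POST request to the HEC collector endpoint (base_url without trailing slash)."""
    return urllib.request.Request(
        f"{base_url}/services/collector",
        data=json.dumps(event).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
    )


evt = hec_metric_event("cpu.usage_idle", 97.5, "vyos-gateway", "network:metrics",
                       index="network_metrics", host="vyos-gateway")
req = build_hec_request("https://splunk-hec.company.com:8088", "YOUR_HEC_TOKEN", evt)
# urllib.request.urlopen(req)  # uncomment to send; HEC answers {"text":"Success","code":0}
```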

Loki Integration

Loki output configuration

# Loki server URL
set service monitoring telegraf loki url 'http://loki.example.com'
set service monitoring telegraf loki port '3100'

# Optional authentication
set service monitoring telegraf loki authentication username 'loki-user'
set service monitoring telegraf loki authentication password 'secure_pass'

# Metric name as label
set service monitoring telegraf loki metric-name-label '__name__'

commit
save

Loki parameters

URL and Port:

set service monitoring telegraf loki url 'https://loki.grafana.cloud'
set service monitoring telegraf loki port '443'

Authentication:

set service monitoring telegraf loki authentication username 'user'
set service monitoring telegraf loki authentication password 'api-key'

Labels:

set service monitoring telegraf loki metric-name-label 'metric_name'
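To check Loki reachability and credentials without Telegraf, you can push one entry to Loki's HTTP push endpoint, `/loki/api/v1/push`. The body is JSON with one stream, and its timestamps are nanosecond strings.

The sketch below builds that body and request with HTTP Basic authentication. The label set and log line are illustrative, not what Telegraf emits. `loki_push_body` and `build_loki_request` are hypothetical helpers.

```python
import base64
import json
import time
import urllib.request


def loki_push_body(labels, line, ts_ns=None):
    """Build a Loki push API body with one stream and one log entry."""
    ts = ts_ns if ts_ns is not None else time.time_ns()
    return {"streams": [{"stream": labels, "values": [[str(ts), line]]}]}


def build_loki_request(url, port, username, password, body):
    """POST request to /loki/api/v1/push with HTTP Basic auth."""
    cred = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{url}:{port}/loki/api/v1/push",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {cred}"},
    )


body = loki_push_body({"host": "vyos-router-01", "job": "test"}, "hello from vyos",
                      ts_ns=1700000000000000000)
req = build_loki_request("http://loki.example.com", 3100, "loki-user", "secure_pass", body)
# urllib.request.urlopen(req)  # uncomment to send; Loki replies 204 No Content on success
```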

Prometheus Node Exporter

Node Exporter exposes detailed system metrics for Prometheus.

Basic configuration

# Enable Node Exporter
set service monitoring prometheus node-exporter

# Listen address and port
set service monitoring prometheus node-exporter listen-address '0.0.0.0'
set service monitoring prometheus node-exporter port '9100'

commit
save

VRF Support

# Run in a specific VRF
set service monitoring prometheus node-exporter vrf 'MGMT'

Textfile Collector

For custom metrics, use the textfile collector:

# Directory for textfile metrics
# Default: /var/lib/prometheus/node-exporter/

Create a metric file (the collector reads only files with the .prom extension):

echo 'custom_metric{label="value"} 123' > /var/lib/prometheus/node-exporter/custom.prom
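A plain `echo > file.prom` can be read by the collector while only half written. A safer pattern is to write to a temporary file in the same directory and then rename it into place, which is atomic on POSIX filesystems.

The sketch below applies that pattern and adds `# HELP`/`# TYPE` lines. It assumes:

  • Every metric is a gauge.
  • Label values need no escaping.
  • `write_prom_file` is a hypothetical helper.
  • The node_exporter process runs as a user that needs read permission, hence the `chmod`.

```python
import os
import tempfile


def write_prom_file(directory, name, samples):
    """Atomically write <name>.prom for the node_exporter textfile collector.

    samples: list of (metric, labels_dict, value, help_text) tuples; every
    metric is declared as a gauge and label values are not escaped.
    """
    lines, declared = [], set()
    for metric, labels, value, help_text in samples:
        if metric not in declared:  # HELP/TYPE once per metric family
            lines += [f"# HELP {metric} {help_text}", f"# TYPE {metric} gauge"]
            declared.add(metric)
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{metric}{{{label_str}}} {value}" if label_str else f"{metric} {value}")

    # Temp file without the .prom suffix, in the same directory (same filesystem),
    # so the collector never picks up a partially written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=f".{name}.", suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; make it readable by the exporter
    final_path = os.path.join(directory, f"{name}.prom")
    os.replace(tmp_path, final_path)  # atomic rename on POSIX
    return final_path


# Demo in a scratch directory; on the router, pass the textfile directory instead.
demo_dir = tempfile.mkdtemp()
path = write_prom_file(demo_dir, "custom",
                       [("custom_metric", {"label": "value"}, 123, "Example custom metric")])
with open(path) as fh:
    print(fh.read())
```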

Node Exporter metrics

Node Exporter exposes the following metrics:

  • CPU: usage, frequency, thermal throttling
  • Memory: total, free, cached, buffers, swap
  • Disk: I/O statistics, space usage, inodes
  • Network: bytes/packets TX/RX, errors, drops
  • Filesystem: mount points, disk usage
  • System: load average, uptime, context switches
  • Hardware: temperature sensors, fans (if available)

Prometheus scrape config for Node Exporter

scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets:
        - '192.168.1.1:9100'
        labels:
          hostname: 'vyos-router-01'
          role: 'gateway'
          site: 'datacenter-1'

Prometheus FRR Exporter

FRR Exporter exposes routing metrics from FRRouting (Free Range Routing).

FRR Exporter configuration

# Enable FRR Exporter
set service monitoring prometheus frr-exporter

# Listen address and port
set service monitoring prometheus frr-exporter listen-address '0.0.0.0'
set service monitoring prometheus frr-exporter port '9342'

# VRF
set service monitoring prometheus frr-exporter vrf 'MGMT'

commit
save

FRR Exporter metrics

FRR Exporter exposes the following metrics:

  • BGP: peer status, prefixes received/advertised, session uptime
  • OSPF: neighbors, LSA counts, areas
  • RIP: neighbors, routes
  • ISIS: adjacencies, LSP counts
  • BFD: session status
  • Route counts: IPv4/IPv6 routes per protocol
  • VRF: routing table statistics per VRF

Prometheus scrape config for FRR

scrape_configs:
  - job_name: 'frr-exporter'
    static_configs:
      - targets:
        - '192.168.1.1:9342'
        labels:
          hostname: 'vyos-router-01'
          asn: '65001'

Prometheus Blackbox Exporter

Blackbox Exporter checks the availability of external services over HTTP, HTTPS, DNS, TCP, ICMP, and gRPC.

Basic configuration

# Enable Blackbox Exporter
set service monitoring prometheus blackbox-exporter

# Listen address and port
set service monitoring prometheus blackbox-exporter listen-address '0.0.0.0'
set service monitoring prometheus blackbox-exporter port '9115'

# VRF
set service monitoring prometheus blackbox-exporter vrf 'MGMT'

commit
save

DNS Module Configuration

# DNS probe module
set service monitoring prometheus blackbox-exporter modules dns name 'dns4'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' query-name 'example.com'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' query-type 'A'

# DNS server to query
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' server-address '8.8.8.8'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' server-port '53'

# Transport protocol
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' transport 'udp'

# Timeout
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' timeout '5'

commit
save

ICMP Module Configuration

# ICMP ping module (IPv4)
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' ttl '64'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' timeout '5'

# ICMP ping module (IPv6)
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp6'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp6' preferred-ip-protocol 'ip6'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp6' timeout '5'

commit
save

HTTP Module Configuration

# HTTP probe
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx'
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx' timeout '10'

# Expected status codes
# By default, any 2xx status code counts as success

# TLS configuration
set service monitoring prometheus blackbox-exporter modules http name 'https_check'
set service monitoring prometheus blackbox-exporter modules http name 'https_check' preferred-ip-protocol 'ip4'
# TLS is used automatically for https:// targets

commit
save

TCP Module Configuration

# TCP connection probe
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect'
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect' timeout '5'

commit
save

gRPC Module Configuration

# gRPC health check
set service monitoring prometheus blackbox-exporter modules grpc name 'grpc_health'
set service monitoring prometheus blackbox-exporter modules grpc name 'grpc_health' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules grpc name 'grpc_health' timeout '5'

# Service name for the gRPC health check
set service monitoring prometheus blackbox-exporter modules grpc name 'grpc_health' service 'my-grpc-service'

# TLS for gRPC
# Enabled automatically for the grpcs:// scheme

commit
save

Prometheus configuration for Blackbox

scrape_configs:
  # Blackbox exporter endpoint
  - job_name: 'blackbox-exporter'
    static_configs:
      - targets:
        - '192.168.1.1:9115'

  # ICMP probes
  - job_name: 'blackbox-icmp'
    metrics_path: /probe
    params:
      module: [icmp4]
    static_configs:
      - targets:
        - 8.8.8.8
        - 1.1.1.1
        - google.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.1:9115

  # HTTP probes
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://example.com
        - https://api.example.com/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.1:9115

  # DNS probes
  - job_name: 'blackbox-dns'
    metrics_path: /probe
    params:
      module: [dns4]
    static_configs:
      - targets:
        - example.com
        - google.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.1:9115

  # TCP probes
  - job_name: 'blackbox-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - example.com:443
        - smtp.example.com:25
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.1:9115
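Every probe job above uses the same three relabel rules. The sketch below replays them in Python for one target, to show which URL Prometheus finally requests from the exporter and which `instance` label the result gets.

`probe_url` is a hypothetical helper. The query string is encoded with `urlencode`, which, like Prometheus, percent-encodes characters such as `:` and `/` inside the target.

```python
from urllib.parse import urlencode


def probe_url(exporter, module, target):
    """Replay the three relabel rules from the blackbox jobs for a single target."""
    labels = {"__address__": target}
    labels["__param_target"] = labels["__address__"]  # rule 1: target -> ?target=
    labels["instance"] = labels["__param_target"]     # rule 2: keep target as instance
    labels["__address__"] = exporter                  # rule 3: scrape the exporter itself
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return labels["instance"], f"http://{labels['__address__']}/probe?{query}"


instance, url = probe_url("192.168.1.1:9115", "icmp4", "8.8.8.8")
print(instance, url)
```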

Grafana Dashboards

InfluxDB Dashboard

Grafana Data Source - InfluxDB v2:

Type: InfluxDB
Query Language: Flux
URL: http://influxdb.example.com:8086
Organization: vyos-monitoring
Token: YOUR_INFLUXDB_TOKEN
Default Bucket: vyos-metrics

Flux query example:

from(bucket: "vyos-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_idle")
  |> aggregateWindow(every: 1m, fn: mean)
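The query above averages `usage_idle` in one-minute windows, and the "CPU Usage (100 - idle%)" panel then inverts the result. The sketch below mimics those two steps on in-memory points to make the arithmetic explicit.

It is a simplification, not Flux. Windows are keyed by their start time and empty windows are skipped, whereas Flux's `aggregateWindow()` stamps results with the window stop time by default. The helper names are hypothetical.

```python
from statistics import mean


def aggregate_window_mean(points, every_s):
    """Average (unix_ts, value) points in fixed windows of every_s seconds."""
    windows = {}
    for ts, value in points:
        start = ts - ts % every_s  # align each point to its window start
        windows.setdefault(start, []).append(value)
    return {start: mean(vals) for start, vals in sorted(windows.items())}


def cpu_usage_from_idle(idle_by_window):
    """Turn averaged usage_idle percentages into CPU usage percentages."""
    return {start: 100 - idle for start, idle in idle_by_window.items()}


idle = aggregate_window_mean([(0, 90.0), (30, 80.0), (60, 70.0), (90, 50.0)], 60)
print(cpu_usage_from_idle(idle))
```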

Dashboard Panels:

  • CPU Usage (100 - idle%)
  • Memory Usage (used/total * 100)
  • Network Traffic (bytes TX/RX per interface)
  • Disk I/O (read/write bytes)
  • System Load Average

Prometheus Dashboard

Grafana Data Source - Prometheus:

Type: Prometheus
URL: http://prometheus.example.com:9090

PromQL Query Examples:

CPU Usage:

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
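The CPU query works because `node_cpu_seconds_total{mode="idle"}` is a counter of idle seconds per CPU. Its per-second rate is therefore the fraction of time each CPU was idle.

The sketch below applies the same arithmetic to two counter snapshots of one instance. `cpu_usage_percent` is a hypothetical helper, and unlike `irate()` it does not handle counter resets.

```python
def cpu_usage_percent(prev_idle, curr_idle, interval_s):
    """Mimic 100 - avg(rate(node_cpu_seconds_total{mode="idle"})) * 100 for one instance.

    prev_idle / curr_idle map each CPU to its idle-seconds counter at two
    scrapes taken interval_s seconds apart. Counter resets are not handled.
    """
    idle_fractions = [(curr_idle[cpu] - prev_idle[cpu]) / interval_s for cpu in curr_idle]
    return 100 - sum(idle_fractions) / len(idle_fractions) * 100


# Two CPUs over a 10 s interval: 8 s and 6 s idle, i.e. 70% idle on average
usage = cpu_usage_percent({"0": 1000.0, "1": 2000.0}, {"0": 1008.0, "1": 2006.0}, 10)
print(f"CPU usage: {usage:.1f}%")
```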

Memory Usage:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

Network Traffic (TX):

rate(node_network_transmit_bytes_total{device="eth0"}[5m])

Network Traffic (RX):

rate(node_network_receive_bytes_total{device="eth0"}[5m])

Disk Usage:

(node_filesystem_size_bytes{fstype!~"tmpfs|fuse.lxcfs|squashfs|vfat"} - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100
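The memory and disk queries are plain ratios. One detail matters in the disk query: PromQL regex matchers are fully anchored, so `fstype!~"tmpfs|..."` excludes only exact matches.

The sketch below reproduces both formulas on sample values and uses `re.fullmatch` to mirror the anchoring. The function names and sample data are hypothetical.

```python
import re

# PromQL regex matchers are fully anchored; fullmatch() mirrors fstype!~"..."
EXCLUDED_FSTYPES = re.compile(r"tmpfs|fuse.lxcfs|squashfs|vfat")


def memory_used_percent(mem_available, mem_total):
    """(1 - MemAvailable / MemTotal) * 100, as in the Memory Usage query."""
    return (1 - mem_available / mem_total) * 100


def disk_used_percent(filesystems):
    """(size - avail) / size * 100 per mountpoint, skipping excluded fstypes."""
    return {
        fs["mountpoint"]: (fs["size"] - fs["avail"]) / fs["size"] * 100
        for fs in filesystems
        if not EXCLUDED_FSTYPES.fullmatch(fs["fstype"])
    }


sample = [
    {"mountpoint": "/", "fstype": "ext4", "size": 100, "avail": 25},
    {"mountpoint": "/run", "fstype": "tmpfs", "size": 10, "avail": 9},
]
print(memory_used_percent(2, 8), disk_used_percent(sample))
```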

BGP Peers (FRR):

frr_bgp_peer_up

BGP Prefixes Received:

frr_bgp_peer_prefixes_received_count

Blackbox Probe Success:

probe_success

HTTP Response Time:

probe_http_duration_seconds

DNS Query Time:

probe_dns_lookup_time_seconds

Recommended Grafana Dashboards

  • Node Exporter Full
  • Telegraf System Overview
  • Blackbox Exporter

Custom VyOS Dashboard Panels

System Overview Panel:

  • Hostname
  • Uptime
  • VyOS Version
  • System Load
  • CPU Temperature (if available)

Network Overview Panel:

  • Total Bandwidth TX/RX
  • Interface Status (Up/Down)
  • Packet Errors
  • Packet Drops

Routing Panel:

  • BGP Sessions Status
  • OSPF Neighbors
  • Route Counts per Protocol
  • Prefix Limits

Security Panel:

  • Firewall Dropped Packets
  • Connection Tracking Usage
  • Failed Login Attempts (from syslog)

Configuration examples

Example 1: Basic Prometheus Monitoring

# Node Exporter for system metrics
set service monitoring prometheus node-exporter
set service monitoring prometheus node-exporter listen-address '0.0.0.0'
set service monitoring prometheus node-exporter port '9100'

# FRR Exporter for routing metrics
set service monitoring prometheus frr-exporter
set service monitoring prometheus frr-exporter listen-address '0.0.0.0'
set service monitoring prometheus frr-exporter port '9342'

# Blackbox Exporter for availability checks
set service monitoring prometheus blackbox-exporter
set service monitoring prometheus blackbox-exporter listen-address '0.0.0.0'
set service monitoring prometheus blackbox-exporter port '9115'

# ICMP module
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' preferred-ip-protocol 'ip4'

# Firewall rule for the Prometheus server
set firewall ipv4 input filter rule 100 action accept
set firewall ipv4 input filter rule 100 source address '192.168.1.100/32'
set firewall ipv4 input filter rule 100 destination port '9100,9115,9342'
set firewall ipv4 input filter rule 100 protocol tcp

commit
save

Example 2: InfluxDB Cloud Integration

# Telegraf with InfluxDB Cloud
set service monitoring telegraf influxdb authentication organization 'my-company'
set service monitoring telegraf influxdb authentication token 'eyJrIjoiVGVzdCIsIm4iOiJUZXN0...'
set service monitoring telegraf influxdb bucket 'vyos-prod'
set service monitoring telegraf influxdb url 'https://eu-central-1-1.aws.cloud2.influxdata.com'
set service monitoring telegraf influxdb port '443'

commit
save

Example 3: Hybrid Monitoring (Telegraf + Prometheus)

# Telegraf with the Prometheus client output
set service monitoring telegraf prometheus-client
set service monitoring telegraf prometheus-client listen-address '0.0.0.0'
set service monitoring telegraf prometheus-client port '9273'
set service monitoring telegraf prometheus-client allow-from '192.168.1.100/32'

# Node Exporter for additional metrics
set service monitoring prometheus node-exporter
set service monitoring prometheus node-exporter listen-address '0.0.0.0'
set service monitoring prometheus node-exporter port '9100'

# FRR Exporter
set service monitoring prometheus frr-exporter
set service monitoring prometheus frr-exporter listen-address '0.0.0.0'
set service monitoring prometheus frr-exporter port '9342'

# Firewall
set firewall ipv4 input filter rule 100 action accept
set firewall ipv4 input filter rule 100 source address '192.168.1.100/32'
set firewall ipv4 input filter rule 100 destination port '9100,9273,9342'
set firewall ipv4 input filter rule 100 protocol tcp

commit
save

Example 4: Enterprise Multi-Output

# InfluxDB for long-term storage
set service monitoring telegraf influxdb authentication organization 'network-ops'
set service monitoring telegraf influxdb authentication token 'INFLUX_TOKEN'
set service monitoring telegraf influxdb bucket 'network-metrics'
set service monitoring telegraf influxdb url 'https://influx.company.com'
set service monitoring telegraf influxdb port '8086'

# Prometheus for real-time alerting
set service monitoring telegraf prometheus-client
set service monitoring telegraf prometheus-client listen-address '10.0.0.1'
set service monitoring telegraf prometheus-client port '9273'
set service monitoring telegraf prometheus-client allow-from '10.0.0.100/32'

# Splunk for logs and SIEM
set service monitoring telegraf splunk url 'https://splunk.company.com:8088'
set service monitoring telegraf splunk authentication token 'SPLUNK_HEC_TOKEN'
set service monitoring telegraf splunk source 'vyos-gateway-01'
set service monitoring telegraf splunk sourcetype 'vyos:metrics'

commit
save

Example 5: Blackbox Probing

# Blackbox Exporter
set service monitoring prometheus blackbox-exporter
set service monitoring prometheus blackbox-exporter listen-address '0.0.0.0'
set service monitoring prometheus blackbox-exporter port '9115'

# ICMP IPv4
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' ttl '64'

# ICMP IPv6
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp6'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp6' preferred-ip-protocol 'ip6'

# DNS check (IPv4)
set service monitoring prometheus blackbox-exporter modules dns name 'dns4'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' query-name 'example.com'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' query-type 'A'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' server-address '8.8.8.8'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' transport 'udp'

# HTTP check
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx'
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx' timeout '10'

# TCP check
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect'
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect' timeout '5'

commit
save

Example 6: VRF Isolation

# Management VRF for monitoring
set service monitoring prometheus node-exporter vrf 'MGMT'
set service monitoring prometheus node-exporter listen-address '10.0.0.1'
set service monitoring prometheus node-exporter port '9100'

set service monitoring prometheus frr-exporter vrf 'MGMT'
set service monitoring prometheus frr-exporter listen-address '10.0.0.1'
set service monitoring prometheus frr-exporter port '9342'

set service monitoring prometheus blackbox-exporter vrf 'MGMT'
set service monitoring prometheus blackbox-exporter listen-address '10.0.0.1'
set service monitoring prometheus blackbox-exporter port '9115'

commit
save

Operational commands

Checking Telegraf status

# Show the Telegraf configuration (configuration mode)
show service monitoring telegraf

# Check the process
run show system processes | grep telegraf

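Checking Prometheus exporters

If an exporter listens on a specific address rather than 0.0.0.0, replace localhost with that address. If it runs in a VRF, run curl inside that VRF, for example: sudo ip vrf exec MGMT curl http://10.0.0.1:9100/metrics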

# Node Exporter metrics
curl http://localhost:9100/metrics

# FRR Exporter metrics
curl http://localhost:9342/metrics

# Blackbox Exporter health
curl http://localhost:9115/health

Testing Blackbox probes

# ICMP probe
curl 'http://localhost:9115/probe?module=icmp4&target=8.8.8.8'

# HTTP probe
curl 'http://localhost:9115/probe?module=http_2xx&target=https://example.com'

# DNS probe
curl 'http://localhost:9115/probe?module=dns4&target=example.com'

# TCP probe
curl 'http://localhost:9115/probe?module=tcp_connect&target=example.com:443'
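The probe endpoint returns Prometheus text format, and the key line is `probe_success` (1 or 0). The sketch below parses such a response so a script can act on the result.

`parse_metrics` is a hypothetical, minimal parser. It skips comment lines and does not support samples that carry an explicit trailing timestamp. The sample text is an illustrative response, not captured output.

```python
def parse_metrics(text):
    """Minimal Prometheus text-format parser returning {series: value}.

    Comment lines (# HELP / # TYPE) are skipped; the value is taken after
    the last space, so samples with a trailing timestamp are not supported.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


SAMPLE = """\
# TYPE probe_success gauge
probe_success 1
probe_duration_seconds 0.012
"""

samples = parse_metrics(SAMPLE)
print("probe OK" if samples.get("probe_success") == 1.0 else "probe FAILED")
```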

Restart Services

# Restart Telegraf
restart monitoring telegraf

# Note: Prometheus exporters are restarted automatically when their configuration changes and is committed

Monitoring and diagnostics

Checking metrics

Node Exporter endpoint:

curl http://localhost:9100/metrics | head -20

Filtering specific metrics:

curl http://localhost:9100/metrics | grep node_cpu
curl http://localhost:9100/metrics | grep node_memory
curl http://localhost:9100/metrics | grep node_network

Logs

# Telegraf logs
show log | match telegraf

# System logs
show log | match monitoring

Firewall check

# Check the rules for the monitoring ports
show firewall ipv4 input filter

# Test connectivity to the Prometheus server
ping 192.168.1.100

Performance

# Check CPU usage by the exporters
run show system processes | grep -E 'node_exporter|frr_exporter|blackbox'

# Network connections
run netstat -tulpn | grep -E '9100|9115|9273|9342'

Troubleshooting

Telegraf is not sending metrics to InfluxDB

Problem: Metrics do not appear in InfluxDB.

Causes:

  1. Wrong token or organization
  2. Network connectivity issues
  3. A firewall is blocking outbound connections
  4. Wrong bucket name

Diagnostics:

# Check the configuration
show service monitoring telegraf influxdb

# Check connectivity (ICMP only; the HTTP port may still be blocked)
ping influxdb.example.com

# Check the logs
show log | match telegraf

Fix:

# Set the correct token
set service monitoring telegraf influxdb authentication token 'CORRECT_TOKEN'

# Set the correct URL
set service monitoring telegraf influxdb url 'https://correct-url.influxdata.com'

# Commit and restart
commit
restart monitoring telegraf

Prometheus cannot scrape metrics

Problem: Prometheus targets show “down” status.

Causes:

  1. A firewall is blocking the ports
  2. The exporter is not running
  3. The listen address is wrong
  4. Network routing issues

Diagnostics:

# Check the exporters
curl http://localhost:9100/metrics
curl http://localhost:9342/metrics

# Check listening ports
run netstat -tulpn | grep -E '9100|9342'

# Firewall
show firewall ipv4 input filter
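The curl checks above run on the router itself. To test reachability from the Prometheus server's side of the network, you can attempt a plain TCP connection to each exporter port.

The sketch below does that with the standard library. It assumes the default ports used in this document. `port_open` and `check_exporters` are hypothetical helpers, and the call against the router is left commented out.

```python
import socket

EXPORTER_PORTS = {
    "node-exporter": 9100,
    "blackbox-exporter": 9115,
    "telegraf prometheus-client": 9273,
    "frr-exporter": 9342,
}


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_exporters(host):
    """Map each exporter name to whether its TCP port accepts connections."""
    return {name: port_open(host, port) for name, port in EXPORTER_PORTS.items()}


# Run from the Prometheus server:
# print(check_exporters("192.168.1.1"))
```

A closed port with the exporter confirmed running locally usually points to the firewall rule or the listen address.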

Fix:

# Firewall rule
set firewall ipv4 input filter rule 100 action accept
set firewall ipv4 input filter rule 100 source address '192.168.1.100/32'
set firewall ipv4 input filter rule 100 destination port '9100,9342,9115'
set firewall ipv4 input filter rule 100 protocol tcp

# Listen on the correct interface
set service monitoring prometheus node-exporter listen-address '192.168.1.1'

commit

Blackbox probe failures

Problem: Blackbox probes report probe_success = 0.

Causes:

  1. The target is unreachable
  2. The timeout is too short
  3. A firewall is blocking ICMP/DNS/HTTP
  4. The module is misconfigured

Diagnostics:

# Run the probe manually
curl 'http://localhost:9115/probe?module=icmp4&target=8.8.8.8'

# Check connectivity
ping 8.8.8.8

# DNS test
dig @8.8.8.8 example.com

Fix:

# Increase the timeout (the effective value is capped by the Prometheus scrape_timeout)
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' timeout '10'

# Check the firewall for outbound ICMP
show firewall ipv4 output filter

commit

FRR Exporter does not show BGP metrics

Problem: BGP metrics are missing or zero.

Causes:

  1. BGP is not configured
  2. The FRR daemon is not running
  3. The exporter cannot access FRR through vtysh

Diagnostics:

# Check BGP
show ip bgp summary

# Check FRR
run show system processes | grep bgpd

# Test the FRR exporter
curl http://localhost:9342/metrics | grep bgp

Fix:

# Apply pending exporter configuration changes
commit
# The exporter restarts automatically when its configuration changes

# Check the FRR configuration
show protocols bgp

High memory usage by Telegraf

Problem: Telegraf consumes a lot of memory.

Causes:

  1. Too many output plugins
  2. Buffering: each output keeps its own metric buffer, which grows while the destination is unreachable
  3. High metric cardinality

Fix:

  • Enable only the outputs you actually need
  • Configure metric filtering
  • Increase the flush interval

Metrics missing in Grafana

Problem: Grafana panels are empty or show “No data”.

Causes:

  1. The data source is misconfigured
  2. Query syntax error
  3. The time range contains no data
  4. The metric name has changed

Diagnostics:

# Check that the metrics are available
curl http://192.168.1.1:9100/metrics | grep node_cpu

# Check in the Grafana Query Inspector
# Grafana → Panel → Query Inspector → Query

Fix:

  • Check data source connectivity in Grafana
  • Validate the PromQL query syntax
  • Adjust the time range
  • Check metric names at the /metrics endpoint

Security

Firewall Protection

# Allow access only from the monitoring servers
set firewall ipv4 input filter rule 100 action accept
set firewall ipv4 input filter rule 100 source address '192.168.1.100/32'
set firewall ipv4 input filter rule 100 destination port '9100,9115,9273,9342'
set firewall ipv4 input filter rule 100 protocol tcp

# Drop all other traffic to the monitoring ports
set firewall ipv4 input filter rule 999 action drop
set firewall ipv4 input filter rule 999 destination port '9100,9115,9273,9342'
set firewall ipv4 input filter rule 999 protocol tcp

HTTP Authentication for Prometheus Client

set service monitoring telegraf prometheus-client authentication username 'prometheus'
set service monitoring telegraf prometheus-client authentication password 'SecurePassword123!'

VRF Isolation

# Isolate monitoring in the management VRF
set service monitoring prometheus node-exporter vrf 'MGMT'
set service monitoring prometheus frr-exporter vrf 'MGMT'
set service monitoring prometheus blackbox-exporter vrf 'MGMT'

TLS for Telegraf outputs

For InfluxDB Cloud and Splunk, use HTTPS URLs:

set service monitoring telegraf influxdb url 'https://influxdb.example.com'
set service monitoring telegraf splunk url 'https://splunk.example.com:8088'

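Secrets Management

Tokens and passwords are stored in the VyOS configuration in plain text, so never commit configuration files that contain them to version control:

  • Use environment variables where possible
  • Rotate tokens regularly
  • Use least-privilege tokens: write access to a single bucket for Telegraf, read-only tokens for Grafana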

Best practices

  1. Use a VRF for management traffic:

    • Isolate monitoring in a dedicated VRF
    • Protect the management network
  2. Restrictive firewall rules:

    • Allow only specific monitoring servers
    • Block public access to the exporters
  3. Monitoring redundancy:

    • Multiple Prometheus servers
    • InfluxDB clustering (an InfluxDB Enterprise feature, not part of the OSS edition)
    • Grafana high availability
  4. Metric retention:

    • Short-term in Prometheus (15-30 days)
    • Long-term in InfluxDB (1+ year)
  5. Alerting:

    • Define alerting rules in Prometheus and route notifications through Alertmanager
    • Alert on critical metrics:
      • CPU > 80%
      • Memory > 90%
      • Disk > 85%
      • Interface down
      • BGP session down
  6. Dashboard organization:

    • Separate dashboards by function
    • Use folders in Grafana
    • Use standard naming conventions
  7. Performance:

    • Tune scrape intervals (30s for most cases)
    • Limit metric cardinality
    • Use recording rules for expensive queries
  8. Documentation:

    • Document custom metrics
    • Maintain a dashboard inventory
    • Document alerting thresholds
  9. Testing:

    • Test queries before deployment
    • Validate alert rules
    • Test failover scenarios
  10. Regular reviews:

    • Review metrics usage
    • Clean up unused dashboards
    • Update alerting rules
    • Rotate credentials
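In production, the alert thresholds from item 5 belong in Prometheus alerting rules, with notifications routed by Alertmanager. The Python sketch below only illustrates the threshold logic on a single snapshot of already-computed values, for example when testing threshold choices.

`evaluate`, `THRESHOLDS`, and the snapshot keys are hypothetical names.

```python
# Thresholds from the alerting list above (percent)
THRESHOLDS = {"cpu_percent": 80, "memory_percent": 90, "disk_percent": 85}


def evaluate(snapshot):
    """Return alert descriptions for one router snapshot.

    snapshot: dict with the percentage keys above plus optional
    "interfaces" and "bgp_peers" dicts mapping name -> is_up (bool).
    """
    alerts = [f"{key} > {limit}" for key, limit in THRESHOLDS.items()
              if snapshot.get(key, 0) > limit]
    alerts += [f"interface {name} down"
               for name, up in snapshot.get("interfaces", {}).items() if not up]
    alerts += [f"BGP peer {peer} down"
               for peer, up in snapshot.get("bgp_peers", {}).items() if not up]
    return alerts


snapshot = {"cpu_percent": 92, "memory_percent": 50, "disk_percent": 86,
            "interfaces": {"eth0": True, "eth1": False},
            "bgp_peers": {"10.0.0.2": False}}
for alert in evaluate(snapshot):
    print(alert)
```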

Conclusion

VyOS provides comprehensive monitoring through Telegraf and Prometheus exporters, giving full visibility into the performance and health of the network infrastructure.

Key capabilities:

  • Telegraf - general-purpose agent with multiple outputs (InfluxDB, Prometheus, Splunk, Azure Data Explorer, Loki)
  • Node Exporter - detailed system metrics (CPU, memory, disk, network)
  • FRR Exporter - routing protocol metrics (BGP, OSPF, ISIS)
  • Blackbox Exporter - service availability probing (HTTP, DNS, ICMP, TCP)

Integration:

  • Seamless integration with Prometheus and Grafana
  • Native support for InfluxDB and Splunk
  • Cloud-ready (Azure Data Explorer, InfluxDB Cloud)

Recommendations:

  • Use Prometheus + Grafana for real-time monitoring and alerting
  • Use InfluxDB for long-term storage and capacity planning
  • Set up Blackbox Exporter for proactive service monitoring
  • Protect the exporters with firewall rules and VRF isolation

A well-configured monitoring setup enables proactive problem detection, capacity planning, and troubleshooting for VyOS-based network infrastructure.