Monitoring Service in VyOS
VyOS provides built-in facilities for collecting and exporting performance metrics, system data, and logs through Telegraf and Prometheus exporters, covering the monitoring needs of a network infrastructure.
Overview
Monitoring components
VyOS supports two main monitoring mechanisms:
Telegraf - a general-purpose metrics collection agent:
- Collects system metrics (CPU, memory, disk, network)
- Exports to many monitoring systems
- Supports a range of output plugins
- Processes logs and events
Prometheus Exporters - specialized metrics exporters:
- Node Exporter - hardware and OS system metrics
- FRR Exporter - routing metrics from FRRouting (FRR)
- Blackbox Exporter - service availability checks
Supported monitoring systems
Telegraf can export metrics to:
- InfluxDB - time-series database
- Prometheus - monitoring and alerting system
- Azure Data Explorer - Microsoft Azure analytics platform
- Splunk - data analysis platform
- Loki - log aggregation system from Grafana
Telegraf Configuration
Basic Telegraf setup
# Enable Telegraf
set service monitoring telegraf
commit
save
System metrics
Telegraf automatically collects:
- CPU usage and load average
- Memory and swap utilization
- Disk I/O and space usage
- Network interface statistics
- System uptime
InfluxDB Integration
InfluxDB output configuration
# Organization and bucket
set service monitoring telegraf influxdb authentication organization 'vyos-monitoring'
set service monitoring telegraf influxdb bucket 'vyos-metrics'
# Authentication token
set service monitoring telegraf influxdb authentication token 'YOUR_INFLUXDB_TOKEN'
# InfluxDB server URL
set service monitoring telegraf influxdb url 'http://influxdb.example.com'
set service monitoring telegraf influxdb port '8086'
commit
save
InfluxDB parameters
Organization - the organization in InfluxDB 2.x:
set service monitoring telegraf influxdb authentication organization 'company-ops'
Bucket - the target bucket for metrics:
set service monitoring telegraf influxdb bucket 'network-metrics'
Token - the authentication token for the InfluxDB API:
set service monitoring telegraf influxdb authentication token 'ZAml9Uy5wrhA...=='
URL and Port:
set service monitoring telegraf influxdb url 'https://influxdb.cloud.example.com'
set service monitoring telegraf influxdb port '443'
Example of a complete InfluxDB configuration
# InfluxDB Cloud configuration
set service monitoring telegraf influxdb authentication organization 'network-team'
set service monitoring telegraf influxdb authentication token 'eyJrIjoiVGVzdCIsIm4iOiJUZXN0...'
set service monitoring telegraf influxdb bucket 'vyos-production'
set service monitoring telegraf influxdb url 'https://eu-central-1-1.aws.cloud2.influxdata.com'
set service monitoring telegraf influxdb port '443'
commit
save
Prometheus Client Integration
Prometheus output configuration
Telegraf can expose metrics in Prometheus format:
# Enable the Prometheus client
set service monitoring telegraf prometheus-client
# Address and port for scraping
set service monitoring telegraf prometheus-client listen-address '0.0.0.0'
set service monitoring telegraf prometheus-client port '9273'
# Allow access from the Prometheus server
set service monitoring telegraf prometheus-client allow-from '192.168.1.100/32'
set service monitoring telegraf prometheus-client allow-from '10.0.0.0/8'
commit
save
Prometheus Client parameters
Listen address:
set service monitoring telegraf prometheus-client listen-address '192.168.1.1'
Port (default: 9273):
set service monitoring telegraf prometheus-client port '9273'
Allow-from - network ACL:
set service monitoring telegraf prometheus-client allow-from '192.168.1.0/24'
set service monitoring telegraf prometheus-client allow-from '10.0.0.0/8'
HTTP Authentication (optional):
set service monitoring telegraf prometheus-client authentication username 'prometheus'
set service monitoring telegraf prometheus-client authentication password 'secure_password'
Metric version:
set service monitoring telegraf prometheus-client metric-version 2
Prometheus scrape configuration
Add the following to prometheus.yml on the Prometheus server:
scrape_configs:
  - job_name: 'vyos-telegraf'
    static_configs:
      - targets:
          - '192.168.1.1:9273'
        labels:
          hostname: 'vyos-router-01'
          environment: 'production'
    # If authentication is configured
    basic_auth:
      username: 'prometheus'
      password: 'secure_password'
    scrape_interval: 30s
    scrape_timeout: 10s
Azure Data Explorer Integration
Azure Data Explorer configuration
# Azure AD authentication
set service monitoring telegraf azure-data-explorer authentication client-id 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'
set service monitoring telegraf azure-data-explorer authentication client-secret 'YOUR_CLIENT_SECRET'
set service monitoring telegraf azure-data-explorer authentication tenant-id 'YYYYYYYY-YYYY-YYYY-YYYY-YYYYYYYYYYYY'
# Database configuration
set service monitoring telegraf azure-data-explorer database 'NetworkMetrics'
set service monitoring telegraf azure-data-explorer url 'https://mycluster.region.kusto.windows.net'
# Metrics grouping
set service monitoring telegraf azure-data-explorer metrics-grouping-type 'SingleTable'
commit
save
Azure Data Explorer parameters
Authentication - Service Principal credentials:
set service monitoring telegraf azure-data-explorer authentication client-id 'app-id'
set service monitoring telegraf azure-data-explorer authentication client-secret 'secret'
set service monitoring telegraf azure-data-explorer authentication tenant-id 'tenant-id'
Database:
set service monitoring telegraf azure-data-explorer database 'VyOSMetrics'
URL - Kusto cluster URL:
set service monitoring telegraf azure-data-explorer url 'https://vyoscluster.eastus.kusto.windows.net'
Metrics Grouping:
# SingleTable - all metrics in one table
set service monitoring telegraf azure-data-explorer metrics-grouping-type 'SingleTable'
# TablePerMetric - a separate table for each metric
set service monitoring telegraf azure-data-explorer metrics-grouping-type 'TablePerMetric'
Splunk Integration
Splunk output configuration
# Splunk HEC endpoint
set service monitoring telegraf splunk url 'https://splunk.example.com:8088'
# HEC token
set service monitoring telegraf splunk authentication token 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
# Source and sourcetype
set service monitoring telegraf splunk source 'vyos-router-01'
set service monitoring telegraf splunk sourcetype 'vyos:metrics'
commit
save
Splunk parameters
URL - HTTP Event Collector endpoint:
set service monitoring telegraf splunk url 'https://splunk-hec.company.com:8088'
Authentication token:
set service monitoring telegraf splunk authentication token 'B5A79AAD-D822-46CC-80D1-819F80D7BFB0'
Source identifier:
set service monitoring telegraf splunk source 'vyos-gateway'
Sourcetype:
set service monitoring telegraf splunk sourcetype 'network:metrics'
Index (optional):
set service monitoring telegraf splunk index 'network_metrics'
Loki Integration
Loki output configuration
# Loki server URL
set service monitoring telegraf loki url 'http://loki.example.com'
set service monitoring telegraf loki port '3100'
# Optional authentication
set service monitoring telegraf loki authentication username 'loki-user'
set service monitoring telegraf loki authentication password 'secure_pass'
# Metric name as label
set service monitoring telegraf loki metric-name-label '__name__'
commit
save
Loki parameters
URL and Port:
set service monitoring telegraf loki url 'https://loki.grafana.cloud'
set service monitoring telegraf loki port '443'
Authentication:
set service monitoring telegraf loki authentication username 'user'
set service monitoring telegraf loki authentication password 'api-key'
Labels:
set service monitoring telegraf loki metric-name-label 'metric_name'
Prometheus Node Exporter
Node Exporter provides detailed system metrics for Prometheus.
Basic configuration
# Enable Node Exporter
set service monitoring prometheus node-exporter
# Listen address and port
set service monitoring prometheus node-exporter listen-address '0.0.0.0'
set service monitoring prometheus node-exporter port '9100'
commit
save
VRF Support
# Run in a specific VRF
set service monitoring prometheus node-exporter vrf 'MGMT'
Textfile Collector
For custom metrics via the textfile collector:
# Directory for textfile metrics
# Default: /var/lib/prometheus/node-exporter/
Create a metric file:
echo 'custom_metric{label="value"} 123' > /var/lib/prometheus/node-exporter/custom.prom
Node Exporter metrics
Node Exporter provides the following metrics:
- CPU: usage, frequency, thermal throttling
- Memory: total, free, cached, buffers, swap
- Disk: I/O statistics, space usage, inodes
- Network: bytes/packets TX/RX, errors, drops
- Filesystem: mount points, disk usage
- System: load average, uptime, context switches
- Hardware: temperature sensors, fans (if available)
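Beyond these built-in metrics, custom values reach Node Exporter through the textfile collector described earlier. The following is a minimal Python sketch, assuming the default directory is writable by the script and that the collector only picks up files ending in .prom. It writes to a temporary file and then renames it, so the collector should never read a half-written file. The function name write_prom_file is hypothetical and not part of VyOS or Node Exporter.

```python
import os
import tempfile


def write_prom_file(directory: str, filename: str, lines: list[str]) -> str:
    """Atomically write Prometheus text-format lines to a .prom file."""
    target = os.path.join(directory, filename)
    # Create the temporary file in the same directory so the rename
    # stays on one filesystem and is therefore atomic on POSIX systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("\n".join(lines) + "\n")
        os.chmod(tmp_path, 0o644)  # the exporter process must be able to read it
        os.replace(tmp_path, target)
    except BaseException:
        # Do not leave stray temporary files behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target
```

A call such as write_prom_file("/var/lib/prometheus/node-exporter", "custom.prom", ['custom_metric{label="value"} 123']) would produce the same file as the echo command, without the window in which the file is partially written.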
Prometheus scrape config for Node Exporter
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - '192.168.1.1:9100'
        labels:
          hostname: 'vyos-router-01'
          role: 'gateway'
          site: 'datacenter-1'
Prometheus FRR Exporter
FRR Exporter provides routing metrics from FRRouting (FRR).
FRR Exporter configuration
# Enable FRR Exporter
set service monitoring prometheus frr-exporter
# Listen address and port
set service monitoring prometheus frr-exporter listen-address '0.0.0.0'
set service monitoring prometheus frr-exporter port '9342'
# VRF
set service monitoring prometheus frr-exporter vrf 'MGMT'
commit
save
FRR Exporter metrics
FRR Exporter provides the following metrics:
- BGP: peers status, prefixes received/advertised, session uptime
- OSPF: neighbors, LSA counts, areas
- RIP: neighbors, routes
- ISIS: adjacencies, LSP counts
- BFD: sessions status
- Route counts: IPv4/IPv6 routes by protocol
- VRF: routing table statistics per VRF
Prometheus scrape config for FRR
scrape_configs:
  - job_name: 'frr-exporter'
    static_configs:
      - targets:
          - '192.168.1.1:9342'
        labels:
          hostname: 'vyos-router-01'
          asn: '65001'
Prometheus Blackbox Exporter
Blackbox Exporter checks the availability of external services over HTTP, HTTPS, DNS, TCP, ICMP, and gRPC.
Basic configuration
# Enable Blackbox Exporter
set service monitoring prometheus blackbox-exporter
# Listen address and port
set service monitoring prometheus blackbox-exporter listen-address '0.0.0.0'
set service monitoring prometheus blackbox-exporter port '9115'
# VRF
set service monitoring prometheus blackbox-exporter vrf 'MGMT'
commit
save
DNS Module Configuration
# DNS probe module
set service monitoring prometheus blackbox-exporter modules dns name 'dns4'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' query-name 'example.com'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' query-type 'A'
# DNS server to query
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' server-address '8.8.8.8'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' server-port '53'
# Transport protocol
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' transport 'udp'
# Timeout
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' timeout '5'
commit
save
ICMP Module Configuration
# ICMP ping module (IPv4)
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' ttl '64'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' timeout '5'
# ICMP ping module (IPv6)
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp6'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp6' preferred-ip-protocol 'ip6'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp6' timeout '5'
commit
save
HTTP Module Configuration
# HTTP probe
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx'
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx' timeout '10'
# Expected status codes
# Checks for 2xx codes by default
# TLS configuration
set service monitoring prometheus blackbox-exporter modules http name 'https_check'
set service monitoring prometheus blackbox-exporter modules http name 'https_check' preferred-ip-protocol 'ip4'
# TLS verification is enabled automatically for https:// URLs
commit
save
TCP Module Configuration
# TCP connection probe
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect'
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect' timeout '5'
commit
save
gRPC Module Configuration
# gRPC health check
set service monitoring prometheus blackbox-exporter modules grpc name 'grpc_health'
set service monitoring prometheus blackbox-exporter modules grpc name 'grpc_health' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules grpc name 'grpc_health' timeout '5'
# Service name for the gRPC health check
set service monitoring prometheus blackbox-exporter modules grpc name 'grpc_health' service 'my-grpc-service'
# TLS for gRPC
# Enabled automatically for the grpcs:// scheme
commit
save
Prometheus configuration for Blackbox
scrape_configs:
  # Blackbox exporter endpoint
  - job_name: 'blackbox-exporter'
    static_configs:
      - targets:
          - '192.168.1.1:9115'
  # ICMP probes
  - job_name: 'blackbox-icmp'
    metrics_path: /probe
    params:
      module: [icmp4]
    static_configs:
      - targets:
          - 8.8.8.8
          - 1.1.1.1
          - google.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.1:9115
  # HTTP probes
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://api.example.com/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.1:9115
  # DNS probes
  - job_name: 'blackbox-dns'
    metrics_path: /probe
    params:
      module: [dns4]
    static_configs:
      - targets:
          - example.com
          - google.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.1:9115
  # TCP probes
  - job_name: 'blackbox-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - example.com:443
          - smtp.example.com:25
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.1:9115
Grafana Dashboards
InfluxDB Dashboard
Grafana Data Source - InfluxDB v2:
Type: InfluxDB
URL: http://influxdb.example.com:8086
Organization: vyos-monitoring
Token: YOUR_INFLUXDB_TOKEN
Default Bucket: vyos-metrics
Flux Query Example:
from(bucket: "vyos-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_idle")
  |> aggregateWindow(every: 1m, fn: mean)
Dashboard Panels:
- CPU Usage (100 - idle%)
- Memory Usage (used/total * 100)
- Network Traffic (bytes TX/RX per interface)
- Disk I/O (read/write bytes)
- System Load Average
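The panel formulas above, "100 - idle%" and "used/total * 100", together with a per-second rate for the traffic counters, can be written out as a short Python sketch. The function names are illustrative only. The counter-reset handling (treating a smaller current value as a restart from zero) is an assumption made for this example, not documented Telegraf or Grafana behavior.

```python
def cpu_usage_percent(usage_idle: float) -> float:
    """CPU usage derived from the idle percentage (100 - idle%)."""
    return 100.0 - usage_idle


def memory_usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Memory usage as used/total * 100."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes * 100.0


def rate_per_second(previous: int, current: int, interval_seconds: float) -> float:
    """Per-second rate between two samples of a monotonically increasing counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if current < previous:
        # Assumption: the counter was reset and restarted from zero.
        return current / interval_seconds
    return (current - previous) / interval_seconds
```

For example, an idle value of 92.5 corresponds to 7.5% CPU usage, and 2 GiB used out of 8 GiB corresponds to 25% memory usage.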
Prometheus Dashboard
Grafana Data Source - Prometheus:
Type: Prometheus
URL: http://prometheus.example.com:9090
PromQL Query Examples:
CPU Usage:
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Memory Usage:
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
Network Traffic (TX):
rate(node_network_transmit_bytes_total{device="eth0"}[5m])
Network Traffic (RX):
rate(node_network_receive_bytes_total{device="eth0"}[5m])
Disk Usage:
(node_filesystem_size_bytes{fstype!~"tmpfs|fuse.lxcfs|squashfs|vfat"} - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100
BGP Peers (FRR):
frr_bgp_peer_up
BGP Prefixes Received:
frr_bgp_peer_prefixes_received_count
Blackbox Probe Success:
probe_success
HTTP Response Time:
probe_http_duration_seconds
DNS Query Time:
probe_dns_lookup_time_seconds
Recommended Grafana Dashboards
Node Exporter Full:
- Dashboard ID: 1860
- Import URL: https://grafana.com/grafana/dashboards/1860
Telegraf System Overview:
- Dashboard ID: 928
- Import URL: https://grafana.com/grafana/dashboards/928
Blackbox Exporter:
- Dashboard ID: 7587
- Import URL: https://grafana.com/grafana/dashboards/7587
Custom VyOS Dashboard Panels
System Overview Panel:
- Hostname
- Uptime
- VyOS Version
- System Load
- CPU Temperature (if available)
Network Overview Panel:
- Total Bandwidth TX/RX
- Interface Status (Up/Down)
- Packet Errors
- Packet Drops
Routing Panel:
- BGP Sessions Status
- OSPF Neighbors
- Route Counts by Protocol
- Prefix Limits
Security Panel:
- Firewall Dropped Packets
- Connection Tracking Usage
- Failed Login Attempts (from syslog)
Configuration examples
Example 1: Basic Prometheus Monitoring
# Node Exporter for system metrics
set service monitoring prometheus node-exporter
set service monitoring prometheus node-exporter listen-address '0.0.0.0'
set service monitoring prometheus node-exporter port '9100'
# FRR Exporter for routing metrics
set service monitoring prometheus frr-exporter
set service monitoring prometheus frr-exporter listen-address '0.0.0.0'
set service monitoring prometheus frr-exporter port '9342'
# Blackbox for availability checks
set service monitoring prometheus blackbox-exporter
set service monitoring prometheus blackbox-exporter listen-address '0.0.0.0'
set service monitoring prometheus blackbox-exporter port '9115'
# ICMP module
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' preferred-ip-protocol 'ip4'
# Firewall for the Prometheus server
set firewall ipv4 input filter rule 100 action accept
set firewall ipv4 input filter rule 100 source address '192.168.1.100/32'
set firewall ipv4 input filter rule 100 destination port '9100,9115,9342'
set firewall ipv4 input filter rule 100 protocol tcp
commit
save
Example 2: InfluxDB Cloud Integration
# Telegraf with InfluxDB Cloud
set service monitoring telegraf influxdb authentication organization 'my-company'
set service monitoring telegraf influxdb authentication token 'eyJrIjoiVGVzdCIsIm4iOiJUZXN0...'
set service monitoring telegraf influxdb bucket 'vyos-prod'
set service monitoring telegraf influxdb url 'https://eu-central-1-1.aws.cloud2.influxdata.com'
set service monitoring telegraf influxdb port '443'
commit
save
Example 3: Hybrid Monitoring (Telegraf + Prometheus)
# Telegraf with Prometheus client output
set service monitoring telegraf prometheus-client
set service monitoring telegraf prometheus-client listen-address '0.0.0.0'
set service monitoring telegraf prometheus-client port '9273'
set service monitoring telegraf prometheus-client allow-from '192.168.1.100/32'
# Node Exporter for additional metrics
set service monitoring prometheus node-exporter
set service monitoring prometheus node-exporter listen-address '0.0.0.0'
set service monitoring prometheus node-exporter port '9100'
# FRR Exporter
set service monitoring prometheus frr-exporter
set service monitoring prometheus frr-exporter listen-address '0.0.0.0'
set service monitoring prometheus frr-exporter port '9342'
# Firewall
set firewall ipv4 input filter rule 100 action accept
set firewall ipv4 input filter rule 100 source address '192.168.1.100/32'
set firewall ipv4 input filter rule 100 destination port '9100,9273,9342'
set firewall ipv4 input filter rule 100 protocol tcp
commit
save
Example 4: Enterprise Multi-Output
# InfluxDB for long-term storage
set service monitoring telegraf influxdb authentication organization 'network-ops'
set service monitoring telegraf influxdb authentication token 'INFLUX_TOKEN'
set service monitoring telegraf influxdb bucket 'network-metrics'
set service monitoring telegraf influxdb url 'https://influx.company.com'
set service monitoring telegraf influxdb port '8086'
# Prometheus for real-time alerting
set service monitoring telegraf prometheus-client
set service monitoring telegraf prometheus-client listen-address '10.0.0.1'
set service monitoring telegraf prometheus-client port '9273'
set service monitoring telegraf prometheus-client allow-from '10.0.0.100/32'
# Splunk for logs and SIEM
set service monitoring telegraf splunk url 'https://splunk.company.com:8088'
set service monitoring telegraf splunk authentication token 'SPLUNK_HEC_TOKEN'
set service monitoring telegraf splunk source 'vyos-gateway-01'
set service monitoring telegraf splunk sourcetype 'vyos:metrics'
commit
save
Example 5: Blackbox Probing
# Blackbox Exporter
set service monitoring prometheus blackbox-exporter
set service monitoring prometheus blackbox-exporter listen-address '0.0.0.0'
set service monitoring prometheus blackbox-exporter port '9115'
# ICMP IPv4
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' ttl '64'
# ICMP IPv6
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp6'
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp6' preferred-ip-protocol 'ip6'
# DNS check (IPv4)
set service monitoring prometheus blackbox-exporter modules dns name 'dns4'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' query-name 'example.com'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' query-type 'A'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' server-address '8.8.8.8'
set service monitoring prometheus blackbox-exporter modules dns name 'dns4' transport 'udp'
# HTTP check
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx'
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules http name 'http_2xx' timeout '10'
# TCP check
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect'
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect' preferred-ip-protocol 'ip4'
set service monitoring prometheus blackbox-exporter modules tcp name 'tcp_connect' timeout '5'
commit
save
Example 6: VRF Isolation
# Management VRF for monitoring
set service monitoring prometheus node-exporter vrf 'MGMT'
set service monitoring prometheus node-exporter listen-address '10.0.0.1'
set service monitoring prometheus node-exporter port '9100'
set service monitoring prometheus frr-exporter vrf 'MGMT'
set service monitoring prometheus frr-exporter listen-address '10.0.0.1'
set service monitoring prometheus frr-exporter port '9342'
set service monitoring prometheus blackbox-exporter vrf 'MGMT'
set service monitoring prometheus blackbox-exporter listen-address '10.0.0.1'
set service monitoring prometheus blackbox-exporter port '9115'
commit
save
Operational commands
Checking Telegraf status
# Service status
show service monitoring telegraf
# Check the process
run show system processes | grep telegraf
Checking Prometheus Exporters
# Node Exporter metrics
curl http://localhost:9100/metrics
# FRR Exporter metrics
curl http://localhost:9342/metrics
# Blackbox Exporter health
curl http://localhost:9115/health
Testing Blackbox Probes
# ICMP probe
curl 'http://localhost:9115/probe?module=icmp4&target=8.8.8.8'
# HTTP probe
curl 'http://localhost:9115/probe?module=http_2xx&target=https://example.com'
# DNS probe
curl 'http://localhost:9115/probe?module=dns4&target=example.com'
# TCP probe
curl 'http://localhost:9115/probe?module=tcp_connect&target=example.com:443'
Restart Services
# Restart Telegraf
restart monitoring telegraf
# Note: Prometheus exporters restart automatically when the configuration changes
Monitoring and diagnostics
Checking metrics
Node Exporter endpoint:
curl http://localhost:9100/metrics | head -20
Filter specific metrics:
curl http://localhost:9100/metrics | grep node_cpu
curl http://localhost:9100/metrics | grep node_memory
curl http://localhost:9100/metrics | grep node_network
Logs
# Telegraf logs
show log | match telegraf
# System logs
show log | match monitoring
Firewall check
# Check rules for monitoring ports
show firewall ipv4 input filter
# Test connectivity to the Prometheus server
ping 192.168.1.100
Performance
# Check CPU usage of the exporters
run show system processes | grep -E 'node_exporter|frr_exporter|blackbox'
# Network connections
run netstat -tulpn | grep -E '9100|9115|9273|9342'
Troubleshooting
Telegraf does not send metrics to InfluxDB
Problem: Metrics do not appear in InfluxDB.
Causes:
- Incorrect token or organization
- Network connectivity issues
- Firewall blocks outbound connections
- Incorrect bucket name
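Token, organization, and bucket problems can be told apart by sending a single test point to the InfluxDB 2.x write endpoint (/api/v2/write) from a host that can reach the server. The Python sketch below only builds the request. The helper name and the measurement name "probe" are hypothetical, and sending the request writes one real point into the bucket, so a scratch bucket is the safer choice.

```python
import urllib.parse
import urllib.request


def build_influx_write_request(base_url: str, port: int, org: str,
                               bucket: str, token: str, line: str) -> urllib.request.Request:
    """Build (but do not send) a line-protocol write request for InfluxDB 2.x."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "s"})
    url = f"{base_url}:{port}/api/v2/write?{query}"
    return urllib.request.Request(
        url,
        data=line.encode("utf-8"),  # one line of InfluxDB line protocol
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )
```

Passing the result to urllib.request.urlopen() sends it. InfluxDB 2.x typically answers 204 on success, 401 for a rejected token, and 404 when the organization or bucket is not found, which narrows down which of the causes above applies.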
Diagnostics:
# Check the configuration
show service monitoring telegraf influxdb
# Check connectivity
ping influxdb.example.com
# Check logs
show log | match telegraf
Solution:
# Verify authentication
set service monitoring telegraf influxdb authentication token 'CORRECT_TOKEN'
# Verify the URL
set service monitoring telegraf influxdb url 'https://correct-url.influxdata.com'
# Commit and restart
commit
restart monitoring telegraf
Prometheus cannot scrape metrics
Problem: Prometheus targets show a “down” status.
Causes:
- Firewall blocks the ports
- Exporter is not running
- Listen address is incorrect
- Network routing issues
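For the Telegraf prometheus-client output, the allow-from ACL is another place where a scraper can be rejected. A small Python sketch using the standard ipaddress module shows how an address can be checked against the configured prefixes before touching the router. The function name is illustrative, and the check only mirrors CIDR matching; it does not query VyOS.

```python
import ipaddress


def is_scraper_allowed(scraper_ip: str, allow_from: list[str]) -> bool:
    """Return True if scraper_ip falls inside any of the allow-from prefixes."""
    address = ipaddress.ip_address(scraper_ip)
    # An address of a different IP version than the network simply does not match.
    return any(address in ipaddress.ip_network(cidr) for cidr in allow_from)
```

With the prefixes from the earlier example, '192.168.1.100/32' and '10.0.0.0/8', a Prometheus server at 10.20.30.40 matches, while one at 172.16.0.5 does not.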
Diagnostics:
# Check exporters
curl http://localhost:9100/metrics
curl http://localhost:9342/metrics
# Check listening ports
run netstat -tulpn | grep -E '9100|9342'
# Firewall
show firewall ipv4 input filter
Solution:
# Firewall rule
set firewall ipv4 input filter rule 100 action accept
set firewall ipv4 input filter rule 100 source address '192.168.1.100/32'
set firewall ipv4 input filter rule 100 destination port '9100,9342,9115'
set firewall ipv4 input filter rule 100 protocol tcp
# Listen on the correct interface
set service monitoring prometheus node-exporter listen-address '192.168.1.1'
commit
Blackbox probe failures
Problem: Blackbox probes report probe_success = 0.
Causes:
- Target is unreachable
- Timeout is too short
- Firewall blocks ICMP/DNS/HTTP
- Incorrect module configuration
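When a module or scrape job is suspected, it helps to see which request Prometheus ends up making after the three relabel rules used in the Blackbox scrape configuration. The Python sketch below imitates those rules for one target. It is an illustration of the label flow under that configuration, not Prometheus code, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def relabel_probe_target(target: str, module: str, exporter: str) -> dict:
    """Imitate the relabel_configs of a blackbox job for a single target."""
    labels = {"__address__": target}
    # Rule 1: copy the original target into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the original target visible as the instance label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: point the actual scrape at the Blackbox Exporter.
    labels["__address__"] = exporter
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return {
        "instance": labels["instance"],
        "url": f"http://{labels['__address__']}/probe?{query}",
    }
```

The resulting URL can be requested with curl from the Prometheus host to reproduce exactly what the scrape job sees, including the module name it passes.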
Diagnostics:
# Test the probe manually
curl 'http://localhost:9115/probe?module=icmp4&target=8.8.8.8'
# Check connectivity
ping 8.8.8.8
# DNS test
dig @8.8.8.8 example.com
Solution:
# Increase the timeout
set service monitoring prometheus blackbox-exporter modules icmp name 'icmp4' timeout '10'
# Check the firewall for outbound ICMP
show firewall ipv4 output filter
commit
FRR Exporter does not show BGP metrics
Problem: BGP metrics are missing or zero.
Causes:
- BGP is not configured
- FRR daemon is not running
- Exporter has no access to FRR vtysh
Diagnostics:
# Check BGP
show ip bgp summary
# Check FRR
run show system processes | grep bgpd
# Test FRR exporter
curl http://localhost:9342/metrics | grep bgp
Solution:
# Restart FRR exporter
commit
# The exporter restarts automatically
# Check FRR configuration
show protocols bgp
High memory usage from Telegraf
Problem: Telegraf consumes a lot of memory.
Causes:
- Too many output plugins
- Buffering issues
- High metrics cardinality
Solution:
- Use only the outputs you need
- Configure metric filtering
- Increase the flush interval
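To find out where high cardinality comes from, the number of series per metric name can be counted from a Prometheus-format scrape. The sketch below is a simplified Python parser, assuming the prometheus-client output is enabled and the text was saved from its /metrics endpoint. It ignores comment lines and treats everything before the first "{" or whitespace as the metric name.

```python
from collections import Counter


def series_per_metric(exposition_text: str) -> Counter:
    """Count sample lines per metric name in Prometheus text exposition format."""
    counts: Counter = Counter()
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the label block or at the first whitespace.
        name = line.split("{", 1)[0].split(None, 1)[0]
        counts[name] += 1
    return counts
```

Calling series_per_metric(text).most_common(10) lists the metric names with the most label combinations, which are the first candidates for filtering.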
Metrics missing in Grafana
Problem: Grafana panels are empty or show “No data”.
Causes:
- Data source is misconfigured
- Query syntax error
- Time range contains no data
- Metric name has changed
Diagnostics:
# Check that metrics are available
curl http://192.168.1.1:9100/metrics | grep node_cpu
# Check in the Grafana Query Inspector
# Grafana → Panel → Query Inspector → Query
Solution:
- Check data source connectivity in Grafana
- Validate PromQL query syntax
- Adjust the time range
- Check metric names at the /metrics endpoint
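One way to validate a PromQL expression outside Grafana is to send it to the Prometheus HTTP API instant-query endpoint (/api/v1/query). The Python sketch below builds the URL with the query properly percent-encoded. The base URL is the example data source address used earlier, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def instant_query_url(prometheus_base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    # urlencode percent-encodes braces, quotes and '=' inside the expression.
    return f"{prometheus_base}/api/v1/query?" + urlencode({"query": promql})
```

Requesting the URL with curl returns JSON. A response with "status": "success" and an empty result usually points at a missing metric or label, while "status": "error" points at the query syntax.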
Security
Firewall Protection
# Allow only from monitoring servers
set firewall ipv4 input filter rule 100 action accept
set firewall ipv4 input filter rule 100 source address '192.168.1.100/32'
set firewall ipv4 input filter rule 100 destination port '9100,9115,9273,9342'
set firewall ipv4 input filter rule 100 protocol tcp
# Drop everything else
set firewall ipv4 input filter rule 999 action drop
set firewall ipv4 input filter rule 999 destination port '9100,9115,9273,9342'
set firewall ipv4 input filter rule 999 protocol tcp
HTTP Authentication for Prometheus Client
set service monitoring telegraf prometheus-client authentication username 'prometheus'
set service monitoring telegraf prometheus-client authentication password 'SecurePassword123!'
VRF Isolation
# Isolate monitoring in the management VRF
set service monitoring prometheus node-exporter vrf 'MGMT'
set service monitoring prometheus frr-exporter vrf 'MGMT'
set service monitoring prometheus blackbox-exporter vrf 'MGMT'
TLS for Telegraf outputs
For InfluxDB Cloud and Splunk, use HTTPS URLs:
set service monitoring telegraf influxdb url 'https://influxdb.example.com'
set service monitoring telegraf splunk url 'https://splunk.example.com:8088'
Secrets Management
Never commit tokens to version control:
- Use environment variables where possible
- Rotate tokens regularly
- Use least-privilege tokens where possible (for example, read-only tokens for Grafana data sources)
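If configuration exports do need to go into version control, secrets can be masked first. The Python sketch below is one possible approach, assuming the export consists of set commands with single-quoted values, as in the examples in this document. It replaces the values that follow token, password, and client-secret. The pattern and function name are illustrative and may need extending for other secret fields.

```python
import re

# Matches e.g.  ... authentication token 'abc'  and keeps the keyword part.
SECRET_PATTERN = re.compile(r"(\b(?:token|password|client-secret)\s+)'[^']*'")


def redact_secrets(config_text: str, placeholder: str = "REDACTED") -> str:
    """Replace quoted secret values in VyOS set commands with a placeholder."""
    return SECRET_PATTERN.sub(
        lambda match: f"{match.group(1)}'{placeholder}'", config_text
    )
```

Lines that carry no secret, such as the bucket or URL settings, pass through unchanged, so the redacted file still documents the structure of the monitoring setup.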
Best practices
Use a VRF for management traffic:
- Isolate monitoring in a dedicated VRF
- Protect the management network
Restrictive firewall rules:
- Allow only specific monitoring servers
- Block public access to exporters
Monitoring redundancy:
- Multiple Prometheus servers
- InfluxDB clustering
- Grafana high availability
Metric retention:
- Short-term in Prometheus (15-30 days)
- Long-term in InfluxDB (1+ year)
Alerting:
- Configure Prometheus Alertmanager
- Alert on critical metrics:
- CPU > 80%
- Memory > 90%
- Disk > 85%
- Interface down
- BGP session down
Dashboard organization:
- Separate dashboards by function
- Use folders in Grafana
- Standard naming conventions
Performance:
- Tune scrape intervals (30s for most cases)
- Limit metric cardinality
- Use recording rules for expensive queries
Documentation:
- Document custom metrics
- Maintain dashboard inventory
- Document alerting thresholds
Testing:
- Test queries before deployment
- Validate alert rules
- Test failover scenarios
Regular reviews:
- Review metrics usage
- Cleanup unused dashboards
- Update alerting rules
- Rotate credentials
Conclusion
VyOS provides comprehensive monitoring capabilities through Telegraf and Prometheus exporters, giving full visibility into the performance and state of the network infrastructure.
Key capabilities:
- Telegraf - a general-purpose agent with multiple outputs (InfluxDB, Prometheus, Splunk, Azure, Loki)
- Node Exporter - detailed system metrics (CPU, memory, disk, network)
- FRR Exporter - routing protocol metrics (BGP, OSPF, ISIS)
- Blackbox Exporter - service availability probing (HTTP, DNS, ICMP, TCP)
Integration:
- Seamless integration with Prometheus and Grafana
- Native support for InfluxDB and Splunk
- Cloud-ready (Azure Data Explorer, InfluxDB Cloud)
Recommendations:
- Use Prometheus + Grafana for real-time monitoring and alerting
- Use InfluxDB for long-term storage and capacity planning
- Configure Blackbox Exporter for proactive service monitoring
- Protect exporters with firewall rules and VRF isolation
A properly configured monitoring setup enables proactive problem detection, capacity planning, and troubleshooting for VyOS-based network infrastructure.