pfSense Multi-WAN Failover - Automatic Link Switchover
Failover in the Multi-WAN context is the automatic redirection of traffic from a primary internet link to a backup link when the primary is detected as down. pfSense implements failover through Gateway Groups with gateways separated into priority levels (tiers). Gateways on lower-numbered tiers are preferred, and gateways on higher-numbered tiers activate only when all gateways on the previous tier become unavailable.
Unlike load balancing, where gateways operate in parallel on the same tier, a failover configuration assigns gateways to different tiers, forming a priority chain. The backup link remains in hot standby and accepts traffic only upon primary link failure.
Failover vs. Load Balancing
Both mechanisms use Gateway Groups but differ in tier assignment logic:
| Characteristic | Load Balancing | Failover |
|---|---|---|
| Tier assignment | All gateways on the same tier | Gateways on different tiers |
| Link utilization | Simultaneous | Sequential (by priority) |
| Backup link | None (all active) | Activates on primary failure |
| Link load | Distributed | Concentrated on primary |
| Switchover | Automatic (failed gateway excluded) | Automatic (backup activated) |
A combined configuration merges both approaches. For example, two links on Tier 1 provide load balancing while a third link on Tier 2 serves as backup for both:
| Gateway | Tier | Role |
|---|---|---|
| WAN1_DHCP | 1 | Primary, load balanced |
| WAN2_DHCP | 1 | Primary, load balanced |
| WAN3_DHCP | 2 | Backup |
In this configuration, traffic is balanced between WAN1 and WAN2. If WAN1 fails, all traffic shifts to WAN2. If both primary links fail, WAN3 activates.
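The tier rule described above can be sketched as a small model. This is an illustration of the selection logic only, not pfSense code; the `Gateway` class and the `active_gateways` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gateway:
    name: str
    tier: int
    online: bool


def active_gateways(group: list[Gateway]) -> list[str]:
    """Return the names of the gateways that would carry traffic.

    Only the lowest-numbered tier that still has an online member is used;
    several online members on that tier share the load.
    """
    online = [gw for gw in group if gw.online]
    if not online:
        return []  # every member is down: the group has no usable gateway
    best_tier = min(gw.tier for gw in online)
    return [gw.name for gw in online if gw.tier == best_tier]


group = [
    Gateway("WAN1_DHCP", tier=1, online=True),
    Gateway("WAN2_DHCP", tier=1, online=True),
    Gateway("WAN3_DHCP", tier=2, online=True),
]
print(active_gateways(group))  # ['WAN1_DHCP', 'WAN2_DHCP']
```

Marking WAN1_DHCP offline leaves only WAN2_DHCP in the result, and marking both Tier 1 members offline returns WAN3_DHCP.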
Gateway Monitoring for Failover
Failover speed and accuracy depend directly on gateway monitoring settings. Aggressive settings enable rapid switchover but increase the risk of false positives. Conservative settings reduce false positive risk but increase failure detection time.
Recommended Monitoring Parameters
For a typical failover configuration, the following parameters are recommended:
| Parameter | Value | Rationale |
|---|---|---|
| Monitor IP | Public DNS (8.8.8.8, 1.1.1.1) | Tests end-to-end reachability, not just the ISP gateway |
| Probe Interval | 1 second | Rapid failure detection |
| Loss Interval | 2000 ms | Sufficient timeout for high-latency links |
| Time Period | 30 seconds | Shortened averaging window for faster response |
| High Latency | 400 ms | Warning threshold for degradation |
| High Loss | 15% | Warning threshold for packet loss |
| Down | 10% | Threshold for switchover to backup link |
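To get a feel for how these values interact, the following sketch computes how many probes fall inside one averaging window and how many of them must be lost to reach a loss threshold. It is plain arithmetic under the assumption that probes are sent at a fixed interval and loss is averaged over the Time Period; the function names are hypothetical, and dpinger's internal accounting may differ in detail.

```python
def probes_in_window(time_period_s: int, probe_interval_ms: int) -> int:
    """Number of probes sent during one averaging window."""
    return (time_period_s * 1000) // probe_interval_ms


def lost_probes_for_threshold(total_probes: int, threshold_percent: int) -> int:
    """Smallest number of lost probes whose share reaches the threshold."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total_probes * threshold_percent + 99) // 100


total = probes_in_window(time_period_s=30, probe_interval_ms=1000)
print(total)                                 # 30
print(lost_probes_for_threshold(total, 10))  # 3
```

With the values from the table, three lost probes inside a 30-second window are enough to reach a 10% loss threshold, which is why short windows react quickly but tolerate little transient loss.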
Choosing a Monitor IP
The Monitor IP determines what the monitoring system actually tests:
| Monitor IP | What Is Tested | Advantages | Disadvantages |
|---|---|---|---|
| ISP gateway IP | Nearest hop availability | Minimal latency, fast response | Does not detect upstream failures |
| Public DNS (8.8.8.8) | End-to-end internet reachability | Detects any failure in the chain | Additional latency |
| Own VPS | End-to-end reachability to target resource | Full control | Requires additional infrastructure |
Warning:
Each gateway must use a unique Monitor IP. If both gateways monitor the same address, a failure of that address (rather than the link) causes both gateways to transition to Down status simultaneously, resulting in complete connectivity loss.
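A planned configuration can be checked for shared Monitor IPs before it is entered in the GUI. The sketch below works on a hand-written mapping of gateway name to Monitor IP; it does not read the pfSense configuration, and `shared_monitor_ips` is a hypothetical helper.

```python
from collections import defaultdict


def shared_monitor_ips(monitors: dict[str, str]) -> dict[str, list[str]]:
    """Map each Monitor IP used by more than one gateway to those gateways."""
    by_ip: dict[str, list[str]] = defaultdict(list)
    for gateway, ip in monitors.items():
        by_ip[ip].append(gateway)
    return {ip: gws for ip, gws in by_ip.items() if len(gws) > 1}


planned = {"WAN1_DHCP": "8.8.8.8", "WAN2_DHCP": "1.1.1.1"}
print(shared_monitor_ips(planned))  # {}
```

An empty result means every gateway probes a different address.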
dpinger Configuration
The dpinger daemon performs gateway monitoring. Its operation can be verified from the command line:
# Check dpinger processes
ps aux | grep dpinger
# View gateway status in real time
/usr/local/sbin/pfSsh.php playback gatewaystatus
dpinger logs are written to the system journal and accessible under Status > System Logs > Gateways.
Creating a Failover Gateway Group
Step-by-Step Configuration
- Navigate to System > Routing > Gateway Groups.
- Click Add.
- Configure the group parameters:
| Parameter | Value | Description |
|---|---|---|
| Group Name | WAN_Failover | Group name |
| WAN1_DHCP | Tier 1 | Primary link |
| WAN2_DHCP | Tier 2 | Backup link |
| Trigger Level | Packet Loss or High Latency | Switchover condition |
| Description | Primary WAN1, failover to WAN2 | Purpose description |
- Click Save, then Apply Changes.
Selecting the Trigger Level for Failover
For failover configurations, the Trigger Level determines switchover sensitivity:
| Trigger Level | Time to Switchover | False Positive Risk | Recommended For |
|---|---|---|---|
| Member Down | Maximum | Minimal | Non-critical services |
| Packet Loss | Medium | Medium | Most configurations |
| High Latency | Minimum | High | Latency-sensitive services |
| Packet Loss or High Latency | Minimum | Maximum | Critical services with reliable links |
For production configurations, Packet Loss is recommended - it provides a balance between response speed and resilience to transient fluctuations.
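The sensitivity differences in the table can be expressed as a decision function. This is a simplified model of the behavior described above, assuming a member that is fully Down is excluded at every trigger level; the `Trigger` enum and `is_excluded` function are hypothetical names, not pfSense identifiers.

```python
from enum import Enum


class Trigger(Enum):
    MEMBER_DOWN = "Member Down"
    PACKET_LOSS = "Packet Loss"
    HIGH_LATENCY = "High Latency"
    LOSS_OR_LATENCY = "Packet Loss or High Latency"


def is_excluded(trigger: Trigger, *, down: bool, loss_alarm: bool, latency_alarm: bool) -> bool:
    """Decide whether a group member is taken out of service."""
    if down:
        return True  # assumption: a Down member is excluded at every level
    if trigger is Trigger.PACKET_LOSS:
        return loss_alarm
    if trigger is Trigger.HIGH_LATENCY:
        return latency_alarm
    if trigger is Trigger.LOSS_OR_LATENCY:
        return loss_alarm or latency_alarm
    return False  # Member Down: degraded but reachable members stay in service


# A lossy but reachable link stays active under Member Down...
print(is_excluded(Trigger.MEMBER_DOWN, down=False, loss_alarm=True, latency_alarm=False))  # False
# ...and is removed under Packet Loss.
print(is_excluded(Trigger.PACKET_LOSS, down=False, loss_alarm=True, latency_alarm=False))  # True
```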
Applying the Gateway Group to Firewall Rules
After creating the failover Gateway Group, it must be assigned to a firewall rule.
Rule Configuration
- Navigate to Firewall > Rules > LAN.
- Create or edit a rule for outbound traffic.
- Under Extra Options, click Display Advanced.
- In the Gateway field, select WAN_Failover.
- Click Save, then Apply Changes.
Different traffic types can use separate rules with distinct failover groups. For example, critical business traffic can use a group with aggressive monitoring settings while general traffic uses a group with conservative settings.
DNS with Failover
When switching to a backup link, DNS queries routed through the primary link stop receiving responses. This can cause name resolution delays until DNS switches to the backup path.
DNS Configuration for Failover
- Under System > General Setup, configure DNS servers for each WAN:
| DNS Server | IP Address | Gateway |
|---|---|---|
| DNS Server 1 | 8.8.8.8 | WAN1_DHCP |
| DNS Server 2 | 8.8.4.4 | WAN1_DHCP |
| DNS Server 3 | 1.1.1.1 | WAN2_DHCP |
| DNS Server 4 | 1.0.0.1 | WAN2_DHCP |
- Under Services > DNS Resolver, enable DNS Query Forwarding to use the configured upstream servers.
- pfSense automatically stops using DNS servers bound to an unavailable gateway and switches to servers bound to the available link.
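The effect of binding DNS servers to gateways can be modeled in a few lines. The sketch mirrors the table above and only illustrates the selection rule stated in the last step; `usable_dns_servers` is a hypothetical function, not part of pfSense.

```python
# (DNS server, gateway) pairs as configured under System > General Setup.
BINDINGS = [
    ("8.8.8.8", "WAN1_DHCP"),
    ("8.8.4.4", "WAN1_DHCP"),
    ("1.1.1.1", "WAN2_DHCP"),
    ("1.0.0.1", "WAN2_DHCP"),
]


def usable_dns_servers(bindings: list[tuple[str, str]], down_gateways: set[str]) -> list[str]:
    """Return the DNS servers whose bound gateway is still available."""
    return [ip for ip, gateway in bindings if gateway not in down_gateways]


print(usable_dns_servers(BINDINGS, down_gateways={"WAN1_DHCP"}))  # ['1.1.1.1', '1.0.0.1']
```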
Warning:
Without DNS Query Forwarding, the DNS Resolver performs recursive queries directly to root servers. In this case, DNS queries are routed through the default gateway. During failover, the default gateway changes automatically, but the transition period may cause brief name resolution delays.
Testing Failover
Before deploying the configuration to production, the switchover must be verified.
Method 1: Physical Disconnection
- Open Status > Gateways to observe gateway statuses.
- Physically disconnect the primary WAN interface cable.
- Observe the primary gateway status change:
- Status should transition to Warning, then Down.
- Time to status change depends on the Time Period and threshold settings.
- Verify that traffic has switched to the backup link:
- Open an external IP check service (such as ifconfig.me).
- The displayed IP should match the backup WAN external address.
- Reconnect the cable and confirm the return to the primary link.
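The external IP check can also be scripted from a LAN client. The sketch below assumes that `https://ifconfig.me/ip` returns the caller's public address as plain text; the function names are hypothetical, and the expected address must be replaced with the real external address of the backup WAN.

```python
import urllib.request
from typing import Callable


def fetch_external_ip(url: str = "https://ifconfig.me/ip", timeout: float = 5.0) -> str:
    """Ask an external service which public address this client appears from."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


def exits_via(expected_ip: str, fetch: Callable[[], str] = fetch_external_ip) -> bool:
    """True if outbound traffic currently leaves through the expected address."""
    return fetch() == expected_ip
```

Calling `exits_via("203.0.113.20")` after unplugging the primary WAN should return `True` once the switchover has completed, where `203.0.113.20` stands in for the backup WAN address. The `fetch` parameter exists so the comparison can be exercised without network access.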
Method 2: Blocking the Monitor IP
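- Create a temporary floating firewall rule (WAN interface, direction out) that blocks ICMP traffic to the primary gateway's Monitor IP. Rules on the WAN tab filter inbound traffic only, so they do not affect probes sent by the firewall itself.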
- Observe the switchover under Status > Gateways.
- Remove the temporary rule after verification.
This method tests failover without physical intervention.
Method 3: Threshold Adjustment
- Temporarily set the Down threshold to 0% for the primary gateway.
- Any packet loss triggers an immediate switchover.
- Restore original threshold values after verification.
Verification Checklist
| Check | Expected Result |
|---|---|
| Switchover time | Depends on Time Period and thresholds (typically 30-60 seconds) |
| DNS resolution | Continues working through the backup link |
| Active TCP connections | Interrupted (expected behavior) |
| New connections | Established through the backup link |
| Return to primary | Automatic after primary gateway recovery |
| Logs | Switchover entries in Status > System Logs > Gateways |
Recovery Behavior
After the primary gateway becomes reachable again, pfSense automatically returns traffic to the primary link (Tier 1). The recovery sequence:
- dpinger detects that the primary gateway is responding to probe packets again.
- After accumulating sufficient successful responses within the Time Period, the gateway status changes to Online.
- pfSense routes new connections through the primary link.
- Active connections through the backup link continue until they complete naturally.
Recovery time is governed by the Time Period parameter. At the default value (60 seconds), return to the primary link occurs approximately 60-90 seconds after connectivity is restored.
Warning:
Frequent switching between links (flapping) indicates primary link instability. If the primary link recovers and fails in rapid succession, active connections are interrupted with each switchover. In such cases, increase the Time Period or threshold values to reduce sensitivity.
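Flapping can be recognized by counting gateway status changes within a sliding time window. The sketch below is one possible heuristic; the window length and the transition limit are arbitrary assumptions chosen for illustration, and the timestamps would have to be collected from the gateway log by other means.

```python
def is_flapping(transition_times_s: list[float], window_s: float = 600.0, max_transitions: int = 4) -> bool:
    """True if more than max_transitions status changes fall inside any window."""
    times = sorted(transition_times_s)
    start = 0
    for end, current in enumerate(times):
        # Slide the window start forward until it spans at most window_s seconds.
        while current - times[start] > window_s:
            start += 1
        if end - start + 1 > max_transitions:
            return True
    return False


# Five status changes within four minutes.
print(is_flapping([0, 60, 120, 180, 240]))  # True
```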
VPN with Failover
IPsec
When using IPsec VPN with failover, the following considerations apply:
- An IPsec tunnel is bound to a specific WAN interface. When that interface fails, the tunnel drops.
- For automatic IPsec recovery through the backup WAN, create a separate Phase 1 configuration for each WAN interface.
- The remote peer must also be configured to accept connections from both IP addresses.
IPsec failover configuration:
| Parameter | WAN1 Phase 1 | WAN2 Phase 1 |
|---|---|---|
| Interface | WAN1 | WAN2 |
| Remote Gateway | peer-ip-address | peer-ip-address |
| Phase 2 Subnet | 192.168.1.0/24 | 192.168.1.0/24 |
When WAN1 fails, the IPsec tunnel through WAN1 drops, and the tunnel through WAN2 establishes automatically (when DPD - Dead Peer Detection - is enabled).
OpenVPN
OpenVPN supports several failover approaches:
- Gateway Group assignment to OpenVPN interface - the OpenVPN server or client binds to a specific WAN. For failover, create two OpenVPN instances on different WANs.
- Floating IP - when using a CARP VIP as the OpenVPN server address, switchover occurs at the CARP level.
- Client-side failover - in the OpenVPN client configuration, specify multiple remote directives with different servers. The client automatically connects to the next server when the connection drops.
Troubleshooting
Failover Not Triggering
Symptom: the primary link is down, but traffic does not switch to the backup.
Checks:
- Gateway status - check Status > Gateways. If the primary gateway still shows Online, the issue is in monitoring settings:
- Verify the Monitor IP is correct and unreachable through the failed link.
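- Confirm the Monitor IP is not reachable through the backup link. If probes can leave through the backup WAN, the primary gateway keeps reporting Online even though its link is down.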
- Gateway Group - verify gateways are assigned to the correct tiers (primary on Tier 1, backup on Tier 2).
- Firewall rule - confirm the Gateway Group is assigned in the firewall rule.
- Trigger Level - verify the Trigger Level matches the failure type (for example, Member Down does not react to high loss, only to complete failure).
Slow Failure Detection
Symptom: switchover to the backup link takes several minutes.
Causes and solutions:
- Time Period too large - reduce to 30 seconds. dpinger averages loss and latency over the whole Time Period, so the longer the window, the longer a failure takes to push the averages past the thresholds.
- Thresholds too high - if the Down threshold is set to 50%, the gateway must lose 50% of packets before being marked as Down. The recommended value is 10%.
- Monitor IP on ISP gateway - the ISP gateway may continue responding despite upstream connectivity loss. Replace with a public DNS server.
False Positives
Symptom: failover triggers while the primary link is operational.
Causes and solutions:
- Monitor IP overloaded - if the Monitor IP (such as 8.8.8.8) temporarily stops responding due to rate limiting or congestion, dpinger records losses. Use a less congested Monitor IP or increase thresholds.
- Time Period too small - transient network fluctuations cause switchover. Increase the Time Period to 60-120 seconds.
- Probe Interval too aggressive - at 1-second intervals with an unstable link, losses accumulate quickly. Increase to 2-3 seconds.
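The link between Time Period and false positives can be illustrated with a rough calculation: the share of the averaging window that a short loss burst occupies. The sketch assumes evenly spaced probes that are all lost during the burst and all answered outside it; `burst_loss_percent` is a hypothetical helper.

```python
def burst_loss_percent(burst_s: float, time_period_s: float) -> float:
    """Share of the averaging window covered by a loss burst, in percent."""
    return min(burst_s, time_period_s) / time_period_s * 100


for period in (30, 60, 120):
    print(period, round(burst_loss_percent(5, period), 1))
```

Under these assumptions a 5-second burst fills about one sixth of a 30-second window, well above a 10% threshold, while the same burst stays below 5% of a 120-second window.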
Flapping (Cyclic Switching)
Symptom: traffic constantly switches between primary and backup links.
Cause: the primary link is unstable - it periodically recovers and fails.
Solution:
- Increase the Time Period to 120-180 seconds for a longer stability window before returning.
- Increase the Down threshold to reduce sensitivity.
- If primary link instability cannot be resolved, consider transitioning to a load balancing configuration where both links are used simultaneously.
Monitoring and Alerting
pfSense logs gateway switchover events in the system journal. For timely incident response, external monitoring is recommended.
System Logs
Failover events are recorded under Status > System Logs > Gateways. Typical entries:
dpinger: WAN1_DHCP 8.8.8.8: Alarm latency 0us stddev 0us loss 100%
dpinger: WAN1_DHCP 8.8.8.8: Clear latency 5432us stddev 312us loss 0%
SNMP Monitoring
pfSense supports SNMP for external monitoring. Gateway status is available through SNMP OIDs. SNMP configuration is under Services > SNMP.
Syslog
To send logs to an external syslog server (Wazuh, ELK, Graylog), configure remote syslog under Status > System Logs > Settings, in the Remote Logging Options section.
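Once gateway entries reach an external collector, they can be parsed into structured events for alerting. The sketch below is written against the two sample entries shown under System Logs; the regular expression and the `parse_dpinger_line` name are assumptions, and real syslog lines may carry additional prefixes (timestamp, host, process ID) or differ between pfSense versions.

```python
import re
from typing import Optional

# Pattern derived from the sample entries; an optional "[pid]" is tolerated.
DPINGER_RE = re.compile(
    r"dpinger(?:\[\d+\])?: (?P<gateway>\S+) (?P<monitor>\S+): "
    r"(?P<event>Alarm|Clear) latency (?P<latency_us>\d+)us "
    r"stddev (?P<stddev_us>\d+)us loss (?P<loss_percent>\d+)%"
)


def parse_dpinger_line(line: str) -> Optional[dict]:
    """Extract gateway, event type, and metrics from one log line."""
    match = DPINGER_RE.search(line)
    if match is None:
        return None  # not a dpinger alarm or clear entry
    fields = match.groupdict()
    for key in ("latency_us", "stddev_us", "loss_percent"):
        fields[key] = int(fields[key])
    return fields


sample = "dpinger: WAN1_DHCP 8.8.8.8: Alarm latency 0us stddev 0us loss 100%"
print(parse_dpinger_line(sample))
```

An Alarm event with 100% loss indicates a failed primary link, and the matching Clear event marks its recovery.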
Related Sections
- Multi-WAN Load Balancing - configuring simultaneous use of multiple links
- Outbound NAT - configuring NAT for correct failover operation
- IPsec VPN - configuring IPsec tunnels with Multi-WAN failover