pfSense Multi-WAN Failover - Automatic Link Switchover

Failover in the Multi-WAN context is the automatic redirection of traffic from a primary internet link to a backup link when the primary is detected as down. pfSense implements failover through Gateway Groups with gateways separated into priority levels (tiers). Gateways on lower-numbered tiers are preferred, and gateways on higher-numbered tiers activate only when all gateways on the previous tier become unavailable.

Unlike load balancing, where gateways operate in parallel on the same tier, a failover configuration assigns gateways to different tiers, forming a priority chain. The backup link remains in hot standby and accepts traffic only upon primary link failure.

Failover vs. Load Balancing

Both mechanisms use Gateway Groups but differ in tier assignment logic:

| Characteristic | Load Balancing | Failover |
| --- | --- | --- |
| Tier assignment | All gateways on the same tier | Gateways on different tiers |
| Link utilization | Simultaneous | Sequential (by priority) |
| Backup link | None (all active) | Activates on primary failure |
| Link load | Distributed | Concentrated on primary |
| Switchover | Automatic (failed gateway excluded) | Automatic (backup activated) |

A combined configuration merges both approaches. For example, two links on Tier 1 provide load balancing while a third link on Tier 2 serves as backup for both:

| Gateway | Tier | Role |
| --- | --- | --- |
| WAN1_DHCP | 1 | Primary, load balanced |
| WAN2_DHCP | 1 | Primary, load balanced |
| WAN3_DHCP | 2 | Backup |

In this configuration, traffic is balanced between WAN1 and WAN2. If WAN1 fails, all traffic shifts to WAN2. If both primary links fail, WAN3 activates.
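
The tier logic can be illustrated with a small sketch. It is not how pfSense implements gateway groups internally; it only models the rule "use the lowest-numbered tier that still has an online member". The input format (name, tier, status per line) and the function name `active_gateways` are assumptions made for this example.

```shell
# Print the gateways on the lowest-numbered tier that still has at least
# one online member.
# Input (stdin): "name tier status" per line; status is "online" or "down".
# This input format is an assumption made for the example.
active_gateways() {
    awk '
        $3 == "online" {
            if (best == 0 || $2 < best) best = $2
            tier[$1] = $2
            order[++n] = $1
        }
        END {
            for (i = 1; i <= n; i++)
                if (tier[order[i]] == best) print order[i]
        }
    '
}

# Example: WAN1 is down, so WAN2 carries all traffic and WAN3 stays idle.
printf '%s\n' \
    'WAN1_DHCP 1 down' \
    'WAN2_DHCP 1 online' \
    'WAN3_DHCP 2 online' | active_gateways
```

With both Tier 1 gateways marked down, the same function prints only WAN3_DHCP.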

Gateway Monitoring for Failover

Failover speed and accuracy depend directly on gateway monitoring settings. Aggressive settings enable rapid switchover but increase the risk of false positives. Conservative settings reduce false positive risk but increase failure detection time.

Recommended Monitoring Parameters

For a typical failover configuration, the following parameters are recommended:

| Parameter | Value | Rationale |
| --- | --- | --- |
| Monitor IP | Public DNS (8.8.8.8, 1.1.1.1) | Tests end-to-end reachability, not just the ISP gateway |
| Probe Interval | 1 second | Rapid failure detection |
| Loss Interval | 2000 ms | Sufficient timeout for high-latency links |
| Time Period | 30 seconds | Shortened averaging window for faster response |
| High Latency | 400 ms | Warning threshold for degradation |
| High Loss | 15% | Warning threshold for packet loss |
| Down | 10% | Threshold for switchover to backup link |
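
To get a feel for how these values interact, the sketch below estimates how many probes fall into one averaging window and how many of them must be lost before a percentage threshold is crossed. This is a simplified model, not dpinger's exact algorithm: it assumes the window holds Time Period divided by Probe Interval probes. The function name `probes_to_threshold` is hypothetical.

```shell
# Simplified model (an assumption, not dpinger's exact algorithm):
# the averaging window holds time_period / probe_interval probes, and a
# loss threshold is crossed once ceil(window * threshold / 100) are lost.
# Arguments: probe interval (ms), time period (ms), loss threshold (%).
probes_to_threshold() {
    interval_ms=$1
    period_ms=$2
    threshold_pct=$3
    window=$((period_ms / interval_ms))
    # Integer ceiling of window * threshold / 100
    needed=$(( (window * threshold_pct + 99) / 100 ))
    echo "window=$window lost_needed=$needed"
}

# Values from the table above: 1 s probes, 30 s window, 10% threshold.
probes_to_threshold 1000 30000 10
```

Under this model the recommended settings give a 30-probe window in which 3 lost probes reach the 10% threshold, which shows why a short window with a low threshold reacts quickly but is also easier to trip.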

Choosing a Monitor IP

The Monitor IP determines what the monitoring system actually tests:

| Monitor IP | What Is Tested | Advantages | Disadvantages |
| --- | --- | --- | --- |
| ISP gateway IP | Nearest hop availability | Minimal latency, fast response | Does not detect upstream failures |
| Public DNS (8.8.8.8) | End-to-end internet reachability | Detects any failure in the chain | Additional latency |
| Own VPS | End-to-end reachability to target resource | Full control | Requires additional infrastructure |

Warning:

Each gateway must use a unique Monitor IP. If both gateways monitor the same address, a failure of that address (rather than the link) causes both gateways to transition to Down status simultaneously, resulting in complete connectivity loss.
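
A quick way to audit a planned configuration is to list the gateway and Monitor IP pairs and look for repeats. The sketch below does this over a hand-written list; the "gateway monitor_ip" input format and the function name `duplicate_monitor_ips` are assumptions for illustration, and nothing here reads the actual pfSense configuration.

```shell
# Report monitor IPs that are assigned to more than one gateway.
# Input (stdin): "gateway monitor_ip" pairs, one per line (assumed format).
duplicate_monitor_ips() {
    awk '
        { count[$2]++; names[$2] = names[$2] " " $1 }
        END {
            for (ip in count)
                if (count[ip] > 1) print ip ":" names[ip]
        }
    ' | sort
}

# Example: both gateways monitor 8.8.8.8, which the warning above forbids.
printf '%s\n' \
    'WAN1_DHCP 8.8.8.8' \
    'WAN2_DHCP 8.8.8.8' | duplicate_monitor_ips
```

An empty result means every gateway in the list has its own Monitor IP.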

dpinger Configuration

The dpinger daemon performs gateway monitoring. Its operation can be verified from the command line:

# Check dpinger processes
ps aux | grep dpinger

# View gateway status in real time
/usr/local/sbin/pfSsh.php playback gatewaystatus

dpinger messages are written to the system log and are accessible under Status > System Logs > Gateways.

Creating a Failover Gateway Group

Step-by-Step Configuration

  1. Navigate to System > Routing > Gateway Groups.
  2. Click Add.
  3. Configure the group parameters:

| Parameter | Value | Description |
| --- | --- | --- |
| Group Name | WAN_Failover | Group name |
| WAN1_DHCP | Tier 1 | Primary link |
| WAN2_DHCP | Tier 2 | Backup link |
| Trigger Level | Packet Loss or High Latency | Switchover condition |
| Description | Primary WAN1, failover to WAN2 | Purpose description |

  4. Click Save, then Apply Changes.

Selecting the Trigger Level for Failover

For failover configurations, the Trigger Level determines switchover sensitivity:

| Trigger Level | Time to Switchover | False Positive Risk | Recommended For |
| --- | --- | --- | --- |
| Member Down | Maximum | Minimal | Non-critical services |
| Packet Loss | Medium | Medium | Most configurations |
| High Latency | Minimum | High | Latency-sensitive services |
| Packet Loss or High Latency | Minimum | Maximum | Critical services with reliable links |

For production configurations, Packet Loss is recommended - it provides a balance between response speed and resilience to transient fluctuations.
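
The difference between the trigger levels can be expressed as a small decision function. This is a sketch of the behavior described in the table, not pfSense code. The short trigger names (`member_down`, `packet_loss`, `high_latency`, `loss_or_latency`) and the function name `gateway_excluded` are hypothetical labels chosen for the example.

```shell
# Decide whether a gateway is excluded from its tier for a given trigger level.
# Arguments: trigger level, then three flags (1 = condition present, 0 = absent):
#   down, loss alarm, latency alarm.
# The short trigger names are hypothetical labels for the four GUI options.
gateway_excluded() {
    trigger=$1 down=$2 loss=$3 latency=$4
    # A gateway that is fully down is excluded under every trigger level.
    [ "$down" -eq 1 ] && return 0
    case $trigger in
        member_down)     return 1 ;;
        packet_loss)     [ "$loss" -eq 1 ] ;;
        high_latency)    [ "$latency" -eq 1 ] ;;
        loss_or_latency) [ "$loss" -eq 1 ] || [ "$latency" -eq 1 ] ;;
        *) echo "unknown trigger: $trigger" >&2; return 2 ;;
    esac
}

# Example: the link is up but in a loss alarm, and the group uses Packet Loss.
if gateway_excluded packet_loss 0 1 0; then
    echo "exclude gateway, use next tier"
else
    echo "keep gateway"
fi
```

The same loss alarm under `member_down` keeps the gateway in service, which is why that trigger level reacts slowest to a degraded link.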

Applying the Gateway Group to Firewall Rules

After creating the failover Gateway Group, it must be assigned to a firewall rule.

Rule Configuration

  1. Navigate to Firewall > Rules > LAN.
  2. Create or edit a rule for outbound traffic.
  3. Under Extra Options, click Display Advanced.
  4. In the Gateway field, select WAN_Failover.
  5. Click Save, then Apply Changes.

Different traffic types can use separate rules with distinct failover groups. For example, critical business traffic can use a group with aggressive monitoring settings while general traffic uses a group with conservative settings.

DNS with Failover

When switching to a backup link, DNS queries routed through the primary link stop receiving responses. This can cause name resolution delays until DNS switches to the backup path.

DNS Configuration for Failover

  1. Under System > General Setup, configure DNS servers for each WAN:

| DNS Server | IP Address | Gateway |
| --- | --- | --- |
| DNS Server 1 | 8.8.8.8 | WAN1_DHCP |
| DNS Server 2 | 8.8.4.4 | WAN1_DHCP |
| DNS Server 3 | 1.1.1.1 | WAN2_DHCP |
| DNS Server 4 | 1.0.0.1 | WAN2_DHCP |

  2. Under Services > DNS Resolver, enable DNS Query Forwarding to use the configured upstream servers.
  3. pfSense automatically stops using DNS servers bound to an unavailable gateway and switches to servers bound to the available link.

Warning:

Without DNS Query Forwarding, the DNS Resolver performs recursive queries directly to root servers. In this case, DNS queries are routed through the default gateway. During failover, the default gateway changes automatically, but the transition period may cause brief name resolution delays.

Testing Failover

Before deploying the configuration to production, the switchover must be verified.

Method 1: Physical Disconnection

  1. Open Status > Gateways to observe gateway statuses.
  2. Physically disconnect the primary WAN interface cable.
  3. Observe the primary gateway status change:
    • Status should transition to Warning, then Down.
    • Time to status change depends on the Time Period and threshold settings.
  4. Verify that traffic has switched to the backup link:
    • Open an external IP check service (such as ifconfig.me).
    • The displayed IP should match the backup WAN external address.
  5. Reconnect the cable and confirm the return to the primary link.
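
To time the switchover instead of refreshing a browser, the external address can be polled in a loop from a LAN client. The sketch below prints a timestamped line only when the observed address changes. The function name `report_ip_changes` is hypothetical, and the usage line assumes `curl` is installed on the client and that ifconfig.me answers plain-text requests.

```shell
# Print a timestamped line whenever the observed external IP changes.
# Input (stdin): one IP address per poll; empty lines (failed polls) are skipped.
report_ip_changes() {
    prev=""
    while IFS= read -r ip; do
        [ -z "$ip" ] && continue
        if [ "$ip" != "$prev" ]; then
            echo "$(date '+%H:%M:%S') external IP: $ip"
            prev=$ip
        fi
    done
}

# Usage during a test (run from a LAN client; assumes curl and internet access):
#   while :; do curl -s --max-time 3 ifconfig.me; echo; sleep 2; done | report_ip_changes
```

The gap between the timestamps of two consecutive lines approximates how long new connections were affected by the switchover.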

Method 2: Blocking the Monitor IP

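  1. Create a temporary floating rule (Firewall > Rules > Floating) on the primary WAN interface, direction Out, blocking ICMP traffic to the primary gateway Monitor IP. The monitoring probes originate from the firewall itself, so an ordinary inbound rule on the WAN tab does not match them. If the gateway stays Online, the existing ICMP state may need to be cleared under Diagnostics > States.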
  2. Observe the switchover under Status > Gateways.
  3. Remove the temporary rule after verification.

This method tests failover without physical intervention.

Method 3: Threshold Adjustment

  1. Temporarily set the Down threshold to 0% for the primary gateway.
  2. Any packet loss triggers an immediate switchover.
  3. Restore original threshold values after verification.

Verification Checklist

| Check | Expected Result |
| --- | --- |
| Switchover time | Depends on Time Period and thresholds (typically 30-60 seconds) |
| DNS resolution | Continues working through the backup link |
| Active TCP connections | Interrupted (expected behavior) |
| New connections | Established through the backup link |
| Return to primary | Automatic after primary gateway recovery |
| Logs | Switchover entries in Status > System Logs > Gateways |

Recovery Behavior

After the primary gateway becomes reachable again, pfSense automatically returns traffic to the primary link (Tier 1). The recovery sequence:

  1. dpinger detects that the primary gateway is responding to probe packets again.
  2. After accumulating sufficient successful responses within the Time Period, the gateway status changes to Online.
  3. pfSense routes new connections through the primary link.
  4. Active connections through the backup link continue until they complete naturally.

Recovery time is governed by the Time Period parameter. At the default value (60 seconds), return to the primary link occurs approximately 60-90 seconds after connectivity is restored.

Warning:

Frequent switching between links (flapping) indicates primary link instability. If the primary link recovers and fails in rapid succession, active connections are interrupted with each switchover. In such cases, increase the Time Period or threshold values to reduce sensitivity.

VPN with Failover

IPsec

When using IPsec VPN with failover, the following considerations apply:

  • An IPsec tunnel is bound to a specific WAN interface. When that interface fails, the tunnel drops.
  • For automatic IPsec recovery through the backup WAN, create a separate Phase 1 configuration for each WAN interface.
  • The remote peer must also be configured to accept connections from both IP addresses.

IPsec failover configuration:

| Parameter | WAN1 Phase 1 | WAN2 Phase 1 |
| --- | --- | --- |
| Interface | WAN1 | WAN2 |
| Remote Gateway | peer-ip-address | peer-ip-address |
| Phase 2 Subnet | 192.168.1.0/24 | 192.168.1.0/24 |

When WAN1 fails, the IPsec tunnel through WAN1 drops, and the tunnel through WAN2 establishes automatically (when DPD - Dead Peer Detection - is enabled).

OpenVPN

OpenVPN supports several failover approaches:

  1. Gateway Group assignment to OpenVPN interface - the OpenVPN server or client binds to a specific WAN. For failover, create two OpenVPN instances on different WANs.
  2. Floating IP - when using a CARP VIP as the OpenVPN server address, switchover occurs at the CARP level.
  3. Client-side failover - in the OpenVPN client configuration, specify multiple remote directives with different servers. The client automatically connects to the next server when the connection drops.
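
For the client-side approach, the relevant part of the client configuration is a list of `remote` lines, which OpenVPN tries in order. The sketch below generates such a fragment. The hostnames are placeholders, the function name `openvpn_remotes` is hypothetical, and the rest of the client configuration (certificates, cipher settings, and so on) is assumed to exist already.

```shell
# Emit OpenVPN client "remote" lines for a list of servers, tried in order.
# Arguments: one "host:port" pair per server. The hostnames used below are
# placeholders, not values from any real deployment.
openvpn_remotes() {
    for server in "$@"; do
        host=${server%:*}
        port=${server##*:}
        echo "remote $host $port"
    done
    # Keep retrying name resolution instead of giving up after a failure.
    echo "resolv-retry infinite"
}

# Example: the first entry would point at the WAN1 address, the second at WAN2.
openvpn_remotes vpn1.example.com:1194 vpn2.example.com:1194
```

The output can be pasted into the client configuration file. Each `remote` entry would typically resolve to the address of a different WAN on the server side.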

Troubleshooting

Failover Not Triggering

Symptom: the primary link is down, but traffic does not switch to the backup.

Checks:

  1. Gateway status - check Status > Gateways. If the primary gateway still shows Online, the issue is in monitoring settings:
    • Verify the Monitor IP is correct and unreachable through the failed link.
    • Confirm the probes cannot reach the Monitor IP through the backup link (for example, because of a conflicting static route); otherwise the gateway keeps reporting Online.
  2. Gateway Group - verify gateways are assigned to the correct tiers (primary on Tier 1, backup on Tier 2).
  3. Firewall rule - confirm the Gateway Group is assigned in the firewall rule.
  4. Trigger Level - verify the Trigger Level matches the failure type (for example, Member Down does not react to high loss, only to complete failure).

Slow Failure Detection

Symptom: switchover to the backup link takes several minutes.

Causes and solutions:

  1. Time Period too large - reduce to 30 seconds. dpinger averages loss and latency over the entire Time Period, so a longer window delays the status change.
  2. Thresholds too high - if the Down threshold is set to 50%, the gateway must lose 50% of packets before being marked as Down. The recommended value is 10%.
  3. Monitor IP on ISP gateway - the ISP gateway may continue responding despite upstream connectivity loss. Replace with a public DNS server.

False Positives

Symptom: failover triggers while the primary link is operational.

Causes and solutions:

  1. Monitor IP overloaded - if the Monitor IP (such as 8.8.8.8) temporarily stops responding due to rate limiting or congestion, dpinger records losses. Use a less congested Monitor IP or increase thresholds.
  2. Time Period too small - transient network fluctuations cause switchover. Increase the Time Period to 60-120 seconds.
  3. Probe Interval too aggressive - at 1-second intervals with an unstable link, losses accumulate quickly. Increase to 2-3 seconds.

Flapping (Cyclic Switching)

Symptom: traffic constantly switches between primary and backup links.

Cause: the primary link is unstable - it periodically recovers and fails.

Solution:

  1. Increase the Time Period to 120-180 seconds for a longer stability window before returning.
  2. Increase the Down threshold to reduce sensitivity.
  3. If primary link instability cannot be resolved, consider transitioning to a load balancing configuration where both links are used simultaneously.
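
Flapping can be confirmed from the gateway log by counting how often each gateway raised an alarm. The sketch below reads log lines on standard input and prints gateways whose alarm count reaches a limit. It counts across the whole input without considering timestamps. The function name `flapping_gateways` and the log path in the usage comment are assumptions.

```shell
# List gateways whose number of dpinger "Alarm" events reaches a limit.
# Argument: minimum alarm count to report.
# Input (stdin): gateway log lines, with or without a syslog prefix.
flapping_gateways() {
    limit=$1
    awk -v limit="$limit" '
        /dpinger/ {
            # The gateway name sits two fields before the Alarm keyword.
            for (i = 3; i <= NF; i++)
                if ($i == "Alarm") { alarms[$(i - 2)]++; break }
        }
        END {
            for (gw in alarms)
                if (alarms[gw] >= limit) print gw, alarms[gw]
        }
    ' | sort
}

# Usage on the firewall (the log path is an assumption and may differ by version):
#   flapping_gateways 3 < /var/log/gateways.log
```

A gateway that appears in the output with a high count over a short log excerpt is a candidate for a longer Time Period or a higher threshold.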

Monitoring and Alerting

pfSense records gateway switchover events in the system log. For timely incident response, external monitoring is recommended.

System Logs

Failover events are recorded under Status > System Logs > Gateways. Typical entries:

dpinger: WAN1_DHCP 8.8.8.8: Alarm latency 0us stddev 0us loss 100%
dpinger: WAN1_DHCP 8.8.8.8: Clear latency 5432us stddev 312us loss 0%
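
Entries in this format are easy to reduce to the fields that matter for alerting. The sketch below extracts the gateway name, Monitor IP, event type, and loss value from each line. It assumes the field layout shown in the two entries above, and the function name `parse_dpinger_events` is hypothetical.

```shell
# Reduce dpinger log lines to "gateway monitor_ip event loss=N%".
# Input (stdin): log lines in the format shown above, with or without a
# syslog prefix in front of "dpinger".
parse_dpinger_events() {
    awk '
        /dpinger/ {
            for (i = 3; i <= NF; i++)
                if ($i == "Alarm" || $i == "Clear") {
                    monitor = $(i - 1)
                    sub(/:$/, "", monitor)   # drop the trailing colon
                    print $(i - 2), monitor, $i, "loss=" $NF
                    break
                }
        }
    '
}

# Example with the two entries shown above.
printf '%s\n' \
    'dpinger: WAN1_DHCP 8.8.8.8: Alarm latency 0us stddev 0us loss 100%' \
    'dpinger: WAN1_DHCP 8.8.8.8: Clear latency 5432us stddev 312us loss 0%' |
    parse_dpinger_events
```

For the sample entries this prints one Alarm line with loss=100% and one Clear line with loss=0%, which is a convenient shape for forwarding to an external alerting tool.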

SNMP Monitoring

pfSense supports SNMP for external monitoring. Gateway status is available through SNMP OIDs. SNMP configuration is under Services > SNMP.

Syslog

To send logs to an external syslog server (Wazuh, ELK, Graylog), configure remote syslog under Status > System Logs > Settings, in the Remote Logging Options section.
