pfSense Failover Scenarios - Designing HA Clusters

Designing a resilient pfSense cluster requires careful consideration of network topology, active services, and acceptable downtime. pfSense supports two-node clusters in an active/passive configuration. Each failover scenario carries its own configuration requirements, limitations, and testing procedures. This section covers standard HA architectures, planned and unplanned failover procedures, and integration with additional services.

Active/Passive Topology

Active/passive is the only officially supported HA cluster configuration in pfSense. In this design, one node (master) handles all network traffic while the second node (backup) operates in hot standby, ready to assume the workload when the primary fails.

Operating Principle

The primary node holds all CARP VIPs and processes traffic. The secondary node receives state table updates via pfsync and configuration updates via XMLRPC. When heartbeat signals from the primary are lost, the secondary promotes itself to master for each CARP VIP and begins processing traffic.

Failover time is governed by the Advertising Frequency parameters in the CARP configuration:

Parameter              | Value                  | Impact on Failover
-----------------------|------------------------|-----------------------------------------------
Base                   | 1 (default)            | Base heartbeat interval in seconds
Skew (primary)         | 0                      | Minimum delay - primary sends heartbeats first
Skew (secondary)       | 100                    | Additional delay of 100/256 seconds
Failure detection time | ~3 x (base + skew/256) | Approximately 3 seconds at default settings

At default settings, the backup node detects a primary failure in approximately 3 seconds and assumes the master role. Existing TCP sessions are preserved through pfsync - network clients notice only a brief pause in connectivity.
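
The detection figures above can be reproduced with simple arithmetic. A minimal Python sketch, assuming detection takes roughly three missed master advertisement intervals plus the backup's own skew delay (exact kernel timing varies):

```python
def carp_advert_interval(base: int, skew: int) -> float:
    """CARP advertisement interval in seconds: base + skew/256."""
    return base + skew / 256

def detection_time(base: int = 1, master_skew: int = 0,
                   backup_skew: int = 100) -> float:
    """Approximate failure detection: the backup declares the master dead
    after ~3 missed master advertisements, offset by its own skew."""
    return 3 * carp_advert_interval(base, master_skew) + backup_skew / 256

print(f"{detection_time():.2f} s")  # ~3.39 s at the default settings
```

Raising a backup node's skew lengthens this delay proportionally, which is why the node with the lowest skew wins the master election.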

Standard Layout

                    Internet
                       |
                  [ISP Router]
                       |
              -------- WAN --------
              |                   |
         [Primary]           [Secondary]
           master === Sync === backup
              |  (172.16.1.0/24)  |
              -------- LAN --------
              |                   |
         [LAN Switch]        [LAN Switch]
                       |
                   [Clients]
In the standard layout, both nodes connect to the same WAN segment and the same LAN segment. The sync interface links the nodes directly. CARP VIPs are assigned on both the WAN and LAN interfaces. Clients use the LAN CARP VIP as their default gateway.

Infrastructure Requirements

Component        | Requirement
-----------------|-----------------------------------------------
LAN switch       | Multicast support, multiple MAC addresses per port
WAN switch       | Multicast or unicast CARP support
WAN IP addresses | Minimum 3 (primary, secondary, CARP VIP)
LAN IP addresses | Minimum 3 (primary, secondary, CARP VIP)
Sync link        | Dedicated interface with direct connection
pfSense version  | Identical on both nodes

Active/Passive with Multi-WAN

A pfSense cluster with multiple WAN connections provides failover at both the link level and the hardware level. Each WAN interface requires its own set of CARP VIPs.

Dual-WAN Layout

          ISP1                        ISP2
           |                            |
       [Router1]                    [Router2]
           |                            |
      ---- WAN1 ----               ---- WAN2 ----
       |           \                /           |
       |            -----\    /-----            |
       |                   \/                   |
       |                   /\                   |
   [Primary]--------------/  \-------------[Secondary]
       |                                        |
       ------------------ LAN ------------------
                          |
                      [Clients]

Configuration Details

In a multi-WAN HA deployment, CARP VIPs must be created on each WAN interface:

Interface | Primary           | Secondary         | CARP VIP          | VHID
----------|-------------------|-------------------|-------------------|-----
WAN1      | 198.51.100.201/24 | 198.51.100.202/24 | 198.51.100.200/24 | 200
WAN2      | 203.0.113.201/24  | 203.0.113.202/24  | 203.0.113.200/24  | 210
LAN       | 192.168.1.2/24    | 192.168.1.3/24    | 192.168.1.1/24    | 1

Outbound NAT rules must be configured separately for each WAN interface, specifying the corresponding CARP VIP as the translation address.
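
An addressing plan like the one above is easy to sanity-check before deployment. A minimal sketch using only Python's standard ipaddress module (the plan dictionary simply mirrors the example table; the structure is illustrative, not a pfSense format):

```python
import ipaddress

# Mirrors the example table: interface -> (primary, secondary, CARP VIP, VHID)
plan = {
    "WAN1": ("198.51.100.201/24", "198.51.100.202/24", "198.51.100.200/24", 200),
    "WAN2": ("203.0.113.201/24",  "203.0.113.202/24",  "203.0.113.200/24",  210),
    "LAN":  ("192.168.1.2/24",    "192.168.1.3/24",    "192.168.1.1/24",    1),
}

def validate(plan):
    """Each interface needs three distinct addresses in one subnet,
    and every CARP VHID must be unique across the cluster."""
    errors = []
    vhids = [row[3] for row in plan.values()]
    if len(vhids) != len(set(vhids)):
        errors.append("duplicate VHID")
    for iface, (pri, sec, vip, _) in plan.items():
        addrs = [ipaddress.ip_interface(a) for a in (pri, sec, vip)]
        if len({a.ip for a in addrs}) != 3:
            errors.append(f"{iface}: addresses are not distinct")
        if len({a.network for a in addrs}) != 1:
            errors.append(f"{iface}: addresses span different subnets")
    return errors

print(validate(plan))  # [] -> the plan is internally consistent
```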

Gateway Groups in HA

Multi-WAN gateway groups function correctly in an HA configuration under the following conditions:

  • Gateways must monitor availability using individual node IP addresses, not CARP VIPs
  • Policy routing rules referencing gateway groups are synchronized via XMLRPC
  • After failover to the secondary, gateway groups continue operating with the same priority logic

Warning:

When using gateway groups with Tier settings for load balancing between WAN links, verify that both WAN interfaces are physically reachable from both cluster nodes. If one WAN link is unavailable on the secondary node during failover, all traffic will shift to the remaining link.

HA with IPsec VPN

Integrating IPsec tunnels with an HA cluster requires specific configuration. The key requirement is that all IPsec tunnels must bind to CARP VIPs rather than individual node addresses.

IPsec Configuration for HA

When configuring Phase 1 (IKE), specify the CARP VIP as both the My identifier and the Interface:

Parameter       | Value                       | Description
----------------|-----------------------------|---------------------------------
Interface       | WAN CARP VIP                | Binding to the virtual address
My identifier   | IP address: 198.51.100.200  | WAN CARP VIP address
Peer identifier | Remote peer IP              | VPN partner address

During failover, the IPsec tunnel re-establishes automatically because the CARP VIP migrates along with the master role. The remote peer keeps connecting to the same IP address (the CARP VIP), and IKE negotiation simply restarts from scratch.

IPsec Recovery Time

Unlike standard TCP/UDP traffic, IPsec tunnels do not survive failover intact. Although pfsync replicates states, the IKE SA (Security Association) requires renegotiation:

Phase                        | Approximate Time
-----------------------------|------------------
Failure detection (CARP)     | ~3 seconds
CARP VIP migration           | Instantaneous
IKE Phase 1 re-establishment | 2-5 seconds
IKE Phase 2 re-establishment | 1-2 seconds
Full tunnel recovery         | 6-10 seconds

To minimize IPsec downtime during failover, configure DPD (Dead Peer Detection) on the remote peer with aggressive timers (e.g., 10-second interval, 3 retries).
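
The interplay between local renegotiation and the remote peer's DPD timers can be estimated with back-of-the-envelope arithmetic. A rough model (the phase durations are midpoints of the table above; real outages depend on peer behavior - when the new master re-initiates IKE immediately, only the local term applies, matching the 6-10 second figure):

```python
def ipsec_failover_downtime(carp_detect=3.0, ike_p1=3.5, ike_p2=1.5,
                            dpd_interval=10.0, dpd_retries=3):
    """Worst-case tunnel outage: the new master needs CARP detection plus
    IKE Phase 1/2 renegotiation, but a remote peer holding stale SAs may
    not abandon them until DPD gives up (interval * retries)."""
    local = carp_detect + ike_p1 + ike_p2
    remote_dpd = dpd_interval * dpd_retries
    return max(local, remote_dpd)

print(ipsec_failover_downtime(dpd_interval=30, dpd_retries=5))  # 150.0 with lax DPD
print(ipsec_failover_downtime())                                # 30.0 with aggressive DPD
```

This is why aggressive DPD timers on the remote peer matter: they bound the worst case when the peer must detect the failure on its own.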

Multiple IPsec Tunnels

When multiple IPsec tunnels are present, each must be bound to a CARP VIP. All tunnels are synchronized via XMLRPC when the IPsec checkbox is enabled in the synchronization settings. After failover, each tunnel recovers independently - some may come up faster than others depending on the remote peer’s DPD configuration.

HA with OpenVPN

OpenVPN servers and clients in an HA cluster also require binding to CARP VIPs. OpenVPN behavior during failover differs from IPsec and depends on the transport protocol (UDP or TCP) and tunnel mode (tun or tap).

OpenVPN Server Configuration for HA

When creating an OpenVPN server, specify:

Parameter  | Value
-----------|------------------------------
Interface  | WAN CARP VIP or LAN CARP VIP
Local port | Standard port (1194 or custom)

OpenVPN configuration is synchronized via XMLRPC when the OpenVPN checkbox is enabled.

Failover Behavior

Mode      | Protocol | Failover Behavior
----------|----------|--------------------------------------------
tun + UDP | UDP      | Clients reconnect automatically via keepalive
tun + TCP | TCP      | Clients must reconnect (TCP session is lost)
tap + UDP | UDP      | Clients reconnect, L2 state is restored

For optimal HA compatibility, use tun mode with UDP transport. In this configuration, OpenVPN clients automatically detect connectivity loss through the keepalive mechanism and reconnect to the CARP VIP, which is already served by the backup node.
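
A hedged sketch of a matching client profile - the VIP address, port, and timer values below are placeholders for illustration, not values mandated by pfSense:

```
client
dev tun
proto udp
remote 198.51.100.200 1194   # WAN CARP VIP, never an individual node address
keepalive 10 60              # ping every 10 s; restart the tunnel after 60 s of silence
```

With timers like these, a client notices the failover within roughly a minute at worst and reconnects to the same VIP, which by then is answered by the backup node.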

Warning:

OpenVPN certificates must be synchronized via XMLRPC (Certificates, CAs checkbox). If certificates differ between nodes, clients will be unable to connect to the backup node after failover.

HA with DHCP Server

The DHCP server in an HA cluster requires special attention to prevent address conflicts during split-brain scenarios.

Standard Configuration

In the default configuration, the DHCP server is bound to the LAN interface and assigns addresses from a single pool. Clients send requests to the CARP VIP (which serves as their default gateway), and DHCP responses come from the master node.

During failover, the DHCP server on the backup node activates and continues address assignment. Since the DHCP configuration is synchronized via XMLRPC, the backup node uses the same address pool.

Preventing Conflicts

To prevent address duplication during split-brain, use one of the following strategies:

Strategy           | Description                          | Example
-------------------|--------------------------------------|----------------------------------------
Pool splitting     | Each node serves a different range   | Primary: .100-.199, Secondary: .200-.249
DHCP Failover Peer | ISC DHCP built-in failover mechanism | Automatic lease division
Short lease time   | Minimize conflict window             | 1-hour lease instead of 24 hours

When splitting pools, disable DHCP Server synchronization in the XMLRPC settings and configure ranges manually on each node.
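
The pool-splitting strategy can be expressed as a small helper. A sketch using only the standard library (the range endpoints are example values, not defaults):

```python
import ipaddress

def split_dhcp_pool(first: str, last: str):
    """Split one contiguous DHCP range into two non-overlapping halves,
    one per node, so a split-brain cannot hand out the same lease twice."""
    lo = int(ipaddress.ip_address(first))
    hi = int(ipaddress.ip_address(last))
    mid = (lo + hi) // 2
    addr = lambda n: str(ipaddress.ip_address(n))
    return (addr(lo), addr(mid)), (addr(mid + 1), addr(hi))

primary, secondary = split_dhcp_pool("192.168.1.100", "192.168.1.249")
print(primary)    # ('192.168.1.100', '192.168.1.174')
print(secondary)  # ('192.168.1.175', '192.168.1.249')
```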

Static DHCP Mappings

Static DHCP mappings are synchronized via XMLRPC along with the rest of the DHCP configuration. These mappings do not create conflicts during split-brain because each MAC address is bound to a fixed IP address.

Planned Maintenance Failover

Planned failover (maintenance failover) is performed when the primary node requires servicing - pfSense upgrades, hardware replacement, or diagnostics.

Step-by-Step Procedure

  1. Preparation: confirm the secondary node is fully synchronized

    • Check Status > CARP (failover) - all VIPs should show MASTER on the primary and BACKUP on the secondary
    • Verify no synchronization errors appear in the logs
  2. Switch to secondary: on the primary node, navigate to Status > CARP (failover) and click Enter Persistent CARP Maintenance Mode

    • All CARP VIPs transition to BACKUP on the primary
    • The secondary promotes to MASTER for all VIPs
    • The transition takes several seconds
  3. Verification: confirm traffic is being handled by the secondary

    • On the secondary, check Status > CARP (failover) - all VIPs should be in MASTER state
    • Test traffic through the firewall (web access, DNS, VPN)
    • Verify IPsec/OpenVPN tunnel functionality
  4. Primary maintenance: perform the required work on the primary node

    • pfSense upgrade
    • Hardware replacement
    • Diagnostics
  5. Return to primary: on the primary node, exit maintenance mode via Status > CARP (failover) - click Leave Persistent CARP Maintenance Mode

    • The primary resumes MASTER status for all VIPs
    • The secondary returns to BACKUP
  6. Final check: confirm the primary has reclaimed the MASTER role and synchronization is functioning correctly

Warning:

Before any planned failover, create configuration backups of both nodes via Diagnostics > Backup & Restore. This ensures recovery is possible if unexpected issues arise.

Upgrading pfSense in an HA Cluster

When upgrading pfSense in an HA cluster, a specific sequence must be followed:

  1. Create configuration backups of both nodes
  2. Place the primary in maintenance mode (traffic shifts to the secondary)
  3. Upgrade the primary node
  4. Wait for the upgrade and reboot to complete
  5. Confirm the primary has started correctly (do not exit maintenance mode yet)
  6. Place the secondary in maintenance mode (traffic returns to the upgraded primary)
  7. Upgrade the secondary node
  8. Wait for the upgrade and reboot to complete
  9. Exit maintenance mode on both nodes
  10. Verify synchronization and CARP status

The secondary is upgraded second because XMLRPC synchronization may be incompatible between different pfSense versions. Once both nodes run the same version, synchronization resumes normally.

Testing Failover

Regular failover testing is a mandatory practice for production clusters. Testing should be performed when the cluster is first deployed and after every significant configuration change.

Test 1: Planned Failover

Objective: verify maintenance mode operation.

  1. Record the current CARP VIP status on both nodes
  2. Place the primary in maintenance mode
  3. Test service availability (HTTP, DNS, VPN)
  4. Exit maintenance mode on the primary
  5. Confirm the primary has reclaimed MASTER status

Test 2: Interface Failure

Objective: verify automatic failover on link loss.

  1. Disconnect the WAN network cable on the primary node
  2. Observe the CARP VIP transition on WAN
  3. Verify the LAN CARP VIP also transitions (if peer IP monitoring is configured)
  4. Reconnect the cable
  5. Confirm the primary reclaims MASTER status (preemption)

Warning:

When testing interface disconnection, behavior depends on the CARP configuration. By default, pfSense only transitions the CARP VIP on the affected interface. To transition all VIPs when a single interface fails, configure IP monitoring via System > High Avail. Sync by specifying IP addresses to monitor (e.g., the upstream gateway address).

Test 3: Complete Node Failure

Objective: verify behavior when the primary node is powered off.

  1. Power off the primary node
  2. Wait for all CARP VIPs to transition to the secondary (~3 seconds)
  3. Test all service availability from the secondary
  4. Verify existing TCP sessions are preserved
  5. Power on the primary
  6. Confirm the primary reclaims MASTER status and synchronization resumes
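
The availability checks in these tests are easy to script. A minimal sketch of a downtime stopwatch - `probe` is any zero-argument callable (for example, a wrapper around a ping or HTTP request to the CARP VIP) that returns True while the service answers:

```python
import time

def measure_downtime(probe, timeout=60.0, interval=0.5):
    """Poll `probe` and return the length of the first observed outage in
    seconds, or None if no recovery is seen before `timeout` expires."""
    outage_start = None
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        now = time.monotonic()
        if probe():
            if outage_start is not None:
                return now - outage_start
        elif outage_start is None:
            outage_start = now
        time.sleep(interval)
    return None
```

Run it from a client behind the LAN CARP VIP while powering off the primary; the returned value is the observed failover time.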

Test 4: IPsec/OpenVPN Recovery

Objective: verify VPN tunnel recovery after failover.

  1. Establish IPsec and/or OpenVPN connections through the cluster
  2. Perform a planned failover to the secondary
  3. Record the recovery time for each tunnel
  4. Verify traffic flows through recovered tunnels
  5. Return the primary to active status

Documenting Results

Results of each test should be documented in the following format:

Parameter              | Value
-----------------------|-----------------------------------------------------
Test date              | YYYY-MM-DD
Test type              | Planned failover / Interface failure / Complete failure
Failover time          | X seconds
Services affected      | HTTP, DNS, VPN, etc.
TCP sessions preserved | Yes / No
VPN tunnels recovered  | Yes / No (recovery time)
Issues discovered      | Description

Monitoring HA Status

Continuous cluster state monitoring enables early detection of issues before they impact service availability.

Built-in Monitoring Tools

Tool        | Location                 | Information Provided
------------|--------------------------|----------------------------------
CARP Status | Status > CARP (failover) | MASTER/BACKUP status for each VIP
System Logs | Status > System Logs     | Synchronization errors, CARP events
States      | Diagnostics > States     | Current state count
pfTop       | Diagnostics > pfTop      | Active connections in real time

External Monitoring

For production clusters, external monitoring should be configured:

  • SNMP: pfSense supports SNMP for monitoring via Zabbix, Nagios, or similar platforms. Monitor CARP VIP status, state count, CPU utilization, and memory usage.
  • Syslog: configure remote syslog forwarding to preserve a history of CARP events and synchronization activity.
  • Wazuh: when pfSense is integrated with Wazuh, CARP event logs can be processed by detection rules to generate alerts on unplanned failovers.
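
As a sketch of the syslog approach, CARP state transitions can be extracted from forwarded log lines with a small parser. The line shape below matches typical FreeBSD kernel messages, but the exact format varies between versions - treat the regex as illustrative:

```python
import re

# Assumed kernel message shape (may differ by pfSense/FreeBSD version):
#   "kernel: carp: VHID 1@em0: BACKUP -> MASTER (master timed out)"
CARP_EVENT = re.compile(
    r"carp: VHID (?P<vhid>\d+)@(?P<iface>\w+): (?P<old>\w+) -> (?P<new>\w+)")

def parse_carp_event(line: str):
    """Return the transition fields, or None for unrelated log lines."""
    m = CARP_EVENT.search(line)
    return m.groupdict() if m else None

evt = parse_carp_event("kernel: carp: VHID 1@em0: BACKUP -> MASTER (master timed out)")
print(evt)  # {'vhid': '1', 'iface': 'em0', 'old': 'BACKUP', 'new': 'MASTER'}
```

An alerting pipeline would then raise on any BACKUP -> MASTER transition that does not coincide with planned maintenance.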

Key Metrics

Metric                 | Normal Value                           | Alert When
-----------------------|----------------------------------------|------------------------------------
CARP VIP status        | MASTER on primary, BACKUP on secondary | Any change without planned failover
State count difference | Less than 1%                           | More than 5% divergence
Last XMLRPC sync time  | No more than 5 minutes ago             | More than 15 minutes without sync
Synchronization errors | 0                                      | Any error
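
These thresholds translate directly into alerting logic. A minimal sketch (function and parameter names are illustrative, not a pfSense API):

```python
def check_ha_metrics(primary_state, secondary_state, state_divergence_pct,
                     minutes_since_sync, sync_errors, maintenance=False):
    """Evaluate the metrics against the 'Alert When' thresholds above."""
    alerts = []
    if not maintenance and (primary_state, secondary_state) != ("MASTER", "BACKUP"):
        alerts.append("CARP role change without planned failover")
    if state_divergence_pct > 5:
        alerts.append("state table divergence above 5%")
    if minutes_since_sync > 15:
        alerts.append("no XMLRPC sync for more than 15 minutes")
    if sync_errors > 0:
        alerts.append("synchronization errors present")
    return alerts

print(check_ha_metrics("MASTER", "BACKUP", 0.4, 2, 0))  # [] -> healthy
```

The `maintenance` flag suppresses the role-change alert during a planned failover.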

HA Limitations in pfSense

When designing an HA cluster, the platform’s architectural limitations must be taken into account.

Active/Active Is Not Supported

pfSense officially supports only the active/passive configuration. Active/active mode (where both nodes simultaneously process traffic with load balancing) is not implemented at the CARP and pfsync level. Attempting to achieve active/active by manually distributing CARP VIPs across nodes leads to asymmetric routing and state loss.

Two-Node Limitation

Only two-node clusters are officially supported. While it is technically possible to add a third node with a higher skew value, XMLRPC synchronization only supports specifying a single peer. Three-node configurations require manual configuration synchronization to the third node.

Layer 2 Dependency

CARP in multicast mode requires both nodes to reside in the same broadcast domain for each interface carrying a CARP VIP. This limits geographic distribution - nodes cannot be placed in separate data centers without establishing Layer 2 connectivity (e.g., via VXLAN or MPLS L2VPN).

Split-Brain

When connectivity between nodes is lost (sync interface failure), both nodes may transition to MASTER state. This results in:

  • Duplicate DHCP address assignments (if pools are not split)
  • CARP VIP MAC address conflicts on the network segment
  • State table inconsistency

To minimize split-brain risk, use a dedicated physical link for the sync interface and monitor its availability.

Migration from Other Platforms

Migrating from Cisco ASA Failover

When transitioning from Cisco ASA Active/Standby failover to pfSense HA, the following architectural differences must be considered:

Feature                 | Cisco ASA                         | pfSense
------------------------|-----------------------------------|-------------------------------
Failover protocol       | Proprietary (LAN-based failover)  | CARP (OpenBSD)
State synchronization   | Stateful failover link            | pfsync
Configuration sync      | Automatic config replication      | XMLRPC
Active/Active           | Supported (with contexts)         | Not supported
Dedicated failover link | Failover interface + State link   | Sync interface (single for both)
Interface monitoring    | Built-in interface monitoring     | IP monitoring via CARP
Preemption              | Configurable                      | Automatic (by skew)

Migration procedure:

  1. Document the current ASA failover configuration (show failover, show running-config)
  2. Plan the pfSense HA addressing scheme (three addresses per interface)
  3. Configure pfSense primary and secondary as documented
  4. Migrate firewall rules, NAT, and VPN to the primary node
  5. Configure XMLRPC synchronization
  6. Test failover before switching production traffic
  7. Cut over production traffic to the pfSense HA cluster

Migrating from FortiGate HA Cluster

FortiGate supports both active/passive and active/active HA. When migrating to pfSense, consider:

Feature             | FortiGate                            | pfSense
--------------------|--------------------------------------|-----------------------------
HA protocol         | FGCP (FortiGate Clustering Protocol) | CARP
Session sync        | HA heartbeat link                    | pfsync
Config sync         | Automatic                            | XMLRPC
Active/Active       | Supported                            | Not supported
Cluster > 2 nodes   | Up to 4 nodes                        | 2 nodes only
Session pickup      | TCP, UDP, ICMP                       | Via pfsync (all pf protocols)
Heartbeat interface | Dedicated HA link                    | Sync interface
Virtual MAC         | 00:09:0f:09:xx:xx                    | 00:00:5e:00:01:xx (CARP)

When migrating from a FortiGate active/active configuration, the architecture must be redesigned for the pfSense active/passive model. This may require changes to traffic distribution at the upstream router or switch level.

The migration procedure follows the same pattern as Cisco ASA: documentation, planning, configuration, testing, cutover.

Disaster Recovery

When both cluster nodes fail completely (e.g., due to a power outage in the server room), the recovery procedure depends on the availability of configuration backups.

Recovery with Backups

  1. Power on both nodes
  2. Wait for pfSense to boot on both nodes
  3. Check CARP status - with a correct configuration, the primary should become MASTER
  4. Verify pfsync and XMLRPC synchronization
  5. If the configuration is corrupted, restore from backup via Diagnostics > Backup & Restore

Recovery After Single Node Loss

  1. The backup node automatically assumes the MASTER role
  2. Replace the failed node
  3. Install pfSense and restore the base configuration (hostname, IP addresses, sync interface)
  4. Configure XMLRPC on the primary to synchronize with the new secondary
  5. Trigger a full synchronization via System > High Avail. Sync (click Save)
  6. Verify CARP status and the state table
