pfSense Failover Scenarios - Designing HA Clusters
Designing a resilient pfSense cluster requires careful consideration of network topology, active services, and acceptable downtime. pfSense supports two-node clusters in an active/passive configuration. Each failover scenario carries its own configuration requirements, limitations, and testing procedures. This section covers standard HA architectures, planned and unplanned failover procedures, and integration with additional services.
Active/Passive Topology
Active/passive is the only officially supported HA cluster configuration in pfSense. In this design, one node (master) handles all network traffic while the second node (backup) operates in hot standby, ready to assume the workload when the primary fails.
Operating Principle
The primary node holds all CARP VIPs and processes traffic. The secondary node receives state table updates via pfsync and configuration updates via XMLRPC. When heartbeat signals from the primary are lost, the secondary promotes itself to master for each CARP VIP and begins processing traffic.
Failover time is governed by the Advertising Frequency parameters in the CARP configuration:
| Parameter | Value | Impact on Failover |
|---|---|---|
| Base | 1 (default) | Base heartbeat interval in seconds |
| Skew primary | 0 | Minimum delay - primary sends heartbeats first |
| Skew secondary | 100 | Additional delay of 100/256 seconds |
| Failure detection time | ~3 x (base + skew/256) | Approximately 3 seconds at default settings |
At default settings, the backup node detects a primary failure in approximately 3 seconds and assumes the master role. Existing TCP sessions are preserved through pfsync - network clients notice only a brief pause in connectivity.
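The detection timing described above can be sketched as a small calculation. This is a sketch of the approximation from the table (three missed advertisements at an interval of base + skew/256); real-world CARP timing also depends on factors such as the demotion counter:

```python
def carp_detection_time(base: int = 1, skew: int = 0) -> float:
    """Approximate CARP failure detection time in seconds.

    The master advertises every (base + skew/256) seconds; the backup
    declares the master dead after roughly three missed advertisements.
    """
    return 3 * (base + skew / 256)

# Defaults on the master (base=1, skew=0): ~3 seconds
print(carp_detection_time())          # → 3.0
# With skew=100 the advertisement interval grows to ~1.39 s
print(carp_detection_time(skew=100))  # → 4.171875
```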
Standard Layout
               Internet
                  |
             [ISP Router]
                  |
        -------- WAN --------
        |                   |
   [Primary]           [Secondary]
    master                backup
        |                   |
        -------- LAN --------
        |                   |
  [LAN Switch]        [LAN Switch]
              |
          [Clients]

   [Primary] ---- Sync (172.16.1.0/24) ---- [Secondary]

In the standard layout, both nodes connect to the same WAN segment and the same LAN segment. The sync interface links the nodes directly. CARP VIPs are assigned on both the WAN and LAN interfaces. Clients use the LAN CARP VIP as their default gateway.
Infrastructure Requirements
| Component | Requirement |
|---|---|
| LAN switch | Multicast support, multiple MAC addresses per port |
| WAN switch | Multicast or unicast CARP support |
| WAN IP addresses | Minimum 3 (primary, secondary, CARP VIP) |
| LAN IP addresses | Minimum 3 (primary, secondary, CARP VIP) |
| Sync link | Dedicated interface with direct connection |
| pfSense version | Identical on both nodes |
Active/Passive with Multi-WAN
A pfSense cluster with multiple WAN connections provides failover at both the link level and the hardware level. Each WAN interface requires its own set of CARP VIPs.
Dual-WAN Layout
     ISP1                 ISP2
      |                    |
  [Router1]            [Router2]
      |                    |
  --- WAN1 ---         --- WAN2 ---
   |        |           |        |
 [Primary]              [Secondary]
      |                     |
      --------- LAN ---------
                |
            [Clients]

Configuration Details
In a multi-WAN HA deployment, CARP VIPs must be created on each WAN interface:
| Interface | Primary | Secondary | CARP VIP | VHID |
|---|---|---|---|---|
| WAN1 | 198.51.100.201/24 | 198.51.100.202/24 | 198.51.100.200/24 | 200 |
| WAN2 | 203.0.113.201/24 | 203.0.113.202/24 | 203.0.113.200/24 | 210 |
| LAN | 192.168.1.2/24 | 192.168.1.3/24 | 192.168.1.1/24 | 1 |
Outbound NAT rules must be configured separately for each WAN interface, specifying the corresponding CARP VIP as the translation address.
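An addressing plan like the one above can be sanity-checked mechanically: each CARP VIP must sit in the same subnet as both node addresses, and VHIDs should not collide. A minimal sketch using the table's example addresses (strictly, VHIDs only need to be unique per broadcast domain; keeping them globally unique is simply a safe convention):

```python
from ipaddress import ip_interface

# (interface, primary, secondary, CARP VIP, VHID) from the plan above
plan = [
    ("WAN1", "198.51.100.201/24", "198.51.100.202/24", "198.51.100.200/24", 200),
    ("WAN2", "203.0.113.201/24",  "203.0.113.202/24",  "203.0.113.200/24",  210),
    ("LAN",  "192.168.1.2/24",    "192.168.1.3/24",    "192.168.1.1/24",    1),
]

vhids = [vhid for *_, vhid in plan]
assert len(vhids) == len(set(vhids)), "VHIDs must be unique"

for name, pri, sec, vip, _ in plan:
    nets = {ip_interface(a).network for a in (pri, sec, vip)}
    assert len(nets) == 1, f"{name}: node addresses and VIP must share a subnet"

print("addressing plan OK")
```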
Gateway Groups in HA
Multi-WAN gateway groups function correctly in an HA configuration under the following conditions:
- Gateways must monitor availability using individual node IP addresses, not CARP VIPs
- Policy routing rules referencing gateway groups are synchronized via XMLRPC
- After failover to the secondary, gateway groups continue operating with the same priority logic
Warning:
When using gateway groups with Tier settings for load balancing between WAN links, verify that both WAN interfaces are physically reachable from both cluster nodes. If one WAN link is unavailable on the secondary node during failover, all traffic will shift to the remaining link.
HA with IPsec VPN
Integrating IPsec tunnels with an HA cluster requires specific configuration. The key requirement is that all IPsec tunnels must bind to CARP VIPs rather than individual node addresses.
IPsec Configuration for HA
When configuring Phase 1 (IKE), specify the CARP VIP as both the My identifier and the Interface:
| Parameter | Value | Description |
|---|---|---|
| Interface | WAN CARP VIP | Binding to the virtual address |
| My identifier | IP address: 198.51.100.200 | WAN CARP VIP address |
| Peer identifier | Remote peer IP | VPN partner address |
During failover, the IPsec tunnel re-establishes automatically because the CARP VIP migrates with the master role. The remote peer continues connecting to the same IP address (CARP VIP), and IKE negotiation proceeds from scratch.
IPsec Recovery Time
Unlike standard TCP/UDP traffic, IPsec tunnels do not survive failover intact. Although pfsync replicates states, the IKE SA (Security Association) requires renegotiation:
| Phase | Approximate Time |
|---|---|
| Failure detection (CARP) | ~3 seconds |
| CARP VIP migration | Instantaneous |
| IKE Phase 1 re-establishment | 2-5 seconds |
| IKE Phase 2 re-establishment | 1-2 seconds |
| Full tunnel recovery | 6-10 seconds |
To minimize IPsec downtime during failover, configure DPD (Dead Peer Detection) on the remote peer with aggressive timers (e.g., 10-second interval, 3 retries).
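On a remote peer that also runs strongSwan (the IKE daemon pfSense uses), timers along those lines might look like the following legacy `ipsec.conf` fragment. The connection name and the `dpdaction=restart` choice are illustrative assumptions to adapt, not values taken from any particular deployment:

```
conn to-pfsense-ha
        right=198.51.100.200     # WAN CARP VIP of the cluster, not a node address
        dpdaction=restart        # renegotiate when the peer goes silent
        dpddelay=10s             # DPD probe interval
        dpdtimeout=30s           # ~3 missed probes before declaring the peer dead
```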
Multiple IPsec Tunnels
When multiple IPsec tunnels are present, each must be bound to a CARP VIP. All tunnels are synchronized via XMLRPC when the IPsec checkbox is enabled in the synchronization settings. After failover, each tunnel recovers independently - some may come up faster than others depending on the remote peer’s DPD configuration.
HA with OpenVPN
OpenVPN servers and clients in an HA cluster also require binding to CARP VIPs. OpenVPN behavior during failover differs from IPsec and depends on the transport protocol (UDP or TCP) and tunnel mode (tun or tap).
OpenVPN Server Configuration for HA
When creating an OpenVPN server, specify:
| Parameter | Value |
|---|---|
| Interface | WAN CARP VIP or LAN CARP VIP |
| Local port | Standard port (1194 or custom) |
OpenVPN configuration is synchronized via XMLRPC when the OpenVPN checkbox is enabled.
Failover Behavior
| Mode | Protocol | Failover Behavior |
|---|---|---|
| tun + UDP | UDP | Clients reconnect automatically via keepalive |
| tun + TCP | TCP | Clients must reconnect (TCP session is lost) |
| tap + UDP | UDP | Clients reconnect, L2 state is restored |
For optimal HA compatibility, use tun mode with UDP transport. In this configuration, OpenVPN clients automatically detect connectivity loss through the keepalive mechanism and reconnect to the CARP VIP, which is already served by the backup node.
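A client profile matching that recommendation might look like the following sketch. The VIP address and port are the example values used throughout this section; `keepalive` is normally pushed by the server and is shown client-side only for clarity:

```
client
dev tun
proto udp
remote 198.51.100.200 1194   # WAN CARP VIP, never an individual node address
resolv-retry infinite        # keep retrying while the VIP migrates
persist-key
persist-tun
keepalive 10 60              # detect a dead link and trigger reconnection
```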
Warning:
OpenVPN certificates must be synchronized via XMLRPC (Certificates, CAs checkbox). If certificates differ between nodes, clients will be unable to connect to the backup node after failover.
HA with DHCP Server
The DHCP server in an HA cluster requires special attention to prevent address conflicts during split-brain scenarios.
Standard Configuration
In the default configuration, the DHCP server is bound to the LAN interface and assigns addresses from a single pool. Clients send requests to the CARP VIP (which serves as their default gateway), and DHCP responses come from the master node.
During failover, the DHCP server on the backup node activates and continues address assignment. Since the DHCP configuration is synchronized via XMLRPC, the backup node uses the same address pool.
Preventing Conflicts
To prevent address duplication during split-brain, use one of the following strategies:
| Strategy | Description | Example |
|---|---|---|
| Pool splitting | Each node serves a different range | Primary: .100-.199, Secondary: .200-.249 |
| DHCP Failover Peer | ISC DHCP built-in failover mechanism | Automatic lease division |
| Short lease time | Minimize conflict window | 1-hour lease instead of 24 hours |
When splitting pools, disable DHCP Server synchronization in the XMLRPC settings and configure ranges manually on each node.
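When splitting pools manually, it is worth double-checking that the two ranges cannot overlap. A minimal sketch using the example ranges from the table above:

```python
from ipaddress import ip_address

def ranges_overlap(a: tuple, b: tuple) -> bool:
    """True if two inclusive (start, end) IPv4 ranges share any address."""
    a0, a1 = map(ip_address, a)
    b0, b1 = map(ip_address, b)
    return a0 <= b1 and b0 <= a1

primary_pool   = ("192.168.1.100", "192.168.1.199")
secondary_pool = ("192.168.1.200", "192.168.1.249")

assert not ranges_overlap(primary_pool, secondary_pool)
print("pools are disjoint")
```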
Static DHCP Mappings
Static DHCP mappings are synchronized via XMLRPC along with the rest of the DHCP configuration. These mappings do not create conflicts during split-brain because each MAC address is bound to a fixed IP address.
Planned Maintenance Failover
Planned failover (maintenance failover) is performed when the primary node requires servicing - pfSense upgrades, hardware replacement, or diagnostics.
Step-by-Step Procedure
Preparation: confirm the secondary node is fully synchronized
- Check Status > CARP (failover) - all VIPs should show MASTER on the primary and BACKUP on the secondary
- Verify no synchronization errors appear in the logs
Switch to secondary: on the primary node, navigate to Status > CARP (failover) and click Enter Persistent CARP Maintenance Mode
- All CARP VIPs transition to BACKUP on the primary
- The secondary promotes to MASTER for all VIPs
- The transition takes several seconds
Verification: confirm traffic is being handled by the secondary
- On the secondary, check Status > CARP (failover) - all VIPs should be in MASTER state
- Test traffic through the firewall (web access, DNS, VPN)
- Verify IPsec/OpenVPN tunnel functionality
Primary maintenance: perform the required work on the primary node
- pfSense upgrade
- Hardware replacement
- Diagnostics
Return to primary: on the primary node, exit maintenance mode via Status > CARP (failover) - click Leave Persistent CARP Maintenance Mode
- The primary resumes MASTER status for all VIPs
- The secondary returns to BACKUP
Final check: confirm the primary has reclaimed the MASTER role and synchronization is functioning correctly
Warning:
Before any planned failover, create configuration backups of both nodes via Diagnostics > Backup & Restore. This ensures recovery is possible if unexpected issues arise.
Upgrading pfSense in an HA Cluster
When upgrading pfSense in an HA cluster, a specific sequence must be followed:
- Create configuration backups of both nodes
- Place the primary in maintenance mode (traffic shifts to the secondary)
- Upgrade the primary node
- Wait for the upgrade and reboot to complete
- Confirm the primary has started correctly (do not exit maintenance mode yet)
- On the secondary node, enter maintenance mode (traffic returns to the upgraded primary)
- Upgrade the secondary node
- Wait for the upgrade and reboot to complete
- Exit maintenance mode on both nodes
- Verify synchronization and CARP status
The secondary is upgraded second because XMLRPC synchronization may be incompatible between different pfSense versions. Once both nodes run the same version, synchronization resumes normally.
Testing Failover
Regular failover testing is a mandatory practice for production clusters. Testing should be performed when the cluster is first deployed and after every significant configuration change.
Test 1: Planned Failover
Objective: verify maintenance mode operation.
- Record the current CARP VIP status on both nodes
- Place the primary in maintenance mode
- Test service availability (HTTP, DNS, VPN)
- Exit maintenance mode on the primary
- Confirm the primary has reclaimed MASTER status
Test 2: Interface Failure
Objective: verify automatic failover on link loss.
- Disconnect the WAN network cable on the primary node
- Observe the CARP VIP transition on WAN
- Verify the LAN CARP VIP also transitions (if peer IP monitoring is configured)
- Reconnect the cable
- Confirm the primary reclaims MASTER status (preemption)
Warning:
When testing interface disconnection, behavior depends on the CARP configuration. By default, pfSense only transitions the CARP VIP on the affected interface. To transition all VIPs when a single interface fails, configure IP monitoring via System > High Avail. Sync by specifying IP addresses to monitor (e.g., the upstream gateway address).
Test 3: Complete Node Failure
Objective: verify behavior when the primary node is powered off.
- Power off the primary node
- Wait for all CARP VIPs to transition to the secondary (~3 seconds)
- Test all service availability from the secondary
- Verify existing TCP sessions are preserved
- Power on the primary
- Confirm the primary reclaims MASTER status and synchronization resumes
Test 4: IPsec/OpenVPN Recovery
Objective: verify VPN tunnel recovery after failover.
- Establish IPsec and/or OpenVPN connections through the cluster
- Perform a planned failover to the secondary
- Record the recovery time for each tunnel
- Verify traffic flows through recovered tunnels
- Return the primary to active status
Documenting Results
Results of each test should be documented in the following format:
| Parameter | Value |
|---|---|
| Test date | YYYY-MM-DD |
| Test type | Planned failover / Interface failure / Complete failure |
| Failover time | X seconds |
| Services affected | HTTP, DNS, VPN, etc. |
| TCP sessions preserved | Yes / No |
| VPN tunnels recovered | Yes / No (recovery time) |
| Issues discovered | Description |
Monitoring HA Status
Continuous cluster state monitoring enables early detection of issues before they impact service availability.
Built-in Monitoring Tools
| Tool | Location | Information Provided |
|---|---|---|
| CARP Status | Status > CARP (failover) | MASTER/BACKUP status for each VIP |
| System Logs | Status > System Logs | Synchronization errors, CARP events |
| States | Diagnostics > States | Current state count |
| pfTop | Diagnostics > pfTop | Active connections in real time |
External Monitoring
For production clusters, external monitoring should be configured:
- SNMP: pfSense supports SNMP for monitoring via Zabbix, Nagios, or similar platforms. Monitor CARP VIP status, state count, CPU utilization, and memory usage.
- Syslog: configure remote syslog forwarding to preserve a history of CARP events and synchronization activity.
- Wazuh: when pfSense is integrated with Wazuh, CARP event logs can be processed by detection rules to generate alerts on unplanned failovers.
Key Metrics
| Metric | Normal Value | Alert When |
|---|---|---|
| CARP VIP status | MASTER on primary, BACKUP on secondary | Any change without planned failover |
| State count difference | Less than 1% | More than 5% divergence |
| Last XMLRPC sync time | No more than 5 minutes ago | More than 15 minutes without sync |
| Synchronization errors | 0 | Any error |
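The thresholds above can be folded into whatever polling script feeds your monitoring platform. A sketch of the alerting logic only; how the raw numbers are collected (SNMP, SSH, log parsing) is left open and the function name is illustrative:

```python
from datetime import datetime, timedelta

def check_ha_health(primary_states: int, secondary_states: int,
                    last_sync: datetime, sync_errors: int) -> list:
    """Return alert strings per the key-metric thresholds in the table."""
    alerts = []
    divergence = abs(primary_states - secondary_states) / max(primary_states, 1)
    if divergence > 0.05:
        alerts.append(f"state tables diverge by {divergence:.0%} (>5%)")
    if datetime.now() - last_sync > timedelta(minutes=15):
        alerts.append("no XMLRPC sync for more than 15 minutes")
    if sync_errors > 0:
        alerts.append(f"{sync_errors} synchronization error(s) logged")
    return alerts

# A healthy snapshot produces no alerts
print(check_ha_health(100_000, 99_500, datetime.now(), 0))  # → []
```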
HA Limitations in pfSense
When designing an HA cluster, the platform’s architectural limitations must be taken into account.
Active/Active Is Not Supported
pfSense officially supports only the active/passive configuration. Active/active mode (where both nodes simultaneously process traffic with load balancing) is not implemented at the CARP and pfsync level. Attempting to achieve active/active by manually distributing CARP VIPs across nodes leads to asymmetric routing and state loss.
Two-Node Limitation
Only two-node clusters are officially supported. While it is technically possible to add a third node with a higher skew value, XMLRPC synchronization only supports specifying a single peer. Three-node configurations require manual configuration synchronization to the third node.
Layer 2 Dependency
CARP in multicast mode requires both nodes to reside in the same broadcast domain for each interface carrying a CARP VIP. This limits geographic distribution - nodes cannot be placed in separate data centers without establishing Layer 2 connectivity (e.g., via VXLAN or MPLS L2VPN).
Split-Brain
When connectivity between nodes is lost (sync interface failure), both nodes may transition to MASTER state. This results in:
- Duplicate DHCP address assignments (if pools are not split)
- CARP VIP MAC address conflicts on the network segment
- State table inconsistency
To minimize split-brain risk, use a dedicated physical link for the sync interface and monitor its availability.
Migration from Other Platforms
Migrating from Cisco ASA Failover
When transitioning from Cisco ASA Active/Standby failover to pfSense HA, the following architectural differences must be considered:
| Feature | Cisco ASA | pfSense |
|---|---|---|
| Failover protocol | Proprietary (LAN-based failover) | CARP (OpenBSD) |
| State synchronization | Stateful failover link | pfsync |
| Configuration sync | Automatic config replication | XMLRPC |
| Active/Active | Supported (with contexts) | Not supported |
| Dedicated failover link | Failover interface + State link | Sync interface (single for both) |
| Interface monitoring | Built-in interface monitoring | IP monitoring via CARP |
| Preemption | Configurable | Automatic (by skew) |
Migration procedure:
- Document the current ASA failover configuration (show failover, show running-config)
- Plan the pfSense HA addressing scheme (three addresses per interface)
- Configure pfSense primary and secondary as documented
- Migrate firewall rules, NAT, and VPN to the primary node
- Configure XMLRPC synchronization
- Test failover before switching production traffic
- Cut over production traffic to the pfSense HA cluster
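The addressing step above ("three addresses per interface") can be sketched with the standard `ipaddress` module. The host offsets here are arbitrary illustration, not a pfSense convention:

```python
from ipaddress import ip_network

def ha_addresses(subnet: str, vip_host: int, pri_host: int, sec_host: int) -> dict:
    """Derive the three per-interface addresses a pfSense HA pair needs."""
    hosts = list(ip_network(subnet).hosts())
    return {
        "carp_vip":  str(hosts[vip_host - 1]),
        "primary":   str(hosts[pri_host - 1]),
        "secondary": str(hosts[sec_host - 1]),
    }

print(ha_addresses("192.168.1.0/24", 1, 2, 3))
# → {'carp_vip': '192.168.1.1', 'primary': '192.168.1.2', 'secondary': '192.168.1.3'}
```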
Migrating from FortiGate HA Cluster
FortiGate supports both active/passive and active/active HA. When migrating to pfSense, consider:
| Feature | FortiGate | pfSense |
|---|---|---|
| HA protocol | FGCP (FortiGate Clustering Protocol) | CARP |
| Session sync | HA heartbeat link | pfsync |
| Config sync | Automatic | XMLRPC |
| Active/Active | Supported | Not supported |
| Cluster > 2 nodes | Up to 4 nodes | 2 nodes only |
| Session pickup | TCP, UDP, ICMP | Via pfsync (all pf protocols) |
| Heartbeat interface | Dedicated HA link | Sync interface |
| Virtual MAC | 00:09:0f:09:xx:xx | 00:00:5e:00:01:xx (CARP) |
When migrating from a FortiGate active/active configuration, the architecture must be redesigned for the pfSense active/passive model. This may require changes to traffic distribution at the upstream router or switch level.
The migration procedure follows the same pattern as Cisco ASA: documentation, planning, configuration, testing, cutover.
Disaster Recovery
When both cluster nodes fail completely (e.g., due to a power outage in the server room), the recovery procedure depends on the availability of configuration backups.
Recovery with Backups
- Power on both nodes
- Wait for pfSense to boot on both nodes
- Check CARP status - with a correct configuration, the primary should become MASTER
- Verify pfsync and XMLRPC synchronization
- If the configuration is corrupted, restore from backup via Diagnostics > Backup & Restore
Recovery After Single Node Loss
- The backup node automatically assumes the MASTER role
- Replace the failed node
- Install pfSense and restore the base configuration (hostname, IP addresses, sync interface)
- Configure XMLRPC on the primary to synchronize with the new secondary
- Trigger a full synchronization via System > High Avail. Sync (click Save)
- Verify CARP status and the state table
Related Sections
- CARP and Virtual IPs - CARP VIP, VHID, and Advertising Frequency configuration
- Configuration Synchronization - detailed pfsync and XMLRPC setup
- pfSense VPN - IPsec and OpenVPN configuration with CARP VIPs
- pfSense NAT - outbound NAT with CARP VIPs in multi-WAN configurations
- Multi-WAN - gateway groups and policy routing in HA context