pfSense Failover Scenarios - Designing HA Clusters
Designing a resilient pfSense cluster requires careful consideration of network topology, active services, and acceptable downtime. pfSense supports two-node clusters in an active/passive configuration. Each failover scenario carries its own configuration requirements, limitations, and testing procedures. This section covers standard HA architectures, planned and unplanned failover procedures, and integration with additional services.
Active/Passive Topology
Active/passive is the only officially supported HA cluster configuration in pfSense. In this design, one node (master) handles all network traffic while the second node (backup) operates in hot standby, ready to assume the workload when the primary fails.
Operating Principle
The primary node holds all CARP VIPs and processes traffic. The secondary node receives state table updates via pfsync and configuration updates via XMLRPC. When heartbeat signals from the primary are lost, the secondary promotes itself to master for each CARP VIP and begins processing traffic.
Failover time is governed by the Advertising Frequency parameters in the CARP configuration:
| Parameter | Value | Impact on Failover |
|---|---|---|
| Base | 1 (default) | Base heartbeat interval in seconds |
| Skew primary | 0 | Minimum delay - primary sends heartbeats first |
| Skew secondary | 100 | Additional delay of 100/256 seconds |
| Failure detection time | ~3 x (base + skew/256) | Approximately 3 seconds at default settings |
At default settings, the backup node detects a primary failure in approximately 3 seconds and assumes the master role. Existing TCP sessions are preserved through pfsync - network clients notice only a brief pause in connectivity.
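The detection timing described above can be sketched as a small calculation. This is a sketch of the approximation from the table (three missed advertisements at an interval of base + skew/256); real-world CARP timing also depends on factors such as the demotion counter:

```python
def carp_detection_time(base: int = 1, skew: int = 0) -> float:
    """Approximate CARP failure detection time in seconds.

    The master advertises every (base + skew/256) seconds; the backup
    declares the master dead after roughly three missed advertisements.
    """
    return 3 * (base + skew / 256)

# Defaults on the master (base=1, skew=0): ~3 seconds
print(carp_detection_time())          # → 3.0
# With skew=100 the advertisement interval grows to ~1.39 s
print(carp_detection_time(skew=100))  # → 4.171875
```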
Standard Layout
               Internet
                  |
             [ISP Router]
                  |
        -------- WAN --------
        |                   |
   [Primary]           [Secondary]
    master                backup
        |                   |
        -------- LAN --------
        |                   |
  [LAN Switch]        [LAN Switch]
              |
          [Clients]

   [Primary] ---- Sync (172.16.1.0/24) ---- [Secondary]

In the standard layout, both nodes connect to the same WAN segment and the same LAN segment. The sync interface links the nodes directly. CARP VIPs are assigned on both the WAN and LAN interfaces. Clients use the LAN CARP VIP as their default gateway.
Infrastructure Requirements
| Component | Requirement |
|---|---|
| LAN switch | Multicast support, multiple MAC addresses per port |
| WAN switch | Multicast or unicast CARP support |
| WAN IP addresses | Minimum 3 (primary, secondary, CARP VIP) |
| LAN IP addresses | Minimum 3 (primary, secondary, CARP VIP) |
| Sync link | Dedicated interface with direct connection |
| pfSense version | Identical on both nodes |
Active/Passive with Multi-WAN
A pfSense cluster with multiple WAN connections provides failover at both the link level and the hardware level. Each WAN interface requires its own set of CARP VIPs.
Dual-WAN Layout
     ISP1                 ISP2
      |                    |
  [Router1]            [Router2]
      |                    |
  --- WAN1 ---         --- WAN2 ---
   |        |           |        |
 [Primary]              [Secondary]
      |                     |
      --------- LAN ---------
                |
            [Clients]

Configuration Details
In a multi-WAN HA deployment, CARP VIPs must be created on each WAN interface:
| Interface | Primary | Secondary | CARP VIP | VHID |
|---|---|---|---|---|
| WAN1 | 198.51.100.201/24 | 198.51.100.202/24 | 198.51.100.200/24 | 200 |
| WAN2 | 203.0.113.201/24 | 203.0.113.202/24 | 203.0.113.200/24 | 210 |
| LAN | 192.168.1.2/24 | 192.168.1.3/24 | 192.168.1.1/24 | 1 |
Outbound NAT rules must be configured separately for each WAN interface, specifying the corresponding CARP VIP as the translation address.
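An addressing plan like the one above can be sanity-checked mechanically: each CARP VIP must sit in the same subnet as both node addresses, and VHIDs should not collide. A minimal sketch using the table's example addresses (strictly, VHIDs only need to be unique per broadcast domain; keeping them globally unique is simply a safe convention):

```python
from ipaddress import ip_interface

# (interface, primary, secondary, CARP VIP, VHID) from the plan above
plan = [
    ("WAN1", "198.51.100.201/24", "198.51.100.202/24", "198.51.100.200/24", 200),
    ("WAN2", "203.0.113.201/24",  "203.0.113.202/24",  "203.0.113.200/24",  210),
    ("LAN",  "192.168.1.2/24",    "192.168.1.3/24",    "192.168.1.1/24",    1),
]

vhids = [vhid for *_, vhid in plan]
assert len(vhids) == len(set(vhids)), "VHIDs must be unique"

for name, pri, sec, vip, _ in plan:
    nets = {ip_interface(a).network for a in (pri, sec, vip)}
    assert len(nets) == 1, f"{name}: node addresses and VIP must share a subnet"

print("addressing plan OK")
```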
Gateway Groups in HA
Multi-WAN gateway groups function correctly in an HA configuration under the following conditions:
- Gateways must monitor availability using individual node IP addresses, not CARP VIPs
- Policy routing rules referencing gateway groups are synchronized via XMLRPC
- After failover to the secondary, gateway groups continue operating with the same priority logic
Warning:
When using gateway groups with Tier settings for load balancing between WAN links, verify that both WAN interfaces are physically reachable from both cluster nodes. If one WAN link is unavailable on the secondary node during failover, all traffic will shift to the remaining link.
HA with IPsec VPN
Integrating IPsec tunnels with an HA cluster requires specific configuration. The key requirement is that all IPsec tunnels must bind to CARP VIPs rather than individual node addresses.
IPsec Configuration for HA
When configuring Phase 1 (IKE), specify the CARP VIP as both the My identifier and the Interface:
| Parameter | Value | Description |
|---|---|---|
| Interface | WAN CARP VIP | Binding to the virtual address |
| My identifier | IP address: 198.51.100.200 | WAN CARP VIP address |
| Peer identifier | Remote peer IP | VPN partner address |
During failover, the IPsec tunnel re-establishes automatically because the CARP VIP migrates with the master role. The remote peer continues connecting to the same IP address (CARP VIP), and IKE negotiation proceeds from scratch.
IPsec Recovery Time
Unlike standard TCP/UDP traffic, IPsec tunnels do not survive failover intact. Although pfsync replicates states, the IKE SA (Security Association) requires renegotiation:
| Phase | Approximate Time |
|---|---|
| Failure detection (CARP) | ~3 seconds |
| CARP VIP migration | Instantaneous |
| IKE Phase 1 re-establishment | 2-5 seconds |
| IKE Phase 2 re-establishment | 1-2 seconds |
| Full tunnel recovery | 6-10 seconds |
To minimize IPsec downtime during failover, configure DPD (Dead Peer Detection) on the remote peer with aggressive timers (e.g., 10-second interval, 3 retries).
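On a remote peer that also runs strongSwan (the IKE daemon pfSense uses), timers along those lines might look like the following legacy `ipsec.conf` fragment. The connection name and the `dpdaction=restart` choice are illustrative assumptions to adapt, not values taken from any particular deployment:

```
conn to-pfsense-ha
        right=198.51.100.200     # WAN CARP VIP of the cluster, not a node address
        dpdaction=restart        # renegotiate when the peer goes silent
        dpddelay=10s             # DPD probe interval
        dpdtimeout=30s           # ~3 missed probes before declaring the peer dead
```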
Multiple IPsec Tunnels
When multiple IPsec tunnels are present, each must be bound to a CARP VIP. All tunnels are synchronized via XMLRPC when the IPsec checkbox is enabled in the synchronization settings. After failover, each tunnel recovers independently - some may come up faster than others depending on the remote peer’s DPD configuration.
HA with OpenVPN
OpenVPN servers and clients in an HA cluster also require binding to CARP VIPs. OpenVPN behavior during failover differs from IPsec and depends on the transport protocol (UDP or TCP) and tunnel mode (tun or tap).
OpenVPN Server Configuration for HA
When creating an OpenVPN server, specify:
| Parameter | Value |
|---|---|
| Interface | WAN CARP VIP or LAN CARP VIP |
| Local port | Standard port (1194 or custom) |
OpenVPN configuration is synchronized via XMLRPC when the OpenVPN checkbox is enabled.
Failover Behavior
| Mode | Protocol | Failover Behavior |
|---|---|---|
| tun + UDP | UDP | Clients reconnect automatically via keepalive |
| tun + TCP | TCP | Clients must reconnect (TCP session is lost) |
| tap + UDP | UDP | Clients reconnect, L2 state is restored |
For optimal HA compatibility, use tun mode with UDP transport. In this configuration, OpenVPN clients automatically detect connectivity loss through the keepalive mechanism and reconnect to the CARP VIP, which is already served by the backup node.
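A client profile matching that recommendation might look like the following sketch. The VIP address and port are the example values used throughout this section; `keepalive` is normally pushed by the server and is shown client-side only for clarity:

```
client
dev tun
proto udp
remote 198.51.100.200 1194   # WAN CARP VIP, never an individual node address
resolv-retry infinite        # keep retrying while the VIP migrates
persist-key
persist-tun
keepalive 10 60              # detect a dead link and trigger reconnection
```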
Warning:
OpenVPN certificates must be synchronized via XMLRPC (Certificates, CAs checkbox). If certificates differ between nodes, clients will be unable to connect to the backup node after failover.
HA with DHCP Server
The DHCP server in an HA cluster requires special attention to prevent address conflicts during split-brain scenarios.
Standard Configuration
In the default configuration, the DHCP server is bound to the LAN interface and assigns addresses from a single pool. Clients send requests to the CARP VIP (which serves as their default gateway), and DHCP responses come from the master node.
During failover, the DHCP server on the backup node activates and continues address assignment. Since the DHCP configuration is synchronized via XMLRPC, the backup node uses the same address pool.
Preventing Conflicts
To prevent address duplication during split-brain, use one of the following strategies:
| Strategy | Description | Example |
|---|---|---|
| Pool splitting | Each node serves a different range | Primary: .100-.199, Secondary: .200-.249 |
| DHCP Failover Peer | ISC DHCP built-in failover mechanism | Automatic lease division |
| Short lease time | Minimize conflict window | 1-hour lease instead of 24 hours |
When splitting pools, disable DHCP Server synchronization in the XMLRPC settings and configure ranges manually on each node.
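When splitting pools manually, it is worth double-checking that the two ranges cannot overlap. A minimal sketch using the example ranges from the table above:

```python
from ipaddress import ip_address

def ranges_overlap(a: tuple, b: tuple) -> bool:
    """True if two inclusive (start, end) IPv4 ranges share any address."""
    a0, a1 = map(ip_address, a)
    b0, b1 = map(ip_address, b)
    return a0 <= b1 and b0 <= a1

primary_pool   = ("192.168.1.100", "192.168.1.199")
secondary_pool = ("192.168.1.200", "192.168.1.249")

assert not ranges_overlap(primary_pool, secondary_pool)
print("pools are disjoint")
```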
Static DHCP Mappings
Static DHCP mappings are synchronized via XMLRPC along with the rest of the DHCP configuration. These mappings do not create conflicts during split-brain because each MAC address is bound to a fixed IP address.
Planned Maintenance Failover
Planned failover (maintenance failover) is performed when the primary node requires servicing - pfSense upgrades, hardware replacement, or diagnostics.
Step-by-Step Procedure
Preparation: confirm the secondary node is fully synchronized
- Check Status > CARP (failover) - all VIPs should show MASTER on the primary and BACKUP on the secondary
- Verify no synchronization errors appear in the logs
Switch to secondary: on the primary node, navigate to Status > CARP (failover) and click Enter Persistent CARP Maintenance Mode
- All CARP VIPs transition to BACKUP on the primary
- The secondary promotes to MASTER for all VIPs
- The transition takes several seconds
Verification: confirm traffic is being handled by the secondary
- On the secondary, check Status > CARP (failover) - all VIPs should be in MASTER state
- Test traffic through the firewall (web access, DNS, VPN)
- Verify IPsec/OpenVPN tunnel functionality
Primary maintenance: perform the required work on the primary node
- pfSense upgrade
- Hardware replacement
- Diagnostics
Return to primary: on the primary node, exit maintenance mode via Status > CARP (failover) - click Leave Persistent CARP Maintenance Mode
- The primary resumes MASTER status for all VIPs
- The secondary returns to BACKUP
Final check: confirm the primary has reclaimed the MASTER role and synchronization is functioning correctly
Warning:
Before any planned failover, create configuration backups of both nodes via Diagnostics > Backup & Restore. This ensures recovery is possible if unexpected issues arise.
Upgrading pfSense in an HA Cluster
When upgrading pfSense in an HA cluster, a specific sequence must be followed:
- Create configuration backups of both nodes
- Place the primary in maintenance mode (traffic shifts to the secondary)
- Upgrade the primary node
- Wait for the upgrade and reboot to complete
- Confirm the primary has started correctly (do not exit maintenance mode yet)
- On the secondary node, enter maintenance mode (traffic returns to the upgraded primary)
- Upgrade the secondary node
- Wait for the upgrade and reboot to complete
- Exit maintenance mode on both nodes
- Verify synchronization and CARP status
The secondary is upgraded second because XMLRPC synchronization may be incompatible between different pfSense versions. Once both nodes run the same version, synchronization resumes normally.
Testing Failover
Regular failover testing is a mandatory practice for production clusters. Testing should be performed when the cluster is first deployed and after every significant configuration change.
Test 1: Planned Failover
Objective: verify maintenance mode operation.
- Record the current CARP VIP status on both nodes
- Place the primary in maintenance mode
- Test service availability (HTTP, DNS, VPN)
- Exit maintenance mode on the primary
- Confirm the primary has reclaimed MASTER status
Test 2: Interface Failure
Objective: verify automatic failover on link loss.
- Disconnect the WAN network cable on the primary node
- Observe the CARP VIP transition on WAN
- Verify the LAN CARP VIP also transitions (if peer IP monitoring is configured)
- Reconnect the cable
- Confirm the primary reclaims MASTER status (preemption)
Warning:
When testing interface disconnection, behavior depends on the CARP configuration. By default, pfSense only transitions the CARP VIP on the affected interface. To transition all VIPs when a single interface fails, configure IP monitoring via System > High Avail. Sync by specifying IP addresses to monitor (e.g., the upstream gateway address).
Test 3: Complete Node Failure
Objective: verify behavior when the primary node is powered off.
- Power off the primary node
- Wait for all CARP VIPs to transition to the secondary (~3 seconds)
- Test all service availability from the secondary
- Verify existing TCP sessions are preserved
- Power on the primary
- Confirm the primary reclaims MASTER status and synchronization resumes
Test 4: IPsec/OpenVPN Recovery
Objective: verify VPN tunnel recovery after failover.
- Establish IPsec and/or OpenVPN connections through the cluster
- Perform a planned failover to the secondary
- Record the recovery time for each tunnel
- Verify traffic flows through recovered tunnels
- Return the primary to active status
Documenting Results
Results of each test should be documented in the following format:
| Parameter | Value |
|---|---|
| Test date | YYYY-MM-DD |
| Test type | Planned failover / Interface failure / Complete failure |
| Failover time | X seconds |
| Services affected | HTTP, DNS, VPN, etc. |
| TCP sessions preserved | Yes / No |
| VPN tunnels recovered | Yes / No (recovery time) |
| Issues discovered | Description |
Monitoring HA Status
Continuous cluster state monitoring enables early detection of issues before they impact service availability.
Built-in Monitoring Tools
| Tool | Location | Information Provided |
|---|---|---|
| CARP Status | Status > CARP (failover) | MASTER/BACKUP status for each VIP |
| System Logs | Status > System Logs | Synchronization errors, CARP events |
| States | Diagnostics > States | Current state count |
| pfTop | Diagnostics > pfTop | Active connections in real time |
External Monitoring
For production clusters, external monitoring should be configured:
- SNMP: pfSense supports SNMP for monitoring via Zabbix, Nagios, or similar platforms. Monitor CARP VIP status, state count, CPU utilization, and memory usage.
- Syslog: configure remote syslog forwarding to preserve a history of CARP events and synchronization activity.
- Wazuh: when pfSense is integrated with Wazuh, CARP event logs can be processed by detection rules to generate alerts on unplanned failovers.
Key Metrics
| Metric | Normal Value | Alert When |
|---|---|---|
| CARP VIP status | MASTER on primary, BACKUP on secondary | Any change without planned failover |
| State count difference | Less than 1% | More than 5% divergence |
| Last XMLRPC sync time | No more than 5 minutes ago | More than 15 minutes without sync |
| Synchronization errors | 0 | Any error |
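The thresholds above can be folded into whatever polling script feeds your monitoring platform. A sketch of the alerting logic only; how the raw numbers are collected (SNMP, SSH, log parsing) is left open and the function name is illustrative:

```python
from datetime import datetime, timedelta

def check_ha_health(primary_states: int, secondary_states: int,
                    last_sync: datetime, sync_errors: int) -> list:
    """Return alert strings per the key-metric thresholds in the table."""
    alerts = []
    divergence = abs(primary_states - secondary_states) / max(primary_states, 1)
    if divergence > 0.05:
        alerts.append(f"state tables diverge by {divergence:.0%} (>5%)")
    if datetime.now() - last_sync > timedelta(minutes=15):
        alerts.append("no XMLRPC sync for more than 15 minutes")
    if sync_errors > 0:
        alerts.append(f"{sync_errors} synchronization error(s) logged")
    return alerts

# A healthy snapshot produces no alerts
print(check_ha_health(100_000, 99_500, datetime.now(), 0))  # → []
```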
HA Limitations in pfSense
When designing an HA cluster, the platform’s architectural limitations must be taken into account.
Active/Active Is Not Supported
pfSense officially supports only the active/passive configuration. Active/active mode (where both nodes simultaneously process traffic with load balancing) is not implemented at the CARP and pfsync level. Attempting to achieve active/active by manually distributing CARP VIPs across nodes leads to asymmetric routing and state loss.
Two-Node Limitation
Only two-node clusters are officially supported. While it is technically possible to add a third node with a higher skew value, XMLRPC synchronization only supports specifying a single peer. Three-node configurations require manual configuration synchronization to the third node.
Layer 2 Dependency
CARP in multicast mode requires both nodes to reside in the same broadcast domain for each interface carrying a CARP VIP. This limits geographic distribution - nodes cannot be placed in separate data centers without establishing Layer 2 connectivity (e.g., via VXLAN or MPLS L2VPN).
Split-Brain
When connectivity between nodes is lost (sync interface failure), both nodes may transition to MASTER state. This results in:
- Duplicate DHCP address assignments (if pools are not split)
- CARP VIP MAC address conflicts on the network segment
- State table inconsistency
To minimize split-brain risk, use a dedicated physical link for the sync interface and monitor its availability.
Migration from Other Platforms
Migrating from Cisco ASA Failover
When transitioning from Cisco ASA Active/Standby failover to pfSense HA, the following architectural differences must be considered:
| Feature | Cisco ASA | pfSense |
|---|---|---|
| Failover protocol | Proprietary (LAN-based failover) | CARP (OpenBSD) |
| State synchronization | Stateful failover link | pfsync |
| Configuration sync | Automatic config replication | XMLRPC |
| Active/Active | Supported (with contexts) | Not supported |
| Dedicated failover link | Failover interface + State link | Sync interface (single for both) |
| Interface monitoring | Built-in interface monitoring | IP monitoring via CARP |
| Preemption | Configurable | Automatic (by skew) |
Migration procedure:
- Document the current ASA failover configuration (show failover, show running-config)
- Plan the pfSense HA addressing scheme (three addresses per interface)
- Configure pfSense primary and secondary as documented
- Migrate firewall rules, NAT, and VPN to the primary node
- Configure XMLRPC synchronization
- Test failover before switching production traffic
- Cut over production traffic to the pfSense HA cluster
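The addressing step above ("three addresses per interface") can be sketched with the standard `ipaddress` module. The host offsets here are arbitrary illustration, not a pfSense convention:

```python
from ipaddress import ip_network

def ha_addresses(subnet: str, vip_host: int, pri_host: int, sec_host: int) -> dict:
    """Derive the three per-interface addresses a pfSense HA pair needs."""
    hosts = list(ip_network(subnet).hosts())
    return {
        "carp_vip":  str(hosts[vip_host - 1]),
        "primary":   str(hosts[pri_host - 1]),
        "secondary": str(hosts[sec_host - 1]),
    }

print(ha_addresses("192.168.1.0/24", 1, 2, 3))
# → {'carp_vip': '192.168.1.1', 'primary': '192.168.1.2', 'secondary': '192.168.1.3'}
```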
Migrating from FortiGate HA Cluster
FortiGate supports both active/passive and active/active HA. When migrating to pfSense, consider:
| Feature | FortiGate | pfSense |
|---|---|---|
| HA protocol | FGCP (FortiGate Clustering Protocol) | CARP |
| Session sync | HA heartbeat link | pfsync |
| Config sync | Automatic | XMLRPC |
| Active/Active | Supported | Not supported |
| Cluster > 2 nodes | Up to 4 nodes | 2 nodes only |
| Session pickup | TCP, UDP, ICMP | Via pfsync (all pf protocols) |
| Heartbeat interface | Dedicated HA link | Sync interface |
| Virtual MAC | 00:09:0f:09:xx:xx | 00:00:5e:00:01:xx (CARP) |
When migrating from a FortiGate active/active configuration, the architecture must be redesigned for the pfSense active/passive model. This may require changes to traffic distribution at the upstream router or switch level.
The migration procedure follows the same pattern as Cisco ASA: documentation, planning, configuration, testing, cutover.
Disaster Recovery
When both cluster nodes fail completely (e.g., due to a power outage in the server room), the recovery procedure depends on the availability of configuration backups.
Recovery with Backups
- Power on both nodes
- Wait for pfSense to boot on both nodes
- Check CARP status - with a correct configuration, the primary should become MASTER
- Verify pfsync and XMLRPC synchronization
- If the configuration is corrupted, restore from backup via Diagnostics > Backup & Restore
Recovery After Single Node Loss
- The backup node automatically assumes the MASTER role
- Replace the failed node
- Install pfSense and restore the base configuration (hostname, IP addresses, sync interface)
- Configure XMLRPC on the primary to synchronize with the new secondary
- Trigger a full synchronization via System > High Avail. Sync (click Save)
- Verify CARP status and the state table
Related Sections
- CARP and Virtual IPs - CARP VIP, VHID, and Advertising Frequency configuration
- Configuration Synchronization - detailed pfsync and XMLRPC setup
- pfSense VPN - IPsec and OpenVPN configuration with CARP VIPs
- pfSense NAT - outbound NAT with CARP VIPs in multi-WAN configurations
- Multi-WAN - gateway groups and policy routing in HA context