Network Bonding Explained: All 7 Modes, Architecture, and Real-World Configuration
Network bonding (also called NIC teaming, link aggregation, or Ethernet bonding) is the technique of combining two or more physical network interface cards (NICs) into a single logical interface managed by the operating system kernel. Depending on the mode, the resulting device provides increased aggregate bandwidth, automatic failover, load distribution across member links, or a combination of these.
At the kernel level on Linux systems, bonding is implemented through the `bonding` kernel module, which presents a single virtual interface (typically named `bond0`) to the network stack. This abstraction means applications, routing tables, and firewall rules interact with one interface regardless of how many physical NICs are underneath — a critical architectural detail that simplifies management while delivering enterprise-grade resilience.
Why Network Bonding Matters in Production Environments
Before diving into modes, it is worth understanding precisely what problem bonding solves — and where it does not. A single Gigabit Ethernet port has a hard ceiling of approximately 125 MB/s of throughput. For a database server, a storage node, or a high-traffic web application, that ceiling is reached quickly. Bonding two 1 GbE NICs does not magically double throughput for a single TCP stream (that is a common misconception), but it does allow multiple simultaneous flows to saturate both links, effectively doubling aggregate capacity.
Beyond raw throughput, bonding eliminates the single point of failure that a lone NIC or cable represents. In environments where uptime is measured in nines, that matters enormously.
Core Benefits at a Glance
- Aggregate bandwidth: Multiple physical links contribute to total throughput for concurrent traffic flows
- Automatic failover: Link failure detection (via MII or ARP monitoring) triggers sub-second switchover to a surviving interface
- Load distribution: Traffic is spread across member interfaces according to the active bonding algorithm
- Transparent to applications: The bond interface has a single MAC address and IP, requiring no application-level changes
- Hardware cost efficiency: Bonding commodity NICs can be more cost-effective than upgrading to a single 10 GbE card in some scenarios
Network Bonding Architecture: How It Works Under the Hood
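The Linux kernel bonding driver sits between the network stack and the physical NIC drivers, at Layer 2 (Data Link). When a frame is transmitted, the bonding driver's transmit policy selects which slave interface to use. On receipt, frames arriving on the slaves are handed to the bond master and delivered to the network stack as if they had arrived on `bond0`. By default, frames received on inactive slaves (for example, the backups in active-backup mode) are dropped so that the stack does not see duplicates.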
Link monitoring is the mechanism that detects failures. Two methods exist:
- MII (Media Independent Interface) monitoring: Polls the physical link state of each NIC at a configurable interval (`miimon` parameter, typically 100ms). Fast and reliable for detecting cable pulls or NIC failures.
- ARP monitoring: Sends ARP requests to a target IP and watches for replies. More useful when you need to verify end-to-end connectivity rather than just physical link state, but introduces dependency on a reachable ARP target.
The `downdelay` and `updelay` parameters add hysteresis, preventing rapid flapping when a link bounces. Both should be multiples of `miimon` (other values are rounded down to the nearest multiple). Setting each to 200 ms is a common production baseline.
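The rounding behaviour of `downdelay` and `updelay` is easy to overlook: the kernel bonding documentation states that values which are not multiples of `miimon` are rounded down. A minimal sketch of that arithmetic, using a hypothetical helper named `effective_delay` (it is not part of any bonding tool):

```bash
# effective_delay MIIMON_MS REQUESTED_DELAY_MS
# Hypothetical helper: prints the delay the driver would effectively apply,
# i.e. the requested value rounded down to a multiple of miimon.
effective_delay() {
    miimon=$1
    requested=$2
    echo $(( requested / miimon * miimon ))
}

effective_delay 100 200   # 200 (already a multiple)
effective_delay 100 250   # 200 (rounded down)
```

Running a planned configuration through a check like this before deployment shows whether the configured hysteresis is really what will be applied.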
All 7 Linux Bonding Modes: Technical Deep Dive
The Linux bonding driver defines seven distinct modes (0 through 6). Each implements a different transmit policy and failover behavior. Selecting the wrong mode is one of the most common misconfigurations in server deployments.
Mode 0 — Round-Robin (balance-rr)
Packets are transmitted sequentially across all active slave interfaces in a rotating fashion: packet 1 on eth0, packet 2 on eth1, packet 3 on eth0, and so on.
What actually happens: Round-robin operates at the packet level, not the flow level. A single TCP connection can therefore have its packets delivered out of order if the two paths have different latency. The receiving host's TCP stack will reorder them, but the resulting duplicate ACKs can trigger spurious retransmissions and reduce throughput in practice, which is particularly noticeable with large file transfers over a single connection.
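The per-packet rotation can be pictured with a small sketch. The helper `rr_slave` is hypothetical and only models the strict rotation described above (one-based packet numbers, as in the example). The real driver keeps its own internal counter.

```bash
# rr_slave PACKET_NUMBER SLAVE...
# Illustrative only: prints the slave a strict round-robin policy would pick
# for the given one-based packet number.
rr_slave() {
    index=$1
    shift                                # remaining arguments are the slaves
    skip=$(( (index - 1) % $# ))
    shift "$skip"
    echo "$1"
}

for n in 1 2 3 4; do
    echo "packet $n -> $(rr_slave "$n" eth0 eth1)"
done
```

Because consecutive packets of one connection alternate between links, any latency difference between the two paths shows up directly as reordering at the receiver.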
Switch requirement: The switch ports must be configured as a static LAG (Link Aggregation Group) without LACP. Without this, the switch will see frames from the same MAC address arriving on multiple ports and may trigger a loop-protection shutdown.
Best use: Bulk transfer workloads with many simultaneous short-lived connections, where per-packet reordering is tolerable.
Mode 1 — Active-Backup
Only one slave interface is active at any time. All others are in a hot-standby state. When the active link fails (detected via MII or ARP monitoring), the bonding driver promotes a backup slave and sends a gratuitous ARP to update the network's MAC address tables.
Critical nuance: In active-backup mode, the bond interface presents a single MAC address to the network. By default this is the bond's own MAC (taken from the first slave added), which the driver assigns to every slave, so the address does not change on failover. No special switch configuration is needed: from the switch's perspective, it is a normal single-host connection. Along with Modes 5 and 6, this mode works on switches without any LAG configuration, and it is the simplest of the three.
Failover timing: With `miimon=100`, `downdelay=200`, `updelay=200`, you can expect failover in approximately 200–300ms — fast enough to avoid TCP session drops in most cases.
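The 200 to 300 ms estimate follows from the monitoring parameters. The sketch below assumes a simplified model in which a failure is noticed at the next MII poll and the link is then held for the full `downdelay`. Gratuitous ARP propagation and switch convergence are ignored, and `failover_window` is a hypothetical helper.

```bash
# failover_window MIIMON_MS DOWNDELAY_MS
# Prints "MIN MAX" in milliseconds under the simplified model:
#   best case  = failure just before a poll  -> downdelay only
#   worst case = failure just after a poll   -> miimon + downdelay
failover_window() {
    echo "$2 $(( $1 + $2 ))"
}

failover_window 100 200   # 200 300
```

Lowering `miimon` narrows the window, at the cost of polling the link state more often.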
Best use: High-availability scenarios where simplicity and compatibility matter more than bandwidth — management interfaces, out-of-band access, or any environment where the switch is not under your control.
Mode 2 — Balance-XOR
Traffic is distributed using a transmit hash policy applied to each packet. The default policy (`layer2`) is documented as `(source_MAC XOR destination_MAC XOR packet_type_ID) modulo slave_count`. Higher-level policies such as `layer3+4` use IP addresses and port numbers for better distribution.
The layer3+4 policy: Configuring `xmit_hash_policy=layer3+4` dramatically improves distribution by hashing on source IP, destination IP, source port, and destination port. This ensures different TCP flows to the same destination server are spread across links, which the default MAC-based hash cannot achieve.
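To see why port numbers matter, here is a sketch of the simplified `layer3+4` formula given in older versions of the kernel bonding documentation: `((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff)) modulo slave count`. Current kernels compute the hash differently, so treat this as an illustration of the idea rather than the driver's exact behaviour. The helper names `ip_to_int` and `l34_slave` are made up for this example.

```bash
# ip_to_int DOTTED_QUAD
# Converts an IPv4 address such as 10.0.0.5 to its 32-bit integer value.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# l34_slave SRC_IP SRC_PORT DST_IP DST_PORT SLAVE_COUNT
# Hypothetical helper: prints the zero-based slave index chosen by the
# simplified layer3+4 formula (65535 is 0xffff).
l34_slave() {
    src=$(ip_to_int "$1")
    dst=$(ip_to_int "$3")
    echo $(( (($2 ^ $4) ^ ((src ^ dst) & 65535)) % $5 ))
}

# Two flows between the same hosts that differ only in source port:
l34_slave 10.0.0.5 40000 10.0.0.9 443 2   # 1
l34_slave 10.0.0.5 40001 10.0.0.9 443 2   # 0
```

The two flows share the same MAC and IP pair, so a MAC-based hash would pin both to one link, while the port-aware hash separates them.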
Switch requirement: Static LAG configuration on the switch (same as Mode 0), but without the packet-reordering problem since all packets within a single flow traverse the same interface.
Best use: Environments needing load balancing without LACP support, particularly when combined with the `layer3+4` hash policy.
Mode 3 — Broadcast
Every packet is transmitted simultaneously on all slave interfaces. Every slave sends an identical copy of every frame.
When this is actually useful: Broadcast mode is not about bandwidth — it is about guaranteed delivery to multiple network segments simultaneously. It is used in specialized high-availability clustering scenarios where two separate switches or network paths must both receive every packet (for example, certain storage replication or financial trading systems with redundant network fabrics). It is also used in some network monitoring setups.
The bandwidth cost: With two NICs in broadcast mode, you consume 2x the bandwidth on the wire for every packet. With four NICs, 4x. This mode should never be used for general-purpose traffic.
Mode 4 — 802.3ad / LACP (Dynamic Link Aggregation)
This is the IEEE 802.3ad standard, implemented via the Link Aggregation Control Protocol (LACP). The bonding driver and the switch exchange LACP PDUs (Protocol Data Units) to dynamically negotiate which links form the aggregation group, their parameters, and their health.
How LACP negotiation works: Each side sends LACPDUs advertising its system priority, port priority, and aggregation key. Links with matching keys on both sides form a LAG. If a link fails, LACP detects it and removes it from the group without any manual intervention.
Transmit hash policy: Like Mode 2, Mode 4 uses a hash policy for load distribution. The `layer3+4` policy is strongly recommended here as well. Note that LACP does not guarantee per-packet load balancing — it distributes flows across links, so a single large file transfer will still use only one physical link.
Switch configuration: The switch must have LACP enabled on the corresponding port channel. LACP mode settings are a frequent source of bonding failures: an active/passive pair negotiates, but if both ends are passive, neither side initiates and the LAG never forms. Setting both sides to `active` ensures negotiation always proceeds.
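In terms of LACP's active/passive semantics, negotiation needs at least one end to initiate, because a passive port only answers LACPDUs it receives. A minimal sketch of that rule, with `lacp_will_negotiate` as a hypothetical helper:

```bash
# lacp_will_negotiate LOCAL_MODE PARTNER_MODE   (each "active" or "passive")
# Succeeds (exit status 0) when at least one side initiates LACPDUs.
lacp_will_negotiate() {
    [ "$1" = "active" ] || [ "$2" = "active" ]
}

for pair in "active active" "active passive" "passive passive"; do
    # Word splitting of $pair into two arguments is intentional here.
    if lacp_will_negotiate $pair; then
        echo "$pair: LAG can form"
    else
        echo "$pair: nobody initiates, LAG does not form"
    fi
done
```

Only the passive/passive combination fails, which is why setting both ends to active is the safe default.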
Best use: Data centers, high-performance servers, and any environment where you control the switch configuration. This is the gold standard for production bonding when switch support is available.
Mode 5 — Balance-TLB (Adaptive Transmit Load Balancing)
Mode 5 distributes outgoing traffic across all slaves according to the current load on each interface, computed relative to its speed, so that less-loaded slaves pick up new outgoing traffic. Incoming traffic is received only on a single designated slave.
The key advantage: No switch configuration is required whatsoever. The bond interface uses different source MAC addresses per slave for outgoing traffic, which is valid behavior that any switch handles transparently.
The limitation: Incoming traffic is not balanced. If your server primarily receives large data volumes (a download server, a database replica receiving replication streams), Mode 5 provides no benefit for that direction. If your server primarily sends data, Mode 5 is highly effective.
Failover behavior: If the receiving slave fails, another slave takes over the receive role. Outgoing load balancing continues across remaining slaves.
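The load-based selection can be modelled in a few lines. This is a simplified sketch, not the kernel's algorithm: it assumes each slave's current load and link speed are known, and it picks the slave with the lowest utilisation. `least_loaded_slave` is a hypothetical helper.

```bash
# least_loaded_slave: reads "NAME LOAD_MBPS SPEED_MBPS" lines on stdin and
# prints the slave with the lowest load relative to its speed.
# Simplified model for illustration only.
least_loaded_slave() {
    awk '
        {
            ratio = $2 / $3
            if (best == "" || ratio < best_ratio) { best = $1; best_ratio = ratio }
        }
        END { print best }
    '
}

printf '%s\n' "eth0 600 1000" "eth1 300 1000" "eth2 2000 10000" | least_loaded_slave
# prints eth2 (20% utilised, versus 60% and 30%)
```

Measuring load relative to speed matters when the bond mixes link speeds: the 10 GbE slave carries the most traffic in absolute terms but is still the least utilised.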
Mode 6 — Balance-ALB (Adaptive Load Balancing)
Mode 6 extends Mode 5 by adding incoming load balancing through ARP negotiation. The bonding driver periodically sends ARP replies with different source MAC addresses to different clients, causing those clients to send return traffic to different slave interfaces.
The ARP manipulation mechanism: This is the clever part. The driver intercepts ARP replies and rotates the source MAC address among the slaves. Clients cache these ARP entries and direct their traffic to whichever slave MAC they learned. This achieves incoming load balancing without any switch-side configuration.
Practical caveat: The ARP-based incoming balancing only works for hosts that have recently communicated with the bonded server. New connections always arrive on the primary slave until an ARP reply is sent. In high-connection-rate scenarios, the incoming distribution may be uneven.
Best use: Environments without LACP-capable switches that need bidirectional load balancing. A solid choice for VPS Hosting environments where the hypervisor's virtual switch may not support LACP.
Bonding Mode Comparison Table
| Mode | Name | Load Balancing | Fault Tolerance | Switch Requirement | Bandwidth Gain | Best For |
|---|---|---|---|---|---|---|
| 0 | Round-Robin | Per-packet | Yes | Static LAG | Yes (aggregate) | High-volume multi-flow transfers |
| 1 | Active-Backup | No | Yes | None | No | HA management interfaces |
| 2 | Balance-XOR | Per-flow (hash) | Yes | Static LAG | Yes (aggregate) | General load balancing |
| 3 | Broadcast | No | Yes (redundant) | None | No (wastes BW) | Specialized clustering |
| 4 | 802.3ad / LACP | Per-flow (hash) | Yes | LACP required | Yes (aggregate) | Data centers, production servers |
| 5 | Balance-TLB | TX only | Yes | None | TX only | Outbound-heavy workloads |
| 6 | Balance-ALB | TX + RX (ARP) | Yes | None | Yes (bidirectional) | No-LACP environments |
Configuring Network Bonding on Linux
Prerequisites
```bash
# Verify bonding module is available
modinfo bonding

# Load the module if not already loaded
modprobe bonding
```
Configuration via systemd-networkd (Modern Approach)
Create `/etc/systemd/network/bond0.netdev`:
```ini
[NetDev]
Name=bond0
Kind=bond

[Bond]
Mode=802.3ad
TransmitHashPolicy=layer3+4
MIIMonitorSec=100ms
LACPTransmitRate=fast
```
Create `/etc/systemd/network/bond0.network`:
```ini
[Match]
Name=bond0

[Network]
Address=192.168.1.10/24
Gateway=192.168.1.1
```
Create `/etc/systemd/network/eth0.network` and `eth1.network`:
```ini
# eth0.network (eth1.network is identical except for Name=eth1)
[Match]
Name=eth0

[Network]
Bond=bond0
```
Configuration via `/etc/network/interfaces` (Debian/Ubuntu)
```bash
auto bond0
iface bond0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    gateway 192.168.1.1
    bond-slaves eth0 eth1
    bond-mode 4
    bond-miimon 100
    bond-lacp-rate 1
    bond-xmit-hash-policy layer3+4

auto eth0
iface eth0 inet manual
    bond-master bond0

auto eth1
iface eth1 inet manual
    bond-master bond0
```
Verifying Bond Status
```bash
# Check bond status and slave states
cat /proc/net/bonding/bond0

# Monitor interface statistics
ip -s link show bond0

# Check LACP negotiation state (Mode 4)
grep -A5 "LACP" /proc/net/bonding/bond0
```
The `/proc/net/bonding/bond0` output is the most important diagnostic tool. It shows the active slave, link status of each member, MII status, and (for Mode 4) LACP partner information.
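For scripted health checks, the per-slave link state can be pulled out of that file with a short filter. The sketch below runs against an abridged, made-up sample in the layout the driver prints. The helper name `slave_states` is hypothetical, and real output contains more fields than shown here.

```bash
# slave_states: reads /proc/net/bonding/<bond> style text on stdin and prints
# "INTERFACE STATUS" for each slave. The bond-level "MII Status" line is
# skipped because no "Slave Interface" line has been seen at that point.
slave_states() {
    awk '
        /^Slave Interface:/ { iface = $3 }
        /^MII Status:/ && iface != "" { print iface, $3; iface = "" }
    '
}

# Abridged, made-up sample for illustration.
sample='Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps

Slave Interface: eth1
MII Status: down
Speed: Unknown'

printf '%s\n' "$sample" | slave_states
# On a live host: slave_states < /proc/net/bonding/bond0
```

Any slave reported as `down` here is a candidate for a cable, NIC, or switch-port check before the bond is trusted in production.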
Network Bonding on Dedicated Servers and VPS
On bare-metal Dedicated Servers, you have full control over both the server's NIC configuration and (typically) the switch port configuration, making Mode 4 (LACP) the natural choice for production workloads. Most data center providers can configure LACP on your switch ports upon request.
For VPS environments, such as VPS with cPanel, the hypervisor's virtual networking layer handles the underlying bonding at the host level. The guest VM typically sees a single virtual NIC, but the host may be running bonded physical interfaces beneath it, which provides redundancy transparently.
When deploying GPU-intensive workloads on GPU Hosting infrastructure, network bonding becomes critical for feeding data to GPU nodes fast enough to prevent I/O starvation. Training pipelines and inference serving both benefit from the aggregate bandwidth that LACP bonding provides.
Common Pitfalls and Edge Cases
Spanning Tree Protocol (STP) conflicts: When adding bonded ports to a switch, STP may temporarily block the ports during negotiation. Configure PortFast (or equivalent) on the switch ports to prevent 30-second delays during link-up events.
MTU mismatches: All slave interfaces in a bond must have identical MTU settings. A mismatch causes intermittent packet loss that is extremely difficult to diagnose. Always verify with `ip link show` after configuration.
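A consistency check along these lines can be scripted. The sketch below assumes the interface names and MTU values have already been collected as `NAME MTU` pairs. `mtus_match` is a hypothetical helper, and on a live host the values could be read from `/sys/class/net/<interface>/mtu`.

```bash
# mtus_match: reads "INTERFACE MTU" lines on stdin. Succeeds only if every
# interface reports the same MTU as the first line; otherwise prints each
# deviating interface and returns a non-zero status.
mtus_match() {
    awk '
        NR == 1 { first = $2 }
        $2 != first { bad = 1; print "MTU mismatch: " $1 " has " $2 ", expected " first }
        END { exit bad }
    '
}

printf '%s\n' "bond0 9000" "eth0 9000" "eth1 1500" | mtus_match \
    || echo "fix the MTU settings before relying on this bond"

# On a live host, something like:
#   for i in bond0 eth0 eth1; do echo "$i $(cat /sys/class/net/$i/mtu)"; done | mtus_match
```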
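LACP timeout modes: LACP supports a "slow" rate (LACPDUs every 30 seconds) and a "fast" rate (every 1 second), and a partner is timed out after three missed LACPDUs. Prefer `lacp-rate fast` (`bond-lacp-rate 1`) in production. With the slow rate, a link whose partner has stopped responding can take up to 90 seconds to be removed from the LAG, versus about 3 seconds with the fast rate.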
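The 90-second figure is three times the LACPDU interval, since a partner is considered gone after three missed LACPDUs. A small sketch of that arithmetic, with `lacp_timeout_seconds` as a hypothetical helper:

```bash
# lacp_timeout_seconds RATE   (RATE is "fast" or "slow")
# Prints the time after which a silent partner is timed out:
# three missed LACPDUs at the given transmit interval.
lacp_timeout_seconds() {
    case $1 in
        fast) interval=1 ;;
        slow) interval=30 ;;
        *) echo "unknown rate: $1" >&2; return 1 ;;
    esac
    echo $(( interval * 3 ))
}

lacp_timeout_seconds fast   # 3
lacp_timeout_seconds slow   # 90
```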
Virtual machine live migration: If a VM with a bonded interface is migrated to a different host, the MAC addresses of the bond may change depending on the hypervisor. This can leave stale ARP cache entries on neighboring hosts and cause brief connectivity loss. Pre-stage gratuitous ARPs in your migration scripts.
Asymmetric hashing: With Mode 4 and `layer3+4` hashing, traffic from server A to server B may traverse eth0, while return traffic from B to A traverses eth1 on B's bond. This is normal and expected — each endpoint independently hashes its outgoing traffic.
NetworkManager interference: On RHEL/CentOS systems, NetworkManager can interfere with manually configured bonds. Either configure bonds through NetworkManager's nmcli interface or disable NetworkManager for the relevant interfaces using `NM_CONTROLLED=no` in the interface configuration file.
Bonding vs. Other High-Availability Network Techniques
Network bonding is not the only approach to NIC redundancy. Understanding when to use alternatives is equally important.
| Technique | Layer | Switch Needed | Bandwidth Gain | Use Case |
|---|---|---|---|---|
| Bonding (Mode 1) | L2 | No | No | Simple failover |
| Bonding (Mode 4 LACP) | L2 | Yes (LACP) | Yes | Production servers |
| SR-IOV | L1/L2 | No | Yes | VM direct NIC access |
| ECMP Routing | L3 | Yes | Yes | Multi-path routing |
| MLAG | L2 | Yes (MLAG-capable) | Yes | Cross-switch redundancy |
MLAG (Multi-Chassis Link Aggregation) deserves special mention: it allows a server running Mode 4 bonding to connect its two NICs to two physically separate switches, both participating in the same logical LAG. This eliminates the switch itself as a single point of failure — a level of redundancy that standard single-switch LACP cannot provide.
Decision Matrix: Choosing the Right Bonding Mode
Use this framework to select your bonding mode:
Do you control the switch configuration?
- No → Go to Mode 1, 5, or 6
  - Need bidirectional load balancing? → Mode 6
  - Primarily outbound traffic? → Mode 5
  - Pure failover, maximum simplicity? → Mode 1
- Yes → Go to Mode 0, 2, 3, or 4
  - Need dynamic negotiation and best-practice compliance? → Mode 4 (LACP)
  - Static LAG, simpler setup? → Mode 2 with layer3+4 hash
  - Many concurrent flows where per-packet reordering is tolerable? → Mode 0
  - Specialized broadcast requirement? → Mode 3
Is this a management/IPMI interface? Always use Mode 1. Never risk a management interface on a mode that requires switch configuration.
Are you on a cloud or virtual platform? Check whether the hypervisor's virtual switch supports LACP. If not, Mode 6 provides the best balance of load distribution and compatibility.
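The same decision framework can be captured as a lookup for provisioning scripts. This is only a sketch of the framework in this section: the function name `pick_mode` and its argument keywords are invented for the example, and anything outside the listed combinations is reported as having no recommendation.

```bash
# pick_mode SWITCH_CONTROL NEED
#   SWITCH_CONTROL: yes | no
#   NEED: failover | outbound | bidirectional | lacp | static | broadcast
# Prints the suggested bonding mode number.
pick_mode() {
    case "$1:$2" in
        no:failover)      echo 1 ;;
        no:outbound)      echo 5 ;;
        no:bidirectional) echo 6 ;;
        yes:lacp)         echo 4 ;;
        yes:static)       echo 2 ;;
        yes:broadcast)    echo 3 ;;
        *) echo "no recommendation for $1/$2" >&2; return 1 ;;
    esac
}

pick_mode no bidirectional   # 6
pick_mode yes lacp           # 4
```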
For teams managing multiple servers through VPS Control Panels, verifying bonding status should be part of the standard post-deployment checklist alongside SSL verification via SSL Certificates and DNS propagation checks after Domain Registration.
Technical Key-Takeaway Checklist
- Always set `miimon=100` and `downdelay=200 updelay=200` as a baseline for MII monitoring in production
- Use `xmit_hash_policy=layer3+4` with Mode 2 and Mode 4 to ensure flow-level distribution rather than MAC-level
- Verify `/proc/net/bonding/bond0` immediately after configuration — do not assume it is working
- Configure LACP rate to `fast` in Mode 4 to reduce failover detection time from 90 seconds to 3 seconds
- Ensure all slave NICs have identical MTU, speed, and duplex settings before adding them to a bond
- On production Dedicated Servers, always request LACP configuration from your data center provider rather than using static LAG
- Test failover explicitly by unplugging one cable — do not assume the configuration is correct until you have verified it under failure conditions
- Document which physical NIC corresponds to which slave (eth0, eth1) using `ethtool -i eth0` to avoid confusion during physical maintenance
- For cross-switch redundancy in critical environments, investigate MLAG before settling for single-switch LACP
FAQ
Does network bonding double the speed of a single file download?
No. Bonding distributes traffic across links at the flow level (or packet level in Mode 0). A single TCP connection uses only one physical link at a time in most modes. Bonding increases aggregate throughput across multiple simultaneous connections, not the speed of any individual connection.
What is the difference between bonding Mode 4 (LACP) and a static LAG?
A static LAG (used by Modes 0 and 2) manually defines which ports form the aggregation group with no negotiation. LACP (Mode 4) dynamically negotiates the LAG using control packets, automatically detecting misconfigurations, failed links, and adding/removing members. LACP is more robust and is the industry standard for production deployments.
Can I configure network bonding on a VPS?
It depends on the hypervisor and hosting provider. Most cloud VPS instances present a single virtual NIC to the guest, with bonding handled at the hypervisor level. Some providers offering bare-metal-like VPS or dedicated cloud instances support guest-level bonding. Check with your provider before attempting to configure bonding inside a VPS guest.
What happens to active connections when a bonded link fails?
In Mode 1 (Active-Backup), the bond sends a gratuitous ARP after failover, updating switch MAC tables. Existing TCP connections experience a brief pause (typically under 300ms with fast MII monitoring) but generally survive. In Mode 4, the failed link is removed from the aggregation group and its flows are rehashed onto the surviving links. Established TCP connections generally survive, although packets in flight on the failed link are lost and must be retransmitted.
Why is my Mode 4 bond showing only one active slave in `/proc/net/bonding/bond0`?
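The most common cause is a switch-side misconfiguration. Verify that the switch ports are members of the same port channel with LACP enabled, preferably in active mode. Also compare the aggregator ID and partner details reported for each slave in `/proc/net/bonding/bond0`: slaves with a mismatched LACP key (for example, because their speed or duplex differ, or because their switch ports are not in the same channel group) end up in separate aggregators, and only one aggregator is active at a time.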
