Linux NIC teaming recommendations

Introduction

In my job as a network engineer, I am constantly looking for ways to increase the availability of the network. This is especially true in the data center, where services are expected to always be available. One of the ways to increase the network availability of a server is by using multiple network interfaces. This technique has many different names, but I am just going to call it NIC teaming.

Purpose of NIC Teaming

NIC teaming increases network availability by removing single points of failure (SPOFs). A SPOF is a component that causes a service outage if it becomes unavailable. If we consider a single network connection from your server to your switch, we can identify quite a few SPOFs:
  1. Server NIC failure
  2. Network cable failure (such as being cut or unplugged)
  3. Network switch failure (such as a planned firmware upgrade or unplanned outage)
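
To get a feel for why removing these SPOFs matters, here is a rough Python sketch of the usual availability arithmetic. It assumes the NIC, cable, and switch fail independently, and the 0.99 figures are made-up placeholders, not measurements. A single connection is a chain (everything must work), while two teamed connections are redundant (one working path is enough).

```python
from math import prod


def series_availability(components):
    """Availability of a chain in which every component must work."""
    return prod(components)


def parallel_availability(paths):
    """Availability of redundant paths where one working path is enough."""
    return 1 - prod(1 - a for a in paths)


# Hypothetical availabilities for NIC, cable, and switch (not measured values).
nic, cable, switch = 0.99, 0.99, 0.99

# One NIC, one cable, one switch: all three must be up.
single_path = series_availability([nic, cable, switch])

# Two such paths, each NIC plugged into a different switch.
teamed = parallel_availability([single_path, single_path])

print(f"single path:      {single_path:.6f}")
print(f"two teamed paths: {teamed:.6f}")
```

The absolute numbers mean nothing here; the point is only that the teamed figure is much closer to 1 than the single-path figure.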

Methods of NIC Teaming

I am writing this post to help people understand the different options for NIC teaming. If you search the Internet (like I did), you will be hard pressed to find a standard NIC teaming setup that works across all operating systems, or even a list of the pros, cons, and requirements of each NIC teaming strategy.

To fully understand the NIC teaming options available in Linux, please read the official Linux Bonding How To. I am only going to cover the two options that I recommend.

Adaptive Load Balancing (ALB)

The first recommended NIC teaming strategy is called "Adaptive Load Balancing" (ALB). This is specified in Linux by using bonding mode=6 (balance-alb). The Linux Bonding How To describes it this way:

"Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special switch support. The receive load balancing is achieved by ARP negotiation."

When you use ALB, you should plug each NIC into a different switch. This removes all three SPOFs mentioned above. Additionally, it provides a basic level of load balancing. I highly recommend using ALB for NIC teaming, because it provides fault tolerance and load balancing without requiring special configuration on the network switch.
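
If you want to confirm which mode a bond is actually running, the bonding driver exposes its settings under sysfs. Here is a minimal Python sketch, assuming the bond is named bond0 (a placeholder) and that the mode file contains text like "balance-alb 6". The function names are my own, not part of any library.

```python
from pathlib import Path
from typing import Dict, Tuple


def parse_bonding_mode(raw: str) -> Tuple[str, int]:
    """Split sysfs mode text such as 'balance-alb 6' into (name, number)."""
    name, number = raw.split()
    return name, int(number)


def read_bond(bond: str = "bond0", sysfs_root: str = "/sys/class/net") -> Dict:
    """Read the mode and member NICs of a bond from sysfs.

    'bond0' is an assumed interface name; adjust it for your server.
    """
    base = Path(sysfs_root) / bond / "bonding"
    name, number = parse_bonding_mode((base / "mode").read_text())
    slaves = (base / "slaves").read_text().split()
    return {"mode_name": name, "mode_number": number, "slaves": slaves}
```

On a server with a working bond, calling read_bond() should report mode number 6 for ALB and list both NICs. If only one NIC shows up, the team is not giving you any redundancy.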

IEEE 802.3ad Dynamic Link Aggregation (LACP)

The second recommended NIC teaming strategy is called "IEEE 802.3ad Dynamic link aggregation" (LACP). This is specified in Linux by using bonding mode=4 (802.3ad).

When you use LACP, you are required to plug all NICs into the same switch. You should only use LACP if you have an internally redundant switch, usually in the form of modular cards or a proprietary stack of switches. You are also required to configure the switchports to use LACP. Once you have met all the requirements, you will have a resilient, load-balanced network connection. LACP can provide the same fault tolerance as ALB, and it offers better load balancing than ALB.
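
One thing to keep in mind about LACP load balancing: the bonding driver picks a link per frame using a transmit hash, so traffic is spread across conversations, not across packets of one conversation. The Linux Bonding How To describes the default layer2 policy as roughly (source MAC XOR destination MAC) modulo slave count. The Python sketch below is a simplified illustration of that idea, not the kernel's actual code, and the MAC addresses are made up.

```python
def layer2_slave(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick a link index from the XOR of two MAC addresses.

    Simplified illustration of a layer2-style transmit hash.
    """
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % slave_count


# Hypothetical addresses: one server talking to two different peers.
server = "00:00:00:00:00:01"
peer_a = "00:00:00:00:00:02"
peer_b = "00:00:00:00:00:03"

# The same pair of hosts always maps to the same link...
link_a = layer2_slave(server, peer_a, 2)
# ...while a different peer may land on the other link.
link_b = layer2_slave(server, peer_b, 2)
```

Because a given pair of hosts always hashes to the same link, a single transfer between two hosts is limited to the speed of one NIC. The extra bandwidth shows up when the server talks to many peers at once.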

Summary

Most people should use ALB (mode=6) to team the NICs on their Linux server because it is the simplest method to achieve fault tolerance and load balancing. If you require higher bandwidth, and you have an internally redundant switch, and you can configure your switchports to use LACP, then you should use LACP (mode=4) for NIC teaming.

Here are a few links on how to configure NIC teaming in Ubuntu Linux:

HowTo do Ethernet Bonding on Ubuntu – Properly

UbuntuLTSP Trunking

Caveat: I am a network engineer, and not a server engineer. It is my goal for everyone to increase their server's network availability with this knowledge. If you have an opinion on this topic, please share it in the comments. Thanks!

Comments

  1. I was trying to set this up recently on a server I built for a local private school. For some reason, I was having some rather strange problems with it. Would you mind taking a look at this forum post and telling me if you know what I did wrong? http://ubuntuforums.org/showthread.php?t=1273542

  2. At the very least, use the proper terminology, Tristan. Teaming is the Windows term and bonding is the 'nix term. Please use them.

  3. Switches have very high MTBFs, on the order of 100,000 hours. For the availability of the system, switch failure is not the problem; the server is more likely to be the problem.

    More importantly, the whole site is usually powered through a UPS - the UPS availability limits the service availability since it powers the server and the dual/stacked switches.

    To achieve high availability you need to have dual AC supplies (with UPSs if you like), dual WAN links and dual LAN switches. You may need dual air-conditioners as well.

    Link aggregation is a low-cost solution that does buy you a little extra availability, but your OS, application, and even your server hardware (NIC, HDD, RAID controller, power supply, fan) are what you really need to focus on.

  4. Phil,

    Great points. There are a lot of steps to make an entire system with high availability. The best idea is to start with the components that provide the most benefit with the least effort and cost.

    I agree that switches are very reliable, but I still want to upgrade the code on them at least once a year. Upgrades become much easier with NIC teaming (or port bonding, for Jeff). :)

    Tristan

  5. Quick question: does this give the full 2Gb (assuming 2x 1Gb bonded) bandwidth when transferring data from one host to another? Or does it load balance across 2Gb of bandwidth, whilst allowing a max of 1Gb/s bandwidth per stream?

