Lacp Esxi 7


Inacayal Tanoesoedibjo

Jul 25, 2024, 12:33:37 AM

ESXi has two options when it comes to virtual networking: the vSphere Standard Switch (VSS) and the Distributed vSwitch (DVS). The VSS is local to the ESXi host, which means it can only be managed from the host itself. The DVS, on the other hand, can only be managed through the vCenter Server; its configuration is distributed to all connected ESXi hosts using Host Proxy Switches. The DVS offers several improvements over the VSS, such as LACP support. The VSS does not support LACP.
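On a host with shell access, the two switch types can be inspected with esxcli. This is a sketch of the relevant commands (they require an actual ESXi host, so the output depends on your environment):

```shell
# List vSphere Standard Switches defined locally on this ESXi host
esxcli network vswitch standard list

# List the Host Proxy Switches pushed down from a Distributed vSwitch
esxcli network vswitch dvs vmware list
```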

When configuring LACP you have to create the LAG on the DVS and manually add the physical NICs (pNICs or vmnics) of each individual host to the LAG uplinks (or script it, as you should).

The number of uplink ports in a LAG has to be configured globally, and that number of LAG uplinks is distributed to all Host Proxy Switches. This means that when two LAG uplinks are configured at the DVS/vCenter Server level, all hosts connected to that DVS will receive two LAG uplinks. The connected vSphere ESXi hosts cannot deviate from that number.
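To verify which LAG(s) and how many uplinks a host actually received, the provisioned LACP configuration can be queried per host. A sketch (the command requires an ESXi host with LACP support, so output will vary):

```shell
# Show the LAG(s) provisioned to this host's proxy switch,
# including the configured number of uplink ports per LAG
esxcli network vswitch dvs vmware lacp config get
```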

The benefit is that the LAG, as a logical link, can utilize all the available bandwidth, and you can add bandwidth by adding additional physical links. It also helps in case of a failed physical connection: the failure is automatically detected by LACP and the link is removed from the logical link. This is what we call a Layer 2 high-availability solution: the logical path is controlled by LACP and automatically scales when needed, creating an optimal path between two devices.

Both the DVS and the VSS offer multiple load-balancing options: by default, load balancing based on Virtual Port ID (sometimes called source MAC pinning) is used on the VSS and DVS. It has the same drawback as IP hashing, but the good news is that with this type of load balancing you do not have to configure the physical switch for Layer 2 availability (LACP/IP hashing): a Virtual Machine is pinned to an uplink and stays there as long as no failure occurs. When a pNIC fails, the VM is pinned to another available pNIC and (when configured properly) a RARP packet is sent to inform the physical switch, so it can learn the MAC address on the new interface, minimizing the outage time.
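As a rough illustration of how Virtual Port ID pinning spreads VMs over uplinks, the sketch below pins each virtual port to an uplink with a simple modulo. The modulo scheme is an assumption for illustration only, not VMware's actual implementation:

```shell
#!/bin/sh
# Illustrative only: pin a virtual port to an uplink by taking the port ID
# modulo the number of active uplinks. The VM stays on that uplink until a
# failure forces a re-pin.
pin_uplink() {
    port_id=$1
    num_uplinks=$2
    echo $(( port_id % num_uplinks ))
}

pin_uplink 10 2   # port 10 -> uplink 0
pin_uplink 11 2   # port 11 -> uplink 1
```

The point is that the distribution happens entirely on the host side; the physical switch never needs to know about it.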

Keep in mind that with both the Virtual Port ID and Load-Based Teaming (LBT) options, no configuration is needed on the physical switch to distribute the VM workloads over the available physical NICs. Both teaming policies follow the networking standards and utilize all physical links by default (as long as they are active). Less configuration equals easier configuration, which lowers the operational complexity.

If you have multiple ESXi hosts, each host has its own port channel ID. With a large number of port channels this becomes a daunting task: you have to start tracking down the ports and the connected ESXi host, and that piece of information is not shown by this command. You can utilize CDP and/or LLDP, or when those are not available, consult your (hopefully not outdated) network documentation to track down the correct ESXi host. #prone-to-error
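On the physical side, a Cisco-style switch can at least list the port channels and, via CDP, what sits behind each port. Command names vary per vendor, so treat this as a sketch rather than a recipe:

```shell
# On the switch (Cisco-style CLI; exact commands differ per vendor):
show port-channel summary   # port channels and their member ports
show cdp neighbors          # which ESXi host/vmnic sits behind each port
```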

Monitoring LACP from a vSphere ESXi perspective is even a little harder.
The LACP configuration is provisioned from vCenter and distributed to the proxy switches on the ESXi hosts: it is centrally managed. The LACP port channel status, on the other hand, has to be monitored from the ESXi host itself. So if you have a large number of ESXi hosts, you have to log into each individual host to check the status.

In the network world MLAGs are very common these days, as they overcome traditional network problems caused by loops. They still require manual configuration AND the configuration on both MLAG peer switches must be exactly the same, which again imposes operational challenges, because you now have three devices that need to be configured correctly in order to work. The number of dependencies keeps growing (with MLAG), and without an actual need, as Virtual Port ID and LBT require neither LAG nor MLAG. You can connect your ESXi host to two (or more) independent switches, use either Virtual Port ID or LBT, and you are good to go.

With the load-balancing algorithms available in vSphere there is no need for a complex, error-prone LACP configuration, as VMware offers good (and some better) options from a configuration, utilization, and availability perspective.

So why do some customers still use LACP, you might ask? Usually it is a lack of VMware knowledge, combined with network admins' good past experiences with LACP.

LACP between switches and bare-metal servers is still a very good option, but VMware offers enhancements that obviate the need for LACP in vSphere environments.

While I do agree that LACP has its challenges with ESXi, the conclusion of the article is in error. Once set up, the performance of LACP is better. Not only that: LACP can detect upstream network issues that LBT is oblivious to. LACP is also the vSAN recommendation, because vSAN cannot utilize LBT.
See -r-vsan-tm-network-design/static-lacp-with-route-based-on-ip-hash/

LBT can be marginally better in the use case where you have two elephant flows that happen to collide because of the hash methodology used. As I mentioned above, this can be avoided with some planning.
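How two flows can collide is easy to see with a toy hash. The xor-of-last-octets scheme below is an assumption for illustration, not VMware's exact IP-hash algorithm:

```shell
#!/bin/sh
# Toy IP-hash (illustration only): xor the last octets of source and
# destination IP, then take the result modulo the number of uplinks.
ip_hash_uplink() {
    src_octet=$1
    dst_octet=$2
    num_uplinks=$3
    echo $(( (src_octet ^ dst_octet) % num_uplinks ))
}

# Two distinct elephant flows landing on the same uplink:
ip_hash_uplink 10 20 2   # flow A: (10 xor 20) % 2 = 0
ip_hash_uplink 12 22 2   # flow B: (12 xor 22) % 2 = 0
```

With only two uplinks, any two heavy flows have a coin-flip chance of hashing to the same link, which is why planning the address scheme matters.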

LBT is not network aware. This means the host can load-balance traffic on its side, but the network is oblivious to it, so the network will only send return traffic on the physical link on which it learned the VM's MAC.

Let's say you have two switches connected to a host, and LBT pinned traffic from VM A to Switch 1 in VLAN 10.
Now VM B on Host 1 in VLAN 20 presents itself on Switch 2 because of LBT. For traffic to flow from VM A to VM B, it has to traverse the inter-switch link. This not only increases latency, it also has the potential to saturate your inter-switch bandwidth.

With LACP, however, the switch is aware of the host's load balancing and will localize traffic where possible. In the above example, both switches (assuming they are configured in MLAG) would be aware of VM A's and VM B's MACs. Traffic originating from VM A destined for VM B and arriving at Switch 1 will therefore be forwarded on the local switch port. The switches actively avoid forwarding traffic over the inter-switch MLAG channel. This is desirable because it keeps latency low and avoids using inter-switch bandwidth unless necessary.

Further, any performance gains that LBT provides between the host and the connected switch are likely to disappear if the traffic has to traverse inter-rack switches, since those are likely to use LACP or ECMP (L3 LACP) for inter-switch traffic load balancing. So without proper network engineering, your elephant flows could end up colliding elsewhere in the upstream network, negating any benefit.

2) High Availability
Indeed, LBT detects link failures and can use beacon probing. But you get the same with LACP, with both the host and the network being aware of each other and reacting to failures on either side thanks to the LACP protocol. With LBT, the network is not aware of soft failures on the host side and cannot react to them.

4) VSAN Performance
It is beyond doubt that vSAN performs better with LACP than with LBT when you have more than 2 VMs. This is an established fact; claiming otherwise would be defying the laws of physics.

5) VVD Recommendation
It is just a recommendation. VVD cannot account for your networking, your performance requirements or for the nature of traffic flows and security. You have to do that yourself with solid network engineering.

6) LACP Configuration
I would agree that LACP configuration on the switch and the host is relatively more work compared to LBT. But only slightly. And this can be mitigated with automation scripts and good planning.

In summary, if you want quick and easy, use LBT. If you want performance, availability, and security at scale, LACP is the answer. The installed base of LACP-capable switches, in dollar terms, is worth more than the market cap of VMW.

My conclusion:
LACP could technically be better, but the operational complexity it introduces means you will not find it in any of the VMware Validated Designs (VVD and VCF) or recommendations. Companies want validated designs these days, thanks to their operational simplicity.

Not at all. TCP retransmissions are caused by packets that are lost in transit and must be resent. This is usually due to faulty cabling: check for CRC errors on the interfaces. LACP bundles multiple links together, providing high availability.

Just curious. Came across this site after working long hours trying to determine the cause of a high level of TCP malformed packets and retransmissions across our server VLANs. The network team says it is a VMware or OS issue, and so we (the VMware and OS teams) are coming up empty. Thanks.

Great article. You inspired me to expand on your article, so I put together something that shares my take on this and other things not really covered here. I would greatly appreciate any feedback you have.

I have been attempting to create a basic two-port LAG between my HP 4512zl switch and two 10GbE interfaces on a VMware ESXi 5.5 host, and no matter what combination I attempt on the switch, the LAG never comes up.

Right now, I've got the two NICs on the VMware server configured in just the standard vSwitch0. Are you saying that if I want to do LACP, I will need to set up a distributed vSwitch in VMware and then enable LACP from there?

Do you happen to know what default load-balancing method VMware uses if the two NICs are configured for load balancing in just a basic vSwitch? Are they doing round-robin load balancing if I'm using IP hash?

In the past, I have seen issues when trying to manage an ESXi host over a link aggregation. In other words, you might try keeping your Management Network (VMkernel port) on the ESXi host on a single connection and then use a link aggregation for your Virtual Machine port groups / VMs.
