[Iscsitarget-devel] Possible ESX5 And Broadcom iSCSI Adapter Incompatibility


Caleb Anthony

Feb 13, 2012, 5:54:31 PM
to iscsitar...@lists.sourceforge.net
Ok, I've got a strange one here, and I'm mostly looking for a place to start troubleshooting so that I can narrow down this strange issue.
 
I have two ESX5 clusters.
 
The first cluster uses the iSCSI Software Adapter and has VMkernel Port Binding configured as described in the VMware documentation. For what it's worth, two vSwitches, each with one vmnic, are bound to the Software Adapter.
 
These ESX servers work fantastic with IET 1.4.20.2. I can easily saturate my gigabit interfaces with this configuration.
 
The second ESX cluster is where I'm having problems.
 
In this cluster, the ESX servers each have three dual-port Broadcom NetXtreme II BCM5709 NICs. ESX sees these NICs as iSCSI HBAs. The VMkernel Port Binding is similar to the configuration above, except that I have six iSCSI HBAs: six vSwitches, each with one vmnic bound.
 
When I connect these ESX servers to IET, I can mount the LUN fine. But if I try to SVMotion a virtual machine to an IET LUN, the ESX server basically hangs, and eventually the SVMotion fails. Sometimes the ESX server is completely unresponsive, and I have to perform a hard reset to bring it back up.
 
The vmkernel logs explode with all kinds of errors, including:
 
LinScsi: SCSILinuxAbortCommands:1798:Failed, Driver bnx2i, for vmhba33
bnx2i:: bnx2i_conn_stop::vmnic4
WARNING: NMP: nmp_DeviceRequestFastDeviceProbe
LONG VMFS3 rsv time on 'xxxxx' held for 3527 msecs
WARNING ScsiDeviceIO: 3068: Failing command 0x28
 
And on and on. I've tried all kinds of combinations to troubleshoot this, but I've run out of ideas. The only thing I can think of at this point is a possible issue between IET and the Broadcom iSCSI HBA, or possibly the Broadcom driver (bnx2i).
 
Any suggestions would be welcomed.
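
One place to start narrowing this down is to tally the vmkernel messages by subsystem and see which one dominates. The sketch below is only an illustration in Python: the pattern labels are made up for this example, and the sample lines are the five quoted above rather than a full log.

```python
import re
from collections import Counter

# Substrings taken from the log excerpt above; the labels are illustrative only.
PATTERNS = {
    "bnx2i driver": re.compile(r"bnx2i"),
    "NMP path probe": re.compile(r"NMP:"),
    "VMFS reservation": re.compile(r"VMFS3 rsv time"),
    "SCSI command failure": re.compile(r"ScsiDeviceIO:.*Failing command"),
}

def tally(lines):
    """Count how many lines match each pattern (a line may match several)."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts

# The five lines quoted in this message, not a complete vmkernel log.
SAMPLE = [
    "LinScsi: SCSILinuxAbortCommands:1798:Failed, Driver bnx2i, for vmhba33",
    "bnx2i:: bnx2i_conn_stop::vmnic4",
    "WARNING: NMP: nmp_DeviceRequestFastDeviceProbe",
    "LONG VMFS3 rsv time on 'xxxxx' held for 3527 msecs",
    "WARNING ScsiDeviceIO: 3068: Failing command 0x28",
]

if __name__ == "__main__":
    for label, n in tally(SAMPLE).most_common():
        print(f"{label}: {n}")
```

Run against a real log, the same tally would show whether the bnx2i messages precede or follow the SCSI command failures.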

Yucong Sun (叶雨飞)

Feb 13, 2012, 5:59:47 PM
to caleb.an...@gmail.com, iscsitar...@lists.sourceforge.net
I am not sure what there is to gain from using the on-NIC HBAs. Maybe you
can simply disable them entirely in the boot ROM; I hardly see any loss
besides some CPU cycles.


_______________________________________________
Iscsitarget-devel mailing list
Iscsitar...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/iscsitarget-devel

Ross S. W. Walker

Feb 13, 2012, 6:03:07 PM
to caleb.an...@gmail.com, iscsitar...@lists.sourceforge.net
Caleb Anthony [mailto:caleb.an...@gmail.com] wrote:
>
> In this cluster, the ESX servers each have three dual port
> Broadcom NetXtreme II BCM5709 NICs. These NICs are seen by
> ESX server as iSCSI HBAs. The VMkernel Port Binding is
> similar to the configuration above, except I have six iSCSI
> HBAs. Six vSwitches each with 1 vmnic bound.
>
> When I connect these ESX servers to IET, I can mount the LUN
> fine. But, if I try and SVMotion a virtual machine to an IET
> LUN, the ESX server basically hangs, and eventually the
> SVMotion fails. Sometimes the ESX is completely
> non-responsive, and I have to perform a hard reset to bring
> it back up.
>
> The vmkernel logs explode with all kinds of errors, including:
>
> LinScsi: SCSILinuxAbortCommands:1798:Failed, Driver bnx2i, for vmhba33
> bnx2i:: bnx2i_conn_stop::vmnic4
> WARNING: NMP: nmp_DeviceRequestFastDeviceProbe
> LONG VMFS3 rsv time on 'xxxxx' held for 3527 msecs
> WARNING ScsiDeviceIO: 3068: Failing command 0x28
>
> And on and on. I've tried all kinds of combinations of things
> to try and troubleshoot, but I've run out of ideas. The only
> thing I'm thinking at this point is a possible issue with IET
> and the Broadcom iSCSI HBA or possibly the Broadcom driver (bnx2i).
>
> Any suggestions would be welcomed.

Any errors in the logs on the IET box?

Is the BNX firmware up to date?

Are the VMware drivers up to date?

We might ask for a 'tcpdump' from the IET box depending on the
above answers. We'll let you know what options to use.

-Ross

Caleb Anthony

Feb 16, 2012, 2:56:38 PM
to iscsitar...@lists.sourceforge.net
I received an off-list message from someone reminding me that I should have no more vSwitches than I have subnets, and I had more. I read the documentation, and my configuration was indeed wrong in that respect. I have fixed that, but the issue is still there.
 
There are no error messages on the IET server other than the unsupported VAAI command messages (Unsupported 93).

The ESX servers have been patched to the latest patch level.
 
I looked all over the Broadcom site and couldn't find any firmware for my NICs.
 
However, I believe I have figured out the problem. I use Linux kernel Ethernet bonding, and there seems to be a very low-level communication issue between my bonded interface and the ESX storage vSwitch. I think it's some type of ARP problem. If I just try to ping the ESX storage interface from the IET server, I get about 50% packet loss, sometimes more, sometimes less. The same is true if I use vmkping on the ESX server to reach the IET server: 50% packet loss or so.
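
To put a number on the loss, the summary line that ping prints can be parsed. This is a minimal sketch, assuming the usual "N packets transmitted, M received" summary format (including the "packets received" variant that some ping builds print); the sample string is invented for illustration, not output captured from these hosts.

```python
import re

# Matches both "5 received" and "5 packets received" summary styles.
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received"
)

def packet_loss(ping_output):
    """Return (sent, received, loss_percent), or None if no summary is found."""
    match = SUMMARY_RE.search(ping_output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    # Recompute the percentage from the counts instead of trusting the text.
    loss = 100.0 * (sent - received) / sent if sent else 0.0
    return sent, received, loss

# Invented sample in the common Linux ping summary format.
SAMPLE = "20 packets transmitted, 10 received, 50% packet loss, time 19031ms"

if __name__ == "__main__":
    print(packet_loss(SAMPLE))
```

Measuring from both ends (ping on the IET box, vmkping on the ESX host) while changing one setting at a time shows whether a given bonding change actually moves the number.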
 
I currently run mode 6 (adaptive load balancing) on my bonding interface, and it seems the MAC trickery involved in this mode isn't working with my ESX servers that have the Broadcom NICs. However, as mentioned earlier, this configuration works with Intel NICs on my other ESX server and the software iSCSI adapter.
 
So I'm going to fool around with my bonding mode and see if that makes a difference.

Ross S. W. Walker

Feb 16, 2012, 3:34:23 PM
to caleb.an...@gmail.com, iscsitar...@lists.sourceforge.net
Ah, I bet the Broadcom iSCSI adapters have a problem with the constantly changing MAC address.

Can the Broadcoms support multiple sessions over multiple interfaces? How about multiple connections in a single session going over multiple interfaces?

Don't know how many ports are on a card, hopefully two.

-Ross

Emmanuel Florac

Feb 16, 2012, 3:37:18 PM
to caleb.an...@gmail.com, iscsitar...@lists.sourceforge.net
On Thu, 16 Feb 2012 12:56:38 -0700, you wrote:

> I currently run mode 6 on my bonding interface (adaptive load
> balancing) and it seems the MAC trickery involved in this mode isn't
> working with my ESX servers with the Broadcom NICs. However, like
> mentioned earlier, this configuration works with Intel NICs on my
> other ESX server and the software iSCSI adapter.
>
> So I'm going to fool around with my bonding mode and see if that
> makes a difference.

balance-alb is often extremely tricky. Start by raising miimon well
above 100. If you're still stuck, try balance-tlb; it usually works
much more easily.
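
The two settings mentioned here can be read back from the bonding driver's sysfs files before and after a change. This is a minimal sketch, assuming the bond is named bond0 (a placeholder) and that the kernel exposes /sys/class/net/<bond>/bonding/mode and miimon; the hints helper and its threshold are illustrative, not part of any tool.

```python
from pathlib import Path

def parse_mode(text):
    """Split sysfs mode text such as 'balance-alb 6' into (name, number)."""
    name, number = text.split()
    return name, int(number)

def read_bond_settings(iface="bond0", sysfs_root="/sys/class/net"):
    """Read mode and miimon for one bond; raises OSError if it is absent."""
    base = Path(sysfs_root) / iface / "bonding"
    name, number = parse_mode((base / "mode").read_text())
    miimon = int((base / "miimon").read_text().strip())
    return {"mode": name, "mode_number": number, "miimon_ms": miimon}

def hints(settings):
    """Turn the suggestions from this thread into printable notes."""
    notes = []
    if settings["mode"] == "balance-alb":
        notes.append("mode is balance-alb (6); balance-tlb (5) is the suggested fallback")
    if settings["miimon_ms"] <= 100:
        notes.append(f"miimon is {settings['miimon_ms']} ms; try a value well above 100")
    return notes

if __name__ == "__main__":
    try:
        current = read_bond_settings("bond0")  # "bond0" is an assumed name
    except OSError:
        print("no bond0 found on this machine")
    else:
        print(current)
        for note in hints(current):
            print("-", note)
```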

--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <efl...@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
