[Rocks-Discuss] Rocks 5.1, OFED and Cisco infiniband HCA

Hartlieb, George

Jan 5, 2009, 2:24:58 PM
to npaci-rocks...@sdsc.edu
Hi,

I have a few questions about Cisco Infiniband cards (HCA).

I have Cisco Infiniband cards purchased from Dell installed on all my
compute nodes.

I installed Rocks 5.1 with the Mellanox OFED Roll.

The Mellanox OFED Roll installed OK on the "head node" but the Cisco card
does not have the PSID set so the Roll could not check the Firmware version.

Looking for firmware upgrades I found out that Cisco is not selling
Infiniband cards anymore and the existing cards are at "end of life".

Looking at the OFED site, OFED version 1.4 does not list Cisco cards as
supported!

As the Mellanox OFED Roll is based on OFED 1.3, will it work with Cisco
cards?

What is the way forward for these cards?

Can the Mellanox firmware be manually installed on Cisco cards?


Below are outputs of these ib commands:
mstflint -d 0c:00.0 q
ibstat
mstvpd mthca0


Thanks in advance,

George

-----------------------------------------------------------------

mstflint -d 0c:00.0 q
Image type: Failsafe
FW Version: 1.2.917
I.S. Version: 1
Device ID: 25204
Chip Revision: A0
Description: Node Port1 Sys image
GUIDs: 0005ad00000bcb78 0005ad00000bcb79 0005ad000100d050
Board ID: r­
VSD: r­
PSID:

ibstat
CA 'mthca0'
CA type: MT25204
Number of ports: 1
Firmware version: 1.2.917
Hardware version: a0
Node GUID: 0x0005ad00000bcb78
System image GUID: 0x0005ad000100d050
Port 1:
State: Active
Physical state: LinkUp
Rate: 20
Base lid: 3
LMC: 0
SM lid: 2
Capability mask: 0x02510a68
Port GUID: 0x0005ad00000bcb79

mstvpd mthca0
ID: Cheetah DDR
PN: SFS-HCA-320-A1
EC: Rev: B0
SN: CS0710X00222
V0: PW=10W;PCIe x8
V2: 0710
V1: N/A
YA: N/A
RW:

---------------------------------------------------

George Hartlieb System Administrator/Engineer
Phone: 650-604-5690 George....@nasa.gov

ELORET Corporation
NASA/Ames Research Center
MS 234-1
Moffett Field, CA 94035-1000

Mike Heinz

Jan 5, 2009, 3:49:16 PM
to Discussion of Rocks Clusters
George,

Read this email carefully - I might be able to help you out, but there are also some risks.

First, the statement, "The Mellanox OFED Roll installed OK on the "head node" but the Cisco card does not have the PSID set so the Roll could not check the Firmware version." does not make sense to me. The PSID is not part of the firmware version number; the two are independent. In addition, the only way for an HCA to have no PSID at all is for the firmware to have been erased.

HOWEVER....

Certain Cisco HCAs were manufactured by Mellanox but had custom firmware provided by a different company. That firmware was in a slightly different format from the one stock Mellanox HCAs use. Because of this, mstflint cannot read the firmware version and will report something like this:

[root@panic ~]$ mstflint -d 04:00.0 q
Image type: Failsafe
I.S. Version: 1
Chip Revision: A0
Description: Node Port1 Port2 Sys image
GUIDs: 00066a009800707f 00066a00a000707f 00066a01a000707f 00066a009800707f
Board ID: j (SS_0000000006)
VSD: j
PSID: SS_0000000006
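
To check many nodes for this symptom, the query output can be scanned for missing fields. The following is only a rough sketch: `parse_mstflint_query` and `missing_fields` are hypothetical helper names, and the "Key: value" layout is assumed from the outputs quoted in this thread rather than taken from any mstflint specification.

```python
def parse_mstflint_query(text):
    """Parse 'Key: value' lines from `mstflint ... q` output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain further colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_fields(fields, wanted=("FW Version", "PSID")):
    """Return the wanted fields that are absent or empty."""
    return [name for name in wanted if not fields.get(name)]


# Query output copied from the example above (assumed representative).
sample = """\
Image type: Failsafe
I.S. Version: 1
Chip Revision: A0
Description: Node Port1 Port2 Sys image
GUIDs: 00066a009800707f 00066a00a000707f 00066a01a000707f 00066a009800707f
Board ID: j (SS_0000000006)
VSD: j
PSID: SS_0000000006
"""

print(missing_fields(parse_mstflint_query(sample)))  # ['FW Version']
```

Run against the output in the original question, the same check would flag the empty PSID instead of the firmware version.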

You can get some information about such HCAs with ibv_devinfo:

[root@panic ~]$ ibv_devinfo
hca_id: mthca0
fw_ver: 5.3.0
node_guid: 0006:6a00:9800:707f
sys_image_guid: 0006:6a00:9800:707f
vendor_id: 0x066a
vendor_part_id: 25218
hw_ver: 0x20
board_id: SS_0000000006
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 3
port_lid: 3
port_lmc: 0x00

port: 2
state: PORT_DOWN (1)
max_mtu: 2048 (4)
active_mtu: 512 (2)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
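
If this information has to be collected from a whole cluster, the ibv_devinfo text can be turned into a small data structure. This is a minimal sketch under assumptions: `parse_ibv_devinfo` is a hypothetical helper, and it relies only on the field layout visible in the output above (an `hca_id:` line starts a device, a `port:` line starts a port section).

```python
def parse_ibv_devinfo(text):
    """Parse ibv_devinfo-style text into a list of HCA dicts with a 'ports' map."""
    hcas = []
    target = None  # dict currently receiving fields (an HCA or one of its ports)
    for line in text.splitlines():
        # Split on the first colon only, so GUIDs such as 0006:6a00:... stay intact.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            continue  # blank or non key/value line
        if key == "hca_id":
            hcas.append({"hca_id": value, "ports": {}})
            target = hcas[-1]
        elif key == "port" and hcas:
            target = hcas[-1]["ports"].setdefault(int(value), {})
        elif target is not None:
            target[key] = value
    return hcas


# Output copied from the example above (assumed representative).
sample = """\
hca_id: mthca0
fw_ver: 5.3.0
node_guid: 0006:6a00:9800:707f
sys_image_guid: 0006:6a00:9800:707f
vendor_id: 0x066a
vendor_part_id: 25218
hw_ver: 0x20
board_id: SS_0000000006
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 3
port_lid: 3
port_lmc: 0x00

port: 2
state: PORT_DOWN (1)
max_mtu: 2048 (4)
active_mtu: 512 (2)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
"""

for hca in parse_ibv_devinfo(sample):
    active = [n for n, p in hca["ports"].items() if p["state"].startswith("PORT_ACTIVE")]
    print(hca["hca_id"], hca["fw_ver"], hca["board_id"], "active ports:", active)
```

On a live node the text would come from running ibv_devinfo itself; the sample string is used here so the sketch does not depend on InfiniBand hardware being present.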


You can also get some low-level information using the mstvpd command:

[root@panic ~]$ mstvpd 04:00.0
ID: Lion cub DDR
PN: MHGA28-1TC
EC: A2
SN: MT0620X00351
V0: PCIe x8

V1: N/A
YA: N/A
RW:

Now, what does all this mean to you? Well, if your HCAs fall into this category, the good news is that despite the custom firmware they really are standard Mellanox HCAs and you can put standard Mellanox firmware on them. However, you're going to have to figure out, by hand, exactly what firmware needs to be put on each HCA - and some HCA models have several different firmwares depending on exactly which board revision you have.
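
Since the right firmware depends on the exact board, one way to start is an inventory of part number (PN) and revision (EC) per node, built from mstvpd output. The sketch below is an illustration only: `parse_vpd` and `group_by_board` are hypothetical helpers, the host names are made up, and the two VPD samples are the ones quoted in this thread. It does not pick a firmware image; that still has to be done by hand.

```python
from collections import defaultdict


def parse_vpd(text):
    """Parse 'Key: value' lines from mstvpd output into a dict."""
    vpd = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            vpd[key.strip()] = value.strip()
    return vpd


def group_by_board(vpd_by_host):
    """Group host names by (PN, EC) so each distinct board type is listed once."""
    groups = defaultdict(list)
    for host, text in sorted(vpd_by_host.items()):
        vpd = parse_vpd(text)
        groups[(vpd.get("PN", "unknown"), vpd.get("EC", "unknown"))].append(host)
    return dict(groups)


# Hypothetical host names; VPD text copied from the two outputs in this thread.
vpd_by_host = {
    "compute-0-0": "ID: Cheetah DDR\nPN: SFS-HCA-320-A1\nEC: Rev: B0\nSN: CS0710X00222\n",
    "compute-0-1": "ID: Lion cub DDR\nPN: MHGA28-1TC\nEC: A2\nSN: MT0620X00351\n",
}

for (pn, ec), hosts in group_by_board(vpd_by_host).items():
    print(f"{pn} / {ec}: {', '.join(hosts)}")
```

Each distinct (PN, EC) pair in the result is one board type that needs its firmware identified separately.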

If you want to pursue this, send me an email directly and I'll give you some pointers.

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania
