Hello,
I have two servers (let's call them A (staging) and B (production)) with identical setups:
- Ubuntu 14.04
- LTS kernel 3.19.0-28-generic
- Mellanox ConnectX-3 EN card (MCX312A-XCBT)
- three VLANs (302, 388, 389) on one of the ports, each with an assigned IP address
The OFED drivers (version 3.0-2.0.1) were installed on both servers with the "--vma-eth" option.
This OFED package contains libvma version 7.0.4. The Mellanox firmware was updated to version 2.34.5000.
The installation succeeded on both servers.
On server A VMA works as expected, but on server B there are errors followed by a VMA panic:
VMA ERROR : utils:231:priv_read_file() ERROR while opening file /sys/class/net/eth5.302/device/resource
VMA ERROR : utils:231:priv_read_file() ERROR while opening file /sys/class/net/eth5.302/device/resource
VMA ERROR : utils:231:priv_read_file() ERROR while opening file /sys/class/net/eth5.302/device/resource
VMA PANIC : ring[0x163b510]:170:create_resources() ibv_create_comp_channel for tx failed. m_p_tx_comp_event_channel = (nil) (errno=9 Bad file descriptor)
This error was seen earlier, when we used an alias on the VLAN interface (see
https://github.com/Mellanox/libvma/issues/23). According to that issue, the problem was fixed in libvma version 7.0.0. That earlier issue could also be reproduced on server A. In any case, there is no longer an alias on this interface.
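The priv_read_file() errors refer to the path /sys/class/net/eth5.302/device/resource, so one thing worth checking on both servers is whether that path exists for the VLAN interface and for its parent port. Below is a minimal sketch of such a check. It only inspects sysfs and does not involve libvma; the function name check_device_resource and the default interface names are my own choices for illustration, and whether a missing "device" link is actually what triggers the error is an assumption, not something I have confirmed.

```python
import os


def check_device_resource(ifname, sysfs_root="/sys/class/net"):
    """Report whether <sysfs_root>/<ifname>/device/resource can be read.

    Hypothetical helper: it mirrors the path from the VMA error message
    and does not call into libvma.
    """
    iface_dir = os.path.join(sysfs_root, ifname)
    device_link = os.path.join(iface_dir, "device")
    resource = os.path.join(device_link, "resource")
    return {
        "interface_exists": os.path.isdir(iface_dir),
        "device_link_exists": os.path.exists(device_link),
        "resource_readable": os.access(resource, os.R_OK),
        "resource_path": resource,
    }


if __name__ == "__main__":
    import sys

    # Interface names come from the command line; eth5 and eth5.302 are
    # only defaults taken from the error message above.
    for name in sys.argv[1:] or ["eth5", "eth5.302"]:
        print(name, check_device_resource(name))
```

Running it on both A and B with the same interface names should show whether the sysfs layout differs between the two servers.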
Before the upgrade, the versions were:
- kernel 3.13.0-49
- OFED package 2.4-1.0.4
- Mellanox firmware 2.33.5000
- libvma 6.8.3
Then I downgraded libvma on server B to version 6.9.1. After that, the issue went away.
So I wonder what could cause different behavior with the same versions on these two servers. What could I try in order to reproduce this issue on server A too?
If it is possible to reproduce this issue, I'll be able to try newer versions of libvma and check whether they help and, if they don't, send any information needed for further investigation.
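To narrow down what differs between A and B, a small snapshot of each server could be collected and compared. The following is only a sketch under my own assumptions about what might matter: it records the kernel release, any VMA_* environment variables, and whether each network interface has a "device" link in sysfs. The names collect_snapshot and diff_snapshots are hypothetical, and the list of collected items is certainly incomplete.

```python
import os
import platform


def collect_snapshot(sysfs_root="/sys/class/net"):
    """Collect a flat dict of settings that might differ between servers."""
    snapshot = {"kernel": platform.release()}
    # libvma is configured through VMA_* environment variables.
    for key, value in os.environ.items():
        if key.startswith("VMA_"):
            snapshot["env:" + key] = value
    # Record which interfaces expose a "device" link in sysfs.
    if os.path.isdir(sysfs_root):
        for ifname in sorted(os.listdir(sysfs_root)):
            link = os.path.join(sysfs_root, ifname, "device")
            state = "device link" if os.path.exists(link) else "no device link"
            snapshot["iface:" + ifname] = state
    return snapshot


def diff_snapshots(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs


if __name__ == "__main__":
    import json

    # Print the local snapshot as JSON so it can be saved on each server.
    print(json.dumps(collect_snapshot(), indent=2, sort_keys=True))
```

The output can be saved to a file on each server; loading both files with json.load and passing them to diff_snapshots would then list only the entries that differ.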