Possible source of issues with VMA on a VLAN interface?


alexander....@gmail.com

Sep 13, 2015, 12:47:02 PM
to libvma-dev
Hello,

I have two servers (let's call them A (staging) and B (production)) with identical setups:
- Ubuntu 14.04
- LTS kernel 3.19.0-28-generic
- Mellanox card X3 EN (MCX312A-XCBT)
- three VLANs (302, 388, 389) with assigned IP addresses on one of the ports

On both servers I installed the OFED drivers (version 3.0-2.0.1) with the "--vma-eth" option.
This OFED package contains libvma version 7.0.4. The Mellanox firmware was updated to version 2.34.5000.

The installation succeeded on both servers.

On server A, VMA works as expected, but on server B there are errors followed by a VMA panic:
VMA ERROR  : utils:231:priv_read_file() ERROR while opening file /sys/class/net/eth5.302/device/resource
VMA ERROR  : utils:231:priv_read_file() ERROR while opening file /sys/class/net/eth5.302/device/resource
VMA ERROR  : utils:231:priv_read_file() ERROR while opening file /sys/class/net/eth5.302/device/resource
VMA PANIC  : ring[0x163b510]:170:create_resources() ibv_create_comp_channel for tx failed. m_p_tx_comp_event_channel = (nil) (errno=9 Bad file descriptor)

We saw this error earlier when we used an alias on a VLAN interface (see https://github.com/Mellanox/libvma/issues/23). According to that issue, the problem was fixed in libvma 7.0.0. That issue could also be reproduced on server A. In any case, there is no longer an alias on this interface.

Before the upgrade, the versions were:
- kernel 3.13.0-49
- OFED package 2.4-1.0.4
- Mellanox firmware 2.33.5000
- libvma 6.8.3

Then I downgraded libvma on server B to version 6.9.1, and the issue went away.

So I'm wondering: what could cause different behavior with the same versions on these two servers? What could I try in order to reproduce this issue on server A as well?
If I can reproduce it, I'll be able to try newer versions of libvma and check whether they help, and, if they don't, send any information needed for further investigation.

Alex Rosenbaum

Sep 16, 2015, 8:25:03 AM
to libvma-dev

As far as binary worlds go, I don't understand how two identical setups can produce different results. But of course, this happens to us all.

We’ll need more details in order to investigate this and find the differences.

This should lead us to the issues you see with VLAN and/or aliasing interfaces.


You can try the latest VMA release, 7.0.7 (https://github.com/Mellanox/libvma/releases/tag/7.0.7).

Download it, or clone and build it from source (see: https://github.com/Mellanox/libvma/wiki/Build).


Also, please provide VMA logs (VMA_TRACELEVEL=4) from your application or test code. You can:

1. post it here (in this thread)

2. open a https://github.com/Mellanox/libvma/issues ticket

3. open a "support at mellanox.com" ticket


thanks,

Alex

alexander....@gmail.com

Sep 17, 2015, 2:49:30 AM
to libvma-dev
Alex, thanks. I'll try the newer version during the next update.

My initial goal is to reproduce this behavior on the test machine. So I was wondering whether I'm missing some configuration difference. Maybe something other than the kernel/firmware/OFED versions could affect VMA behavior. Or, since the production system went through several upgrades before this one while the test system had a fresh install right before the upgrade, maybe some leftover configs/blobs could cause this, for example a binary file that wasn't overwritten during one of the upgrades.

alexander....@gmail.com

Feb 20, 2016, 1:35:59 PM
to libvma-dev
Hi,

It was my misconfiguration.

The same MAC address had accidentally been configured on another
interface on the production machine.

So there were two Mellanox ports with exactly the same MAC address.
Because the second interface was not in use, everything worked fine
without VMA, but the duplicate caused VMA to stop working.

Of course, this was not the case on the test machine. After I configured
the same MAC address on both interfaces on the test machine, the "issue"
was replicated.
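A minimal Python sketch for checking a host for this kind of misconfiguration, assuming the standard Linux sysfs layout: it reads `/sys/class/net/<iface>/address` for every interface and reports MAC addresses shared by more than one physical port. Only interfaces with a `device` entry are compared, because VLAN sub-interfaces such as eth5.302 are virtual, typically have no `device` entry, and inherit the parent port's MAC. The function name `find_duplicate_macs` is hypothetical, and bonded ports can legitimately report the same MAC, so treat any hit as a lead rather than proof.

```python
import os
from collections import defaultdict


def find_duplicate_macs(sysfs_root="/sys/class/net"):
    """Return {mac: [iface, ...]} for physical interfaces that share a MAC.

    Hypothetical helper; assumes the Linux sysfs layout under sysfs_root.
    """
    by_mac = defaultdict(list)
    for iface in sorted(os.listdir(sysfs_root)):
        iface_dir = os.path.join(sysfs_root, iface)
        # Physical ports have a 'device' entry; VLAN sub-interfaces
        # (e.g. eth5.302) and loopback do not, so they are skipped.
        if not os.path.exists(os.path.join(iface_dir, "device")):
            continue
        try:
            with open(os.path.join(iface_dir, "address")) as f:
                mac = f.read().strip().lower()
        except OSError:
            continue  # interface vanished or address unreadable
        if mac and mac != "00:00:00:00:00:00":
            by_mac[mac].append(iface)
    # Keep only MACs that appear on more than one physical interface.
    return {mac: ifaces for mac, ifaces in by_mac.items() if len(ifaces) > 1}
```

On a host with the misconfiguration described above, calling `find_duplicate_macs()` would be expected to list both Mellanox ports under the shared MAC; an empty dict means no physical ports share an address.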

Thanks.

Alex Rosenbaum

Feb 23, 2016, 6:14:50 PM
to libvma-dev
I'm glad you overcame the issue for testing VMA.
But this still sounds like a bug in VMA.
If your application works over the OS network stack, we expect it to work with VMA just as well.

Please include all information here, as well as the OS and anything else you think might help, such as the output of 'route' and 'ip addr show'.

thanks
Alex

alexander....@gmail.com

Feb 26, 2016, 9:09:17 AM
to libvma-dev
Hi,

I tried to reproduce this issue, but everything worked anyway. However, the system configuration is slightly different now: new firmware and new system packages.
When I reinstall the system, I'll check this scenario with the current OFED packages.

alexander....@gmail.com

Mar 14, 2016, 4:35:12 AM
to libvma-dev
Hi,

The issue does not reproduce.