Sudden and inexplicably bad performance


Filipe Maia

May 19, 2021, 5:27:08 PM
to fhgfs...@googlegroups.com
Hi,

I have a puzzling performance problem I can't quite figure out.
I have a filesystem with two storage servers. The servers each have a few RAID6 arrays, one array per storage target, and are connected with some old InfiniBand.

A few days ago the main server (A) froze and had to be rebooted. Since then, performance has dropped drastically.
When I read a file hosted on server A from a mount on A itself, I get normal performance (a couple of hundred megabytes per second).
When I read the same file from a mount on server B or on a client, I get about 10 MB/s.
I don't have files hosted exclusively on B, so I haven't yet managed to try the reverse case.

Server A is also the management and main metadata server.
The load on both servers is pretty much non-existent. When reading and writing directly to the storage target filesystems I can easily get hundreds of megabytes per second on all of them.

I turned the log level up to 5, but I didn't find anything obviously wrong.

I'm a bit at a loss as to what the problem could be. Does anyone have any suggestions of things to try?

Cheers,
Filipe

Filipe Maia

May 19, 2021, 5:48:05 PM
to fhgfs...@googlegroups.com
Just a further clarification: beegfs-ctl --storagebench shows excellent performance for both reads and writes, with a minimum throughput above 500 MB/s.
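
For anyone reproducing this, a typical storagebench run looks roughly like the following (flags approximate, along the lines of the BeeGFS docs):

beegfs-ctl --storagebench --alltargets --write --blocksize=512K --size=20G --threads=16
beegfs-ctl --storagebench --alltargets --read --blocksize=512K --size=20G --threads=16
beegfs-ctl --storagebench --alltargets --status
beegfs-ctl --storagebench --alltargets --cleanup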

Lehmann, Greg (IM&T, Pullenvale)

May 19, 2021, 6:15:28 PM
to fhgfs...@googlegroups.com

Check your network. Do some bandwidth tests from server A to server B; ib_write_bw and ib_read_bw cover the IB side of things.
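
For example (assuming the perftest package is installed on both hosts), run the matching server process with no arguments on one side and point the client at it, then repeat in the other direction:

# on server B
ib_write_bw

# on server A
ib_write_bw serverB

The same pattern applies to ib_read_bw; the hostnames above are just placeholders.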


Filipe Maia

May 20, 2021, 9:02:06 AM
to fhgfs...@googlegroups.com
Thanks Greg, that was quite useful.

The ib_write_bw throughput out of server A (named botticelli) is very low and inconsistent. In this example a001 is just a client.
[root@botticelli etc]# ib_write_bw a001
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF Device         : mlx4_0
 Number of qps   : 1 Transport type : IB
 Connection type : RC Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 2048[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x41 QPN 0x0dc4 PSN 0x3677c4 RKey 0x04334c VAddr 0x007fe51312d000
 remote address: LID 0x08 QPN 0x0509 PSN 0xec3474 RKey 0xd8021a13 VAddr 0x007fb12cc36000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 65536      5000             3166.14            108.72   0.001739
---------------------------------------------------------------------------------------
Doing a write in the other direction is fine:
[root@a001 ~]# ib_write_bw botticelli
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF Device         : mlx4_0
 Number of qps   : 1 Transport type : IB
 Connection type : RC Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 2048[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x08 QPN 0x0507 PSN 0x28e3de RKey 0xc8021a13 VAddr 0x007fc8516c9000
 remote address: LID 0x41 QPN 0x0dbf PSN 0x2b1b1e RKey 0xd000254b VAddr 0x007ff7fbc25000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 65536      5000             3631.09            3630.93   0.058095
---------------------------------------------------------------------------------------

The opposite is the case for reads (ib_read_bw botticelli run on a001 is slow).

To confuse things further, measuring IPoIB bandwidth with netperf returns normal results:
[root@botticelli etc]# netperf a001-ib
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    29389.48


I also see the following messages in dmesg on botticelli:
[13898.294861] i40iw_make_cm_node: cm_node arpindex
[13904.351018] i40iw_parse_mpa: unsupported mpa rev = 15
[13904.351519] node destroyed before established
Could they be related to the problem?

What would be the next thing to try? Many thanks for any tips!

Cheers,
Filipe

Andreas Skau

May 20, 2021, 10:03:26 AM
to fhgfs...@googlegroups.com
Check the output of beegfs-net on the client to see if you're actually on RDMA and haven't fallen back to ethernet.

Filipe Maia

May 20, 2021, 10:36:56 AM
to fhgfs...@googlegroups.com
Thanks for the tip. That does not seem to be the problem:

# beegfs-net
mgmt_nodes
=============
botticelli [ID: 1]
   Connections: TCP: 1 (192.168.177.7:8008 [fallback route]);

meta_nodes
=============
botticelli [ID: 1]
   Connections: RDMA: 1 (192.168.176.7:8005 [fallback route]);
carracci [ID: 2]
   Connections: RDMA: 1 (192.168.176.5:8005);

storage_nodes
=============
botticelli [ID: 1]
   Connections: RDMA: 3 (192.168.176.7:8003 [fallback route]);
carracci [ID: 2]
   Connections: RDMA: 2 (192.168.176.5:8003);


Lehmann, Greg (IM&T, Pullenvale)

May 20, 2021, 4:56:22 PM
to fhgfs...@googlegroups.com

Assuming nothing else has changed…

 

Reseat the cable in the ports at both the server and switch ends. Try swapping in a different cable or HCA if that fails to help.

 

You haven't said what generation of InfiniBand you have, or whether you are using Mellanox OFED or the distribution OFED.
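
(Quick ways to check, both from the standard tools: ibstat on each server shows the HCA model and the active link rate/width, and ofed_info -s prints the MLNX_OFED version if the Mellanox stack is installed; with the distro packages that command is usually absent.)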

Filipe Maia

May 20, 2021, 5:13:23 PM
to fhgfs...@googlegroups.com
I'm using QDR InfiniBand (4x) with mlx4 cards and the distribution OFED.
Now I'm not so convinced the InfiniBand is the problem.
While the ib_write_bw results are strange, everything looks fine when I test with qperf.
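(For anyone following along, qperf is run with no arguments on one host and then invoked from the other along the lines of qperf botticelli rc_bw rc_lat tcp_bw; the exact test names are approximate.)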

I also noticed that the low performance seems to only affect files stored on two storage targets of botticelli that are low on free space (visible in the Cap. Pool column of beegfs-df). Files distributed over "normal" storage targets, including one on botticelli, seem to behave normally.
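
(In case it helps anyone else: beegfs-df shows which capacity pool each target is in, and something like beegfs-ctl --getentryinfo /mnt/beegfs/path/to/file lists the storage targets a given file is striped across; the path here is just illustrative.)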

I also found a probably unrelated issue.
When I run beegfs-ctl --getentryinfo on a client I get the following seg fault:
# beegfs-ctl --getentryinfo 244968
Entry type: file
EntryID: 29-60840599-1
Metadata node: (0) 23:06:05 Main [PThread.cpp:99] >> Received a SIGSEGV. Trying to shut down...
(1) 23:06:05 Main [PThread::signalHandler] >> Backtrace:
1: beegfs-ctl(_ZN7PThread13signalHandlerEi+0x47) [0x5a8007]
2: /lib64/libc.so.6(+0x36400) [0x7fcf6cc31400]
3: /lib64/libstdc++.so.6(_ZNSsC1ERKSs+0x18) [0x7fcf6d7bbf78]
4: beegfs-ctl(_ZNK4Node14getTypedNodeIDEv+0x1a) [0x56297a]
5: beegfs-ctl(_ZN16ModeGetEntryInfo7executeEv+0xbaf) [0x47a9df]
6: beegfs-ctl(_ZN3App11executeModeEPK12RunModesElem+0x23) [0x450c23]
7: beegfs-ctl(_ZN3App9runNormalEv+0x67) [0x455a97]
8: beegfs-ctl(_ZN3App3runEv+0x57) [0x455e47]
9: beegfs-ctl(_ZN7PThread9runStaticEPv+0xfe) [0x45045e]
10: beegfs-ctl(_ZN7Program4mainEiPPc+0x49) [0x44e759]
11: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fcf6cc1d555]
12: beegfs-ctl() [0x44fd55]

Unrecoverable error: Segmentation fault

(0) 23:06:05 Main [App] >> Segmentation fault


Running it on a server (with the filesystem mounted) works fine. This is with beegfs-7.2-el7.

Filipe Maia

May 23, 2021, 4:26:48 PM
to fhgfs...@googlegroups.com
In the end, I found some issues with one InfiniBand cable (it was causing SymbolErrorCounter to increment).
Replacing it improved performance tremendously, although I'm still seeing PortXmitWait climb, so I have some more work to do on the InfiniBand side.
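
(For anyone hitting something similar: the usual way to watch these counters is with the infiniband-diags tools, e.g. ibqueryerrors to scan the fabric for ports with non-zero error counters, perfquery against a LID/port to dump counters including SymbolErrorCounter and PortXmitWait, and perfquery -R to reset them so you can see whether they keep climbing.)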

Cheers,
Filipe