Sudden and inexplicably bad performance


Filipe Maia

May 19, 2021, 5:27:08 PM
to fhgfs...@googlegroups.com
Hi,

I have a puzzling performance problem I can't quite figure out.
I have a filesystem with two storage servers. The servers each have a few RAID6 arrays, one array per storage target, and are connected with some old InfiniBand.

A few days ago the main server (A) froze and had to be rebooted. Since then, performance has dropped drastically.
When I read a file hosted on server A from a mount on A itself, I get normal performance (a couple of hundred megabytes per second).
When I read the same file from a mount on server B or on a client, I get about 10 MB/s.
I don't have files hosted exclusively on B, so I haven't yet managed to try the reverse case.

Server A is also the management and main metadata server.
The load on both servers is pretty much non-existent. When reading and writing directly to the storage target filesystems I can easily get hundreds of megabytes per second on all of them.

I turned the log level up to 5, but I didn't find anything obviously wrong.

I'm a bit at a loss as to what the problem could be. Does anyone have any suggestions of things to try?

Cheers,
Filipe

Filipe Maia

May 19, 2021, 5:48:05 PM
to fhgfs...@googlegroups.com
Just a further clarification: beegfs-ctl --storagebench shows excellent performance for both reads and writes, with a minimum throughput above 500 MB/s.
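
For anyone reproducing this, a typical storagebench run looks roughly like the following (flags approximate, along the lines of the BeeGFS docs):

beegfs-ctl --storagebench --alltargets --write --blocksize=512K --size=20G --threads=16
beegfs-ctl --storagebench --alltargets --read --blocksize=512K --size=20G --threads=16
beegfs-ctl --storagebench --alltargets --status
beegfs-ctl --storagebench --alltargets --cleanup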

Lehmann, Greg (IM&T, Pullenvale)

May 19, 2021, 6:15:28 PM
to fhgfs...@googlegroups.com

Check your network. Do some bandwidth tests from server A to server B; ib_write_bw and ib_read_bw cover the IB side of things.
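
For example (assuming the perftest package is installed on both hosts), run the matching server process with no arguments on one side and point the client at it, then repeat in the other direction:

# on server B
ib_write_bw

# on server A
ib_write_bw serverB

The same pattern applies to ib_read_bw; the hostnames above are just placeholders.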


Filipe Maia

May 20, 2021, 9:02:06 AM
to fhgfs...@googlegroups.com
Thanks Greg, that was quite useful.

The ib_write_bw throughput out of server A (named botticelli) is very low and inconsistent. In this example a001 is just a client.
[root@botticelli etc]# ib_write_bw a001
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF Device         : mlx4_0
 Number of qps   : 1 Transport type : IB
 Connection type : RC Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 2048[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x41 QPN 0x0dc4 PSN 0x3677c4 RKey 0x04334c VAddr 0x007fe51312d000
 remote address: LID 0x08 QPN 0x0509 PSN 0xec3474 RKey 0xd8021a13 VAddr 0x007fb12cc36000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 65536      5000             3166.14            108.72   0.001739
---------------------------------------------------------------------------------------
Doing a write in the other direction is fine:
[root@a001 ~]# ib_write_bw botticelli
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF Device         : mlx4_0
 Number of qps   : 1 Transport type : IB
 Connection type : RC Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 2048[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x08 QPN 0x0507 PSN 0x28e3de RKey 0xc8021a13 VAddr 0x007fc8516c9000
 remote address: LID 0x41 QPN 0x0dbf PSN 0x2b1b1e RKey 0xd000254b VAddr 0x007ff7fbc25000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 65536      5000             3631.09            3630.93   0.058095
---------------------------------------------------------------------------------------

The opposite is the case for reads (ib_read_bw botticelli run on a001 is slow).

To confuse things further, measuring IPoIB bandwidth with netperf returns normal results:
[root@botticelli etc]# netperf a001-ib
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    29389.48


I also see the following messages in dmesg on botticelli:
[13898.294861] i40iw_make_cm_node: cm_node arpindex
[13904.351018] i40iw_parse_mpa: unsupported mpa rev = 15
[13904.351519] node destroyed before established
Could they be related to the problem?

What would be the next thing to try? Many thanks for any tips!

Cheers,
Filipe

Andreas Skau

May 20, 2021, 10:03:26 AM
to fhgfs...@googlegroups.com
Check the output of beegfs-net on the client to see if you're actually on RDMA and haven't fallen back to ethernet.

Filipe Maia

May 20, 2021, 10:36:56 AM
to fhgfs...@googlegroups.com
Thanks for the tip. That does not seem to be the problem:

# beegfs-net
mgmt_nodes
=============
botticelli [ID: 1]
   Connections: TCP: 1 (192.168.177.7:8008 [fallback route]);

meta_nodes
=============
botticelli [ID: 1]
   Connections: RDMA: 1 (192.168.176.7:8005 [fallback route]);
carracci [ID: 2]
   Connections: RDMA: 1 (192.168.176.5:8005);

storage_nodes
=============
botticelli [ID: 1]
   Connections: RDMA: 3 (192.168.176.7:8003 [fallback route]);
carracci [ID: 2]
   Connections: RDMA: 2 (192.168.176.5:8003);


Lehmann, Greg (IM&T, Pullenvale)

May 20, 2021, 4:56:22 PM
to fhgfs...@googlegroups.com

Assuming nothing else has changed…

 

Reseat the cable in the ports at both the server and switch ends. Try swapping in a different cable or HCA if that fails to help.

 

You haven't said what generation of InfiniBand you have, or whether you are using Mellanox OFED or the distribution OFED.
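
(Quick ways to check, both from the standard tools: ibstat on each server shows the HCA model and the active link rate/width, and ofed_info -s prints the MLNX_OFED version if the Mellanox stack is installed; with the distro packages that command is usually absent.)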

Filipe Maia

May 20, 2021, 5:13:23 PM
to fhgfs...@googlegroups.com
I'm using QDR InfiniBand (4x) with mlx4 cards and the distribution OFED.
Now I'm not so convinced the InfiniBand is the problem.
While the ib_write_bw results are strange, everything looks fine when I test with qperf.
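(For anyone following along, qperf is run with no arguments on one host and then invoked from the other along the lines of qperf botticelli rc_bw rc_lat tcp_bw; the exact test names are approximate.)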

I also noticed that the low performance seems to only affect files stored on two storage targets of botticelli that are low on free space (visible in the Cap. Pool column of beegfs-df). Files distributed over "normal" storage targets, including one on botticelli, seem to behave normally.
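
(In case it helps anyone else: beegfs-df shows which capacity pool each target is in, and something like beegfs-ctl --getentryinfo /mnt/beegfs/path/to/file lists the storage targets a given file is striped across; the path here is just illustrative.)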

I also found a probably unrelated issue.
When I run beegfs-ctl --getentryinfo on a client I get the following seg fault:
# beegfs-ctl --getentryinfo 244968
Entry type: file
EntryID: 29-60840599-1
Metadata node: (0) 23:06:05 Main [PThread.cpp:99] >> Received a SIGSEGV. Trying to shut down...
(1) 23:06:05 Main [PThread::signalHandler] >> Backtrace:
1: beegfs-ctl(_ZN7PThread13signalHandlerEi+0x47) [0x5a8007]
2: /lib64/libc.so.6(+0x36400) [0x7fcf6cc31400]
3: /lib64/libstdc++.so.6(_ZNSsC1ERKSs+0x18) [0x7fcf6d7bbf78]
4: beegfs-ctl(_ZNK4Node14getTypedNodeIDEv+0x1a) [0x56297a]
5: beegfs-ctl(_ZN16ModeGetEntryInfo7executeEv+0xbaf) [0x47a9df]
6: beegfs-ctl(_ZN3App11executeModeEPK12RunModesElem+0x23) [0x450c23]
7: beegfs-ctl(_ZN3App9runNormalEv+0x67) [0x455a97]
8: beegfs-ctl(_ZN3App3runEv+0x57) [0x455e47]
9: beegfs-ctl(_ZN7PThread9runStaticEPv+0xfe) [0x45045e]
10: beegfs-ctl(_ZN7Program4mainEiPPc+0x49) [0x44e759]
11: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fcf6cc1d555]
12: beegfs-ctl() [0x44fd55]

Unrecoverable error: Segmentation fault

(0) 23:06:05 Main [App] >> Segmentation fault


Running it on a server (with the filesystem mounted) works fine. This is with beegfs-7.2-el7.

Filipe Maia

May 23, 2021, 4:26:48 PM
to fhgfs...@googlegroups.com
In the end, I found some issues with one InfiniBand cable (it was causing SymbolErrorCounter to increment).
Replacing it improved performance tremendously, although I'm still seeing PortXmitWait climb, so I have some more work to do on the InfiniBand side.
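
(For anyone hitting something similar: the usual way to watch these counters is with the infiniband-diags tools, e.g. ibqueryerrors to scan the fabric for ports with non-zero error counters, perfquery against a LID/port to dump counters including SymbolErrorCounter and PortXmitWait, and perfquery -R to reset them so you can see whether they keep climbing.)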

Cheers,
Filipe