Check your network. Run some bandwidth tests from server A to server B: ib_write_bw and ib_read_bw for the IB side of things.
--
You received this message because you are subscribed to the Google Groups "beegfs-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
fhgfs-user+...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/fhgfs-user/CAN5hRiWrXDaeWiebK3Us%3DCcELXKzdFwQR5DrFVQA6dZHbKFbEg%40mail.gmail.com.
[root@botticelli etc]# ib_write_bw a001
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx4_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
TX depth : 128
CQ Moderation : 100
Mtu : 2048[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x41 QPN 0x0dc4 PSN 0x3677c4 RKey 0x04334c VAddr 0x007fe51312d000
remote address: LID 0x08 QPN 0x0509 PSN 0xec3474 RKey 0xd8021a13 VAddr 0x007fb12cc36000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
65536 5000 3166.14 108.72 0.001739
---------------------------------------------------------------------------------------
Doing a write in the other direction is fine:
[root@a001 ~]# ib_write_bw botticelli
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx4_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
TX depth : 128
CQ Moderation : 100
Mtu : 2048[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x08 QPN 0x0507 PSN 0x28e3de RKey 0xc8021a13 VAddr 0x007fc8516c9000
remote address: LID 0x41 QPN 0x0dbf PSN 0x2b1b1e RKey 0xd000254b VAddr 0x007ff7fbc25000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
65536 5000 3631.09 3630.93 0.058095
---------------------------------------------------------------------------------------
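To put a number on the difference between the two runs, here is a minimal sketch that takes the two result rows exactly as pasted above and reports how much of the peak rate each direction sustains. The peak figures are close in both directions; it is the average that differs, by a factor of roughly 33. The direction labels, the function names and the 90% cut-off are my own assumptions for illustration and are not part of perftest.

```python
# Result rows copied verbatim from the two ib_write_bw runs quoted above.
# Labels are assumed: "X to Y" means the command was run on X with Y as target.
RUNS = {
    "botticelli to a001": "65536 5000 3166.14 108.72 0.001739",
    "a001 to botticelli": "65536 5000 3631.09 3630.93 0.058095",
}

# Hypothetical cut-off: flag a run whose average falls below 90% of its peak.
MIN_SUSTAINED = 0.90


def parse_row(row):
    """Split one result row into the five columns of the perftest table."""
    size, iterations, peak, average, msg_rate = row.split()
    return {
        "bytes": int(size),
        "iterations": int(iterations),
        "peak": float(peak),
        "average": float(average),
        "msg_rate": float(msg_rate),
    }


def sustained_fraction(row):
    """Return average bandwidth as a fraction of peak bandwidth."""
    fields = parse_row(row)
    return fields["average"] / fields["peak"]


def flag_slow_runs(runs, minimum=MIN_SUSTAINED):
    """Return the labels of runs that do not sustain their peak rate."""
    return [label for label, row in runs.items()
            if sustained_fraction(row) < minimum]


for label, row in RUNS.items():
    print(f"{label}: {sustained_fraction(row):.1%} of peak sustained")
print("suspect:", flag_slow_runs(RUNS))
```

The same check can be repeated with the row from ib_read_bw to see whether reads show the same asymmetry.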
[13898.294861] i40iw_make_cm_node: cm_node arpindex
[13904.351018] i40iw_parse_mpa: unsupported mpa rev = 15
[13904.351519] node destroyed before established
Assuming nothing else has changed…
Reseat the cable in the ports at both the server and the switch end. If that does not help, try swapping in a different cable or HCA.
You haven’t said what generation of InfiniBand you have, or whether you are using the Mellanox OFED or the distribution OFED.
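Until the generation is confirmed (ibstat on each host reports the negotiated rate), the good direction's figure gives a rough lower bound. The sketch below computes nominal payload ceilings per generation and finds the slowest one that could carry the 3630.93 MB/sec average reported above. It assumes a 4x link, that perftest's MB means 2^20 bytes (which is what the MsgRate column implies: 3630.93 × 2^20 / 65536 ≈ 58095 messages per second), and it ignores protocol and PCIe overhead, so treat the result as "at least this generation" and nothing more. The function names are hypothetical.

```python
# Nominal per-lane signalling rate in Gb/s and line encoding
# as (payload bits, coded bits) for each InfiniBand generation.
GENERATIONS = {
    "SDR": (2.5, 8, 10),
    "DDR": (5.0, 8, 10),
    "QDR": (10.0, 8, 10),
    "FDR10": (10.3125, 64, 66),
    "FDR": (14.0625, 64, 66),
    "EDR": (25.78125, 64, 66),
}

# Average bandwidth of the healthy direction, taken from the output above.
OBSERVED_MIB_S = 3630.93


def data_rate_mib_s(generation, lanes=4):
    """Return the payload ceiling of a link in MiB/s, ignoring protocol overhead."""
    lane_gbps, payload_bits, coded_bits = GENERATIONS[generation]
    bits_per_second = lane_gbps * 1e9 * lanes * payload_bits / coded_bits
    return bits_per_second / 8 / 2**20


def slowest_generation_that_fits(observed_mib_s, lanes=4):
    """Return the slowest generation whose ceiling is at least the observed rate."""
    by_speed = sorted(GENERATIONS, key=lambda g: data_rate_mib_s(g, lanes))
    for generation in by_speed:
        if data_rate_mib_s(generation, lanes) >= observed_mib_s:
            return generation
    return None  # observed rate exceeds every generation listed


for generation in GENERATIONS:
    print(f"{generation:6s} 4x ceiling: {data_rate_mib_s(generation):8.1f} MiB/s")
print("link is at least:", slowest_generation_that_fits(OBSERVED_MIB_S))
```

This only narrows things down from the throughput side; the OFED flavour still has to come from the hosts themselves.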