Hi,
I'd like to set up NVMe-oF over RDMA on d750 nodes. However, when I tried to discover the target node, I got the error below:
root@node-2:~# nvme discover -t rdma -a 10.10.1.2 -s 4420
Failed to write to /dev/nvme-fabrics: Connection reset by peer
root@node-2:~# dmesg
[ 2745.805437] nvme nvme1: Connect rejected: status 8 (invalid service ID).
[ 2745.812142] nvme nvme1: rdma connection establishment failed (-104)
I configured the target (node-1) successfully by following this tutorial. Before that, I installed the Broadcom driver from here.
I also validated the RoCE network as described on this page. It turned out that the `ib_write_bw` test passes when I use the public IP, but not when I use the private IP (10.10.1.2). With the private IP, I got:
root@node-2:~# ib_write_bw -d bnxt_re0 -F --report_gbits -p 1800 -s 1048576 -q 16 10.10.1.2
Couldn't connect to 10.10.1.2:1800
Unable to open file descriptor for socket connection
Unable to init the socket connection
Does anyone have any thoughts on this issue? Any reply is appreciated!
Best,
Haoyu