ibv_reg_mr returns -EOPNOTSUPP for mapped files

Ana Khorguani

Jun 14, 2022, 7:03:15 AM
to pmem
Hello,

I am trying to test RDMA with Intel Optane Persistent Memory. The machines I use are connected to an Omni-Path Edge Switch 100 Series, and the Linux kernel version is 5.10.0-14-amd64.

For benchmarking, I use linux-rdma/perftest. I want to register a file on persistent memory as the RDMA buffer. To avoid the page cache in DRAM, I want to leverage DAX capabilities.

For this purpose, I configured the persistent memory in fsdax mode, created an ext4 filesystem (which supports DAX), and mounted it with the "-o dax" option, e.g.: mount -o dax /dev/pmem0 /mnt

However, when I use an mmapped file from this filesystem, ibv_reg_mr() fails with "Operation not supported" (errno 95, EOPNOTSUPP).
On the other hand, if I mount the filesystem without "-o dax", I do not encounter this issue.
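[Editor's sketch] Before debugging the registration itself, it can help to confirm that the mapping really is a DAX mapping. One way, assuming a kernel and libc recent enough to know MAP_SYNC, is to try mapping the file with MAP_SHARED_VALIDATE | MAP_SYNC: per the mmap(2) man page, that flag combination is only accepted for files that support DAX. The function name file_is_dax_mapped is made up for this illustration, and the fallback flag values are taken from the asm-generic Linux headers (an assumption that holds on x86-64 but not on every architecture).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers. Values come from the Linux
 * asm-generic UAPI headers (assumption: x86-64 or a similar arch). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Hypothetical helper, not part of any library.
 * Returns  1 if `path` can be mapped with MAP_SYNC (DAX is active),
 *          0 if the kernel refuses such a mapping,
 *         -1 if the file cannot be opened read-write. */
int file_is_dax_mapped(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* MAP_SYNC is only honoured together with MAP_SHARED_VALIDATE. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return 0;

    munmap(p, len);
    return 1;
}
```

For a file under a mount such as /mnt from the message above, a return value of 1 would suggest that the file is being mapped directly from persistent memory, which is the case in which ibv_reg_mr() was seen to fail.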

I wonder whether someone has encountered a similar issue and could help me identify where it comes from. Could it be because I use Omni-Path instead of InfiniBand? Or is there something specific to "-o dax" that is not compatible with ibv_reg_mr()?

Thank you in advance, 
Ana

Tomasz Gromadzki

Jun 14, 2022, 9:09:25 AM
to pm...@googlegroups.com

Hello Ana,
PMem configured in fsdax mode is supported only by select RNICs that support on-demand paging (ODP).
You have to convert your PMem to dev-dax mode if your RNIC does not support ODP.

Please see the librpma library implementation of memory registration as a reference.
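[Editor's sketch] The registration strategy described above can be outlined as: try a regular (pinned) registration first, and if it fails with EOPNOTSUPP on a device that advertises ODP, retry with the on-demand flag. The code below is only loosely modelled on that idea and is not copied from librpma. To keep it runnable without RDMA hardware, the real ibv_reg_mr() call is hidden behind a hypothetical callback type reg_fn, ACCESS_ON_DEMAND is a local placeholder for IBV_ACCESS_ON_DEMAND from <infiniband/verbs.h>, and fake_fsdax_reg merely simulates a driver that rejects pinned registration. In real code, odp_supported would typically be derived from ibv_query_device_ex() by checking odp_caps.general_caps for IBV_ODP_SUPPORT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder for IBV_ACCESS_ON_DEMAND from <infiniband/verbs.h>. */
#define ACCESS_ON_DEMAND (1 << 6)

/* Hypothetical wrapper type. A real implementation would call
 * ibv_reg_mr(pd, addr, len, access) and return the MR pointer,
 * or NULL with errno set on failure. */
typedef void *(*reg_fn)(void *addr, size_t len, int access);

/* Try a regular registration; on EOPNOTSUPP, and only if the device
 * supports ODP, retry the same request as an ODP registration. */
void *reg_mr_with_odp_fallback(reg_fn reg, int odp_supported,
                               void *addr, size_t len, int access)
{
    assert(reg != NULL);

    void *mr = reg(addr, len, access);
    if (mr != NULL)
        return mr;

    if (errno != EOPNOTSUPP || !odp_supported)
        return NULL; /* errno still describes the first failure */

    return reg(addr, len, access | ACCESS_ON_DEMAND);
}

/* Simulation for the demo only: behaves like a driver that refuses
 * pinned registration of fsdax memory but accepts ODP registration. */
void *fake_fsdax_reg(void *addr, size_t len, int access)
{
    (void)len;
    if (!(access & ACCESS_ON_DEMAND)) {
        errno = EOPNOTSUPP;
        return NULL;
    }
    return addr; /* stand-in for a struct ibv_mr pointer */
}
```

With the simulated driver, the fallback succeeds when odp_supported is non-zero and returns NULL with errno left at EOPNOTSUPP otherwise, which matches the error reported at the start of this thread.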

Regards
Tomasz

--
You received this message because you are subscribed to the Google Groups "pmem" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pmem+uns...@googlegroups.com.


Ana Khorguani

Jun 14, 2022, 10:22:39 AM
to pmem
Hello Tomasz,

Thank you very much, this is very useful information. I will take a closer look at the code as well.

The reason I wanted to use fsdax mode is the inconsistency I encountered when experimenting with dev-dax mode.

With dev-dax mode, I first used numactl --membind, and then the memkind library, with Intel Optane Persistent Memory for the RDMA benchmark. I have not used rpma specifically yet.

However, the issue was that I only occasionally saw read bandwidth, and I never saw any write bandwidth. I used perf for the measurements: perf stat -e unc_m_pmm_bandwidth.read,unc_m_pmm_bandwidth.write

Would you have any advice on why RDMA managed to "ignore" NVM in dev-dax mode, in the ways I tried using it? Is there something specific to rpma that would guarantee that I observe the write bandwidth?


Thank you in advance,
Ana

gro...@o2.pl

Jun 28, 2022, 5:43:38 AM
to pmem
Hello Ana,

It is possible that your write operations end up in the CPU cache due to the DDIO mechanism.

To get real RDMA performance numbers with Intel Optane Persistent Memory, you should use a dedicated fio engine and also configure the target platform to disable DDIO.

You can learn more from our librpma documentation:
Example performance reports are also available there.

Regards
Tomasz

Tom Grom

Jul 6, 2022, 2:26:24 PM
to pmem

Hello Ana,

 

linux-rdma/perftest uses RDMA Write only. An RDMA Write completes once the PCIe write has been initiated, not when it has finished.

The tool was designed to test RDMA performance only and is not aware of memory persistency.

In the example you mentioned (https://github.com/pmem/rpma/tree/master/examples/04-write-to-persistent), an RDMA Read is used to flush data from the RNIC and also to push the data out of the PCIe bus, based on PCIe ordering/fencing rules.

 

Best regards

Tomasz

 

From: pm...@googlegroups.com <pm...@googlegroups.com> On Behalf Of Ana Khorguani
Sent: Tuesday, July 5, 2022 10:56 AM
To: pmem <pm...@googlegroups.com>
Subject: Re: ibv_reg_mr returns -EOPNOTSUPP for mapped files

 

Hello Tomasz,

 

Regarding DDIO, I used the https://github.com/aliireza/ddio-bench benchmark to disable it during my experiments.

Thank you for the pointers. I will look into the fio engine.

I also tried the example from rpma, here: https://github.com/pmem/rpma/tree/master/examples/04-write-to-persistent  This seems to work better, since I see both read and write bandwidth to pmem. Now I am trying to understand what is different in this benchmark compared to linux-rdma/perftest.

 

Best regards,

Ana
