PM 200 series over RDMA


Jian Gao

Jun 15, 2022, 11:28:12 AM
to pmem
Hello,

I am currently benchmarking Intel Optane Persistent Memory 200 series over RDMA.

I want to fully utilize the PM write bandwidth over RDMA. To achieve this I need to disable Intel Data Direct I/O (DDIO) for the RDMA NIC; otherwise LLC evictions cause internal write amplification in the PM and thus very low (~30% of peak) write bandwidth.

Previously, with the Optane PM 100 series, I could simply disable DDIO with setpci. For the PM 200 series, unfortunately, the datasheet for 3rd Gen Xeon processors is not publicly available. Our PM server's BIOS also does not support disabling Intel VT, so I am stuck for now.

I am only a graduate student, not an Intel business partner, so I have no access to the confidential documents. Is it true that the only solution for me is to purchase another server whose BIOS can disable DDIO?
Or are there some workarounds for this problem?

Thank you in advance,
Jian

gro...@o2.pl

Jun 20, 2022, 11:39:33 AM
to pmem
Hello Jian,

unfortunately, for the Intel Optane Persistent Memory 200 series, the only way to disable DDIO is via BIOS settings.

What server model and what BIOS do you use?

Regards
Tomasz

Jian Gao

Jun 20, 2022, 12:25:52 PM
to pmem
Hello Tomasz,

Thanks for the reply! That is unfortunate news...
We have several servers equipped with the Optane PM 200 series. I'm currently using an Inspur NF5280M6 server with an AMI BIOS. Inspur documents its BIOS quite well (in Chinese), but the documentation says nothing about DDIO, so changing DDIO may be impossible on that server. Nevertheless, I will try again, wait for future BIOS updates, or check the documentation of the other servers in our lab.
Anyway, thank you for confirming that the BIOS is the only way to disable DDIO for the Optane PM 200 series.

I have one more question. By "the only way to disable DDIO is via BIOS settings", do you mean that this will remain the only way in the future? Or might Intel some day make the 3rd Gen Xeon datasheet publicly available so that we can adjust the DDIO state just as we can on 2nd Gen Xeon CPUs? If it is convenient for you, could you please share anything you know about that?

Thanks again!

Regards,
Jian

P.S. If anyone receives this reply more than once, I apologize for my mistakes in sending it; I am not very familiar with Google Groups.
My sincere apologies to Tomasz, as I may have already sent you three copies of this reply...



gro...@o2.pl

Jun 20, 2022, 12:36:24 PM
to pmem
Hello Jian,

there is an ongoing effort to extend the AMI BIOS with a DDIO-related knob. I do not know the detailed schedule for this change.

There will not be any other option for the PM 200 series.

Regards,
Tomasz

Jian Gao

Jun 20, 2022, 12:55:20 PM
to pmem
Hello Tomasz,

Thank you very much for the information.
That really helps.

Regards,
Jian

Anton Gavriliuk

Jun 21, 2022, 2:41:27 AM
to gro...@o2.pl, pmem
Hi

> Or are there some workarounds for this problem?

You could take a look at https://github.com/aliireza/ddio-bench.

> I am currently benchmarking Intel Optane Persistent Memory 200 series over RDMA.

Could you please provide more details: the number of DCPMMs per CPU, the pmem mode (fsdax, devdax, raw, sector), and how you map the pmem to the remote host?


Anton



Jian Gao

Jun 21, 2022, 11:34:03 AM
to pmem
Hello Anton,

Thanks for the reply.

Actually, we have already tried the method described in the provided link (https://github.com/aliireza/ddio-bench), but unfortunately it does not work. It changes the DDIO state by modifying the PERFCTRLSTS_0 PCIe register for the RDMA NIC, which only works on 2nd Gen Xeon processors.
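
For reference, below is a rough sketch of what that approach amounts to, assuming the register layout documented by ddio-bench (perfctrlsts_0 at config-space offset 0x180 of the PCIe root port above the NIC, with bit 7 = Use_Allocating_Flow_Wr); the BDF is only an example and, as said, this has no effect on 3rd Gen Xeon:

    /* Sketch: clear Use_Allocating_Flow_Wr on the root port above the NIC.
     * Requires root; offset and bit are as documented by ddio-bench and
     * only apply to 2nd Gen Xeon. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* example root-port BDF; find the real one with lspci -t */
        int fd = open("/sys/bus/pci/devices/0000:17:00.0/config", O_RDWR);
        uint32_t v;
        pread(fd, &v, sizeof(v), 0x180);   /* perfctrlsts_0 */
        printf("perfctrlsts_0 = 0x%08x\n", v);
        v &= ~(1u << 7);                   /* clear Use_Allocating_Flow_Wr */
        pwrite(fd, &v, sizeof(v), 0x180);
        close(fd);
        return 0;
    }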

Changing the number of LLC ways reserved for DDIO also contributes little to performance. The underlying reason is that with DDIO enabled, RDMA writes go into the LLC. The LLC evicts at cache-line (64 B) granularity, while PM has an internal write granularity of 256 B, so random evictions from the LLC cause an internal write amplification of roughly 4x (256 B / 64 B). Reserving more cache ways for DDIO does not make the LLC evict "sequentially", so sadly this does not help.

The benchmarking details are:
- Server: Intel Xeon Gold 6330 CPU, Mellanox ConnectX-6 200 Gbps single-port RDMA NIC, 4x 128 GB DCPMMs installed
- Server-side PM region configured in devdax mode; we first mmap /dev/dax0.0, then expose the mapped PM to the client with ibv_reg_mr (see the sketch below)
- Client performs sequential RDMA writes (IBV_WR_RDMA_WRITE) to the server-side PM and measures the write bandwidth for different write sizes
- Server monitors internal PM media I/O with this script: https://github.com/SJTU-IPADS/librdpma/blob/main/nvm/analysis.py
Results: we observe a maximum write bandwidth of ~3.5 GB/s on the client side and serious write amplification on the server side.
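
In case the exact path matters, here is a minimal sketch of the server-side registration and the client-side write described above (connection setup, the rkey/remote-address exchange, and error handling are omitted; the size and function names are only examples):

    /* Server side: map /dev/dax0.0 and register it for remote RDMA writes. */
    #include <fcntl.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <infiniband/verbs.h>

    #define PM_LEN (4UL << 30)   /* example size; must fit the devdax region */

    struct ibv_mr *expose_pmem(struct ibv_pd *pd)
    {
        int fd = open("/dev/dax0.0", O_RDWR);
        void *pm = mmap(NULL, PM_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        /* remote write access also requires local write access */
        return ibv_reg_mr(pd, pm, PM_LEN,
                          IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
    }

    /* Client side: post one RDMA write of `len` bytes to the registered PM. */
    int post_pm_write(struct ibv_qp *qp, struct ibv_mr *lmr, void *lbuf,
                      uint64_t raddr, uint32_t rkey, uint32_t len)
    {
        struct ibv_sge sge = {
            .addr = (uintptr_t)lbuf, .length = len, .lkey = lmr->lkey,
        };
        struct ibv_send_wr wr = {
            .opcode              = IBV_WR_RDMA_WRITE,
            .send_flags          = IBV_SEND_SIGNALED,
            .sg_list             = &sge,
            .num_sge             = 1,
            .wr.rdma.remote_addr = raddr,
            .wr.rdma.rkey        = rkey,
        };
        struct ibv_send_wr *bad;
        return ibv_post_send(qp, &wr, &bad);
    }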

Regards,
Jian

gro...@o2.pl

Jun 21, 2022, 3:58:11 PM
to pmem
Hello Jian,
please take a look at https://pmem.io/rpma/reports/, where you can find remote persistent memory access (RPMA) performance reports for the Optane Persistent Memory 100/200 series.

You can also take a look at the ready-to-use RPMA benchmarking toolset at https://github.com/pmem/rpma/tree/master/tools/perf

The solution Anton mentioned is only valid for the Optane PM 100 series.

Regards
Tomasz

Jian Gao

Jun 21, 2022, 9:38:26 PM
to pmem
Hello Tomasz,

That was really helpful. It seems that librpma comes with many useful tools and documentation, which I had sadly missed before.
I will take a deeper look at librpma.

Regards,
Jian
