About global persistent flush for CXL-attached DRAM sharing


sekwo...@gmail.com

Apr 4, 2023, 1:58:03 AM
to pmem
Hi all,

I wanted to bring up a topic for discussion regarding the need for Global Persistent Flush (GPF) in CXL-attached DRAM memory-sharing scenarios.

As far as I understand, GPF was initially proposed to provide the same functionality as ADR or eADR with NVDIMMs: it ensures data persistence in CXL-PMEM settings in the event of a host failure, such as a power failure. However, I am wondering whether GPF is also necessary to guarantee crash consistency when multiple hosts share the same CXL-attached device memory (DRAM).

Consider the example of an in-memory shared log on CXL-attached DRAM, shared across multiple hosts. Host 1 writes a data record and then a commit record (on different cache lines) to the shared log, in that order, and Host 2 reads these records to check their validity. Now suppose Host 1 loses power while the data record is still in Host 1's CPU cache but the commit record has already been evicted to the device: the data record is lost, yet the commit record survives in the shared CXL-attached DRAM. If Host 2 reads the log after this failure, it will see the commit record and wrongly conclude that Host 1 wrote the log records correctly.

To address this issue, I believe GPF support (or programming similar to that used for NVDIMM-attached PM settings, i.e. cache-line flushes and non-temporal stores) may be necessary even for CXL-attached DRAM memory sharing.

I would appreciate your thoughts on this matter.

Thanks,
Sekwon

Andy Rudoff

Apr 4, 2023, 8:10:55 AM
to pmem
Hi Sekwon,

You give an excellent example of why memory sharing has many of the same consistency issues as pmem.  A CXL memory pool that supports sharing can report its capacity to the host as persistent, and that will cause GPF to apply to that capacity.  Note that reporting capacity as pmem is all about which programming model SW should use with it; whether or not the physical media is actually persistent is immaterial.  So if you want shared DRAM to be treated as persistent by SW, just report it as persistent capacity.

On Linux, you'll find very little difference at the application level.  For both CXL shared volatile memory and CXL shared pmem, there will be some series of steps an application takes to get the memory mapped into its address space.  The application is then responsible for coordinating shared access (using locks, or some other communication method with the other hosts, etc.) and for keeping its data structures consistent (using transactions, flushes, etc.).  Really, the main difference is that if the CXL device reports the capacity as pmem, then the GPF flow will apply to it on power loss.

Thanks,

-andy

sekwo...@gmail.com

Apr 10, 2023, 5:35:28 PM
to pmem
Hi Andy,

I appreciate your prompt response and detailed explanation. Your answer has helped me better understand GPF (Global Persistent Flush), but I still have a few more questions.

Firstly, I am wondering whether GPF support will become a universal feature for CXL environments or remain optional, as was the case with eADR. If it remains optional, applications using CXL shared memory or pmem may still need to use non-temporal stores or cache-line flush instructions manually to ensure correct crash consistency.

Secondly, I am curious about crash scenarios other than power loss. While GPF or eADR flushes CPU caches on power loss, it may not apply to other crash scenarios such as process crashes or kernel failures. Do OS kernels typically flush CPU caches in these situations, e.g. by issuing WBINVD when such a crash is detected?

Thank you for your time and help.

Thanks,
Sekwon