Hi All,
I ran a random-write benchmark on persistent memory using the following steps:
- Create a large array of 8-byte integers
- Write 8 bytes at a random position in the array and persist the data
I repeated the second step ~15.5 million times in a loop, so in total I wrote ~120 MB to persistent memory. I used a single thread for this test.
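For reference, the loop described above can be sketched roughly as follows. This is a minimal sketch and not my exact benchmark code: `persist_fn`, `persist_noop`, `xorshift64`, and `run_random_writes` are hypothetical names chosen for illustration, and the persistence step is passed in as a callback so that either option can be plugged in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: makes [addr, addr + len) durable. */
typedef void (*persist_fn)(const void *addr, size_t len);

/* Placeholder that does nothing; useful for timing the loop overhead alone. */
static void persist_noop(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
}

/* xorshift64: a small PRNG so that index generation stays cheap. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Writes n_writes 8-byte values at random slots and persists each one.
 * Returns the total number of bytes written. */
static size_t run_random_writes(uint64_t *arr, size_t n_slots,
                                size_t n_writes, persist_fn persist)
{
    assert(arr != NULL && n_slots > 0 && persist != NULL);
    uint64_t rng = 0x9E3779B97F4A7C15ULL; /* arbitrary nonzero seed */
    for (size_t i = 0; i < n_writes; i++) {
        size_t idx = (size_t)(xorshift64(&rng) % n_slots);
        arr[idx] = (uint64_t)i;               /* the 8-byte store */
        persist(&arr[idx], sizeof arr[idx]);  /* persist right after it */
    }
    return n_writes * sizeof arr[0];
}
```

Passing `persist_noop` gives a baseline for the loop itself, which can be subtracted from the timings of the two persistence options.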
Now, to persist the data I used the following two options:
- persistence-option-1: pmemobj_persist() function provided in PMDK
- persistence-option-2: issue the CLWB and sfence instructions directly
  - I used the following macro for CLWB:
    #define _mm_clwb(addr) \
        asm volatile(".byte 0x66; xsaveopt %0" : "+m" (*(volatile char *)(addr)));
  - I used the _mm_sfence() function from the "xmmintrin.h" header
In my understanding, the two persistence options should be equivalent, since pmemobj_persist() ultimately issues CLWB and sfence to persist the data. However, the benchmark took ~400 seconds with persistence-option-1 but only ~4 seconds with persistence-option-2. I am wondering where this overhead comes from. Does pmemobj_persist() do anything special beyond issuing the CLWB and sfence instructions?