Hi all,
I'm currently evaluating the performance of pmem_memcpy(). To do so, I used the PMDK benchmark tool with a few modifications.
One modification compares libc memcpy() followed by pmem_flush() against plain libc memcpy() (GitHub). In the multithreaded benchmarks of this operation, I noticed that plain memcpy() performs much worse than the variant with the flush.
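Roughly, the two variants being compared can be sketched as follows. This is a simplified illustration, not the actual benchmark code: the names copy_plain and copy_then_flush are hypothetical, and the no-op fallback for pmem_flush() exists only so the snippet compiles without PMDK installed (define HAVE_LIBPMEM and link with -lpmem to use the real call).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef HAVE_LIBPMEM
#include <libpmem.h>   /* real pmem_flush() from PMDK */
#else
/* Stand-in (assumption): lets the sketch build without PMDK.
 * It does NOT flush any cache lines. */
static void pmem_flush(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
}
#endif

/* Variant A (hypothetical name): plain libc memcpy, no explicit flush.
 * The written cache lines are left to normal cache eviction. */
static void copy_plain(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
}

/* Variant B (hypothetical name): libc memcpy, then flush the
 * destination range with pmem_flush(). */
static void copy_then_flush(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
    pmem_flush(dst, len);
}
```

In the benchmark, each thread would call one of these in a loop with dst pointing into the mapped persistent memory region and len set to 64, 256, or 4096.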
In the attached images you can see that the plain variant (purple) performs very badly, especially for larger data sizes. I tested with 64 B, 256 B, and 4 KiB (4096 B).
Does anyone have an idea what could be causing this slowdown? And why does calling pmem_flush() or pmem_persist() (green, red) afterwards improve performance?
Thanks in advance for your help and feedback.