libpmem2 benchmark


Robert Jandow

Mar 9, 2021, 11:20:37 AM3/9/21
to pmem
Hi everyone,

I'm currently trying to compare the performance of libpmem2 and libpmem, so I've modified the libpmem flush benchmark to work with libpmem2.
Besides the actual benchmark operations, I've adapted the init method to create the mapping that libpmem2 requires.
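For context, the init path follows the basic flow from the libpmem2 man pages; roughly something like this (a simplified sketch, not my exact benchmark code, and it assumes a recent PMDK where the mapping call is pmem2_map_new):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <libpmem2.h>

int
main(int argc, char *argv[])
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s <file-on-pmem>\n", argv[0]);
		return 1;
	}

	/* open the test file, e.g. a file on the (DAX-)mounted pmem device */
	int fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	struct pmem2_config *cfg;
	struct pmem2_source *src;
	struct pmem2_map *map;

	if (pmem2_config_new(&cfg) ||
	    pmem2_source_from_fd(&src, fd) ||
	    pmem2_config_set_required_store_granularity(cfg,
			PMEM2_GRANULARITY_PAGE) ||
	    pmem2_map_new(&map, cfg, src)) {
		fprintf(stderr, "pmem2: %s\n", pmem2_errormsg());
		return 1;
	}

	/* the benchmark operations then use the mapped range and the
	 * persist callback returned for this particular mapping */
	void *addr = pmem2_map_get_address(map);
	size_t len = pmem2_map_get_size(map);
	pmem2_persist_fn persist = pmem2_get_persist_fn(map);
	persist(addr, len < 4096 ? len : 4096);	/* example: flush the first page */

	pmem2_map_delete(&map);
	pmem2_source_delete(&src);
	pmem2_config_delete(&cfg);
	close(fd);
	return 0;
}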

However, I've noticed a huge performance loss when running my modified benchmark with emulated persistent memory (PM) instead of RAM. I've used the pmem emulation guide (https://pmem.io/2016/02/22/pm-emulation.html) to create the emulated PM. My test system has an Intel Xeon E5-1620v4 with 16GB RAM and 16GB emulated PM.
These are the detailed results (the config is attached):

flush_noop_RAM: pmem2_get_persist_fn [1]
0.130026;769076.244065;0.130026;0.130026;0.130026;0.000000;1315;23;43952;1178.899057;1625;4626;12797;1;100000;4096;0;1;false;-1;0;noop;rand;true
flush_noop_PM: pmem2_get_persist_fn [1]
2.397897;41703.209695;2.397897;2.397897;2.397897;0.000000;23993;45;197234;18654.259359;35711;54306;92040;1;100000;4096;0;1;false;-1;0;noop;rand;true

As already mentioned, there is a large discrepancy between emulated PM and RAM even with the noop operation. To rule out a misconfigured PM setup, I've performed the same test with the "original" libpmem flush benchmark; there was no difference between PM and RAM.

From this I conclude that something is wrong with my benchmark. Since I'm using the noop operation, the error must be in the init or exit method of the benchmark. For the initialisation, I've followed the documentation and the examples provided in the PMDK repository. I've tried different variations, but none of them changed the behavior. The source code of my pmem2 benchmark can be found on GitHub (https://github.com/RobertJndw/pmdk/blob/bench-pmem2/src/benchmarks/pmem2_flush.cpp).

Has anyone experienced similar behavior? Or did I miss something when implementing libpmem2?

pmem2_flush.cfg

ppbb...@gmail.com

Mar 9, 2021, 12:50:55 PM3/9/21
to pmem
I have two guesses:
1. The emulated pmem device isn't mounted with the dax option (a quick way to check this is sketched below).
2. The benchmark does no warmup, so you are essentially benchmarking page allocation performance. And it's expected that it's faster on tmpfs than on ext4 or xfs.
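For the first guess, one sanity check (just a sketch, assuming a pmem2_map created the way your init method does it) is to ask libpmem2 which store granularity the mapping actually got; a DAX mapping reports cache-line (or byte) granularity, while a non-DAX mount falls back to page granularity:

#include <stdio.h>
#include <libpmem2.h>

/* print the effective store granularity of an existing mapping */
static void
print_granularity(struct pmem2_map *map)
{
	switch (pmem2_map_get_store_granularity(map)) {
	case PMEM2_GRANULARITY_BYTE:
		printf("byte granularity (platform with eADR)\n");
		break;
	case PMEM2_GRANULARITY_CACHE_LINE:
		printf("cache-line granularity (DAX mapping)\n");
		break;
	case PMEM2_GRANULARITY_PAGE:
		printf("page granularity (no DAX, page cache + msync)\n");
		break;
	}
}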

steve

Mar 9, 2021, 12:55:07 PM3/9/21
to ppbb...@gmail.com, pmem
Hi wtorek,

I'm not sure what your goal is, but as far as I know it's pretty unlikely that you are going to get performance figures that have much relationship to reality with simulated pmem.

------------
Steve Heller

Robert Jandow

Mar 9, 2021, 1:57:10 PM3/9/21
to pmem
I don't have access to native persistent memory yet. To test the functionality of the benchmark, I wanted to use the emulated PM. Afterwards I can get representative values with the actual hardware.

Robert Jandow

Mar 11, 2021, 10:59:29 AM3/11/21
to pmem
Okay, I have done some more testing. It seems that the delay shifts when I enable warm-up. I measured the time of the three phases (init, main and exit).
With warmup enabled, the init method becomes very slow. With warmup disabled, the main operation takes much longer instead. As mentioned in a previous comment, the benchmark without warmup measures page allocation performance.

Is it possible that page allocation with libpmem2 is a lot slower than with libpmem on DAX-mounted file systems?
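For reference, the warmup effect can also be reproduced by prefaulting the mapping by hand before the timed phase, roughly like this (a sketch only; map is assumed to be the pmem2_map created in the init method):

#include <stddef.h>
#include <stdint.h>
#include <libpmem2.h>

/* touch one byte per 4 KiB page so the page faults happen before timing */
static void
prefault_mapping(struct pmem2_map *map)
{
	volatile uint8_t *base = pmem2_map_get_address(map);
	size_t len = pmem2_map_get_size(map);

	for (size_t off = 0; off < len; off += 4096)
		base[off] = base[off];	/* read + write back faults the page in */
}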

ppbb...@gmail.com

Mar 12, 2021, 4:39:24 AM3/12/21
to pmem
There's a bug in libpmem2 that essentially forced 4k alignment: https://github.com/pmem/pmdk/commit/cd4cc8d016567a202d6695439714e9596638828a
This is likely the reason for the performance difference you are observing: huge pages mean fewer page allocations.
The problem is fixed in both the master and stable-1.0 PMDK branches, so you can test this by building from source.
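A rough way to check whether this affects your mapping is to look at the alignment of the address libpmem2 returns: 2 MiB alignment is a necessary (though not sufficient) condition for the kernel to back the mapping with 2 MiB pages on a DAX file system. Something along these lines (just an illustrative check, not PMDK code):

#include <stdint.h>
#include <stdio.h>
#include <libpmem2.h>

/* a mapping that isn't 2 MiB aligned can't use 2 MiB pages,
 * so every 4 KiB page gets faulted in individually */
static void
check_huge_alignment(struct pmem2_map *map)
{
	uintptr_t addr = (uintptr_t)pmem2_map_get_address(map);

	if (addr % (2u * 1024 * 1024) == 0)
		printf("mapping at 0x%lx is 2 MiB aligned\n", (unsigned long)addr);
	else
		printf("mapping at 0x%lx is NOT 2 MiB aligned\n", (unsigned long)addr);
}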