st...@subphysical.com
Jan 30, 2022, 1:49:04 PM
to pm...@googlegroups.com
Hi all,
I've been doing performance testing on adding tens of billions of records to my new pmem-tuned hash
table and have found some odd performance anomalies that I would like to understand.
These anomalies occur on both of my test systems.
Test system 1: 2x Xeon Silver 4215, each socket having 64 GB of DRAM and 1 TB of Optane pmem
comprising four 256 GB parts.
Test system 2: 2x Xeon Gold 6326, each socket having 128 GB of DRAM and 2 TB of Optane pmem 200
series comprising four 512 GB parts.
Test specification: adding 5 billion variable-length records, with keys and values averaging 8 bytes
each, to a hash table already containing tens of billions of records.
Typical behavior: When not rehashing, performance is consistent at around 5 million records per
second on test system 1 and about 7 million on test system 2. When rehashing, performance is
consistent at around 350k records per second on test system 1 and 600k records per second on test
system 2. Note that rehashing is very write-intensive: after each block of ~50,000 new records is
written, it moves several million records in the file to a new location.
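To put the figures above in perspective, here is a small sketch that computes the rehash slowdown factor and how long a 5-billion-record run would take if it ran entirely at one rate or the other. Real runs mix the two modes, so these are only bounds; the names `RATES`, `RECORDS`, and `summarize` are made up for illustration.

```python
# Rates quoted in this post, in records per second: (not rehashing, rehashing).
RATES = {
    "system 1": (5_000_000, 350_000),
    "system 2": (7_000_000, 600_000),
}
RECORDS = 5_000_000_000  # records added per test run


def summarize(normal_rate, rehash_rate, records=RECORDS):
    """Return (slowdown factor, seconds at normal rate, seconds at rehash rate)."""
    slowdown = normal_rate / rehash_rate
    return slowdown, records / normal_rate, records / rehash_rate


for name, (normal_rate, rehash_rate) in RATES.items():
    slowdown, fast_s, slow_s = summarize(normal_rate, rehash_rate)
    print(f"{name}: rehash is {slowdown:.1f}x slower; "
          f"run takes between {fast_s:.0f} s and {slow_s:.0f} s")
```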
Anomalous behavior: On occasion, rehashing stops for 10 seconds or more, then runs very slowly
(about 1/10th of normal speed) for as much as several minutes, with periods during which zero new
records are added. Eventually, it comes back to normal behavior and then the rest of the run is
normal.
If I then run exactly the same load again, with the same hardware, same software, and same dataset,
I do not see the anomalous behavior.
Hypothesis: The slowdown occurs because the pmem controller is doing maintenance of some kind. Can
anyone confirm or deny this, preferably with some information about how I could predict or at least
detect and report this maintenance to the user of the hash table?
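Short of a signal from the pmem controller itself, the symptom can at least be detected from the application side. Below is a minimal sketch, assuming the hash table can report a cumulative record count at regular intervals. The class name `StallDetector`, the 0.2 slow threshold, and the state names are hypothetical choices; the 10-second stall window comes from the behavior described above. It detects the slowdown only and says nothing about its cause.

```python
class StallDetector:
    """Classify insert throughput as "normal", "slow", or "stalled".

    baseline_rate: expected records/second for the current phase
                   (e.g. 350_000 while rehashing on test system 1).
    slow_fraction: hypothetical threshold; the slowdown described above
                   was about 1/10th of normal, so 0.2 leaves some margin.
    stall_seconds: how long zero progress must last to count as a stall.
    """

    def __init__(self, baseline_rate, slow_fraction=0.2, stall_seconds=10.0):
        self.baseline_rate = baseline_rate
        self.slow_fraction = slow_fraction
        self.stall_seconds = stall_seconds
        self._last_time = None
        self._last_count = 0
        self._idle_since = None  # start of the current zero-progress period

    def sample(self, now, total_records):
        """Feed a (timestamp, cumulative record count) pair; return the state."""
        if self._last_time is None:
            # First sample only establishes the starting point.
            self._last_time, self._last_count = now, total_records
            return "normal"

        elapsed = now - self._last_time
        if elapsed <= 0:
            return "normal"  # ignore duplicate or out-of-order samples
        added = total_records - self._last_count
        self._last_time, self._last_count = now, total_records

        if added == 0:
            if self._idle_since is None:
                self._idle_since = now - elapsed
            if now - self._idle_since >= self.stall_seconds:
                return "stalled"
            return "slow"

        self._idle_since = None
        if added / elapsed < self.slow_fraction * self.baseline_rate:
            return "slow"
        return "normal"
```

In a real run the timestamps would come from something like `time.monotonic()`, and the returned state could be logged with the wall-clock time so that stalls can later be correlated with any device-level diagnostics.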
------------
Steve Heller