Yes, MLC (Intel Memory Latency Checker) can measure idle latency, loaded latency, and bandwidth on PMem.
For the following examples, I assume a 2-socket server configured in AppDirect mode, where each region has a single FSDAX namespace with an ext4 file system mounted using the dax option:
--- Setup ---
$ sudo ipmctl create -goal PersistentMemoryType=AppDirect
$ sudo systemctl reboot
$ sudo ndctl create-namespace --continue
$ sudo mkfs.ext4 /dev/pmem0
$ sudo mkfs.ext4 /dev/pmem1
$ sudo mkdir /pmemfs0 /pmemfs1
$ sudo mount -o dax /dev/pmem0 /pmemfs0
$ sudo mount -o dax /dev/pmem1 /pmemfs1
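Before running MLC, it may be worth confirming that each file system really is mounted with DAX. The function below is a minimal sketch that looks up a mount point in /proc/mounts and checks its options. The function name is hypothetical, and accepting both dax and dax=always is an assumption, since the exact option string shown can depend on the kernel and file system.

```shell
#!/bin/sh
# Sketch: succeed if mount point $1 is listed with the "dax" or "dax=always"
# option. $2 optionally names the mounts file (defaults to /proc/mounts).
# is_dax_mounted is a hypothetical helper name, not part of any tool.
is_dax_mounted() {
    mnt=$1
    mounts_file=${2:-/proc/mounts}
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk -v m="$mnt" '
        $2 == m {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "dax" || opts[i] == "dax=always") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$mounts_file"
}
```

For example: `for m in /pmemfs0 /pmemfs1; do is_dax_mounted "$m" || echo "$m: dax option not found"; done`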
This system has the following NUMA layout:
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 0 size: 192114 MB
node 0 free: 187552 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 1 size: 192012 MB
node 1 free: 191380 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
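Rather than copying CPU numbers out of this output by hand, the CPU list of each node can be read from sysfs. A minimal sketch, assuming the standard Linux /sys/devices/system/node/nodeN/cpulist files (which use the same range format as the MLC per-thread files, e.g. "0-23,48-71"); the function names are hypothetical.

```shell
#!/bin/sh
# Print the CPU list of NUMA node $1, e.g. "0-23,48-71" for node 0 above.
# $2 optionally overrides the sysfs node directory (useful for testing).
node_cpulist() {
    dir=${2:-/sys/devices/system/node}
    cat "$dir/node$1/cpulist"
}

# Print the lowest-numbered CPU of NUMA node $1 (the kernel lists ranges
# in ascending order, so this is the start of the first range).
node_first_cpu() {
    node_cpulist "$@" | cut -d, -f1 | cut -d- -f1
}
```

On the system above, `node_cpulist 0` should print 0-23,48-71 and `node_first_cpu 1` should print 24, matching the -c24 used in the latency tests.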
--- Sequential Idle Latency Tests (Local NUMA) ---
// Use the first CPU on Socket 0 (CPU 0) and Socket 1 (CPU 24) to test the PMem attached to the same socket
# mlc --idle_latency -c0 -J/pmemfs0
# mlc --idle_latency -c24 -J/pmemfs1
--- Sequential Idle Latency Tests (Remote NUMA) ---
// Use the first CPU on Socket 1 (CPU 24) and Socket 0 (CPU 0) to test the PMem attached to the other socket
# mlc --idle_latency -c24 -J/pmemfs0
# mlc --idle_latency -c0 -J/pmemfs1
--- Random Idle Latency Tests (Local NUMA) ---
// Use the first CPU on Socket 0 (CPU 0) and Socket 1 (CPU 24) to test the PMem attached to the same socket
# mlc --idle_latency -l256 -c0 -J/pmemfs0
# mlc --idle_latency -l256 -c24 -J/pmemfs1
--- Random Idle Latency Tests (Remote NUMA) ---
// Use the first CPU on Socket 1 (CPU 24) and Socket 0 (CPU 0) to test the PMem attached to the other socket
# mlc --idle_latency -l256 -c24 -J/pmemfs0
# mlc --idle_latency -l256 -c0 -J/pmemfs1
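The eight idle-latency commands above differ only in the CPU, the PMem mount point, and whether -l256 is passed, so they can be generated in a loop. This is a sketch under the example system's assumptions (CPU 0 and CPU 24 as the first CPU of each socket, /pmemfs0 and /pmemfs1 as the mount points); the function name, the output file names, and the MLC variable used to override the command are hypothetical conveniences.

```shell
#!/bin/sh
# Sketch: run every idle-latency combination (local/remote, with and
# without -l256) and save each result to its own file in the current
# directory. Set MLC="echo mlc" for a dry run that only records the commands.
run_idle_latency_matrix() {
    mlc_cmd=${MLC:-mlc}
    for cpu_node in 0 1; do
        # First CPU of each socket on the example system (see numactl -H)
        if [ "$cpu_node" -eq 0 ]; then cpu=0; else cpu=24; fi
        for pmem_node in 0 1; do
            if [ "$cpu_node" -eq "$pmem_node" ]; then loc=local; else loc=remote; fi
            for stride in "" -l256; do
                tag=${stride:+l256}
                tag=${tag:-seq}
                out="idle_${tag}_${loc}_cpu${cpu}_pmemfs${pmem_node}.txt"
                # $stride is unquoted on purpose: an empty value adds no argument
                $mlc_cmd --idle_latency $stride -c"$cpu" -J"/pmemfs$pmem_node" > "$out"
            done
        done
    done
}
```

Run it as root from an empty results directory, for example `mkdir idle_results && cd idle_results && run_idle_latency_matrix`.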
--- Sequential Read Loaded Latency Tests ---
// Create a file of delays to inject between memory accesses, simulating an application that does compute work between accesses.
// These values are the defaults used by MLC's loaded latency test. Feel free to use your own.
# cat <<EOF > loaded_latency_delays
00000
00002
00015
00050
00100
00200
00300
00400
00500
00700
01000
01300
01700
02500
03500
05000
09000
20000
EOF
// Create a per-thread test definition file as input to MLC.
// Here, we create a single-thread test and another test using all threads on Socket 0.
# echo "0 R seq 100000 pmem /pmemfs0" > PMem_PERTHREAD
# echo "0-23,48-71 R seq 100000 pmem /pmemfs0" >> PMem_PERTHREAD
// Run the MLC test using the delay and per-thread input files. Each test runs for 5 seconds.
# mlc --loaded_latency -g./loaded_latency_delays -o./PMem_PERTHREAD -t5 > llat_seq_READ.txt
--- Random Read Loaded Latency Tests ---
// Reuse the same loaded_latency_delays file
// Create a per-thread test definition file as input to MLC.
// Here, we create a single-thread test and another test using all threads on Socket 0.
# echo "0 R rand 100000 pmem /pmemfs0" > PMem_PERTHREAD
# echo "0-23,48-71 R rand 100000 pmem /pmemfs0" >> PMem_PERTHREAD
// Run the MLC test using the delay and per-thread input files. Each test runs for 5 seconds.
# mlc --loaded_latency -g./loaded_latency_delays -o./PMem_PERTHREAD -t5 > llat_rand_READ.txt
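The sequential and random loaded-latency runs differ only in the access pattern, so both can be driven from one loop. A minimal sketch that writes the same two per-thread lines as above into a separate file per pattern (so neither run overwrites the other's input) and then runs MLC with the same options; the function name, the per-pattern file names, and the MLC override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: one per-thread file and one loaded-latency run per access pattern.
# Assumes ./loaded_latency_delays already exists and /pmemfs0 is DAX-mounted.
# Set MLC="echo mlc" for a dry run that only records the commands.
run_loaded_latency() {
    mlc_cmd=${MLC:-mlc}
    delays=./loaded_latency_delays
    for pattern in seq rand; do
        cfg="PMem_PERTHREAD_${pattern}"
        # CPU 0 alone, then all Socket 0 CPUs (same lines as the echo commands)
        printf '%s R %s 100000 pmem /pmemfs0\n' 0 "$pattern" > "$cfg"
        printf '%s R %s 100000 pmem /pmemfs0\n' 0-23,48-71 "$pattern" >> "$cfg"
        # -t5: each test runs for 5 seconds
        $mlc_cmd --loaded_latency -g"$delays" -o"./$cfg" -t5 > "llat_${pattern}_READ.txt"
    done
}
```

The results end up in llat_seq_READ.txt and llat_rand_READ.txt, the same names used above.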
--- Max Bandwidth Tests ---
MLC has a wide variety of built-in read/write tests. See the '-W' option:
-Wn where n means
2 - 2:1 read-write ratio
3 - 3:1 read-write ratio
4 - 3:2 read-write ratio
5 - 1:1 read-write ratio
6 - 0:1 read-Non Temporal Write ratio
7 - 2:1 read-Non Temporal Write ratio
8 - 1:1 read-Non Temporal Write ratio
9 - 3:1 read-Non Temporal Write ratio
10 - 2:1 read-Non Temporal Write ratio (stream triad-like)
     Same as -W7 but the 2 reads are from 2 different buffers
11 - 3:1 read-Write ratio (stream triad-like with RFO)
     Same as -W3 but the 2 reads are from 2 different buffers
12 - 4:1 read-Write ratio
21 - 100% read with 2 addr streams - valid with only -o option
23 - 3:1 read-write ratio with 2 addr streams - valid with only -o option
27 - 2:1 read-NT write with 2 addr streams - valid with only -o option
To use this, we need to create an input file, just like we did for the loaded latency tests.
// Create a workload file
# cat <<EOF > PMem_PERTHREAD
#CPUs Traffic type seq or rand buffer size pmem or dram pmem path
0-23,48-71 W2 rand 100000 pmem /pmemfs0
EOF
// Run the test with no injected delay (-d0) for 10 seconds (-t10)
# mlc --loaded_latency -d0 -o./PMem_PERTHREAD -t10 -T -Z
// Using the -W option, you can run additional tests (one test per PMem_PERTHREAD file), such as:
#CPUs Traffic type seq or rand buffer size pmem or dram pmem path
0-23,48-71 R seq 100000 pmem /pmemfs0
0-23,48-71 R rand 100000 pmem /pmemfs0
0-23,48-71 W2 seq 100000 pmem /pmemfs0
0-23,48-71 W2 rand 100000 pmem /pmemfs0
0-23,48-71 W5 seq 100000 pmem /pmemfs0
0-23,48-71 W5 rand 100000 pmem /pmemfs0
0-23,48-71 W6 seq 100000 pmem /pmemfs0
0-23,48-71 W6 rand 100000 pmem /pmemfs0
0-23,48-71 W7 seq 100000 pmem /pmemfs0
0-23,48-71 W7 rand 100000 pmem /pmemfs0
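Since each PMem_PERTHREAD file describes a single test, the list above can be turned into a sweep that writes one file per traffic type and access pattern and runs MLC with the same options as the bandwidth test. This is a sketch assuming the example's CPU list (all of Socket 0), buffer size, and /pmemfs0 mount point; the function name, file names, and MLC override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: one per-thread file and one bandwidth run for each traffic type
# (R, W2, W5, W6, W7) and access pattern (seq, rand), 10 runs in total.
# Set MLC="echo mlc" for a dry run that only records the commands.
run_bw_sweep() {
    mlc_cmd=${MLC:-mlc}
    cpus=0-23,48-71   # all Socket 0 CPUs on the example system
    for traffic in R W2 W5 W6 W7; do
        for pattern in seq rand; do
            cfg="PMem_PERTHREAD_${traffic}_${pattern}"
            printf '%s %s %s 100000 pmem /pmemfs0\n' "$cpus" "$traffic" "$pattern" > "$cfg"
            # Same options as the bandwidth test: -d0 (no delay), -t10 (10 seconds)
            $mlc_cmd --loaded_latency -d0 -o"./$cfg" -t10 -T -Z > "bw_${traffic}_${pattern}.txt"
        done
    done
}
```

Because every run is a separate MLC invocation, a failed or interrupted run only affects its own bw_*.txt file, and extra traffic types can be added to the outer loop.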
These approaches can be extended to many other tests, including, but not limited to: