Q) What Linux distro and version are you using? (cat /etc/os-release)
Q) What Kernel is installed? (uname -r)
Q) Having created the namespace, can you create either an xfs or ext4 filesystem on the /dev/pmemN device? If that works, then as far as the PMem emulation code goes it's working as expected, and I suspect a different Kernel memory-tracking issue is causing what you're observing - see Note [2] below.
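For reference, a minimal sketch of that check (the device name `/dev/pmem0` and mount point `/mnt/pmem0` are assumptions; substitute your own namespace's device):
```
# Create a filesystem on the emulated PMem device and mount it with DAX enabled
mkfs.ext4 /dev/pmem0            # or: mkfs.xfs /dev/pmem0
mkdir -p /mnt/pmem0
mount -o dax /dev/pmem0 /mnt/pmem0
mount | grep pmem               # confirm the dax mount option took effect
```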
Note [1]: If more than one persistent memory namespace is required, specify a memmap entry for each namespace. For example, "memmap=2G!12G memmap=2G!14G" will create two 2GB namespaces, one in the 12GB-14GB physical address range, the other in the 14GB-16GB range.
Note [2]: The assumption that memory is contiguously assigned may not hold, depending on the hardware and OS implementation. i.e., assuming Socket 0 == 0-128GB and Socket 1 == 128GB-256GB is likely to be invalid.
To validate the assumption in Note [2], and to ensure you choose the correct values for Socket 1, take a look at `numactl -H`, which will tell you whether the CPUs are contiguously assigned (as they are on Dell platforms) or assigned in smaller groups based on the hyperthread layout (as they are on most platforms). Here's an example of HT grouping:
```
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 0 size: 192111 MB
node 0 free: 127797 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 1 size: 193510 MB
node 1 free: 165117 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
```
The detailed memory layout is shown by `lsmem -a`, eg:
```
# lsmem -a
RANGE                                  SIZE  STATE REMOVABLE BLOCK
0x0000000000000000-0x000000007fffffff    2G online       yes     0
0x0000000100000000-0x000000017fffffff    2G online       yes     2
0x0000000180000000-0x00000001ffffffff    2G online       yes     3
0x0000000200000000-0x000000027fffffff    2G online       yes     4
0x0000000280000000-0x00000002ffffffff    2G online       yes     5
0x0000000300000000-0x000000037fffffff    2G online       yes     6
0x0000000380000000-0x00000003ffffffff    2G online       yes     7
0x0000000400000000-0x000000047fffffff    2G online       yes     8
[...snip...]
```
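The memory block size lsmem uses can also be read straight from sysfs; the value is reported in hex bytes (the 80000000 shown below is illustrative output corresponding to the 2GiB blocks above):
```
# cat /sys/devices/system/memory/block_size_bytes
80000000
```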
Or to group by NUMA node, use:
```
# lsmem --split node
RANGE                                  SIZE  STATE REMOVABLE   BLOCK
0x0000000000000000-0x000000007fffffff    2G online       yes       0
0x0000000100000000-0x000000307fffffff  190G online       yes    2-96
0x000001aa00000000-0x000001da7fffffff  194G online       yes 852-948
[...snip...]
```
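Newer util-linux versions can also print the NUMA node directly from lsmem, which shortcuts the sysfs lookup described next (a sketch; this assumes your lsmem build supports the NODE output column):
```
# lsmem -o RANGE,SIZE,NODE --split node
```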
This shows the start-end physical address range for each block (or group of blocks). You can obtain the NUMA node to memory block mapping from `/sys/devices/system/node/node[0-9]`, eg:
(Note, the following is in hex, the lsmem output is in decimal)
```
# pwd
/sys/devices/system/node/node0
# cat memory*/phys_index
00000000
0000000a
0000000b
0000000c
0000000d
0000000e
0000000f
00000010
[...snip...]
# pwd
/sys/devices/system/node/node1
# cat memory*/phys_index | head
00000354 <-- 852 decimal
00000355
00000356
00000357
00000358
00000359
0000035a
0000035b
0000035c
0000035d
```
The phys_index value == the lsmem block number, so now we can see which memory block ranges belong to which NUMA node. A quick hex-to-decimal conversion is shown below.
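A minimal conversion sketch (node1 is just the example node from above; printf handles the hex-to-decimal conversion):
```
# Print each of node1's memory block indexes in hex and decimal
for f in /sys/devices/system/node/node1/memory*/phys_index; do
    printf '%s -> %d\n' "$(cat "$f")" "0x$(cat "$f")"
done | head
```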
Using my Intel S2600WF server with 2 x 8260L CPUs, 192GiB DRAM and 1.5TiB PMem on each socket, we can calculate the correct values to emulate PMem on Socket 0 and Socket 1.
If I look at my e820 table, I can see there is 'usable' memory within the NUMA node regions:
```
# dmesg | grep e820 | grep usable
[...snip...]
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000303fffffff] usable <-- Socket 0
[ 0.000000] BIOS-e820: [mem 0x000001aa60000000-0x000001da5fffffff] usable <-- Socket 1
```
Refer to the previous `lsmem --split node` output to see how we map the ranges to sockets.
In my case, it would be safe to use some capacity within 4GiB-192GiB (Socket 0) and 1706GiB-1897GiB (Socket 1). Notice that the physical memory addresses for Socket 0 and Socket 1 are not contiguous! A quick way to verify these GiB boundaries is shown below.
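A sanity check using shell arithmetic to convert the e820 hex addresses to GiB (shift right by 30 bits):
```
echo $(( 0x0000000100000000 >> 30 ))   # 4    - Socket 0 usable starts at 4GiB
echo $(( 0x000000303fffffff >> 30 ))   # 192  - Socket 0 usable ends just below 193GiB
echo $(( 0x000001aa60000000 >> 30 ))   # 1705 - Socket 1 usable starts at ~1705.5GiB
echo $(( 0x000001da5fffffff >> 30 ))   # 1897 - Socket 1 usable ends just below 1897.5GiB
```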
Here's the free(1) output before I make any changes (the 376Gi total is the baseline we'll compare against later):
```
# free -h
              total        used        free      shared  buff/cache   available
Mem:          376Gi       7.3Gi       286Gi       1.6Gi        83Gi       365Gi
Swap:         8.3Gi          0B       8.3Gi
```
After adding two memmap entries, my boot command line looks like this:
```
# cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.16.14-200.fc35.x86_64 root=/dev/mapper/fedora_pmemdev1-root ro resume=/dev/mapper/fedora_pmemdev1-swap rd.lvm.lv=fedora_pmemdev1/root rd.lvm.lv=fedora_pmemdev1/swap systemd.unified_cgroup_hierarchy=0 memmap=64G!12G memmap=64G!1760G
```
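For reference, a sketch of how such entries can be added persistently on a Fedora/RHEL-style install using grubby (other distros typically edit GRUB_CMDLINE_LINUX in /etc/default/grub and regenerate the GRUB config instead):
```
# Append the two memmap entries to every installed kernel's command line
grubby --update-kernel=ALL --args="memmap=64G!12G memmap=64G!1760G"
reboot
```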
After a reboot, my memory now looks like this:
```
# dmesg | grep mem
[ 0.000000] reserve setup_data: [mem 0x0000000100000000-0x00000002ffffffff] usable
[ 0.000000] reserve setup_data: [mem 0x0000000300000000-0x00000012ffffffff] persistent (type 12) <-- Emulated PMem
[ 0.000000] reserve setup_data: [mem 0x0000001300000000-0x000000303fffffff] usable
[ 0.000000] reserve setup_data: [mem 0x0000003040000000-0x000000305fffffff] reserved
[ 0.000000] reserve setup_data: [mem 0x0000003060000000-0x000001aa5fffffff] persistent (type 7) <-- Physical PMem
[ 0.000000] reserve setup_data: [mem 0x000001aa60000000-0x000001b7ffffffff] usable
[ 0.000000] reserve setup_data: [mem 0x000001b800000000-0x000001c7ffffffff] persistent (type 12) <-- Emulated PMem
[ 0.000000] reserve setup_data: [mem 0x000001c800000000-0x000001da5fffffff] usable
[ 0.000000] reserve setup_data: [mem 0x000001da60000000-0x000003545fffffff] persistent (type 7) <-- Physical PMem
```
Here's the free(1) output, which lines up with my expectations (376GiB - 2 x 64GiB = 248GiB, roughly the 250Gi reported):
```
# free -h
              total        used        free      shared  buff/cache   available
Mem:          250Gi       6.7Gi       243Gi       2.0Mi       722Mi       242Gi
Swap:         8.3Gi          0B       8.3Gi
```
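If you have ndctl installed, you can confirm the emulated regions and (re)configure namespaces on them; a sketch, with region and namespace names that will differ on your system:
```
# List the regions and namespaces the kernel created from the memmap entries
ndctl list -Ru
# Reconfigure an existing namespace as fsdax (namespace0.0 is hypothetical)
ndctl create-namespace -f -e namespace0.0 --mode=fsdax
```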
Hope this helps.