How to allocate a 4KB page with PMDK

Wentao Huang

Oct 4, 2021, 3:04:31 PM

to pmem

Hi, I would like to test Optane performance with 4KB pages.

I know that, by default, the pmem_map_file function of the PMDK library maps the file with 2MB pages.

And if the data is not 2MB-aligned, the mapped memory will be backed by 4KB pages instead.

But I don't know how to make it not 2MB-aligned. I checked the addresses returned by pmem_map_file, and all of them are 2MB-aligned.

Can anyone provide a code snippet that shows how to allocate a 4KB page?

Many thanks for the help.

Mason, Tony

Oct 4, 2021, 3:51:54 PM

to Wentao Huang, pmem

TL;DR answer: use a Linux kernel without CONFIG_FS_DAX_PMD enabled.

Even if you have 4KB page alignment for your storage (which is what I did for our paper on the impact of page size; see "Unexpected Performance of Intel® Optane DC Persistent Memory", Journal Article, DOE PAGES (nsf.gov)), you are right that, by default, you will get 2MB allocations whenever the alignment works. The only way I found to do it reliably on Linux was to build a kernel without 2MB page support turned on; that is controlled by the CONFIG_FS_DAX_PMD kernel option. If you look in the Linux kernel sources (fs/dax.c), you can see how disabling it prevents anything other than 4KB pages from being returned.
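For reference, one way to check whether a given kernel was built with that option is to scan its config file for the line "CONFIG_FS_DAX_PMD=y". The following is a minimal sketch, not something taken from the kernel or PMDK: the helper names line_enables and config_option_enabled are made up for illustration, and the location of the config file (often /boot/config-<release>, but this varies by distribution) is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `line` is exactly "<option>=y"
 * (with or without a trailing newline), 0 otherwise. A disabled option
 * normally appears as "# <option> is not set", which does not match. */
int line_enables(const char *line, const char *option)
{
    size_t n = strlen(option);

    if (strncmp(line, option, n) != 0)
        return 0;
    return strcmp(line + n, "=y\n") == 0 || strcmp(line + n, "=y") == 0;
}

/* Hypothetical helper: scans a plain-text kernel config file.
 * Returns 1 if the option is built in, 0 if no such line is found,
 * and -1 if the file cannot be opened. */
int config_option_enabled(const char *path, const char *option)
{
    FILE *f = fopen(path, "r");
    char line[512];
    int found = 0;

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = line_enables(line, option);
    fclose(f);
    return found;
}
```

Calling config_option_enabled with the path of the running kernel's config file and "CONFIG_FS_DAX_PMD" would then report whether 2MB DAX mappings are compiled in. A compressed config such as /proc/config.gz is not handled by this sketch.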

In theory you can also do it by forcing lots of storage fragmentation (this is actually how I discovered that page size had quite a bit of impact, depending upon workload), but that's likely to take more time than just recompiling your kernel.

Another approach that I've used is devdax with 4KB page alignment set. Last I checked, PMDK wouldn't mmap the dax region directly, but the change was fairly small (at least for an experiment; it would be more work to do it more broadly/generally). We did that to confirm we could reliably control for 4KB, 2MB, and 1GB pages. The latter yield beneficial results for workloads whose working set exceeds what the TLB can cover, even though our CPU only had four 1GB TLB slots: the cost of a TLB miss and page table walk is staggeringly high, so the impact is large when switching from 4KB to 2MB pages and still substantial when moving from 2MB to 1GB pages. Last I checked there was no FSDAX support for 1GB pages in the Linux kernel; I just looked at the current code base and it still only offers 4KB/2MB page support.
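As a rough illustration of the devdax route, the sketch below maps a device-DAX character device directly with mmap instead of going through PMDK. It is not the PMDK change described above. It assumes a namespace that was already created in devdax mode with 4KB alignment (for example via ndctl's alignment option), the function name map_devdax is hypothetical, and a device path such as /dev/dax0.0 is only an example. Device DAX is expected to need a shared mapping whose length is a multiple of the device alignment.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: maps `len` bytes from the start of a device-DAX
 * character device. Returns the mapped address, or NULL if the device
 * cannot be opened or the mapping fails. */
void *map_devdax(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    void *addr;

    if (fd < 0)
        return NULL;

    /* MAP_SHARED so that stores reach the persistent memory itself. */
    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);

    return addr == MAP_FAILED ? NULL : addr;
}
```

A caller would pass the device path and a length that is a multiple of 4KB, check for NULL, and release the region with munmap when done. Whether the kernel actually backs the mapping with 4KB pages depends on the namespace alignment, so it is worth verifying on the target system.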

Tony Mason
