How to detect programmatically whether a filesystem is DAX-enabled on Linux


steve

Sep 26, 2019, 5:24:07 PM
to pmem
Hi all,

In Windows, I can call GetVolumeInformationA and look at the FILE_DAX_VOLUME flag to see if a volume is DAX-enabled.
Is there a way to do that in Linux (C or C++)? I've been looking on the web for about half an hour and can't find anything.

Thanks.
------------
Steve Heller

Steve Scargall

Sep 26, 2019, 6:31:47 PM
to pmem
Hi Steve,

On Linux, we can use the 'mount -v' command to list the mounted filesystems and their mount options.

# mount -v | grep -i pmemfs
/dev/pmem0 on /pmemfs0 type ext4 (rw,relatime,seclabel,dax)

Programmatically, you can use the statvfs() call to get information about a mount point and its mount flags.  The statfs() and fstatfs() syscalls also report the file system's mount flags.

But that's probably not all the information you want.  You should check to see if the memory-mapped region is actually real persistent memory.  In PMDK, the libpmem library has the ability to query the memory mapping and tell you if it is or is not backed by persistent memory.  See the pmem_is_pmem and libpmem man pages. This is implemented in Windows and Linux through the is_pmem_detect() function defined here:


If you don't use PMDK, you can see how it's done and implement your own version.

A Linux example from the libpmem man page shows how you can detect pmem using the `is_pmem` boolean that gets populated upon successful mapping.  Then the app can take appropriate code paths for the flushing.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <libpmem.h>

/* using 4k of pmem for this example */
#define PMEM_LEN 4096

#define PATH "/pmem-fs/myfile"

int
main(int argc, char *argv[])
{
    char *pmemaddr;
    size_t mapped_len;
    int is_pmem;

    /* create a pmem file and memory map it */
    if ((pmemaddr = pmem_map_file(PATH, PMEM_LEN, PMEM_FILE_CREATE,
            0666, &mapped_len, &is_pmem)) == NULL) {
        perror("pmem_map_file");
        exit(1);
    }

    /* store a string to the persistent memory */
    strcpy(pmemaddr, "hello, persistent memory");

    /* flush above strcpy to persistence */
    if (is_pmem)
        pmem_persist(pmemaddr, mapped_len);
    else
        pmem_msync(pmemaddr, mapped_len);

    /*
     * Delete the mappings. The region is also
     * automatically unmapped when the process is
     * terminated.
     */
    pmem_unmap(pmemaddr, mapped_len);
}


We do the same check using higher-level libraries such as libpmemobj, libpmemblk, libpmemlog, etc.  Source code examples are in the PMDK GitHub repo - https://github.com/pmem/pmdk/tree/master/src/examples

HTH
    Steve

steve

Sep 26, 2019, 6:47:14 PM
to Steve Scargall, pmem
On Thu, 26 Sep 2019 15:31:47 -0700 (PDT), Steve Scargall <steve.s...@gmail.com> wrote:

>Hi Steve,
>
>On linux, we can use the 'mount -v' command to get the mounted filesystems
>and mount options.
>
># mount -v | grep -i pmemfs
>/dev/pmem0 on /pmemfs0 type ext4 (rw,relatime,seclabel,dax)
>
>Programmatically, you can use the statvfs()
><http://man7.org/linux/man-pages/man3/statvfs.3.html> call to get
>information about a mount point and the mount options/flags. The statfs()
>and fstatfs() syscalls also give you a pointer to the file system mount
>options.

I don't see any information about DAX in the statvfs documentation (https://www.systutorials.com/docs/linux/man/3-statvfs/). What am I missing?

>But that's probably not all the information you want. You should check to
>see if the memory-mapped region is actually real persistent memory.

As far as I can tell, all I need to know is whether the volume is DAX-enabled, which I can get in Windows via GetVolumeInformationA.
I can't imagine any way that mapping a region of memory to a file on a DAX-enabled volume would not result in real persistent memory.
Is that possible, and if so, how could that happen?
------------
Steve Heller

Adrian Jackson

Sep 26, 2019, 6:58:46 PM
to st...@steveheller.org, Steve Scargall, pmem
Might be available in the f_flag part of statvfs.
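
For reference, a minimal C sketch (not from the thread) of reading f_flag with statvfs(). The helper names get_mount_flags and print_mount_flags are hypothetical. It decodes only the two POSIX bits, ST_RDONLY and ST_NOSUID. To the best of my knowledge none of the ST_* constants in glibc's <sys/statvfs.h> describes DAX, so f_flag alone probably cannot answer the question.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper: fetch the mount flags of the file system
 * that holds `path`. Returns 0 on success, -1 if statvfs() fails.
 */
int get_mount_flags(const char *path, unsigned long *flags)
{
    struct statvfs vfs;

    assert(path != NULL && flags != NULL);
    if (statvfs(path, &vfs) != 0)
        return -1;
    *flags = vfs.f_flag;
    return 0;
}

/* Print the two POSIX-defined bits of f_flag. */
void print_mount_flags(unsigned long flags)
{
    printf("read-only: %s\n", (flags & ST_RDONLY) ? "yes" : "no");
    printf("nosuid:    %s\n", (flags & ST_NOSUID) ? "yes" : "no");
}
```

Calling get_mount_flags("/pmem1", &flags) followed by print_mount_flags(flags) shows what the call reports for a given mount point.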

cheers

adrianj

--
You received this message because you are subscribed to the Google Groups "pmem" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pmem+uns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pmem/rdfqoetacdc92dkbtb22hkktocad5pqamq%404ax.com.

Andy Rudoff

Sep 26, 2019, 7:00:55 PM
to pmem
Hi Steve,

What will you do differently based on the answer of whether it is DAX or not?  This comes up every now and then and the question usually indicates a misunderstanding of what DAX does exactly.  If you are making a decision on whether you can flush changes from user space, you don't want to check for DAX, you want to see if the file system allows a mapping with the MAP_SYNC flag.  If you are trying to determine if the page cache is being used, what will you do differently based on the answer?  Keep in mind that the dax mount option is currently the way that DAX mode is switched on for the entire file system, but there's movement towards making it a per-file flag so that's why understanding the exact semantics you're after is important.

Also remember that the answer is different for devdax devices. For user space flushing, it is also important to check the ACPI property to see whether CPU caches require flushing.  Roll all that complexity together and that's why we often just recommend calling pmem_map_file(), which digs through all the appropriate information on both Windows and Linux and for both FS dax and devdax...

-andy

steve

Sep 26, 2019, 7:05:24 PM
to Andy Rudoff, pmem
On Thu, 26 Sep 2019 16:00:55 -0700 (PDT), Andy Rudoff <an...@rudoff.com> wrote:

>Hi Steve,
>
>What will you do differently based on the answer of whether it is DAX or
>not?

Hi Andy,

All I'm trying to do is make it possible to warn my user that he is using a non-DAX device if he asks me to check.
I've done that myself by accident a number of times and it is very annoying.
------------
Steve Heller

Andy Rudoff

Sep 26, 2019, 7:26:37 PM
to pmem
Well, like I said, DAX means several things.  Just "checking for DAX" is almost certainly not what you want.

That said, finding the mount options used for a file system isn't that hard.  You could follow the steps these two commands follow:

# look up which file system a file lives on
$ df myfile
Filesystem      1K-blocks   Used Available Use% Mounted on
/dev/pmem1     1023068808 137776 970892276   1% /pmem1

# look up the mount options used for that file system
$ mount | grep /pmem1
/dev/pmem1 on /pmem1 type ext4 (rw,relatime,seclabel,dax)

If you don't want to execute those two commands, of course you can just do what they do programmatically to get the same information.

Just to repeat my earlier comment: the "dax" mount option is currently per file system, but we believe it will be changing to per-file so the way to perform this check will be different in the future.  There is also talk of adding an interface to check to see if the page cache is in use, but I don't believe that has shown up yet.

You sure you don't want to just call pmem_map_file() and look at the is_pmem flag?  :-)

-andy

steve

Sep 26, 2019, 7:30:13 PM
to Andy Rudoff, pmem
On Thu, 26 Sep 2019 16:26:37 -0700 (PDT), Andy Rudoff <an...@rudoff.com> wrote:

>Well, like I said, DAX means several things. Just "checking for DAX" is
>almost certainly not what you want.
>
>That said, finding the mount options used for a file system isn't that
>hard. You could follow the steps these two commands follow:
>
># look up which file system a file lives on
>$ df myfile
>Filesystem 1K-blocks Used Available Use% Mounted on
>/dev/pmem1 1023068808 137776 970892276 1% /pmem1
>
># look up the mount options used for that file system
>$ mount | grep /pmem1
>/dev/pmem1 on /pmem1 type ext4 (rw,relatime,seclabel,dax)
>
>If you don't want to execute those two commands, of course you can just do
>what they do programmatically to get the same information.

That would be good but I can't find detailed documentation on how to do it and I'm not that expert at Linux C programming.

>Just to repeat my earlier comment: the "dax" mount option is currently per
>file system, but we believe it will be changing to per-file so the way to
>perform this check will be different in the future. There is also talk of
>adding an interface to check to see if the page cache is in use, but I
>don't believe that has shown up yet.
>
>You sure you don't want to just call pmem_map_file() and look at the
>is_pmem flag? :-)

Yes. I'm using the LLFIO library, not pmem.
------------
Steve Heller

Jan K

Sep 27, 2019, 3:46:18 AM
to pmem
What comes to mind is probably a workaround, but it should give you
the desired result:
- stat / fstat the file you want to check
- from the result (field st_dev), get the device the file sits on and
extract its major and minor numbers
- read the file /proc/self/mountinfo
- look up the line with that major:minor (third column)
- check whether the last column (options) contains dax

in console, this would be:
DEV=$(stat -c %d /path/to/file)
awk '$3 ~ /^'$((DEV/0x100)):$((DEV%0x100))'$/ {if ($NF ~ "(^|,)dax(,|$)") print "yes"; else print "no"}' /proc/self/mountinfo

Regards,
Jan
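
The steps Jan lists can be sketched in C roughly as follows. This is an illustration, not code from the thread: the function names has_option, mountinfo_line_dax and file_on_dax_mount are hypothetical, and the sketch only looks for the literal option "dax" in the last field of each /proc/self/mountinfo line. As others in this thread point out, the mount option is not a reliable indicator of how accesses behave, so treat the result as a hint.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

/* Return 1 if the comma-separated list `opts` contains exactly `opt`. */
int has_option(const char *opts, const char *opt)
{
    assert(opts != NULL && opt != NULL);
    size_t n = strlen(opt);

    for (const char *p = opts; *p; ) {
        size_t len = strcspn(p, ",\n");
        if (len == n && strncmp(p, opt, n) == 0)
            return 1;
        p += len;
        if (*p != ',')
            break;          /* end of line or end of string */
        p++;
    }
    return 0;
}

/*
 * Check one mountinfo line. Returns -1 if the line does not describe
 * device maj:min; otherwise 1 or 0 depending on whether the last
 * field (the options column) contains "dax".
 */
int mountinfo_line_dax(const char *line, unsigned maj, unsigned min)
{
    unsigned lmaj, lmin;

    /* Fields 1-3 are: mount ID, parent ID, major:minor. */
    if (sscanf(line, "%*d %*d %u:%u", &lmaj, &lmin) != 2)
        return -1;
    if (lmaj != maj || lmin != min)
        return -1;

    const char *last = strrchr(line, ' ');
    return last ? has_option(last + 1, "dax") : 0;
}

/* Returns 1 (dax option present), 0 (absent), -1 (error or no match). */
int file_on_dax_mount(const char *path)
{
    struct stat st;
    char line[4096];
    int result = -1;

    if (stat(path, &st) != 0)
        return -1;

    FILE *fp = fopen("/proc/self/mountinfo", "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        int r = mountinfo_line_dax(line, major(st.st_dev), minor(st.st_dev));
        if (r >= 0) {
            result = r;
            break;
        }
    }
    fclose(fp);
    return result;
}
```

A caller would use file_on_dax_mount("/path/to/file") and print a warning when it returns 0.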

steve

Sep 27, 2019, 10:32:31 AM
to Jan K, pmem
Hi Jan,

I need to do this in a C++ program so I can warn the user that he is trying to open a non-DAX file when he specifies in his calling arguments that he
wants to use only DAX files in certain roles where access time is critical.

E.g., in some modes I have an index primary file and an index secondary file. The primary file is loaded into memory on startup and saved to storage
at shutdown, both sequentially. Thus, it doesn't matter very much if it is on pmem or a fast SSD (e.g., Optane SSD). However, the secondary file is
accessed randomly, so it is important that it be on the fastest access storage, i.e., pmem, if possible. So if the user has enough pmem that he can
put the secondary index file on pmem, he can specify an option that implies "Don't run the program if the secondary index file isn't on pmem". With
the option set, if he makes this mistake, the program will tell him so he can rectify it.
------------
Steve Heller

Jan K

Sep 27, 2019, 11:16:22 AM
to st...@steveheller.org, pmem
> I need to do this in a C++ program so I can warn the user that he is trying
> to open a non-DAX file when he specifies in his calling arguments that he
> wants to use only DAX files in certain roles where access time is critical.

That's quite obvious, you've put that in the title ;-)

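stat is an ordinary C function (https://linux.die.net/man/2/stat), so just
   #include <sys/stat.h>
   #include <sys/sysmacros.h>   /* major(), minor() on glibc */

   struct stat stats;
   stat("/path/to/some/file", &stats);   /* check the return value in real code */
   unsigned dev_major = major(stats.st_dev);
   unsigned dev_minor = minor(stats.st_dev);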

I have never heard of a way to get mount options without reading some
file. And /proc/self/mountinfo has all the data you need.
If you want to avoid parsing a file by hand, then you can have
getmntent and hasmntopt do that for you. But then you need to know on
which mountpoint your file resides. I'm afraid that would yield more
lines of code than what I proposed.
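
For comparison, here is a hedged sketch of the getmntent/hasmntopt route. It is not from the thread, and the names mntent_has_dax and dax_option_for_path are hypothetical. To find the mountpoint, it assumes that the mount whose directory has the same st_dev as the file is the one the file lives on, which is a simplification.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if the entry's option string contains "dax", else 0. */
int mntent_has_dax(const struct mntent *ent)
{
    assert(ent != NULL && ent->mnt_opts != NULL);
    return hasmntopt(ent, "dax") != NULL;
}

/* Returns 1 (dax option set), 0 (not set), -1 (error or mount not found). */
int dax_option_for_path(const char *path)
{
    struct stat file_st, mnt_st;
    struct mntent *ent;
    int result = -1;

    if (stat(path, &file_st) != 0)
        return -1;

    FILE *fp = setmntent("/proc/self/mounts", "r");
    if (fp == NULL)
        return -1;

    /* Assumption: matching st_dev identifies the file's mount. */
    while ((ent = getmntent(fp)) != NULL) {
        if (stat(ent->mnt_dir, &mnt_st) == 0 &&
            mnt_st.st_dev == file_st.st_dev) {
            result = mntent_has_dax(ent);
            break;
        }
    }
    endmntent(fp);
    return result;
}
```

This avoids hand-parsing a line, at the cost of one stat() call per mount entry.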

Regards,
Jan

Jeff Moyer

Sep 27, 2019, 11:45:05 AM
to Jan K, st...@steveheller.org, pmem
Jan K <jan.z....@gmail.com> writes:

>> I need to do this in a C++ program so I can warn the user that he is trying
>> to open a non-DAX file when he specifies in his calling arguments that he
>> wants to use only DAX files in certain roles where access time is critical.
>
> That's quite obvious, you've put that in the title ;-)
>
> stat is an ordinary C function (https://linux.die.net/man/2/stat), so just
>    struct stat stats;
>    stat("/path/to/some/file", &stats);
>    unsigned major = major(stats.st_dev);
>    unsigned minor = minor(stats.st_dev);
>
> I have never heard of a way to get mount options without reading some
> file. And /proc/self/mountinfo has all the data you need.
> If you want to avoid parsing a file by hand, then you can have
> getmntent and hasmntopt do that for you. But then you need to know on
> which mountpoint your file resides. I'm afraid that would yield more
> lines of code than what I proposed.

Please re-read what Andy has written. DAX may not mean what you think
it means. Steve, you said you want to ensure the data is actually on
persistent memory. You can create a file system on persistent memory
and mount it without the dax mount option. At that point, the data
lives in persistent memory, but accesses go through the page cache.
Does that still meet your criteria for the index living on a low latency
device?

See? It's not at all clear what you think DAX implies. So to repeat
Andy's question, what are you trying to accomplish?

-Jeff

Steve

Sep 27, 2019, 1:31:29 PM
to Jeff Moyer, Jan K, pmem
What I want to know is whether the file can be accessed randomly with sub-microsecond latency. On Windows I can find that out by checking a flag returned from the system call I cited in a previous message.

Steve Heller

Dan Williams

Sep 27, 2019, 2:32:13 PM
to Steve, Jeff Moyer, Jan K, pmem
It's simply broken for an application to make assumptions about
latency due to a "dax" flag. Here are a couple scenarios that would
break this assumption, I'm sure there are more:

* EXT4 and XFS have different responses to the dax mount option when
the underlying device fails the kernel's dax support checks. EXT4
fails to mount, XFS falls back to page cache with a warning in the
log. The mount option is unreliable for detecting DAX.

* The filesystem may need to do a significant amount of work to map a
block into the file. There are no guarantees that the block allocation
work and metadata operations complete with latency within the same
order of magnitude of a mapped access. The dax attribute is not a
QoS guarantee.

Jeff Moyer

Sep 27, 2019, 2:59:10 PM
to Dan Williams, Steve, Jan K, pmem
Dan Williams <dan.j.w...@gmail.com> writes:

> It's simply broken for an application to make assumptions about
> latency due to a "dax" flag.

Or any other flag. The OS doesn't, and *can't* guarantee anything about
access latency. But, we're probably focusing too much on the minutiae.
If I've understood the problem correctly, Steve wants a way to determine
that:

a) the file is stored on pmem
b) the kernel is involved with as little of the data path as possible

a) can be satisfied by looking at the device info in stat, and
correlating that with the device node name (is it pmemX or daxY).

b) is trickier. If you're only interested in loads/reads, then the dax
mount option may well be the best source of this info today (though I
hope that changes in the future). If you *also* need to perform stores
safely, then MAP_SYNC would be required, but that may incur extra
latency at page fault time. Without more information on how data is
made persistent, it's of course very difficult to be prescriptive, here.
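
A rough sketch of point a) for the block-device (pmemX) case, not taken from the thread: it resolves the /sys/dev/block/MAJOR:MINOR symlink for the file's st_dev and checks whether the device name starts with "pmem". The function names name_is_pmem and file_on_pmem_device are hypothetical. The device-dax (daxY) case is not covered, since that is a character device you would open directly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */
#include <unistd.h>

/* Return 1 if a kernel device name looks like pmemX (or a partition of it). */
int name_is_pmem(const char *name)
{
    assert(name != NULL);
    return strncmp(name, "pmem", 4) == 0;
}

/*
 * Returns 1 if the file's backing block device is named pmem*,
 * 0 if it has another name, -1 if the device could not be resolved
 * (for example, file systems with no block device behind them).
 */
int file_on_pmem_device(const char *path)
{
    struct stat st;
    char link[64];
    char target[4096];

    if (stat(path, &st) != 0)
        return -1;

    snprintf(link, sizeof link, "/sys/dev/block/%u:%u",
             major(st.st_dev), minor(st.st_dev));

    ssize_t n = readlink(link, target, sizeof target - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';

    /* The last path component of the symlink target is the device name. */
    const char *base = strrchr(target, '/');
    return name_is_pmem(base ? base + 1 : target);
}
```

This says where the data is stored, not how it is accessed, so it does not address point b).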

My recommendation is to call mmap with the MAP_SYNC|MAP_SHARED_VALIDATE
flags. That's the safest and easiest way to get the answers you are
looking for, Steve.
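
A minimal sketch of that mmap check follows. It is an illustration, not a definitive implementation: the function name supports_map_sync is hypothetical, the fallback #define values are assumed from the Linux UAPI headers for C libraries that do not expose them, and treating EINVAL as "not supported" is an assumption meant to cover kernels that predate MAP_SHARED_VALIDATE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fallbacks for older C library headers (assumed values). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Returns 1 if the file can be mapped with MAP_SYNC, 0 if the kernel
 * or file system refuses the flag, -1 on any other error (including
 * an empty file, which cannot be mapped).
 */
int supports_map_sync(const char *path)
{
    struct stat st;
    int result;

    assert(path != NULL);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (addr != MAP_FAILED) {
        munmap(addr, (size_t)st.st_size);
        result = 1;
    } else if (errno == EOPNOTSUPP || errno == EINVAL) {
        result = 0;     /* MAP_SYNC not available for this file */
    } else {
        result = -1;
    }
    close(fd);
    return result;
}
```

A return value of 0 would be the point at which to warn the user.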

> Here are a couple scenarios that would break this assumption, I'm sure
> there are more:
>
> * EXT4 and XFS have different responses to the dax mount option when
> the underlying device fails the kernel's dax support checks. EXT4
> fails to mount, XFS falls back to page cache with a warning in the
> log. The mount option is unreliable for detecting DAX.

In the xfs case, the dax mount option will not show up in /proc/mounts
if the fallback is triggered. Still, don't rely on this.

> * The filesystem may need to do a significant amount of work to map a
> block into the file. There are no guarantees that the block allocation
> work and metadata operations complete with latency within the same
> order of magnitude of a mapped access. The dax attribute is not a
> QoS guarantee.

And those pages can become unmapped, as well. Again, the OS doesn't and
can't guarantee access latencies, for *so* many reasons.

Cheers,
Jeff

Steve

Sep 27, 2019, 4:53:33 PM
to Jeff Moyer, Dan Williams, Jan K, pmem
Yes, thanks, Jeff. I want to know, when I start my program, whether I will be able to access existing data in a file without the intervention of the file system. If using mmap in that way will give me that information, that will solve my problem.

Steve Heller

Dan Williams

Sep 27, 2019, 6:08:46 PM
to Steve, Jeff Moyer, Jan K, pmem
There's no guarantee that the filesystem won't intervene, even with MAP_SYNC. All that flag gives you is the confirmation that it is safe to simply flush caches to persist data.

Dan Williams

Sep 27, 2019, 6:09:53 PM
to Steve, Jeff Moyer, Jan K, pmem
If you need explicit guarantees that no file system will intervene, that's device-dax.