Hi Maximilian,
The pattern of your workload makes all the difference here. DAX means you don't use the page cache, much like opening a file O_DIRECT. The page cache is there for a reason, so if you really have a workload that runs better without it, DAX makes sense. But I've seen people turn off the page cache, never find their data in DRAM as a result, and then wonder why performance didn't improve.
For pmem, there's a similar consideration. If you have a program that wants to update a persistent data structure like a tree or hash table, on storage those updates must be done with full block I/O. For example, to update a single 8-byte pointer and make it persistent, software must read a block from storage, make the change, then write the block back. For most storage, that means moving 4k blocks just to update 8 bytes. If you use pmem with DAX/mmap/MAP_SYNC, the same update can be made without moving any blocks: the store goes to the persistence domain, moving only the cache line that contains the data. Furthermore, it does so without calling into the kernel, so the overhead of kernel-based I/O is avoided entirely.
I gave that example of a small update to make a point: as you move larger and larger blocks of data, the kernel overhead becomes less and less of an issue. For sufficiently large transfers, the media itself becomes the bottleneck, so accessing a given type of media via DAX/mmap/MAP_SYNC and accessing the same media through an NVMe block device will approach the same performance as the block size grows.
Ultimately, the only concise answer to your question is to benchmark your workload to see which data path works better for it.
Hope that helps,
-andy