I agree the questions are simple. My answers are not as simple... sorry about that!
1. I understand that when I mmap it, I can bypass the page cache and directly access the pmem. But if I use the filesystem read/write APIs, would they bypass the page cache too?
The Linux file system community has been careful about what promises they make to applications. They want to provide the semantics applications expect, but they don't want to prevent file systems from doing optimizations that are transparent to applications. In this way, you can think of DAX as a "hint" to the file system, telling it the media allows direct access so the page cache is unnecessary. Technically, the file system could still decide to use the page cache, some of the time or all of the time, as long as the semantics applications expect are still met. In the current implementation, both ext4 and XFS skip the page cache on successful DAX mounts, even when you use read() and write(). But I'm giving the long answer because I want to make it clear that file systems reserve the right to change how they implement this.
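To make that concrete, here is a minimal sketch. The path here is a stand-in (on real hardware the file would live on a DAX mount, e.g. ext4 or XFS mounted with -o dax); the point is that the read()/write() calls look identical either way, and DAX only changes what the kernel does underneath.

```python
import os
import tempfile

# Stand-in for a file on a DAX mount (e.g. something under /mnt/pmem on
# ext4 mounted with -o dax). The syscalls below are the same either way;
# with DAX the kernel copies directly between the user buffer and the
# persistent media, with no page-cache pages in between.
fd, path = tempfile.mkstemp()
os.pwrite(fd, b"hello pmem", 0)   # write() path: direct to media under DAX
data = os.pread(fd, 10, 0)        # read() path: direct from media under DAX
os.close(fd)
```

Nothing in the application changes when DAX is in effect, which is exactly why the file system is free to treat it as a transparent optimization.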
2. If the answer to question 1 is that we still bypass the page cache when using the filesystem APIs, then if I write a single byte, how much data does the kernel write? A whole block or just a byte? Can the kernel guarantee the atomicity of the block?
For user data, there was never a guarantee of block atomicity in POSIX. This is one of the most misunderstood facts about file systems. When an application writes a block of data, a system crash can tear that write: you could see some old data, some new data, or something worse. (On an allocating write, where someone is appending data to a file, I've seen the file contain a block of zeros instead of the user data, because the crash happened after the file size changed but before the new data was flushed to the media.) Applications should not depend on failure atomicity of user data writes; POSIX never promised it. Only after a successful sync/fsync/msync, etc. does an application know the write is persistent. For memory-mapped files where the MAP_SYNC flag was successfully used, Linux extends this to allow flushing to persistence using user-space instructions like CLWB. But any application that takes crash consistency seriously will use techniques like logging, checksumming, or both to detect torn writes and recover from them after a crash.
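Here is a sketch of the MAP_SYNC path. The two flag values are hard-coded Linux constants because the Python mmap module does not export them; on non-DAX media the kernel rejects MAP_SYNC, so the fallback branch runs, which also illustrates the point that msync-style flushing is the durability point when you don't have MAP_SYNC.

```python
import mmap
import os
import tempfile

# Linux mmap(2) flag values; the mmap module does not export these two.
MAP_SHARED_VALIDATE = 0x03
MAP_SYNC = 0x80000

fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)            # the mapping needs backing blocks

try:
    # On a DAX filesystem this succeeds, and user-space flushes
    # (CLWB + fence, or the flush() below) are enough for persistence.
    mm = mmap.mmap(fd, 4096, flags=MAP_SHARED_VALIDATE | MAP_SYNC)
    mode = "MAP_SYNC"
except OSError:
    # Non-DAX media: the kernel refuses MAP_SYNC (EOPNOTSUPP), so fall
    # back to an ordinary shared mapping where msync() is required.
    mm = mmap.mmap(fd, 4096, flags=mmap.MAP_SHARED)
    mode = "MAP_SHARED"

mm[:5] = b"hello"                 # plain store into the mapping
mm.flush()                        # msync(): the durability point either way
mm.close()
os.close(fd)
```

Either way, the stores into the mapping can still be torn by a crash before the flush; MAP_SYNC only changes *how* you make them durable, not whether un-flushed writes are safe.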
Now that I've made that part clear, the answer to your first sentence is that if you are doing a write() to an already-allocated area of the file, the file system is free to use store instructions directly on the persistent media, and that's what both ext4 and XFS do. Of course, if it is an allocating write, like appending to a file or writing to a hole in a file, then a bunch of allocation/metadata logic will also happen as a result.
3. What about filesystem management APIs such as fallocate and ftruncate? When I use these management APIs, they modify the contents of the inode. If the kernel writes to the inode block like a normal block and cannot guarantee its atomicity, the filesystem might become inconsistent.
Both ext4 and XFS use journaling to provide consistency in the face of a crash: the metadata updates for an operation either commit together or not at all, so the filesystem itself stays consistent even though user data writes are not atomic.
-andy