The following is the explanation of the RTT from https://docs.kernel.org/driver-api/nvdimm/btt.html?highlight=btt:

Consider a case where we have two threads, one doing reads and the other, writes. We can hit a condition where the writer thread grabs a free block to do a new IO, but the (slow) reader thread is still reading from it. In other words, the reader consulted a map entry, and started reading the corresponding block. A writer started writing to the same external LBA, and finished the write updating the map for that external LBA to point to its new postmap ABA. At this point the internal, postmap block that the reader is (still) reading has been inserted into the list of free blocks. If another write comes in for the same LBA, it can grab this free block, and start writing to it, causing the reader to read incorrect data. To prevent this, we introduce the RTT.

The RTT is a simple, per arena table with ‘nfree’ entries. Every reader inserts into rtt[lane_number], the postmap ABA it is reading, and clears it after the read is complete. Every writer thread, after grabbing a free block, checks the RTT for its presence. If the postmap free block is in the RTT, it waits till the reader clears the RTT entry, and only then starts writing to it.
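To make the quoted rule concrete, here is a minimal single-threaded model of the RTT. It is only a sketch: the class and method names (Arena, reader_enter, reader_exit, writer_may_use) are made up for illustration and are not taken from the kernel or libpmemblk sources.

```python
class Arena:
    """Toy model of one BTT arena (hypothetical names, not kernel code)."""

    def __init__(self, nfree):
        self.rtt = [None] * nfree       # one RTT slot per lane

    def reader_enter(self, lane, aba):
        self.rtt[lane] = aba            # publish the postmap ABA being read

    def reader_exit(self, lane):
        self.rtt[lane] = None           # read finished, clear the slot

    def writer_may_use(self, aba):
        # A writer may overwrite `aba` only if no lane is reading it.
        return aba not in self.rtt


arena = Arena(nfree=4)
arena.reader_enter(lane=0, aba=7)
blocked = not arena.writer_may_use(7)   # writer would have to wait here
arena.reader_exit(lane=0)
allowed = arena.writer_may_use(7)       # reader is done, writer can proceed
```

In the real code the writer spins until the entry is cleared instead of getting a yes/no answer; the model only shows which condition it waits on.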
But when I reviewed the libpmemblk and kernel BTT code and the RTT logic, it seemed to me that the logic cannot prevent a read and a write from hitting the same data block. There is no lock protecting the RTT. The documentation says that every writer thread, after grabbing a free block, checks the RTT for its presence. But if the reader thread has not yet written its RTT entry when the writer checks, the writer will start writing to that ABA. The reader thread then writes its RTT entry and starts reading, and so it reads dirty data. Could you please help explain?
As I see it, we need some lock to enforce the order between the RTT write in the reader thread and the RTT read in the writer thread. The current logic uses map_locks to serialize different threads writing to the same LBA, but there is no lock that covers a read and a write of the same ABA.
I would like to hear more from you, and I would appreciate your help in making the logic clear to me.
I think this is a corner case and hard to validate. Since we do not know the exact execution order of every step, I have defined the following steps; please take a look. We can see that steps 12 and 10 operate on the same ABA.
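The interleaving I am worried about can be modeled as a fixed sequence of steps. This is only a sketch under one assumption: the reader consults the map a single time before setting its RTT entry. The block numbers and variable names are invented for the example and do not correspond to the numbered steps mentioned above.

```python
data = {5: "old", 6: "unused"}   # postmap ABA -> block contents
btt_map = {0: 5}                 # external LBA 0 -> postmap ABA 5
free = [6]                       # free postmap blocks
rtt = [None]                     # a single reader lane

# Reader: consults the map once, but has not set its RTT entry yet.
reader_aba = btt_map[0]

# Writer A: writes LBA 0 into free block 6, updates the map,
# and the old block 5 goes onto the free list.
new_block = free.pop()
data[new_block] = "A"
old_block = btt_map[0]
btt_map[0] = new_block
free.append(old_block)

# Writer B: grabs block 5, checks the RTT (still empty), starts writing.
b_block = free.pop()
rtt_was_clear = b_block not in rtt
data[b_block] = "B (write in progress)"

# Reader: only now publishes its RTT entry and reads the block.
rtt[0] = reader_aba
seen = data[reader_aba]          # the reader sees writer B's partial data
```

With this ordering the RTT check in writer B passes, and the reader still ends up reading the block that writer B is modifying.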
You received this message because you are subscribed to the Google Groups "pmem" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pmem+uns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pmem/a691ef37-9744-4122-8eb5-e78f0093a1c4n%40googlegroups.com.
Yes, I got it. You are right: the reader reads the btt_map again to confirm whether the map has been updated. If it has been updated, the reader reads from the new location. If not, the RTT entry is at least set correctly.
The postmap blocks are divided among the lanes; in the picture, btt_write(lba1) and btt_write(lba2) cannot run in parallel (they are in the same lane).
The logic is right, but a read needs to read the btt_map twice.
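To check my understanding of the double map read, here is a small model of it. It is a sketch, not the kernel code: pick_block_to_read and the interleave parameter are hypothetical names, and interleave is only a test hook that stands in for a concurrent writer running between the RTT update and the second map read. Real code would also need the appropriate memory barriers, which a single-threaded model cannot show.

```python
def pick_block_to_read(btt_map, rtt, lane, lba, interleave=()):
    """Return the postmap ABA that is safe to read for `lba`.

    `interleave` is a hypothetical test hook: callables that run between
    the RTT publish and the map re-read, standing in for writers.
    """
    pending = list(interleave)
    aba = btt_map[lba]                 # first map read
    while True:
        rtt[lane] = aba                # publish before trusting the entry
        if pending:
            pending.pop(0)()           # a writer sneaks in here
        current = btt_map[lba]         # second map read
        if current == aba:
            return aba                 # map unchanged: RTT entry is valid
        aba = current                  # map moved: retry with the new block


btt_map = {0: 5}
rtt = [None]

def writer_remaps():
    btt_map[0] = 6                     # a write to LBA 0 completes

chosen = pick_block_to_read(btt_map, rtt, 0, 0, [writer_remaps])
```

The reasoning, as I understand it: a block is only put on the free list after the map has been updated, so if the map still points to the same ABA after the RTT entry is set, any writer that later grabs that block will find it in the RTT and wait.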
Thank you very much for the explanation.