The fstatfs check for ZFS was added in https://github.com/facebook/rocksdb/commit/8272a6de57ed701fb25bb660e074cab703ed3fe7 as part of https://github.com/facebook/rocksdb/pull/5183
The dynamic sync_file_range check was added in https://github.com/facebook/rocksdb/commit/2c9df9f9e5c757c8f368d0860e2da8adb63849a3 for https://github.com/facebook/rocksdb/pull/5416 . Interestingly, the PostgreSQL code it references ( https://github.com/postgres/postgres/commit/483520eca426fb1b428e8416d1d014ac5ad80ef4 ) uses a static bool, not_implemented_by_kernel, which is set to true when ENOSYS is returned. I don't think that approach could be used with the RocksDB code here, though, as the fds could be on different filesystems. Anyway, the sync_file_range call is always nice and fast.
Here is a simulated reproducer using RocksDB 6.20 on CentOS 8, with the database on /dev/shm and strace injecting a 100 ms delay on entry to each fstatfs call:
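For illustration, the process-wide flag idea can be sketched roughly as follows. This is not the PostgreSQL or RocksDB code: the class name CachedRangeSync and the RangeSyncFn callback (standing in for a call such as sync_file_range on an fd) are hypothetical, chosen so the sketch stays small and testable. It also shows the limitation mentioned above: one flag is remembered for every fd, regardless of which filesystem the fd lives on.

```cpp
#include <cerrno>
#include <functional>
#include <utility>

// Hypothetical stand-in for a range-sync call such as sync_file_range(fd, ...).
// It is assumed to return 0 on success, or -1 with errno set on failure.
using RangeSyncFn = std::function<int(int fd)>;

// Illustrative only: remembers ENOSYS once and never calls the function again.
class CachedRangeSync {
 public:
  explicit CachedRangeSync(RangeSyncFn fn) : fn_(std::move(fn)) {}

  // Returns true if the range sync ran and succeeded.
  // Returns false if the caller should fall back to another sync method.
  bool TrySync(int fd) {
    if (not_implemented_) {
      return false;  // cached answer, no system call is made
    }
    if (fn_(fd) == 0) {
      return true;
    }
    if (errno == ENOSYS) {
      // One flag for all fds: this is the part that would not fit a check
      // whose answer depends on the filesystem behind each fd.
      not_implemented_ = true;
    }
    return false;
  }

 private:
  RangeSyncFn fn_;
  bool not_implemented_ = false;
};
```

A per-filesystem variant would need to key the cached answer on something like the filesystem ID rather than a single bool, and that lookup is itself a system call.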
rm -rf /dev/shm/db_bench ; ./db_bench -benchmarks updaterandom -num 1 -db /dev/shm/db_bench > /dev/null ; strace -y -T -e trace=fstatfs -e inject=fstatfs:delay_enter=100ms ./db_bench -benchmarks updaterandom -num 0 -db /dev/shm/db_bench -use_existing_db true -report_open_timing true 2>&1 | egrep 'fstatfs|OpenDb' | awk '{print $1,$2,$3,$16}'
fstatfs(6</dev/shm/db_bench/MANIFEST-000008>, {f_type=TMPFS_MAGIC, f_bsize=4096, <0.099964>
fstatfs(7</dev/shm/db_bench/000008.dbtmp>, {f_type=TMPFS_MAGIC, f_bsize=4096, <0.100016>
fstatfs(8</dev/shm/db_bench/000009.sst>, {f_type=TMPFS_MAGIC, f_bsize=4096, <0.100020>
fstatfs(9</dev/shm/db_bench/MANIFEST-000010>, {f_type=TMPFS_MAGIC, f_bsize=4096, <0.100012>
fstatfs(6</dev/shm/db_bench/000010.dbtmp>, {f_type=TMPFS_MAGIC, f_bsize=4096, <0.100015>
fstatfs(6</dev/shm/db_bench/000005.log>, {f_type=TMPFS_MAGIC, f_bsize=4096, <0.100016>
fstatfs(6</dev/shm/db_bench/000011.log>, {f_type=TMPFS_MAGIC, f_bsize=4096, <0.100018>
fstatfs(10</dev/shm/db_bench/OPTIONS-000012.dbtmp>, {f_type=TMPFS_MAGIC, f_bsize=4096, <0.100008>
OpenDb: 814.075 milliseconds
The eight delayed fstatfs calls above add up to about 800 ms, which accounts for nearly all of the 814 ms OpenDb time.
I was thinking that a compile-time define (DO_NOT_CHECK_ZFS_SYNC_FILE_RANGE) around the code in
would be a simple fix for me, as I would define it when building. Would a patch like this be of interest as a PR?
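As a rough sketch of what I mean (not a patch against the real source): the helper name ShouldSkipRangeSyncSketch and the constant kZfsSuperMagicSketch are made up for illustration, and the magic value is my assumption of what the ZFS check compares against. Only the macro name is the one proposed above. It is Linux-only because of fstatfs and sys/vfs.h.

```cpp
#include <sys/vfs.h>  // fstatfs(), struct statfs (Linux-specific)

// Assumed value of the ZFS superblock magic, for illustration only.
constexpr long kZfsSuperMagicSketch = 0x2fc12fc1;

// Hypothetical helper: returns true when range sync should be skipped
// because the fd is on ZFS. Building with
// -DDO_NOT_CHECK_ZFS_SYNC_FILE_RANGE compiles the fstatfs call out entirely.
bool ShouldSkipRangeSyncSketch(int fd) {
#ifndef DO_NOT_CHECK_ZFS_SYNC_FILE_RANGE
  struct statfs buf;
  if (fstatfs(fd, &buf) == 0 &&
      static_cast<long>(buf.f_type) == kZfsSuperMagicSketch) {
    return true;  // fd is on ZFS
  }
#else
  (void)fd;  // no filesystem lookup, so no fstatfs latency per opened file
#endif
  return false;
}
```

With the define set, anyone who knows they never run on ZFS avoids one fstatfs per opened file; without it, behaviour stays as it is today.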
Cheers,
Peter (Stig) Edwards