RocksDB put performance


Adam ABED ABUD

Nov 23, 2020, 7:59:15 AM
to rocksdb
Hello,

I'm trying to optimize the RocksDB put requests.

I have an SSD that is capable of sustaining more than 2 GB/s but the "put" throughput that I get is around 1000 MB/s. Is this expected? Or is there a way to get bandwidths as close as possible to the raw performance of a single drive?


My guess is that flushing to disk happens with a small block size. As an example, in raw benchmarks of the drive I get around 1000 MB/s at a 32 KB block size for sequential writes.


I have tried modifying a lot of option parameters, but without any success.

Any feedback is more than welcome.


Further info: I'm using the C++ API on a Linux machine.

Thanks,
Adam

Xiaofan Chen

Nov 23, 2020, 10:26:58 AM
to rocksdb
watching

Siying Dong

Dec 1, 2020, 10:26:27 PM
to Xiaofan Chen, rocksdb

By default, if you use direct I/O, the write size is 1 MB, and without direct I/O the OS buffers the writes. So a small flush block size is unlikely to be the reason.
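For reference, direct I/O and the direct-write buffer size are controlled through the options struct; a minimal sketch (field names per the public rocksdb/options.h header):

```cpp
#include <rocksdb/options.h>

// Sketch: enable direct I/O for flushes and compactions, and set the
// buffer size used for direct writes (this is the field that defaults to 1 MB).
rocksdb::Options MakeDirectIoOptions() {
    rocksdb::Options options;
    options.use_direct_io_for_flush_and_compaction = true;  // bypass the OS page cache on writes
    options.use_direct_reads = true;                        // optional: direct reads as well
    options.writable_file_max_buffer_size = 1024 * 1024;    // 1 MB direct-write buffer
    return options;
}
```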

Can you clarify where the 1000 MB/s figure comes from? Did you insert into RocksDB at a rate of 1000 MB/s, or did you see from iostat that the disk does 1000 MB/s?


Adam ABED ABUD

Dec 2, 2020, 9:50:26 AM
to rocksdb
Ah, that's interesting to know. I also tried with and without direct I/O, and it did not change the performance much.

I got the 1000 MB/s figure by counting the number of keys inserted into the database over a fixed amount of time (e.g. 60 seconds), multiplying that count by the value size, and dividing by the elapsed time.

I really don't understand where the bottleneck in the system is. Please let me know if you have any experience or suggestions.

Thanks a lot.

Siying Dong

Dec 2, 2020, 1:49:45 PM
to Adam ABED ABUD, rocksdb

Well, 60 seconds is usually too short for measuring sustainable write throughput, and I do expect the number to drop after 60 seconds. The RocksDB layer usually introduces some write amplification, between 3 and 20. So if the drive can do 2 GB/s, we are usually happy with a logical ingestion rate of 200 MB/s or so if you never fine-tune anything.

 

If you only insert for 60 seconds, the bottleneck is likely the speed at which RocksDB can insert into a single DB, e.g. inserting into the write buffer, writing to the WAL, tracking write ordering, etc. This limitation can be mitigated by opening multiple RocksDB instances and sharding writes across them.

Adam ABED ABUD

Dec 3, 2020, 2:53:59 AM
to rocksdb
I also tried with longer testing times and it did not change anything.

I tried fine-tuning the system and apparently I cannot get a write amplification lower than 2-3.

Thanks a lot for your input!