RocksJava Performance


Jon Groff

Feb 22, 2024, 12:18:39 PM
to rocksdb
Greetings,

In general, is there a way to coax RocksJava to utilize more processor resources for better performance? I can elaborate on what I am doing, but briefly, I am wrapping RocksJava to create a NoSQL triple-store-style DB with a category-theoretic/functional flavor. My keys are a series of longs derived from UUIDs and serialized Comparators. Sampling from VisualVM shows the RocksJava 'get' soaking up a lot of time, but perfmon says very little CPU or disk is being used. I have multiple threads writing multiple databases, so I am invoking parallelism on my end. My key sizes hover around 1k, and I set up writes so keys have maximum variability in their starting bytes. I am using my own basic comparators derived from AbstractComparator. My options look like this:

    Options options = new Options();
    final Filter bloomFilter = new BloomFilter(10);
    final Statistics stats = new Statistics();
    options.setComparator(comparator);
    try {
        options.setCreateIfMissing(true)
               .setStatistics(stats)
               .setWriteBufferSize(8 * SizeUnit.MB)
               .setMaxWriteBufferNumber(3)
               .setMaxBackgroundJobs(24)
               .setCompressionType(CompressionType.SNAPPY_COMPRESSION)
               .setCompactionStyle(CompactionStyle.LEVEL);
    } catch (final IllegalArgumentException e) {
        bloomFilter.close();
        throw new RuntimeException(e);
    }
    // Note: each setMemTableConfig call replaces the previous one, so only
    // the final SkipListMemTableConfig below actually takes effect.
    options.setMemTableConfig(
        new HashSkipListMemTableConfig()
            .setHeight(4)
            .setBranchingFactor(4)
            .setBucketCount(2000000));
    options.setMemTableConfig(
        new HashLinkedListMemTableConfig()
            .setBucketCount(100000));
    options.setMemTableConfig(
        new VectorMemTableConfig().setReservedSize(10000));
    options.setMemTableConfig(new SkipListMemTableConfig());
    options.setTableFormatConfig(new PlainTableConfig());
    // Plain-Table requires mmap read
    options.setAllowMmapReads(true);
    // Likewise, the setTableFormatConfig call at the end overrides the
    // PlainTableConfig set just above.
    final BlockBasedTableConfig table_options = new BlockBasedTableConfig();
    Cache cache = new LRUCache(64 * 1024, 6);
    table_options.setBlockCache(cache)
                 .setFilterPolicy(bloomFilter)
                 .setBlockSizeDeviation(5)
                 .setBlockRestartInterval(10)
                 .setCacheIndexAndFilterBlocks(true)
                 .setBlockCacheCompressed(new LRUCache(64 * 1000, 10));
    options.setTableFormatConfig(table_options);

Thanks in advance,
Jon Groff

Alan Paxton

Feb 22, 2024, 12:52:32 PM
to rocksdb
Hi Jon,

One thing you might look at is your comparators. Java comparators are known to be a lot less performant than native C++ ones, due to the high bridging costs (see the comments in BytewiseComparator). If you are able to native-ize the comparator, or find a way to encode your keys consistent with an existing native comparator, that might help a lot.
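
For example, if you can serialize the keys so that their byte order already matches the logical order, the Java comparator can go away entirely (a sketch, assuming such an encoding exists for your keys):

    import org.rocksdb.BuiltinComparator;
    import org.rocksdb.Options;

    // With byte-wise ordered keys the native comparator can be used,
    // avoiding a JNI round-trip on every comparison.
    final Options options = new Options()
            .setCreateIfMissing(true)
            .setComparator(BuiltinComparator.BYTEWISE_COMPARATOR);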

--Alan

Jon Groff

Feb 22, 2024, 2:03:48 PM
to rocksdb

Hi Alan,
Thanks for the reply and suggestion. Unfortunately the system revolves around implementation of the Comparable interface and polymorphism in derived implementations, so 'nativizing' isn't practical. For future reference, the system is at github.com/neocoretechs/relatrix and github.com/neocoretechs/rocksack. I will look at the BytewiseComparator. The system is pretty performant on subclasses of keysets up to a point, then write performance systematically degrades to near uselessness: from 4000 to 5000 writes per second for the basic keyset class of 12 contiguous longs, down to around 1 write per second when writing around 6 subclasses of that class plus payloads of strings and integers of several hundred bytes. That's why I thought it might be a tuning issue. In general I am really happy with RocksJava (I am using v7.7.3) and this is my first real potential showstopper.
Thanks again,
Jon

Adam Retter

Feb 22, 2024, 6:59:58 PM
to rocksdb
Hi Jon,

Alan is correct that Java Comparators in RocksDB can impose a significant overhead. This is due to having to cross the JNI boundary between Java and C++ for each key comparison when storing values in the SST.

However, before we worry about comparators, may I suggest that you use some profiling tools to find out exactly where your bottleneck is. I would suggest that you need to profile both the Java code (YourKit Java Profiler, async-profiler, etc.) and the native C++ code (perf, gprof, Valgrind, Intel VTune, etc.) - in this way you will be able to confirm whether the problem is the Java Comparator API for RocksJava (i.e. crossing the boundary), the performance of your comparators themselves, or something else...
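
As a quick first cut, you already attach a Statistics object in your Options, so you can also dump it periodically to see things from the RocksDB side (a sketch against the RocksJava Statistics API; stats is the instance from your setup):

    import org.rocksdb.HistogramData;
    import org.rocksdb.HistogramType;
    import org.rocksdb.Statistics;
    import org.rocksdb.TickerType;

    static void dumpStats(final Statistics stats) {
        // Latency distribution of get() calls, in microseconds.
        final HistogramData get = stats.getHistogramData(HistogramType.DB_GET);
        System.out.printf("get: median=%.1fus p99=%.1fus%n",
                get.getMedian(), get.getPercentile99());
        // Cache behaviour and write stalls often explain "slow but idle" symptoms.
        System.out.println("cache hits:   " + stats.getTickerCount(TickerType.BLOCK_CACHE_HIT));
        System.out.println("cache misses: " + stats.getTickerCount(TickerType.BLOCK_CACHE_MISS));
        System.out.println("stall micros: " + stats.getTickerCount(TickerType.STALL_MICROS));
    }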

Kind regards, Adam.

Jon Groff

Feb 22, 2024, 9:26:47 PM
to rocksdb
Hi Adam,
Thanks for the response and suggestions. As I mentioned in the first rather dense posting, I utilized Java VisualVM sampling to determine that the bottleneck was in the RocksJava 'get'. Granted, it's not the best profiler, but it does show where the most time is spent in the Java code. It looks like about 5% of the time is being spent in the comparators, down the line to the RocksJava AbstractComparator, etc. I am using Externalizable serialized objects so I can control the key order in the classes. My custom comparator is derived from AbstractComparator and performs serialization and deserialization of the ByteBuffers, then invokes the Comparator compareTo. Nothing fancy or complex, just comparing long values after reading the bytes and converting. I rearranged the order of the 12 long keys in the triple store index so that the most significant values are stored first in the RocksDB byte key, and that seemed to help a bit with the initial writing, up to a point, then the bottlenecking resumed at larger volumes of data. I also adjusted the options and got some initial improvement, but again, larger volumes of data bogged down. Is there a definitive and verbose tuning guide somewhere?
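
Schematically, the comparator amounts to something like this (a simplified sketch that compares the longs straight from the buffers and assumes a fixed big-endian layout of 12 longs; the actual rocksack code deserializes the objects first and delegates to compareTo):

    import java.nio.ByteBuffer;
    import org.rocksdb.AbstractComparator;
    import org.rocksdb.ComparatorOptions;

    public class LongsComparator extends AbstractComparator {
        private static final int NUM_LONGS = 12; // keyset of 12 contiguous longs

        public LongsComparator(final ComparatorOptions copt) {
            super(copt);
        }

        @Override
        public String name() {
            return "LongsComparator";
        }

        @Override
        public int compare(final ByteBuffer a, final ByteBuffer b) {
            // Compare the long fields most-significant first;
            // getLong() advances each buffer's position by 8 bytes.
            for (int i = 0; i < NUM_LONGS; i++) {
                final int c = Long.compare(a.getLong(), b.getLong());
                if (c != 0) {
                    return c;
                }
            }
            return 0;
        }
    }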
Thanks,
Jon

Jon Groff

Feb 22, 2024, 9:50:21 PM
to rocksdb
Greetings,
It's possible I am misinterpreting the sampling. I was looking at this:
java.io.ObjectInputStream.readObject () 33,096 ms (9.2%) 33,096 ms (9.2%)

But it turns out that you have to manually expand everything to get any idea of actual totals. I am still hoping for a definitive tuning guide.
My quandary is that the code is just not using much in the way of resources. I could see it being slow if it were eating all available disk and CPU, but it's limping along at around 5% CPU and barely peaking at a few megabytes per second of disk writes. I am using a 12-core 4.3 GHz Threadripper Lenovo tower with an SSD, so while it's not a supercomputer, it's no slouch. I am using multiple threads writing multiple databases, and still resource utilization is minimal. This relates to my original question of how to coax RocksJava into utilizing more resources.

Thanks,
Jon

Adam Retter

Feb 23, 2024, 3:29:40 AM
to Jon Groff, rocksdb
> As I mentioned in the first rather dense posting I utilized Java VisualVM sampling to determine the bottleneck was in RocksJava 'get'.

Yes, understood.

> Granted, it's not the best profiler but it does show where the most time is spent in the Java code. It looks like about 5% of the time is being spent in the comparators down the line to RocksJava AbstractComparator, etc.

I am not sure that is fine-grained enough to know if the issue is the
JNI overhead or not. There is a lot going on behind "get(...)".

> I am using Externalizable serialized objects so I can control the key order in the classes. My custom comparator is derived from AbstractComparator and performs serialization and deserialization of the ByteBuffers, then invokes the Comparator compareTo. Nothing fancy or complex, just comparing long values after reading the bytes and converting.

If the issue does reveal itself to be JNI overhead for comparators implemented in Java, perhaps a different approach would be to have an internal and an external key. The internal key would be used for the key in the primary index that you store in RocksDB. Instead of the user implementing Comparable, they could implement a different interface, for example InternalKey, with a single method like InternalKey#toInternalKey() that returns a ByteBuffer (or byte[]) of a key that must be ordered in a byte-wise manner. That way you need not have custom comparators in RocksDB, and you could use the native (C++) built-in bytewise comparator, which is very fast.
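
A rough sketch of that idea (InternalKey and TwoLongKey are hypothetical names for illustration, not an existing RocksJava API):

    import java.nio.ByteBuffer;

    // Hypothetical interface: implementations must return bytes whose
    // unsigned lexicographic order matches the logical key order.
    public interface InternalKey {
        byte[] toInternalKey();
    }

    // Example for a key of two signed longs: encode big-endian and flip
    // the sign bit so signed numeric order matches unsigned byte order.
    final class TwoLongKey implements InternalKey {
        private final long hi;
        private final long lo;

        TwoLongKey(final long hi, final long lo) {
            this.hi = hi;
            this.lo = lo;
        }

        @Override
        public byte[] toInternalKey() {
            final ByteBuffer buf = ByteBuffer.allocate(16); // big-endian by default
            buf.putLong(hi ^ Long.MIN_VALUE); // flip sign bit
            buf.putLong(lo ^ Long.MIN_VALUE);
            return buf.array();
        }
    }

With keys encoded like that, you can leave options.setComparator(...) unset and the comparison never leaves C++.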

Another option may be to do something with GraalVM to compile Java
comparators into native code - but I haven't looked into options here
in any detail...

> I rearranged the order of the 12 long keys in the triple store index such that the most significant values are stored out in the byte key of RocksDb first, and that seems to help a bit in the initial writing up to a point, then the bottlenecking resumed at larger volumes of data.

I am not sure I follow! In this approach, have you gotten rid of the Java comparators? If so, and you still see a performance issue, then that indicates the problem is not your custom comparators or the JNI overhead, but something else!

> I also messed with options and get some initial improvement but again, larger volumes of data bogged down. Is there a definitive and verbose tuning guide somewhere?

There is no comprehensive guide, I am afraid. As this is an embeddable low-level library, how to tune it is quite application-specific.




--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk