Using HugeCollections for off-heap collections?


Otis Gospodnetic

Apr 23, 2014, 12:13:53 AM
to cqengine...@googlegroups.com
Hi,

I'm new to CQEngine and haven't tried it yet, but I was recently looking at https://github.com/OpenHFT/HugeCollections (see https://github.com/OpenHFT/HugeCollections/wiki for some performance and memory numbers), and it made me wonder whether CQEngine would benefit from using the off-heap collections provided by the HugeCollections project.


What do you think?

Thanks,
Otis

John Smith

Apr 23, 2014, 9:29:34 AM
to cqengine...@googlegroups.com
Niall mentioned something about this. But he's a busy man :)

Niall

Apr 23, 2014, 1:28:01 PM
to cqengine...@googlegroups.com
:)

Hi Otis,

There was some earlier discussion about HugeCollections: https://groups.google.com/d/msg/cqengine-discuss/5GiVQLGKk4U/VHTT8bAuQLIJ
It's fair to say CQEngine would benefit a lot from off-heap collections. There is some support for that right now (as discussed there), but it requires a lot of "manual wiring", as you can see. This is definitely an area where I'd welcome any enhancements and patches :D

HTH,
Niall

Winarto Zhao

Apr 28, 2014, 8:04:26 AM
to cqengine...@googlegroups.com
Niall,

I tried to wire CQEngine with an OpenHFT HugeCollections collection as the underlying Set, but CQEngine's ResultSet is randomly empty when iterated. I can't work out when it will be empty and when it won't.

Do you know why this might happen? It doesn't behave that way if the underlying set is a JDK Set.

Do I also need to wire the Map used for indexing with HugeCollections?

Winarto Zhao

Apr 29, 2014, 6:15:04 AM
to cqengine...@googlegroups.com
Niall,

Further investigation shows that if the IndexedCollection is queried by multiple threads at the same time (using Java 8's parallel streams), one or more of the ResultSets come back empty.

Could you help to investigate?

Regards,
Winarto

Niall Gallagher

Apr 29, 2014, 2:35:34 PM
to cqengine...@googlegroups.com

Hi Winarto,

I'm away at the moment so can't check this until next week.

But in general:
- CQEngine expects all of the data structures wired in to support multithreaded access.
- If you wire in a custom implementation, ensure that it is thread safe. If the implementation is not inherently thread safe, you can make it thread safe by wrapping access to it in a read-write lock.
- It's important to do this on write paths to prevent corruption by multiple competing writes (obviously). But it's also important to do this on the read paths to attain volatile semantics, so that reading threads actually see data written by the writing threads. Without this, reading threads may not see objects written by other threads.
- CQEngine ResultSets are not designed to be iterated by more than one thread in parallel (but of course multiple threads may be iterating different ResultSets in parallel). So when setting up your parallel streams, ensure that the stream is parallelized after a single-threaded stage which iterates the ResultSet. That is, a single reader feeds objects to a pool of multiple worker threads.
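
To illustrate the points above, here is a minimal sketch in plain Java. It is not CQEngine or HugeCollections code: `LockedSet` and `SingleReaderDemo` are hypothetical names, a `HashSet` stands in for a non-thread-safe custom set, and a plain `Iterable` stands in for a ResultSet. It shows read and write paths both going through a read-write lock, and a single thread draining the results before parallel workers take over.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: guards a non-thread-safe Set with a read-write lock.
final class LockedSet<T> {
    private final Set<T> delegate;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    LockedSet(Set<T> delegate) {
        this.delegate = delegate;
    }

    // Write path: exclusive lock, so competing writes cannot corrupt the set.
    boolean add(T item) {
        lock.writeLock().lock();
        try {
            return delegate.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read path: shared lock, which also makes earlier writes visible to readers.
    boolean contains(T item) {
        lock.readLock().lock();
        try {
            return delegate.contains(item);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Copies the contents under the read lock so callers never iterate the live set.
    List<T> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(delegate);
        } finally {
            lock.readLock().unlock();
        }
    }
}

final class SingleReaderDemo {
    // Stage 1: exactly one thread iterates the results (stand-in for a ResultSet).
    static <T> List<T> drain(Iterable<T> results) {
        List<T> out = new ArrayList<>();
        for (T r : results) {
            out.add(r);
        }
        return out;
    }

    // Stage 2: only the drained copy is handed to the parallel workers.
    static int totalLength(Iterable<String> results) {
        return drain(results).parallelStream().mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        LockedSet<String> set = new LockedSet<>(new HashSet<>());
        set.add("ford");
        set.add("honda");
        System.out.println(totalLength(set.snapshot())); // prints 9
    }
}
```

The key design choice is that the parallel stream is created from the drained list, never from the result iterator itself.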

HTH,
Niall


Suminda Dharmasena

Jun 18, 2014, 12:05:55 PM
to cqengine...@googlegroups.com, Peter Lawrey
Adding Peter, who is behind OpenHFT, to the thread in case he can give more insight and contribute to the conversation.

Peter Lawrey

Jun 18, 2014, 4:04:40 PM
to cqengine...@googlegroups.com
SharedHashMap in HugeCollections is thread safe and concurrent across processes. It is also off heap, updated synchronously, and persisted (written to disk). The fastest time to update between processes is 40 ns, but more typical use cases are around 400 ns.

Niall Gallagher

Jun 19, 2014, 9:29:30 AM
to cqengine...@googlegroups.com
SharedHashMap looks very promising, thanks Peter.

There was an earlier discussion about using HugeCollections with CQEngine. Long story short: it is currently possible, but the user must avoid storing objects in the IndexedCollection and store foreign keys instead.

The current approach when indexes are on-heap is that indexes just store object references pointing to the original objects in the collection. But when indexes are off-heap, object references would be serialized such that each index would persist a complete copy of the indexed object, which would be much slower and use more disk space than necessary. So I'm thinking about adding support to configure an "object resolver" which maps objects to foreign keys, so that the foreign keys can be serialized instead.
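
As a rough illustration of the "object resolver" idea, here is a minimal sketch. It is not an existing CQEngine API: `ObjectResolver`, `Car`, `CarResolver`, and `ColourIndex` are hypothetical names, and plain `HashMap`s stand in for the off-heap maps. The index stores only compact foreign keys, and the resolver turns them back into objects at query time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical interface: maps an object to a foreign key and back.
interface ObjectResolver<K, O> {
    K keyOf(O object);
    O resolve(K key);
}

final class Car {
    final long id;
    final String colour;

    Car(long id, String colour) {
        this.id = id;
        this.colour = colour;
    }
}

// Holds the only full copy of each object, keyed by its id.
final class CarResolver implements ObjectResolver<Long, Car> {
    private final Map<Long, Car> primaryStore = new HashMap<>();

    void store(Car car) {
        primaryStore.put(car.id, car);
    }

    @Override
    public Long keyOf(Car car) {
        return car.id;
    }

    @Override
    public Car resolve(Long key) {
        return primaryStore.get(key);
    }
}

// An index that holds foreign keys rather than whole objects. If this map
// were off-heap, only the small Long keys would need to be serialized.
final class ColourIndex {
    private final Map<String, Set<Long>> keysByColour = new HashMap<>();
    private final ObjectResolver<Long, Car> resolver;

    ColourIndex(ObjectResolver<Long, Car> resolver) {
        this.resolver = resolver;
    }

    void add(Car car) {
        keysByColour.computeIfAbsent(car.colour, c -> new HashSet<>())
                    .add(resolver.keyOf(car));
    }

    // Looks up the keys for a colour, then resolves each key to its object.
    List<Car> retrieve(String colour) {
        List<Car> result = new ArrayList<>();
        Set<Long> keys = keysByColour.getOrDefault(colour, Collections.emptySet());
        for (Long key : keys) {
            result.add(resolver.resolve(key));
        }
        return result;
    }
}

final class ForeignKeyIndexDemo {
    public static void main(String[] args) {
        CarResolver resolver = new CarResolver();
        ColourIndex index = new ColourIndex(resolver);
        Car red = new Car(1L, "red");
        resolver.store(red);
        index.add(red);
        System.out.println(index.retrieve("red").get(0).id); // prints 1
    }
}
```

With this split, adding more indexes costs one key per entry rather than another serialized copy of the object.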

The two items on the CQEngine road map are: transaction isolation using MVCC instead of locking, and better support for off-heap collections.