New to CQEngine

ri...@thrymr.net

Mar 25, 2016, 6:12:05 AM
to cqengine-discuss
Thanks for creating such a cool project. Knowing that it is used at snapdeal.com gives me a greater degree of comfort. I just discovered it and am trying to build a small proof of concept.
I have two questions:
1. How does it work when the same application runs on multiple nodes? Is clustering supported out of the box, and if not, what is the process?
2. Can someone please point me to a tutorial or documentation on how to use this along with an ORM? We use the Play Framework (Java) with Ebean as a simple ORM layer.

Thanks in advance.
-Rishi

Niall Gallagher

Mar 25, 2016, 12:17:26 PM
to cqengine...@googlegroups.com
Hi Rishi,

Thanks. Well, it’s not documented, but CQEngine is in production at many more companies than those I mentioned, especially FinTech, cloud, and telecoms companies.

CQEngine isn’t distributed or clustered by default, but you can integrate it with various distributed caches or clustering solutions if you wish, as follows.

The idea is to think about replicating your data and querying it as separate problems:

- Problem 1 is Replication: you have a collection of objects you want to replicate between machines, and you want to be notified on each machine about objects being added or removed from the replicated collection on any machine. 
-> Solution: Nearly all distributed caches provide a “cache listener” feature to facilitate this. Take a look at Ehcache, Hazelcast, Infinispan, Coherence, etc.

- Problem 2 is Querying: you want to use CQEngine to be able to search for objects in the replicated collection.
-> Solution: Create a separate instance of CQEngine IndexedCollection on each machine. Then use the listener functionality provided by your distributed cache of choice, which notifies each machine of object additions or removals in the cluster, to add and remove the same objects to/from the IndexedCollection on the local machine. You can then search the replicated copy of the collection via the CQEngine IndexedCollection on every machine.

Note that the consistency of the set of objects in the IndexedCollection between machines would rely entirely on the consistency guarantees provided by your distributed cache of choice. There are various tradeoffs and caveats in this space, which is why I’d prefer to allow CQEngine to integrate with various third-party caches rather than recommend a particular one.
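
The listener approach described above can be sketched roughly as follows. This is a minimal illustration, not CQEngine or cache API: `ReplicationListener` and `LocalMirror` are hypothetical names, and the interface only stands in for whatever listener type your distributed cache actually defines. The local collection is typed as `java.util.Set`, on the assumption that a CQEngine IndexedCollection (which implements `java.util.Set`) would be passed in.

```java
import java.util.Set;

// Hypothetical callback interface standing in for a distributed cache's
// listener API. Real caches (Hazelcast, Ehcache, etc.) define their own types.
interface ReplicationListener<T> {
    void onAdded(T object);
    void onRemoved(T object);
}

// Mirrors cluster-wide add/remove notifications into a local Set.
// Assumption: in a real setup the Set would be a CQEngine IndexedCollection,
// so every mirrored object becomes searchable on this machine.
final class LocalMirror<T> implements ReplicationListener<T> {
    private final Set<T> local;

    LocalMirror(Set<T> local) {
        this.local = local;
    }

    @Override
    public void onAdded(T object) {
        // An object was added somewhere in the cluster: add it locally too.
        local.add(object);
    }

    @Override
    public void onRemoved(T object) {
        // An object was removed somewhere in the cluster: remove it locally too.
        local.remove(object);
    }
}
```

One `LocalMirror` would be registered with the cache on each machine. How quickly and in what order the events arrive is decided by the cache, so the local copies are only as consistent as the cache's own guarantees.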

To answer your second question (ORM): I don’t really know how ORM integration could be documented per se, as it’s quite simple. For example, if you are using Hibernate, just get Hibernate to load a List of Car objects from your database table “Cars”. Then call indexedCollection.addAll(cars);
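
As a rough sketch of that load-then-addAll step, assuming nothing about the ORM beyond "it can return a List": `BulkLoader` and the `ormQuery` supplier are hypothetical names, and the supplier stands in for whatever list query Hibernate or Ebean exposes. The target is typed as `java.util.Set`, which a CQEngine IndexedCollection implements.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

final class BulkLoader {
    private BulkLoader() {
    }

    // Runs the supplied ORM query once and copies every returned row into
    // the target collection. Returns the number of rows the query produced.
    // Assumption: 'ormQuery' wraps the ORM's own "load all rows" call.
    static <T> int loadInto(Set<T> target, Supplier<List<T>> ormQuery) {
        List<T> rows = ormQuery.get();
        target.addAll(rows);
        return rows.size();
    }
}
```

Later inserts, updates, and deletes made through the ORM would not show up in the collection unless the application also applies them to it.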

Hope that helps!
Niall


--
-- You received this message because you are subscribed to the "cqengine-discuss" group.
http://groups.google.com/group/cqengine-discuss
---
You received this message because you are subscribed to the Google Groups "cqengine-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cqengine-discu...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Winarto Zhao

May 3, 2017, 11:27:12 AM
to cqengine-discuss
The only problem with CQEngine in a cluster is that each instance of IndexedCollection has to hold the entire collection. If the whole collection of objects fits into a single JVM's memory, then the cluster can work. But if it doesn't fit and needs to be spread across JVMs, then CQEngine can't index the objects held in the other JVMs.

I did a POC with that a few years ago, but I still haven't found a solution. Not sure if there is one now.

bugfoot

May 5, 2017, 3:26:50 PM
to cqengine-discuss
Why is separate indexing a problem if you can later retrieve the ResultSets individually and "merge" them? I'm not even sure latency is a problem in this case... does it take longer to retrieve the results from multiple collections than from their union collection?
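
The "query each collection and merge" idea could look something like the sketch below. It is only an in-process illustration: `ScatterGather` is a hypothetical name, the `Predicate` stands in for a CQEngine query, and each shard is a plain `java.util.Set`. In a real cluster each shard would live in a different JVM and every per-shard query would be a remote call, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class ScatterGather {
    private ScatterGather() {
    }

    // Applies the same filter to every shard and concatenates the matches.
    // Assumption: shards are disjoint, so no de-duplication is attempted.
    static <T> List<T> queryAll(List<Set<T>> shards, Predicate<T> filter) {
        List<T> merged = new ArrayList<>();
        for (Set<T> shard : shards) {
            for (T item : shard) {
                if (filter.test(item)) {
                    merged.add(item);
                }
            }
        }
        return merged;
    }
}
```

Queries that need a global ordering or a limit would need extra work in the merge step, because each shard only knows about its own objects.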

Niall

May 29, 2017, 5:24:53 PM
to cqengine-discuss
Hi Winarto,

I recall we had a similar conversation a few years back - and back then you had integrated CQEngine with Hazelcast: https://groups.google.com/d/msg/cqengine-discuss/F6P_HfLEhnc/kCxOZgqzRhEJ

CQEngine itself does not have any built-in support for sharding.
However I believe that sharding should be a separate concern, handled by other libraries or frameworks.

CQEngine's IndexedCollection is an implementation of java.util.Set.
So it should be trivial for any sharding library or framework to integrate with it. (One example would be to use the approach I posted above.)

There is more to sharding than the partitioning and replication of data throughout the cluster though. For example if the data is sharded within the cluster, then the application will need to take care to route queries for a particular partition of the data to the correct shard.
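
To illustrate the routing concern, here is a minimal sketch of hash-based routing, assuming each object has a partition key. `ShardRouter` is a hypothetical name and hash-modulo is just one common scheme, not something CQEngine provides. Each shard is a plain `java.util.Set` here; with CQEngine it could be an IndexedCollection, possibly on another machine.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Routes each partition key to one of a fixed number of shards.
final class ShardRouter<K, T> {
    private final List<Set<T>> shards = new ArrayList<>();

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashSet<>());
        }
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    int shardIndexFor(K key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    // Stores the value in the shard that owns the key.
    void put(K key, T value) {
        shards.get(shardIndexFor(key)).add(value);
    }

    // Returns the only shard that can hold objects for this key,
    // so a query for that key never needs to touch the other shards.
    Set<T> shardFor(K key) {
        return shards.get(shardIndexFor(key));
    }
}
```

Writes and reads must use the same key-to-shard function. Changing the shard count changes the mapping for most keys under this simple scheme.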

I have actually used CQEngine in a replicated sharded environment like that before, but the sharding approach was completely unique to the application. So this is a reason why I'd prefer to allow CQEngine to be integrated with arbitrary sharding solutions or frameworks, than to try to build one into CQEngine itself.

I'm open to suggestions if any of this could be improved though!

Best regards,
Niall

Winarto

May 30, 2017, 2:32:53 PM
to cqengine-discuss
Hi Niall,

Yes, I totally agree that sharding is not a concern for CQEngine. CQEngine is totally awesome with its indexing and super-fast queries. However, its data is limited to the amount of memory allocated to the JVM it is running on. I've read that you've improved CQEngine to run off-heap, with some performance trade-off.

When I did a POC of CQEngine with Hazelcast a few years ago, I managed to shard the collection across multiple JVMs; however, the index itself remained in a single JVM and covered the whole collection. That's something I wasn't able to crack at the time, and then the POC time was over. I believe that if the index itself could be sharded across JVMs (of course with the ability to translate the index query through Hazelcast), perhaps it could work.

Cheers,
Wins

--

Regards,
Winarto

Niall

May 30, 2017, 7:19:43 PM
to cqengine-discuss
Yes, there is support for storing the collection in off-heap memory or on disk now. However, there are some performance tradeoffs (because following on-heap object references is simply orders of magnitude faster than having to do IO or serialization/deserialization). Pros and cons!
