When to use Solr vs. Orient Lucene index?

Thelonius Buddha

Jul 6, 2015, 6:18:50 PM

to orient-...@googlegroups.com

We have some needs around basic text search of internal documents. From what I'm reading, it appears OrientDB can handle much of the basic searching needs. Can anyone provide insight on when I may need to include or migrate to Solr for text search? Here are some of my concerns:

1. The ability to update quickly with a tremendous amount of meta-data for each document.

2. Distributed indices over multiple machines

3. A good query parser, so I don't have to translate "solr vs orientdb" into "solr AND orientdb" manually.
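As a rough illustration of concern 3, here is a minimal sketch of the manual translation being described. The stop-word list and the function name `to_and_query` are hypothetical, and the sketch does not escape Lucene special characters. Lucene's classic query parser (default operator) and Solr's `q.op` parameter can apply AND semantics without this kind of preprocessing, so treat this only as a picture of the work a good parser saves.

```python
# Hypothetical stop-word list; "vs" is included only because of the example above.
STOP_WORDS = {"vs", "vs.", "versus", "and", "or", "the", "a", "an"}


def to_and_query(user_input: str) -> str:
    """Turn free text into an explicit AND query, dropping stop words."""
    terms = [t for t in user_input.split() if t.lower() not in STOP_WORDS]
    return " AND ".join(terms)


if __name__ == "__main__":
    print(to_and_query("solr vs orientdb"))
```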

I appreciate your insights,

b

Patrick Hoeffel

Jul 7, 2015, 9:48:14 AM

to orient-...@googlegroups.com

I worked on a prototype project that used Solr external to OrientDB. It worked, and the overall performance was better than I had feared, though still not quite as good as I had hoped. I will say, however, that we managed to take almost 20,000 Solr document IDs, format all of them into a single OSQL query (OrientDB 2.0.x), and have OrientDB return correct results.
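A minimal sketch of what formatting a batch of Solr IDs into one OSQL query could look like. The class name `Document` and the field `solrId` are assumptions for illustration, not the schema from the prototype, and the helper simply rejects IDs containing quotes or backslashes rather than guessing at escaping rules.

```python
def build_id_query(doc_ids, class_name="Document", field="solrId"):
    """Build one OSQL SELECT matching every given Solr document ID.

    class_name and field are hypothetical; adjust them to the real schema.
    """
    if not doc_ids:
        raise ValueError("doc_ids must not be empty")
    quoted = []
    for doc_id in doc_ids:
        text = str(doc_id)
        # Refuse risky characters instead of assuming how OSQL escapes them.
        if "'" in text or '"' in text or "\\" in text:
            raise ValueError(f"unsupported character in ID: {text!r}")
        quoted.append(f"'{text}'")
    return f"SELECT FROM {class_name} WHERE {field} IN [{', '.join(quoted)}]"


if __name__ == "__main__":
    print(build_id_query(["a1", "b2", "c3"]))
```

With roughly 20,000 IDs the resulting string is large, so splitting the list into smaller batches may be worth testing.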

One of the biggest challenges with this strategy, for us, was that Solr's internal boosting and weighting capabilities become more difficult to manage: you have to hold your Solr result set in memory while you wait for the OrientDB results, then merge the two sets to get a final weighted result to present to the user.
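The merge step might look roughly like the sketch below. It assumes Solr hits arrive as `(id, score)` pairs and OrientDB records as dictionaries carrying the same ID in a hypothetical `solrId` field; neither shape comes from the prototype itself.

```python
def merge_by_solr_score(solr_hits, orient_records, id_field="solrId"):
    """Attach each Solr score to its OrientDB record and sort by score.

    solr_hits: iterable of (doc_id, score) pairs held in memory.
    orient_records: iterable of dicts returned by the OrientDB query.
    Hits with no matching OrientDB record are dropped.
    """
    by_id = {record[id_field]: record for record in orient_records}
    merged = []
    for doc_id, score in solr_hits:
        record = by_id.get(doc_id)
        if record is not None:
            merged.append({**record, "score": score})
    # Highest Solr score first, so Solr's boosting decides the final order.
    merged.sort(key=lambda item: item["score"], reverse=True)
    return merged


if __name__ == "__main__":
    hits = [("a1", 1.2), ("b2", 3.4), ("zz", 0.5)]
    records = [{"solrId": "a1", "title": "A"}, {"solrId": "b2", "title": "B"}]
    for row in merge_by_solr_score(hits, records):
        print(row)
```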

If I had it to do over again, I think I would probably push to use the internal Lucene index instead of Solr, provided the solution was not going to extend outside of a single machine. If your strategy requires a cluster, then I think Solr is probably your only option. In that case I would put as much of the text as I could into Solr and use OrientDB to store almost nothing but the relationships in order to minimize traversal time.

It's a hard problem, and I wish you best of luck with it.

-Patrick

Thelonius Buddha

Jul 7, 2015, 2:31:48 PM

to orient-...@googlegroups.com

I sure do love actual experience. Thanks!

Your note about the cluster brings up something I totally overlooked. I suppose OrientDB can't do a distributed index. The documentation on the Lucene integration is pretty sparse.

Patrick Hoeffel

Jul 7, 2015, 2:46:04 PM

to orient-...@googlegroups.com

My guess is that OrientDB will index in Lucene whatever data lands on a given node in the cluster. If every node is a complete replica of all the data, then the local Lucene index should have all you need. If, however, the actual data is sharded across multiple machines in the cluster (like Solr does), then your local Lucene index will only contain part of the data. I haven't really studied the OrientDB distributed data model, so I can't say for sure, but based on what I read yesterday about Hazelcast (which is the technology that underlies the OrientDB multi-master distributed database capability, and which is excellent technology in its own right), I would guess that each node is a complete replica. Again, though, please confirm this for yourself.
