Integrating Neo4j with Apache Solr (Lucene)


Graph01

Oct 22, 2012, 4:19:18 PM10/22/12
to ne...@googlegroups.com
I am seeking a way to do traversals on a Neo4j graph that is dynamically created from billions of indexed Solr document fields.
The relationships that should be created are for example:
Document FieldA Value - [Document text answers the solr query XYZ:XYZ] -> Document FieldB Value

I want to have thousands of rules like that, and additionally I want Neo4j to be consistent (most of the time) with the data in Solr.
The queries XYZ are not known in advance.

I have already written a Solr RequestHandler that generates these relationships according to user requests.
What is missing is the ability to easily trace back from the data in Neo4j to the Solr data it came from.
Another problem is that each insertion from Solr into Neo4j is very time-consuming.
I am looking for a way to integrate the two engines more seamlessly.
Does anyone know the best approach for achieving this?

Michael Hunger

Oct 22, 2012, 4:48:34 PM10/22/12
to ne...@googlegroups.com
What is the use-case for this approach?

How do you approach the insert so far? Can you share some details or code, so we can find out what's time-consuming?
Which integration do you use? Java embedded, or remote server?

Can you keep a "reference" to the Solr document? If so, you can store it as a property on the Neo4j node or relationship.
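A minimal sketch of that suggestion, assuming the remote Neo4j server's REST API and a made-up property name `solr_doc_id` (holding the Solr uniqueKey value) — the node-creation payload simply carries the document reference alongside the domain properties:

```java
// Sketch: build the JSON body for a Neo4j REST node-creation request
// that carries a reference back to the originating Solr document.
// "solr_doc_id" is a hypothetical property name; use your Solr uniqueKey field.
public class SolrBackedNode {

    static String nodePayload(String solrDocId, String field, String value) {
        // Properties map for a POST to /db/data/node on the Neo4j server.
        return String.format(
            "{\"solr_doc_id\":\"%s\",\"field\":\"%s\",\"value\":\"%s\"}",
            solrDocId, field, value);
    }

    public static void main(String[] args) {
        System.out.println(nodePayload("doc-42", "FieldA", "Acme"));
    }
}
```

With the uniqueKey stored on each node, tracing back to Solr is a single lookup (`q=id:doc-42`) instead of a reverse search.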

HTH

Michael

Yevgeni Nogin

Oct 22, 2012, 5:26:17 PM10/22/12
to ne...@googlegroups.com
One use case, for example, is inducing a graph over all the metadata and full-text keywords of talkbacks, wall posts and such...
That enables better analytics for questions like finding a friend who has written about subject X to someone who studies at your school...

The application user adds "Lucene query" logic for retrieving the messages that represent the relationships he wants to create between the nodes.
Then the graph is periodically induced from the document store (based on Solr), and the user can query it with Cypher.

I use the remote server with plugins.
Currently I have extended org.apache.solr.handler.component.SearchComponent and overridden the process function to gather recursive facets (see http://wiki.apache.org/solr/HierarchicalFaceting) and send the relationships to Redis. Then a simple Neo4j plugin pulls the data out of Redis.
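One way to reduce the per-insert cost in a pipeline like this is to batch many relationships into one Redis message, so the consuming plugin can commit them in a single transaction instead of one round trip per edge. A rough sketch, with an invented tab/newline wire format (source value, query label, target value):

```java
import java.util.List;

// Sketch: serialize a batch of relationships into one message for the
// Redis queue, so the Neo4j plugin can pull and commit the whole batch
// in a single transaction. The delimited format here is made up for
// illustration; any serialization (e.g. JSON) would do.
public class RelationshipBatch {

    static String encode(List<String[]> rels) {
        StringBuilder sb = new StringBuilder();
        for (String[] r : rels) {
            // r = {sourceValue, queryLabel, targetValue}
            sb.append(r[0]).append('\t')
              .append(r[1]).append('\t')
              .append(r[2]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String msg = encode(List.of(
            new String[]{"valueA", "XYZ:XYZ", "valueB"},
            new String[]{"valueC", "XYZ:XYZ", "valueD"}));
        System.out.print(msg);
    }
}
```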

Thanks for the quick response,

YN


Tero Paananen

Oct 22, 2012, 6:30:33 PM10/22/12
to ne...@googlegroups.com
We've done similar things by coupling the systems very loosely,
triggering events to sync external indexes. We use Solr in one of our
apps, and Elasticsearch in another.

In our case it isn't critical that the external indexes are 100%
in sync in real time, or even all the time, so we can live with an
occasional hiccup in the syncing functionality. We run housekeeping
scripts to make sure they don't get too badly out of sync.

Essentially what we have is a simple listener pattern that sends HTTP
calls over to Solr/Elasticsearch whenever certain actions
(create, update, delete) complete successfully. In our Ruby apps we
farm the "events" out to workers running on their own servers.
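The listener pattern described above could be sketched roughly as follows (all names are invented; the real HTTP call to Solr/Elasticsearch is stubbed out as a recording listener):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of loose coupling via listeners: the graph layer fires an event
// after each successful mutation, and listeners forward it to the external
// index (or hand it to a worker queue). Names are invented for illustration.
public class IndexSyncDemo {

    enum Action { CREATE, UPDATE, DELETE }

    interface IndexSyncListener {
        void onEvent(Action action, String docId);
    }

    static class GraphService {
        private final List<IndexSyncListener> listeners = new ArrayList<>();

        void addListener(IndexSyncListener l) { listeners.add(l); }

        // After a mutation commits successfully, notify every listener.
        void commit(Action action, String docId) {
            for (IndexSyncListener l : listeners) l.onEvent(action, docId);
        }
    }

    // In production this would POST to Solr/Elasticsearch; here it just
    // records what it saw, which also makes the pattern easy to test.
    static class RecordingListener implements IndexSyncListener {
        final List<String> seen = new ArrayList<>();
        public void onEvent(Action action, String docId) {
            seen.add(action + ":" + docId);
        }
    }

    public static void main(String[] args) {
        GraphService graph = new GraphService();
        RecordingListener solrSync = new RecordingListener();
        graph.addListener(solrSync);

        graph.commit(Action.CREATE, "doc-1");
        graph.commit(Action.DELETE, "doc-1");

        System.out.println(solrSync.seen);  // prints [CREATE:doc-1, DELETE:doc-1]
    }
}
```

Because listeners only run after a successful commit, a failed sync never corrupts the graph; it just leaves the index slightly stale until the housekeeping pass catches it.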

We've had it working pretty successfully even under very heavy load.

-TPP