Scaling search


Axel Morgner

May 24, 2012, 2:18:25 PM
to ne...@googlegroups.com
Hi all,

in our project team we recently discussed how to build a really scalable search solution for our Neo4j-based project.

We already have a search engine in place, using the integrated Lucene index for all kinds of search tasks (e.g. finding the start node for traversals, or general-purpose search across multiple properties), neo4j-spatial for distance search, etc. At the moment, our backend runs fine on a single server with embedded Neo4j. Our REST backend is designed to create composed JSON responses with arbitrary structure, aggregated from the underlying nodes and relationships. We're happily serving about 1,000 requests/sec on a single server.

But we asked ourselves how to scale that up to, let's say, 100,000 req/sec or more.

We discussed two scenarios:

#1 Set up an external search cluster (e.g. with Solr, or AES), asynchronously indexing composed 'documents' (resembling the most common response formats), and routing all search requests to the cluster. The search response would contain all the information needed to build the search result list in all frontends.

#2 Replicate the main application with the full-featured search (and comprehensive JSON response composition) using embedded Neo4j in slave or r/o mode, accessing the same db files over a NAS, or using the HA protocol.
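
As an illustration of the asynchronous indexing in scenario #1, here is a minimal Python sketch. It is not taken from our project: the in-memory `search_index` dict only stands in for an external cluster such as Solr, and `compose_document`, `on_node_saved`, and the node fields (`id`, `name`, `tags`) are hypothetical names chosen for the example.

```python
import queue
import threading

# Hypothetical stand-in for an external search cluster (e.g. Solr);
# a real setup would send documents over the network instead.
search_index = {}

# Backlog of changed nodes waiting to be indexed.
index_queue = queue.Queue()


def compose_document(node):
    """Flatten a node into one search 'document' shaped like a response."""
    return {
        "id": node["id"],
        "title": node["name"],
        "tags": sorted(node.get("tags", [])),
    }


def index_worker():
    """Drain the queue in the background so writes never wait on indexing."""
    while True:
        node = index_queue.get()
        if node is None:  # sentinel: stop the worker
            index_queue.task_done()
            break
        doc = compose_document(node)
        search_index[doc["id"]] = doc
        index_queue.task_done()


def on_node_saved(node):
    """Called by the write path; it only enqueues and returns immediately."""
    index_queue.put(node)


def search(term):
    """Answer a search from the composed documents alone, without the graph."""
    term = term.lower()
    return [d for d in search_index.values() if term in d["title"].lower()]


worker = threading.Thread(target=index_worker, daemon=True)
worker.start()

on_node_saved({"id": 1, "name": "Berlin office", "tags": ["site", "de"]})
on_node_saved({"id": 2, "name": "Paris office", "tags": ["site", "fr"]})
index_queue.join()  # wait until the backlog is indexed (demo only)

results = search("berlin")
```

The point of the queue is that the index may lag behind the graph for a short time, which is the trade-off that comes with asynchronous indexing.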

We would appreciate discussing the pros and cons, or even hearing about your experience with this!

Thank you and greetings
Axel






Daniel Corbett

May 24, 2012, 5:06:24 PM
to ne...@googlegroups.com
I think the answer to that question depends on the queries themselves, which determine your cache hit ratio, along with the update rate of the data. If the update rate is high and/or the queries are mostly unique, your cache hit ratio will be low and the cache will do you little good. In that case you are better off with the slave search databases. I am implementing a scheme where updates are fed through an event bus with a pub/sub protocol. It's easy to scale this by adding as many slave search databases as desired.
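
To make the event bus idea concrete, here is a rough Python sketch of the pattern. It is an in-process toy, not the actual implementation: `EventBus`, `SearchReplica`, the `"updates"` topic, and the event fields (`op`, `id`, `doc`) are all hypothetical names, and a real deployment would use a network message broker instead.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe bus (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every handler registered for the topic.
        for handler in list(self._subscribers[topic]):
            handler(event)


class SearchReplica:
    """Read-only search copy that applies update events from the bus."""

    def __init__(self, name, bus):
        self.name = name
        self.docs = {}
        bus.subscribe("updates", self.apply)

    def apply(self, event):
        if event["op"] == "upsert":
            self.docs[event["id"]] = event["doc"]
        elif event["op"] == "delete":
            self.docs.pop(event["id"], None)

    def search(self, term):
        return sorted(i for i, d in self.docs.items() if term in d["text"])


bus = EventBus()
# Scaling out means constructing more replicas; the publisher is unchanged.
replicas = [SearchReplica(f"replica-{n}", bus) for n in range(3)]

bus.publish("updates", {"op": "upsert", "id": 1, "doc": {"text": "graph search"}})
bus.publish("updates", {"op": "upsert", "id": 2, "doc": {"text": "spatial search"}})
bus.publish("updates", {"op": "delete", "id": 2})

hits = [r.search("search") for r in replicas]
```

One limitation of this sketch is that a replica subscribed after some events were published never sees them, so a real setup would also need a snapshot or replay step for new replicas.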
