Axel Morgner
May 24, 2012, 2:18:25 PM
to ne...@googlegroups.com
Hi all,
In our project team, we recently discussed how to build a truly scalable search solution for our Neo4j-based project.
We already have a search engine in place, using integrated Lucene for all kinds of search tasks (e.g. finding the start node for traversals, or general-purpose search across multiple properties), neo4j-spatial for distance search, etc. At the moment, our backend is running fine on a single server instance, using embedded Neo4j. Our REST backend is designed to create composed JSON responses with arbitrary structure, aggregated from the underlying nodes and relationships. We're happily serving about 1,000 requests/sec on a single server.
But we asked ourselves: how do we scale that up to, let's say, about 100,000 requests/sec or more?
We discussed two scenarios:
#1 Set up an external search cluster (e.g. with Solr, or AES), asynchronously indexing composed 'documents' (resembling the most common response formats), and routing all search requests to the cluster. The search response would contain all the information needed to create the search result list in all frontends.
#2 Replicate the main application with the full-featured search (and comprehensive JSON response composition) using embedded Neo4j in slave or read-only mode, either accessing the same database files over a NAS or using the HA protocol.
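To make scenario #1 a bit more concrete, here is a rough, language-neutral sketch in Python of the asynchronous indexing idea. It is not our actual code: `InMemorySearchIndex` is a hypothetical stand-in for the external cluster, and `AsyncIndexer` and `compose_document` are names made up for illustration. The point is only that the write path queues a composed document and returns immediately, while a background worker pushes it to the search index.

```python
import queue
import threading


class InMemorySearchIndex:
    """Hypothetical stand-in for an external search cluster client."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def put(self, doc_id, document):
        with self._lock:
            self._docs[doc_id] = document

    def search(self, term):
        # Naive case-insensitive substring match on name and related names.
        term = term.lower()
        with self._lock:
            return [
                doc for doc in self._docs.values()
                if term in doc["name"].lower()
                or any(term in r.lower() for r in doc["related"])
            ]


class AsyncIndexer:
    """Queues composed documents and indexes them on a background thread."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self, index):
        self._index = index
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, doc_id, document):
        # Called on the write path; only enqueues, so it returns immediately.
        self._queue.put((doc_id, document))

    def close(self):
        # Everything queued before the sentinel is indexed before the worker stops.
        self._queue.put(self._STOP)
        self._worker.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            doc_id, document = item
            self._index.put(doc_id, document)


def compose_document(node, related):
    """Flatten a node and its neighbours into one response-shaped document."""
    return {
        "id": node["id"],
        "name": node["name"],
        "related": [r["name"] for r in related],
    }
```

In this sketch, the application would call `submit(...)` after each write and route search requests to the index only, so search traffic never touches the graph database. The trade-off is that results can lag behind the graph until the queue has drained.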
We would appreciate a discussion of the pros and cons, or even some shared experience on this!
Thank you and greetings
Axel