Scaling Out Rexster

James Thornton

Jul 21, 2011, 8:05:45 PM

to gremli...@googlegroups.com

I have been looking at different ways of scaling Rexster.

One way would be to configure the stack to use the DB's native high availability features (if possible). It should be easy to do this for Neo4j because it looks like it's simply a matter of using the HighlyAvailableGraphDatabase class instead of the EmbeddedGraphDatabase class, and Marko is looking into adding it to Blueprints (https://github.com/tinkerpop/blueprints/issues/134). And read-only slaves are easy to do with Neo4j (http://wiki.neo4j.org/content/Online_Backup_HA).

A more generic approach might be to run multiple instances of Rexster, and use a message bus to write to all instances simultaneously.

For example, you could use RabbitMQ to write to two instances of Rexster while storing a "message ID" as a DB property. When Rexster completes the write, it returns an ack. If the client crashes or the ack gets lost, the message will be redelivered, but you'll be able to prevent a duplicate write because the last "message ID" will be stored as a property (see https://groups.google.com/forum/#!topic/rabbitmq-discuss/6b3H413IKxk).

For this message-bus approach to work, you would need to be able to create unique indices on properties (similar to this https://github.com/tinkerpop/rexster/issues/140), and Rexster would need to include ack messages in its response.
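
To make the duplicate-prevention idea concrete, here is a minimal sketch in Python. It does not use RabbitMQ or Rexster at all: `GraphReplica`, `apply_write`, and the `last_message_id` property are hypothetical names, and the in-memory set stands in for the unique index on the "message ID" property described above.

```python
class GraphReplica:
    """Hypothetical in-memory stand-in for one Rexster instance."""

    def __init__(self, name):
        self.name = name
        self.vertices = {}        # vertex id -> properties
        self.applied_ids = set()  # plays the role of a unique index on "message ID"

    def apply_write(self, message_id, vertex_id, properties):
        """Apply a write at most once and return an ack either way."""
        if message_id in self.applied_ids:
            # Redelivered message: acknowledge it, but do not write again.
            return {"ack": message_id, "duplicate": True}
        self.vertices[vertex_id] = dict(properties, last_message_id=message_id)
        self.applied_ids.add(message_id)
        return {"ack": message_id, "duplicate": False}


replica = GraphReplica("rexster-a")
first = replica.apply_write("msg-1", "v1", {"name": "marko"})
# Assume the ack above was lost, so the same message is delivered again.
second = replica.apply_write("msg-1", "v1", {"name": "marko"})
```

The second call is acknowledged without touching the graph, which is the behavior the unique index is meant to guarantee.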

Has anyone tried something like this before?

Gary Berger (gaberger)

Jul 21, 2011, 8:23:42 PM

to gremli...@googlegroups.com

Hey James. I always thought that Zed's Mongrel2 approach with 0MQ would be a good model.

http://mongrel2.org/

G

Sent from my iPhone

stephen mallette

Jul 22, 2011, 8:48:57 AM

to Gremlin-users

James, regarding the thought of RabbitMQ/ack, you mention that
"Rexster would need to include ack messages in its response". Perhaps
you were abbreviating a bit, but is that necessarily how it would
work? I haven't ever worked with RabbitMQ, but is it capable of
directly publishing to a REST-based service? I would guess that
Rexster would not be a direct consumer of messages. Wouldn't you have
a separate consumer responsible for reading messages and writing them
to Rexster? That consumer would in turn be responsible for the ack as
it would know if Rexster succeeded or not.
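
A rough sketch of that separate-consumer arrangement, in Python and without any real broker or REST calls. Everything here is an assumption for illustration: `FlakyInstance` stands in for a Rexster server, the `deque` stands in for the RabbitMQ queue, and requeueing on failure imitates what a broker would do after a negative ack.

```python
from collections import deque


class FlakyInstance:
    """Hypothetical stand-in for a Rexster server reached over REST."""

    def __init__(self, name, fail_times=0):
        self.name = name
        self.fail_times = fail_times  # number of writes to reject before succeeding
        self.seen = set()             # message IDs already applied
        self.writes = []

    def write(self, message_id, payload):
        if self.fail_times > 0:
            self.fail_times -= 1
            return False
        if message_id not in self.seen:  # a redelivery must not duplicate the write
            self.seen.add(message_id)
            self.writes.append(payload)
        return True


def consume(queue, instances, max_attempts=5):
    """Write each message to every instance; ack only if all of them succeeded."""
    acked = []
    attempts = {}
    while queue:
        message_id, payload = queue.popleft()
        results = [inst.write(message_id, payload) for inst in instances]
        if all(results):
            acked.append(message_id)
        else:
            attempts[message_id] = attempts.get(message_id, 0) + 1
            if attempts[message_id] < max_attempts:
                # No ack: put the message back, as a broker would on a nack.
                queue.append((message_id, payload))
    return acked


a = FlakyInstance("rexster-a")
b = FlakyInstance("rexster-b", fail_times=1)
pending = deque([("msg-1", {"vertex": "v1"}), ("msg-2", {"vertex": "v2"})])
acked = consume(pending, [a, b])
```

Here the consumer, not Rexster, decides when to ack. Note that redelivery reorders the writes on `rexster-b`, so a real setup would have to decide whether ordering matters.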

I had some other thoughts on the topic, but I'll see if you can get me
on the same page as you are before I go any deeper.

Best regards,

Stephen


James Thornton

Jul 22, 2011, 12:58:06 PM

to gremli...@googlegroups.com

Hey Stephen -

Yes, there would need to be a consumer. I have never worked with RabbitMQ either, but creating a consumer doesn't look like it would be too involved -- see http://gavinroy.com/the-attention-deficit-disorder-guide-to-rabbi and http://www.rabbitmq.com/api-guide.html .

However, it may be simpler to write to Rexster directly and have it write out to a mirror on another instance (like you were talking about in email).

What are your thoughts?

- James

stephen mallette

Jul 22, 2011, 3:42:56 PM

to gremli...@googlegroups.com

I think the RabbitMQ approach makes sense with an external consumer
which writes to the various Rexster servers. It would be nice if you
could host that consumer directly in Rexster, which is something I'd
thought about some time ago. It would be nice if there was an
ExtensionPoint that allowed you to start a process or something on
Rexster start-up. I haven't really thought it all through, but it
seemed like a useful feature.

I've been wondering to what extent being able to configure EventGraph
in Rexster would help with replication:

https://github.com/tinkerpop/blueprints/wiki/Event-Implementation

Not sure if there is a good way to guarantee an update this way
though...at least not without putting RabbitMQ (or the like) in the
mix.
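
To illustrate the concern about guarantees, here is a small Python sketch of event-driven mirroring. It is not the Blueprints EventGraph API: `EventedGraph`, `Mirror`, and `on_event` are hypothetical names, loosely modeled on the idea of a graph wrapper that notifies listeners of each change.

```python
class EventedGraph:
    """Hypothetical graph wrapper that notifies listeners of each change."""

    def __init__(self):
        self.vertices = {}
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def add_vertex(self, vertex_id, properties):
        self.vertices[vertex_id] = dict(properties)
        event = ("vertex_added", vertex_id, dict(properties))
        for listener in self.listeners:
            listener(event)  # fire-and-forget: no ack, no retry


class Mirror:
    """Hypothetical second instance that replays events from the primary."""

    def __init__(self):
        self.online = True
        self.vertices = {}

    def on_event(self, event):
        if not self.online:
            return  # the event is silently lost
        kind, vertex_id, properties = event
        if kind == "vertex_added":
            self.vertices[vertex_id] = properties


primary = EventedGraph()
mirror = Mirror()
primary.add_listener(mirror.on_event)

primary.add_vertex("v1", {"name": "marko"})
mirror.online = False
primary.add_vertex("v2", {"name": "peter"})  # nothing redelivers this one
in_sync = primary.vertices == mirror.vertices
```

Once the mirror misses an event, the two copies stay out of sync, which is where a durable queue with acks would come in.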

James Thornton

Jul 22, 2011, 4:36:51 PM

to gremli...@googlegroups.com

A good starting point would be to identify the desired goals, such as...

* write redundancy

* read-only slaves

* hot failover

* sync slaves as they come back online

* etc...

Is there a generic way to do this for the different types of databases Blueprints supports -- such as embedded vs distributed (e.g. InfiniteGraph)?

James Thornton

Jul 22, 2011, 6:03:44 PM

to gremli...@googlegroups.com

On Thursday, July 21, 2011 7:23:42 PM UTC-5, gabe...@cisco.com wrote:

> Hey James. I always thought that Zed's Mongrel2 approach with 0MQ would be a good model.

Hi Gabe -

How does that work?

- James
