Stateful components in Vertigo 1.0

70 views
Skip to first unread message

Jordan Halterman

unread,
Feb 2, 2015, 4:06:04 AM2/2/15
to vertx-...@googlegroups.com
Hey all,

I'm moving into development of Vertigo 1.0 again now, and I want to outline what I'm working on now. While there are a ton of high level features to be considered for the future, many fundamental components of the messaging framework remain incomplete, and I intend to focus on those aspects only for the time being.

After many long months working mostly by myself and a bit with my brother, Copycat (http://github.com/kuujo/copycat) is now feature complete and undergoing significant testing in distributed environments. While I anticipate it will take another month or so to clean up, I am now beginning to integrate it into Vertigo 1.0. I wanted to talk a bit about Copycat's design as it relates to Vertigo and how I plan to integrate the two frameworks.

Over the past few months Copycat was reconstructed from the ground up to facilitate the integration of a consistent distributed log into any synchronous or asynchronous environment. This design was created largely with Vertigo in mind with the goal of supporting fault tolerant, stateful Vertigo components.

The concept of stateful Vertigo components via Copycat is simple. Vertigo component instances are implemented on top of Copycat's state logs. State logs are strongly consistent, Raft replicated logs designed for persisting and replicating state based on an ordered series of commands representing some state.

In the context of Vertigo, this means each Vertigo component instance may consist of one or many replicas. Copycat's election algorithm elects a leader among the set of component replicas, and messages sent to the component are sent to that leader instance. When a message is received on a component's input port, the message is logged - either in memory or on disk - and replicated to a quorum of the component's replicas prior to being applied to the port's message handler. Additionally, Vertigo (via Copycat) will periodically take a snapshot of the component's state and compact the log as necessary. If a component fails, a new leader will be elected among the component's replicas and new messages will be logged and replicated via that leader. If the component does not support replication or all the replicas fail, once the component is restarted Vertigo (via Copycat) will install the component's last snapshot and replay events to rebuild the component's state. This works because Copycat guarantees that entries committed to the state log will appear in the same order on each of the component's replicas.

Copycat is designed to facilitate this type of state management while maintaining high throughput by supporting partitioned resources within the Copycat cluster. By partitioning Copycat resources among multiple component instances, Vertigo should still be able to achieve excellent throughput with the right configuration. In order to prove this, I'm going to benchmark partitioned state machines in Copycat on EC2 next weekend.

It will take a lot of careful planning and design to get each of these pieces to fit together well, but I'm confident it can be done in a fairly timely manner (as compared to Copycat itself). I'll keep you posted on the progress of the messaging layer in the coming weeks.
Reply all
Reply to author
Forward
0 new messages