Storm 0.7.0 release candidate available

111 views
Skip to first unread message

nathanmarz

unread,
Feb 9, 2012, 5:04:51 PM2/9/12
to storm-user
A release candidate is now available for Storm 0.7.0. The release
candidate contains a ton of bug fixes and improvements in addition to
what was available in the development release from a couple weeks ago.
I'd also like to point out that this release contains contributions
from six people besides myself – so keep the pull requests coming!

Storm 0.7.0-rc is available for download from the downloads page:
https://github.com/nathanmarz/storm/downloads

Storm 0.7.0-rc is available from Maven under the version "0.7.0-rc".

There are two major new features in Storm 0.7.0. The first is
transactional topologies, which is a higher level abstraction on top
of Storm's primitives that let you get fully fault-tolerant, exactly
once messaging semantics for your computations. This is a really big
deal. The documentation is here: https://github.com/nathanmarz/storm/wiki/Transactional-topologies

Transactional topologies require a more sophisticated input source
than something like Kestrel or RabbitMQ. storm-contrib contains a
TransactionalSpout implementation for Apache Kafka (
https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka ).
There are versions available for both Kafka 0.6 and Kafka 0.7.

Another big new feature is component-specific configurations. These
let you set configurations on a per-spout or per-bolt basis. So you
can do things like only set TOPOLOGY_DEBUG on just a single bolt in
the topology, or set different values for TOPOLOGY_MAX_SPOUT_PENDING
for different spouts in the topology. You can also do things like have
a spout register serializers within the spout implementation and those
serializers will be available to the whole topology. More information
on this wiki page: https://github.com/nathanmarz/storm/wiki/Configuration

There are some breaking API changes in this release:

1. All bolt and spout interfaces (IRichBolt, IRichSpout, IBasicBolt,
etc) have a new method getComponentConfiguration on them. The easiest
fix is to change your bolts/spouts from implementing the interface to
extending one of BaseRichBolt, BaseRichSpout, etc. These base classes
provide empty implementations for commonly unused methods (like
cleanup and getComponentConfiguration).

2. The isDistributed method has been removed from spouts. This
functionality has been subsumed by component-specific configurations.
To force a spout to execute as a single task regardless of the
parallelism set for it, use code like this:
http://www.google.com/url?sa=D&q=https://github.com/nathanmarz/storm/blob/master/src/jvm/backtype/storm/transactional/TransactionalSpoutCoordinator.java%23L145&usg=AFQjCNEFj_WsTA55b8UasPd-7dKQcB7j5g

Additionally, the usage of IRichBolt's has been deprecated for DRPC
topologies. It's recommended that you use the much simpler BatchBolts
instead. See the reach implementation in storm-starter for an example
of this:
http://www.google.com/url?sa=D&q=https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/ReachTopology.java&usg=AFQjCNHzQUk5mdgDBzKNmbONZjQ6y2N2pQ

3. The interface for CustomStreamGrouping has changed to receive a
List<Object> rather than a Tuple.

Here are the full notes for this release:

* Transactional topologies: a new higher level abstraction that
enables exactly-once messaging semantics for most computations.
Documented on the wiki.
* Component-specific configurations: Can now set configurations on a
per-spout or per-bolt basis.
* New batch bolt abstraction that simplifies the processing of
batches in DRPC or transactional topologies. A new batch bolt is
created per batch and they are automatically cleaned up.
* Introduction of base classes for various bolt and spout types.
These base classes are in the backtype.storm.topology.base package and
provide empty implementations for commonly unused methods
* CoordinatedBolt generalized to handle non-linear topologies. This
will make it easy to implement a non-linear DRPC topology abstraction.
* Can customize the JVM options for Storm UI with new ui.childopts
config
* BigIntegers are now serializable by default
* All bolts/spouts now emit a system stream (id "__system").
Currently it only emits startup events, but may emit other events in
the future.
* Optimized tuple trees for batch processing in DRPC and
transactional topologies. Only the coordination tuples are anchored.
OutputCollector#fail still works because CoordinatedBolt will
propagate the fail to all other tuples in the batch.
* CoordinatedBolt moved to backtype.storm.coordination package
* Clojure test framework significantly more composable
* Massive internal refactorings and simplifications, including
changes to the Thrift definition for storm topologies.
* Optimized acking system. Bolts with zero or more than one consumer
used to send an additional ack message. Now those are no longer sent.
* Changed interface of CustomStreamGrouping to receive a List<Object>
rather than a Tuple.
* Added "storm.zookeeper.retry.times" and
"storm.zookeeper.retry.interval" configs (thanks killme2008)
* Added "storm help" and "storm help {cmd}" to storm script (thanks
kachayev)
* Logging now always goes to logs/ in the Storm directory, regardless
of where you launched the daemon (thanks haitaoyao)
* Improved Clojure DSL: can emit maps and Tuples implement the
appropriate interfaces to integrate with Clojure's seq functions
(thanks schleyfox)
* Added "ui.childopts" config (thanks ddillinger)
* Bug fix: OutputCollector no longer assumes immutable inputs
* Bug fix: DRPC topologies now throw a proper error when no DRPC
servers are configured instead of NPE (thanks danharvey)
* Bug fix: Fix local mode so multiple topologies can be run on one
LocalCluster
* Bug fix: "storm supervisor" now uses supervisor.childopts instead
of nimbus.childopts (thanks ddillinger)
* Bug fix: supervisor.childopts and nimbus.childopts can now contain
whitespace. Previously only the first token was taken from the string

Danny

unread,
Feb 9, 2012, 6:03:43 PM2/9/12
to storm-user
Great job nathan. keep up the great work
> parallelism set for it, use code like this:http://www.google.com/url?sa=D&q=https://github.com/nathanmarz/storm/...
>
> Additionally, the usage of IRichBolt's has been deprecated for DRPC
> topologies. It's recommended that you use the much simpler BatchBolts
> instead. See the reach implementation in storm-starter for an example
> of this:http://www.google.com/url?sa=D&q=https://github.com/nathanmarz/storm-...

Sam Stokes

unread,
Feb 10, 2012, 6:02:45 PM2/10/12
to storm-user
> Transactional topologies require a more sophisticated input source
> than something like Kestrel or RabbitMQ. storm-contrib contains a
> TransactionalSpout implementation for Apache Kafka (https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka).
> There are versions available for both Kafka 0.6 and Kafka 0.7.

Hey Nathan, transactional topologies sound really useful. I'm curious
what features they require from the underlying event source that mean
they can't be implemented on simple message queues. Is it just a
question of atomically acking or rejecting a bunch of events at once?
(RabbitMQ does actually have some transaction support, although I
think the semantics are a bit subtle and I'm not sure if anyone uses
them.)

--
Sam

Nathan Marz

unread,
Feb 13, 2012, 2:32:35 AM2/13/12
to storm...@googlegroups.com
In the implementation that's available in 0.7.0-rc, you need:

1. Ability to ack a batch of tuples atomically
2. Ability to replay a batch you already emitted

Even if RabbitMQ allows for #1, I doubt that it can do #2. Transactional topologies never "ack" tuples, they simply move on to the next batch. That is, you wouldn't be able to do something like ack the tuples in the commit phase, because some other portion of the commit phase might fail and you'll need to replay all those tuples.

Kafka is a natural fit for transactional topologies because you consume tuples by specifying an offset into a stream. If you ever need to replay, you can just go back to that offset.

--
Twitter: @nathanmarz
http://nathanmarz.com

Reply all
Reply to author
Forward
0 new messages