Storm Community Contributions (Re-Post)

P. Taylor Goetz

Nov 8, 2011, 10:53:26 PM
to storm...@googlegroups.com
(Sorry for the re-post; I inadvertently replied to a thread when I meant to start a new one :) )

Nathan/Storm Community,

I know it's still relatively early since Storm's release into the "Open Source Wild," but I also think that's the perfect time to establish (or at least start thinking about establishing) some guidelines regarding submitting contributions to the project.

The questions I would have are:

- What is the preferred method of submitting/proposing contributions? (e.g., GitHub fork + pull request; post/email a patch file; GitHub issue + patch file)
- How are source code contributions (i.e., to the core Storm source) handled vs. contribs (spout/bolt plugins, add-ons, etc.)?

One big question would be package/module naming. I think it would be relatively obvious that if one were submitting a fix or update to the main storm source, they would follow the established package naming conventions (i.e. "backtype.storm.*" -- though personally I would prepend with a "com." ;) ).

But what about contributions, such as spouts, bolts, and utilities?

I'm lucky enough to work for a company that both embraces the use of OSS, and is willing and eager to contribute back. We're seriously considering including storm in our production stack. In support of that we've developed a number of components that we're eager to release back to the community:

Storm JMS Spout (Ready for Community Release)
- A generic JMS spout. To use it, you implement an interface that turns a javax.jms.Message object into a backtype.storm.tuple.Values object, and declare the fields it outputs. A second interface implementation tells the spout how to connect to JMS when deployed (a supplied Spring implementation is easiest, but other options are relatively easy).

- It supports the Storm guaranteed delivery model (configurable on a per-message basis). It leverages JMS queue/topic message acknowledgement capabilities to support Storm's anchor+ack/fail model.

- So far we've tested it against ActiveMQ, Weblogic JMS, and Oracle AQ (It's coded strictly to the JMS API, so theoretically any JMS implementation should work).
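The spout's division of labor can be illustrated without a broker or a Storm cluster. The following is a minimal, library-free Java sketch, not the storm-jms API: SimpleMessage stands in for javax.jms.Message, a List<Object> stands in for backtype.storm.tuple.Values, and TupleProducer, CsvProducer and PendingTracker are hypothetical names. PendingTracker models only the Storm side of guaranteed processing (ack forgets a message, fail queues it for replay); it does not model JMS acknowledgement.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for javax.jms.Message: a message id plus a text body.
final class SimpleMessage {
    final String id;
    final String body;

    SimpleMessage(String id, String body) {
        this.id = id;
        this.body = body;
    }
}

// Hypothetical analog of the user-implemented "message -> tuple values"
// interface: it declares the output fields and converts each message.
interface TupleProducer {
    List<String> declareFields();

    List<Object> toValues(SimpleMessage msg);
}

// Example producer: a body such as "alice,login" becomes ("alice", "login").
final class CsvProducer implements TupleProducer {
    public List<String> declareFields() {
        return List.of("user", "action");
    }

    public List<Object> toValues(SimpleMessage msg) {
        String[] parts = msg.body.split(",", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'user,action': " + msg.body);
        }
        return List.of(parts[0], parts[1]);
    }
}

// Spout-side bookkeeping (hypothetical): each tuple is emitted with the message
// id as its Storm message id; ack forgets the message, fail queues it for replay.
final class PendingTracker {
    private final Map<String, SimpleMessage> pending = new LinkedHashMap<>();
    private final List<SimpleMessage> replayQueue = new ArrayList<>();

    void emitted(SimpleMessage msg) {
        pending.put(msg.id, msg);
    }

    void ack(String msgId) {
        pending.remove(msgId);
    }

    void fail(String msgId) {
        SimpleMessage msg = pending.remove(msgId);
        if (msg != null) {
            replayQueue.add(msg);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    List<SimpleMessage> replayQueue() {
        return replayQueue;
    }
}
```

In a real spout, ack() and fail() are the callbacks Storm invokes with the message id a tuple was emitted under; how the JMS side is then acknowledged or recovered is a separate question, which is exactly what the later replies in this thread debate.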


Storm JMS Bolt (Ready for Community Release)
- Analog to the JMS Spout, that publishes received Tuples to a JMS destination.
- You provide a class that converts a backtype.storm.tuple.Tuple to a javax.jms.Message; it does the rest (via configuration).
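A similarly hedged sketch of the bolt's contract: the user-supplied converter turns a tuple into a message, and the bolt publishes it. Here a Map<String, Object> stands in for backtype.storm.tuple.Tuple, a String body stands in for javax.jms.Message, and a List<String> stands in for the JMS destination; MessageConverter, KeyValueConverter and PublishingBoltSketch are hypothetical names, not part of storm-jms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical stand-in for the user-supplied "tuple -> JMS message" class.
// The tuple is modeled as a field-name -> value map, the message as its text body.
interface MessageConverter {
    String toMessageBody(Map<String, Object> tupleFields);
}

// Example converter: "key=value" pairs joined by ';', sorted by field name.
final class KeyValueConverter implements MessageConverter {
    public String toMessageBody(Map<String, Object> tupleFields) {
        StringJoiner body = new StringJoiner(";");
        for (Map.Entry<String, Object> e : new TreeMap<>(tupleFields).entrySet()) {
            body.add(e.getKey() + "=" + e.getValue());
        }
        return body.toString();
    }
}

// Sketch of the bolt's execute step: convert, publish, then ack the input.
final class PublishingBoltSketch {
    private final MessageConverter converter;
    // Stands in for the configured JMS destination (queue or topic).
    final List<String> destination = new ArrayList<>();
    int acked = 0;

    PublishingBoltSketch(MessageConverter converter) {
        this.converter = converter;
    }

    void execute(Map<String, Object> tupleFields) {
        destination.add(converter.toMessageBody(tupleFields));
        // A real bolt would ack the input tuple through its OutputCollector here.
        acked++;
    }
}
```

Keeping the conversion in a separate user-supplied class means the bolt itself never needs to know the message format, which mirrors the spout's design.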

Storm Cassandra Bolt (Testing)
- Configurable bolt implementation that persists backtype.storm.tuple.Tuple objects to a Cassandra column family based on a supplied mapping object.
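One plausible shape for the "supplied mapping object" is sketched below, assuming that one tuple field supplies the row key and a list of fields supplies the columns. ColumnMapping is a hypothetical name, the tuple is modeled as a Map<String, Object>, and the actual Cassandra write (client library, consistency level) is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mapping object for a Cassandra-persisting bolt: one tuple field
// supplies the row key, and a list of fields supplies the columns.
final class ColumnMapping {
    final String columnFamily;
    final String rowKeyField;
    final List<String> columnFields;

    ColumnMapping(String columnFamily, String rowKeyField, List<String> columnFields) {
        this.columnFamily = columnFamily;
        this.rowKeyField = rowKeyField;
        this.columnFields = List.copyOf(columnFields);
    }

    // The row key is the string form of the configured key field.
    String rowKey(Map<String, Object> tuple) {
        Object key = tuple.get(rowKeyField);
        if (key == null) {
            throw new IllegalArgumentException("missing row key field: " + rowKeyField);
        }
        return key.toString();
    }

    // Column name -> value, in mapping order; fields absent from the tuple are skipped.
    Map<String, String> columns(Map<String, Object> tuple) {
        Map<String, String> cols = new LinkedHashMap<>();
        for (String field : columnFields) {
            Object value = tuple.get(field);
            if (value != null) {
                cols.put(field, value.toString());
            }
        }
        return cols;
    }
}
```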

Storm Maven Plugin (Under Development)
- Maven 2/3 plugin to package/deploy/run a topology either locally, or to a cluster.


I'd like to get this stuff out there for review, use, and reciprocal contributions. But I'd also like to avoid fragmentation within the storm community.

- Taylor

Nathan Marz

Nov 9, 2011, 1:04:56 AM
to storm...@googlegroups.com
First off, the spouts/bolts you guys have produced sound awesome and I'm looking forward to seeing them open-sourced.

With regard to contributing to Storm, the preferred approach is GitHub forks + pull requests. There have already been a number of contributions using this approach. I'm going to write a wiki page this week about contributing to Storm.

As for contrib code, I think the right approach is to have a "storm-contrib" project, with a subfolder for each contributed module. Everyone who contributes a module will have commit access to that project but will be expected to commit only to their own module. The advantage of centralization is that it makes it easy for users to find a wide variety of library code. Of course, if you want to manage your module yourself under your own account, I have no problem with that; submitting modules to storm-contrib is wholly the decision of the module writer.

To Taylor and everyone else reading this -- what are your thoughts on this approach?

-Nathan

--
Twitter: @nathanmarz
http://nathanmarz.com

P. Taylor Goetz

Nov 11, 2011, 1:29:04 AM
to storm...@googlegroups.com
Nathan/Storm Community,

I've released an initial version of "storm-jms" here: https://github.com/ptgoetz/storm-jms

For now it just includes our spout. The bolt is a placeholder for now until I have time to migrate that code.

I haven't gotten around to creating a README yet, but the code is fairly well javadoc'ed.

For now the Java package structure is "backtype.storm.contrib.jms.*".

The Maven groupId/artifactId is "backtype.storm.contrib/storm-jms".


Quick Start:
---------------------
* Download, install, and run Apache ActiveMQ (default settings should be fine)
* Run the Java class "backtype.storm.contrib.jms.example.ExampleJmsTopology"


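Key classes/interfaces:

JmsSpout (https://github.com/ptgoetz/storm-jms/blob/master/src/main/java/backty...)
JmsProvider (https://github.com/ptgoetz/storm-jms/blob/master/src/main/java/backty...)
JmsTupleProducer (https://github.com/ptgoetz/storm-jms/blob/master/src/main/java/backty...)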

Comments/questions/pull requests are more than welcome!

- Taylor

Patrick Houk

Dec 8, 2011, 2:20:44 PM
to storm-user
Hi,

I think I see a problem with the JMS Spout's implementation of Storm's
guaranteed delivery API. This is just based on looking at the JMS
documentation; I haven't tested it out, and I am also not very
familiar with JMS, so hopefully I am missing something.

The javadoc for JMS message acknowledgement (see [1]) says:

"Acknowledges all consumed messages of the session of this consumed
message.
...
A client may individually acknowledge each message as it is consumed,
or it may choose to acknowledge messages as an application-defined
group (which is done by calling acknowledge on the last received
message of the group, thereby acknowledging all messages consumed by
the session.)"

Doesn't this mean that *all* un-acked messages that have been consumed
by the Spout will be ack'd to the JMS broker whenever Storm acks
*any* Message? In other words, there seem to be two incompatible
definitions of "ack": Storm acks an individual message, but JMS acks
everything received so far.
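Pat's reading of the javadoc can be made concrete with a small model. This is a simplified simulation, not a real JMS client: ClientAckSessionModel is a hypothetical class that applies only the quoted rule, namely that acknowledging any message acknowledges everything the session has consumed. If Storm completes the tuple for m3 before the tuple for m1, and the spout then calls acknowledge() on m3's message, m1 is acknowledged as well, so a later fail() for m1 leaves nothing for the broker to redeliver.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of one JMS session in CLIENT_ACKNOWLEDGE mode, following the
// Message.acknowledge() javadoc: acknowledging any message acknowledges every
// message the session has consumed so far.
final class ClientAckSessionModel {
    private final List<String> unacked = new ArrayList<>();
    private final List<String> acked = new ArrayList<>();

    // The consumer receives a message on this session.
    void receive(String msgId) {
        unacked.add(msgId);
    }

    // Models calling acknowledge() on the message with the given id. The id does
    // not narrow the effect: the whole session's consumed-but-unacked set is acked.
    void acknowledge(String msgId) {
        acked.addAll(unacked);
        unacked.clear();
    }

    // Messages the broker could still redeliver, e.g. after Session.recover().
    List<String> unacknowledged() {
        return List.copyOf(unacked);
    }

    List<String> acknowledged() {
        return List.copyOf(acked);
    }
}
```

Running receive("m1"), receive("m2"), receive("m3") and then acknowledge("m3") leaves all three messages acknowledged and none redeliverable, which is the per-message vs. per-session mismatch described above.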

Perhaps configuring the Spout to use a transactional session helps?
Again looking at the javadoc, I don't think it would. It actually
doesn't seem like there is a point in having Storm track tuples for
messages received on a transactional session, since the JMS ack will
just be a no-op anyway:

"Calls to acknowledge are ignored for both transacted sessions and
sessions specified to use implicit acknowledgement modes."

Maybe it is not possible to implement Storm's guaranteed messaging on
top of standard JMS?

- Pat

[1] http://docs.oracle.com/javaee/6/api/javax/jms/Message.html#acknowledge()

P. Taylor Goetz

Dec 8, 2011, 10:17:56 PM
to storm...@googlegroups.com
Hi Pat,

Thanks for your input. You bring up some interesting and valid points.

In general, when using the JMS Spout (in its current state), you'd likely want to keep the "jmsTransactional" property at its default value of 'false', mainly because the spout does not currently expose the javax.jms.Session object that would allow you to perform transaction management operations such as commit(), rollback(), etc.

I've seen differing behavior depending on the session's transactional property, which is why I chose to expose it as an option for the JMS spout.

That may have been a wrong decision, and one worthy of reconsideration. And I will look into it further.

My initial goal was to support message (and thus storm tuple) replay when a tuple resulting from a JMS message failed processing within a topology.

To that end, AUTO_ACKNOWLEDGE (you don't care whether tuples are acked) and CLIENT_ACKNOWLEDGE (you DO care whether tuples are acked) both work as expected, as long as you leave the "jmsTransactional" property at its default of 'false'.

The use cases I've dealt with so far have mainly been along the lines of "process a message (and ack it), or fail it so it will be replayed." This model maps very easily onto Storm's model.

When you start talking about using transactional JMS sessions (acks are ignored, and you manage the commit/rollback of multiple JMS messages yourself), things get a bit more complex. In that case you would have to consider the (default) 30-second timeout for anchored tuples, as well as the logic for performing commits/rollbacks on a transactional JMS session.
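For the transactional case, one possible approach is to treat everything received since the last commit as a batch: commit once every tuple in the batch is acked, and roll back if any tuple fails or the batch stays open too long. The sketch below is a hypothetical policy, not something storm-jms implements; TransactedBatchSketch and its timeoutMillis parameter are assumptions (timeoutMillis could, for example, be set near Storm's 30-second default message timeout).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical commit/rollback policy for a spout reading from a transacted JMS
// session: all messages received since the last commit or rollback form one batch.
final class TransactedBatchSketch {
    enum Decision { WAIT, COMMIT, ROLLBACK }

    private final Set<String> outstanding = new HashSet<>();
    private final long timeoutMillis;
    private long batchStartMillis = -1; // -1 means no open batch
    private boolean failed = false;

    TransactedBatchSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // A tuple was emitted for msgId; the first emit opens the batch.
    void emitted(String msgId, long nowMillis) {
        if (batchStartMillis < 0) {
            batchStartMillis = nowMillis;
        }
        outstanding.add(msgId);
    }

    void ack(String msgId) {
        outstanding.remove(msgId);
    }

    // One failed tuple dooms the whole batch, since rollback is all-or-nothing.
    void fail(String msgId) {
        outstanding.remove(msgId);
        failed = true;
    }

    // A real spout would follow COMMIT or ROLLBACK with session.commit() or
    // session.rollback(); this sketch only decides and resets its bookkeeping.
    Decision decide(long nowMillis) {
        if (batchStartMillis < 0) {
            return Decision.WAIT; // nothing emitted since the last decision
        }
        if (failed || nowMillis - batchStartMillis > timeoutMillis) {
            return reset(Decision.ROLLBACK);
        }
        if (outstanding.isEmpty()) {
            return reset(Decision.COMMIT);
        }
        return Decision.WAIT;
    }

    private Decision reset(Decision decision) {
        outstanding.clear();
        batchStartMillis = -1;
        failed = false;
        return decision;
    }
}
```

A known limitation of this sketch: late acks or fails for tuples from a batch that was already rolled back are not distinguished from the current batch, and a redelivered message may arrive with the same id, so a real implementation would likely need a batch identifier in each Storm message id.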

I'd be interested in hearing more about your use case, and how JMS transactional sessions come into play.

To answer your ultimate question of "Is it possible to implement Storm's guaranteed messaging on top of standard JMS?" -- I believe the answer is yes.

The "storm-jms" implementation for now covers guaranteed processing. It does not, however, provide ways to deal with complex commit/rollback workflows involving transactional JMS sessions.

Again, a solid use case would help a lot in terms of figuring out how best to handle transactional JMS sessions. My initial reaction is to disable transactional sessions.

Thanks for the feedback!

I look forward to hearing your opinions.

- Taylor
