I think I see a problem with the JMS Spout's implementation of Storm's
guaranteed delivery API. This is just based on looking at the JMS
documentation; I haven't tested it out, and I am also not very
familiar with JMS, so hopefully I am missing something.
The javadoc for JMS message acknowledgement (see [1]) says:
"Acknowledges all consumed messages of the session of this consumed
message.
...
A client may individually acknowledge each message as it is consumed,
or it may choose to acknowledge messages as an application-defined
group (which is done by calling acknowledge on the last received
message of the group, thereby acknowledging all messages consumed by
the session.)"
Doesn't this mean that *all* un-acked messages that have been consumed
by the Spout will be acked to the JMS broker whenever Storm acks
*any* message? In other words, there seem to be two incompatible
definitions of "ack": Storm acks an individual message, but JMS acks
everything received so far.
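To make the mismatch concrete, here is a minimal Java sketch. It does not use the real javax.jms API: `AckMismatchDemo` and `ToyClientAckSession` are hypothetical classes that only model the acknowledge() semantics quoted above, where acknowledging one message acknowledges everything the session has consumed so far.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Demo: Storm acks one tuple, but the toy session acknowledges everything consumed so far.
class AckMismatchDemo {
    public static void main(String[] args) {
        ToyClientAckSession session = new ToyClientAckSession();
        session.receive("m1");
        session.receive("m2");
        session.receive("m3");

        // Storm has only fully processed the tuple built from m2 ...
        session.acknowledge("m2");

        // ... yet the toy "broker" now treats m1 and m3 as acknowledged too.
        System.out.println(session.ackedAtBroker()); // prints [m1, m2, m3]
    }
}

/**
 * Hypothetical toy model (not the javax.jms API) of a CLIENT_ACKNOWLEDGE session,
 * following the Message.acknowledge() javadoc: acknowledging one message
 * acknowledges every message the session has consumed so far.
 */
class ToyClientAckSession {
    private final List<String> consumed = new ArrayList<>();
    private final Set<String> acked = new LinkedHashSet<>();

    // Simulates the consumer handing a message to the spout.
    String receive(String id) {
        consumed.add(id);
        return id;
    }

    // Session-wide acknowledgement: the argument only identifies the message
    // acknowledge() was called on; it does not limit what gets acknowledged.
    void acknowledge(String id) {
        if (!consumed.contains(id)) {
            throw new IllegalStateException("message was never consumed: " + id);
        }
        acked.addAll(consumed);
        consumed.clear();
    }

    Set<String> ackedAtBroker() {
        return acked;
    }
}
```

Under this model, the broker would never redeliver m1 or m3, even if Storm later failed the tuples built from them.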
Perhaps configuring the Spout to use a transactional session would help?
Again looking at the javadoc, I don't think it would. It actually
doesn't seem like there is a point in having Storm track tuples for
messages received on a transactional session, since the JMS ack will
just be a no-op anyway:
"Calls to acknowledge are ignored for both transacted sessions and
sessions specified to use implicit acknowledgement modes."
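The same kind of toy model (again hypothetical, not the javax.jms API) shows what the second quote implies for a transacted session: acknowledge() changes nothing, and only commit() or rollback() decides the fate of every message received since the last transaction boundary.

```java
import java.util.ArrayList;
import java.util.List;

class TransactedAckDemo {
    public static void main(String[] args) {
        ToyTransactedSession session = new ToyTransactedSession();
        session.receive("m1");
        session.receive("m2");

        session.acknowledge("m1");                // ignored, as on a transacted JMS session
        System.out.println(session.committed()); // prints []

        session.commit();                         // only commit() settles the messages
        System.out.println(session.committed()); // prints [m1, m2]
    }
}

/** Hypothetical toy model (not the javax.jms API) of a transacted session. */
class ToyTransactedSession {
    private final List<String> uncommitted = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    String receive(String id) {
        uncommitted.add(id);
        return id;
    }

    // Per the Message.acknowledge() javadoc, acknowledge calls are ignored for transacted sessions.
    void acknowledge(String id) {
        // intentionally a no-op
    }

    // Settles every message received since the last commit or rollback.
    void commit() {
        committed.addAll(uncommitted);
        uncommitted.clear();
    }

    // Returns the messages the broker would redeliver after a rollback.
    List<String> rollback() {
        List<String> redeliver = new ArrayList<>(uncommitted);
        uncommitted.clear();
        return redeliver;
    }

    List<String> committed() {
        return committed;
    }
}
```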
Maybe it is not possible to implement Storm's guaranteed messaging on
top of standard JMS?
- Pat
[1] http://docs.oracle.com/javaee/6/api/javax/jms/Message.html#acknowledge()
On Nov 11, 1:29 am, "P. Taylor Goetz" <ptgo...@gmail.com> wrote:
> Nathan/Storm Community,
>
> I've released an initial version of "storm-jms" here: https://github.com/ptgoetz/storm-jms
>
> For now it just includes our spout. The bolt is a placeholder until I have time to migrate that code.
>
> I haven't gotten around to creating a readme yet, but the code is fairly well javadoc'ed.
>
> For now the Java package structure is "backtype.storm.contrib.jms.*".
>
> The Maven groupId/artifactId is "backtype.storm.contrib/storm-jms".
>
> Quick Start:
> ---------------------
> * Download, install, and run Apache ActiveMQ (default settings should be fine)
> * Run the Java class "backtype.storm.contrib.jms.example.ExampleJmsTopology"
>
> Key classes/interfaces:
> JmsSpout (https://github.com/ptgoetz/storm-jms/blob/master/src/main/java/backty...)
> JmsProvider (https://github.com/ptgoetz/storm-jms/blob/master/src/main/java/backty...)
> JmsTupleProducer (https://github.com/ptgoetz/storm-jms/blob/master/src/main/java/backty...)
>
> Comments/questions/pull requests are more than welcome!
>
> - Taylor
>
> On Nov 9, 2011, at 1:04 AM, Nathan Marz wrote:
>
> > First off, the spouts/bolts you guys have produced sound awesome and I'm looking forward to seeing them open-sourced.
>
> > With regard to contributing to Storm, the preferred approach is Github forks + pull requests. There have actually already been a number of contributions using this approach. I'm going to write a wiki page this week about contributing to Storm.
>
> > As for contrib code, I think the right approach is to have a "storm-contrib" project, with a subfolder for each contributed module. Everyone who contributes a module will have commit access to that project but will be expected only to commit to their module. The advantage of centralization is the ease for users for finding a wide variety of library code. Of course, if you want to manage your module yourself under your own account I have no problem with that -- submitting modules to storm-contrib is wholly the decision of the module-writer.
>
> > To Taylor and everyone else reading this -- what are your thoughts on this approach?
>
> > -Nathan
>
Thanks for your input. You bring up some interesting and valid points.
In general, when using the JMS Spout (in its current state), you'd likely want to keep the "jmsTransactional" property at its default value of 'false', mainly because the spout does not currently expose the javax.jms.Session object that would allow you to perform transaction management operations such as commit() and rollback().
I've seen differing behavior depending on the session's transactional property, which is why I chose to expose it as an option on the JMS spout.
That may have been the wrong decision, and it is worth reconsidering. I will look into it further.
My initial goal was to support message (and thus Storm tuple) replay when a tuple resulting from a JMS message failed processing within a topology.
To that end, AUTO_ACKNOWLEDGE (you don't care whether tuples are acked) and CLIENT_ACKNOWLEDGE (you DO care whether tuples are acked) work as expected as long as you leave the "jmsTransactional" property at its default of 'false'.
The use cases I've dealt with so far have mainly been along the lines of "process a message (and ack it), or fail it so it will be replayed." This model maps very easily onto Storm's model.
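A minimal sketch of that "ack it, or fail it so it will be replayed" mapping, assuming a spout calls Message.acknowledge() from Storm's ack callback and Session.recover() from its fail callback. `ToyRecoverableSession` and `ToySpoutCallbacks` are hypothetical models, not storm-jms or javax.jms code, and this is not necessarily how JmsSpout itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class AckFailMappingDemo {
    public static void main(String[] args) {
        // Case 1: m1 fails, recover() replays it, then the replay is acked.
        ToyRecoverableSession s1 = new ToyRecoverableSession();
        ToySpoutCallbacks spout1 = new ToySpoutCallbacks(s1);
        s1.deliver("m1");
        spout1.fail("m1");
        spout1.ack("m1");
        System.out.println(s1.deliveries()); // prints [m1, m1]

        // Case 2: m2 is acked before m1 fails; m1 was acknowledged along with m2.
        ToyRecoverableSession s2 = new ToyRecoverableSession();
        ToySpoutCallbacks spout2 = new ToySpoutCallbacks(s2);
        s2.deliver("m1");
        s2.deliver("m2");
        spout2.ack("m2");
        spout2.fail("m1"); // nothing left to redeliver
        System.out.println(s2.deliveries()); // prints [m1, m2]
    }
}

/** Hypothetical toy session with CLIENT_ACKNOWLEDGE semantics plus recover(). */
class ToyRecoverableSession {
    private final List<String> unacked = new ArrayList<>();
    private final List<String> deliveries = new ArrayList<>(); // includes redeliveries

    void deliver(String id) {
        unacked.add(id);
        deliveries.add(id);
    }

    // Session-wide, like Message.acknowledge() under CLIENT_ACKNOWLEDGE.
    void acknowledge() {
        unacked.clear();
    }

    // Like Session.recover(): restart delivery from the oldest unacknowledged message.
    void recover() {
        List<String> again = new ArrayList<>(unacked);
        unacked.clear();
        for (String id : again) {
            deliver(id);
        }
    }

    List<String> deliveries() {
        return deliveries;
    }
}

/** Hypothetical mapping of Storm's spout ack/fail callbacks onto the toy session. */
class ToySpoutCallbacks {
    private final ToyRecoverableSession session;

    ToySpoutCallbacks(ToyRecoverableSession session) {
        this.session = session;
    }

    // Storm fully processed the tuple: acknowledge (which covers the whole session).
    void ack(String msgId) {
        session.acknowledge();
    }

    // Storm reports a failure: ask for redelivery of everything unacknowledged.
    void fail(String msgId) {
        session.recover();
    }
}
```

The second case is worth keeping in mind: because acknowledgement is session-wide, a failed tuple is only replayed if recover() runs before any later message on the same session is acknowledged.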
When you start talking about using transactional JMS sessions (acks are ignored, and you manage the commit/rollback of multiple JMS messages yourself), things get a bit more complex. In that case you would have to consider the (default) 30-second timeout for anchored tuples, as well as the logic for performing commits/rollbacks on a transactional JMS session.
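For transactional sessions, one possible approach (a hypothetical sketch, not something storm-jms provides) is to treat a batch of messages as one JMS transaction: commit once every tuple in the batch has been acked, and roll back if any tuple fails or stays pending past the message timeout. `TxBatchTracker` only decides the outcome; actually calling Session.commit() or Session.rollback() is left to the caller.

```java
import java.util.HashMap;
import java.util.Map;

class TxBatchDemo {
    public static void main(String[] args) {
        long timeout = 30_000L; // the default message timeout mentioned above: 30 seconds

        TxBatchTracker ok = new TxBatchTracker(timeout);
        ok.emitted("m1", 0);
        ok.emitted("m2", 0);
        ok.seal();
        ok.acked("m1");
        ok.acked("m2");
        System.out.println(ok.check(5_000)); // prints COMMIT

        TxBatchTracker slow = new TxBatchTracker(timeout);
        slow.emitted("m1", 0);
        slow.emitted("m2", 0);
        slow.seal();
        slow.acked("m1");
        System.out.println(slow.check(10_000)); // prints PENDING
        System.out.println(slow.check(31_000)); // prints ROLLBACK
    }
}

/**
 * Hypothetical sketch (not part of storm-jms): decide whether one batch of
 * messages received on a transacted JMS session should be committed or
 * rolled back, based on Storm-style ack, fail and timeout events.
 */
class TxBatchTracker {
    enum Outcome { PENDING, COMMIT, ROLLBACK }

    private final long timeoutMillis;
    private final Map<String, Long> pending = new HashMap<>(); // msgId -> emit time
    private boolean sealed = false;
    private Outcome outcome = Outcome.PENDING;

    TxBatchTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a tuple emitted for a message in this batch.
    void emitted(String msgId, long nowMillis) {
        if (sealed) throw new IllegalStateException("batch is sealed");
        pending.put(msgId, nowMillis);
    }

    // No more messages will join this batch.
    void seal() {
        sealed = true;
        maybeCommit();
    }

    void acked(String msgId) {
        pending.remove(msgId);
        maybeCommit();
    }

    // A single failed tuple rolls back the whole transaction.
    void failed(String msgId) {
        if (outcome == Outcome.PENDING) outcome = Outcome.ROLLBACK;
    }

    // A tuple still pending after the timeout is treated like a failure.
    Outcome check(long nowMillis) {
        if (outcome == Outcome.PENDING) {
            for (long emittedAt : pending.values()) {
                if (nowMillis - emittedAt > timeoutMillis) {
                    outcome = Outcome.ROLLBACK;
                    break;
                }
            }
        }
        return outcome;
    }

    private void maybeCommit() {
        if (outcome == Outcome.PENDING && sealed && pending.isEmpty()) {
            outcome = Outcome.COMMIT;
        }
    }
}
```

The batch has to be sealed before it can commit; otherwise a batch whose first tuple is acked before the second is emitted would commit too early. Note also that one slow or failed tuple rolls back, and therefore replays, every message in the batch, including those Storm already processed successfully.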
I'd be interested in hearing more about your use case, and how JMS transactional sessions come into play.
To answer your ultimate question of "Is it possible to implement Storm's guaranteed messaging on top of standard JMS?" -- I believe the answer is yes.
The "storm-jms" implementation for now covers guaranteed processing. It does not, however, provide ways to deal with complex commit/rollback workflows involving transactional JMS sessions.
Again, a solid use case would help a lot in terms of figuring out how best to handle transactional JMS sessions. My initial reaction is to disable transactional sessions.
Thanks for the feedback!
I look forward to hearing your opinions.
- Taylor