Upgrading from 0.7x to 0.8x

237 views
Skip to first unread message

Tom Brown

unread,
Aug 6, 2012, 6:15:08 PM8/6/12
to storm...@googlegroups.com
I know there are some significant conceptual shifts between storm
0.7.x and 0.8.x. There is also the addition of the Trident project
(which supersede both transactional and DRPC topologies).

What I haven't found is a good list detailing best-practices for
upgrading from 0.7.x to 0.8.x. Has anyone yet performed that upgrade?
What did you have to change in your spouts/bolts? Do you have to
completely grok the new conceptual changes and make fundamental
changes to your topologies, or do they mostly just work?

Thanks in advance!

--Tom

ivolo

unread,
Aug 6, 2012, 6:34:52 PM8/6/12
to storm...@googlegroups.com
Hey Tom,

I've upgraded today. The main issue I ran into was the Kryo upgrade from 1.x to 2.x. This shouldn't cause problems for you if you don't have any custom serializers. I had one for JodaTime's DateTime class, so I had to upgrade from https://github.com/magro/kryo-serializers/blob/master/src/main/java/de/javakaffee/kryoserializers/jodatime/JodaDateTimeSerializer.java to https://github.com/magro/kryo-serializers/blob/kryo2/src/main/java/de/javakaffee/kryoserializers/jodatime/JodaDateTimeSerializer.java . Other than, it went pretty smooth. 

I didn't change anything in the spouts or the bolts, but am also a little curious whether I missed something. I looked in storm-starter, and it doesnt look like the bolts and spouts changed. 

I am not completely done testing yet, and will let you know if I run into any other issues. 

- Ilya

Shrijeet Paliwal

unread,
Aug 6, 2012, 6:40:03 PM8/6/12
to storm...@googlegroups.com
Tom & Ivolo, 
Depending on case, spouts and bolts may need change if you launch additional threads that use the OutputCollector without explicit locking in your application code. 

OutputCollector is no longer thread safe. 
--
Shrijeet

Barry Hart

unread,
Aug 6, 2012, 6:56:21 PM8/6/12
to storm...@googlegroups.com
I updated a Python topology and didn't have to make any changes at all.

Barry

Nathan Marz

unread,
Aug 9, 2012, 2:34:15 AM8/9/12
to storm...@googlegroups.com
As mentioned, the only API changes were the Kryo serialization and changing OutputCollector to be non-thread safe.

Other than making OutputCollector non thread-safe, the performance improvements are completely internal. And the addition of the executors abstraction was designed to be backwards compatible.


On Mon, Aug 6, 2012 at 3:56 PM, Barry Hart <barrywa...@gmail.com> wrote:
I updated a Python topology and didn't have to make any changes at all.

Barry



--
Twitter: @nathanmarz
http://nathanmarz.com

jason

unread,
Aug 29, 2012, 9:10:22 PM8/29/12
to storm...@googlegroups.com
Does the data stored in the Tuple's values list need to be thread safe now too?  We are seeing some rare ConcurrentModificationExceptions occuring (see stack trace below).  We have an object called ValuesList that contains a HashMap (and some primitives).  The ConcurrentModificationException occurs when ValuesList's toString function is called when the tuple is getting marked as failed.  We are using the localOrShuffle grouping and I have a hunch that after the tuple is passed within the same JVM it asynchronously is marked as failed and this causes a race condition where the data in the tuple is being modified by our bolt while its toString() method is being called by another thread.  Has anyone else seen something like this?

java.lang.RuntimeException: java.util.ConcurrentModificationException
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:82)
at backtype.storm.utils.DisruptorQueue.consumeBatch(DisruptorQueue.java:43)
at backtype.storm.disruptor$consume_batch.invoke(disruptor.clj:53)
at backtype.storm.daemon.executor$fn__3948$fn__3989$fn__3990.invoke(executor.clj:421)
at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.util.ConcurrentModificationException
at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:390)
at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:409)
at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:408)
at java.util.AbstractMap.toString(AbstractMap.java:518)
at java.lang.String.valueOf(String.java:2902)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at com.xxxxx.yyyyyy.core.data.ValuesList.toString(ValuesList.java:169)
at clojure.core$str.invoke(core.clj:497)
at clojure.core$fn__5241.invoke(core_print.clj:94)
at clojure.lang.MultiFn.invoke(MultiFn.java:167)
at clojure.core$pr_on.invoke(core.clj:3266)
at clojure.core$print_map$fn__5292.invoke(core_print.clj:197)
at clojure.core$print_sequential.invoke(core_print.clj:58)
at clojure.core$print_map.invoke(core_print.clj:200)
at clojure.core$fn__5295.invoke(core_print.clj:204)
at clojure.lang.MultiFn.invoke(MultiFn.java:167)
at clojure.core$pr_on.invoke(core.clj:3266)
at clojure.lang.Var.invoke(Var.java:419)
at clojure.lang.RT.print(RT.java:1717)
at clojure.lang.RT.printString(RT.java:1696)
at clojure.lang.APersistentMap.toString(APersistentMap.java:20)
at clojure.core$str.invoke(core.clj:497)
at clojure.core$str$fn__3777.invoke(core.clj:501)
at clojure.core$str.doInvoke(core.clj:503)
at clojure.lang.RestFn.invoke(RestFn.java:460)
at backtype.storm.daemon.executor$fail_spout_msg.invoke(executor.clj:280)
at backtype.storm.daemon.executor$fn$reify__3952.expire(executor.clj:340)
at backtype.storm.utils.RotatingMap.rotate(RotatingMap.java:56)
at backtype.storm.daemon.executor$fn__3948$tuple_action_fn__3955.invoke(executor.clj:345)
at backtype.storm.daemon.executor$mk_task_receiver$fn__3940.invoke(executor.clj:312)
at backtype.storm.disruptor$clojure_handler$reify__1584.onEvent(disruptor.clj:43)
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:79)
... 6 more

Thanks,

--Jason

Nathan Marz

unread,
Aug 30, 2012, 1:57:02 AM8/30/12
to storm...@googlegroups.com
You should treat values in tuples as immutable, as they will definitely be touched by other threads. Sometimes they're not serialized at all – when the target task is in the same worker process.
Reply all
Reply to author
Forward
0 new messages