Proposed change to improve ShellBolt performance

Barry Hart

Aug 6, 2012, 1:00:15 PM

to storm...@googlegroups.com

While performance testing bolts written in Python, I found that emitting 18,000 tuples took ~7.1 seconds. Looking at the code involved, I see that after every emitted tuple, storm.py reads the task IDs back from Storm. My bolt does not use the task IDs, so I tried making them optional.

I made an experimental change to the handleEmit() function in ShellBolt.java as follows:

    if (task == null) {
        Object need_task_ids = action.get("need_task_ids");
        List<Integer> outtasks = _collector.emit(stream, anchors, tuple);
        if (need_task_ids == null || ((Boolean) need_task_ids).booleanValue())
            _pendingWrites.put(outtasks);
    } else {
        _collector.emitDirect((int) task.longValue(), stream, anchors, tuple);
    }

With this change (and a corresponding change to storm.py to set need_task_ids=false), the time to emit 18,000 tuples dropped to 2.8 seconds, roughly a 2.5x speedup. Are there any downsides to this change? Would it be a good change to put in the main Storm code base? Let me know, and I will be happy to create a pull request.
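
To make the flag's semantics concrete, here is a minimal standalone sketch of just the decision logic, separate from ShellBolt. The class and method names (NeedTaskIdsSketch, shouldSendTaskIds) are made up for illustration and are not part of Storm; the only thing taken from the change above is the "need_task_ids" key and its default of true when absent.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Storm: isolates the check used in handleEmit().
class NeedTaskIdsSketch {
    // A missing flag means "yes", so subprocesses that never send it behave as before.
    static boolean shouldSendTaskIds(Map<String, Object> action) {
        Object flag = action.get("need_task_ids");
        return flag == null || ((Boolean) flag).booleanValue();
    }

    public static void main(String[] args) {
        Map<String, Object> legacyEmit = new HashMap<>();   // flag absent
        Map<String, Object> optOutEmit = new HashMap<>();
        optOutEmit.put("need_task_ids", Boolean.FALSE);     // subprocess opts out

        System.out.println(shouldSendTaskIds(legacyEmit));  // prints "true"
        System.out.println(shouldSendTaskIds(optOutEmit));  // prints "false"
    }
}
```

Defaulting to true keeps the change backward compatible: only a subprocess that explicitly sends need_task_ids=false skips the task ID round trip.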

On a related note, has anyone looked into the possibility of implementing Storm's multilanguage support using ZeroMQ instead of stdout/stdin? It might create a big performance boost, especially in combination with the change above.

Barry

Nathan Marz

Aug 11, 2012, 6:21:21 PM

to storm...@googlegroups.com

I think this is a good idea. Do you think it's better to make it configurable on a request-by-request basis, like this, or to configure whether task IDs should be sent during initialization of the multilang subprocess?
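
One rough way to picture the two options side by side is the sketch below, where a default is fixed once at initialization and an individual emit may still override it. Everything here is hypothetical: TaskIdPolicy, its constructor argument, and the idea that the initialization handshake would carry a "need_task_ids" entry are assumptions for illustration, not existing Storm behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of Storm.
class TaskIdPolicy {
    private final boolean defaultNeedTaskIds;

    // Option 1: decide once, when the multilang subprocess is initialized.
    // Assumes the init message may carry a "need_task_ids" entry (default: true).
    TaskIdPolicy(Map<String, Object> initConfig) {
        Object flag = initConfig.get("need_task_ids");
        this.defaultNeedTaskIds = flag == null || ((Boolean) flag).booleanValue();
    }

    // Option 2: a per-request flag on the emit action, falling back to the init default.
    boolean shouldSendTaskIds(Map<String, Object> action) {
        Object flag = action.get("need_task_ids");
        return flag == null ? defaultNeedTaskIds : ((Boolean) flag).booleanValue();
    }

    public static void main(String[] args) {
        Map<String, Object> init = new HashMap<>();
        init.put("need_task_ids", Boolean.FALSE);
        TaskIdPolicy policy = new TaskIdPolicy(init);

        Map<String, Object> plainEmit = new HashMap<>();
        Map<String, Object> overrideEmit = new HashMap<>();
        overrideEmit.put("need_task_ids", Boolean.TRUE);

        System.out.println(policy.shouldSendTaskIds(plainEmit));    // prints "false"
        System.out.println(policy.shouldSendTaskIds(overrideEmit)); // prints "true"
    }
}
```

With an initialization-only setting, the per-request lookup could be dropped entirely, which also removes one map lookup per emitted tuple.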

--
Twitter: @nathanmarz
http://nathanmarz.com

Barry Hart

Aug 28, 2012, 3:34:54 PM

to storm...@googlegroups.com

It's probably better to do it at initialization, as you suggest. I did it the other way mainly as a proof of concept.

I have an open pull request related to this -- would you like me to rework the code and submit an updated pull request?

Barry
