In case you’d like more numbers for comparison (thanks for yours), here is my PoC:
- source JSON averaging 300 B per line; raw file reads top out at around 600 MB/sec
- a spout with a separate reader thread and an internal queue, decoupling the file reading from the nextTuple emits
- a bolt that just counts messages and bytes (sketched below).
The best I can reach is around 50K msg/sec and 20 MB/sec (160 Mbps). It obviously degrades further once more machines are involved, because of serialization and the network.
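For reference, the counting bolt is trivial. A minimal sketch, assuming the pre-Apache `backtype.storm` API from the ZeroMQ era; the class name `CountingBolt` is hypothetical, and bytes are approximated from the string length of the (ASCII) JSON line:

```java
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class CountingBolt extends BaseBasicBolt {
    private long messages;
    private long bytes;

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String line = tuple.getString(0);
        messages++;
        bytes += line.length(); // ~1 byte per char for ASCII JSON
        if (messages % 1000000 == 0) {
            System.out.println(messages + " msgs, " + bytes + " bytes");
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt, emits nothing
    }
}
```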
Without the thread and queue I was getting:
- reading one line per nextTuple call and emitting it, even with a buffered reader, dropped throughput to KB/sec
- reading and emitting lines in a loop (since I’m not expecting acks) would go up to 300 MB/sec, but died with an out-of-memory error in about 5 seconds. Since the memory was allocated manually inside ZeroMQ, it is quite a funny error, because the GC can’t do anything about it.
- Hence the middle ground: a dedicated thread concentrating on the I/O and putting lines on the queue, with nextTuple just taking them off (sketch below).
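For completeness, a minimal sketch of that spout pattern, again assuming the pre-Apache `backtype.storm` API; the class name, file path, and queue size are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class FileReaderSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private LinkedBlockingQueue<String> queue;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        // Bounded queue: when it fills up, the reader thread blocks instead of
        // buffering the whole file in memory.
        this.queue = new LinkedBlockingQueue<String>(100000);

        Thread reader = new Thread(new Runnable() {
            public void run() {
                try {
                    BufferedReader in = new BufferedReader(new FileReader("/path/to/source.json"));
                    String line;
                    while ((line = in.readLine()) != null) {
                        queue.put(line); // blocks when the queue is full
                    }
                    in.close();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        });
        reader.setDaemon(true);
        reader.start();
    }

    @Override
    public void nextTuple() {
        // Non-blocking poll: nextTuple must return quickly. Returning without
        // emitting is fine; Storm will simply call it again.
        String line = queue.poll();
        if (line != null) {
            collector.emit(new Values(line)); // no message id, so no acking
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}
```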