This is essentially what DRPC and transactional topologies provide
for you. They use CoordinatedBolt to track how many tuples each task
has sent downstream and to detect when a task has received all the
tuples for a particular batch.
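The counting idea can be sketched in plain Java. This is a hypothetical illustration of the coordination protocol, not Storm's actual CoordinatedBolt API: each upstream task, after emitting its tuples for a batch, announces how many it sent; the downstream task considers the batch complete once every upstream task has announced and the received count matches the announced total.

```java
// Hypothetical sketch of CoordinatedBolt-style batch completion detection.
// Class and method names are illustrative, not Storm's real API.
class BatchCoordinator {
    private final int numUpstreamTasks;
    private int announcedTotal = 0; // sum of counts announced so far
    private int announcements = 0;  // how many upstream tasks have announced
    private int received = 0;       // tuples actually received

    BatchCoordinator(int numUpstreamTasks) {
        this.numUpstreamTasks = numUpstreamTasks;
    }

    // A regular data tuple for this batch arrived.
    void onTuple() {
        received++;
    }

    // An upstream task announced "I sent you N tuples for this batch".
    void onCountAnnouncement(int count) {
        announcedTotal += count;
        announcements++;
    }

    // The batch is done only when every upstream task has announced
    // its count and we have received exactly that many tuples.
    boolean batchComplete() {
        return announcements == numUpstreamTasks
            && received == announcedTotal;
    }
}
```

Tuples and count announcements can arrive in any order, which is why completion requires both conditions to hold at once.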
If you're just dealing with a large amount of finite input, a system
like Hadoop is a better choice because it provides fine-grained fault
tolerance within the computation of a single job.
If you just want to process "small" batches of messages (i.e., you
don't mind restarting the computation for a batch from scratch on
failure), then Storm works fine. You should be able to use
transactional topologies for this:
https://github.com/nathanmarz/storm/wiki/Transactional-topologies