Write idempotence

Tom Brown

Aug 28, 2012, 4:39:32 PM
to storm...@googlegroups.com
I have a topology that reads from a Kafka topic and writes to a
different Kafka topic. At times, we end up with duplicate records in
the output topic because a transient failure while processing a tuple
will cause a batch of records to be replayed from the spout.

I can switch this to a transactional topology if I have to, but I
don't know how to make Kafka writes idempotent. Has anybody else dealt
with something like this?

--Tom

Nathan Marz

Aug 28, 2012, 4:42:30 PM
to storm...@googlegroups.com
This is a great question for the Kafka mailing list. I don't think it's possible now, but I would love for Kafka to support a feature where you can append to a specific partition using your own transaction id, and have it automatically ignore your append if the txid is the same as the last write.
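
A minimal sketch of the behaviour described above, assuming a hypothetical in-memory `Partition` class standing in for a Kafka partition. None of these names are Kafka APIs; they only illustrate "ignore the append if the txid matches the last write".

```python
class Partition:
    """Hypothetical in-memory stand-in for one partition (not a Kafka API)."""

    def __init__(self):
        self.log = []          # accepted messages, in append order
        self.last_txid = None  # txid of the most recent accepted append

    def append(self, txid, messages):
        """Append a batch unless it carries the same txid as the last write."""
        if txid == self.last_txid:
            return False       # replay of the last batch: ignore it
        self.log.extend(messages)
        self.last_txid = txid
        return True


partition = Partition()
partition.append(1, ["a", "b"])
partition.append(1, ["a", "b"])  # replayed batch, ignored
partition.append(2, ["c"])
print(partition.log)  # ['a', 'b', 'c']
```

This only catches a replay of the most recent batch, so it relies on the writer submitting txids to each partition in order.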
--
Twitter: @nathanmarz
http://nathanmarz.com

Ted Dunning

Aug 28, 2012, 4:43:41 PM
to storm...@googlegroups.com
Or if it maintained a bounded and reasonably small FIFO of txids and ignored the write if any matched.
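
The bounded-FIFO variant could look roughly like this. It is a sketch only: `DedupPartition` and its `window` parameter are hypothetical names, and the window size is an arbitrary assumption.

```python
from collections import deque


class DedupPartition:
    """Hypothetical partition that remembers the last `window` txids."""

    def __init__(self, window=3):
        self.log = []
        # deque with maxlen drops the oldest txid once the window is full
        self.recent = deque(maxlen=window)

    def append(self, txid, messages):
        """Ignore the write if its txid matches any txid still in the window."""
        if txid in self.recent:
            return False
        self.log.extend(messages)
        self.recent.append(txid)
        return True


partition = DedupPartition(window=2)
partition.append(1, ["a"])
partition.append(2, ["b"])
partition.append(1, ["a"])         # txid 1 is still in the window: ignored
partition.append(3, ["c"])         # window is now [2, 3]
print(partition.append(1, ["a"]))  # True: txid 1 has aged out of the window
```

The last call shows the trade-off: a replay older than the window is accepted again, so the window has to be at least as large as the furthest-back replay the writer can produce.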

Evan Chan

Aug 29, 2012, 3:55:20 PM
to storm...@googlegroups.com
Kafka isn't designed for idempotency; it's designed for high-speed linear disk access. I think you'd have to add an additional layer on top.

To make Kafka idempotent, you'd almost want a different storage layer, maybe Cassandra, where you can do idempotent writes based on the message or batch ID.
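
To illustrate why a keyed store makes replays harmless, here is a small sketch in which a plain dict stands in for a table keyed by message ID. `KeyedStore` is a hypothetical name and no Cassandra driver calls are used; the only assumption is that a write to an existing key overwrites the row.

```python
class KeyedStore:
    """Hypothetical keyed table: writing the same key twice leaves one row."""

    def __init__(self):
        self.rows = {}

    def write(self, message_id, value):
        # An upsert: replaying the same (id, value) pair changes nothing.
        self.rows[message_id] = value


store = KeyedStore()
batch = [("msg-1", "alpha"), ("msg-2", "beta")]

for message_id, value in batch:
    store.write(message_id, value)
for message_id, value in batch:  # the whole batch is replayed after a failure
    store.write(message_id, value)

print(len(store.rows))  # 2
```

Unlike an append-only log, the store ends up in the same state no matter how many times the batch is replayed, provided each message has a stable ID.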

Brian O'Neill

Aug 29, 2012, 4:00:04 PM
to storm...@googlegroups.com
+1, we're using… Kafka -> Cassandra (for idempotent writes).

We have some upgrades to the Storm Cassandra bolt that make this even easier.  

They are available here:

But we're waiting on a fix for:

The bolt presently has conflicting dependencies with Storm (due to its use of Astyanax).

-brian

Nathan Marz

Aug 30, 2012, 2:02:37 AM
to storm...@googlegroups.com
It would be a very simple addition to Kafka, as Storm would be responsible for serializing writes to the partitions across txids. Kafka would just need to remember what the last txid written to that partition was. Then you could easily do idempotent writes to Kafka via Storm. The txid information would have to be atomically associated with the messages for this to work at all.  
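
One way to picture the "atomically associated" requirement is a sketch in which the txid is stored inside the same record as the messages, so the last txid is read back from the log itself rather than kept in separate state. `TxLog` is a hypothetical class, not a Kafka or Storm API, and a single list append stands in for an atomic write.

```python
class TxLog:
    """Hypothetical log whose records carry their own txid."""

    def __init__(self, records=None):
        # Each record is (txid, tuple_of_messages), written as one unit.
        self.records = list(records or [])

    def last_txid(self):
        """Derive the last txid from the tail of the log."""
        return self.records[-1][0] if self.records else None

    def append(self, txid, messages):
        if txid == self.last_txid():
            return False
        # One append stores txid and messages together, so they cannot diverge.
        self.records.append((txid, tuple(messages)))
        return True


log = TxLog()
log.append(7, ["a", "b"])

# Simulate a restart: rebuild from the stored records only.
recovered = TxLog(log.records)
print(recovered.append(7, ["a", "b"]))  # False: the replay is still detected
print(recovered.append(8, ["c"]))       # True
```

If the txid were tracked separately from the messages, a crash between the two writes could either lose the txid (allowing a duplicate) or record it without the data (dropping a batch).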