Support for Kafka messages restore


Ashwini Mhatre (asmhatre)

Jun 13, 2017, 4:40:51 AM
to secor...@googlegroups.com, Keshava Bharadwaj H P (kehp), Rohan Pandit (rohpandi)
Hi,

I am using Secor to back up Kafka messages to S3.
I also need to use this tool to restore messages from S3 back to Kafka.
Does Secor provide any restore mechanism?

If yes, could you share the code for that feature so that I can use it?


Thank you.

Regards,
Ashwini

hc...@pinterest.com

Jun 13, 2017, 1:41:36 PM
to secor-users, ke...@cisco.com, rohp...@cisco.com, asmh...@cisco.com
Secor mostly deals with saving Kafka messages to S3.  For a restore to work, you can write a custom producer that reads messages from the S3 files and produces them back to Kafka.  The tricky part is the file and message ordering.  To preserve the FIFO order of Kafka messages, you need to generate the S3 files/messages in the same order as the incoming Kafka messages.  Secor has an OffsetMessageParser which preserves the offset order in the S3 file; you can use that.  But the part that reads the S3 files and publishes to Kafka needs to be written; you can take a look at SequenceFileReaderWriterFactory to see how to read the files.
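The replay path described above (read the S3 files in offset order, then publish back to Kafka in the same order) can be sketched as below. This is a minimal sketch: `RestoredMessage` and `MessageSink` are hypothetical stand-ins for Secor's file readers and a real `KafkaProducer`, which are not shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one decoded message read from an S3 log file.
record RestoredMessage(long offset, byte[] payload) {}

// Hypothetical stand-in for a KafkaProducer publishing to the target topic.
interface MessageSink {
    void publish(byte[] payload);
}

class S3Replayer {
    // Publishes messages in ascending offset order, preserving the FIFO
    // order of the original Kafka partition.
    static void replay(List<RestoredMessage> messages, MessageSink sink) {
        List<RestoredMessage> sorted = new ArrayList<>(messages);
        sorted.sort(Comparator.comparingLong(RestoredMessage::offset));
        for (RestoredMessage m : sorted) {
            sink.publish(m.payload());
        }
    }
}
```

Note this only preserves order within a single partition; replaying multiple partitions in one pass would interleave them.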

hc...@pinterest.com

Jun 13, 2017, 1:44:57 PM
to secor-users, ke...@cisco.com, rohp...@cisco.com, asmh...@cisco.com
Also, a few utility tools (LogFilePrinterMain, LogFileVerifierMain and TestLogMessageProducerMain) should give you skeleton code for reading a log file and publishing back to Kafka.

Ashwini Mhatre (asmhatre)

Jun 20, 2017, 5:30:41 AM
to hc...@pinterest.com, secor-users, Keshava Bharadwaj H P (kehp), Rohan Pandit (rohpandi), Pandi Pitchai (ppitchai)

Hi,
To enable restore using Secor, I have written a custom producer which is able to restore messages for each topic.
But how do I ensure commit-log backup and restore with this tool?
Is there any feature in Secor for backing up and restoring the commit log, so that the existing consumers of the old Kafka cluster will read messages after the last committed offset from the restored Kafka cluster (the new Kafka cluster)?

Please let me know if Secor has a feature for maintaining commit logs.

Thank you.
Regards,
Ashwini

Henry Cai

Jun 20, 2017, 1:02:55 PM
to Ashwini Mhatre (asmhatre), secor-users, Keshava Bharadwaj H P (kehp), Rohan Pandit (rohpandi), Pandi Pitchai (ppitchai)
What's your use case?  If your new Kafka cluster is just replaying the old Kafka cluster, you don't really need Secor; you can use Kafka's MirrorMaker tool to set up a mirror between the two Kafka clusters.

If the use case is replaying some very old Kafka logs (which were already deleted on the original Kafka cluster) for forensic analysis, do your clients know the exact offset on the old cluster they need to replay from?  If they do, the S3 files you saved through Secor are organized by offsets (each filename carries the starting offset), so you can find the file to replay from (not exactly; to be exact you would need custom code to filter out the messages in that file that come before that offset).
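Picking the file to replay from can be made concrete. A minimal sketch, assuming the per-partition S3 files are listed by their starting offsets in ascending order (the exact filename layout depends on your Secor configuration): the right file is the one with the largest starting offset that is less than or equal to the target offset.

```java
import java.util.List;

class ReplayStart {
    // Given the starting offsets of the S3 files for one partition
    // (ascending), returns the index of the file containing targetOffset:
    // the file with the largest starting offset <= targetOffset.
    // Returns -1 if the target precedes all saved files.
    static int fileIndexFor(List<Long> startOffsets, long targetOffset) {
        int found = -1;
        for (int i = 0; i < startOffsets.size(); i++) {
            if (startOffsets.get(i) <= targetOffset) {
                found = i;
            } else {
                break;
            }
        }
        return found;
    }
}
```

Messages earlier in that file than the target offset still need to be filtered out, as noted above.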

Ashwini Mhatre (asmhatre)

Jun 22, 2017, 3:49:19 AM
to Henry Cai, secor-users, Keshava Bharadwaj H P (kehp), Rohan Pandit (rohpandi), Pandi Pitchai (ppitchai)

Hi,
Our use case is disaster recovery for a Kafka cluster.
For example, we have one Kafka cluster in AWS which goes down.  In this case we (the Kafka cluster's consumers) should be able to read restored messages from a brand-new Kafka cluster.
This brand-new Kafka cluster recovers from the backup messages stored in S3 by Secor.

Will this be possible with the Secor tool?

Thank you.

Regards,
Ashwini

Henry Cai

Jun 22, 2017, 2:00:28 PM
to Ashwini Mhatre (asmhatre), secor-users, Keshava Bharadwaj H P (kehp), Rohan Pandit (rohpandi), Pandi Pitchai (ppitchai)
Do you want to restore the new Kafka cluster back to a certain time in history?  If that's the case, you would need to know the original offset number in the original cluster and upload S3 files to the new cluster based on that offset number.  If your downstream consumers don't need exactly-once semantics, you can start uploading from an earlier time/offset.  Your downstream consumers need to switch to listening to the new cluster from the beginning of the Kafka queue (the offset numbers in the old and new Kafka clusters are going to be different, so you cannot reuse those numbers for your downstream consumers).

But if you just need to keep an online Kafka backup cluster, you don't really need Secor/S3 in the picture.  You can use the Kafka MirrorMaker tool to link the backup Kafka cluster directly to the primary Kafka cluster.  That setup is easier.  The downsides are that you cannot rewind the backup Kafka cluster to a very old point in history without the long-term S3 storage, and it's a bit hard for your downstream consumers to know where in the backup Kafka queue they should point when the switch-over happens.  They most likely have to play it conservatively by rewinding to a point several minutes earlier than the cut-over time.
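If downstream consumers do care about the duplicates introduced by rewinding to an earlier point, one option is a consumer-side duplicate filter. A minimal sketch, assuming each message carries a unique id (for example, its original-cluster offset embedded in the payload; that id is an assumption, not something Secor or MirrorMaker adds for you):

```java
import java.util.HashSet;
import java.util.Set;

// Consumer-side duplicate filter for the replay overlap window.
class DedupeFilter {
    private final Set<String> seen = new HashSet<>();

    // Returns true the first time an id is observed, false for duplicates.
    boolean accept(String messageId) {
        return seen.add(messageId);
    }
}
```

In practice the `seen` set only needs to cover ids from the overlap window around the cut-over, not the whole history.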

i...@ianduffy.ie

Jul 21, 2017, 2:17:48 PM
to secor-users, hc...@pinterest.com, ke...@cisco.com, rohp...@cisco.com, ppit...@cisco.com, asmh...@cisco.com
Hi Ashwini,

> To enable restore using Secor I have written custom producer which is able to restore messages on each topic.

Any chance you could share your code for this?