I've read in multiple conversations that Druid stores consumer offsets internally rather than using Kafka's mechanism. For example here:
https://groups.google.com/g/druid-user/c/sNOyoXR5WDE.
I'm curious where exactly Druid stores this information.
More specifically, here is the scenario we are trying to solve: we are migrating from one Druid cluster that ingests data from a Kafka topic to a new cluster. The new cluster uses a different metadata store and deep storage account but has access to the same source Kafka topic. Both clusters run the same version (0.18).
The steps we are following look something like this:
- Stop ingestion in the source cluster
- Stop all services in the source cluster to avoid changes to the metadata store and deep storage
- Copy segments over to new deep storage bucket
- Copy the druid_segments metadata table from the old cluster to the new one and update the load specs to point at the new location (see the sketch after this list)
- Start services in new cluster
- Resume ingestion.
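For the load spec step, something like the following is what we have in mind. It's a minimal sketch assuming a MySQL metadata store and S3-style load specs; the druid_segments table and payload column are Druid's default metadata schema, but the host, credentials, bucket names, and the exact loadSpec fields (which differ per deep storage type) are placeholders for our setup.

```python
# Sketch: rewrite loadSpec bucket names in the copied druid_segments table
# so the segment payloads point at the new deep storage location.
# Assumes a MySQL metadata store (pymysql) and S3-style load specs.
import json
import pymysql

OLD_BUCKET = "old-deep-storage"   # placeholder
NEW_BUCKET = "new-deep-storage"   # placeholder

conn = pymysql.connect(host="new-metadata-host", user="druid",
                       password="druid", database="druid")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT id, payload FROM druid_segments")
        for seg_id, payload in cur.fetchall():
            segment = json.loads(payload)
            load_spec = segment.get("loadSpec", {})
            # S3-style load specs keep the location in "bucket"/"key";
            # other deep storage types use different field names.
            if load_spec.get("bucket") == OLD_BUCKET:
                load_spec["bucket"] = NEW_BUCKET
                cur.execute(
                    "UPDATE druid_segments SET payload = %s WHERE id = %s",
                    (json.dumps(segment), seg_id),
                )
    conn.commit()
finally:
    conn.close()
```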
The last step is the one we are a bit worried about: we are not sure whether the new cluster will try to ingest from the beginning of the topic or will know to resume from where the old cluster left off. What do we need to do to make sure ingestion resumes correctly, without duplicating or losing data?
Thanks in advance