I've read in multiple conversations that Druid stores consumer offsets internally rather than using Kafka's mechanism. For example here:
https://groups.google.com/g/druid-user/c/sNOyoXR5WDE.
I'm curious where exactly Druid stores this information.
More specifically, here is the scenario we are trying to solve: we are migrating from one Druid cluster that ingests data from a Kafka topic to a new cluster. The new cluster uses a different metadata store and deep storage account but has access to the same source Kafka topic. Both clusters run the same version (0.18).
The steps we are following look something like this:
- Stop ingestion in the source cluster
- Stop all services in the source cluster to avoid changes to the metadata store and deep storage
- Copy segments over to new deep storage bucket
- Copy the druid_segments metadata table from the old cluster to the new one and update the load specs to point at the new location (see the sketch after this list)
- Start services in new cluster
- Resume ingestion.
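For the load spec step, something like the following is what we have in mind. It's a minimal sketch assuming a MySQL metadata store and S3-style load specs; the druid_segments table and payload column are Druid's default metadata schema, but the host, credentials, bucket names, and the exact loadSpec fields (which differ per deep storage type) are placeholders for our setup.

```python
# Sketch: rewrite loadSpec bucket names in the copied druid_segments table
# so the segment payloads point at the new deep storage location.
# Assumes a MySQL metadata store (pymysql) and S3-style load specs.
import json
import pymysql

OLD_BUCKET = "old-deep-storage"   # placeholder
NEW_BUCKET = "new-deep-storage"   # placeholder

conn = pymysql.connect(host="new-metadata-host", user="druid",
                       password="druid", database="druid")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT id, payload FROM druid_segments")
        for seg_id, payload in cur.fetchall():
            segment = json.loads(payload)
            load_spec = segment.get("loadSpec", {})
            # S3-style load specs keep the location in "bucket"/"key";
            # other deep storage types use different field names.
            if load_spec.get("bucket") == OLD_BUCKET:
                load_spec["bucket"] = NEW_BUCKET
                cur.execute(
                    "UPDATE druid_segments SET payload = %s WHERE id = %s",
                    (json.dumps(segment), seg_id),
                )
    conn.commit()
finally:
    conn.close()
```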
The last step is the one we are a bit worried about: we are not sure whether the new cluster will try to ingest from the beginning of the topic or will know to resume from where the old cluster left off. What do we need to do to make sure ingestion resumes correctly, without duplicating or losing data?
Thanks in advance