Is it possible to delete a schema registered in SchemaRegistry ?


Mathieu Garstecki

Sep 2, 2015, 12:26:52 PM
to Confluent Platform
Hello,

I'm trying to create automated tests around schema upgrades with Kafka and SchemaRegistry.

I'm looking for a way to delete a schema for a given topic in the registry, so I can start afresh for each test run. Alternatively, deleting all schemas for a given topic could work too. Is there any way (even a hack; this is just for tests) to do that?

Thanks,

Mathieu Garstecki

Geoff Anderson

Sep 2, 2015, 12:49:34 PM
to confluent...@googlegroups.com
Hi Mathieu,

Great to hear! By the way, if you are interested in how we've been writing automated system tests, consider checking out ducktape, a command-line tool and library for system testing:

https://github.com/confluentinc/ducktape (command line tool and library to help with system tests)

https://github.com/confluentinc/muckrake (some of our confluent platform system tests live here)

On to your question:

There's no built-in way to clear out schemas, but since you're in a development environment where it's OK to break things, there are a couple of ways to do this, which all boil down to the same two steps.
Note that both option A and option B clear out *all* schemas (not just the schemas for one topic).

(1) get rid of the Kafka data in the "_schemas" topic
(2) get rid of the persistent ZooKeeper data storing the upper bound on the current schema ID batch

Option A
This is the more careful option. It's best to shut your schema registries down first.
(1) Bounce your brokers with delete.topic.enable=true in the server.properties file
     Delete the _schemas topic: kafka/bin/kafka-topics.sh --zookeeper <ZOOKEEPER_CONNECT> --topic _schemas --delete
(2) bin/kafka-run-class.sh kafka.tools.ZooKeeperMainWrapper -server <ZOOKEEPER_CONNECT> delete /schema_registry/schema_id_counter
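The two Option A steps can be strung together as one script. The ZooKeeper address and install paths below are assumptions based on a typical local quickstart layout; the `|| true` guards just keep the sketch from aborting if you run it without a live cluster:

```shell
# Wipe schema registry state (Option A sketch). Stop schema registry instances
# first, and make sure brokers were started with delete.topic.enable=true.
ZK=localhost:2181        # assumed ZooKeeper address; adjust for your cluster
KAFKA_BIN=./kafka/bin    # assumed Kafka install location
# 1) Delete the _schemas topic backing the registry.
"$KAFKA_BIN/kafka-topics.sh" --zookeeper "$ZK" --topic _schemas --delete || true
# 2) Delete the schema ID counter node from ZooKeeper.
"$KAFKA_BIN/kafka-run-class.sh" kafka.tools.ZooKeeperMainWrapper \
  -server "$ZK" delete /schema_registry/schema_id_counter || true
```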

Option B (the nuclear option - beware, this will destroy all of your persistent Kafka and ZooKeeper data):
If you are just messing with a dev cluster on your local machine, this is fine.
(1) Remove the Kafka log directories for each broker (you can move these instead of deleting them to be slightly more careful) - this is the "log.dirs" property in your server.properties file.
(2) Remove the ZooKeeper data directory - this is "dataDir" in your zookeeper.properties file.
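A sketch of Option B, using throwaway directories created with mktemp so it is safe to run anywhere. On a real cluster you would substitute the actual log.dirs and dataDir values and stop the services first:

```shell
# Option B sketch. KAFKA_LOG_DIRS stands in for the log.dirs value from
# server.properties, ZK_DATA_DIR for the dataDir value from zookeeper.properties.
KAFKA_LOG_DIRS=$(mktemp -d)    # stand-in for e.g. /tmp/kafka-logs
ZK_DATA_DIR=$(mktemp -d)       # stand-in for e.g. /tmp/zookeeper
touch "$KAFKA_LOG_DIRS/recovery-point-offset-checkpoint"   # fake broker data
mkdir -p "$ZK_DATA_DIR/version-2"                          # fake zookeeper data
rm -rf "$KAFKA_LOG_DIRS" "$ZK_DATA_DIR"                    # the actual wipe
```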


Hope this helps,
Geoff

To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/c9c5dd4e-ab30-4458-8525-4b46f4182524%40googlegroups.com.

Mathieu Garstecki

Sep 2, 2015, 5:43:58 PM
to confluent...@googlegroups.com
Thank you Geoff,

Am I right in believing that Option A will prevent us from decoding Avro data already stored in Kafka?
As I understand it, Avro data is stored with the schema ID, and the IDs might change if we delete and recreate the schemas.
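For reference, Confluent's Avro serializer prefixes each message with a magic byte (0x00) and the schema ID as a 4-byte big-endian integer, which is why re-created schemas with different IDs break decoding of old data. A small simulation of that 5-byte header (the schema ID 42 is chosen arbitrarily; octal escapes are used so printf stays portable):

```shell
# Build the 5-byte wire-format header: magic byte 0x00 + schema ID 42
# (0x0000002a) big-endian. \052 is octal for 0x2a.
printf '\000\000\000\000\052' | od -An -tx1
```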

Thank you also for the pointers to the test frameworks, I won't be able to use them right now, but I'll keep them in mind whenever I have more time available.

Regards,

Mathieu

Geoff Anderson

Sep 2, 2015, 8:12:28 PM
to confluent...@googlegroups.com
Hi Mathieu,

That's right - removing schema data like this will break decoding of existing data stored in Kafka, so your tests may need to include logic to populate Kafka if possible.
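If the tests do need to re-register schemas after a wipe, the registry's REST API can do it. A sketch, where the subject name and registry URL are placeholder assumptions and the `|| true` guard keeps it harmless when no registry is listening:

```shell
SR_URL=${SR_URL:-http://localhost:8081}   # assumed default Schema Registry port
SUBJECT=my-topic-value                    # hypothetical subject name
# A trivial Avro schema, JSON-escaped as the registry's REST API expects.
SCHEMA='{"schema": "{\"type\": \"string\"}"}'
# Register it as the next version under the subject.
curl -s -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data "$SCHEMA" "$SR_URL/subjects/$SUBJECT/versions" || true
```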

Thanks,
Geoff



Andy Chambers

Dec 18, 2015, 9:13:27 PM
to Confluent Platform


On Wednesday, September 2, 2015 at 9:49:34 AM UTC-7, Geoffrey Anderson wrote:
Option A
This is the more careful option. It's best to shut your schema registries down first.
(1) Bounce your brokers with delete.topic.enable=true in the server.properties file
     Delete the _schemas topic: kafka/bin/kafka-topics.sh --zookeeper <ZOOKEEPER_CONNECT> --topic _schemas --delete
(2) bin/kafka-run-class.sh kafka.tools.ZooKeeperMainWrapper -server <ZOOKEEPER_CONNECT> delete /schema_registry/schema_id_counter

Unless I'm mistaken, I think this method would require re-registering the schemas in the same order as the original registration, so that they get the same IDs and existing data stays decodable.

Is my understanding correct?

Cheers,
Andy

Andy Chambers

Dec 18, 2015, 9:14:26 PM
to Confluent Platform
Oops. Just read the rest of the thread. Sorry!