How can I make my schema-registry _schemas topic more resilient?


C Mcc

Oct 14, 2016, 12:57:06 PM
to Confluent Platform
Hi,
I've probably made a classic newbie error, so perhaps someone here can point me in the right direction so that I can avoid it in the future.
As per my compose file below (not all shown), I had various Confluent components (Kafka Connect connectors, Schema Registry) as well as some UI components from Landoop working nicely, and I could view my topics and schemas.

I was then experimenting with a setting on the Kafka container. Knowing that I had volumes mounted on both the ZooKeeper and Kafka containers, I was under the impression that I could dispose of those containers (stop and remove them), run up new ones, and be fine.

However, when I ran up the new ones, it seems all my schemas were gone. The '_schemas' topic is still in Kafka, but it only shows some NOOP items in the *.log file.
Have I lost all my schemas? Are they somehow retrievable from the *.index file? And if I have lost them, what should I be doing to prevent this from happening again?

/home/data/kafka/kafka-data/_schemas-0# ls -ltr
total 4
-rw-r--r-- 1 root root 10485760 Oct 14 11:56 00000000000000000081.index
-rw-r--r-- 1 root root      186 Oct 14 12:29 00000000000000000081.log
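For what it's worth, the records in the `_schemas` topic are JSON-encoded Schema Registry messages, and NOOP records carry no schema data, so a log segment containing only NOOPs really has no schemas left in it. A minimal sketch of telling the two apart (the sample records and exact field names here are assumptions based on the registry's storage format, not dumped from a real topic):

```python
import json

# Hypothetical sample records as (key, value) JSON strings, roughly as they
# appear in the _schemas topic. NOOP records have a null value.
records = [
    ('{"keytype":"NOOP","magic":0}', None),
    ('{"keytype":"SCHEMA","subject":"orders-value","version":1,"magic":1}',
     '{"subject":"orders-value","version":1,"id":1,"schema":"\\"string\\""}'),
]

def surviving_schemas(records):
    """Return (subject, version) pairs for SCHEMA records; NOOPs are ignored."""
    out = []
    for key, value in records:
        k = json.loads(key)
        if k.get("keytype") == "SCHEMA" and value is not None:
            v = json.loads(value)
            out.append((v["subject"], v["version"]))
    return out

print(surviving_schemas(records))
```

If a dump of your remaining segment yields only NOOP keys, the schema records themselves are gone rather than merely unindexed; the *.index file is just an offset index into the *.log file and holds no schema data of its own.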

Thanks,
Colum


services:
  zookeeper:
    image: confluentinc/cp-zookeeper:3.0.1
    container_name: cp-zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 32181
      ZOOKEEPER_TICK_TIME: 2000
    volumes:
      - /home/data/zookeeper/zk-data:/var/lib/zookeeper/data
      - /home/data/zookeeper/zk-txn-logs:/var/lib/zookeeper/log
    network_mode: "host"
  kafka:
    image: confluentinc/cp-kafka:3.0.1
    container_name: cp-kafka
    environment:      
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: localhost:32181
      KAFKA_DELETE_TOPIC_ENABLE: 'true'
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:39092
    volumes:
      - /home/data/kafka/kafka-data:/var/lib/kafka/data
    network_mode: "host"
    depends_on:
      - zookeeper
  schema-registry: 
    image: confluentinc/cp-schema-registry:3.0.1
    container_name: cp-schema-registry
    environment: 
      SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: localhost:32181
      SCHEMA_REGISTRY_HOST_NAME: localhost
      SCHEMA_REGISTRY_LISTENERS: http://localhost:8081
    network_mode: "host"

Ewen Cheslack-Postava

Oct 18, 2016, 1:56:05 AM
to Confluent Platform
The schemas are stored in Kafka, so if you discard all your ZK and Kafka storage your schemas will indeed all be discarded. However, if you've discarded all your data as well, then presumably the lost schemas shouldn't be an issue since you have no data left encoded with them?

If you wanted to maintain your schemas but discard the rest of your data, you'd need to delete all non-schema topics.

-Ewen

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platform+unsub...@googlegroups.com.
To post to this group, send email to confluent-platform@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/069b7a5b-f474-4cd7-a87e-f3b5672803ce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Thanks,
Ewen

C Mcc

Oct 19, 2016, 9:03:52 AM
to Confluent Platform
Ewen,
Thanks for your reply.
The point is that I didn't discard my ZK and Kafka storage, not knowingly at least. That is why I had 'volumes' linking those topics' storage back to my host system from the relevant containers, as described above.

I suppose what I'm seeking clarification on, and something I can't find in the docs, is this: is the '_schemas' topic subject to some sort of default lifetime like other topics in Kafka? I was under the impression that, since it is a crucial element of the Schema Registry, which is intended to maintain schema changes over time, the '_schemas' topic would have much more resilience. That is especially so given that I went to the trouble of linking '/var/lib/zookeeper/data' and '/var/lib/kafka/data' back to the host, which should theoretically have given me some resilience when a container is removed and recreated.

Thanks,
Colum

Ewen Cheslack-Postava

Oct 25, 2016, 1:42:08 AM
to Confluent Platform
It's created as a compacted topic with a single partition (as long as you allow the Schema Registry to create it). It will also be created with a target replication factor (defaulting to 3) unless that cannot be achieved with the number of available brokers; in that case it will warn you in the log but still allow you to proceed. These settings should not result in data being deleted accidentally.
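If you would rather not rely on auto-creation, one option is to pre-create the topic with explicit settings before starting the registry. A hedged sketch against the single-broker compose file above (the ZooKeeper port comes from that file; on a single broker only replication factor 1 is achievable, so adjust for a real cluster):

```shell
# Pre-create _schemas as a compacted, single-partition topic.
# cleanup.policy=compact ensures the latest record per key is retained
# indefinitely rather than being aged out by time-based retention.
kafka-topics --zookeeper localhost:32181 \
  --create --topic _schemas \
  --partitions 1 \
  --replication-factor 1 \
  --config cleanup.policy=compact

# Verify the topic's configuration afterwards.
kafka-topics --zookeeper localhost:32181 --describe --topic _schemas
```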

-Ewen




--
Thanks,
Ewen