Kafka source and UPSERTs


Mathieu

Feb 28, 2021, 4:07:18 PM
to Druid User
Hello there,

I'm totally new to Druid, and I'd like to know whether it could fit our needs.
We have a Kafka Streams app producing time-series data that we currently store in a SQL database (TimescaleDB, to be precise). We are considering replacing this storage, possibly with Druid.

This Kafka Streams app produces its output in the form of KTables, in Kafka jargon, where each record represents an update for a given key. This means the same key can appear several times.
To store this data, we therefore need to work in UPSERT mode. From what I read in the Druid docs, it does not seem that Druid can consume a Kafka topic with this UPSERT mechanism.
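
To make the semantics concrete, here is a minimal Python sketch of what I mean by UPSERT on a changelog. The record layout `(key, timestamp, value)` and the name `apply_upserts` are made up for illustration; this is not Kafka Streams code, only a toy model of "latest record per key wins".

```python
from typing import Dict, Iterable, Tuple

# Hypothetical changelog record: (key, ISO timestamp, value).
Record = Tuple[str, str, float]


def apply_upserts(changelog: Iterable[Record]) -> Dict[str, Tuple[str, float]]:
    """Materialize a changelog: the latest record for a key replaces the previous one."""
    table: Dict[str, Tuple[str, float]] = {}
    for key, ts, value in changelog:
        # Insert when the key is new, overwrite when it already exists.
        table[key] = (ts, value)
    return table


changelog = [
    ("sensor-1", "2021-02-28T16:00:00Z", 10.0),
    ("sensor-2", "2021-02-28T16:00:00Z", 7.0),
    ("sensor-1", "2021-02-28T16:00:00Z", 12.5),  # update for an existing key
]
table = apply_upserts(changelog)
print(table["sensor-1"])
```

The second `sensor-1` record overwrites the first one instead of being stored next to it, which is the behavior we get today from the SQL store.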

What's the proper way of doing this in Druid?
Folks using Kafka Streams here, how do you handle this?

Thanks in advance
Mathieu

Itai Yaffe

Mar 1, 2021, 3:21:27 AM
to druid...@googlegroups.com
Hey Mathieu,
Welcome to the Druid community :)

While I haven't used Kafka Streams myself, from your description I think the way to achieve what you're looking for is to use Druid's roll-up capability (see https://druid.apache.org/docs/latest/tutorials/tutorial-rollup.html).
That way, at ingestion time:
  1. When an event with a new key comes along, a new record will be created in Druid.
  2. When an event with an existing key (the same truncated timestamp and the same dimension values) comes along, Druid will combine it with the existing record by applying the configured aggregators, e.g. adding the values associated with that key to the current values.
This essentially gives you the UPSERT mode I believe you're looking for, as long as aggregating the values (rather than overwriting them) fits your data.
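
To illustrate the two cases above, here is a toy Python model of roll-up with a sum aggregator. It is only a sketch of the semantics, not how Druid implements it; the event layout, the minute-level truncation and the names `truncate_to_minute` and `rollup_sum` are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical event: (ISO timestamp, key dimension, metric value).
Event = Tuple[str, str, float]


def truncate_to_minute(ts: str) -> str:
    """Truncate a timestamp to the minute, mimicking a minute-level query granularity."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(second=0).strftime("%Y-%m-%dT%H:%M:%SZ")


def rollup_sum(events: Iterable[Event]) -> Dict[Tuple[str, str], float]:
    """Group events by (truncated timestamp, key) and sum the metric."""
    rows: Dict[Tuple[str, str], float] = defaultdict(float)
    for ts, key, value in events:
        # New (timestamp, key) pair: a row is created. Existing pair: the value is added.
        rows[(truncate_to_minute(ts), key)] += value
    return dict(rows)


events = [
    ("2021-02-28T16:07:18Z", "sensor-1", 10.0),
    ("2021-02-28T16:07:45Z", "sensor-1", 2.5),
    ("2021-02-28T16:07:50Z", "sensor-2", 7.0),
]
rolled = rollup_sum(events)
print(rolled)
```

In this model the two `sensor-1` events collapse into a single row holding 12.5: the second value is added to the first, not substituted for it.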

As for the next step, how to actually ingest the data into Druid: Druid supports a few ingestion methods (https://druid.apache.org/docs/latest/ingestion/index.html#ingestion-methods).
In your case, I think Kafka ingestion (https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html) would make sense, so basically:
  1. Write the output records from your Kafka Streams app to a Kafka topic
  2. Have Druid read the data from Kafka
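
For step 2, a rough Python sketch of what a Kafka supervisor spec with roll-up enabled might look like is shown here. The topic, datasource, column names and broker address are placeholders I made up, and the field layout should be checked against the Kafka ingestion docs for your Druid version before use.

```python
import json
from typing import Any, Dict


def build_kafka_supervisor_spec(topic: str, datasource: str, bootstrap_servers: str) -> Dict[str, Any]:
    """Build a Kafka supervisor spec as a plain dict (all names below are placeholders)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["key"]},
                # Aggregator applied when rows with the same timestamp and key are rolled up.
                "metricsSpec": [
                    {"type": "doubleSum", "name": "value_sum", "fieldName": "value"}
                ],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "MINUTE",
                    "rollup": True,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


spec = build_kafka_supervisor_spec("metrics-out", "metrics", "localhost:9092")
# The resulting JSON would be submitted to the Overlord,
# e.g. via POST /druid/indexer/v1/supervisor.
print(json.dumps(spec, indent=2))
```

The script only prints the JSON; submitting it is left out so that nothing here depends on a running cluster.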
Hope that makes sense, let me know if you have any further questions.

Good luck!


nilden tutalar

Mar 2, 2021, 7:47:05 AM
to druid...@googlegroups.com
Hi,

One way is to send this time-series data to a topic: create an output topic in your Kafka Streams app, then feed that topic to Druid. That will work.
Best,
Nilden
