Custom state store for Kafka Stream aggregations


Alexander Jipa

Jun 9, 2016, 5:16:28 PM
to Confluent Platform

Hello,

I asked the same question on the mailing list but have not received a reply yet.


According to http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple:

“In terms of implementation Kafka Streams stores this derived aggregation in a local embedded key-value store (RocksDB by default, but you can plug in anything).”

 

So I tried running the word count example on my Windows machine (for a local test) and got an error because RocksDB is not available for Windows.

I thought it would be easy to switch to an in-memory store.

But after a while I figured out that the KStream aggregation implementation doesn't allow that.

It looks like aggregateByKey (and thus countByKey) always uses a persistent store.

Moreover, it looks like there is no way to change the default persistent store.

 

Although I more or less achieved the goal by manually wiring a Source, a Producer and a Sink, that doesn't make for easy coding.

 

The questions that I have are:

- Is there a plan to provide persistent store support for Kafka Streams on Windows?

- Is there a plan to provide a KStream API for specifying a custom store/factory for aggregations?

- Is there a way to change the default persistent store from RocksDB?

Guozhang Wang

Jun 11, 2016, 2:49:43 PM
to Confluent Platform
Hello,

I think one of us (Eno) has replied on the open source mailing list:

-----

I haven't tried Kafka Streams on Windows, but I did notice that Microsoft has merged code on GitHub to make RocksDB available on Windows. Perhaps this is useful:
https://blogs.msdn.microsoft.com/bingdevcenter/2015/07/22/open-source-contribution-from-bing-rocksdb-is-now-available-in-windows-platform/

Thanks,
Eno

-----

As you noticed, Kafka Streams uses the RocksDB JNI interface in its source code, so it does not work directly on Windows. I think there are ways to use other interfaces for RocksDB on Windows, but they may not be incorporated into Kafka Streams yet.

I agree that, moving forward, we should allow users to configure which stores to use in the high-level Streams DSL. I just filed a JIRA to keep track of it, but honestly speaking we do not have a concrete plan or timeline for adding that feature yet.


Guozhang

Alexander Jipa

Jul 5, 2016, 1:31:08 PM
to Confluent Platform
Hello,
I was able to find that thread too, but it doesn't really answer the question.
It just says that it's not available yet and that nobody knows when it will be.

My first question was really about a working solution for now, e.g. any other store that works with Kafka Streams on Windows.

What about the remaining two questions?

- Is there a plan to provide a KStream API for specifying a custom store/factory for aggregations?

- Is there a way to change the default persistent store from RocksDB?


Matthias J. Sax

Jul 5, 2016, 2:25:39 PM
to confluent...@googlegroups.com
Hi,

Currently, there is no way to plug in a different store for the DSL's
built-in aggregations. The only way to use a custom store is via the DSL
methods transform(), transformValues(), and process(). However, all three
require user-defined code, i.e., they are similar to the low-level
Processor API.
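
To make the idea of a pluggable store concrete, here is a minimal plain-Java
sketch of a count-by-key aggregation whose state lives in a store obtained
from a supplier. It does not use the Kafka Streams API: SimpleStore,
InMemoryStore and CountByKey are hypothetical names invented for this
illustration, and the user-defined code passed to transform() or process()
would play the role of CountByKey.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PluggableStoreSketch {

    /** Hypothetical minimal store contract; NOT the Kafka Streams store interface. */
    interface SimpleStore<K, V> {
        V get(K key);
        void put(K key, V value);
    }

    /** In-memory implementation backed by a HashMap, so no RocksDB/JNI is involved. */
    static final class InMemoryStore<K, V> implements SimpleStore<K, V> {
        private final Map<K, V> data = new HashMap<>();

        @Override
        public V get(K key) {
            return data.get(key);
        }

        @Override
        public void put(K key, V value) {
            data.put(key, value);
        }
    }

    /** Counts records per key, keeping running totals in the supplied store. */
    static final class CountByKey<K> {
        private final SimpleStore<K, Long> store;

        CountByKey(Supplier<SimpleStore<K, Long>> storeSupplier) {
            // The store implementation is chosen by the caller, not hard-coded here.
            this.store = storeSupplier.get();
        }

        /** Increments the count for the key and returns the updated value. */
        long process(K key) {
            Long current = store.get(key);
            long updated = (current == null) ? 1L : current + 1L;
            store.put(key, updated);
            return updated;
        }

        /** Returns the current count for the key, or null if it was never seen. */
        Long countFor(K key) {
            return store.get(key);
        }
    }

    public static void main(String[] args) {
        CountByKey<String> counter = new CountByKey<>(InMemoryStore::new);
        for (String word : Arrays.asList("kafka", "streams", "kafka")) {
            counter.process(word);
        }
        System.out.println("kafka=" + counter.countFor("kafka"));
        System.out.println("streams=" + counter.countFor("streams"));
    }
}
```

Because the aggregation only depends on the small store contract, swapping
the in-memory map for a persistent implementation would not change the
counting logic. That is the kind of flexibility the questions above ask the
DSL to expose.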

-Matthias