One Kafka cluster limitation for Kafka Streams applications

Anca Sarb

Apr 7, 2016, 8:50:18 AM
to Confluent Platform
Hi all,

The developer guide for Kafka Streams (http://docs.confluent.io/2.1.0-alpha1/streams/developer-guide.html) states that a Kafka Streams application can currently only talk to a single Kafka cluster, but it also mentions that in the future, Kafka Streams will support connecting to different Kafka clusters for reading input streams and/or writing output streams. Will this feature be included in the 0.10.0 open source release, which is targeted for mid-April?

Thank you for the good work on Kafka Streams.

Anca

Guozhang Wang

Apr 7, 2016, 9:36:36 PM
to Confluent Platform
Anca,

We do not yet have plans to include multi-cluster support in the first release of Kafka Streams in 0.10.0.0.

Would you like to share your use case for stream processing across clusters, so I can better understand the motivation behind this feature request?

Guozhang

Anca Sarb

Apr 8, 2016, 4:20:23 AM
to Confluent Platform
Hi Guozhang,

Thanks for your reply. 

Regarding our use case: we're trying to synchronize two internal systems (each with its own Kafka cluster) by consuming records published by our system on one Kafka cluster and publishing a consolidated message to the Kafka cluster of the downstream system.

Do you have any suggestions on how best to go about achieving this? If possible, we'd still like to make use of the Kafka Streams library, as it's quite neat! Or do you recommend using the KafkaProducer/KafkaConsumer APIs instead?

Anca

Guozhang Wang

Apr 8, 2016, 3:16:33 PM
to Confluent Platform
How complex is the consolidation process?

One thing you can do, though, is use the customizable process() call in Kafka Streams: after the consolidation, an embedded producer client inside your processor can send the result to the different destination cluster. When Kafka Streams adds multi-cluster support, you can simply get rid of that customized processor at the end of your topology.
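For illustration only (a rough, untested sketch against the 0.10.0 Processor API; the topic name and the destination bootstrap address are made up), such a processor could look roughly like this:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

// Custom processor that forwards each consolidated record to a second,
// "destination" cluster via its own embedded producer.
public class CrossClusterForwarder implements Processor<String, String> {

    private KafkaProducer<String, String> producer;

    @Override
    public void init(ProcessorContext context) {
        Properties props = new Properties();
        // bootstrap servers of the *destination* cluster (made-up address)
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "destination-cluster:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producer = new KafkaProducer<>(props);
    }

    @Override
    public void process(String key, String consolidatedValue) {
        // consolidation happens upstream in the topology; here we only forward
        producer.send(new ProducerRecord<>("downstream-topic", key, consolidatedValue));
    }

    @Override
    public void punctuate(long timestamp) {
        // nothing to do periodically
    }

    @Override
    public void close() {
        producer.close();
    }
}

You would attach it at the end of the topology, e.g. via KStream#process(() -> new CrossClusterForwarder()), and replace it with a regular sink once multi-cluster support lands.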


Guozhang

Saravanan Tirugnanum

Apr 21, 2017, 8:20:16 AM
to Confluent Platform
Hi Guozhang

Is this multi-cluster support feature available in 0.10.2.0, or do we still need to write a custom producer client and add it to a processor?
In that case, we will not have a sink at all in our topology builder. I hope that's fine.

Regards
Saravanan

Eno Thereska

Apr 21, 2017, 11:25:45 AM
to Confluent Platform
The multi-cluster feature is not yet available in 0.10.2.0, and it's unlikely it will be available in 0.11 either (in three or so months). It's on our radar though, and it sounds useful. We could use some help from the community if anyone is interested in picking this up.

Thanks
Eno

Saravanan Tirugnanum

Apr 26, 2017, 2:17:14 PM
to Confluent Platform
Thanks Eno. I am happy to help contribute to this. Could you please guide me on how to start?

Eno Thereska

Apr 27, 2017, 12:42:41 PM
to Confluent Platform
I think this will require what we call a KIP: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals. It's a form of design proposal that you put forth and on which the community then provides feedback. We can provide some guidance too if needed.

Thanks, this is much appreciated,
Eno

m.mohamed....@gmail.com

Dec 31, 2017, 7:50:28 AM
to Confluent Platform
Hi,

I have a great need for the multi-cluster feature.
Could you please tell me whether it is supported in version 1.0.0?

Thanks
Mohamed

Matthias J. Sax

Dec 31, 2017, 1:06:41 PM
to confluent...@googlegroups.com
This feature is not supported yet, and there is no concrete roadmap for it at the moment either.

It's recommended to write the output to a topic in the source cluster and replicate the data into the target cluster (a sketch follows after the notes).

Note:
- the output topic in the source cluster can have quite a short retention time, as it is only used as an intermediate "buffer" while the data is safely stored with a larger retention time in the target cluster; thus, the storage overhead can be kept small
- for replicating the data you can use MirrorMaker, which ships with Apache Kafka (or other third-party tools for cross-cluster replication)
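For illustration (this sketch is not part of the original recommendation; the topic names, application id, and broker address are made up, and it assumes the 1.0 Streams API), the Streams application itself then only ever talks to the source cluster:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ConsolidationApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "consolidation-app");
        // only the source cluster is configured; the app never sees the target cluster
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "source-cluster:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic")
               // ... consolidation logic goes here ...
               .to("consolidated-output");  // short-retention "buffer" topic in the source cluster

        new KafkaStreams(builder.build(), props).start();
    }
}

MirrorMaker (or another replication tool) then runs separately and mirrors "consolidated-output" from the source cluster into the target cluster.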


Hope this helps.


-Matthias

m.mohamed....@gmail.com

Apr 26, 2018, 2:45:44 PM
to Confluent Platform
Thank you Matthias

I tried to write the output to a topic in the source cluster and replicate the data into the target cluster using MirrorMaker. However, I get this error message:
[2018-04-26 17:42:22,003] ERROR Error when sending message to topic output-1-func003 with key: 9 bytes, value: 8 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for output-1-func003-0: 30013 ms has passed since batch creation plus linger time
I get this error 30 seconds after launching a program that uses Kafka Streams to produce messages to the topic output-1-func003. The message is a long number which is sent every 5 seconds. After googling the error, I understood that the sending frequency may be the cause. So, as recommended, I changed the "linger.ms" and "batch.size" configuration of the MirrorMaker producer. However, this didn't solve the problem.


This is the command I use to launch MirrorMaker:
bin/kafka-mirror-maker.sh \
--consumer.config config/sourceClusterConsumer.config \
--producer.config config/targetClusterProducer.config \
--whitelist=output-1-func003
This is the content of my sourceClusterConsumer.config
bootstrap.servers=localhost:9092
client.id=func003.Consumer
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
group.id=func003.Consumer
partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
This is the content of my targetClusterProducer.config
bootstrap.servers=192.168.10.5:9092
client.id=func003.Producer
key.serializer=org.apache.kafka.common.serialization.StringDeserializer
value.serializer=org.apache.kafka.common.serialization.LongDeserializer
compression.type=lz4
batch.size=65536


Could you please help me?

Thanks
Mohamed

Matthias J. Sax

Apr 30, 2018, 5:35:53 AM
to confluent...@googlegroups.com
You might be hitting a bug that is addressed by KIP-91.

As a workaround, try increasing the parameter `request.timeout.ms` (the default is 30 seconds).
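For example (the value here is arbitrary, just for illustration), you could add a line like the following to targetClusterProducer.config:

request.timeout.ms=120000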


-Matthias