Kafka Connect standalone vs distributed


Skrzypek, Jonathan

Jul 13, 2016, 6:43:11 AM
to confluent...@googlegroups.com

Hi,

 

Reading through the Kafka Connect documentation, I'm wondering what the difference is between standalone and distributed mode in terms of consumer groups.

It seems that the distributed setup is recommended for running multiple consuming processes on different hosts.

But it looks like standalone mode creates a Kafka consumer group anyway, so if you run multiple standalone instances, doesn't that allow you to scale in the same way?

Each standalone process will consume from one or more partitions, right?

 

Also, in distributed mode, can multiple workers or tasks consume from the same partition?

 

Thanks

 

 

Dustin Cote

Jul 13, 2016, 10:54:48 AM
to confluent...@googlegroups.com
Hi Jonathan,

The biggest difference between standalone and distributed mode is that in distributed mode the workers know about each other's existence, which gives you some fault tolerance and coordination between workers:

If you run multiple standalone instances, you would have to coordinate them yourself, and you would have additional resource overhead to manage as well.

In terms of what can consume from the same partition, it's more about the connectors than the workers themselves. The logical separation looks like this:

You can then imagine a second connector in that diagram as an independent entity, just as an independent consumer group would work. The different logical breakdowns for Kafka Connect are described here:


-Dustin


Skrzypek, Jonathan

Jul 13, 2016, 11:16:59 AM
to confluent...@googlegroups.com

Thanks!

In my case I have a single simple task that I want to parallelize by leveraging partitions. The instances don’t have to share state or do anything on rebalance or failover, so I guess running as many standalone instances as there are partitions does the job.

The underlying consumer API, together with the consumer group, should coordinate which consumer gets data from which partition.
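
To make the consumer-group point concrete, here is a toy model of how a group might split partitions among its members. It is only a sketch: the function `assign_partitions` and the member names `worker-a`, `worker-b`, `worker-c` are made up for illustration, and the simple round-robin rule stands in for Kafka's real assignors, which behave differently in detail. What it does show is the property being relied on: within one group, each partition goes to exactly one member, and the split is recomputed when membership changes.

```python
from collections import defaultdict


def assign_partitions(partitions, members):
    """Toy round-robin assignment (not Kafka's actual assignor).

    Every partition is handed to exactly one member of the group.
    """
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered = sorted(members)
    assignment = defaultdict(list)
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    # Members with no partitions still appear, with an empty list.
    return {member: assignment[member] for member in ordered}


partitions = [0, 1, 2, 3]

# Three hypothetical standalone instances sharing one consumer group.
three = assign_partitions(partitions, ["worker-a", "worker-b", "worker-c"])
print(three)

# One instance goes away: the remaining two split the partitions again.
two = assign_partitions(partitions, ["worker-a", "worker-b"])
print(two)
```

In this model, running more instances than partitions just leaves the extra instances with an empty list, which is why matching the instance count to the partition count is the natural upper bound.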
