Kafka Connect vs. regular Kafka consumers/producers


baaa

Nov 17, 2016, 8:15:58 AM
to Confluent Platform
Hi
I have multiple use cases in which I need to take data from a Kafka topic and ingest it into some DB or search engine.
Some use cases require modifying the data before it's ingested; some don't.

I can either write regular consumers that consume the topic and ingest into engine X, or I can do the integration with Kafka Connect.

So I am wondering - in the Kafka designers' and developers' vision - which should be done? What are the main differences?

Thanks.

Gwen Shapira

Nov 17, 2016, 1:18:04 PM
to confluent...@googlegroups.com
I see three main differences between the Connect APIs and the "standard"
Consumer/Producer APIs:

1. Push vs. pull: a producer pushes to Kafka, while a Connect source pulls data
from an external system. Consumers pull data from Kafka, while Connect sinks
push data to an external system. It may seem like semantic quibbling,
but the API differences have an impact on the focus of the code you
write.

2. Batteries included: Connect takes care of configuration management,
REST API for management, offset management, HA, etc. If the way we do
these things doesn't match your requirements, then Connect is a bad
fit. But if it does, you just saved tons of time and effort.

3. Ecosystem: the Connect APIs have started a small industry of people
writing connectors; we have around 50 of them. If your external
systems already have connectors, you don't need to write any code -
just download, deploy, and run (see the sketch right after this list).
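
As a purely illustrative sketch of point 3: assuming you have a distributed
Connect worker running on localhost:8083 and a JDBC sink connector plugin
(e.g. Confluent's) already installed, "running a connector" is just submitting
a JSON config over the REST API - no consumer code. The worker address,
connector name, topic, and connection URL below are made-up example values:

# Sketch only: names, URLs, and connection details are examples.
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "orders-jdbc-sink",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
      "tasks.max": "2",
      "topics": "orders",
      "connection.url": "jdbc:postgresql://db:5432/analytics",
      "auto.create": "true"
    }
  }'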

Hope this helps?

Gwen
> --
> You received this message because you are subscribed to the Google Groups
> "Confluent Platform" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to confluent-platf...@googlegroups.com.
> To post to this group, send email to confluent...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/confluent-platform/56bc6d70-7fe5-4f56-861c-5c82beeb2146%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog

baaa

Nov 19, 2016, 12:20:04 PM
to Confluent Platform
Yes, this helps.
Can you elaborate, or point to another source, on point 2?
"Connect takes care of configuration management, 
REST API for management, offset management, HA, etc. If the way we do 
these things doesn't match your requirements, then Connect is a bad 
fit. But if it does, you just saved tons of time and effort. 
"

Thanks.

Ewen Cheslack-Postava

Nov 21, 2016, 1:40:25 PM
to Confluent Platform
On Sat, Nov 19, 2016 at 9:20 AM, baaa <dan...@gmail.com> wrote:
Yes, this helps.
Can you elaborate, or point to another source, on point 2?
"Connect takes care of configuration management, 
REST API for management, offset management, HA, etc. If the way we do 
these things doesn't match your requirements, then Connect is a bad 
fit. But if it does, you just saved tons of time and effort. 
"

This is related to Gwen's third point about not writing any code. Connect is a system you deploy, and the framework makes some assumptions:

- it controls where your connector configurations live (in a Kafka topic that you specify);
- it controls how offsets are stored (in Kafka, whereas for source connectors you could implement something different in a custom solution);
- it handles the implementation of all the fault tolerance/HA features (but that means you don't have tight integration with other solutions at different levels of the stack, such as Kubernetes/Mesos/etc.).

All of these things make it really easy to get up and running, and they mean you generally don't have to write any code to get data flowing. But if you're looking for, e.g., some specific integration in how you manage HA, want a Thrift-based interface to interact with your cluster, or prefer low-level control over how you track which data has been copied already, Connect is not going to easily provide that for you.

However, we made the choices we did because they work across a *very* broad set of use cases. In general they should be a great fit for most environments and applications.
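
To make those assumptions a bit more concrete, this is roughly where they show
up in a distributed worker's configuration. This is a sketch only - the topic
names, group id, and replication factors below are just example values:

# connect-distributed.properties (illustrative values)
bootstrap.servers=kafka1:9092,kafka2:9092
group.id=connect-cluster-1              # workers sharing a group.id form one Connect cluster

# Connect keeps its own state in Kafka topics that you name here:
config.storage.topic=connect-configs    # connector/task configurations
offset.storage.topic=connect-offsets    # source connector offsets
status.storage.topic=connect-status     # connector/task status, used for HA/rebalancing

config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3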

-Ewen
 




--
Thanks,
Ewen

Sriram KS

Aug 5, 2018, 7:24:57 PM
to Confluent Platform
Hi,

While going through how Kafka Connect works: why does each connector instance create a separate consumer group to connect to Kafka?
Why can't the entire Kafka Connect cluster connect as a single consumer group, or at least expose a higher-level abstraction so the consumer group can be set as part of the connector config?

Regards
Sriram