Sink Connector vs Consumer


Chris Stromberger

Mar 17, 2017, 3:02:01 PM
to confluent...@googlegroups.com
Relative newbie, needing to pull records from a Kafka topic, do some minor transformation, and then save them to Cassandra. I'm confused by the "connect vs consumer" choice. Wondering if there are definite use cases that call for one over the other. From what I can tell, a sink connector uses the consumer and all its built-in consumer group partition management under the hood. With the consumer group management, it seems like using a consumer directly is a fairly simple and robust option. What are the clear use cases or considerations for choosing connect vs consumer for this kind of situation?
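To make the "built-in consumer group partition management" concrete, here's a toy sketch of a round-robin assignor in Python. This is illustrative only, not Kafka's actual code; it just shows the kind of partition balancing the consumer group protocol handles for you:

```python
# Toy sketch (not Kafka code): how a consumer group assignor might spread
# topic partitions across the member instances of a group.
def assign_partitions(partitions, members):
    """Round-robin partitions across sorted member ids."""
    ordered_members = sorted(members)
    assignment = {m: [] for m in ordered_members}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered_members[i % len(ordered_members)]].append(p)
    return assignment

# Three consumer instances sharing six partitions:
print(assign_partitions(range(6), ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

When an instance joins or leaves, the group rebalances and this assignment is recomputed, which is exactly what both a plain consumer group and a Connect sink get for free.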

Thanks,
Chris

ha...@confluent.io

Mar 17, 2017, 3:31:00 PM
to Confluent Platform
This should help highlight the differences, but basically you get a lot of prebuilt common code for free when you start with the Connect API instead of the Consumer API.

https://www.confluent.io/blog/kafka-connect-cassandra-sink-the-perfect-match/

Since you mention transforms, I should also highlight that 0.10.2 adds configurable transforms (Single Message Transforms) in Connect, so you could just take the prebuilt Cassandra sink connector, configure a transform, and run it with no coding required.

-hans
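For example, a sink config with a transform might look like this (the connector class and field names here are illustrative, not taken from the blog post; `ReplaceField$Value` is one of the transforms that ships with Kafka):

```json
{
  "name": "cassandra-sink-with-transform",
  "config": {
    "connector.class": "CassandraSinkConnector",
    "topics": "orders",
    "tasks.max": "2",
    "transforms": "renameId",
    "transforms.renameId.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.renameId.renames": "order_id:id"
  }
}
```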

Chris Stromberger

Mar 18, 2017, 11:48:53 AM
to confluent...@googlegroups.com
Thanks, will take a look. I'm still confused on how you run a connector. I see some references in the Confluent docs to a "Connect cluster" and submitting your config via REST API to start your connect workers. But I also see things in the docs like "Workers lend themselves well to being run in containers in managed environments". So is a Connect cluster required, or can I fire up independent instances of my connector anywhere? Maybe I'm getting confused on the terms, worker vs. connector, etc.?

When thinking about consumers, I can just "start up x independent instances of my consumer" and it seems very clear. 


--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platform+unsub...@googlegroups.com.
To post to this group, send email to confluent-platform@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/b0188fd2-b174-4e60-86d8-e2a579b39d8d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ryan Pridgeon

Mar 18, 2017, 12:28:49 PM
to confluent...@googlegroups.com
Chris, 

The terminology is definitely confusing. I like to think of it like this:

Connector: A template for a job you wish to perform on your collection of workers.
Worker: The actual node (a JVM instance) performing the work you have requested.

When you make a request to a worker's REST endpoint, you are more or less filling out this 'template'. The worker will then execute the requested work after 'rendering' the template.
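Concretely, "filling out the template" is a single HTTP POST to any worker in the group. A sketch with Python's standard library (the default REST port 8083 and the connector class are illustrative; uncomment the `urlopen` line against a running worker):

```python
# Sketch: submit a connector "template" to a Connect worker's REST API.
# Connector name/class here are illustrative, not a real deployment.
import json
import urllib.request

connector = {
    "name": "my-cassandra-sink",  # job name, unique within the Connect group
    "config": {
        "connector.class": "CassandraSinkConnector",  # illustrative class name
        "topics": "events",
        "tasks.max": "2",
    },
}

request = urllib.request.Request(
    "http://localhost:8083/connectors",   # default worker REST port is 8083
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request)  # uncomment to actually submit the job
print(request.get_method(), request.full_url)
# POST http://localhost:8083/connectors
```

The same POST works against any worker in the group; the group coordinates among themselves to run the resulting tasks.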

If you'd like, you can then ask the worker to execute a different job, which may or may not be of the same 'type' (connector/template). Once the request has been received, it will run that job in parallel with the first one you submitted.

As with anything, workers are confined to a finite set of resources. Should you find that your workers are running low on resources, you can add another worker to the group. The work will then be spread evenly across the available workers, allowing you to add additional jobs.

Hopefully that helps, 
Ryan

