Snappy Compression Config - Kafka Connect


shrk

May 17, 2016, 5:30:50 AM5/17/16
to Confluent Platform
Hi,

Is there a direct way to set a compression codec, preferably Snappy, in a Kafka Connect sink connector?

Put differently, what is the best way to configure compression in Kafka Connect so that Kafka messages are stored on the local file system as Snappy-compressed data?

Is there a built-in setting for this, or should we use a Snappy library externally to compress the data and write it to the file (in Snappy format)?

Regards,
shrk

Liquan Pei

May 17, 2016, 12:50:36 PM5/17/16
to confluent...@googlegroups.com
Hi Shrk,

Kafka Connect doesn't provide any compression-related configuration. Kafka Connect does depend on the clients jar, which includes a Compressor class, but I don't encourage using it, as it may not meet the specific needs of your connector.
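If you do go the external-library route from the original question, a rough sketch of the idea using the snappy-java library (org.xerial.snappy) could look like the code below; the class name and file path are hypothetical, and a real sink task would still need to manage offsets, flushing and error handling.

    // Sketch: wrap the output file in a SnappyOutputStream so everything written
    // from SinkTask.put() lands on disk Snappy-compressed. Names are hypothetical.
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Collection;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.xerial.snappy.SnappyOutputStream;

    public class SnappyFileWriter {
        private final SnappyOutputStream out;

        public SnappyFileWriter(String path) throws IOException {
            // Append mode, so successive put() calls keep extending the same file.
            this.out = new SnappyOutputStream(new FileOutputStream(path, true));
        }

        // Call this from your SinkTask.put(Collection<SinkRecord>) implementation.
        public void write(Collection<SinkRecord> records) throws IOException {
            for (SinkRecord record : records) {
                out.write(String.valueOf(record.value()).getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            }
            out.flush();
        }

        public void close() throws IOException {
            out.close();
        }
    }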

Thanks,
Liquan





--
Liquan Pei | Software Engineer | Confluent | +1 413.230.6855
Download Apache Kafka and Confluent Platform: www.confluent.io/download

Ewen Cheslack-Postava

May 19, 2016, 12:15:46 AM5/19/16
to Confluent Platform
I would suggest handling this at the Kafka level -- if you want data produced to Kafka that is snappy compressed, just configure the producers and consumers in Connect to use snappy compression. You can override any client settings as described here: http://docs.confluent.io/2.0.1/connect/userguide.html#overriding-producer-consumer-settings
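For example, a minimal sketch of that override in the Connect worker properties (the producer. prefix passes the standard client setting through to the producers Connect creates):

    # Connect worker properties -- override the embedded producer's codec
    producer.compression.type=snappy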

-Ewen





--
Thanks,
Ewen

shrk

May 30, 2016, 8:37:10 AM5/30/16
to Confluent Platform
Hi Ewen,

I tried setting the compression codec at the producer level, but even after overriding the properties in the Kafka Connect worker config, the records handed to the SinkTask put() method are not Snappy-compressed data. How can I get at the Snappy-compressed data?

Thanks,
Shrk



Ewen Cheslack-Postava

May 30, 2016, 7:42:59 PM5/30/16
to Confluent Platform
Tasks are layered on top of the consumer, which decompresses messages. In fact, due to the way Kafka handles compression (compressing a set of messages in one block for efficiency w/ small messages), you generally cannot access individual compressed messages. This isn't a limitation of Kafka Connect -- this is also how the consumer works. I think you might be able to get at the compressed message set via the old simple consumer interfaces, but there are no plans to expose that in Kafka Connect.
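To illustrate the point, here is a minimal consumer sketch (broker address, group id and topic name are hypothetical): the values it prints are the original payloads, not Snappy blocks, even when the producer wrote the batches with compression.type=snappy.

    import java.nio.charset.StandardCharsets;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class DecompressedReadDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "compression-demo");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // The client has already decompressed the fetched batch here.
                    System.out.println(new String(record.value(), StandardCharsets.UTF_8));
                }
            }
        }
    }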

-Ewen




--
Thanks,
Ewen

dori.w...@fyber.com

Jan 2, 2018, 7:34:42 AM1/2/18
to Confluent Platform
Hi,

So the Confluent S3 connector cannot save the data compressed?

This is a little problematic. We use Kafka to save 9T of JSON data daily on S3. Currently we are using the Secor solution, but we want to move to the Kafka Connect S3 connector, and since we have a large amount of data we want to keep saving it compressed (gz).


Konstantine Karantasis

Jan 17, 2018, 10:22:02 AM1/17/18
to confluent...@googlegroups.com

There's a PR in progress to add compression for JSON text files in Confluent's S3 sink connector. It should be merged soon.
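Once it lands, a standalone connector config would presumably look something like the sketch below; the other properties are from the current S3 sink docs, but treat s3.compression.type (and its gzip value) as an assumption until the PR is actually released.

    # Sketch of an S3 sink connector config with compressed JSON output
    name=s3-sink
    connector.class=io.confluent.connect.s3.S3SinkConnector
    topics=my-topic
    s3.bucket.name=my-bucket
    s3.region=us-east-1
    storage.class=io.confluent.connect.s3.storage.S3Storage
    format.class=io.confluent.connect.s3.format.json.JsonFormat
    flush.size=1000
    s3.compression.type=gzip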


-Konstantine


