Kafka in production


Stuart Wong

Jan 5, 2016, 5:18:04 PM
to Confluent Platform
Hi all,

We are seeking to implement an enterprise data streaming pipeline using the Confluent Platform, and we'd like to hear how others have deployed and are using Kafka in production in an AWS environment. Is there a centralized Kafka cluster? If so, how do clients connect to the cluster, and how is the cluster secured (pre Kafka 0.9.0/CP 2.0.0)? Is anyone using the REST Proxy in production, and if so, how is it secured?

I appreciate any guidance that can be provided.

Thanks,
Stuart

Alex Loddengaard

Jan 5, 2016, 5:51:09 PM
to confluent...@googlegroups.com
Hi Stuart,

Below are some AWS tips. The list could go on and on, so I tried to mention pieces that are most important, and also address your specific questions:
  • The number of Kafka clusters depends on the use case, in particular on whether the application is cross-datacenter. In many single-datacenter cases, a single Kafka cluster in one region, spread across multiple availability zones, is a good configuration. This setup would survive an entire availability zone going down.
  • EBS performs better than instance storage, but costs more. Be sure that the Kafka broker log is on a different EBS volume than the OS.
  • You can secure the cluster (including the REST Proxy) by using AWS VPC, which should also give you a more stable network.
  • Make sure ZooKeeper nodes are using Elastic IPs. Kafka brokers can use Elastic IPs as well, so long as this doesn't get too costly for you.
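To make the EBS point concrete, here is a minimal sketch of the separation (the device name and mount point are just examples, not a recommendation):

```properties
# /etc/fstab entry: a dedicated EBS volume for Kafka data (example device/mount)
#   /dev/xvdf  /var/kafka-logs  ext4  defaults,noatime  0  2

# server.properties: keep the broker log on that volume, off the root/OS volume
log.dirs=/var/kafka-logs
```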
I hope this helps. Let me know if I can address other specific questions.

Alex

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platf...@googlegroups.com.
To post to this group, send email to confluent...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/b1ff76e1-5752-41bc-bbc5-6edfdef4873c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Alex Loddengaard | Solutions Architect | Confluent
Download Apache Kafka and Confluent Platform: www.confluent.io/download

Ben Davison

Jan 6, 2016, 11:51:36 AM
to confluent...@googlegroups.com
Thanks for the information, Alex (we are doing the same thing).

On your last point: you can use a Route 53 CNAME for each of your Kafka/ZooKeeper instances to make things a little nicer for other teams.
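For anyone wondering what that looks like on the broker side, a config sketch (pre-0.9 property names; the hostname is a made-up placeholder for a Route 53 CNAME):

```properties
# server.properties: advertise the Route 53 CNAME instead of the EC2 hostname/IP
advertised.host.name=kafka1.example.internal
advertised.port=9092
```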

Thanks,

Ben







Stuart Wong

Jan 6, 2016, 1:20:34 PM
to Confluent Platform
Thanks for your reply Alex.

We've gone through those points during our discussions and setup. Our concerns are more around the specifics of addressing the challenges: since pre-0.9 Kafka does not provide authentication/authorization, questions are being asked as to how others are handling:

1. Managing VPC peering: if the Kafka cluster is in its own VPC, then each connecting client's VPC must be peered with it.
2. If we are not doing VPC peering, the cluster is "open to the world", and how are we then going to manage the security groups and ACLs?
3. Running the REST Proxy in production, given its limitations around consumer-specific settings such as offset specification, and the fact that it is stateful.

So though we are aware of others using Kafka for specific uses, and of the first best practice you mentioned, other than LinkedIn we've not seen other enterprises using Kafka as a data hub or streaming pipeline where multiple lines of business / orgs in different VPCs use a Kafka cluster in its own VPC.

We are also using Route 53 CNAMEs for ZooKeeper instead of EIPs, since (to our knowledge) the latter would require the ZK ensemble to be in a public subnet, and the communication would have to go out and then back in (for ZK, Kafka, and any other ZK clients). We have a similar setup for the Kafka brokers as well.
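For reference, the CNAME-based setup is roughly this on the broker side (hostnames are placeholders):

```properties
# server.properties: reach the ZK ensemble via Route 53 CNAMEs in a private subnet
zookeeper.connect=zk1.example.internal:2181,zk2.example.internal:2181,zk3.example.internal:2181
```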

Thanks.


Alan Gawthorpe

Mar 17, 2016, 5:59:28 AM
to Confluent Platform
Hi Alex,

Quick question re availability zones:
  • Is Kafka "happy" with the latencies between AZs in AWS? I thought the general guidance was to keep clusters within a single DC - is an AWS AZ a "special" case?
  • If I understand correctly, Kafka depends on ZooKeeper. My understanding of ZooKeeper is that it is pointless to split an ensemble over two AZs: you would end up with n nodes in each AZ, giving 2n in total, and to tolerate F failures ZooKeeper needs 2F+1 nodes. Loss of an AZ would take out too many nodes, resulting in failure of ZooKeeper. That presumably renders Kafka useless at that point?
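To make the quorum arithmetic concrete, here's a quick sanity check (the node counts per AZ are hypothetical):

```python
# Check whether a ZooKeeper ensemble keeps a write quorum after losing
# its largest AZ. nodes_per_az is a list of node counts, one entry per AZ.

def survives_az_loss(nodes_per_az):
    total = sum(nodes_per_az)
    quorum = total // 2 + 1          # majority needed for writes (F+1 of 2F+1)
    worst_loss = max(nodes_per_az)   # an entire AZ going down
    return total - worst_loss >= quorum

print(survives_az_loss([2, 2]))     # 2 AZs, 4 nodes -> False (quorum lost)
print(survives_az_loss([1, 1, 1]))  # 3 AZs, 3 nodes -> True
print(survives_az_loss([2, 2, 1]))  # 3 AZs, 5 nodes -> True
```

So with an even split across two AZs, losing either AZ leaves only n of 2n nodes, which is below the n+1 majority.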
Thanks,
Alan


Stuart Wong

Mar 17, 2016, 11:04:26 AM
to Confluent Platform
Hi Alan,

You referenced Alex specifically, but I thought I'd throw in my own understanding, analysis, and setup here:

- While it's always best to have things in the same AZ, since you can better reason about network latencies, spreading across AZs is highly recommended from an availability perspective (which you probably know). For this reason we use multiple AZs for all our services (as much as possible) and have not seen any issues. AZs within the same region are fairly close, so you can be fairly deterministic in reasoning about network latencies (and hence set appropriate, acceptable timeout values). Both ZooKeeper and Kafka are fine in a multi-AZ setup within the same region; we've been running that way for quite some time with no issues. What doesn't really work (without long timeouts, which cause their own issues) is going across regions, since the network latencies can and will fluctuate, which will affect your throughput as well.

- Using 2 AZs is not recommended, as depending on which AZ goes down ZooKeeper might not have quorum (so no writes). You need to spread across 3 AZs to achieve a good HA solution; that way you can survive the loss of any single AZ. This is our standard clustering setup. Of course not all regions have 3 AZs, so there is that caveat. Spreading Kafka across 2 AZs is okay: you can accept partial service failure for the brokers in the failed AZ, or you can manage the replication across brokers in different AZs and avoid the downtime (I'm not recommending this, only mentioning the possibility). We currently have 5 x m4.xlarge Kafka brokers spread across 3 AZs and accept the risk that an ISR set may sit entirely within one AZ (better a partial outage than a full one).
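To illustrate the "manage the replication across brokers in different AZs" option I mentioned, here is a rough sketch (the broker IDs and AZ mapping are made-up examples) that builds an assignment whose replica lists alternate AZs, in the JSON format that kafka-reassign-partitions.sh consumes:

```python
import json
from itertools import zip_longest

# Hypothetical broker -> AZ mapping (5 brokers across 3 AZs, as in our setup).
BROKER_AZ = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}

def az_interleaved_brokers(broker_az):
    """Order brokers so consecutive entries sit in different AZs."""
    by_az = {}
    for broker, az in sorted(broker_az.items()):
        by_az.setdefault(az, []).append(broker)
    ring = []
    for group in zip_longest(*by_az.values()):
        ring.extend(b for b in group if b is not None)
    return ring

def reassignment(topic, partitions, replication, broker_az):
    """Build the reassignment JSON structure for the given topic."""
    ring = az_interleaved_brokers(broker_az)
    plan = [{"topic": topic,
             "partition": p,
             "replicas": [ring[(p + i) % len(ring)] for i in range(replication)]}
            for p in range(partitions)]
    return {"version": 1, "partitions": plan}

print(json.dumps(reassignment("events", 4, 3, BROKER_AZ), indent=2))
```

You'd write that JSON to a file and feed it to the tool with --reassignment-json-file. Note that with 5 brokers in 3 AZs some partitions still end up with two replicas in the same AZ, which is exactly the residual risk I described; this is just a sketch of the idea, not something we run as-is.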

Hope this helps.

Alex Loddengaard

Mar 21, 2016, 5:59:28 PM
to confluent...@googlegroups.com
Hi Alan,

What Stuart said :)

Also, a little more on how ZooKeeper behaves with failure: https://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios

Alex
