Overriding configurations in connect-standalone.properties


Tushar Sudhakar Jee

Apr 21, 2017, 2:21:40 PM
to Confluent Platform
Hi,
I am trying to override the producer and consumer configurations to get a baseline for average latency.
I made all the changes in connect-standalone.properties on all three nodes, N1, N2 and N3, where N1 is the source node and N2 and N3 are the sink nodes.
Data from a text file on N1 is pushed into the topic (part20rep1), which N2 and N3 then pull from.

connect-standalone.properties
bootstrap.servers=10.0.7.122:9092, 10.0.11.33:9092, 10.0.6.85:9092
#monitoring classes
consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter

key.converter.schemas.enable=false
value.converter.schemas.enable=false


internal.key.converter=org.apache.kafka.connect.json.JsonConverter

internal.value.converter=org.apache.kafka.connect.json.JsonConverter

internal.key.converter.schemas.enable=false

internal.value.converter.schemas.enable=false


offset.storage.file.filename=/tmp/connect.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000


producer.acks=1
producer.compression.type=lz4
consumer.session.timeout.ms=300000
consumer.request.timeout.ms=310000
heartbeat.interval.ms= 60000
session.timeout.ms= 200000
max.poll.interval.ms=Integer.MAX_VALUE
max.poll.records=500
 

-Thanks,
Tushar

Ewen Cheslack-Postava

Apr 21, 2017, 11:13:39 PM
to Confluent Platform
Tushar,

I'm not sure I understand the question. From the config you provided, it looks like you have followed the section of our docs that describes overriding producer and consumer configs: http://docs.confluent.io/current/connect/userguide.html#overriding-producer-consumer-settings

What is the issue you're facing? Is something not working as expected? It does look like a few of the overrides are not properly prefixed (e.g. the ones at the end like max.poll.records, which only applies to consumers, so should have a "consumer." prefix).
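
As a rough sketch of what fully prefixed overrides might look like in connect-standalone.properties: this assumes the four unprefixed settings at the end of your file were all meant for the consumer, and it keeps the prefixed session timeout (300000) rather than the unprefixed 200000, which is only a guess at your intent. Note that a properties file holds literal values, not Java expressions, so Integer.MAX_VALUE has to be written out as 2147483647.

```properties
# Producer overrides: applied to the producers used by source tasks
producer.acks=1
producer.compression.type=lz4

# Consumer overrides: applied to the consumers used by sink tasks
consumer.session.timeout.ms=300000
consumer.request.timeout.ms=310000
# Assumption: these were unprefixed in the original file and intended for the consumer
consumer.heartbeat.interval.ms=60000
consumer.max.poll.records=500
# Literal value of Integer.MAX_VALUE
consumer.max.poll.interval.ms=2147483647
```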

-Ewen

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platform+unsub...@googlegroups.com.
To post to this group, send email to confluent-platform@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/401fd5e2-cf5f-4da7-8b07-ed4f3f4cfce9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tushar Sudhakar Jee

Apr 24, 2017, 9:00:21 AM
to Confluent Platform
Hi Ewen,

I am trying to get performance numbers (preferably high throughput and low latency) for the setup described above.
At the moment I am getting neither.

So from your reply, I understand that instead of changing consumer.properties, server.properties or producer.properties independently, I can add the changes to connect-standalone.properties with the appropriate prefixes to tune the setup.
Is that correct?

Secondly, what is the correct way to get performance/baseline numbers for a Kafka connector in the above setup, and how do I optimize it?

-
Tushar

Ewen Cheslack-Postava

Apr 24, 2017, 10:59:33 PM
to Confluent Platform
On Mon, Apr 24, 2017 at 6:00 AM, Tushar Sudhakar Jee <tshrsu...@gmail.com> wrote:
> Hi Ewen,
>
> I am trying to get performance numbers (preferably high throughput and low latency) for the setup described above.
> At the moment I am getting neither.
>
> So from your reply, I understand that instead of changing consumer.properties, server.properties or producer.properties independently, I can add the changes to connect-standalone.properties with the appropriate prefixes to tune the setup.
> Is that correct?

To clarify, consumer.properties, server.properties, and producer.properties are not related at all to Kafka Connect. Obviously filenames are arbitrary, but Kafka Connect only cares about a) worker settings (e.g. in connect-standalone.properties) and b) connector settings (in standalone mode, these will be the additional .properties files provided on the command line, such as connect-file-source.properties).
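
For illustration, a connector settings file for the file source in standalone mode might look like the sketch below. The property keys are the ones used by the FileStreamSource example connector that ships with Kafka; the connector name and the file path are placeholders I made up, and the topic is the one mentioned earlier in this thread.

```properties
# connect-file-source.properties (connector settings, not worker settings)
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
# Hypothetical path to the input text file on the source node
file=/tmp/input.txt
# Topic named earlier in this thread
topic=part20rep1
```

On the command line the worker file comes first, followed by one or more connector files like this one.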

I suspect you are referring to the other filenames because they are used elsewhere:
* server.properties is only used for Kafka brokers. Kafka Connect workers are separate processes, independent of the brokers.
* producer.properties and consumer.properties are used by some tools to override producer and consumer settings. Connect does not work like this; instead, you override these settings using prefixed properties in the worker configuration: http://docs.confluent.io/current/connect/userguide.html#overriding-producer-consumer-settings

Re: your final question about prefixed settings in connect-standalone.properties, you are correct that this is how you would tweak producer and consumer settings. Note that within a datacenter, Kafka ships with defaults that are usually good for performance. You can always tweak them, of course, but it would be good to get some baseline tests & numbers first to determine if the setup really isn't performing as well as it could.
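
One way to get that baseline, sketched under the assumption that you reuse the worker file from your first message, is to comment out the tuning overrides so the producer and consumer run with Kafka's defaults, then restore them one at a time and compare. Which lines to disable is my own selection here, not an official recipe.

```properties
# connect-standalone.properties for a baseline run (sketch)
bootstrap.servers=10.0.7.122:9092,10.0.11.33:9092,10.0.6.85:9092

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false

internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000

# Tuning overrides disabled for the baseline; restore one at a time and re-measure
#producer.acks=1
#producer.compression.type=lz4
#consumer.session.timeout.ms=300000
#consumer.request.timeout.ms=310000
```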
 

> Secondly, what is the correct way to get performance/baseline numbers for a Kafka connector in the above setup, and how do I optimize it?

This is a pretty open-ended question, and in many cases the answer will depend on the specific connector. Can you be more concrete about which connectors you are running, what their configs are, the latency and throughput between the systems being connected, etc.?

-Ewen
 


Tushar Sudhakar Jee

Apr 26, 2017, 3:59:18 PM
to Confluent Platform
Ewen,
So I ran the benchmark numbers with respect to



and did get much better results.

Single producer thread, no replication
1162628.470446 records/sec (110.88 MB/sec)

Single producer thread, async 3x replication
1236032.829032 records/sec (117.88 MB/sec)

Single producer thread, 3x synchronous replication
892363.156110 records/sec (85.10 MB/sec)

Three producers, 3x async replication
Aggregate: 3675805 records/sec (350 MB/sec)


My question: on the same setup, when I use FileStreamSourceConnector to push data from an input text file to a topic, I am unable to see any data/packets moving over the network with dstat.

Also, am I wrong to use FileStreamSourceConnector to write data to the topic?

Would using the Kafka console producer, like
bin/kafka-console-producer.sh --broker-list f3:9092,f4:9092,f5:9092 --topic Top < /root/kafka_2.11-0.8.2.2/test_data

amount to the same thing?
-
Tushar
