Hi all,
Thanks to Chris, I have optimized the embedded Debezium engine without impacting throughput. I'm using the following configuration:
DEBEZIUM_FIXED_INCOME_MIN_BATCH_SIZE=50000
DEBEZIUM_FIXED_INCOME_MAX_BATCH_SIZE=20000000
DEBEZIUM_FIXED_INCOME_DEFAULT_BATCH_SIZE=500000
DEBEZIUM_FIXED_INCOME_MIN_SLEEP_TIME=1000
DEBEZIUM_FIXED_INCOME_MAX_SLEEP_TIME=3000
DEBEZIUM_FIXED_INCOME_SLEEP_TIME_DEFAULT=1500
DEBEZIUM_FIXED_INCOME_POLL_INTERVAL_MS=5000
DEBEZIUM_FIXED_INCOME_CACHE_SIZE=100000
DEBEZIUM_FIXED_INCOME_MAX_QUEUE_SIZE=49152
DEBEZIUM_FIXED_INCOME_STREAM_MAX_BATCH_SIZE=16384
DEBEZIUM_FIXED_INCOME_FETCH_SIZE=50000
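As a quick sanity check on the values above, here is a minimal Python sketch. It rests on assumptions that are not confirmed anywhere in this setup: that the MIN/DEFAULT/MAX variables feed the connector's min/default/max batch size and sleep time, and that MAX_QUEUE_SIZE and STREAM_MAX_BATCH_SIZE feed Debezium's max.queue.size and max.batch.size, where the queue is expected to be larger than the batch. The helper name check_settings is hypothetical.

```python
# Values copied from the configuration above (DEBEZIUM_FIXED_INCOME_ prefix dropped).
settings = {
    "MIN_BATCH_SIZE": 50_000,
    "MAX_BATCH_SIZE": 20_000_000,
    "DEFAULT_BATCH_SIZE": 500_000,
    "MIN_SLEEP_TIME": 1_000,
    "MAX_SLEEP_TIME": 3_000,
    "SLEEP_TIME_DEFAULT": 1_500,
    "MAX_QUEUE_SIZE": 49_152,
    "STREAM_MAX_BATCH_SIZE": 16_384,
}


def check_settings(s):
    """Return a list of problems; an empty list means the ranges are consistent."""
    problems = []
    if not s["MIN_BATCH_SIZE"] <= s["DEFAULT_BATCH_SIZE"] <= s["MAX_BATCH_SIZE"]:
        problems.append("default batch size is outside [min, max]")
    if not s["MIN_SLEEP_TIME"] <= s["SLEEP_TIME_DEFAULT"] <= s["MAX_SLEEP_TIME"]:
        problems.append("default sleep time is outside [min, max]")
    # Assumption: these two map to max.queue.size and max.batch.size.
    if s["MAX_QUEUE_SIZE"] <= s["STREAM_MAX_BATCH_SIZE"]:
        problems.append("queue size must be larger than the batch size")
    return problems


print(check_settings(settings))
```

Under those assumptions the ranges are internally consistent, so the check returns an empty list.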
From this point on, things get hard. We have a high volume that starts at 11 am and lasts until 3 pm, and the lag from the source keeps growing until it hits 2 hours. Sometimes it recovers after a while, but most of the time the lag stays until the end of the day.
I exported the metrics, and these are the values for this time interval:
last_duration_of_fetch_query_in_milliseconds = 49 seconds
average_batch_processing_throughput = 84
total_parse_time_in_milliseconds = 5.84 min
queue_remaining_capacity = min 48475, max 49152
max_duration_of_fetch_query_in_milliseconds = 3.44 min
last_duration_of_fetch_query_in_milliseconds = 52 seconds
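To put the queue numbers in perspective, here is a small Python sketch of the arithmetic. It assumes queue_remaining_capacity counts the free slots out of the configured MAX_QUEUE_SIZE (49152); the function name queued_events is hypothetical.

```python
MAX_QUEUE_SIZE = 49_152   # DEBEZIUM_FIXED_INCOME_MAX_QUEUE_SIZE
remaining_min = 48_475    # lowest queue_remaining_capacity observed
remaining_max = 49_152    # highest queue_remaining_capacity observed


def queued_events(max_size, remaining):
    """Events sitting in the queue, assuming 'remaining' is the free capacity."""
    return max_size - remaining


peak = queued_events(MAX_QUEUE_SIZE, remaining_min)    # fullest the queue got
trough = queued_events(MAX_QUEUE_SIZE, remaining_max)  # emptiest the queue got
peak_pct = 100 * peak / MAX_QUEUE_SIZE

print(peak, trough, round(peak_pct, 2))
```

Under that assumption the queue held at most 677 events in this interval, roughly 1.4% of its capacity, and was completely empty at times.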
Is there a way to improve the configuration without hurting throughput? As far as I can see, the queue is barely being filled. Is that a problem? Or should I adjust the batch size?