Load Testing || KsqlDB Pull query response time

Anup Tiwari

Oct 11, 2020, 6:40:28 AM

to ksqldb-users

Hi Team,

I am working on optimising pull query response time, and while load testing I observed the results below for each configuration.

Below are the JMeter settings:

jmeter.sh -JtargetConcurrency=100 -JrampUpTime=10 -JrampUpStepsCount=10 -JholdTargetRateTime=130 -JtargetThroughput=6000

-- ===================== Observation 1 ===============================

Configs :

ksql.streams.num.stream.threads=1
# Serves as a read cache to speed up reading data from a state store
ksql.streams.cache.max.bytes.buffering=400000000
# For shorter intervals between updates
ksql.streams.commit.interval.ms=1000

Result :

-- ===================== Observation 2 ===============================

Configs :

ksql.streams.num.stream.threads=5
# Serves as a read cache to speed up reading data from a state store
ksql.streams.cache.max.bytes.buffering=400000000
# For shorter intervals between updates
ksql.streams.commit.interval.ms=1000

Result :

-- ===================== Observation 3 ===============================

Configs :

ksql.streams.num.stream.threads=5
# Serves as a read cache to speed up reading data from a state store
ksql.streams.cache.max.bytes.buffering=2147483648
# For shorter intervals between updates
ksql.streams.commit.interval.ms=1000

Result :

-- ===================== Observation 4 ===============================

Configs :

ksql.streams.num.stream.threads=1
# Serves as a read cache to speed up reading data from a state store
ksql.streams.cache.max.bytes.buffering=800000000
# For shorter intervals between updates
ksql.streams.commit.interval.ms=1000

Result :

I have a 3-node ksqlDB cluster, and both memory and CPU utilisation are under 25%. I have two questions:

1. Why does increasing "ksql.streams.num.stream.threads" hurt pull query performance (that is, increase response time)?

2. How can I reduce the response time further, in particular the 95th and 99th percentiles?

Regards,
Anup Tiwari

Alan Sheinberg

Oct 12, 2020, 12:39:59 PM

to ksqldb-users

Hi Anup,

For 1), in general, increasing the number of stream threads increases the CPU time spent processing the existing topologies and the disk IO done while saving to state stores. If you're running everything on a machine with one disk, it's possible that increasing the number of threads increases the seeks that you're doing simultaneously and hurts the performance of reads done during a pull query. If you want to minimize interference, only run the queries that are necessary to build the state store you're querying with the pull query. Also, as you've done, increasing the cache helps minimize this IO penalty, and it looks like it has.

For 2), in our benchmarking, depending on how much data we produce alongside the pull queries, we're able to get p99 latencies under 50 ms at high throughput. Are you running each KSQL node on a separate physical machine, and do the nodes share the machine with other processes that might be using disk? Also, I would not set ksql.streams.commit.interval.ms, since I think decreasing it from the default of 2000 can hurt performance.
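For comparing p95 / p99 numbers between runs, a minimal sketch of how the percentiles can be computed from raw latency samples is shown here. It uses the nearest-rank definition, which may differ slightly from what JMeter reports. The helper names (`percentile`, `measure`, `summarize`) are illustrative and not part of ksqlDB or JMeter, and the callable passed to `measure` is assumed to wrap one pull query request.

```python
import math
import time
from typing import Callable, Dict, List, Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at
    least pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def measure(call: Callable[[], object], n: int) -> List[float]:
    """Time n sequential invocations of call(); latencies in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Report the percentiles discussed in this thread."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 90, 95, 99)}
```

Tail percentiles need enough samples to be meaningful: with only 100 samples, p99 is decided by the second-slowest request.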

Alan

Anup Tiwari

Oct 12, 2020, 6:39:19 PM

to Alan Sheinberg, ksqldb-users

Hi Alan,

W.r.t point 1,

(i) I have multiple disks, but the ksqlDB state store writes to a single disk per machine: a 250 GB gp2 SSD with 750 IOPS.

(ii) I am running only the queries that are necessary to build the state store I am querying with the pull query.

(iii) I also haven't seen a significant improvement from increasing the cache size, as the results in the trailing mail show (observation 4).

(iv) Could you please elaborate on this point?

"If you're running everything on a machine with one disk, it's possible that increasing the number of threads increases the seeks that you're doing simultaneously and hurts the performance of reads done during a pull query. "

W.r.t point 2,

(i) Yes, each KSQL node runs on a separate physical machine. The machines are dedicated to this job and do NOT share resources with other processes.

(ii) I will try setting ksql.streams.commit.interval.ms back to the default and rerun the benchmark. How does reducing it below the default hurt performance? I lowered it because I wanted updates to be reflected as soon as possible.

(iii) Could you please share your benchmarking results?

One last thing: increasing the thread count doesn't translate into more CPU utilisation in my case. CPU and memory utilisation both stay under 20% at 100 rps.
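The 750 IOPS figure for the 250 GB volume matches the gp2 baseline of 3 IOPS per GiB. A small sketch of that arithmetic, assuming the standard gp2 limits (minimum 100, maximum 16,000 baseline IOPS) and ignoring burst credits; `gp2_baseline_iops` is an illustrative helper, not an AWS API:

```python
# gp2 baseline performance scales with volume size (burst credits ignored).
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume of the given size in GiB."""
    scaled = size_gib * GP2_IOPS_PER_GIB
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, scaled))


for size in (250, 400):
    print(f"{size} GiB gp2 volume: {gp2_baseline_iops(size)} baseline IOPS")
```

If state store reads and writes regularly exceed the baseline, a larger volume or a volume type with provisioned IOPS would raise the ceiling, though the thread does not establish that disk IO is the bottleneck here.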


Anup Tiwari

Oct 13, 2020, 3:41:12 AM

to Alan Sheinberg, ksqldb-users

I also set commit.interval.ms back to the default (2000 ms) and reran the load test at 100 rps, but I can't see any improvement in the 90th / 95th / 99th percentiles. Please find the metrics below:

Regards,
Anup Tiwari

Anup Tiwari

Oct 14, 2020, 9:32:52 PM

to Alan Sheinberg, ksqldb-users

Hi Alan / Team,

Could you please check this once and guide me?

Anup Tiwari

Oct 19, 2020, 1:10:29 AM

to Alan Sheinberg, ksqldb-users

Hi Team,

Could you please check this and guide me here?

Regards,
Anup Tiwari

Alan Sheinberg

Oct 19, 2020, 1:07:20 PM

to Anup Tiwari, ksqldb-users

Hi Anup,

Sorry for the delay. It can be a bit challenging to troubleshoot performance.

What kind of machines are you using to run these benchmarks? Are they cloud instances or on-prem machines? What kind of CPU do they have, and how much memory is available to the process? Depending on how you're benchmarking, things can sometimes be CPU-bound, so this can end up being a factor.

Are you running KSQL in a cluster with multiple nodes? Are the nodes on the same local network? If you are doing pull queries that get forwarded between nodes, there can be a big performance penalty if the networking isn't configured to be very fast between nodes in a cluster. You can test this by pinging one node from the other.
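Besides `ping`, the node-to-node latency can be approximated by timing TCP connections to the port the other ksqlDB server listens on. The sketch below assumes the default listener port 8088 in the usage comment; the hostname there is a placeholder and `tcp_connect_times_ms` is an illustrative helper, not part of ksqlDB.

```python
import socket
import statistics
import time
from typing import List


def tcp_connect_times_ms(host: str, port: int, samples: int = 10,
                         timeout_s: float = 2.0) -> List[float]:
    """Open and close `samples` TCP connections, returning each connect time in ms."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout_s):
            pass  # connection established; close it immediately
        times.append((time.perf_counter() - start) * 1000.0)
    return times


# Example usage, run from one ksqlDB node against another
# (hostname is a placeholder for your own node):
#   times = tcp_connect_times_ms("ksqldb-node-2.internal", 8088)
#   print(f"median {statistics.median(times):.2f} ms, max {max(times):.2f} ms")
```

A connect time is roughly one network round trip, so within a single VPC it would be expected to be small compared with the pull query latencies being measured.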

Thanks,

Alan

Anup Tiwari

Oct 20, 2020, 4:08:13 AM

to Alan Sheinberg, ksqldb-users

Hi Alan,
Thanks for the response. Please find answers to your questions :-

We are on AWS, and the cluster configuration is described in the table below.

Service  Instance Type  Config             Number of Instances  Disk Size
KsqlDB   r5.2xlarge     8 CPU / 64 GB RAM  3                    400 GB / node (gp2)

These machines are in the same VPC. While load testing, we saw ~20% CPU and ~55% memory usage on each node. The memory usage seems to be mainly due to the configured KSQL_HEAP_OPTS="-Xms30G -Xmx45G".

Regards,
Anup Tiwari

Alan Sheinberg

Oct 20, 2020, 1:15:33 PM

to Anup Tiwari, ksqldb-users

Hi Anup,

That seems fairly reasonable to me.

I'm a bit surprised that you're only seeing 20% CPU load when running these benchmarks. For a workload of single key pull queries, we've been seeing things fairly CPU bound for smaller tables, so that suggests that maybe your data is large.

How large are your table records and what's the number of unique elements of the table? You might want to try benchmarking with a small table that can largely fit in memory and see if your p99 comes down a lot. We've done a lot of testing with tables of various sizes, though usually with small records. Maybe the numbers you're seeing are more reasonable if the rows are rather large.

Thanks,

Alan

Anup Tiwari

Oct 21, 2020, 1:14:19 AM

to Alan Sheinberg, ksqldb-users

Hi Alan,

When you say small / large table, how many columns / rows are you referring to?

Please find below the description of the table (rowkey=userid) that we are benchmarking against.

The table above has around 10 million unique row keys (userid), and this number will grow over time.
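To judge whether a table of this size can "largely fit in memory", a rough estimate compares the total row data against the configured cache. In the sketch below the 10 million keys and the cache sizes come from this thread, but the average row size of 200 bytes is purely an assumption for illustration, and per-entry overhead in the cache and in RocksDB is ignored.

```python
def estimated_table_bytes(unique_keys: int, avg_row_bytes: int) -> int:
    """Raw size of the table data, ignoring storage and cache overhead."""
    return unique_keys * avg_row_bytes


def cache_coverage(unique_keys: int, avg_row_bytes: int, cache_bytes: int) -> float:
    """Fraction of the table (0.0 to 1.0) that could fit in the cache."""
    table_bytes = estimated_table_bytes(unique_keys, avg_row_bytes)
    return min(1.0, cache_bytes / table_bytes)


UNIQUE_KEYS = 10_000_000       # from the thread
ASSUMED_AVG_ROW_BYTES = 200    # hypothetical; measure your own rows

for cache in (400_000_000, 800_000_000, 2_147_483_648):
    share = cache_coverage(UNIQUE_KEYS, ASSUMED_AVG_ROW_BYTES, cache)
    print(f"cache={cache} bytes covers about {share:.0%} of the table")
```

Under that assumed row size, the 400 MB cache would hold only about a fifth of the table, so requests for uniformly random keys would mostly miss it.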

Regards,
Anup Tiwari

Anup Tiwari

Oct 26, 2020, 1:11:34 PM

to Alan Sheinberg, ksqldb-users

Hi Alan,

Did you get a chance to look into this?

Anup Tiwari

Nov 10, 2020, 2:23:07 AM

to Alan Sheinberg, ksqldb-users

Hi Team,

Can someone check this and get back to me?

Regards,
Anup Tiwari

Alan Sheinberg

Dec 8, 2020, 12:00:11 PM

to Anup Tiwari, ksqldb-users

Hi Anup,

That table you're using seems pretty reasonable in size, so I would guess it's probably not that. It's really hard to know how to adjust every config and how it would affect throughputs and latencies. I think ksql.streams.cache.max.bytes.buffering=400000000 is probably not that large on a machine with many Gigs of memory. I would try increasing to 2 or 4GB of cache. Also, I'm not sure what version of ksql you're using, but we changed this config to be ksql.plugins.rocksdb.cache.size at some point. I would make sure you're setting the right config for your ksql version. If you throw more memory at it, it should lower latency. I would try that, and from there, you can always tune it down, if possible.
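As a sketch of what "2 or 4GB of cache" means in bytes: the helper below converts GiB to the value expected by ksql.streams.cache.max.bytes.buffering and shows the per-thread share. In Kafka Streams this cache is a total that is divided evenly among the stream threads, so raising ksql.streams.num.stream.threads shrinks each thread's share. The function names are illustrative only.

```python
from typing import List

GIB = 1024 ** 3


def cache_bytes(gib: int) -> int:
    """Convert a cache size in GiB to bytes."""
    return gib * GIB


def per_thread_cache_bytes(total_cache_bytes: int, num_stream_threads: int) -> int:
    """Kafka Streams splits the total record cache evenly across stream threads."""
    return total_cache_bytes // num_stream_threads


def properties_lines(gib: int, num_stream_threads: int) -> List[str]:
    """Render the two settings as server properties lines."""
    return [
        f"ksql.streams.num.stream.threads={num_stream_threads}",
        f"ksql.streams.cache.max.bytes.buffering={cache_bytes(gib)}",
    ]


for line in properties_lines(4, 1):
    print(line)
print(per_thread_cache_bytes(400_000_000, 5))  # share per thread in observation 2
```

With the settings from observation 2 (400000000 bytes, 5 threads), each thread gets 80,000,000 bytes of cache, compared with the full 400,000,000 bytes for the single thread in observation 1.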

Alan

On Tue, Dec 8, 2020 at 1:28 AM Anup Tiwari <anupsd...@gmail.com> wrote:

Hi Alan,

Can you guide me here ?

Regards,
Anup Tiwari
