Load Testing || KsqlDB Pull query response time


Anup Tiwari

Oct 11, 2020, 6:40:28 AM
to ksqldb-users
Hi Team,

I am working on optimising pull query response time. During load testing I observed the results below for each configuration.

JMeter configuration:

jmeter.sh -JtargetConcurrency=100 -JrampUpTime=10 -JrampUpStepsCount=10 -JholdTargetRateTime=130 -JtargetThroughput=6000

-- ===================== Observation 1 ===============================

Configs :

ksql.streams.num.stream.threads=1
ksql.streams.cache.max.bytes.buffering=400000000 # it serves as a read cache to speed up reading data from a state store
ksql.streams.commit.interval.ms=1000 # For shorter intervals between updates

Result :

image.png


-- ===================== Observation 2 ===============================

Configs :

ksql.streams.num.stream.threads=5
ksql.streams.cache.max.bytes.buffering=400000000 # it serves as a read cache to speed up reading data from a state store
ksql.streams.commit.interval.ms=1000 # For shorter intervals between updates

Result :

image.png


-- ===================== Observation 3 ===============================

Configs :

ksql.streams.num.stream.threads=5
ksql.streams.cache.max.bytes.buffering=2147483648 # it serves as a read cache to speed up reading data from a state store
ksql.streams.commit.interval.ms=1000 # For shorter intervals between updates

Result :

image.png


-- ===================== Observation 4 ===============================

Configs :

ksql.streams.num.stream.threads=1
ksql.streams.cache.max.bytes.buffering=800000000 # it serves as a read cache to speed up reading data from a state store
ksql.streams.commit.interval.ms=1000 # For shorter intervals between updates

Result :

image.png


I have a 3-node ksqlDB cluster, and both memory and CPU are under 25% utilisation. I have two questions:

1. Why does increasing "ksql.streams.num.stream.threads" hurt pull query performance (i.e. increase response time)?
2. How can I reduce the response time further, specifically the 95th and 99th percentiles?
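
For reference when comparing the runs, tail percentiles such as the 95th and 99th can be recomputed from raw per-request timings. Below is a minimal Python sketch using the nearest-rank method. The sample latencies are made up for illustration and are not results from this test, and JMeter's own reports may interpolate slightly differently:

```python
import math

def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds (illustrative only).
latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 18, 400]

for pct in (50, 95, 99):
    print(f"p{pct} = {percentile(latencies_ms, pct)} ms")
```

With a small sample like this, one or two slow requests decide the p95 and p99 entirely, which is why tail percentiles are only meaningful over a large number of requests.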

Regards,
Anup Tiwari

Alan Sheinberg

Oct 12, 2020, 12:39:59 PM
to ksqldb-users
Hi Anup,

For 1), in general, increasing the number of stream threads increases the CPU spent processing the existing topologies and the disk IO spent saving to state stores. If you're running everything on a machine with one disk, it's possible that more threads mean more simultaneous seeks, which hurts the performance of the reads done during a pull query. If you want to minimize interference, only run the queries that are necessary to build the state store you're querying with the pull query. Also, as you've done, increasing the cache will help minimize this IO penalty, and in your results it has.

2) In our benchmarking, depending on how much data we produce alongside our pull queries, we're able to get sub-50 ms p99 latencies under high throughput. Are you running each KSQL node on a separate physical machine, and do they share the machine with other processes that might be using disk? Also, I would not set ksql.streams.commit.interval.ms, since I think decreasing it from the default of 2000 can have negative performance impacts.
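
One detail that may matter when comparing the observations in this thread: in Kafka Streams, cache.max.bytes.buffering is a total per instance that is divided evenly among the stream threads, so raising num.stream.threads shrinks each thread's share. Below is a rough Python sketch of that arithmetic for the four configurations reported above, assuming ksqlDB passes both ksql.streams.* settings straight through to Kafka Streams:

```python
def cache_per_thread(total_cache_bytes, num_stream_threads):
    """Per-thread share of the record cache, assuming an even split."""
    return total_cache_bytes // num_stream_threads

# (threads, cache.max.bytes.buffering) for observations 1-4 in this thread.
observations = {
    1: (1, 400_000_000),
    2: (5, 400_000_000),
    3: (5, 2_147_483_648),
    4: (1, 800_000_000),
}

for number, (threads, total) in observations.items():
    share_mb = cache_per_thread(total, threads) / 1_000_000
    print(f"Observation {number}: {threads} thread(s), {share_mb:.0f} MB per thread")
```

Whether the smaller per-thread share explains the slower pull queries in this case is not established here; it is one thing to rule out.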

Alan

Anup Tiwari

Oct 12, 2020, 6:39:19 PM
to Alan Sheinberg, ksqldb-users
Hi Alan,

W.r.t. point 1:
(i) I have multiple disks, but on each machine the ksqlDB state store writes to a single disk with 250 GB of space (gp2 SSD) and 750 IOPS.
(ii) I am running only the queries that are necessary to build the state store I am querying with the pull query.
(iii) I haven't seen a significant improvement from increasing the cache size, as you can see in the results earlier in this thread (Observation 4).
(iv) Could you please elaborate on this point?
"If you're running everything on a machine with one disk, it's possible that increasing the number of threads increases the seeks that you're doing simultaneously and hurts the performance of reads done during a pull query."




W.r.t. point 2:
(i) Yes, I am running each KSQL node on a separate physical machine. The machines are dedicated to this job and are NOT shared with other processes.
(ii) I will try setting ksql.streams.commit.interval.ms back to the default and run the benchmark again. May I know how reducing it below the default can impact performance? I lowered it because I wanted updates to be reflected as soon as possible.
(iii) Could you please share your benchmarking results?

One last thing I wanted to mention: increasing threads doesn't translate into more CPU utilisation in my case. CPU and memory utilisation are both under 20% at 100 RPS.
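
On the disk figure given under point 1 (i): gp2 volumes get a baseline of 3 IOPS per GiB, with a floor of 100 and a cap of 16,000, which is where 750 IOPS for a 250 GB volume comes from. Below is a small Python sketch of that arithmetic. It ignores burst credits, which let smaller gp2 volumes exceed the baseline for short periods:

```python
def gp2_baseline_iops(size_gib):
    """Baseline IOPS for an EBS gp2 volume: 3 per GiB, clamped to [100, 16000]."""
    return max(100, min(16_000, 3 * size_gib))

print(gp2_baseline_iops(250))  # 750
```

A larger volume raises the baseline proportionally, so volume size is one lever if the state store turns out to be IO-bound.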




Anup Tiwari

Oct 13, 2020, 3:41:12 AM
to Alan Sheinberg, ksqldb-users
Also, I set commit.interval back to the default (2000 ms) and reran the load test at 100 RPS, but I can't see any improvement in the 90th / 95th / 99th percentiles. Please find the metrics below:

image.png


Regards,
Anup Tiwari

Anup Tiwari

Oct 14, 2020, 9:32:52 PM
to Alan Sheinberg, ksqldb-users
Hi Alan / Team, 

Could you please check this once and guide me? 

Anup Tiwari

Oct 19, 2020, 1:10:29 AM
to Alan Sheinberg, ksqldb-users
Hi Team,

Could you please check this once and guide me here ?

Regards,
Anup Tiwari

Alan Sheinberg

Oct 19, 2020, 1:07:20 PM
to Anup Tiwari, ksqldb-users
Hi Anup,

Sorry for the delay.  It can be a bit challenging to troubleshoot performance.

What kind of machines are you using to run these benchmarks? Are they cloud instances or on-prem machines? What kind of CPU do they have, and how much memory is available to the process? Depending on how you're benchmarking, things can sometimes be CPU-bound, so this can end up being a factor.

Are you running KSQL in a cluster with multiple nodes? Are the nodes on the same local network?  If you are doing pull queries that get forwarded between nodes, there can be a big performance penalty if the networking isn't configured to be very fast between nodes in a cluster.  You can test this by pinging one node from the other.
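
Beyond ping, one way to check for a forwarding penalty is to time the same pull query against each node's REST endpoint and compare. Below is a minimal Python sketch, assuming the default listener port 8088 and the `/query` endpoint. The host names and the query text in the commented example are placeholders, not values from this thread:

```python
import json
import time
import urllib.request

def build_pull_query_request(host, sql, port=8088):
    """Build a POST request for the ksqlDB /query endpoint."""
    body = json.dumps({"ksql": sql, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/query",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )

def time_pull_query_ms(host, sql, runs=20):
    """Return per-request latencies in milliseconds for one node."""
    latencies = []
    for _ in range(runs):
        request = build_pull_query_request(host, sql)
        start = time.perf_counter()
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()  # drain the body so the full response is timed
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

# Example (placeholder hosts and query; uncomment to run against a real cluster):
# for host in ("ksql-node-1", "ksql-node-2", "ksql-node-3"):
#     samples = time_pull_query_ms(host, "SELECT * FROM my_table WHERE ROWKEY = 1;")
#     print(host, f"max = {max(samples):.1f} ms")
```

If one node answers a given key much faster than the others, that node probably hosts the key's partition and the others are forwarding the request. Each call opens a new connection, so the timings include connection setup.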

Thanks,
Alan


Anup Tiwari

Oct 20, 2020, 4:08:13 AM
to Alan Sheinberg, ksqldb-users
Hi Alan,
Thanks for the response. Please find the answers to your questions below.
We are on AWS, and the cluster configuration is as described in the table.

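 Service | Instance Type | Config            | Number of Instances | Disk Size
-------------------------------------------------------------------------------------
 KsqlDB  | r5.2xlarge    | 8 CPU / 64 GB RAM | 3                   | 400 GB / Node (GP2)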


These machines are in the same VPC. During load testing we have seen ~20% CPU and ~55% memory usage on each node (the memory usage seems to be mainly due to the configured KSQL_HEAP_OPTS="-Xms30G -Xmx45G").


Regards,
Anup Tiwari

Alan Sheinberg

Oct 20, 2020, 1:15:33 PM
to Anup Tiwari, ksqldb-users
Hi Anup,

That seems fairly reasonable to me.

I'm a bit surprised that you're only seeing 20% CPU load when running these benchmarks.  For a workload of single key pull queries, we've been seeing things fairly CPU bound for smaller tables, so that suggests that maybe your data is large.

How large are your table records and what's the number of unique elements of the table?  You might want to try benchmarking with a small table that can largely fit in memory and see if your p99 comes down a lot.  We've done a lot of testing with tables of various sizes, though usually with small records.  Maybe the numbers you're seeing are more reasonable if the rows are rather large.

Thanks,
Alan

Anup Tiwari

Oct 21, 2020, 1:14:19 AM
to Alan Sheinberg, ksqldb-users
Hi Alan,

When you say small / large table, how many columns / rows are you referring to?
Please find below the description of the table (rowkey = userid) on which we are running the benchmark.

describe realtime_datapoints_final ;

Name                 : realtime_datapoints_final
 Field                 | Type                      
---------------------------------------------------
 ROWTIME               | BIGINT           (system)
 ROWKEY                | BIGINT           (system)
 USERID                | BIGINT                    
 LAST_TEN_AC           | VARCHAR(STRING)          
 LAST_TEN_AC           | VARCHAR(STRING)          
 STATE                 | VARCHAR(STRING)          
 LATEST_TIME           | BIGINT                    
 R_LATEST_TIME         | VARCHAR(STRING)          
 LAST_ACAMT            | DOUBLE                    
 NUMCG                 | BIGINT                    
 NUMWWCG               | BIGINT                    
 PAW                   | DOUBLE                    
 NAW                   | DOUBLE                    
 NAL                   | DOUBLE                    
 NWG                   | BIGINT                    
 NLG                   | BIGINT     

The above table has around 10 million unique rowkeys (userid), and this number will grow over time.
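
To put the table size in perspective against the configured cache, here is a back-of-the-envelope Python estimate. It assumes 8 bytes per BIGINT/DOUBLE field and a guessed average size for the VARCHAR fields. The real on-disk size depends on the serialization format, key overhead, and RocksDB compression, none of which are stated in this thread:

```python
# Field counts taken from the DESCRIBE output above (system columns excluded).
NUM_BIGINT = 6    # USERID, LATEST_TIME, NUMCG, NUMWWCG, NWG, NLG
NUM_DOUBLE = 4    # LAST_ACAMT, PAW, NAW, NAL
NUM_VARCHAR = 4   # LAST_TEN_AC (listed twice), STATE, R_LATEST_TIME

def estimate_table_bytes(rows, avg_varchar_bytes):
    """Rough payload size: 8 bytes per numeric field plus an assumed string size."""
    row_bytes = 8 * (NUM_BIGINT + NUM_DOUBLE) + NUM_VARCHAR * avg_varchar_bytes
    return rows * row_bytes

rows = 10_000_000  # unique keys reported in this thread
for avg_varchar_bytes in (20, 50, 200):  # assumed average string sizes
    size_gb = estimate_table_bytes(rows, avg_varchar_bytes) / 1e9
    print(f"avg string {avg_varchar_bytes} B -> ~{size_gb:.1f} GB")
```

Under these assumptions the table is in the low single-digit gigabytes unless the strings are long, which is larger than a 400 MB cache but small relative to 64 GB of RAM.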

Regards,
Anup Tiwari

Anup Tiwari

Oct 26, 2020, 1:11:34 PM
to Alan Sheinberg, ksqldb-users
Hi Alan, 

Did you get a chance to look into this?

Anup Tiwari

Nov 10, 2020, 2:23:07 AM
to Alan Sheinberg, ksqldb-users
Hi Team,

Could someone check this and respond?

Regards,
Anup Tiwari

Alan Sheinberg

Dec 8, 2020, 12:00:11 PM
to Anup Tiwari, ksqldb-users
Hi Anup,

That table you're using seems pretty reasonable in size, so I would guess it's probably not that.  It's really hard to know how to adjust every config and how it would affect throughputs and latencies.  I think ksql.streams.cache.max.bytes.buffering=400000000 is probably not that large on a machine with many Gigs of memory.  I would try increasing to 2 or 4GB of cache.  Also, I'm not sure what version of ksql you're using, but we changed this config to be ksql.plugins.rocksdb.cache.size at some point.  I would make sure you're setting the right config for your ksql version.  If you throw more memory at it, it should lower latency.  I would try that, and from there, you can always tune it down, if possible.
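
Since these settings take a value in bytes, here is a small Python sketch that prints the 2 GB and 4 GB figures in properties form. Both property names are the ones mentioned in this thread. Which one applies depends on the ksqlDB version, so check the documentation for your release before using either:

```python
GIB = 1024 ** 3

def cache_property(name, gib):
    """Format a byte-valued cache setting as a properties line."""
    return f"{name}={gib * GIB}"

for gib in (2, 4):
    print(cache_property("ksql.streams.cache.max.bytes.buffering", gib))
    print(cache_property("ksql.plugins.rocksdb.cache.size", gib))
```

The 2 GB value is the same 2147483648 already used in Observation 3 earlier in the thread.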

Alan
 

On Tue, Dec 8, 2020 at 1:28 AM Anup Tiwari <anupsd...@gmail.com> wrote:
Hi Alan,

Can you guide me here ?

Regards,
Anup Tiwari
