Advice on benchmarking configuration for Scylla


Kirill Bogdanov

<kirill.sc@gmail.com>
Apr 22, 2017, 6:43:11 AM
to ScyllaDB users
Hi, 

I am measuring the performance of a single Scylla node on Amazon EC2 using the YCSB synthetic benchmark, and I would like to get some advice regarding the setup of my experiments. 

Cluster setup:
  • The cluster is a single c4.xlarge instance (4 vCPUs, 7.5 GB RAM).
  • The instance has an additional XFS partition on a general-purpose SSD (EBS) volume.
  • POSIX networking stack.
Benchmark configuration:
  • YCSB workload, generated from 2x c4.xlarge instances (a rough invocation sketch follows this list). 
  • The keyspace is 1M keys (about 1 GB total; each value is about 1.2 KB); the workload is read-only at consistency ONE, and key popularity is uniform.
  • The median network RTT between all VMs is about 0.12 ms.
  • Performance is evaluated past the initial warm-up stage (by monitoring disk IO I can see when Scylla stops reading from disk, i.e., all keys are served from memory; reducing the size of the keyspace does not significantly improve performance past the warm-up stage). 
  • YCSB does not appear to be the bottleneck during the evaluation. 
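For reference, each run is invoked roughly as follows (a sketch; the exact binding and property names depend on the YCSB version, and <scylla-ip> is a placeholder):

  # sketch only: binding/property names vary by YCSB version; host is a placeholder
  ./bin/ycsb run cassandra-cql -P workloads/workloada \
      -p hosts=<scylla-ip> \
      -p recordcount=1000000 -p operationcount=10000000 \
      -p readproportion=1.0 -p updateproportion=0 \
      -p requestdistribution=uniform \
      -p cassandra.readconsistencylevel=ONE \
      -threads 256
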
Scylla configuration:
  • Scylla has been built from source.
  • scylla_setup has been performed.
  • I am launching Scylla as follows: ./build/release/scylla --options-file conf/scylla.yaml --max-io-requests 21
  • I tried running with and without the --poll-mode option (the option did not affect the results significantly).
  • Jumping ahead: I repeated my measurements on the provided Scylla AMI instance and got about the same performance as with my own build.
Network considerations:
  • Using iperf I measured the maximum bandwidth between the YCSB instances and Scylla to be a steady 1 Gbit/s per instance (commands sketched below).
  • During the measurement, the aggregate bandwidth Scylla consumed (ingress + egress) stayed below 500 Mbit/s, with a total of about 40,000 packets per second (ingress + egress counters), which makes me believe the network is not an issue.
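The bandwidth check was along these lines (a sketch; flags from memory, <scylla-ip> is a placeholder):

  # on the Scylla node
  iperf -s
  # on each YCSB instance: a 30-second TCP test towards the server
  iperf -c <scylla-ip> -t 30
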
Measurement results:

I've been varying the number of concurrent clients (YCSB is a closed-loop generator) and the number of network connections among which these requests are sent. 
The maximum throughput I was able to achieve is around 52,000 reads/sec (with 256 clients and 16 streams); beyond that point I was only getting an increase in queuing time without an increase in throughput. 
In comparison, Cassandra achieves a maximum of 36,000 reads/sec on the same cluster. 

I would like to know whether the performance numbers I achieved are within the expected range for the mentioned hardware and POSIX configuration, or whether I have overlooked some important configuration options for Scylla.


Side question:
Q1: Scylla's AMI appears to come with POSIX networking as the default configuration; is it possible to run Scylla with DPDK on EC2? 


Thanks!
Kirill

Glauber Costa

<glauber@scylladb.com>
Apr 22, 2017, 12:18:12 PM
to ScyllaDB users
They are. There are other advantages to using Scylla besides sheer
throughput (e.g., we compact faster, require less
maintenance, etc.).

But from the performance PoV, our main architectural edge comes from
our linear scalability with core count. The c4.xlarge has a very low core
count (2 physical cores, 4 vCPUs), so the expectation is that we'll perform
better than Cassandra, but not by a factor of 10.

Also note that c4.xlarge is technically not a supported instance, due
to the lack of local SSDs.

For more information on expectations on low end hardware, see
http://www.scylladb.com/2017/03/06/performance-report-scylla-vs-cassandra-low-end-hardware/


Avi Kivity

<avi@scylladb.com>
Apr 22, 2017, 12:34:23 PM
to scylladb-users@googlegroups.com, Kirill Bogdanov
Running on EBS with general-purpose SSD will not yield good results.  Use an instance with fast local SSDs like i2 or i3 (i3 is still not supported, look for it in 1.8).

Most likely, you maxed out your volume's IOPS.  To verify, use iostat to measure IOPS and compare with the AWS numbers for your volume type and size.
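
Something along these lines (a sketch; the device name is a placeholder for your EBS volume):

  # extended per-device stats every second; r/s + w/s is the IOPS you actually get,
  # to compare against the volume's provisioned IOPS
  iostat -x 1 /dev/xvdb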


Kirill Bogdanov

<kirill.sc@gmail.com>
Apr 24, 2017, 2:28:00 AM
to ScyllaDB users, kirill.sc@gmail.com
Thank you for the comments and the link to the performance report, somehow I overlooked it.

However, I don't think the SSD was the issue in my setup. I was using nmon (similar to iostat) to monitor IO, and I started measuring only after all SSD activity had ceased (the dataset fits into memory). 

Thanks,
Kirill

Avi Kivity

<avi@scylladb.com>
Apr 24, 2017, 3:33:55 AM
to scylladb-users@googlegroups.com, Kirill Bogdanov
Well, 52k ops/sec is low for two cores (4 vCPUs = 2 cores, due to hyperthreading).  To understand more, please set up scylla-grafana-monitoring so we have visibility into what the node is doing.
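
Roughly (a sketch; see the project's README for the exact steps for your version):

  git clone https://github.com/scylladb/scylla-grafana-monitoring
  cd scylla-grafana-monitoring
  # add the node's IP to the Prometheus targets configuration, then start
  # the Prometheus/Grafana containers
  ./start-all.sh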

Tomasz Grabiec

<tgrabiec@scylladb.com>
Apr 24, 2017, 3:39:38 AM
to ScyllaDB users

Something to keep in mind: the YCSB benchmark doesn't yet use prepared statements, so the Scylla server will burn some CPU on CQL string parsing, which it wouldn't with a typical production app or with cassandra-stress.

There's work in progress in YCSB to improve this:

 


Kirill Bogdanov

<kirill.sc@gmail.com>
Apr 24, 2017, 2:08:04 PM
to ScyllaDB users

Right, the absence of prepared statements could indeed be an issue here. I did not know that YCSB and cassandra-stress differ in this way. 

Please see three of my Grafana snapshots below. The actual workload starts at 19:45 and lasts for 10 minutes. 
  1. metrics
  2. metrics per server
  3. IO
Basically, I am trying to understand where I am hitting the bottleneck in my setup. The scenario is generally as described in my first email, using the Scylla AMI, with the only difference that I set the keyspace to just 10,000 keys (to make sure everything is cached and to avoid a long warm-up time with a single SSD). 

The only delay I am observing in these graphs is "Query I/O Queue delay", which stays statically at around 2 ms. To me it looks like the graph just doesn't have any new samples and therefore keeps showing the latest value (i.e., 2 ms), while in practice there is no I/O. 

Could you tell if anything looks abnormal on these graphs? 

Thanks!

Glauber Costa

<glauber@scylladb.com>
Apr 24, 2017, 2:10:24 PM
to ScyllaDB users

Before I even look at that, can you share the contents of the following file:
/etc/scylla.d/cpuset.conf ?

Tomasz Grabiec

<tgrabiec@scylladb.com>
Apr 24, 2017, 2:42:10 PM
to ScyllaDB users

It looks like the server is CPU-bound, with reactor load around 100%. All reads are served from memory, no reads from disk.

I would check the output of mpstat to see whether the kernel or IRQ handling is to blame:

  mpstat -P ALL 1 

then profile using perf:

  perf record --call-graph dwarf -p `pgrep scylla`
  ^C

For visualizing I recommend https://github.com/brendangregg/FlameGraph
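
For example (a sketch, assuming the FlameGraph scripts are checked out into the current directory):

  perf script > out.perf
  ./stackcollapse-perf.pl out.perf > out.folded
  ./flamegraph.pl out.folded > plot.svg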



Kirill Bogdanov

<kirill.sc@gmail.com>
Apr 25, 2017, 8:12:49 AM
to ScyllaDB users
I have re-run the measurement with the following results:

1. Contents of /etc/scylla.d/cpuset.conf:
CPUSET="--cpuset 0-3 "

2. The following mpstat output was recorded during the workload:

  mpstat -P ALL 1 
11:57:49 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
11:57:50 AM  all   84,50    0,00   15,50    0,00    0,00    0,00    0,00    0,00    0,00    0,00
11:57:50 AM    0   85,00    0,00   15,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
11:57:50 AM    1   84,16    0,00   15,84    0,00    0,00    0,00    0,00    0,00    0,00    0,00
11:57:50 AM    2   84,00    0,00   16,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
11:57:50 AM    3   85,00    0,00   15,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
11:57:50 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
11:57:51 AM  all   84,75    0,00   15,25    0,00    0,00    0,00    0,00    0,00    0,00    0,00
11:57:51 AM    0   84,00    0,00   16,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
11:57:51 AM    1   85,86    0,00   14,14    0,00    0,00    0,00    0,00    0,00    0,00    0,00
11:57:51 AM    2   85,00    0,00   15,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00
11:57:51 AM    3   84,00    0,00   16,00    0,00    0,00    0,00    0,00    0,00    0,00    0,00

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all   94,93    0,00    2,82    0,01    0,00    2,23    0,00    0,00    0,00    0,01
Average:       0   92,48    0,00    3,06    0,01    0,00    4,45    0,00    0,00    0,00    0,00
Average:       1   97,17    0,00    2,81    0,02    0,00    0,00    0,00    0,00    0,00    0,00
Average:       2   92,82    0,00    2,68    0,01    0,00    4,48    0,00    0,00    0,00    0,01
Average:       3   97,26    0,00    2,72    0,01    0,00    0,00    0,00    0,00    0,00    0,01

3. I am attaching 10 seconds of perf record converted to a flame graph, plus the folded output (in case you need to re-plot the data in a different way).

out.folded
plot.svg

Glauber Costa

<glauber@scylladb.com>
Apr 28, 2017, 1:16:47 AM
to ScyllaDB users
Hi Kirill,

Your initial question was whether those numbers are in line with expectations.
After looking at it again, here's my conclusion:

1) CPU-bound workloads with the small partitions (around 500 bytes) that
cassandra-stress generates can do around 35-40k req/sec per core.
You have 2 cores, so that would be 80k/sec, with hyperthreading
sometimes adding some 20%, so around 100k give or take.

2) On EC2, only CPU0 can take interrupts, which is why we usually
leave it out of the run. But with 4 vCPUs we don't isolate it, because
that would represent a very large chunk of the node's processing
power. So one of the shards will be slow, and will end up slowing the
rest a bit.

3) Your benchmark keys are bigger, which naturally reduces the throughput.

4) Your benchmark doesn't issue prepared statements.

Reviewing your data and graphs, I personally saw nothing out of the
ordinary. So ~52k requests/sec for that scenario seems to me to be in
line with expectations. Note that a lot of Scylla's advantage
comes from scalability, so as the machines grow larger, you should see
this gap grow in Scylla's favor.

Hope that helps.

Kirill Bogdanov

<kirill.sc@gmail.com>
Apr 28, 2017, 2:33:02 AM
to ScyllaDB users
Hi Glauber,

Thank you for the detailed reply. 

I was primarily concerned that I might have overlooked some important configuration options (in Scylla or Linux); instead, I now see the significant impact the overall experiment setup (hardware, workload, etc.) has on performance. 
Moreover, I now have an idea of where (and how) to look for potential bottlenecks around Scylla. 

I completely agree that the setup described above does not exercise all of Scylla's features, nor does it give a comprehensive evaluation, but that was never the intention of my measurements. 

Thanks, 
Kirill