Regarding ScyllaDB vs. Cassandra benchmarking


ashmahg19@gmail.com

<ashmahg19@gmail.com>
Nov 30, 2016, 2:32:22 PM
to ScyllaDB users
I was looking into the ScyllaDB vs. Cassandra benchmark at "http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark-cluster-1/" and noticed that in Cassandra's read results (the read phase after population), the disk write rate (Bps) is around 25M on average while the disk read rate (Bps) is almost zero (except at the end of the simulation).

My question is: what is being written to disk during read-only stress operations? And how does ScyllaDB's caching provide better read performance when Cassandra barely accesses the disk at all?

Regards
Ashraf

Tomasz Grabiec

<tgrabiec@scylladb.com>
Nov 30, 2016, 2:49:34 PM
to scylladb-users@googlegroups.com
On Wed, Nov 30, 2016 at 8:32 PM, <ashm...@gmail.com> wrote:
I was looking into the ScyllaDB vs. Cassandra benchmark at "http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark-cluster-1/" and noticed that in Cassandra's read results (the read phase after population), the disk write rate (Bps) is around 25M on average while the disk read rate (Bps) is almost zero (except at the end of the simulation).

My question is: what is being written to disk during read-only stress operations?

Those writes come from sstable compaction, which follows the population phase.
 
And how does ScyllaDB's caching provide better read performance when Cassandra barely accesses the disk at all?

Note that ScyllaDB provides better performance not only due to caching. 

Neither ScyllaDB nor Cassandra issues disk reads in the read benchmark, because the data set fits in memory.

As for differences in caching, Cassandra relies on the system page cache, where individual pages of sstable files are cached. Scylla manages its own cache, which holds merged partitions, not sstable file pages.
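
One way to picture the difference in cache granularity is the toy model below. It is only an illustrative sketch, not code from Cassandra or Scylla: the class names, the page layout, and the `merges` counter are all hypothetical. It assumes that a cache of raw file pages still leaves the merge across sstables to be done on every read, while a cache of merged partitions does that work once per partition.

```python
# Hypothetical toy model; all names and structures here are illustrative only.

# Two "sstable files", each a list of pages. A page is a list of
# (partition_key, write_timestamp, value) entries.
SSTABLE_FILES = {
    "sstable-1": [[("p1", 1, "old")], [("p2", 1, "x")]],
    "sstable-2": [[("p1", 2, "new")]],
}


def merge_partition(key, pages):
    """Return the newest value for key among the given pages, or None."""
    cells = [(ts, value) for page in pages for (k, ts, value) in page if k == key]
    return max(cells)[1] if cells else None


class PageCacheModel:
    """Caches raw file pages; every read still merges across files."""

    def __init__(self):
        self.pages = {}   # (file name, page number) -> page
        self.merges = 0   # how many times a merge was performed

    def read(self, key):
        pages = []
        for name, file_pages in SSTABLE_FILES.items():
            for page_no, page in enumerate(file_pages):
                # A cache hit avoids "disk" access, but not the merge below.
                pages.append(self.pages.setdefault((name, page_no), page))
        self.merges += 1
        return merge_partition(key, pages)


class PartitionCacheModel:
    """Caches the merged partition; repeated reads skip the merge."""

    def __init__(self):
        self.partitions = {}  # partition key -> merged value
        self.merges = 0

    def read(self, key):
        if key not in self.partitions:
            all_pages = [p for pages in SSTABLE_FILES.values() for p in pages]
            self.partitions[key] = merge_partition(key, all_pages)
            self.merges += 1
        return self.partitions[key]


page_cache = PageCacheModel()
partition_cache = PartitionCacheModel()
for _ in range(3):
    page_cache.read("p1")
    partition_cache.read("p1")
print(page_cache.merges, partition_cache.merges)
```

Both models return the same value for a key; in this sketch they differ only in how often the merge step runs for repeated reads of the same partition.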

Ashraf Mahgoub

<ashmahg19@gmail.com>
Nov 30, 2016, 8:59:48 PM
to scylladb-users@googlegroups.com
OK, so this raises a couple of questions:

1- So this means that if the stress tool waited for some time between populating the table and starting the read stress test, giving compaction time to finish, there would be no disk writes?

2- If both ScyllaDB and Cassandra are reading from main memory, how come write throughput is twice the read throughput (at least in Cassandra's case)? I know that writes access the disk continuously to append records to the commit log (in addition to flushing memtables), so if all the data fits in memory, reads should be faster.

Best
Ashraf 


Tomasz Grabiec

<tgrabiec@scylladb.com>
Dec 1, 2016, 5:47:00 AM
to scylladb-users@googlegroups.com
On Thu, Dec 1, 2016 at 2:59 AM, Ashraf Mahgoub <ashm...@gmail.com> wrote:
OK, so this raises a couple of questions:

1- So this means that if the stress tool waited for some time between populating the table and starting the read stress test, giving compaction time to finish, there would be no disk writes?

Yes, you can see this on the benchmark page on this graph for the Cassandra run:

[Inline image 1: graph from the benchmark page for the Cassandra run]

At 9:30 AM the writes cease and read throughput peaks at almost 200k ops/s.
 

2- If both ScyllaDB and Cassandra are reading from main memory, how come write throughput is twice the read throughput (at least in Cassandra's case)?
I know that writes access the disk continuously to append records to the commit log (in addition to flushing memtables), so if all the data fits in memory, reads should be faster.

The first thing to note is that Cassandra does not push the disk to its bandwidth limit in the write benchmark. Scylla is able to push it a lot further. So the disk is likely not the limiting factor for writes in Cassandra's case.

Another thing is that a single read operation may need to merge data from multiple sstables. You can see on the graph above that reads get faster as compaction progresses.
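
To illustrate why reads can get cheaper as compaction progresses, here is a minimal sketch under simplifying assumptions. It is not Cassandra or Scylla code: each "sstable" is modeled as a plain dict, the function names are hypothetical, and the model ignores bloom filters and indexes, which real systems use to skip sstables that cannot contain the key.

```python
# Toy model: each "sstable" is a dict mapping key -> (write_timestamp, value).
# Illustrative only; real sstables are immutable on-disk files.

def read(key, sstables):
    """Return (newest value for key, number of sstables consulted)."""
    newest = None
    consulted = 0
    for table in sstables:
        consulted += 1
        cell = table.get(key)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    value = newest[1] if newest is not None else None
    return value, consulted


def compact(sstables):
    """Merge all sstables into one, keeping the newest cell per key."""
    merged = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    return [merged]


sstables = [
    {"a": (1, "a-v1"), "b": (1, "b-v1")},
    {"a": (2, "a-v2")},
    {"c": (3, "c-v1")},
]

print(read("a", sstables))           # ('a-v2', 3)
print(read("a", compact(sstables)))  # ('a-v2', 1)
```

In this model the read returns the same value before and after compaction, but it consults three tables before and one after. The compaction step itself rewrites data, which corresponds to the disk writes seen during the read phase.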


ashmahg19@gmail.com

<ashmahg19@gmail.com>
Dec 6, 2016, 12:53:15 PM
to ScyllaDB users
OK, I see now. Thank you so much.

Best
Ashraf Mahgoub