High memory usage and row/partition evictions (rows removed from cache)


Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 11, 2019, 8:33:01 PM
to ScyllaDB users
Hi guys,

I'm seeing this issue on both writes & reads.

Is there any way to reduce the size of the rows and partitions cached in memory? (In scylla.yaml or somewhere else, I'm not sure.)
These row removals are impacting read operations, which leads to very high read latency.

Screenshot at Mar 12 08-26-31.png


Thanks
Regards,
Phuc

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 12, 2019, 4:25:24 AM
to ScyllaDB users
Hi,

Which Scylla version are you using?

On Tue, Mar 12, 2019 at 1:33 AM Phuc Nguyen <phucpngu...@gmail.com> wrote:
Hi guys,

I'm seeing this issue on both writes & reads.

Is there any way to reduce the size of the rows and partitions cached in memory? (In scylla.yaml or somewhere else, I'm not sure.)

You can only disable the cache altogether using --enable-cache=0. 

When enabled, the cache will take all the free space, and shrink on external demand.
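For completeness, a sketch of the two ways this setting is typically supplied. The command-line flag is the one mentioned above; the scylla.yaml key name (enable_cache) follows the usual flag-to-YAML convention and should be verified against the scylla.yaml shipped with your version:

```yaml
# Command line (as mentioned above):
#   scylla --enable-cache=0
#
# Assumed equivalent scylla.yaml entry (check your version's scylla.yaml
# for the exact key name before relying on it):
enable_cache: false
```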
 
These row removals are impacting read operations, which leads to very high read latency.

Are you sure the latency is due to removals, and not due to the fact that data is not in cache after it's evicted and thus reads need to go to disk now?
In such a case, caching less would make the problem worse.



--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To post to this group, send email to scyllad...@googlegroups.com.
Visit this group at https://groups.google.com/group/scylladb-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/f19ca982-c3e3-4a86-b057-81c5cbd2ee51%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 12, 2019, 4:36:30 AM
to ScyllaDB users
Hi Tomasz,


On Tuesday, March 12, 2019 at 4:25:24 PM UTC+8, Tomasz Grabiec wrote:
Hi,

Which Scylla version are you using?
 I'm using 3.0.3

These row removals are impacting read operations, which leads to very high read latency.

Are you sure the latency is due to removals, and not due to the fact that data is not in cache after it's evicted and thus reads need to go to disk now?
In such a case, caching less would make the problem worse.

Yes. From what I found on the ScyllaDB official site (https://www.scylladb.com/2018/07/26/how-scylla-data-cache-works/), the cache uses LRU to keep row data available for subsequent reads; when the partitions grow large enough, this triggers row evictions and removals, which cause the high latency.

Is there anything I can do to tune the caching system for lower latency and better performance?

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 12, 2019, 5:04:05 AM
to ScyllaDB users
On Tue, Mar 12, 2019 at 9:36 AM Phuc Nguyen <phucpngu...@gmail.com> wrote:
Yes. From what I found on the ScyllaDB official site (https://www.scylladb.com/2018/07/26/how-scylla-data-cache-works/), the cache uses LRU to keep row data available for subsequent reads; when the partitions grow large enough, this triggers row evictions and removals, which cause the high latency.

Is there anything I can do to tune the caching system for lower latency and better performance?

Normally, eviction shouldn't significantly affect latency. How much of an impact are we talking about? What's your per-shard CPU utilization before and during eviction?

From your graph I can see that you have single-row partitions. How large are the rows?

Do you run with default block detector settings? You could try running with --blocked-reactor-notify-ms=2 --blocked-reactor-reports-per-minute=10000 to catch any task running longer than 2ms.

Maybe your CPUs run close to capacity and get overutilized when the server switches to eviction mode.

Can you share all your Prometheus metrics, or access to Grafana? It's not clear if eviction happens on read (synchronous) or during memtable flush (background).

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 12, 2019, 6:47:37 AM
to ScyllaDB users


On Tuesday, March 12, 2019 at 5:04:05 PM UTC+8, Tomasz Grabiec wrote:



Normally, eviction shouldn't significantly affect latency. How much of an impact are we talking about? What's your per-shard CPU utilization before and during eviction?
It's almost 98%, affecting around 200 million read operations.

From your graph I can see that you have single-row partitions. How large are the rows?
The rows have only 2 columns: key (text) and value; the key column is the primary key.

Do you run with default block detector settings? You could try running with --blocked-reactor-notify-ms=2 --blocked-reactor-reports-per-minute=10000 to catch any task running longer than 2ms.
I'm not sure but let me try. 

Maybe your CPUs run close to capacity and get overutilized when the server switches to eviction mode.
Could you explain that in more detail?

Can you share all your prometheus metrics, or access to grafana? It's not clear if eviction happens on read (synchronous) or during memtable flush (background).

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 12, 2019, 7:03:19 AM
to ScyllaDB users
On Tue, Mar 12, 2019 at 11:47 AM Phuc Nguyen <phucpngu...@gmail.com> wrote:



Normally, eviction shouldn't significantly affect latency. How much of an impact are we talking about? What's your per-shard CPU utilization before and during eviction?
It's almost 98%, affecting around 200 million read operations.

Is that before or during eviction?

What's the latency before and during?
 

From your graph I can see that you have single-row partitions. How large are the rows?
The rows have only 2 columns: key (text) and value; the key column is the primary key.

What's the size in bytes of the value?
 

Do you run with default block detector settings? You could try running with --blocked-reactor-notify-ms=2 --blocked-reactor-reports-per-minute=10000 to catch any task running longer than 2ms.
I'm not sure but let me try. 

Maybe your CPUs run close to capacity and get overutilized when the server switches to eviction mode.
Could you explain that in more detail?

From queueing theory, it follows that as utilization approaches 100%, the queue size (which directly impacts latency) grows toward infinity. If you have limited concurrency, it's not infinity but bounded by the amount of external concurrency. It could be that at 80% utilization your latencies are low because no queue forms; when requests become slightly more expensive, you reach saturation and suddenly you have a long queue of requests.
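The saturation effect described above can be sketched with the textbook M/M/1 queue, whose mean time in system is W = s / (1 - rho) for mean service time s and utilization rho. This is a generic illustrative model, not a measurement of Scylla itself:

```python
def mean_latency_ms(service_ms: float, utilization: float) -> float:
    """Mean request latency (queueing + service) for an M/M/1 server."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)

# With a 1 ms mean service time, latency stays modest until utilization
# nears 100%, then grows sharply.
for rho in (0.50, 0.80, 0.90, 0.98, 0.99):
    print(f"utilization {rho:.0%}: mean latency {mean_latency_ms(1.0, rho):.1f} ms")
```

In this model, going from 80% to 98% utilization multiplies mean latency tenfold, which matches the intuition that a small increase in per-request cost near saturation suddenly produces a long queue.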


Can you share all your prometheus metrics, or access to grafana? It's not clear if eviction happens on read (synchronous) or during memtable flush (background).


Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 12, 2019, 7:15:11 AM
to ScyllaDB users
What time range in UTC should I be looking at? 

I checked Mar 12 00:24 UTC and there are only writes, no reads, and eviction doesn't seem to cause the latencies to increase, they're still dropping, according to the "Scylla Overview Metrics 2.3" dashboard.

Are you concerned with write latency or read latency?

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 12, 2019, 8:58:11 AM
to ScyllaDB users
Hi Tomasz,
Sorry that the tracking board is so confusing.

I will set up another instance and run again; please give me 30 minutes.

Thanks

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 12, 2019, 10:02:49 AM
to ScyllaDB users

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 12, 2019, 10:18:09 AM
to ScyllaDB users
Sorry, I just restarted the instance; please try again.
A question: my workflow is mostly writes first, then `nodetool compact`, then all the reads. Does that impact the current performance?

Thanks

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 12, 2019, 10:27:50 AM
to ScyllaDB users
On Tue, Mar 12, 2019 at 3:18 PM Phuc Nguyen <phucpngu...@gmail.com> wrote:
Sorry, I just restarted the instance; please try again.
A question: my workflow is mostly writes first, then `nodetool compact`, then all the reads. Does that impact the current performance?

I don't understand what you're asking about, please elaborate.

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 12, 2019, 10:35:05 AM
to ScyllaDB users
I meant there are 3 workloads:
1) Most write operations
2) Compaction
3) Most read operations

They run in the order 1, 2, then 3.
Maybe there isn't enough RAM for caching because the write operations used up most of the space in the memtables? That could be one of the factors affecting the performance results.

Btw, I'm using this schema

CREATE TABLE keyspace1.cache1 (
    key text PRIMARY KEY,
    value text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': '2'}
    AND compression = {'sstable_compression': 'LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 315360000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Could you access my Grafana board?

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 12, 2019, 11:36:28 AM
to ScyllaDB users
On Tue, Mar 12, 2019 at 3:35 PM Phuc Nguyen <phucpngu...@gmail.com> wrote:



I meant there are 3 workloads:
1) Most write operations
2) Compaction
3) Most read operations

They run in the order 1, 2, then 3.
Maybe there isn't enough RAM for caching because the write operations used up most of the space in the memtables? That could be one of the factors affecting the performance results.

Memtables can use no more than half of the memory. When they are flushed, cache can grow in their place. Looking at your dashboard, cache is using 90% of memory.

I can also see that your reads miss in cache, so perhaps your dataset really doesn't fit in memory?

Is the performance not as good as you'd expect?



Could you access my Grafana board?

Yes.

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 12, 2019, 11:53:07 AM
to ScyllaDB users


On Tuesday, March 12, 2019 at 11:36:28 PM UTC+8, Tomasz Grabiec wrote:



Memtables can use no more than half of the memory. When they are flushed, cache can grow in their place. Looking at your dashboard, cache is using 90% of memory.

I can also see that your reads miss in cache, so perhaps your dataset really doesn't fit in memory?
I'm not really sure, but the whole dataset takes only 150 GB on disk and I'm using an i3.8xlarge with 244 GB of RAM; also, the read operations only touch around 39 GB (not all tables), which would not place all data in memory.

Screenshot at Mar 12 23-50-59.png

 

Is the performance not as good as you'd expect?

Yes, it's not as fast as the Cassandra one installed on a c5.9x instance.

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 12, 2019, 12:19:18 PM
to ScyllaDB users
On Tue, Mar 12, 2019 at 4:53 PM Phuc Nguyen <phucpngu...@gmail.com> wrote:



Is the performance not as good as you'd expect?

Yes, it's not as fast as the Cassandra one installed on a c5.9x instance.

What kind of workload are you running (type of queries, concurrency)?

What performance did you measure on your side for Scylla, and what did you get with Cassandra?

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 12, 2019, 12:37:02 PM
to ScyllaDB users
The scripts use prepared statements and run with a concurrency of 72 for both Cassandra & Scylla.

Btw, I noticed that every time I re-run the script on the same dataset, it doesn't seem any faster than the first run. My understanding is that the old (least-recently-used) rows are removed via row/partition eviction to make space for new rows, so the cache keeps evicting data from the beginning of the cached dataset through to the end while inserting the same amount of data. Is that right?

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 12, 2019, 1:08:25 PM
to ScyllaDB users
Looks like it. It's expected when the data set doesn't fit in cache.
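The behaviour described here is the classic LRU sequential-scan pathology: when the scanned dataset is even slightly larger than the cache, every entry is evicted just before it would be reused, so repeat scans gain nothing. A toy standalone simulation (illustrative only, not Scylla's actual cache code):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least-recently-used key when full."""
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: OrderedDict[int, int] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: int) -> None:
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)         # mark as most recently used
        else:
            self.misses += 1
            if len(self.data) >= self.capacity:
                self.data.popitem(last=False)  # evict the LRU entry
            self.data[key] = key               # populate on miss

cache = LRUCache(capacity=80)
for _ in range(3):              # repeat the same sequential scan
    for key in range(100):      # dataset slightly larger than the cache
        cache.get(key)

print(f"hit rate: {cache.hits / (cache.hits + cache.misses):.0%}")
```

With a dataset of 100 keys and a cache of 80, every access misses even on the second and third scans, because each key is evicted about 20 accesses before it comes around again. If the capacity is raised to cover the whole dataset, only the first scan misses.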

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 12, 2019, 1:19:44 PM
to ScyllaDB users
I see. May I know if there is any way to reduce the size of the row cache? Would it be better to scale out to 3 i3.2x nodes instead of 1 i3.8x node?

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 14, 2019, 12:42:47 AM
to ScyllaDB users
A question: this issue never happened on Cassandra, which runs on a c5.9xlarge with 72 GB of RAM; maybe it has a different caching strategy (row cache vs key cache)?

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 14, 2019, 5:51:18 AM
to ScyllaDB users
Do you mean, make the same amount of data use less memory? There's some room for improvement there, but this requires changing the code.

Would it be better to scale out to 3 i3.2x nodes instead of 1 i3.8x node?

Assuming RF=1? 

In terms of capacity, the configuration with 1x i3.8x looks better on paper. I haven't tested it though.

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 14, 2019, 8:17:50 AM
to ScyllaDB users
Yeah, can you point me to the resources I should look at?

Would it be better to scale out to 3 i3.2x nodes instead of 1 i3.8x node?

Assuming RF=1? 
Yes. 

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 14, 2019, 8:24:39 AM
to ScyllaDB users
What do you mean? You want to improve caching on your own? 

 
Maybe your data is highly compressible, so sstables get compressed and fit in the page cache. Our in-memory cache is not compressed.

How large are your values?

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 14, 2019, 9:30:36 AM
to ScyllaDB users
I just want to take a look for a better understanding; I'm not familiar with C++.

 
Maybe your data is highly compressible, so sstables get compressed and fit in the page cache. Our in-memory cache is not compressed.

How large are your values?

The table has only 2 columns, key & value, and both are of the text data type. The values are not too large: the key is a hash string (RSA 256) while the value is only 5 characters long.

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 14, 2019, 9:48:49 AM
to ScyllaDB users
My current schema for this table:

CREATE KEYSPACE keyspace1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;

CREATE TABLE keyspace1.cache1 (
    key text PRIMARY KEY,
    value text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': '2'}
    AND compression = {'sstable_compression': 'LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 315360000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Thanks

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 14, 2019, 3:19:16 PM
to ScyllaDB users
Here's what memory_footprint gives for your partition:

mutation footprint:
 - in cache:     1136
 - in memtable:  891
 - in sstable:   300
 - frozen:       457
 - canonical:    527
 - query result: 339

So we need roughly 3x the space for that data in cache compared to on disk.

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 14, 2019, 8:30:25 PM
to ScyllaDB users
Sorry, I don't get what you mean. Are you talking about the compression the schema is using, or should compression be disabled?

Thanks


Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 15, 2019, 6:59:40 AM
to ScyllaDB users
It's with compression enabled, but measured for a single mutation, so there's not much to compress. The result with compression disabled is the same. If your data is highly compressible, the difference will be even greater.

Scylla caches data differently than Cassandra. We cache data using our object-based representation, which reflects the state of data merged from all sstables.

Cassandra reads from sstable files, relying on system's page cache for caching the files.

Since data in sstables is stored more tightly packed (at the expense of slower reads), if data is not spread among many sstables, you can fit more in memory by caching sstable files than by caching object-based representations. The memory_footprint run shows that the same data takes 3 times more space in our object-based representation than it does in sstables. This measurement doesn't account for the size of sstable indexes, so the difference will actually be less than that, since reads will need to read the partition index as well.

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 15, 2019, 9:28:05 PM
to ScyllaDB users
So, if the current code cannot be changed, are there no possible ways to improve this?

Thanks,
Phuc 

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 18, 2019, 9:20:48 AM
to ScyllaDB users
If the cause for high latency in your case is indeed due to disk access, then it won't help.

I'm not sure though. Again, what's the performance you measured with Scylla and what do you get with Cassandra? Does it get better when your data set fits in memory (no disk activity)?

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
Mar 18, 2019, 10:58:51 AM
to ScyllaDB users
Yes.

Because I wanted to switch from Cassandra to Scylla, I ran some experiments on both of them.

Env:
Both Scylla (3.0.3) and Cassandra (3.11) were installed in a Docker environment, on different instance types: i3.4x and c5.9x (because I want to reduce cost).

Machine size:
The i3.4x has 16 cores while the c5.9x has 36 cores.

What I did:
1/ Ran a cassandra-stress test on a default schema:
- Write throughput showed Scylla on par with Cassandra, while reads were slower; my production workload, however, is heavily read-oriented
- Read throughput: ~298-315k ops/s for Cassandra vs ~145-178k ops/s for Scylla

2/ Then I migrated the data from Cassandra and set up the same schema structure on Scylla

Cassandra:
CREATE TABLE keyspace1.cache1 (
    key text PRIMARY KEY,
    value text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

This is Scylla adjusted version:
CREATE KEYSPACE keyspace1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;
CREATE TABLE keyspace1.cache1 (
    key text PRIMARY KEY,
    value text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': '2'}
    AND compression = {'sstable_compression': 'LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 315360000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

and the total dataset size is around 250m.

The results showed that Scylla took around 1h15-1h24 to finish the reads from my application (checking whether keys exist: SELECT FROM keyspace1.cache1 WHERE key IN (?,?,?)), while Cassandra took only 35 minutes (all the data was compacted and compaction had already been stopped at run time).

I noticed that if I run on a smaller dataset that fits in Scylla's memory, the performance is better because of the caching.

I also tried an i3.8x, whose RAM is larger than the i3.4x's: the first run on the full dataset (250m) took 30 minutes, and the second and subsequent runs took around 13-14 minutes. This result meets my expectations, except that the cost of the i3.8x instance is too high, higher than the Cassandra one.

Thanks


 

Avi Kivity

<avi@scylladb.com>
Mar 21, 2019, 7:05:33 AM
to scylladb-users@googlegroups.com, Tomasz Grabiec

Doesn't it depend on the cell sizes too?


Avi Kivity

<avi@scylladb.com>
Mar 21, 2019, 7:13:08 AM
to scylladb-users@googlegroups.com, Phuc Nguyen



I see you have strong imbalance among the shards. What driver are you using? You should use the shard-aware driver to spread connections among the cores.

Tomasz Grabiec

<tgrabiec@scylladb.com>
Mar 21, 2019, 7:16:20 AM
to Avi Kivity, ScyllaDB users
It does, and I ran memory_footprint with the cell size provided by Phuc, that is, "5 chars of length".

Avi Kivity

<avi@scylladb.com>
unread,
Mar 21, 2019, 7:19:17 AM3/21/19
to Tomasz Grabiec, ScyllaDB users

Yes, I saw it later in a previous email.


Phuc, I saw that your writes only use one connection and so cause imbalance among the cores. You should use a shard-aware driver that will automatically balance your connections.


Also, if you intend to read all the keys, and don't care about the order, you should use a full scan. This will be a lot more efficient. See https://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/.
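As background on the parallel full-scan technique from the linked post: the idea is to split the Murmur3 token ring into contiguous subranges and issue one token-restricted query per subrange concurrently, so all shards do work at once. A standalone sketch of the range arithmetic; the generated query text is illustrative, reusing the table from this thread:

```python
# Scylla's Murmur3 partitioner maps keys onto tokens in [-2**63, 2**63 - 1].
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def token_subranges(n: int) -> list[tuple[int, int]]:
    """Split the full token ring into n contiguous, non-overlapping ranges."""
    total = MAX_TOKEN - MIN_TOKEN + 1
    step = total // n
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        # The last range absorbs any remainder so the whole ring is covered.
        end = MAX_TOKEN if i == n - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# Each of these queries can be run on a separate connection in parallel:
for lo, hi in token_subranges(4):
    print(f"SELECT key, value FROM keyspace1.cache1 "
          f"WHERE token(key) >= {lo} AND token(key) <= {hi}")
```

In practice the subrange count would be a multiple of the node/shard count, and the queries would be dispatched concurrently through the driver; the blog post above covers choosing the parallelism.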

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
unread,
Mar 23, 2019, 11:17:35 PM3/23/19
to ScyllaDB users
I'm getting the issue with the read operations; did you mean the reads only use one connection? How do I use a shard-aware driver? Is this one good: https://github.com/scylladb/gocql ?


Also, if you intend to read all the keys, and don't care about the order, you should use a full scan. This will be a lot more efficient. See https://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/.


I'm not sure about this, because I thought I was querying the right way.
 

Avi Kivity

<avi@scylladb.com>
Mar 24, 2019, 3:35:41 AM
to scylladb-users@googlegroups.com, Phuc Nguyen

I didn't see reads in the monitoring link you sent, perhaps I missed the correct time period. You should check the per-shard distribution.


How do I use a shard-aware driver? Is this one good: https://github.com/scylladb/gocql ?


Yes, if you use this driver you will get your connections equally distributed among shards.



Also, if you intend to read all the keys, and don't care about the order, you should use a full scan. This will be a lot more efficient. See https://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/.


I'm not sure about this, because I thought I was querying the right way.


It depends on what you are trying to achieve. If you want to read all of the data and if you don't care about the order, a full scan is more efficient.


 


Phuc Nguyen

<phucpnguyenphoai@gmail.com>
unread,
Mar 24, 2019, 12:20:35 PM3/24/19
to ScyllaDB users
I see, but my client application currently uses the php-driver from DataStax. Is there any way to use the Java or Go driver without changing the client application?




I'm using the query statement to check whether a key exists (select key, value where key IN (?,?,?)). Is that also a full table scan? (Yes, ordering doesn't matter.)

Thanks,

Avi Kivity

<avi@scylladb.com>
Mar 24, 2019, 12:38:04 PM
to scylladb-users@googlegroups.com, Phuc Nguyen

No. The PHP driver uses the C++ driver underneath, but shard-aware support was not added to the C++ driver yet. You can try to do that yourself using https://github.com/scylladb/scylla/blob/master/docs/protocol-extensions.md.





I'm using the query statement to check whether a key exists (select key, value where key IN (?,?,?)). Is that also a full table scan? (Yes, ordering doesn't matter.)


No, this is a single-key read. A full scan is "SELECT key, value FROM table", but see the blog post about how to do that in parallel. It will be a lot faster if you are interested in reading all key/value pairs.


Thanks,


 

Phuc Nguyen

<phucpnguyenphoai@gmail.com>
unread,
Mar 24, 2019, 11:17:56 PM3/24/19
to ScyllaDB users
Hi Avi,

Got it. I'm starting an instance again with your suggestions and will send you the results/monitoring dashboard once I get the benchmark running.

Thanks a lot,
Regards