Does Scylla respect min_index_interval and max_index_interval?

erozycki@edoinc.com

Jul 15, 2018, 2:22:12 PM

to ScyllaDB users

I am testing various settings for a table with (roughly) the following schema.

CREATE TABLE test (
    h varint PRIMARY KEY,
    m set<bigint>
  ) WITH COMPRESSION = {
    'sstable_compression': 'LZ4Compressor',
    'chunk_length_kb': '2KB'
  } AND COMPACTION = {
    'class' : 'LeveledCompactionStrategy'
  } AND bloom_filter_fp_chance = 0.001
  AND min_index_interval = ?
  AND max_index_interval = ?;

I tried adjusting min_index_interval and max_index_interval, but as far as I can tell changing these settings had no effect. I created multiple copies of this table with (min_index_interval, max_index_interval) ranging from (32, 32) to (128, 2048) and I saw zero difference in the size of the Summary.db files (and in read performance). The documentation at http://docs.scylladb.com/cassandra-compatibility/ indicates these settings are supported.
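The Summary.db comparison described above can be repeated with a small script that totals the Summary.db components per table directory. This is a minimal sketch rather than a Scylla tool. It assumes one sub-directory per table under a keyspace directory such as /var/lib/scylla/data/<keyspace>/, and the function name summary_sizes is hypothetical.

```python
import os
from collections import defaultdict


def summary_sizes(keyspace_dir):
    """Return {table_directory: total bytes of *-Summary.db files}.

    Assumes (hypothetically) a layout like /var/lib/scylla/data/<keyspace>/
    with one sub-directory per table, e.g. "test-<uuid>".
    """
    totals = defaultdict(int)
    for table_dir in sorted(os.listdir(keyspace_dir)):
        path = os.path.join(keyspace_dir, table_dir)
        if not os.path.isdir(path):
            continue  # skip stray files at the keyspace level
        for name in os.listdir(path):
            # Each SSTable has several component files; count only the summaries.
            if name.endswith("-Summary.db"):
                totals[table_dir] += os.path.getsize(os.path.join(path, name))
    return dict(totals)
```

For example, summary_sizes("/var/lib/scylla/data/ks"), with ks standing in for the real keyspace name, returns one total per table copy. Running nodetool flush first makes sure recent writes have reached SSTables before measuring.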

erozycki@edoinc.com

Jul 15, 2018, 2:27:14 PM

I am using Scylla 2.2 on Ubuntu 16.04.

Shlomi Livne

<shlomi@scylladb.com>

Jul 15, 2018, 2:55:27 PM

Hi

Scylla 2.2 includes size-based sampling in SSTable summary files.

The commit message has the details:

----

commit 8726ee937da24a1d4a20bf73d98cd434b237027e
Author: Raphael S. Carvalho <raph...@scylladb.com>
Date: Thu Aug 10 02:16:20 2017 -0300

sstables: introduce size-based sampling for sstable summary

Currently, a summary entry is added after min_index_interval index
entries were written. Not taking into account size of index entries
becomes a problem with large partitions which may create big index
entries due to promoted indexes. Read performance is affected as a
consequence because index entries spanned by summary are all read
from disk to serve request.

What we wanna do is to also add a summary entry after index reaches
a boundary. To deal with oversampling, we want to write 1 byte to
summary for every 2000 bytes written to data file (this will be
eventually made into an option in the config file).

Both conditions must be met to avoid under or oversampling.

That way, the amount of data needed from index file to satisfy the
request is drastically reduced.

Fixes #1842.

---

If you are using wide partitions, then Scylla will try to optimize the summary so that scans are more efficient.

Shlomi

--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-users+unsubscribe@googlegroups.com.
To post to this group, send email to scylladb-users@googlegroups.com.
Visit this group at https://groups.google.com/group/scylladb-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/9a6f19aa-ec34-4870-97f3-b2ff68e19e09%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.
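The two conditions in the commit message can be modeled in a few lines. This is a simplified model, not Scylla's implementation: it assumes fixed-size partitions, a fixed per-entry summary cost (25 bytes here is a guess), and the hypothetical function name summary_sample_points. It illustrates why lowering min_index_interval changes nothing once the size-based budget is the stricter of the two conditions.

```python
def summary_sample_points(num_partitions, partition_size, entry_size,
                          min_index_interval=128, summary_byte_cost=2000):
    """Return the partition numbers at which a summary entry would be written.

    Simplified model of the commit message's rule (an assumption, not Scylla's
    code): every partition occupies `partition_size` bytes of the data file and
    each summary entry costs `entry_size` bytes. After the first entry, a new
    one is added only when BOTH hold:
      1. at least `min_index_interval` index entries were written since the
         previous summary entry, and
      2. the data file has grown by `summary_byte_cost` bytes for every byte
         the previous summary entry cost.
    """
    samples = []
    since_last = 0         # index entries written since the last summary entry
    next_data_offset = 0   # data offset at which the byte budget allows a new entry
    for i in range(num_partitions):
        data_offset = i * partition_size  # where partition i starts in the data file
        if not samples or (since_last >= min_index_interval
                           and data_offset >= next_data_offset):
            samples.append(i)
            since_last = 0
            next_data_offset = data_offset + summary_byte_cost * entry_size
        since_last += 1
    return samples


# Small partitions (~80 bytes, as reported in this thread): the byte budget
# dominates, so min_index_interval = 32 and 128 produce identical summaries.
small_32 = summary_sample_points(10_000, 80, 25, min_index_interval=32)
small_128 = summary_sample_points(10_000, 80, 25, min_index_interval=128)

# Large partitions (~1 MB each): the byte budget is always met, so the
# count-based min_index_interval decides the spacing.
large = summary_sample_points(1_000, 1_000_000, 25, min_index_interval=128)
```

In this model the spacing between summary entries is the larger of min_index_interval and the budget-derived interval, so for ~80-byte partitions the table-level setting only matters if it exceeds the budget interval (about 625 partitions under these assumed sizes).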

erozycki@edoinc.com

Jul 15, 2018, 6:17:00 PM

OK, thanks for the quick reply.

I read that there are plans to add some config option (maybe summary_byte_cost) that will allow users to fine-tune the sparsity of Summary.db. There is at least one potential user (me) who would appreciate this option. In my Scylla evaluation I am trying to maximize read throughput for many small partitions, and it seems that currently Scylla reads about 20KB from Index.db per SSTable accessed during a read. Reducing min_index_interval and max_index_interval does not reduce this number. I imagine that a larger Summary.db could reduce this greatly. And in my case, the extra memory usage wouldn't matter much (the row cache isn't very helpful for my very random read pattern).

Shlomi Livne

<shlomi@scylladb.com>

Jul 16, 2018, 3:35:35 AM

What is your average partition size?

Are you using multiple rows per partition? If so, what is the size of a single row?

erozycki@edoinc.com

Jul 16, 2018, 12:58:54 PM

Avg ~80 bytes (compressed; ~150 bytes uncompressed) per partition. Single-row partitions.

Tomasz Grabiec

<tgrabiec@scylladb.com>

Jul 17, 2018, 4:17:38 AM

There is a server config option called "sstable_summary_ratio" (in scylla.yaml), or --sstable-summary-ratio (command-line parameter), which you can use to make the summary denser. It is the ratio of summary size to data file size. The default is 0.0005, which means 1 byte of summary per 2,000 bytes (about 2 KB) of data file.
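The effect of sstable_summary_ratio on lookup cost can be estimated with back-of-envelope arithmetic. This is a rough model under stated assumptions, not a description of Scylla internals: the summary and index entry sizes are guesses (real sizes depend on key length and SSTable format), min_index_interval is treated as a floor following the "both conditions" rule in the commit message quoted earlier, and index_bytes_per_lookup is a hypothetical name.

```python
def index_bytes_per_lookup(summary_ratio, partition_size, summary_entry_size,
                           index_entry_size, min_index_interval=128):
    """Rough upper bound on Index.db bytes scanned per SSTable lookup.

    Back-of-envelope model (an assumption, not Scylla's code): a summary entry
    of `summary_entry_size` bytes is allowed once per
    summary_entry_size / summary_ratio bytes of data file, and never more often
    than every `min_index_interval` partitions. In the worst case a lookup reads
    every index entry between two neighbouring summary entries.
    """
    data_bytes_per_summary_entry = summary_entry_size / summary_ratio
    partitions_per_summary_entry = max(min_index_interval,
                                       data_bytes_per_summary_entry / partition_size)
    return partitions_per_summary_entry * index_entry_size


# Assumed sizes: 80-byte partitions (from this thread), plus hypothetical
# 25-byte summary entries and 32-byte index entries.
default = index_bytes_per_lookup(0.0005, 80, 25, 32)    # default ratio: 1 byte per 2000
denser = index_bytes_per_lookup(0.005, 80, 25, 32)      # 10x denser summary
denser_32 = index_bytes_per_lookup(0.005, 80, 25, 32, min_index_interval=32)
```

With these guessed sizes, the default ratio of 0.0005 gives an estimate of about 20 KB per SSTable lookup. A ratio ten times higher brings it down to the min_index_interval floor (about 4 KB at 128, about 2 KB at 32). If the model holds, raising sstable_summary_ratio and lowering min_index_interval together is what shrinks the index read for small partitions, at the cost of a proportionally larger Summary.db in memory.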
