Does Scylla respect min_index_interval and max_index_interval?

erozycki@edoinc.com

<erozycki@edoinc.com>
Jul 15, 2018, 2:22:12 PM
to ScyllaDB users
I am testing various settings for a table with (roughly) the following schema.

CREATE TABLE test (
    h varint PRIMARY KEY,
    m set<bigint>
) WITH COMPRESSION = {
    'sstable_compression': 'LZ4Compressor',
    'chunk_length_kb': '2KB'
} AND COMPACTION = {
    'class': 'LeveledCompactionStrategy'
} AND bloom_filter_fp_chance = 0.001
  AND min_index_interval = ?
  AND max_index_interval = ?;


I tried adjusting min_index_interval and max_index_interval, but as far as I can tell changing these settings had no effect. I created multiple copies of this table with (min_index_interval, max_index_interval) ranging from (32, 32) to (128, 2048) and I saw zero difference in the size of the Summary.db files (and in read performance). The documentation at http://docs.scylladb.com/cassandra-compatibility/ indicates these settings are supported.

erozycki@edoinc.com

<erozycki@edoinc.com>
Jul 15, 2018, 2:27:14 PM
to ScyllaDB users
I am using Scylla 2.2 on Ubuntu 16.04.

Shlomi Livne

<shlomi@scylladb.com>
Jul 15, 2018, 2:55:27 PM
to ScyllaDB users
Hi


The commit message below has the details:

----

commit 8726ee937da24a1d4a20bf73d98cd434b237027e
Author: Raphael S. Carvalho <raph...@scylladb.com>
Date:   Thu Aug 10 02:16:20 2017 -0300

    sstables: introduce size-based sampling for sstable summary
    
    Currently, a summary entry is added after min_index_interval index
    entries were written. Not taking into account size of index entries
    becomes a problem with large partitions which may create big index
    entries due to promoted indexes. Read performance is affected as a
    consequence because index entries spanned by summary are all read
    from disk to serve request.
    
    What we wanna do is to also add a summary entry after index reaches
    a boundary. To deal with oversampling, we want to write 1 byte to
    summary for every 2000 bytes written to data file (this will be
    eventually made into an option in the config file).
    Both conditions must be met to avoid under or oversampling.
    That way, the amount of data needed from index file to satify the
    request is drastically reduced.
    
    Fixes #1842.

---

If you are using wide partitions, then Scylla will try to optimize the summary so that scans are more efficient.

Shlomi
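
To illustrate why min_index_interval may have no visible effect on small partitions, here is a rough Python sketch of the two-condition rule described in the commit message above. It is a simplified model, not Scylla's actual code: the function name, the fixed per-partition size (80 bytes), the fixed per-entry summary size (20 bytes), and the exact form of the byte-budget check are all assumptions made for illustration.

```python
def count_summary_entries(num_partitions, partition_bytes, entry_bytes,
                          min_index_interval, data_bytes_per_summary_byte=2000):
    """Count summary entries under a simplified two-condition sampling model.

    Assumption: an entry is added only when BOTH hold:
      1. at least min_index_interval index entries were written since the
         last summary entry, and
      2. the summary (including the new entry) stays within a budget of
         1 summary byte per data_bytes_per_summary_byte data-file bytes.
    """
    entries = 0
    since_last = 0        # index entries written since the last summary entry
    data_written = 0      # bytes written to the data file so far
    summary_written = 0   # bytes written to the summary so far
    for _ in range(num_partitions):
        data_written += partition_bytes
        since_last += 1
        enough_index_entries = since_last >= min_index_interval
        within_byte_budget = (
            (summary_written + entry_bytes) * data_bytes_per_summary_byte
            <= data_written
        )
        if enough_index_entries and within_byte_budget:
            entries += 1
            summary_written += entry_bytes
            since_last = 0
    return entries


if __name__ == "__main__":
    # Hypothetical workload: 100,000 partitions of 80 bytes, 20-byte entries.
    for interval in (32, 128, 2048):
        print(interval, count_summary_entries(100_000, 80, 20, interval))
```

Under these assumed sizes the byte budget allows one summary entry per 500 partitions, so intervals of 32 and 128 both yield 200 entries, and only an interval above 500 (for example 2048, which yields 48) changes the result. In this model, lowering min_index_interval cannot make the summary denser than the byte budget permits.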



--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-users+unsubscribe@googlegroups.com.
To post to this group, send email to scylladb-users@googlegroups.com.
Visit this group at https://groups.google.com/group/scylladb-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/9a6f19aa-ec34-4870-97f3-b2ff68e19e09%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

erozycki@edoinc.com

<erozycki@edoinc.com>
Jul 15, 2018, 6:17:00 PM
to ScyllaDB users
Ok, thanks for the quick reply.

I read that there are plans to add some config option (maybe summary_byte_cost) that will allow users to fine-tune the sparsity of Summary.db. There is at least one potential user (me) who would appreciate this option. In my Scylla evaluation I am trying to maximize read throughput for many small partitions, and it seems that currently Scylla reads about 20KB from Index.db per SSTable accessed during a read. Reducing min_index_interval and max_index_interval does not reduce this number. I imagine that a larger Summary.db could reduce this greatly. And in my case, the extra memory usage wouldn't matter much (the row cache isn't very helpful for my very random read pattern).


Shlomi Livne

<shlomi@scylladb.com>
Jul 16, 2018, 3:35:35 AM
to ScyllaDB users
What is your average partition size?

Are you using multiple rows in a partition? If so, what is the size of a single row?

erozycki@edoinc.com

<erozycki@edoinc.com>
Jul 16, 2018, 12:58:54 PM
to ScyllaDB users
Avg ~80 bytes (compressed; ~150 bytes uncompressed) per partition. Single-row partitions.

Tomasz Grabiec

<tgrabiec@scylladb.com>
Jul 17, 2018, 4:17:38 AM
to ScyllaDB users
There is a server config option called "sstable_summary_ratio" (in scylla.yaml), or --sstable-summary-ratio (command-line parameter), which you can use to make the summary denser. It is the ratio of summary size to data file size. By default it is 0.0005, which means 1 byte of summary per 2KB of data file.
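
As a quick check of that arithmetic, the following Python sketch converts a ratio into "data bytes per summary byte" and into a rough summary size for a given data file. The helper names are hypothetical, and the size estimate assumes the ratio is the only limit on summary size, which may not hold in practice; only the option name sstable_summary_ratio and its default of 0.0005 come from this thread.

```python
def data_bytes_per_summary_byte(summary_ratio: float) -> int:
    """Data-file bytes that 'pay for' one summary byte at the given ratio."""
    return round(1 / summary_ratio)


def approx_summary_bytes(data_file_bytes: int, summary_ratio: float) -> int:
    """Rough summary size, assuming the ratio alone bounds the summary."""
    return int(data_file_bytes * summary_ratio)


if __name__ == "__main__":
    one_gib = 1 << 30
    for ratio in (0.0005, 0.005):
        print(ratio,
              data_bytes_per_summary_byte(ratio),
              approx_summary_bytes(one_gib, ratio))
```

With the default of 0.0005 this gives 2000 data bytes per summary byte, or roughly 0.5 MB of summary per 1 GiB of data; raising the ratio tenfold to 0.005 would allow roughly ten times as much summary for the same data file.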
 
