Large index files normal?


hor...@gmail.com <horschi@gmail.com>
Aug 4, 2022, 8:01:17 AM
to ScyllaDB users
Hi,

I have a certain table that has Index files that are as large as the data files. I assume it is because of many very small partitions.

Is it normal that index files can be as big as the data files? It seems a bit extreme to me and I wonder if this can be tweaked?

regards,
Christian


Files:
-rw-r--r--  1 scylla scylla  83M Aug  2 12:17 md-127125-big-CompressionInfo.db
-rw-r--r--  1 scylla scylla  11G Aug  2 12:17 md-127125-big-Data.db
-rw-r--r--  1 scylla scylla   10 Aug  2 12:17 md-127125-big-Digest.crc32
-rw-r--r--  1 scylla scylla 288M Aug  2 12:17 md-127125-big-Filter.db
-rw-r--r--  1 scylla scylla  10G Aug  2 12:17 md-127125-big-Index.db
-rw-r--r--  1 scylla scylla  89K Aug  2 12:17 md-127125-big-Scylla.db
-rw-r--r--  1 scylla scylla 6.4K Aug  2 12:17 md-127125-big-Statistics.db
-rw-r--r--  1 scylla scylla 5.9M Aug  2 12:17 md-127125-big-Summary.db
-rw-r--r--  1 scylla scylla  102 Aug  2 11:41 md-127125-big-TOC.txt

-rw-r--r--  1 scylla scylla  83M Aug  2 12:23 md-127154-big-CompressionInfo.db
-rw-r--r--  1 scylla scylla  11G Aug  2 12:22 md-127154-big-Data.db
-rw-r--r--  1 scylla scylla   10 Aug  2 12:23 md-127154-big-Digest.crc32
-rw-r--r--  1 scylla scylla 288M Aug  2 12:23 md-127154-big-Filter.db
-rw-r--r--  1 scylla scylla  10G Aug  2 12:22 md-127154-big-Index.db
-rw-r--r--  1 scylla scylla  89K Aug  2 12:23 md-127154-big-Scylla.db
-rw-r--r--  1 scylla scylla 6.4K Aug  2 12:23 md-127154-big-Statistics.db
-rw-r--r--  1 scylla scylla 5.9M Aug  2 12:23 md-127154-big-Summary.db
-rw-r--r--  1 scylla scylla  102 Aug  2 11:49 md-127154-big-TOC.txt
...

nodetool tablestats reports:
        SSTable count: 20
        SSTables in each level: [20/4]
        Space used (live): 91355164672
        Space used (total): 91355164672
        Space used by snapshots (total): 0
        Off heap memory used (total): 1295388978
        SSTable Compression Ratio: 0.400782
        Number of partitions (estimate): 912987672
        Memtable cell count: 62605
        Memtable data size: 39944963
        Memtable off heap memory used: 44433408
        Memtable switch count: 12
        Local read count: 385113
        Local read latency: 7.519 ms
        Local write count: 426750
        Local write latency: 0.007 ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 8329
        Bloom filter false ratio: 0.03632
        Bloom filter space used: 1207102280
        Bloom filter off heap memory used: 1207238520
        Index summary off heap memory used: 43717050
        Compression metadata off heap memory used: 0
        Compacted partition minimum bytes: 61
        Compacted partition maximum bytes: 7007506
        Compacted partition mean bytes: 104
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0
        Dropped Mutations: 0

Benny Halevy <bhalevy@scylladb.com>
Aug 4, 2022, 9:51:48 AM
to Christian, Raphael S. Carvalho, scylladb-users@googlegroups.com
On Thu, 2022-08-04 at 05:01 -0700, hor...@gmail.com wrote:
> Hi,
>
> I have a certain table that has Index files that are as large as the data files. I assume it is because of many very small partitions.
>
> Is it normal that index files can be as big as the data files?

Very small partitions should not generate promoted index entries at all
if they are smaller than the threshold (64KB by default IIRC),
so maybe they are small but still above the minimum, and possibly
have long clustering keys that take up most of the storage space?


> It seems a bit extreme to me and I wonder if this can be tweaked?

There's column_index_size_in_kb in scylla.yaml, which could be increased if that's indeed the issue.
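A simplified model of Benny's point, assuming (as he recalls, not confirmed in the thread) a 64 KB default for column_index_size_in_kb and roughly one promoted index entry per threshold-sized block for partitions that span more than one block. The function name is hypothetical and this is not Scylla's code; the partition sizes are the ones nodetool tablestats reported above.

```python
import math


def promoted_index_entries(partition_bytes: int, column_index_size_in_kb: int = 64) -> int:
    """Rough count of promoted index entries for one partition (simplified model).

    Assumption: a partition that fits in a single block gets no promoted index;
    a larger one gets about one entry per column_index_size_in_kb of data.
    """
    block = column_index_size_in_kb * 1024
    if partition_bytes <= block:
        return 0
    return math.ceil(partition_bytes / block)


# Sizes reported by nodetool tablestats for this table.
mean_partition = 104        # "Compacted partition mean bytes"
max_partition = 7_007_506   # "Compacted partition maximum bytes"

print(promoted_index_entries(mean_partition))        # 0
print(promoted_index_entries(max_partition))         # 107
print(promoted_index_entries(max_partition, 256))    # 27
```

Under this model the 104-byte mean partition never reaches the threshold, so raising column_index_size_in_kb would mostly affect the few large partitions, not the basic per-partition index entry.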

Can you share with us one of those sstables (including all components and the table schema)
so Raphael can analyze it?

Thanks,

Benny

Avi Kivity <avi@scylladb.com>
Aug 4, 2022, 9:56:30 AM
to scylladb-users@googlegroups.com, hor...@gmail.com


On 04/08/2022 15.01, hor...@gmail.com wrote:
> Hi,
>
> I have a certain table that has Index files that are as large as the data files. I assume it is because of many very small partitions.


It can happen when most of the partition's content is the partition key. The fact that Data.db is compressed and Index.db isn't also contributes.
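To put the compression point in numbers: nodetool's "SSTable Compression Ratio" is compressed size divided by uncompressed size, so 0.400782 means Data.db holds roughly 40% of the raw bytes. A back-of-the-envelope sketch, assuming that table-wide ratio applies to each sstable and taking the rounded ls -h sizes at face value (the helper name is hypothetical):

```python
GIB = 1024 ** 3


def index_share_of_raw_data(data_db_bytes: float, index_db_bytes: float,
                            compression_ratio: float) -> tuple[float, float]:
    """Return (estimated uncompressed data bytes, Index.db / uncompressed data).

    compression_ratio is compressed size divided by uncompressed size.
    """
    uncompressed = data_db_bytes / compression_ratio
    return uncompressed, index_db_bytes / uncompressed


# Per-sstable sizes from the ls listing (rounded by ls -h) and the table-wide ratio.
raw, share = index_share_of_raw_data(11 * GIB, 10 * GIB, 0.400782)
print(f"uncompressed Data.db ~ {raw / GIB:.1f} GiB")    # ~ 27.4 GiB
print(f"Index.db / uncompressed data ~ {share:.0%}")    # ~ 36%
```

Measured against the uncompressed data, the index is closer to a third of the size; the near 1:1 ratio on disk partly comes from comparing a compressed file with an uncompressed one.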



> Is it normal that index files can be as big as the data files? It seems a bit extreme to me and I wonder if this can be tweaked?


It can happen. Can you share the schema, and typical column sizes?


It's a weakness of the file format, which insists that every partition has an index entry. Perhaps this could be relaxed so that there is one index entry per few hundred bytes, with a linear search to find the partition.
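The relaxation described above resembles a classic sparse index: keep one entry every N bytes instead of one per partition, binary-search the entries, then scan forward. A minimal, hypothetical sketch over an in-memory list of (key, size) records, using plain integer keys in place of Scylla's token ordering; it is not Scylla's file format or a proposed implementation:

```python
from bisect import bisect_right


def build_sparse_index(records, stride_bytes=256):
    """records: (key, serialized_size) pairs in on-disk (sorted) order.

    Returns parallel lists (keys, positions) with one entry about every
    stride_bytes of data, instead of one entry per partition.
    """
    keys, positions = [], []
    bytes_since_entry = stride_bytes  # forces an entry for the first record
    for pos, (key, size) in enumerate(records):
        if bytes_since_entry >= stride_bytes:
            keys.append(key)
            positions.append(pos)
            bytes_since_entry = 0
        bytes_since_entry += size
    return keys, positions


def lookup(records, keys, positions, target):
    """Binary-search the sparse index, then scan linearly to the partition."""
    i = bisect_right(keys, target) - 1
    if i < 0:
        return None  # target sorts before the first indexed key
    pos = positions[i]
    while pos < len(records) and records[pos][0] <= target:
        if records[pos][0] == target:
            return pos
        pos += 1
    return None


# 500 partitions of 104 bytes each (the tablestats mean size), even keys only.
records = [(k, 104) for k in range(0, 1000, 2)]
keys, positions = build_sparse_index(records, stride_bytes=256)
print(len(records), len(keys))               # 500 167
print(lookup(records, keys, positions, 10))  # 5
print(lookup(records, keys, positions, 11))  # None
```

The trade-off is a short linear scan per lookup in exchange for about a third of the index entries here; the stride is the tuning knob.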


horschi <horschi@gmail.com>
Aug 4, 2022, 10:21:11 AM
to Avi Kivity, scylladb-users@googlegroups.com
Indeed most of the data is part of the key! Very often there is also just one row in a partition.

CREATE TABLE ...
(
  aid INT,
  iid TINYINT,
  data BLOB,
  bucket INT,
  off BIGINT,  // this can be record-time if RT index
  uid INT,
  ou SMALLINT,
  eventTime BIGINT, // only written for recordTime indexes
  PRIMARY KEY ((aid, iid, data, bucket), off, uid, ou)
)

Is that something that could be optimized? Should I share an sstable file as Benny asked?

regards,
Christian

Avi Kivity <avi@scylladb.com>
Aug 4, 2022, 10:25:03 AM
to horschi, scylladb-users@googlegroups.com

If `data` is large and can be moved off the partition key, I recommend it.


Otherwise, I don't see what we can do. The system is behaving as designed, even if it is suboptimal for this use case.
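A rough way to see what moving `data` off the partition key would save per index entry. This sketch assumes the Cassandra-compatible composite key encoding (a 2-byte length, the value, and a 1-byte end-of-component marker per component) and ignores the other fields of an index entry; the 50-byte `data` length is purely hypothetical, since typical column sizes were not given in the thread.

```python
def composite_key_bytes(component_sizes):
    """Serialized size of a multi-column partition key, assuming each component
    is stored as a 2-byte length + value + 1-byte end-of-component marker."""
    return sum(2 + size + 1 for size in component_sizes)


DATA_LEN = 50  # hypothetical length of the `data` blob

# (aid INT, iid TINYINT, data BLOB, bucket INT): 4, 1, DATA_LEN, 4 bytes
with_data = composite_key_bytes([4, 1, DATA_LEN, 4])
# The same key with `data` removed from it
without_data = composite_key_bytes([4, 1, 4])

print(with_data, without_data)  # 71 18
```

Under these assumptions each index entry shrinks by DATA_LEN + 3 bytes, so the saving grows with the blob length.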
