How does system.size_estimates work and what can we get from it?

443 views
Skip to first unread message

Xiang Zhou

<feixiang11010@gmail.com>
unread,
Jul 21, 2021, 4:46:24 AM7/21/21
to ScyllaDB users
Hi~
If there is only one item per partition in my table, could the partitions_count obtained through system.size_estimates reflect the total number of items in the table?

What is the accuracy of this data and what factors are affected by it?

Nadav Har'El

<nyh@scylladb.com>
unread,
Jul 21, 2021, 6:21:28 AM7/21/21
to ScyllaDB users
Some things you need to remember about how this estimate works:

1. size_estimates is node-local so if for example you have N=10 nodes and RF=3, each node will contain around 1/10*3 = 30% of the partitions, so the number you get by querying one node will be 30% of the total partition count.
2. If the data is not fully repaired, when you ask one node for its count, it may include fewer or more partitions than some other node. Hopefully the error here is relatively small.
3. Even on one node and one shard, the data is split into multiple separate sstables. Scylla can't just add the number of partitions in the different sstables, as those may be overlapping (changing the same partitions) or not. So Scylla can use a "cardinality estimator", which to make a long story short, samples the different sstables in a way that we can figure out easily how much the sstables overlap to improve the estimate.

After writing the above, I went to the actual code, db/size_estimates_virtual_reader.cc, and it *seems* that the third part is missing in the code! It seems we just sum the number of keys in the separate sstable. Maybe someone else remembers why? This naive "sum" thing will be fairly accurate (up to 10%) in LCS, but can be wildly inaccurate in STCS depending on your disk space amplification (how many copies of the same partition appear in different sstables). I'll open an issue about this.

Nadav Har'El

<nyh@scylladb.com>
unread,
Jul 21, 2021, 6:31:16 AM7/21/21
to ScyllaDB users

Nadav Har'El

<nyh@scylladb.com>
unread,
Jul 21, 2021, 6:34:53 AM7/21/21
to ScyllaDB users
On Wed, Jul 21, 2021 at 1:21 PM Nadav Har'El <n...@scylladb.com> wrote:

On Wed, Jul 21, 2021 at 11:46 AM Xiang Zhou <feixia...@gmail.com> wrote:
Hi~
If there is only one item per partition in my table, could the partitions_count obtained through system.size_estimates reflect the total number of items in the table?

What is the accuracy of this data and what factors are affected by it?

Some things you need to remember about how this estimate works:

1. size_estimates is node-local so if for example you have N=10 nodes and RF=3, each node will contain around 1/10*3 = 30% of the partitions, so the number you get by querying one node will be 30% of the total partition count.
2. If the data is not fully repaired, when you ask one node for its count, it may include fewer or more partitions than some other node. Hopefully the error here is relatively small.
3. Even on one node and one shard, the data is split into multiple separate sstables. Scylla can't just add the number of partitions in the different sstables, as those may be overlapping (changing the same partitions) or not. So Scylla can use a "cardinality estimator", which to make a long story short, samples the different sstables in a way that we can figure out easily how much the sstables overlap to improve the estimate.

Oh, and another problem:
4. Even with a "cardinality estimator" I think we can't correctly handle deleted partitions, in the sense that we count them as a partition, while you probably don't want to count them.

Xiang Zhou

<feixiang11010@gmail.com>
unread,
Jul 22, 2021, 9:59:34 PM7/22/21
to ScyllaDB users
The value I obtained through system.size_estimates.partitions_count is not related to the actual number of partitions in the table. 

I don't know if I have any misunderstanding about the value of system.size_estimates.partitions_count. If there is something wrong, feel free to correct me.

Reply all
Reply to author
Forward
0 new messages