Scylla high sstable-cache-evictions?


hor...@gmail.com

Sep 15, 2021, 11:49:58 AM
to ScyllaDB users
Hi,

we have a Scylla cluster that is experiencing performance issues.

While going through the logs/metrics a few questions came up:

1) My initial guess was that we have lots of tombstones somewhere, but I wasn't able to find any evidence for this. It would be nice if Scylla offered a scylla_sstables_cell_tombstone_reads metric, which it does not seem to have. Wouldn't such a metric make sense?


2) Also, there is a big difference between these two metrics:
scylla_sstables_cell_tombstone_writes{shard="0"} 29650471.0
scylla_sstables_cell_tombstone_writes{shard="1"} 30915762.0
scylla_sstables_tombstone_writes{shard="0"} 1713.0
scylla_sstables_tombstone_writes{shard="1"} 3792.0
How do they differ and why does their value differ so dramatically?


3) The server has plenty of memory to cache entries, but I am still seeing high values for scylla_sstables_index_page_cache_evictions and scylla_sstables_pi_cache_evictions in this cluster. Might these metrics be relevant? I do not understand how the page cache fits into Scylla's architecture. Is this something that would be improved by #7079?


4) I was seeing such a log entry a few times yesterday (when the situation was really bad):

Sep 14 19:15:29 llpL1221 scylla[3834435]:  [shard 0] reader_concurrency_semaphore - (rate limiting dropped 446 similar messages) Semaphore _read_concurrency_sem: timed out, dumping permit diagnostics:
                                        Permits with state active
                                                     memory        count        name
                                                     0B        1        pcc.lock4:multishard-mutation-query
                                                     0B        1        pcc.outintconfigs:multishard-mutation-query
                                                     33K        2        pcc.qstates2:data-query
                                                     82K        4        pcc.outintstates:data-query
                                                     128K        4        pcc.outintconfigs:data-query
                                                     167K        6        pcc.lock4:data-query
                                                     613K        5        pcc.tcounters:counter-read-before-write
                                                     1243K        15        pcc.scripts:data-query
                                                     3170K        19        pcc.tindex2:data-query
                                                     8M        49        pcc.tevents2:data-query
                                                     
                                                     14M        106        total
                                                     
                                                     Permits with state waiting
                                                     memory        count        name
                                                     0B        2        pcc.lock4:shard-reader
                                                     0B        2        pcc.jobq2:shard-reader
                                                     0B        1        pcc.outintconfigs:shard-reader
                                                     0B        2        pcc.customworkerconfigs:shard-reader
                                                     0B        3        pcc.qstates2:data-query
                                                     289B        1        pcc.outintstates:data-query
                                                     295B        1        pcc.cwrktates:data-query
                                                     855B        34        pcc.tindex2:data-query
                                                     2K        5        pcc.lock4:data-query
                                                     2K        38        pcc.tcounters:counter-read-before-write
                                                     14K        6        pcc.outintconfigs:data-query
                                                     46K        37        pcc.tevents2:data-query
                                                     675K        28        pcc.scripts:data-query
                                                     
                                                     738K        160        total
                                                     
                                                     Total: permits: 266, memory: 14M

Does this mean there are too many concurrent requests being processed, and it's dumping what's active and what is pending? Is it possible the multishard queries are a problem? These tables are very small and are continuously read for changes in the background. But I think Scylla does not like full table scans. Would it perhaps be better to avoid single full table scans and read them token range by token range?

regards,
Christian

Avi Kivity

Sep 19, 2021, 4:56:53 AM
to scyllad...@googlegroups.com, hor...@gmail.com


On 15/09/2021 18.49, hor...@gmail.com wrote:
Hi,

we have a Scylla cluster that is experiencing performance issues.

While going through the logs/metrics a few questions came up:

1) My initial guess was that we have lots of tombstones somewhere, but I wasn't able to find any evidence for this. It would be nice if Scylla offered a scylla_sstables_cell_tombstone_reads metric, which it does not seem to have. Wouldn't such a metric make sense?


It does make sense, and such metrics were added. They are not yet in any release, though.


https://github.com/scylladb/scylla/commit/5f9695c1b2046afb73af1b863ad3f5727bd3c204



2) Also, there is a big difference between these two metrics:
scylla_sstables_cell_tombstone_writes{shard="0"} 29650471.0
scylla_sstables_cell_tombstone_writes{shard="1"} 30915762.0
scylla_sstables_tombstone_writes{shard="0"} 1713.0
scylla_sstables_tombstone_writes{shard="1"} 3792.0
How do they differ and why does their value differ so dramatically?


cell tombstone writes -> deletes of a single cell (a TTLed cell may also increment this metric when it expires but has not yet been purged).

tombstone_writes -> deletions of an entire partition.
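The scale of the gap can be checked directly from a metrics scrape. Below is a minimal sketch in plain Python that sums each counter across shards and computes the ratio; the sample text reuses the values from the question, and the regex is a simplification of the full Prometheus exposition format:

```python
import re

# Sample scrape output (values taken from the question above).
METRICS = """\
scylla_sstables_cell_tombstone_writes{shard="0"} 29650471.0
scylla_sstables_cell_tombstone_writes{shard="1"} 30915762.0
scylla_sstables_tombstone_writes{shard="0"} 1713.0
scylla_sstables_tombstone_writes{shard="1"} 3792.0
"""

def sum_metric(text, name):
    """Sum one counter across all shards in Prometheus exposition text."""
    pattern = re.compile(r'^%s\{[^}]*\}\s+([0-9.]+)$' % re.escape(name), re.M)
    return sum(float(v) for v in pattern.findall(text))

cell = sum_metric(METRICS, "scylla_sstables_cell_tombstone_writes")
part = sum_metric(METRICS, "scylla_sstables_tombstone_writes")
print(f"cell tombstones:      {cell:.0f}")
print(f"partition tombstones: {part:.0f}")
# Single-cell deletes (or expired TTLed cells) vastly outnumber partition deletes here.
print(f"ratio: {cell / part:.0f}x")
```

With the values above, cell-level tombstones outnumber partition-level ones by roughly four orders of magnitude, which matches a workload dominated by per-column deletes or TTLs rather than partition deletes.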



3) The server has plenty of memory to cache entries, but I am still seeing high values for scylla_sstables_index_page_cache_evictions and scylla_sstables_pi_cache_evictions in this cluster. Might these metrics be relevant? I do not understand how the page cache fits into Scylla's architecture. Is this something that would be improved by #7079?


Index cache entries become obsolete when an sstable is compacted (unlike the row cache). It could be something else, but that is the most likely explanation.

It is dumping those stats so we can tell whether there is a memory problem or not. From these dumps, it appears there are no problems. The concurrency is high but entirely reasonable. Memory usage is reasonable too (how much memory do you have per shard? Even in the worst case of 1 GB/shard it's just 1.4% of shard memory).


Probably the disk is the bottleneck, check Query I/O Queue Delay in the Advanced dashboard.


Is it possible the multishard queries are a problem? These tables are very small and are continuously read for changes in the background. But I think Scylla does not like full table scans. Would it perhaps be better to avoid single full table scans and read them token range by token range?


Full scans are fine; there is no need to break them apart. If you have just a single client issuing them, they should have little effect on the cluster unless they encounter a run of tombstones. You can try such a scan from cqlsh with tracing on to see what's going on.
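For completeness, if a scan ever did need to be broken apart, the Murmur3 token ring spans [-2^63, 2^63 - 1] and can be split into contiguous subranges client-side. This is a generic sketch (the function name and the CQL in the comment are illustrative, not from this thread):

```python
# Murmur3 partitioner token bounds.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def token_subranges(n):
    """Split the full token ring into n contiguous (start, end) ranges."""
    span = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 tokens in total
    step = span // n
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        # Give the last range any remainder so the ring is fully covered.
        end = MAX_TOKEN if i == n - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# Each subrange then becomes one bounded query, e.g.:
#   SELECT * FROM ks.tbl WHERE token(pk) >= ? AND token(pk) <= ?
for lo, hi in token_subranges(4):
    print(lo, hi)
```

In practice, aligning subranges to the cluster's vnode boundaries spreads the work more evenly, but as noted above, a plain full scan is fine here.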


An alternative to repeated scans is to use Change Data Capture (CDC) to watch for changes. But again, those full scans are fine. We do plan to make runs of tombstones work better with Scylla.


regards,
Christian
--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/cac6b829-3218-4d46-b679-b596881288d3n%40googlegroups.com.