Hi,
We have a Scylla cluster that is having performance issues.
While going through the logs/metrics a few questions came up:
1) My initial guess was that we have lots of tombstones somewhere, but I wasn't able to find any evidence for this. It would be nice if Scylla offered a scylla_sstables_cell_tombstone_reads metric, which it does not seem to have. Wouldn't such a metric make sense?
It does, and such metrics were added. They are not yet in any
release, though.
https://github.com/scylladb/scylla/commit/5f9695c1b2046afb73af1b863ad3f5727bd3c204
2) Also, there is a big difference between these two metrics:
scylla_sstables_cell_tombstone_writes{shard="0"} 29650471.0
scylla_sstables_cell_tombstone_writes{shard="1"} 30915762.0
scylla_sstables_tombstone_writes{shard="0"} 1713.0
scylla_sstables_tombstone_writes{shard="1"} 3792.0
How do they differ, and why do their values differ so dramatically?
cell_tombstone_writes -> deletes of a single cell (it's possible that a TTLed cell also increments this metric once it has expired but has not yet been purged).
tombstone_writes -> deletions of an entire partition.
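
To put the gap in numbers, here is a minimal Python sketch that sums the four samples quoted in the question across shards. It assumes the plain Prometheus text format without trailing timestamps; the helper name sum_by_metric is made up for illustration and is not part of Scylla or any client library.

```python
from collections import defaultdict

# The four samples quoted in the question, in Prometheus text format.
SAMPLE = """\
scylla_sstables_cell_tombstone_writes{shard="0"} 29650471.0
scylla_sstables_cell_tombstone_writes{shard="1"} 30915762.0
scylla_sstables_tombstone_writes{shard="0"} 1713.0
scylla_sstables_tombstone_writes{shard="1"} 3792.0
"""


def sum_by_metric(text):
    """Sum sample values across all label sets (here: shards) per metric name.

    Assumption: each line is '<name>{labels} <value>' with no timestamp.
    """
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(None, 1)
        name = series.split("{", 1)[0]
        totals[name] += float(value)
    return dict(totals)


totals = sum_by_metric(SAMPLE)
cell = totals["scylla_sstables_cell_tombstone_writes"]
partition = totals["scylla_sstables_tombstone_writes"]
print(f"cell tombstone writes: {cell:.0f}")
print(f"tombstone writes:      {partition:.0f}")
print(f"ratio:                 {cell / partition:.0f}x")
```

Since the two counters track different kinds of deletes, a large ratio by itself only says that this workload writes far more cell-level tombstones than partition-level ones.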
3) The server has plenty of memory to cache entries, but I am still seeing high values for scylla_sstables_index_page_cache_evictions and scylla_sstables_pi_cache_evictions in this cluster. Might these metrics be relevant? I do not understand how the page cache fits into Scylla's architecture. Is this something that would be improved by #7079?
Index cache entries become obsolete when an sstable is compacted
(unlike the row cache). It could be something else, but that is
the most likely explanation.
It is dumping those stats so we can tell if there is a memory
problem or not. From these dumps, it appears there are no
problems. The concurrency is high but entirely reasonable. Memory
usage is reasonable too (how much memory do you have per shard?
Even in the worst case of 1GB/shard it's just 1.4% of shard
memory).
Probably the disk is the bottleneck; check Query I/O Queue Delay
in the Advanced dashboard.
Is it possible that the multishard queries are a problem? These tables are very small and are continuously read for changes in the background. But I think Scylla does not like full table scans. Would it perhaps be better to avoid single full table scans and read them token range by token range?
Full scans are fine, there is no need to break them apart. If you
have just a single client issuing them, they should have little
effect on the cluster unless they encounter a run of tombstones.
You can try such a scan from cqlsh with tracing on to see what's
going on.
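
For reference, reading "token range by token range" as asked above would look roughly like the sketch below, although per this reply it should not be needed here. It assumes the default Murmur3 partitioner with signed 64-bit tokens; the table name ks.tbl and the partition key column pk are placeholders, and the code only builds the CQL strings without contacting a cluster.

```python
# Assumption: Murmur3 partitioner, tokens are signed 64-bit integers.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(n):
    """Split the full token ring into n contiguous, inclusive [start, end] ranges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # number of tokens on the ring (2**64)
    ranges = []
    start = MIN_TOKEN
    for i in range(1, n + 1):
        end = MIN_TOKEN + (total * i) // n - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Hypothetical table and key column, for illustration only.
QUERY = "SELECT * FROM ks.tbl WHERE token(pk) >= {start} AND token(pk) <= {end}"

for start, end in split_token_ring(4):
    print(QUERY.format(start=start, end=end))
```

Each printed statement covers one slice of the ring, and together the slices cover every token exactly once.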
An alternative to repeated scans is to use Change Data Capture to
watch for changes. But again, those full scans are fine. We do
plan to make runs of tombstones work better with Scylla.
Regards,
Christian