Interestingly, the scylla-advanced dashboard shows no pending compactions.
A restart of these particular nodes seems to solve the issue, as I can see many compactions being triggered after the restart.
I was never really concerned about this behaviour, but today I think it may have crashed a node because too many SSTables piled up. At least, I saw numerous std::bad_alloc errors around the time of the crash. One example:
May 20 08:33:11 lqa-2 scylla[2835]: [shard 0] storage_proxy - exception during mutation write to 10.4.43.8: std::bad_alloc (std::bad_alloc)
May 20 08:33:11 lqa-2 scylla[2835]: [shard 0] storage_proxy - exception during mutation write to 10.4.43.8: std::bad_alloc (std::bad_alloc)
May 20 08:33:11 lqa-2 scylla[2835]: [shard 0] storage_proxy - exception during mutation write to 10.4.43.8: std::bad_alloc (std::bad_alloc)
May 20 08:33:11 lqa-2 scylla[2835]: [shard 0] storage_proxy - exception during mutation write to 10.4.43.8: std::bad_alloc (std::bad_alloc)
May 20 08:33:11 lqa-2 scylla[2835]: [shard 0] storage_proxy - exception during mutation write to 10.4.43.8: std::bad_alloc (std::bad_alloc)
May 20 08:33:11 lqa-2 scylla[2835]: [shard 0] storage_proxy - exception during mutation write to 10.4.43.8: std::bad_alloc (std::bad_alloc)
May 20 08:33:11 lqa-2 scylla[2835]: [shard 0] storage_proxy - exception during mutation write to 10.4.43.8: std::bad_alloc (std::bad_alloc)
May 20 08:33:11 lqa-2 scylla[2835]: [shard 0] sstable - failed reading index for /var/lib/scylla/data/pc/lck4-e3f1e300676c11e9b751000000000000/md-7040-big-Data.db: std::bad_alloc (std::bad_alloc)
May 20 08:33:11 lqa-2 scylla[2835]: scylla: ./sstables/partition_index_cache.hh:71: sstables::partition_index_cache::entry::~entry(): Assertion `!is_referenced()' failed.
May 20 08:33:11 lqa-2 scylla[2835]: Aborting on shard 0.
May 20 08:33:11 lqa-2 scylla[2835]: Backtrace:
May 20 08:33:11 lqa-2 scylla[2835]: 0x3dbe2f8
May 20 08:33:11 lqa-2 scylla[2835]: 0x3def152
May 20 08:33:11 lqa-2 scylla[2835]: 0x7f55bf206a1f
May 20 08:33:11 lqa-2 scylla[2835]: /opt/scylladb/libreloc/libc.so.6+0x3d2a1
May 20 08:33:11 lqa-2 scylla[2835]: /opt/scylladb/libreloc/libc.so.6+0x268a3
May 20 08:33:11 lqa-2 scylla[2835]: /opt/scylladb/libreloc/libc.so.6+0x26788
May 20 08:33:11 lqa-2 scylla[2835]: /opt/scylladb/libreloc/libc.so.6+0x35a15
May 20 08:33:11 lqa-2 scylla[2835]: 0x1722ea7
May 20 08:33:11 lqa-2 scylla[2835]: 0x173db94
May 20 08:33:11 lqa-2 scylla[2835]: 0x173dcf1
May 20 08:33:11 lqa-2 scylla[2835]: 0x3dd0cc4
May 20 08:33:11 lqa-2 scylla[2835]: 0x3dd20b7
May 20 08:33:11 lqa-2 scylla[2835]: 0x3dd12fc
May 20 08:33:11 lqa-2 scylla[2835]: 0x3d7aaae
May 20 08:33:11 lqa-2 scylla[2835]: 0x3d79e26
May 20 08:33:11 lqa-2 scylla[2835]: 0xf41135
May 20 08:33:11 lqa-2 scylla[2835]: /opt/scylladb/libreloc/libc.so.6+0x27b74
May 20 08:33:11 lqa-2 scylla[2835]: 0xf3e2ed
Could this be a bug that prevents compactions from being triggered? Are there any metrics I can check?
I am currently observing this on Scylla 4.6.3-0.20220414.8bf149fdd.
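
To check whether SSTables are actually piling up on disk, independent of what the dashboard reports, something like the following could help. It is only a rough sketch, not an official tool: it assumes the default data directory /var/lib/scylla/data (as in the log path above), counts files ending in -Data.db per table directory, and skips snapshots directories. The function name count_sstables is just illustrative.

```python
import os
from collections import Counter


def count_sstables(data_dir="/var/lib/scylla/data"):
    """Count SSTables per table directory by counting *-Data.db files.

    Assumption: one *-Data.db file corresponds to one SSTable, and tables
    live under <data_dir>/<keyspace>/<table>-<id>/.
    """
    counts = Counter()
    for root, dirs, files in os.walk(data_dir):
        # Skip snapshot copies so they do not inflate the counts.
        dirs[:] = [d for d in dirs if d != "snapshots"]
        n = sum(1 for f in files if f.endswith("-Data.db"))
        if n:
            counts[os.path.relpath(root, data_dir)] += n
    return counts


if __name__ == "__main__":
    # Print the ten directories holding the most SSTables.
    for table, n in count_sstables().most_common(10):
        print(f"{n:8d}  {table}")
```

Running this periodically on an affected node and on a healthy one should show whether the count for a table keeps growing until the restart.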