This short report brings to light some interesting commits to
scylla.git master from the last week. Commits in the
3089558f..0af2c2b1cb range are covered.
There were 111 non-merge commits from 17 authors in that period.
Some notable commits:
The currenttime() and related functions were incorrectly marked
as deterministic. This could lead to incorrect results in prepared
statements. They are now marked as non-deterministic.
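To see why the determinism flag matters for prepared statements, here is a toy sketch. It assumes, for illustration only, that a call marked deterministic may be evaluated once at prepare time and reused on every execution. The names Function, PreparedCall and current_time are hypothetical and are not Scylla's API.

```python
import itertools

# Hypothetical stand-in for a time function; a counter keeps the demo repeatable.
_clock = itertools.count(1000)

def current_time():
    return next(_clock)

class Function:
    def __init__(self, impl, deterministic):
        self.impl = impl
        self.deterministic = deterministic

class PreparedCall:
    """A toy prepared statement consisting of a single function call."""

    def __init__(self, fn):
        self.fn = fn
        # Assumed optimization: a deterministic call is evaluated once at
        # prepare time and its result is reused on every execution.
        self.folded = fn.impl() if fn.deterministic else None

    def execute(self):
        if self.fn.deterministic:
            return self.folded
        return self.fn.impl()

# Wrongly marked deterministic: every execution sees the prepare-time value.
wrong = PreparedCall(Function(current_time, deterministic=True))
stale = [wrong.execute(), wrong.execute()]

# Marked non-deterministic: the function is re-evaluated on each execution.
right = PreparedCall(Function(current_time, deterministic=False))
fresh = [right.execute(), right.execute()]
```

In this sketch the first statement returns the same value twice, while the second advances on each execution.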
If Scylla stalls while reclaiming memory, it will now log memory-related
diagnostics so it is easier to understand the root cause.
Repair now reads data with a very long timeout (instead of an infinite
timeout). This is a last-resort defence against internal deadlocks.
Multi-column conditions in the WHERE clause (in tuple form, e.g.
(a, b) = (1, 2)) were incorrectly ignored when using an index, leading
to unwanted rows in the result. The unwanted rows are now filtered
out.
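A minimal sketch of the idea of filtering index results follows. It assumes a query shaped like WHERE v = 10 AND (a, b) = (1, 2) with an index on v; the table contents and the helper names index_lookup and matches_tuple are made up for illustration and do not reflect Scylla's internals.

```python
def index_lookup(rows, column, value):
    """Stand-in for an index read: returns rows matching one indexed column."""
    return [r for r in rows if r[column] == value]

def matches_tuple(row, columns, values):
    """True if (row[c1], row[c2], ...) equals values, compared as a tuple."""
    return tuple(row[c] for c in columns) == tuple(values)

rows = [
    {"pk": 1, "a": 1, "b": 2, "v": 10},
    {"pk": 2, "a": 1, "b": 3, "v": 10},
    {"pk": 3, "a": 1, "b": 2, "v": 20},
]

# The index alone satisfies only the restriction on v.
candidates = index_lookup(rows, "v", 10)

# The multi-column restriction must still be applied to the candidates;
# skipping this step is what would let unwanted rows through.
result = [r for r in candidates if matches_tuple(r, ("a", "b"), (1, 2))]
```

Here the index returns the rows with pk 1 and 2, and the tuple filter keeps only pk 1.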
After adding a node, a cleanup process is run on the existing nodes to
remove data that was copied to the new node. This is a compaction
process that compacts only one sstable at a time, a fact that was used
to optimize cleanup. The check for whether a partition should be
removed during cleanup was also improved.
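The report does not say how the optimization works. The sketch below shows only one way a single, token-sorted input could be exploited: because tokens arrive in ascending order, the ownership check can advance a cursor over the sorted owned ranges instead of searching them for every partition. Ranges are simplified to non-wrapping (start, end] pairs, and the function name cleanup is hypothetical.

```python
def cleanup(partitions, owned_ranges):
    """Keep only partitions whose token falls in an owned range.

    partitions:   (token, key) pairs in ascending token order, as when
                  reading a single sstable.
    owned_ranges: sorted, non-overlapping (start, end] token ranges.
    """
    kept = []
    i = 0  # cursor into owned_ranges; it only ever moves forward
    for token, key in partitions:
        # Ranges ending before this token cannot match any later token
        # either, because tokens are ascending.
        while i < len(owned_ranges) and owned_ranges[i][1] < token:
            i += 1
        # The current range ends at or after the token; the token is
        # owned if it also lies strictly after the range start.
        if i < len(owned_ranges) and owned_ranges[i][0] < token:
            kept.append((token, key))
    return kept

owned = [(0, 10), (20, 30)]
parts = [(5, "a"), (10, "b"), (15, "c"), (20, "d"), (25, "e"), (31, "f")]
kept = cleanup(parts, owned)
```

With several unsorted inputs the cursor trick would not apply directly, which is why the one-sstable-at-a-time property is relevant to this kind of shortcut.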
When Scylla starts up, it checks if all sstables conform to the
compaction strategy rules, and if not, it reshapes the data to
make it conformant. This helps keep reads fast. It is now possible
to abort the reshape process in order to get Scylla to start more
quickly.
Scylla uses reader objects to read sequential data. It caches
those readers so they can be reused across multiple pages of the
result set, eliminating the overhead of starting a new sequential
read each time. However, this optimization was missed for internal
paging used to implement aggregations (e.g. SUM(column)). Scylla
now uses the optimization for aggregates too.
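The reader-reuse idea can be sketched roughly as follows. This is a toy model under stated assumptions, not Scylla's implementation: Reader, ReaderCache, fetch_page and paged_sum are hypothetical names, and a Python list stands in for on-disk data.

```python
class Reader:
    """Sequential reader over a list; creating one is the costly step."""

    opened = 0  # counts how many readers were created

    def __init__(self, data):
        Reader.opened += 1
        self.data = data
        self.pos = 0

    def read(self, n):
        page = self.data[self.pos:self.pos + n]
        self.pos += len(page)
        return page

class ReaderCache:
    """Keeps a paused reader per query so the next page can resume it."""

    def __init__(self):
        self._saved = {}

    def fetch_page(self, query_id, data, page_size):
        # Resume the reader saved by the previous page, if there is one.
        reader = self._saved.pop(query_id, None)
        if reader is None:
            reader = Reader(data)
        page = reader.read(page_size)
        exhausted = reader.pos >= len(data)
        if not exhausted:
            self._saved[query_id] = reader
        return page, exhausted

def paged_sum(cache, query_id, data, page_size):
    """Aggregate over internal pages, going through the reader cache."""
    total = 0
    exhausted = False
    while not exhausted:
        page, exhausted = cache.fetch_page(query_id, data, page_size)
        total += sum(page)
    return total

total = paged_sum(ReaderCache(), "q1", list(range(10)), 3)
```

In this sketch the sum is computed over four internal pages while only one reader is ever created; without the cache each page would open its own.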
A serious bug was fixed where read-repair could skip a row if it
landed just after the end of a page.
See you in the next issue of last week in scylla.git master!