[RELEASE] Scylla 5.0 Release Candidate 1 (RC1)


Tzach Livyatan

<tzach@scylladb.com>

Feb 8, 2022, 8:42:21 AM

to ScyllaDB users, scylladb-dev

The Scylla team is pleased to announce Scylla Open Source 5.0 RC1, the first Release Candidate for the Scylla Open Source 5.0 major release.

Scylla 5.0 introduces safe schema changes with Raft, automatic management of tombstone garbage collection, and more functional, performance and stability improvements (below).

Find the Scylla Open Source 5.0 repository for your Linux distribution here. Scylla 5.0 RC1 Docker is also available.

Use the release candidate with caution; RC1 is not production-ready yet. You can help stabilize Scylla Open Source 5.0 by reporting bugs here.

Only the last two minor releases of the Scylla Open Source project are supported. Once Scylla Open Source 5.0 is officially released, only Scylla Open Source 5.0 and Scylla 4.6 will be supported, and Scylla 4.5 will be retired.

Note that Scylla 4.6.0 is not released yet; it will be out in a few days.

The features below will be discussed at the upcoming virtual Scylla Summit 2022, February 9-10.

New features

Raft (experimental)

First version with Raft-based schema management.

Here, schema management means DDL operations such as CREATE, ALTER, and DROP for tables, materialized views (MV), etc.

Unstable schema management has been a problem in all Apache Cassandra and Scylla versions so far. See for example

Apache Cassandra: CASSANDRA-10250, CASSANDRA-10699, CASSANDRA-14957

Scylla: #2921, #7426, #8968, #9774

The root cause is the unsafe propagation of schema updates over gossip, as concurrent schema updates can lead to schema collisions.

Now that Scylla implements the Raft consensus algorithm, it can use Raft for data-level and cluster-level operations, starting with schema changes, which makes concurrent schema updates safe.

To enable experimental safe schema update with Raft use:

--experimental-features=raft

Updates in this release:

Scylla now sets up a "Raft Group 0", a central Raft group that will be used for topology and schema coordination. It is still not used by default.
Schema synchronization across the cluster can now be done via Raft. Note it is still disconnected from the CQL statements that generate schema updates.
When Raft is enabled (in experimental mode), all schema management (e.g. CREATE TABLE, ALTER TABLE) will be done via Raft. This prevents incompatible schema changes from completing successfully on different nodes.

For more, see “Making Schema Changes Safe with Raft” by Konstantin Osipov at Scylla Summit 2022.
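
The collision problem described above can be illustrated with a toy model. The sketch below is not Scylla's Raft implementation. The function `apply_changes` and the "first ADD of a column wins" rule are assumptions made purely for illustration. It shows that two nodes applying the same concurrent changes in different orders can end up with different schemas, while nodes applying a single agreed log cannot.

```python
# Toy model only; all names are hypothetical and this is not Scylla code.

def apply_changes(changes):
    """Apply ADD-column changes in order; a change that conflicts
    with an already existing column is rejected (first one wins)."""
    schema = {}
    for column, cql_type in changes:
        if column not in schema:
            schema[column] = cql_type
    return schema

add_int = ("x", "int")    # ALTER TABLE ... ADD x int
add_text = ("x", "text")  # ALTER TABLE ... ADD x text (issued concurrently)

# Unordered propagation: each node sees the two changes in a different order.
node_a = apply_changes([add_int, add_text])
node_b = apply_changes([add_text, add_int])
diverged = node_a != node_b

# One replicated log: every node applies the same sequence.
log = [add_int, add_text]
node_a_log = apply_changes(log)
node_b_log = apply_changes(log)
consistent = node_a_log == node_b_log

print(diverged, consistent)
```

In the log-based case the second change is rejected identically on every node, so the conflict is visible to the client instead of silently splitting the cluster.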

Automating away the gc_grace_seconds parameter (experimental)

There is now optional automatic management of tombstone garbage collection, replacing gc_grace_seconds. Tombstones that are older than the most recent repair will be purged, and newer ones will be kept. This drops tombstones more frequently if repairs are made in a timely manner, and prevents data resurrection if repairs are delayed beyond gc_grace_seconds. The feature is disabled by default and needs to be enabled via ALTER TABLE.

For more, see “Repair Based Tombstone GC” by Asias He at Scylla Summit 2022.
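
To make the difference between the two purge rules concrete, here is a minimal sketch. The function and parameter names are hypothetical, times are plain seconds, and the real logic in Scylla is more involved (for example, it works per token range). The sketch only encodes the two rules as stated above.

```python
# Sketch under stated assumptions; names are hypothetical, not Scylla APIs.

def purgeable_by_timeout(deletion_time, now, gc_grace_seconds):
    """Classic rule: purge once the tombstone is older than gc_grace_seconds."""
    return now - deletion_time > gc_grace_seconds

def purgeable_by_repair(deletion_time, last_repair_time):
    """Repair-based rule: purge only tombstones older than the latest repair."""
    return last_repair_time is not None and deletion_time < last_repair_time

GC_GRACE = 864000  # ten days, in seconds

# Timely repair: the tombstone was repaired shortly after it was written.
timely_timeout = purgeable_by_timeout(deletion_time=1000, now=3000,
                                      gc_grace_seconds=GC_GRACE)
timely_repair = purgeable_by_repair(deletion_time=1000, last_repair_time=2000)

# Delayed repair: gc_grace_seconds has elapsed, but no repair covered the tombstone.
late_timeout = purgeable_by_timeout(deletion_time=1000, now=1000 + GC_GRACE + 1,
                                    gc_grace_seconds=GC_GRACE)
late_repair = purgeable_by_repair(deletion_time=1000, last_repair_time=500)

print(timely_timeout, timely_repair, late_timeout, late_repair)
```

In the timely case only the repair-based rule allows the purge, so tombstones go away sooner. In the delayed case only the timeout rule allows it, which is the situation that can lead to data resurrection.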

Virtual Table for Configuration

A new virtual table, system.config, allows querying and updating configuration over CQL.

Note that only a subset of the configuration parameters can be updated. These updates are not persistent: after a restart, the values revert to those set in scylla.yaml.

Virtual Tables for nodetool command information

Several virtual tables have been added for providing information usually obtained via nodetool:

system.snapshots - replacement for nodetool listsnapshots;

system.protocol_servers - replacement for nodetool statusbinary as well as Thrift active and Native Transport active from nodetool info;

system.runtime_info - replacement for nodetool info, not an exact match: some fields were removed, some were refactored to make sense for Scylla;

system.versions - replacement for nodetool version, prints all versions, including build-id;

Unlike nodetool, which is blocked for remote access by default, virtual tables allow remote access over CQL, including for Scylla Cloud users.

For all available virtual tables see here

Deployment

Debian 11 is now supported
Ubuntu 16.04 and Debian 9, both announced as deprecated in Scylla 4.6, are no longer supported in Scylla 5.0.
The AWS im4gn and is4gen instance families are now pre-tuned and supported out of the box.
Due to a bug in CentOS 8 mdadm, we now pin mdadm's version to a known-good release. #9540

Improvements

CQL

One can now omit irrelevant clustering key columns from ORDER BY clauses.

Alternator

Alternator, Scylla's implementation of the DynamoDB API, now supports DELETE operations that remove an element from a set. #5864
Alternator, Scylla's implementation of the DynamoDB API, has initial support for time-to-live (TTL) expiration. #9624
Alternator, Scylla's implementation of the DynamoDB API, now has fault-tolerant TTL expiration. It will delete expired items even when a node is down. #9787

Reverse Queries

A reverse query is a SELECT query that uses the reverse of the clustering order defined in the table schema. If no order was defined, the default order is ascending (ASC).

Reverse queries were improved in 4.6, and are further improved in Scylla 5.0 as follows:

Scylla honors the page size requested by the client driver, but can also return short pages to limit its memory consumption. The older implementation of reversed queries could not return short pages. With native reversed queries now enabled, Scylla will also return short pages for reversed queries.
The row cache can now serve reversed queries (with query clustering order opposite from the schema definition). Previously, reversed queries automatically bypassed the row cache.
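
The short-page behavior mentioned above can be sketched as follows. This is a toy model, not Scylla's paging code. The function `read_page`, its `max_page_bytes` limit, and the choice to always return at least one row are assumptions made for illustration.

```python
# Toy model of short pages; names and limits are hypothetical.

def read_page(rows, page_size, max_page_bytes):
    """Return up to page_size rows, stopping early once the page
    would exceed max_page_bytes (a "short page")."""
    page = []
    used = 0
    for row in rows:
        if len(page) >= page_size:
            break
        # Always return at least one row so the client can make progress.
        if page and used + len(row) > max_page_bytes:
            break
        page.append(row)
        used += len(row)
    return page

rows = [b"a" * 400] * 10

full_page = read_page(rows, page_size=5, max_page_bytes=10_000)
short_page = read_page(rows, page_size=5, max_page_bytes=1_000)

print(len(full_page), len(short_page))
```

The client asked for five rows in both calls, but the second call returns fewer because the memory limit is reached first. The client then simply requests the next page.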

Security

There is now support for a certificate revocation list for TLS encrypted connections. This makes it possible to deny a client with a compromised certificate. #9630
The default Prometheus listener address is now localhost. Note that you may need to update this configuration item to expose the Prometheus port to your collector. #8757
The recently changed Prometheus listen address configuration has been refined. If not configured, Scylla will now bind to the same host as the internal RPC address. This will reduce misconfigurations during upgrades, as the configuration will typically not require any changes. See #8757 above. #9701

Performance and space improvements

Seastar has disabled Nagle's algorithm for the http server, preventing 40ms latency spikes. #9619
XFS filesystems created by scylla_setup now have online discard enabled. This can improve SSD performance. #9608
Scylla precalculates replication maps (these contain information about which replicas each token maps to). We now share replication maps among keyspaces with similar configuration, to save memory and time.
Scylla will now compact memtables when flushing them into sstables. This results in smaller sstables in delete-heavy workloads. Memtable compaction is automatically disabled when there are no relevant tombstones in the memtable. #7983
Scylla can now fast-forward when reading a partition backwards. Fast-forwarding is used to skip over unneeded data when several subranges of clustering keys are wanted, for example in the query "SELECT * FROM tab WHERE pk = ? AND ck1 IN (?, ?, ?) ORDER BY ck1 DESC, ck2 DESC". #9427
The internal cache used for (among other things) prepared statements now has pollution resistance. This means that a workload that uses unprepared statements will not interfere with a workload that properly prepares and reuses its statements. #8674 #9590
Leveled compaction strategy (LCS) will now be less aggressive in promoting sstables to higher levels, reducing write amplification.
Data in memtables is now considered when purging tombstones. While it's very unlikely to have data in memtable that is older than a tombstone in an sstable, it's still better to protect against it. #1745
When using Time Window Compaction Strategy (TWCS), Scylla will now compact tombstones across buckets, so they can remove the deleted data. Note that using DELETE with TWCS is still not recommended. #9662
Time window compaction strategy (TWCS) major compactions will now serialize behind other compactions, in order to include all sstables in the compaction input set. #9553
Repair will now prefer closer nodes for pulling in missing data, when there is a choice. This reduces cross-datacenter traffic. PR#9769
A new I/O scheduler was integrated via a Seastar update. The new scheduler is better at restricting disk I/O in order to keep latency low.
Alternator, Scylla's implementation of the DynamoDB API, will now avoid large allocations while streaming. A similar change was made to the BatchGetItems API. This reduces latency spikes. #8522
Scylla performs a reshape compaction to bring sstables into the structure required by the compaction strategy, for example after a repair. It will now require less free space while doing so.
Reshape of Time Window Compaction Strategy (TWCS) now tries to compact sstables of similar size, like Size Tiered Compaction Strategy. This reduces reshape time, for example if one accidentally creates tiny time windows and then has to increase them dramatically to avoid an explosion in sstable counts.
Generally, if a node finds it needs to reshape sstables while starting up, it will remain offline while doing so, in order to reduce read amplification. However, for the case of repair sstables, remaining offline can be avoided and the node can start up immediately. #9895
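
The cache pollution resistance mentioned in the list above (for the cache that holds prepared statements) can be illustrated with a segmented cache. Whether this matches Scylla's internal design is an assumption. The class `TwoSegmentCache` and its sizes are hypothetical, and the sketch only demonstrates the property itself: entries that are reused survive a flood of entries that are used once.

```python
from collections import OrderedDict

# Toy segmented LRU cache; not Scylla code, all names are hypothetical.

class TwoSegmentCache:
    """New entries land in a small probation segment. A hit promotes
    the entry to a protected segment that one-off entries cannot evict."""

    def __init__(self, probation_size, protected_size):
        self.probation = OrderedDict()
        self.protected = OrderedDict()
        self.probation_size = probation_size
        self.protected_size = protected_size

    def _put_probation(self, key, value):
        self.probation[key] = value
        self.probation.move_to_end(key)
        while len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # evict least recently used

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        else:
            self._put_probation(key, value)

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            value = self.probation.pop(key)
            self.protected[key] = value  # promote on reuse
            if len(self.protected) > self.protected_size:
                old_key, old_value = self.protected.popitem(last=False)
                self._put_probation(old_key, old_value)  # demote the oldest
            return value
        return None

cache = TwoSegmentCache(probation_size=2, protected_size=2)
cache.put("prepared-select", "stmt")
cache.get("prepared-select")          # reuse promotes the entry
for i in range(100):                  # flood of one-off statements
    cache.put(f"adhoc-{i}", "stmt")
survived = cache.get("prepared-select") is not None
print(survived)
```

With a single plain LRU list of the same total size, the hundred one-off entries would have evicted the reused statement.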

Tooling and API

The main Scylla executable can now run subtools by supplying a subcommand: scylla sstable and scylla types, to inspect sstables and schema types. Additional commands will be added over time. #7801
The scylla tool sub-subcommands have changed from switch form ('scylla sstable --validate') to subcommand form ('scylla sstable validate').
When the user requests to stop compactions, Scylla will now only stop regular compactions, not user-requested compactions like CLEANUP.
It is now possible to stop compaction on a particular set of tables. #9700
A replace operation in repair-based-node-operations mode will now ignore dead nodes, allowing the replacement of multiple dead nodes.
There is now a configuration flag to disable the new reversed reads code, in case an unexpected bug is found after release. PR#9908

Bug fixes and stability

A bug where incorrect results were returned from queries that use an index was fixed. The bug was triggered when large page sizes were used, and the primary key columns were also large, so that the page size multiplied by the key size exceeded a megabyte. #9198
Scylla implements upgrades by having nodes negotiate "features" and only enabling those features when all nodes support them. The negotiated features are now persistent to disallow some illegal downgrades.
An exception safety problem leading to a crash when inserting data to memtables was fixed. #9728
A crash in Scylla memory management code, triggered by index file caching, was fixed. The bug was caused by an allocation from within the memory allocator causing cache eviction in order to free memory. Freeing the evicted items re-enters the memory allocator, in a way that was not expected by the code. Fixes #9821 #9192 #9825 #9544 #9508 #9573
The INSERT JSON statement now correctly rejects empty string partition keys. #9853
Change Data Capture (CDC) preimage now works correctly for COMPACT STORAGE tables. #9876
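
The feature negotiation described in the list above can be sketched in a few lines. The function names, node names, and feature names below are hypothetical, and the startup check is only one plausible reading of "persistent to disallow some illegal downgrades", not Scylla's actual code.

```python
# Sketch under stated assumptions; feature and node names are made up.

def cluster_features(supported_by_node):
    """A feature is enabled only when every node supports it."""
    node_sets = list(supported_by_node.values())
    return set.intersection(*node_sets) if node_sets else set()

def can_start(node_supported, persisted_enabled):
    """Refuse to start a binary that lacks a feature already enabled."""
    return persisted_enabled <= node_supported

# Mid-upgrade: n3 still runs the old version.
nodes = {"n1": {"A", "B"}, "n2": {"A", "B"}, "n3": {"A"}}
during_upgrade = cluster_features(nodes)

# After n3 is upgraded, feature B becomes enabled and is persisted.
nodes["n3"] = {"A", "B"}
after_upgrade = cluster_features(nodes)

# Downgrading a node to a version without B is now rejected.
downgrade_allowed = can_start({"A"}, after_upgrade)

print(sorted(during_upgrade), sorted(after_upgrade), downgrade_allowed)
```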

Refactoring

Scylla used to have different internal representations of SELECT-clause expressions ("selectors") and WHERE-clause expressions ("terms"). They are now unified into a single expression class, paving the way to a more regular and richer CQL grammar.
The source base has been migrated to Software Package Data Exchange (SPDX) license identifiers to reduce license information boilerplate. More than 27,000 lines have been removed. PR#9937