Cc: "Raphael S. Carvalho" <raph...@scylladb.com>
Ref: https://github.com/scylladb/scylladb/issues/18191
Signed-off-by: Laszlo Ersek <laszlo...@scylladb.com>
---
Notes:
v2:
- rather than unnesting the future chain, rewrite it as a coroutine
[Avi]
sstables/sstables.cc | 26 +++++++++-----------
1 file changed, 12 insertions(+), 14 deletions(-)
diff --git a/sstables/sstables.cc b/sstables/sstables.cc
index 2fbe81946845..d265586af7b4 100644
--- a/sstables/sstables.cc
+++ b/sstables/sstables.cc
@@ -1592,20 +1592,18 @@ future<> sstable::load(const dht::sharder& sharder, sstable_open_config cfg) noe
future<> sstable::load(sstables::foreign_sstable_open_info info) noexcept {
static_assert(std::is_nothrow_move_constructible_v<sstables::foreign_sstable_open_info>);
- return read_toc().then([this, info = std::move(info)] () mutable {
- _components = std::move(info.components);
- _data_file = make_checked_file(_read_error_handler, info.data.to_file());
- _index_file = make_checked_file(_read_error_handler, info.index.to_file());
- _shards = std::move(info.owners);
- _metadata_size_on_disk = info.metadata_size_on_disk;
- validate_min_max_metadata();
- validate_max_local_deletion_time();
- validate_partitioner();
- return update_info_for_opened_data().then([this]() {
- _total_reclaimable_memory.reset();
- _manager.increment_total_reclaimable_memory_and_maybe_reclaim(this);
- });
- });
+ co_await read_toc();
+ _components = std::move(info.components);
+ _data_file = make_checked_file(_read_error_handler, info.data.to_file());
+ _index_file = make_checked_file(_read_error_handler, info.index.to_file());
+ _shards = std::move(info.owners);
+ _metadata_size_on_disk = info.metadata_size_on_disk;
+ validate_min_max_metadata();
+ validate_max_local_deletion_time();
+ validate_partitioner();
+ co_await update_info_for_opened_data();
+ _total_reclaimable_memory.reset();
+ _manager.increment_total_reclaimable_memory_and_maybe_reclaim(this);
}
future<foreign_sstable_open_info> sstable::get_open_info() & {
New feature-let, shouldn't be backported.
See the v3 updates in the Notes section of each patch.
Example:
- run scylla with
build/Dev/scylla --overprovisioned --developer-mode=yes --memory=2G --smp=3
- after scylla settles:
find $SCYLLA_HOME/data/ -type f -printf '%s\n' \
| (TOTAL=0; while read SIZE; do TOTAL=$((TOTAL+SIZE)); done; echo $TOTAL)
905720
- output from <http://localhost:9180/metrics>:
# HELP scylla_storage_bytes_occupied Bytes occupied by sstables
# TYPE scylla_storage_bytes_occupied gauge
scylla_storage_bytes_occupied{shard="0",storage_class="filesystem"} 520570.000000
scylla_storage_bytes_occupied{shard="0",storage_class="s3"} 0.000000
scylla_storage_bytes_occupied{shard="1",storage_class="filesystem"} 167833.000000
scylla_storage_bytes_occupied{shard="1",storage_class="s3"} 0.000000
scylla_storage_bytes_occupied{shard="2",storage_class="filesystem"} 217317.000000
scylla_storage_bytes_occupied{shard="2",storage_class="s3"} 0.000000
- 905720 = 520570 + 167833 + 217317
Cc: "Raphael S. Carvalho" <raph...@scylladb.com>
Thanks
Laszlo
Laszlo Ersek (3):
sstables/storage: add funcs to track used space per storage type per
shard
sstables: coroutinize sstable::load()
sstables: track used space per storage type per shard
main.cc | 1 +
sstables/sstables.cc | 52 ++++++++++++++------
sstables/sstables.hh | 2 +
sstables/storage.cc | 25 +++++++++-
sstables/storage.hh | 11 +++++
5 files changed, 75 insertions(+), 16 deletions(-)
base-commit: 61d19e4464c2d36dcebb7c50aec95fa863bfb207
- discovering an existent sstable (potentially deleting it in the end):
sstable ctor -> load -> close_files -> [unlink]
- creating a new sstable (potentially deleting it in the end):
sstable ctor -> open_sstable -> seal_sstable -> close_files -> [unlink]
On the sstable "read path", credit the storage in load() and debit it in
unlink(). On the "write path", credit the storage in seal_sstable() and
debit it in unlink().
Define the current storage usage of an sstable on this shard as:
- zero, if the sstable is shared, or unshared but foreign,
- as the sum of its recognized components' sizes, otherwise.
In the latter case, "_recognized_components" is populated
- on the read path
- in the 1st load variant via: load -> load_metadata -> read_toc,
- in the 2nd load variant via: load -> read_toc,
- on the write path via: open_sstable -> generate_toc.
I can't use the existent bytes_on_disk() function for this purpose.
Namely, since at least commit 2dbae856f84e ("sstable: Piggyback on sstable
parser and writer to provide bytes_on_disk", 2023-04-27), bytes_on_disk()
insists on "_data_file_size" and "_index_file_size" having been set (to
nonzero values). However, at least "_data_file_size" may not be set when
we call bytes_on_disk() from seal_sstable(), for example.
Instead, stat each recognized component afresh whenever the sst's storage
footprint is requested for updating the metric.
The loop in the new bytes_on_disk_uncached() function is similar to the
loop removed in commit 2dbae856f84e. One difference is that the new one is
not specific to "filesystem_storage", i.e., the reactor's file_stat()
function. Instead, open each component using the sstable's appropriate
storage backend, and then call the storage-specific file::stat() member
for fetching the size. For S3, this appears to be satisfied by
s3::client::readable_file::stat().
Cc: "Raphael S. Carvalho" <raph...@scylladb.com>
Ref: https://github.com/scylladb/scylladb/issues/18191
Signed-off-by: Laszlo Ersek <laszlo...@scylladb.com>
---
Notes:
v3:
- track used space per storage type per shard [Avi]
- rename bytes_on_disk_uncached() to bytes_on_storage_uncached()
- make bytes_on_storage_uncached() shard-aware: return zero bytes for
shared and foreign sstables
- refresh commit message
v2:
- replace "co_await ... finally" with "co_await with_closeable" [Avi]
sstables/sstables.hh | 2 ++
sstables/sstables.cc | 26 ++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/sstables/sstables.hh b/sstables/sstables.hh
index 0b18acd2b637..fb8824eabef9 100644
--- a/sstables/sstables.hh
+++ b/sstables/sstables.hh
@@ -710,6 +710,8 @@ class sstable : public enable_lw_shared_from_this<sstable> {
// Reload components from which memory was previously reclaimed
future<> reload_reclaimed_components();
+ future<uint64_t> bytes_on_storage_uncached() const noexcept;
+
public:
// Finds first position_in_partition in a given partition.
// If reversed is false, then the first position is actually the first row (can be the static one).
diff --git a/sstables/sstables.cc b/sstables/sstables.cc
index d265586af7b4..f404d66b3f6b 100644
--- a/sstables/sstables.cc
+++ b/sstables/sstables.cc
@@ -1559,6 +1559,27 @@ future<> sstable::reload_reclaimed_components() {
sstlog.info("Reloaded bloom filter of {}", get_filename());
}
+future<uint64_t> sstable::bytes_on_storage_uncached() const noexcept {
+ uint64_t bytes = 0;
+
+ if (_shards.size() != 1 || _shards[0] != this_shard_id()) {
+ co_return bytes;
+ }
+
+ co_await coroutine::parallel_for_each(_recognized_components, [this, &bytes] (const component_type& type) -> future<> {
+ try {
+ file f = co_await _storage->open_component(*this, type, open_flags::ro, file_open_options{}, false);
+ struct stat st = co_await with_closeable(std::move(f), [] (file& f) {
+ return f.stat();
+ });
+ bytes += st.st_size;
+ } catch (...) {
+ sstlog.warn("failed to get size of {} for storage usage metric: {}", filename(type), std::current_exception());
+ }
+ });
+ co_return bytes;
+}
+
future<> sstable::load_metadata(sstable_open_config cfg, bool validate) noexcept {
co_await read_toc();
// read scylla-meta after toc. Might need it to parse
@@ -1588,6 +1609,7 @@ future<> sstable::load(const dht::sharder& sharder, sstable_open_config cfg) noe
std::vector<unsigned>{this_shard_id()} : compute_shards_for_this_sstable(sharder);
}
co_await open_data(cfg);
+ _storage->credit_bytes(co_await bytes_on_storage_uncached());
}
future<> sstable::load(sstables::foreign_sstable_open_info info) noexcept {
@@ -1602,6 +1624,7 @@ future<> sstable::load(sstables::foreign_sstable_open_info info) noexcept {
validate_max_local_deletion_time();
validate_partitioner();
co_await update_info_for_opened_data();
+ _storage->credit_bytes(co_await bytes_on_storage_uncached());
_total_reclaimable_memory.reset();
_manager.increment_total_reclaimable_memory_and_maybe_reclaim(this);
}
@@ -1902,6 +1925,7 @@ bool sstable::may_contain_rows(const query::clustering_row_ranges& ranges) const
future<> sstable::seal_sstable(bool backup)
{
co_await _storage->seal(*this);
+ _storage->credit_bytes(co_await bytes_on_storage_uncached());
if (_marked_for_deletion == mark_for_deletion::implicit) {
_marked_for_deletion = mark_for_deletion::none;
}
@@ -3053,6 +3077,7 @@ future<>
sstable::unlink(storage::sync_dir sync) noexcept {
_on_delete(*this);
+ uint64_t bytes = co_await bytes_on_storage_uncached();
auto remove_fut = _storage->wipe(*this, sync);
try {
@@ -3066,6 +3091,7 @@ sstable::unlink(storage::sync_dir sync) noexcept {
co_await std::move(remove_fut);
_stats.on_delete();
+ _storage->debit_bytes(bytes);
_manager.on_unlink(this);
}