[PATCH 0/2] Change commitlog maximum size


Glauber Costa

<glauber@scylladb.com>
Nov 13, 2015, 2:49:12 PM
to scylladb-dev@googlegroups.com, Glauber Costa
Recently, while benchmarking on EC2, I saw a lot of flush activity that could
be traced back to the fact that the commitlog was triggering memtable flushes a
lot more often than the disk could handle.

While I eventually came to realize that it was only happening because my
instance had a crappy disk, horrible disks are a part of life. Even with good
disks, there is no reason to generate all that activity.

This patchset keeps that behavior configurable, through the same option that C*
uses. However, it introduces the value "-1", meaning "all memory".

We won't use "0", because that already means "disable entirely", and we still
want a way to cap the size somewhere, so that the commitlog does not grow without
bound in workloads that keep writing to the same set of keys repeatedly.
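
The three cases described above (0, negative, positive) can be sketched as a
small helper. This is only an illustration: `commitlog_cap` and
`resolve_commitlog_cap` are hypothetical names, not part of the patchset, and
the byte-to-megabyte shift assumes the memory figure is reported in bytes.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not part of the patchset.
struct commitlog_cap {
    bool enabled;           // false when the option is 0 ("disable entirely")
    std::uint64_t limit_mb; // effective cap in megabytes, meaningful only if enabled
};

// configured_mb:      value read from commitlog_total_space_in_mb
// total_memory_bytes: memory available to the process, ASSUMED to be in bytes
inline commitlog_cap resolve_commitlog_cap(std::int64_t configured_mb,
                                           std::uint64_t total_memory_bytes) {
    if (configured_mb == 0) {
        return {false, 0};                       // 0 keeps its existing meaning
    }
    if (configured_mb < 0) {
        return {true, total_memory_bytes >> 20}; // negative: "all memory", bytes -> MB
    }
    return {true, static_cast<std::uint64_t>(configured_mb)}; // positive: explicit cap
}
```

With 16 GiB of memory, `resolve_commitlog_cap(-1, 1ULL << 34)` would give a cap
of 16384 MB, while a C*-style value such as 8192 passes through unchanged.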

Glauber Costa (2):
config: change type for commitlog maximum size config option
change defaults for commitlog maximum size

db/config.hh | 2 +-
db/commitlog/commitlog.cc | 3 ++-
conf/scylla.yaml | 9 +++++----
3 files changed, 8 insertions(+), 6 deletions(-)

--
2.4.3

Glauber Costa

<glauber@scylladb.com>
Nov 13, 2015, 2:49:13 PM
to scylladb-dev@googlegroups.com, Glauber Costa
This patch replaces uint32_t with int64_t as the type for
commitlog_total_space_in_mb. Moving to 64 bits is not strictly needed, since even a
signed 32-bit type would allow us to easily handle 2TB. But since we store that
in the commitlog as a 64-bit value, let's match it.

Moving from unsigned to signed, however, allows us to represent negative
numbers. With that in place, we can change the semantics of the value
slightly, so as to allow a negative number to mean "all memory".
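
As a rough illustration of the signedness point: an unsigned 32-bit field has
no way to hold -1, and a plain conversion wraps it to the largest value
instead. The helper names below are made up for this sketch. The second
function shows that a signed 32-bit count of megabytes tops out at about 2047
TiB, comfortably above the 2TB mentioned.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration only.

// What an unsigned 32-bit option would end up holding if handed a signed value:
// the conversion is modular, so -1 becomes the maximum uint32_t.
constexpr std::uint32_t as_unsigned32(std::int64_t v) {
    return static_cast<std::uint32_t>(v);
}

// Largest size, in whole TiB, that a signed 32-bit count of megabytes can express.
constexpr std::int64_t max_int32_mb_in_tib() {
    return static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::max())
           / (1024 * 1024);
}
```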

The reason behind this is that the default value, "8GB", is an artifact of the
JVM. We don't need that, and in many-shard configurations each shard flushes
the commitlog way too often, since 8GB / many_shards = small_number.
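
To put a number on "small_number", here is a minimal sketch that assumes the
total is split evenly across shards, as the division above suggests.
`per_shard_mb` is a hypothetical name and the shard count is an example value.

```cpp
#include <cstdint>

// Hypothetical helper: per-shard commitlog budget, ASSUMING an even split.
// Returns 0 for a shard count of 0 rather than dividing by zero.
constexpr std::uint64_t per_shard_mb(std::uint64_t total_mb, unsigned shards) {
    return shards == 0 ? 0 : total_mb / shards;
}
```

Under that assumption, `per_shard_mb(8192, 32)` leaves each of 32 shards with
256 MB of commitlog before it has to start flushing.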

8GB also happens to be a popular heap size for C* in the JVM. For us, we would
like to equate that (at least) with the amount of memory. The problem is how to
do that without introducing new options or changing the semantics of existing
options too radically.

The proposed solution will allow us to still parse C* yaml files, since those
will always have positive numbers, while introducing our own defaults.

Signed-off-by: Glauber Costa <glo...@scylladb.com>
---
db/config.hh | 2 +-
db/commitlog/commitlog.cc | 3 ++-
conf/scylla.yaml | 3 +++
3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/db/config.hh b/db/config.hh
index a30343c..c153dec 100644
--- a/db/config.hh
+++ b/db/config.hh
@@ -316,7 +316,7 @@ class config {
val(commitlog_sync_batch_window_in_ms, uint32_t, 10000, Used, \
"Controls how long the system waits for other writes before performing a sync in \"batch\" mode." \
) \
- val(commitlog_total_space_in_mb, uint32_t, 8192, Used, \
+ val(commitlog_total_space_in_mb, int64_t, 8192, Used, \
"Total space used for commitlogs. If the used space goes above this value, Cassandra rounds up to the next nearest segment multiple and flushes memtables to disk for the oldest commitlog segments, removing those log segments. This reduces the amount of data to replay on startup, and prevents infrequently-updated tables from indefinitely keeping commitlog segments. A small total commitlog space tends to cause more flush activity on less-active tables.\n" \
"Related information: Configuring memtable throughput" \
) \
diff --git a/db/commitlog/commitlog.cc b/db/commitlog/commitlog.cc
index a0fea10..c9a316e 100644
--- a/db/commitlog/commitlog.cc
+++ b/db/commitlog/commitlog.cc
@@ -55,6 +55,7 @@
#include <core/rwlock.hh>
#include <core/gate.hh>
#include <core/fstream.hh>
+#include <seastar/core/memory.hh>
#include <net/byteorder.hh>

#include "commitlog.hh"
@@ -89,7 +90,7 @@ class crc32_nbo {

db::commitlog::config::config(const db::config& cfg)
: commit_log_location(cfg.commitlog_directory())
- , commitlog_total_space_in_mb(cfg.commitlog_total_space_in_mb())
+ , commitlog_total_space_in_mb(cfg.commitlog_total_space_in_mb() >= 0 ? cfg.commitlog_total_space_in_mb() : memory::stats().total_memory())
, commitlog_segment_size_in_mb(cfg.commitlog_segment_size_in_mb())
, commitlog_sync_period_in_ms(cfg.commitlog_sync_batch_window_in_ms())
, mode(cfg.commitlog_sync() == "batch" ? sync_mode::BATCH : sync_mode::PERIODIC)
diff --git a/conf/scylla.yaml b/conf/scylla.yaml
index 87672df..cef21a4 100644
--- a/conf/scylla.yaml
+++ b/conf/scylla.yaml
@@ -417,6 +417,9 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# segment multiple), Scylla will flush every dirty CF in the oldest
# segment and remove it. So a small total commitlog space will tend
# to cause more flush activity on less-active columnfamilies.
+#
+# A value of -1 will automatically equate it to the total amount of memory
+# available for Scylla.
commitlog_total_space_in_mb: 8192

# This sets the amount of memtable flush writer threads. These will
--
2.4.3

Glauber Costa

<glauber@scylladb.com>
Nov 13, 2015, 2:49:14 PM
to scylladb-dev@googlegroups.com, Glauber Costa
arbitrary 8G -> all_memory.

Signed-off-by: Glauber Costa <glo...@scylladb.com>
---
db/config.hh | 2 +-
conf/scylla.yaml | 8 +++-----
2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/db/config.hh b/db/config.hh
index c153dec..6b04798 100644
--- a/db/config.hh
+++ b/db/config.hh
@@ -316,7 +316,7 @@ class config {
val(commitlog_sync_batch_window_in_ms, uint32_t, 10000, Used, \
"Controls how long the system waits for other writes before performing a sync in \"batch\" mode." \
) \
- val(commitlog_total_space_in_mb, int64_t, 8192, Used, \
+ val(commitlog_total_space_in_mb, int64_t, -1, Used, \
"Total space used for commitlogs. If the used space goes above this value, Cassandra rounds up to the next nearest segment multiple and flushes memtables to disk for the oldest commitlog segments, removing those log segments. This reduces the amount of data to replay on startup, and prevents infrequently-updated tables from indefinitely keeping commitlog segments. A small total commitlog space tends to cause more flush activity on less-active tables.\n" \
"Related information: Configuring memtable throughput" \
) \
diff --git a/conf/scylla.yaml b/conf/scylla.yaml
index cef21a4..d5be11a 100644
--- a/conf/scylla.yaml
+++ b/conf/scylla.yaml
@@ -409,18 +409,16 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# offheap_objects: native memory, eliminating nio buffer heap overhead
# memtable_allocation_type: heap_buffers

-# Total space to use for commitlogs. Since commitlog segments are
-# mmapped, and hence use up address space, the default size is 32
-# on 32-bit JVMs, and 8192 on 64-bit JVMs.
+# Total space to use for commitlogs.
#
# If space gets above this value (it will round up to the next nearest
# segment multiple), Scylla will flush every dirty CF in the oldest
# segment and remove it. So a small total commitlog space will tend
# to cause more flush activity on less-active columnfamilies.
#
-# A value of -1 will automatically equate it to the total amount of memory
+# A value of -1 (default) will automatically equate it to the total amount of memory
# available for Scylla.
-commitlog_total_space_in_mb: 8192
+commitlog_total_space_in_mb: -1

# This sets the amount of memtable flush writer threads. These will
# be blocked by disk io, and each one will hold a memtable in memory
--
2.4.3

Commit Bot

<bot@cloudius-systems.com>
Nov 15, 2015, 3:29:29 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>

config: change type for commitlog maximum size config option

This patch replaces uint32_t with int64_t as the type for
commitlog_total_space_in_mb. Moving to 64 bits is not strictly needed, since even a
signed 32-bit type would allow us to easily handle 2TB. But since we store that
in the commitlog as a 64-bit value, let's match it.

Moving from unsigned to signed, however, allows us to represent negative
numbers. With that in place, we can change the semantics of the value
slightly, so as to allow a negative number to mean "all memory".

The reason behind this is that the default value, "8GB", is an artifact of the
JVM. We don't need that, and in many-shard configurations each shard flushes
the commitlog way too often, since 8GB / many_shards = small_number.

8GB also happens to be a popular heap size for C* in the JVM. For us, we would
like to equate that (at least) with the amount of memory. The problem is how to
do that without introducing new options or changing the semantics of existing
options too radically.

The proposed solution will allow us to still parse C* yaml files, since those
will always have positive numbers, while introducing our own defaults.

Signed-off-by: Glauber Costa <glo...@scylladb.com>

---
diff --git a/conf/scylla.yaml b/conf/scylla.yaml
--- a/conf/scylla.yaml
+++ b/conf/scylla.yaml
@@ -417,6 +417,9 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# segment multiple), Scylla will flush every dirty CF in the oldest
# segment and remove it. So a small total commitlog space will tend
# to cause more flush activity on less-active columnfamilies.
+#
+# A value of -1 will automatically equate it to the total amount of memory
+# available for Scylla.
commitlog_total_space_in_mb: 8192

# This sets the amount of memtable flush writer threads. These will
diff --git a/db/commitlog/commitlog.cc b/db/commitlog/commitlog.cc
--- a/db/commitlog/commitlog.cc
+++ b/db/commitlog/commitlog.cc
@@ -55,6 +55,7 @@
#include <core/rwlock.hh>
#include <core/gate.hh>
#include <core/fstream.hh>
+#include <seastar/core/memory.hh>
#include <net/byteorder.hh>

#include "commitlog.hh"
@@ -89,7 +90,7 @@ class crc32_nbo {

db::commitlog::config::config(const db::config& cfg)
: commit_log_location(cfg.commitlog_directory())
- , commitlog_total_space_in_mb(cfg.commitlog_total_space_in_mb())
+ , commitlog_total_space_in_mb(cfg.commitlog_total_space_in_mb() >= 0 ? cfg.commitlog_total_space_in_mb() : memory::stats().total_memory())
, commitlog_segment_size_in_mb(cfg.commitlog_segment_size_in_mb())
, commitlog_sync_period_in_ms(cfg.commitlog_sync_batch_window_in_ms())
, mode(cfg.commitlog_sync() == "batch" ? sync_mode::BATCH : sync_mode::PERIODIC)
diff --git a/db/config.hh b/db/config.hh
--- a/db/config.hh
+++ b/db/config.hh
@@ -316,7 +316,7 @@ public:
val(commitlog_sync_batch_window_in_ms, uint32_t, 10000, Used, \
"Controls how long the system waits for other writes before performing a sync in \"batch\" mode." \
) \
- val(commitlog_total_space_in_mb, uint32_t, 8192, Used, \
+ val(commitlog_total_space_in_mb, int64_t, 8192, Used, \

Commit Bot

<bot@cloudius-systems.com>
Nov 15, 2015, 3:29:30 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Avi Kivity <a...@scylladb.com>

change defaults for commitlog maximum size

arbitrary 8G -> all_memory.

Signed-off-by: Glauber Costa <glo...@scylladb.com>

---
diff --git a/conf/scylla.yaml b/conf/scylla.yaml
--- a/conf/scylla.yaml
+++ b/conf/scylla.yaml
@@ -409,18 +409,16 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# offheap_objects: native memory, eliminating nio buffer heap overhead
# memtable_allocation_type: heap_buffers

-# Total space to use for commitlogs. Since commitlog segments are
-# mmapped, and hence use up address space, the default size is 32
-# on 32-bit JVMs, and 8192 on 64-bit JVMs.
+# Total space to use for commitlogs.
#
# If space gets above this value (it will round up to the next nearest
# segment multiple), Scylla will flush every dirty CF in the oldest
# segment and remove it. So a small total commitlog space will tend
# to cause more flush activity on less-active columnfamilies.
#
-# A value of -1 will automatically equate it to the total amount of memory
+# A value of -1 (default) will automatically equate it to the total amount of memory
# available for Scylla.
-commitlog_total_space_in_mb: 8192
+commitlog_total_space_in_mb: -1

# This sets the amount of memtable flush writer threads. These will
# be blocked by disk io, and each one will hold a memtable in memory
diff --git a/db/config.hh b/db/config.hh
--- a/db/config.hh
+++ b/db/config.hh
@@ -316,7 +316,7 @@ public:
val(commitlog_sync_batch_window_in_ms, uint32_t, 10000, Used, \
"Controls how long the system waits for other writes before performing a sync in \"batch\" mode." \
) \
- val(commitlog_total_space_in_mb, int64_t, 8192, Used, \
+ val(commitlog_total_space_in_mb, int64_t, -1, Used, \

Avi Kivity

<avi@scylladb.com>
Nov 15, 2015, 3:30:41 AM
to Glauber Costa, scylladb-dev@googlegroups.com, Glauber Costa
Note that for cassandra-stress default settings, we will always flush a
memtable before it fills up if you have enough memory, because it is an
overwriting workload.
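
A toy model may help show why an overwriting workload behaves this way. It
assumes that the commitlog appends every write while the memtable keeps only
one live value per key; the names and the fixed value size are made up for the
sketch, and real memtable accounting is more involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical toy model; not Scylla code.
struct write_footprint {
    std::uint64_t commitlog_bytes; // grows with every write
    std::uint64_t memtable_bytes;  // grows only when a new key appears
};

// Writes `total_writes` values of `value_size` bytes, cycling over
// `distinct_keys` keys, and reports how much each structure would hold.
inline write_footprint simulate_overwrites(std::size_t distinct_keys,
                                           std::size_t total_writes,
                                           std::size_t value_size) {
    write_footprint fp = {0, 0};
    if (distinct_keys == 0) {
        return fp; // nothing to write to
    }
    std::unordered_set<std::size_t> live_keys;
    for (std::size_t i = 0; i < total_writes; ++i) {
        const std::size_t key = i % distinct_keys;
        fp.commitlog_bytes += value_size;      // every write is appended to the log
        if (live_keys.insert(key).second) {
            fp.memtable_bytes += value_size;   // first write to this key
        }                                      // later writes replace the value in place
    }
    return fp;
}
```

In this model, 100000 writes of 100 bytes over 1000 keys put 10000000 bytes in
the commitlog but only 100000 bytes in the memtable, so the commitlog limit is
reached long before the memtable fills.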