[PATCH 0/5] adapt scylla_io_setup to recent I/O Scheduler changes

Glauber Costa

<glauber@scylladb.com>
Apr 28, 2018, 9:55:26 AM
to scylladb-dev@googlegroups.com, Glauber Costa
Available at:

g...@github.com:glommer/scylla.git scylla_io_setup

Many changes to the I/O Scheduler have recently landed in seastar. We
can now describe the storage of a machine by its visible properties,
such as IOPS and bandwidth, instead of relying on an indirect
calculation.

For the instances we support, we can just measure those properties and
start using them right away.

A version of iotune that computes those properties is not yet ready, but
while writing it I noticed that we aren't really setting the nomerges
and scheduler properties of the disks under test. We definitely should,
since they can influence the results. So this patchset also starts
doing that.

The command line for iotunev2 shouldn't change much. When it is ready we
will just adjust this script once more.

Glauber Costa (5):
scripts: move scylla-blocktune to scripts location
blocktune: allow it to be called as a library.
io_setup: call blocktune before tuning I/O
scylla_lib: drop support for m3 and c3 AWS instance types
scylla_io_setup: preconfigure i3 and i2 instances with new I/O
scheduler properties

.../common/scripts/scylla_blocktune.py | 24 +----------
dist/common/scripts/scylla-blocktune | 47 ++++++++++++++++++++++
dist/common/scripts/scylla_io_setup | 41 ++++++++++++-------
dist/common/scripts/scylla_lib.sh | 2 +-
dist/debian/scylla-server.install.in | 1 -
dist/redhat/scylla.spec.in | 3 +-
6 files changed, 78 insertions(+), 40 deletions(-)
rename scylla-blocktune => dist/common/scripts/scylla_blocktune.py (76%)
mode change 100755 => 100644
create mode 100755 dist/common/scripts/scylla-blocktune

--
2.14.3

Glauber Costa

<glauber@scylladb.com>
Apr 28, 2018, 9:55:28 AM
to scylladb-dev@googlegroups.com, Glauber Costa
This patch makes the functions in scylla-blocktune available as a
library for other scripts - namely scylla_io_setup.

The filename, scylla-blocktune, is not the most convenient thing to
import from Python, so instead of just wrapping it in the usual test for
__main__ I am splitting the file in two.

Another option would be to patch all callers to call
scylla_blocktune.py, but because we usually don't use extensions in
scripts that are meant to be called directly, I decided on the split.

Signed-off-by: Glauber Costa <gla...@scylladb.com>
---
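As an illustration of why the hyphenated filename is awkward to use from
Python (a sketch only, not part of the patch): the import statement needs
a module name that is a valid identifier, and scylla-blocktune is not one.
The two helper names below are made up for this example.

```python
def importable_by_statement(module_name):
    """True if every dotted part of the name is a valid Python identifier."""
    return all(part.isidentifier() for part in module_name.split("."))


def import_statement_compiles(module_name):
    """Compile (without executing) `import <module_name>` and report success."""
    try:
        compile("import {}".format(module_name), "<sketch>", "exec")
        return True
    except SyntaxError:
        # "scylla-blocktune" parses as "scylla" minus "blocktune",
        # which is not allowed in an import statement.
        return False


if __name__ == "__main__":
    for name in ("scylla-blocktune", "scylla_blocktune"):
        print(name, importable_by_statement(name), import_statement_compiles(name))
```

Keeping the executable under its hyphenated name and moving the functions
to scylla_blocktune.py avoids the problem without touching any caller.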
.../{scylla-blocktune => scylla_blocktune.py} | 24 +----
dist/common/scripts/scylla-blocktune | 111 ++++-----------------
dist/redhat/scylla.spec.in | 1 +
3 files changed, 24 insertions(+), 112 deletions(-)
copy dist/common/scripts/{scylla-blocktune => scylla_blocktune.py} (76%)
mode change 100755 => 100644

diff --git a/dist/common/scripts/scylla-blocktune b/dist/common/scripts/scylla_blocktune.py
old mode 100755
new mode 100644
similarity index 76%
copy from dist/common/scripts/scylla-blocktune
copy to dist/common/scripts/scylla_blocktune.py
index 4580b6d1e..ac278863f
--- a/dist/common/scripts/scylla-blocktune
+++ b/dist/common/scripts/scylla_blocktune.py
@@ -21,7 +21,7 @@
 # along with Scylla. If not, see <http://www.gnu.org/licenses/>.
 #

-import argparse, os, os.path
+import os, os.path

 # try to write data to a sysfs path, expect problems
 def try_write(path, data):
@@ -90,25 +90,3 @@ def tune_yaml(path, nomerges):
     for fs in y['data_file_directories']:
         tune_fs(fs, nomerges)
     tune_fs(y['commitlog_directory'], nomerges)
-
-ap = argparse.ArgumentParser('Tune filesystems for ScyllaDB')
-ap.add_argument('--set-nomerges', metavar='VAL', dest='nomerges',
-                help='Overwrite nomerges parameter')
-ap.add_argument('--filesystem', metavar='PATH', action='append', dest='fs', default=[],
-                help='Tune filesystem containing PATH')
-ap.add_argument('--dev', metavar='PATH', action='append', dest='dev', default=[],
-                help='Tune device node PATH')
-ap.add_argument('--config', metavar='YAML', action='append', dest='yaml', default=[],
-                help='Process given scylla.yaml')
-
-args = ap.parse_args()
-
-if not args.yaml and not args.fs and not args.dev:
-    tune_yaml('/etc/scylla/scylla.yaml', args.nomerges)
-else:
-    for yaml in args.yaml:
-        tune_yaml(yaml, args.nomerges)
-    for fs in args.fs:
-        tune_fs(fs, args.nomerges)
-    for dev in args.dev:
-        tune_dev(dev, args.nomerges)
diff --git a/dist/common/scripts/scylla-blocktune b/dist/common/scripts/scylla-blocktune
index 4580b6d1e..d01edeb20 100755
--- a/dist/common/scripts/scylla-blocktune
+++ b/dist/common/scripts/scylla-blocktune
@@ -1,7 +1,7 @@
 #!/usr/bin/python3

 #
-# Copyright (C) 2016 ScyllaDB
+# Copyright (C) 2018 ScyllaDB
 #

 #
@@ -19,96 +19,29 @@
 #
 # You should have received a copy of the GNU General Public License
 # along with Scylla. If not, see <http://www.gnu.org/licenses/>.
-#
-
-import argparse, os, os.path

-# try to write data to a sysfs path, expect problems
-def try_write(path, data):
-    try:
-        open(path, 'w').write(data)
-    except Exception:
-        print("warning: unable to tune {} to {}".format(path, data))
+import argparse
+from scylla_blocktune import *

-# update a sysfs path if it does not satisfy a check
-# function (default = check that the data is already there)
-def tune_path(path, data, check=None):
-    def default_check(current):
-        return current == data
-    if check is None:
-        check = default_check
-    if not os.path.exists(path):
-        return
-    if check(open(path).read().strip()):
-        print('already tuned: {}'.format(path))
-        return
-    print('tuning: {} {}'.format(path, data))
-    try_write(path, data + '\n')
+if __name__ == "__main__":
+    ap = argparse.ArgumentParser('Tune filesystems for ScyllaDB')
+    ap.add_argument('--set-nomerges', metavar='VAL', dest='nomerges',
+                    help='Overwrite nomerges parameter')
+    ap.add_argument('--filesystem', metavar='PATH', action='append', dest='fs', default=[],
+                    help='Tune filesystem containing PATH')
+    ap.add_argument('--dev', metavar='PATH', action='append', dest='dev', default=[],
+                    help='Tune device node PATH')
+    ap.add_argument('--config', metavar='YAML', action='append', dest='yaml', default=[],
+                    help='Process given scylla.yaml')

-tuned_blockdevs = set()
+    args = ap.parse_args()

-# tune a blockdevice (sysfs node); updates I/O scheduler
-# and merge behavior. Tunes dependent devices
-def tune_blockdev(path, nomerges):
-    from os.path import join, exists, dirname, realpath
-    path = realpath(path)
-    print('tuning {}'.format(path))
-    if path in tuned_blockdevs:
-        return
-    tuned_blockdevs.add(path)
-    def check_sched(current):
-        return current == 'none' or '[noop]' in current
-    if not nomerges:
-        tune_path(join(path, 'queue', 'scheduler'), 'noop', check_sched)
-        tune_path(join(path, 'queue', 'nomerges'), '2')
+    if not args.yaml and not args.fs and not args.dev:
+        tune_yaml('/etc/scylla/scylla.yaml', args.nomerges)
     else:
-        tune_path(join(path, 'queue', 'nomerges'), nomerges)
-    slaves = join(path, 'slaves')
-    if exists(slaves):
-        for slave in os.listdir(slaves):
-            tune_blockdev(join(slaves, slave), nomerges)
-    if exists(join(path, 'partition')):
-        tune_blockdev(dirname(path), nomerges)
-
-# tunes a /dev/foo blockdev
-def tune_dev(path, nomerges):
-    dev = os.stat(path).st_rdev
-    devfile = '/sys/dev/block/{}:{}'.format(dev // 256, dev % 256)
-    tune_blockdev(devfile, nomerges)
-
-# tunes a filesystem
-# FIXME: btrfs
-def tune_fs(path, nomerges):
-    dev = os.stat(path).st_dev
-    devfile = '/sys/dev/block/{}:{}'.format(dev // 256, dev % 256)
-    tune_blockdev(devfile, nomerges)
-
-# tunes all filesystems referenced from a scylla.yaml
-def tune_yaml(path, nomerges):
-    import yaml
-    y = yaml.load(open(path))
-    for fs in y['data_file_directories']:
-        tune_fs(fs, nomerges)
-    tune_fs(y['commitlog_directory'], nomerges)
-
-ap = argparse.ArgumentParser('Tune filesystems for ScyllaDB')
-ap.add_argument('--set-nomerges', metavar='VAL', dest='nomerges',
-                help='Overwrite nomerges parameter')
-ap.add_argument('--filesystem', metavar='PATH', action='append', dest='fs', default=[],
-                help='Tune filesystem containing PATH')
-ap.add_argument('--dev', metavar='PATH', action='append', dest='dev', default=[],
-                help='Tune device node PATH')
-ap.add_argument('--config', metavar='YAML', action='append', dest='yaml', default=[],
-                help='Process given scylla.yaml')
-
-args = ap.parse_args()
-
-if not args.yaml and not args.fs and not args.dev:
-    tune_yaml('/etc/scylla/scylla.yaml', args.nomerges)
-else:
-    for yaml in args.yaml:
-        tune_yaml(yaml, args.nomerges)
-    for fs in args.fs:
-        tune_fs(fs, args.nomerges)
-    for dev in args.dev:
-        tune_dev(dev, args.nomerges)
+        for yaml in args.yaml:
+            tune_yaml(yaml, args.nomerges)
+        for fs in args.fs:
+            tune_fs(fs, args.nomerges)
+        for dev in args.dev:
+            tune_dev(dev, args.nomerges)
diff --git a/dist/redhat/scylla.spec.in b/dist/redhat/scylla.spec.in
index 62b217123..053b65805 100644
--- a/dist/redhat/scylla.spec.in
+++ b/dist/redhat/scylla.spec.in
@@ -142,6 +142,7 @@ install -m755 seastar/dpdk/usertools/dpdk-devbind.py $RPM_BUILD_ROOT%{_prefix}/l
install -m755 build/release/scylla $RPM_BUILD_ROOT%{_bindir}
install -m755 build/release/iotune $RPM_BUILD_ROOT%{_bindir}
install -m755 dist/common/bin/scyllatop $RPM_BUILD_ROOT%{_bindir}
+install -m644 dist/common/scripts/scylla_blocktune.py $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 dist/common/scripts/scylla-blocktune $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 scylla-housekeeping $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
if @@HOUSEKEEPING_CONF@@; then
--
2.14.3
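
For readers unfamiliar with the sysfs format that check_sched in the
library inspects: queue/scheduler lists the available schedulers and puts
the active one in square brackets. The sketch below is illustrative only;
active_scheduler is a made-up helper, and already_tuned simply restates
the acceptance rule from the patch.

```python
def active_scheduler(sysfs_value):
    """Return the scheduler marked active in a queue/scheduler string.

    The kernel brackets the active entry, e.g. "noop deadline [cfq]".
    If nothing is bracketed (e.g. a bare "none"), return the stripped value.
    """
    for token in sysfs_value.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return sysfs_value.strip()


def already_tuned(sysfs_value):
    """Same rule as check_sched in the patch: bare 'none' or active noop."""
    return sysfs_value == "none" or "[noop]" in sysfs_value


if __name__ == "__main__":
    for value in ("noop deadline [cfq]", "[noop] deadline cfq", "none"):
        print(repr(value), active_scheduler(value), already_tuned(value))
```

Note that a value such as "[none] mq-deadline", which multi-queue kernels
can report, is not matched by that rule; whether that matters depends on
the kernels being targeted.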

Glauber Costa

<glauber@scylladb.com>
Apr 28, 2018, 9:55:28 AM
to scylladb-dev@googlegroups.com, Glauber Costa
scylla-blocktune currently lives in the top-level directory, but this is
mostly historical. When the time comes for us to install it, the
packaging systems will copy it to /usr/lib/scylla with the others.

So for consistency let's make sure that it also lives in the scripts
directory.

Signed-off-by: Glauber Costa <gla...@scylladb.com>
---
scylla-blocktune => dist/common/scripts/scylla-blocktune | 0
dist/debian/scylla-server.install.in | 1 -
dist/redhat/scylla.spec.in | 2 +-
3 files changed, 1 insertion(+), 2 deletions(-)
rename scylla-blocktune => dist/common/scripts/scylla-blocktune (100%)

diff --git a/scylla-blocktune b/dist/common/scripts/scylla-blocktune
similarity index 100%
rename from scylla-blocktune
rename to dist/common/scripts/scylla-blocktune
diff --git a/dist/debian/scylla-server.install.in b/dist/debian/scylla-server.install.in
index 433d335ea..b4c78f425 100644
--- a/dist/debian/scylla-server.install.in
+++ b/dist/debian/scylla-server.install.in
@@ -5,7 +5,6 @@ seastar/dpdk/usertools/dpdk-devbind.py usr/lib/scylla
seastar/scripts/posix_net_conf.sh usr/lib/scylla
seastar/scripts/perftune.py usr/lib/scylla
dist/common/scripts/* usr/lib/scylla
-scylla-blocktune usr/lib/scylla
scylla-housekeeping usr/lib/scylla
tools/scyllatop usr/lib/scylla
swagger-ui/dist usr/lib/scylla/swagger-ui
diff --git a/dist/redhat/scylla.spec.in b/dist/redhat/scylla.spec.in
index 11ed36e99..62b217123 100644
--- a/dist/redhat/scylla.spec.in
+++ b/dist/redhat/scylla.spec.in
@@ -142,7 +142,7 @@ install -m755 seastar/dpdk/usertools/dpdk-devbind.py $RPM_BUILD_ROOT%{_prefix}/l
install -m755 build/release/scylla $RPM_BUILD_ROOT%{_bindir}
install -m755 build/release/iotune $RPM_BUILD_ROOT%{_bindir}
install -m755 dist/common/bin/scyllatop $RPM_BUILD_ROOT%{_bindir}
-install -m755 scylla-blocktune $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
+install -m755 dist/common/scripts/scylla-blocktune $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 scylla-housekeeping $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
if @@HOUSEKEEPING_CONF@@; then
install -m644 conf/housekeeping.cfg $RPM_BUILD_ROOT%{_sysconfdir}/scylla.d/
--
2.14.3

Glauber Costa

<glauber@scylladb.com>
Apr 28, 2018, 9:55:29 AM
to scylladb-dev@googlegroups.com, Glauber Costa
We are not configuring the disks the way we want them with respect to
scheduler and nomerges. This is an oversight that became clear now that
I started rewriting iotune, since I will explicitly test for that. But
since this can affect the results, it should have been here all along.

Signed-off-by: Glauber Costa <gla...@scylladb.com>
---
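To check what a disk is currently set to before and after this call, one
can look at the same sysfs nodes that blocktune writes. The sketch below
is an illustration, not part of the patch; the function names are made up.
It uses os.major()/os.minor() to split the device number, which as far as
I can tell agrees with the "dev // 256, dev % 256" arithmetic in the
library only for small device numbers. For a partition the queue directory
lives in the parent device, so the read helper returns None there.

```python
import os


def sysfs_node_for_dev(dev):
    """Build the /sys/dev/block/<major>:<minor> path for a device number."""
    return "/sys/dev/block/{}:{}".format(os.major(dev), os.minor(dev))


def sysfs_node_for_path(path):
    """sysfs node of the block device backing the filesystem holding path."""
    return sysfs_node_for_dev(os.stat(path).st_dev)


def read_queue_setting(sysfs_node, name):
    """Read queue/<name> (e.g. 'nomerges' or 'scheduler'); None if unreadable."""
    setting = os.path.join(sysfs_node, "queue", name)
    try:
        with open(setting) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    node = sysfs_node_for_path("/")
    print(node, read_queue_setting(node, "nomerges"),
          read_queue_setting(node, "scheduler"))
```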
dist/common/scripts/scylla_io_setup | 2 ++
1 file changed, 2 insertions(+)

diff --git a/dist/common/scripts/scylla_io_setup b/dist/common/scripts/scylla_io_setup
index 5a67f9687..3a836ace2 100755
--- a/dist/common/scripts/scylla_io_setup
+++ b/dist/common/scripts/scylla_io_setup
@@ -25,6 +25,7 @@ import argparse
 import yaml
 import logging
 import sys
+import scylla_blocktune as blocktune

 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description='IO Setup script for Scylla.')
@@ -77,6 +78,7 @@ if __name__ == "__main__":
             elif cpudata.smp():
                 iotune_args += [ "--smp", cpudata.smp() ]

+            blocktune.tune_fs(data_dir, '2')
             try:
                 subprocess.check_call(["iotune",
                                        "--evaluation-directory", data_dir,
--
2.14.3

Glauber Costa

<glauber@scylladb.com>
Apr 28, 2018, 9:55:30 AM
to scylladb-dev@googlegroups.com, Glauber Costa
m3 has 80GB SSDs in its largest form and I doubt anybody has ever
used it with Scylla.

I am also not aware of any c3 deployments. Since it is a past-generation
type, it doesn't even show up in the default instance selector anymore.

I propose we drop AMI support for both. In practice, what that means is
that we won't auto-tune their I/O properties, and people who want to use
them will have to run scylla_io_setup, like they do today with the EBS
instances.

Signed-off-by: Glauber Costa <gla...@scylladb.com>
---
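For comparison with the shell function changed below, the same check can
be sketched in Python. This is only an illustration: SUPPORTED_CLASSES and
both function names are made up here, and the instance type is passed in
as a string rather than fetched from the metadata service.

```python
# Instance classes that keep AMI auto-tuning after this patch.
SUPPORTED_CLASSES = {"i2", "i3"}


def instance_class(instance_type):
    """'i3.2xlarge' -> 'i3': the part before the first dot, like `cut -d . -f 1`."""
    return instance_type.split(".", 1)[0]


def is_supported_instance_type(instance_type):
    """True if the instance class is one of the preconfigured ones."""
    return instance_class(instance_type) in SUPPORTED_CLASSES


if __name__ == "__main__":
    for itype in ("i3.2xlarge", "i2.8xlarge", "m3.xlarge", "c3.large"):
        print(itype, is_supported_instance_type(itype))
```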
dist/common/scripts/scylla_lib.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dist/common/scripts/scylla_lib.sh b/dist/common/scripts/scylla_lib.sh
index 91187dba3..238f3ca30 100644
--- a/dist/common/scripts/scylla_lib.sh
+++ b/dist/common/scripts/scylla_lib.sh
@@ -33,7 +33,7 @@ is_selinux_enabled() {
 ec2_is_supported_instance_type() {
     TYPE=`curl -s http://169.254.169.254/latest/meta-data/instance-type|cut -d . -f 1`
     case $TYPE in
-        "m3"|"c3"|"i2"|"i3") echo 1;;
+        "i2"|"i3") echo 1;;
         *) echo 0;;
     esac
 }
--
2.14.3

Glauber Costa

<glauber@scylladb.com>
Apr 28, 2018, 9:55:32 AM
to scylladb-dev@googlegroups.com, Glauber Costa
We can use iotunev2 (or any other I/O generator) to test for the limits
of the disks for the i2 and i3 instance classes. The values here come
from ~5 invocations of the (yet to be upstreamed) iotunev2, with the
IOPS numbers rounded for readability.

During the execution, I verified that the disks were saturated, so we
can trust these numbers even if iotunev2 is merged in a different form.
The numbers are very consistent, unlike what we usually saw with the
first version of iotune.

Previously, we were just multiplying the concurrency number by the
number of disks. Now that we have better infrastructure, we will
manually test i3.large and i3.xlarge, since their disks are smaller
and slower.

For the other i3 sizes, and for all instances in the i2 family, storage
scales up by adding more disks. So we can keep multiplying the
characteristics of one known disk by the number of disks, assuming
perfect scaling.

Example for i3, obtained with i3.2xlarge (a single disk):

read_iops = 411k
read_bandwidth = 1.9GB/s

So for i3.16xlarge, with its 8 disks, we would have read_iops = 3.28M
and read_bandwidth = 15GB/s, very close to the numbers advertised by AWS.

Signed-off-by: Glauber Costa <gla...@scylladb.com>
---
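The scaling arithmetic from the commit message can be reproduced with a
few lines of Python. This is a sketch for checking the numbers, not code
from the patch: I3_DISK and scale() are names made up here, the per-disk
values are copied from the diff below, and the bandwidth is printed in
units of 2**30 bytes.

```python
# Per-disk values measured on i3.2xlarge, copied from the patch (bytes/s for bandwidth).
I3_DISK = {
    "read_iops": 411200,
    "read_bandwidth": 2015342735,
    "write_iops": 181500,
    "write_bandwidth": 808775652,
}


def scale(per_disk, nr_disks):
    """Multiply every property by the disk count (assumes perfect scaling)."""
    return {name: value * nr_disks for name, value in per_disk.items()}


if __name__ == "__main__":
    props = scale(I3_DISK, 8)  # i3.16xlarge exposes 8 ephemeral disks
    print("read_iops      = {:.2f}M".format(props["read_iops"] / 1e6))
    print("read_bandwidth = {:.1f} GiB/s".format(props["read_bandwidth"] / 2**30))
```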
dist/common/scripts/scylla_io_setup | 39 ++++++++++++++++++++++++-------------
1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/dist/common/scripts/scylla_io_setup b/dist/common/scripts/scylla_io_setup
index 3a836ace2..3984da7cc 100755
--- a/dist/common/scripts/scylla_io_setup
+++ b/dist/common/scripts/scylla_io_setup
@@ -37,23 +37,34 @@ if __name__ == "__main__":
     if not scylla_util.is_developer_mode():
         if args.ami:
             idata = scylla_util.aws_instance()
-            nr_io_queues = cpudata.nr_shards()

-            if idata.instance_class() == "i3":
-                sysfs_file = "/sys/block/%s/queue/nr_requests"
-                max_seastar_shard_req = 128
-                max_sysfs_shard_req = sum([ int(file(sysfs_file % x).readline().strip()) for x in idata.ephemeral_disks() ])
-                # obtained running iotune multiple times against a single i3 disk.
-                max_iotune_disk = 192 * len(idata.ephemeral_disks())
-                nr_reqs = min(max_seastar_shard_req * cpudata.nr_shards(), max_sysfs_shard_req, max_iotune_disk)
+            disk_properties = {}
+            nr_disks = len(idata.ephemeral_disks())
+            ## both i3 and i2 can run with 1 I/O Queue per shard
+            if idata.instance() == "i3.large":
+                disk_properties["read_iops"] = 111000
+                disk_properties["read_bandwidth"] = 653925080
+                disk_properties["write_iops"] = 36800
+                disk_properties["write_bandwidth"] = 215066473
+            elif idata.instance() == "i3.xlarge":
+                disk_properties["read_iops"] = 200800
+                disk_properties["read_bandwidth"] = 1185106376
+                disk_properties["write_iops"] = 53180
+                disk_properties["write_bandwidth"] = 423621267
+            elif idata.instance_class() == "i3":
+                disk_properties["read_iops"] = 411200 * nr_disks
+                disk_properties["read_bandwidth"] = 2015342735 * nr_disks
+                disk_properties["write_iops"] = 181500 * nr_disks
+                disk_properties["write_bandwidth"] = 808775652 * nr_disks
             elif idata.instance_class() == "i2":
-                nr_reqs = 32 * len(idata.ephemeral_disks())
-            else:
-                nr_reqs = 16 * max(len(idata.ephemeral_disks()), 2)
-            if nr_reqs/nr_io_queues < 4:
-                nr_io_queues = nr_reqs / 4
+                disk_properties["read_iops"] = 64000 * nr_disks
+                disk_properties["read_bandwidth"] = 507338935 * nr_disks
+                disk_properties["write_iops"] = 57100 * nr_disks
+                disk_properties["write_bandwidth"] = 483141731 * nr_disks
+            properties_file = file("/etc/scylla.d/io_properties.yaml", "w")
+            yaml.dump({ "disks": disk_properties }, properties_file, default_flow_style=False)
             ioconf = file("/etc/scylla.d/io.conf", "w")
-            ioconf.write("SEASTAR_IO=\"--num-io-queues {} --max-io-requests {}\"\n".format(nr_io_queues, nr_reqs))
+            ioconf.write("SEASTAR_IO=\"--io-properties-file={}\"\n".format(properties_file.name))
         else:
             if os.environ.has_key("SCYLLA_CONF"):
                 conf_dir = os.environ["SCYLLA_CONF"]
--
2.14.3
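
A note on the io.conf line: the --io-properties-file option is given a
path, so the string written to io.conf has to be built from the path of
the properties file, not from an open file object (formatting a file
object yields its repr). The sketch below shows the intended shape of the
line under that assumption; write_io_conf is a made-up helper, and the
demo writes to a scratch directory rather than /etc/scylla.d.

```python
import os
import tempfile


def write_io_conf(conf_path, properties_path):
    """Write the SEASTAR_IO line pointing at the properties file; return the line."""
    line = 'SEASTAR_IO="--io-properties-file={}"\n'.format(properties_path)
    with open(conf_path, "w") as ioconf:
        ioconf.write(line)
    return line


if __name__ == "__main__":
    # Use a temporary directory so the demo does not need root.
    with tempfile.TemporaryDirectory() as scratch:
        conf = os.path.join(scratch, "io.conf")
        print(write_io_conf(conf, "/etc/scylla.d/io_properties.yaml"), end="")
```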

Commit Bot

<bot@cloudius-systems.com>
Apr 29, 2018, 4:44:42 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Glauber Costa <gla...@scylladb.com>
Branch: next

scripts: move scylla-blocktune to scripts location

scylla-blocktune currently lives in the top level but this is mostly
historical. When time comes for us to install it, the packaging systems
will copy it to /usr/lib/scylla with the others.

So for consistency let's make sure that it also lives in the scripts
directory.

Signed-off-by: Glauber Costa <gla...@scylladb.com>

---
diff --git a/dist/common/scripts/scylla-blocktune b/dist/common/scripts/scylla-blocktune
--- a/dist/common/scripts/scylla-blocktune
+++ b/dist/common/scripts/scylla-blocktune
null
diff --git a/dist/debian/scylla-server.install.in b/dist/debian/scylla-server.install.in
--- a/dist/debian/scylla-server.install.in
+++ b/dist/debian/scylla-server.install.in
@@ -5,7 +5,6 @@ seastar/dpdk/usertools/dpdk-devbind.py usr/lib/scylla
seastar/scripts/posix_net_conf.sh usr/lib/scylla
seastar/scripts/perftune.py usr/lib/scylla
dist/common/scripts/* usr/lib/scylla
-scylla-blocktune usr/lib/scylla
scylla-housekeeping usr/lib/scylla
tools/scyllatop usr/lib/scylla
swagger-ui/dist usr/lib/scylla/swagger-ui
diff --git a/dist/redhat/scylla.spec.in b/dist/redhat/scylla.spec.in

Commit Bot

<bot@cloudius-systems.com>
Apr 29, 2018, 4:44:43 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Glauber Costa <gla...@scylladb.com>
Branch: next

blocktune: allow it to be called as a library.

This patch makes the functions in scylla-blocktune available as a
library for other scripts - namely scylla_io_setup.

The filename, scylla-blocktune, is not the most convenient thing to call
from python so instead of just wrapping it in the usual test for
__main__ I am just splitting the file into two.

Another option would be to patch all callers to call
scylla_blocktune.py, but because we are usually not using extensions in
scripts that are meant to be called directly I decided for the split.

Signed-off-by: Glauber Costa <gla...@scylladb.com>

---
diff --git a/dist/common/scripts/scylla-blocktune b/dist/common/scripts/scylla-blocktune
--- a/dist/common/scripts/scylla-blocktune
+++ b/dist/common/scripts/scylla-blocktune
diff --git a/dist/common/scripts/scylla_blocktune.py b/dist/common/scripts/scylla_blocktune.py
--- a/dist/common/scripts/scylla_blocktune.py
+++ b/dist/common/scripts/scylla_blocktune.py
@@ -0,0 +1,92 @@
+#!/usr/bin/python3
+
+#
+# Copyright (C) 2016 ScyllaDB
+#
+
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os, os.path
+
+# try to write data to a sysfs path, expect problems
+def try_write(path, data):
+    try:
+        open(path, 'w').write(data)
+    except Exception:
+        print("warning: unable to tune {} to {}".format(path, data))
+
+# update a sysfs path if it does not satisfy a check
+# function (default = check that the data is already there)
+def tune_path(path, data, check=None):
+    def default_check(current):
+        return current == data
+    if check is None:
+        check = default_check
+    if not os.path.exists(path):
+        return
+    if check(open(path).read().strip()):
+        print('already tuned: {}'.format(path))
+        return
+    print('tuning: {} {}'.format(path, data))
+    try_write(path, data + '\n')
+
+tuned_blockdevs = set()
+
+# tune a blockdevice (sysfs node); updates I/O scheduler
+# and merge behavior. Tunes dependent devices
+def tune_blockdev(path, nomerges):
+    from os.path import join, exists, dirname, realpath
+    path = realpath(path)
+    print('tuning {}'.format(path))
+    if path in tuned_blockdevs:
+        return
+    tuned_blockdevs.add(path)
+    def check_sched(current):
+        return current == 'none' or '[noop]' in current
+    if not nomerges:
+        tune_path(join(path, 'queue', 'scheduler'), 'noop', check_sched)
+        tune_path(join(path, 'queue', 'nomerges'), '2')
+    else:
+        tune_path(join(path, 'queue', 'nomerges'), nomerges)
+    slaves = join(path, 'slaves')
+    if exists(slaves):
+        for slave in os.listdir(slaves):
+            tune_blockdev(join(slaves, slave), nomerges)
+    if exists(join(path, 'partition')):
+        tune_blockdev(dirname(path), nomerges)
+
+# tunes a /dev/foo blockdev
+def tune_dev(path, nomerges):
+    dev = os.stat(path).st_rdev
+    devfile = '/sys/dev/block/{}:{}'.format(dev // 256, dev % 256)
+    tune_blockdev(devfile, nomerges)
+
+# tunes a filesystem
+# FIXME: btrfs
+def tune_fs(path, nomerges):
+    dev = os.stat(path).st_dev
+    devfile = '/sys/dev/block/{}:{}'.format(dev // 256, dev % 256)
+    tune_blockdev(devfile, nomerges)
+
+# tunes all filesystems referenced from a scylla.yaml
+def tune_yaml(path, nomerges):
+    import yaml
+    y = yaml.load(open(path))
+    for fs in y['data_file_directories']:
+        tune_fs(fs, nomerges)
+    tune_fs(y['commitlog_directory'], nomerges)
diff --git a/dist/redhat/scylla.spec.in b/dist/redhat/scylla.spec.in
--- a/dist/redhat/scylla.spec.in
+++ b/dist/redhat/scylla.spec.in
@@ -142,6 +142,7 @@ install -m755 seastar/dpdk/usertools/dpdk-devbind.py $RPM_BUILD_ROOT%{_prefix}/l
install -m755 build/release/scylla $RPM_BUILD_ROOT%{_bindir}
install -m755 build/release/iotune $RPM_BUILD_ROOT%{_bindir}
install -m755 dist/common/bin/scyllatop $RPM_BUILD_ROOT%{_bindir}
+install -m644 dist/common/scripts/scylla_blocktune.py

Commit Bot

<bot@cloudius-systems.com>
Apr 29, 2018, 4:44:44 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Glauber Costa <gla...@scylladb.com>
Branch: next

io_setup: call blocktune before tuning I/O

We are not configuring the disks the way we want them with respect to
scheduler and nomerges. This is an oversight that became clear now that
I started rewriting iotune, since I will explicitly test for that. But
since this can affect the results, it should have been here all along.

Signed-off-by: Glauber Costa <gla...@scylladb.com>

---
diff --git a/dist/common/scripts/scylla_io_setup b/dist/common/scripts/scylla_io_setup
--- a/dist/common/scripts/scylla_io_setup
+++ b/dist/common/scripts/scylla_io_setup

Commit Bot

<bot@cloudius-systems.com>
Apr 29, 2018, 4:44:45 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Glauber Costa <gla...@scylladb.com>
Branch: next

scylla_lib: drop support for m3 and c3 AWS instance types

m3 has 80GB SSDs in its largest form and I doubt anybody has ever
used it with Scylla.

I am also not aware of any c3 deployments. Since it is past generation,
it doesn't even show up in the default instance selector anymore.

I propose we drop AMI support for it. In practice, what that means is
that we won't auto-tune its I/O properties and people that want to use
it will have to run scylla_io_setup - like they do today with the EBS
instances.

Signed-off-by: Glauber Costa <gla...@scylladb.com>

---
diff --git a/dist/common/scripts/scylla_lib.sh b/dist/common/scripts/scylla_lib.sh

Commit Bot

<bot@cloudius-systems.com>
Apr 29, 2018, 4:44:47 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Glauber Costa <gla...@scylladb.com>
Branch: next

scylla_io_setup: preconfigure i3 and i2 instances with new I/O scheduler
properties

We can use iotunev2 (or any other I/O generator) to test for the limits
of the disks for the i2 and i3 instance classes. The values I got here
are the values I got from ~5 invocations of the (yet to be upstreamed)
iotune v2, with the IOPS numbers rounded for convenience of reading.

During the execution, I verified that the disks were saturated so we
can trust these numbers even if iotunev2 is merged in a different form.
The numbers are very consistent, unlike what we usually saw with the
first version of iotune.

Previously, we were just multiplying the concurrency number by the
number of disks. Now that we have better infrastructure, we will
manually test i3.large and i3.xlarge, since their disks are smaller
and slower.

For the other i3, and all instances in the i2 family storage scales up
by adding more disks. So we can keep multiplying the characteristics of
one known disk by the number of disks and assuming perfect scaling.

Example for i3, obtained with i3.2xlarge:

read_iops = 411k
read_bandwidth = 1.9GB/s

So for i3.16xlarge, we would have read_iops = 3.28M and 15GB/s - very
close to the numbers advertised by AWS.

Signed-off-by: Glauber Costa <gla...@scylladb.com>

---
diff --git a/dist/common/scripts/scylla_io_setup b/dist/common/scripts/scylla_io_setup
--- a/dist/common/scripts/scylla_io_setup
+++ b/dist/common/scripts/scylla_io_setup

Commit Bot

<bot@cloudius-systems.com>
Apr 29, 2018, 5:19:14 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Glauber Costa <gla...@scylladb.com>
Branch: master

scripts: move scylla-blocktune to scripts location

scylla-blocktune currently lives in the top level but this is mostly
historical. When time comes for us to install it, the packaging systems
will copy it to /usr/lib/scylla with the others.

So for consistency let's make sure that it also lives in the scripts
directory.

Signed-off-by: Glauber Costa <gla...@scylladb.com>

---
diff --git a/dist/common/scripts/scylla-blocktune b/dist/common/scripts/scylla-blocktune
--- a/dist/common/scripts/scylla-blocktune
+++ b/dist/common/scripts/scylla-blocktune
null
diff --git a/dist/debian/scylla-server.install.in b/dist/debian/scylla-server.install.in
--- a/dist/debian/scylla-server.install.in
+++ b/dist/debian/scylla-server.install.in
@@ -5,7 +5,6 @@ seastar/dpdk/usertools/dpdk-devbind.py usr/lib/scylla
seastar/scripts/posix_net_conf.sh usr/lib/scylla
seastar/scripts/perftune.py usr/lib/scylla
dist/common/scripts/* usr/lib/scylla
-scylla-blocktune usr/lib/scylla
scylla-housekeeping usr/lib/scylla
tools/scyllatop usr/lib/scylla
swagger-ui/dist usr/lib/scylla/swagger-ui
diff --git a/dist/redhat/scylla.spec.in b/dist/redhat/scylla.spec.in
--- a/dist/redhat/scylla.spec.in
+++ b/dist/redhat/scylla.spec.in
@@ -142,7 +142,7 @@ install -m755 seastar/dpdk/usertools/dpdk-devbind.py $RPM_BUILD_ROOT%{_prefix}/l
install -m755 build/release/scylla $RPM_BUILD_ROOT%{_bindir}
install -m755 build/release/iotune $RPM_BUILD_ROOT%{_bindir}
install -m755 dist/common/bin/scyllatop $RPM_BUILD_ROOT%{_bindir}
-install -m755 scylla-blocktune $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
+install -m755 dist/common/scripts/scylla-blocktune $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 scylla-housekeeping $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
if @@HOUSEKEEPING_CONF@@; then

Commit Bot

<bot@cloudius-systems.com>
Apr 29, 2018, 5:19:15 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Glauber Costa <gla...@scylladb.com>
Branch: master

blocktune: allow it to be called as a library.

This patch makes the functions in scylla-blocktune available as a
library for other scripts - namely scylla_io_setup.

The filename, scylla-blocktune, is not the most convenient thing to call
from python so instead of just wrapping it in the usual test for
__main__ I am just splitting the file into two.

Another option would be to patch all callers to call
scylla_blocktune.py, but because we are usually not using extensions in
scripts that are meant to be called directly I decided for the split.

Signed-off-by: Glauber Costa <gla...@scylladb.com>

---
diff --git a/dist/common/scripts/scylla-blocktune b/dist/common/scripts/scylla-blocktune
--- a/dist/common/scripts/scylla-blocktune
+++ b/dist/common/scripts/scylla-blocktune
diff --git a/dist/redhat/scylla.spec.in b/dist/redhat/scylla.spec.in
--- a/dist/redhat/scylla.spec.in
+++ b/dist/redhat/scylla.spec.in
@@ -142,6 +142,7 @@ install -m755 seastar/dpdk/usertools/dpdk-devbind.py $RPM_BUILD_ROOT%{_prefix}/l
install -m755 build/release/scylla $RPM_BUILD_ROOT%{_bindir}
install -m755 build/release/iotune $RPM_BUILD_ROOT%{_bindir}
install -m755 dist/common/bin/scyllatop $RPM_BUILD_ROOT%{_bindir}
+install -m644 dist/common/scripts/scylla_blocktune.py

Commit Bot

<bot@cloudius-systems.com>
Apr 29, 2018, 5:19:17 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Glauber Costa <gla...@scylladb.com>
Branch: master

scylla_lib: drop support for m3 and c3 AWS instance types

m3 has 80GB SSDs in its largest form and I doubt anybody has ever
used it with Scylla.

I am also not aware of any c3 deployments. Since it is past generation,
it doesn't even show up in the default instance selector anymore.

I propose we drop AMI support for it. In practice, what that means is
that we won't auto-tune its I/O properties and people that want to use
it will have to run scylla_io_setup - like they do today with the EBS
instances.

Signed-off-by: Glauber Costa <gla...@scylladb.com>

---

Commit Bot

<bot@cloudius-systems.com>
Apr 29, 2018, 5:19:17 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Glauber Costa <gla...@scylladb.com>
Branch: master

io_setup: call blocktune before tuning I/O

We are not configuring the disks the way we want them with respect to
scheduler and nomerges. This is an oversight that became clear now that
I started rewriting iotune, since I will explicitly test for that. But
since this can affect the results, it should have been here all along.

Signed-off-by: Glauber Costa <gla...@scylladb.com>

---
diff --git a/dist/common/scripts/scylla_io_setup b/dist/common/scripts/scylla_io_setup
--- a/dist/common/scripts/scylla_io_setup
+++ b/dist/common/scripts/scylla_io_setup

Commit Bot

<bot@cloudius-systems.com>
Apr 29, 2018, 5:19:18 AM
to scylladb-dev@googlegroups.com, Glauber Costa
From: Glauber Costa <gla...@scylladb.com>
Committer: Glauber Costa <gla...@scylladb.com>
Branch: master

scylla_io_setup: preconfigure i3 and i2 instances with new I/O scheduler
properties

We can use iotunev2 (or any other I/O generator) to test for the limits
of the disks for the i2 and i3 instance classes. The values I got here
are the values I got from ~5 invocations of the (yet to be upstreamed)
iotune v2, with the IOPS numbers rounded for convenience of reading.

During the execution, I verified that the disks were saturated so we
can trust these numbers even if iotunev2 is merged in a different form.
The numbers are very consistent, unlike what we usually saw with the
first version of iotune.

Previously, we were just multiplying the concurrency number by the
number of disks. Now that we have better infrastructure, we will
manually test i3.large and i3.xlarge, since their disks are smaller
and slower.

For the other i3, and all instances in the i2 family storage scales up
by adding more disks. So we can keep multiplying the characteristics of
one known disk by the number of disks and assuming perfect scaling.

Example for i3, obtained with i3.2xlarge:

read_iops = 411k
read_bandwidth = 1.9GB/s

So for i3.16xlarge, we would have read_iops = 3.28M and 15GB/s - very
close to the numbers advertised by AWS.

Signed-off-by: Glauber Costa <gla...@scylladb.com>

---
diff --git a/dist/common/scripts/scylla_io_setup b/dist/common/scripts/scylla_io_setup
--- a/dist/common/scripts/scylla_io_setup
+++ b/dist/common/scripts/scylla_io_setup

Glauber Costa

<glauber@scylladb.com>
Apr 30, 2018, 9:41:33 AM
to scylladb-dev, Glauber Costa
ping

Avi Kivity

<avi@scylladb.com>
Apr 30, 2018, 9:44:14 AM
to Glauber Costa, scylladb-dev

0530653da98dc755927afd95d02b3dc09a2b7c66


Glauber Costa

<glauber@scylladb.com>
Apr 30, 2018, 9:49:10 AM
to Avi Kivity, scylladb-dev
sigh...

Again I got no e-mail from the commit bot.
Sorry for the noise.
