ScyllaDB in LXC/LXD containers?


Brandon Lamb

<brandonlamb@gmail.com>
Jan 3, 2017, 7:06:59 AM
to ScyllaDB users
Hello all,

Just diving into experimenting with ScyllaDB. I'm trying to stand up a 3-node cluster using lxc containers on a single host (dual quad-core, 48G, six-drive raid10 ssd).

Has anyone done this successfully, and are there any learnings to share? I'm running into some errors/warnings and assume there's just some learning-curve pain to work through around resources.

When trying to start scylla-jmx:

Jan  3 11:57:35 db1-dev systemd[1]: Failed to reset devices.list on /system.slice/scylla-server.service: Operation not permitted
Jan  3 11:57:35 db1-dev systemd[1]: Starting Scylla Server...
Jan  3 11:57:35 db1-dev scylla_prepare[361]: hugeadm:WARNING: Directory /var/lib/hugetlbfs/pagesize-2MB is already mounted.
Jan  3 11:57:36 db1-dev scylla_prepare[361]: tuning /sys/dev/block/0:45
Jan  3 11:57:36 db1-dev scylla_prepare[361]: tuning /sys/dev/block/0:45
Jan  3 11:57:36 db1-dev scylla[364]: FATAL: Exception during startup, aborting: std::runtime_error (insufficient processing units)
Jan  3 11:57:36 db1-dev systemd[1]: scylla-server.service: Main process exited, code=exited, status=7/NOTRUNNING
Jan  3 11:57:36 db1-dev systemd[1]: Failed to start Scylla Server.
Jan  3 11:57:36 db1-dev systemd[1]: Dependency failed for Scylla JMX.
Jan  3 11:57:36 db1-dev systemd[1]: scylla-jmx.service: Job scylla-jmx.service/start failed with result 'dependency'.
Jan  3 11:57:36 db1-dev systemd[1]: Dependency failed for Run Scylla Housekeeping daily.
Jan  3 11:57:36 db1-dev systemd[1]: scylla-housekeeping.timer: Job scylla-housekeeping.timer/start failed with result 'dependency'.
Jan  3 11:57:36 db1-dev systemd[1]: scylla-server.service: Unit entered failed state.
Jan  3 11:57:36 db1-dev systemd[1]: scylla-server.service: Failed with result 'exit-code'.

And when trying to start scylla-server:

Jan  3 12:03:36 db1-dev systemd[1]: Failed to reset devices.list on /system.slice/systemd-tmpfiles-clean.service: Operation not permitted
Jan  3 12:03:36 db1-dev systemd[1]: Starting Cleanup of Temporary Directories...
Jan  3 12:03:36 db1-dev systemd[1]: Failed to reset devices.list on /system.slice/scylla-server.service: Operation not permitted
Jan  3 12:03:36 db1-dev systemd[1]: Starting Scylla Server...
Jan  3 12:03:36 db1-dev systemd-tmpfiles[369]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Jan  3 12:03:36 db1-dev systemd[1]: Started Cleanup of Temporary Directories.
Jan  3 12:03:36 db1-dev scylla_prepare[370]: hugeadm:WARNING: Directory /var/lib/hugetlbfs/pagesize-2MB is already mounted.
Jan  3 12:03:36 db1-dev scylla_prepare[370]: tuning /sys/dev/block/0:45
Jan  3 12:03:36 db1-dev scylla_prepare[370]: tuning /sys/dev/block/0:45
Jan  3 12:03:36 db1-dev scylla[373]: FATAL: Exception during startup, aborting: std::runtime_error (insufficient processing units)
Jan  3 12:03:36 db1-dev systemd[1]: scylla-server.service: Main process exited, code=exited, status=7/NOTRUNNING
Jan  3 12:03:36 db1-dev systemd[1]: Failed to start Scylla Server.
Jan  3 12:03:36 db1-dev systemd[1]: Dependency failed for Scylla JMX.
Jan  3 12:03:36 db1-dev systemd[1]: scylla-jmx.service: Job scylla-jmx.service/start failed with result 'dependency'.
Jan  3 12:03:36 db1-dev systemd[1]: Dependency failed for Run Scylla Housekeeping daily.
Jan  3 12:03:36 db1-dev systemd[1]: scylla-housekeeping.timer: Job scylla-housekeeping.timer/start failed with result 'dependency'.
Jan  3 12:03:36 db1-dev systemd[1]: scylla-server.service: Unit entered failed state.
Jan  3 12:03:36 db1-dev systemd[1]: scylla-server.service: Failed with result 'exit-code'

Profile applied to the containers:

root@s4:~# lxc profile show hugepages
name: hugepages
config:
  security.privileged: "true"
description: ""
devices: {}

I did remove one config setting that was there originally while troubleshooting the "already mounted" warnings:

  raw.lxc: |
    lxc.mount.entry = hugetlbfs dev/hugepages hugetlbfs rw,relatime,create=dir 0 0

I'm totally just fumbling in the dark on this, so any tips or a pointer in a general direction would be great.

Thanks!

Brandon

P.S. - The ScyllaDB Summit youtube videos were awesome, I stayed up super late watching them all. Looking forward to exploring what appears to be an awesome project.

Avi Kivity

<avi@scylladb.com>
Jan 3, 2017, 7:50:17 AM
to scylladb-users@googlegroups.com

"insufficient processing units" means that Scylla thinks you asked for more logical cores than are available.


Did you tweak the configuration?


What does "hwloc-ls" show inside the container?


When Scylla starts up, it tries to take over all of the memory and all of the logical cores available on the machine.  In container environments, some of this information reflects the entire machine (not the resources you allocated to the container), and some reflects the container.  Therefore, you should override the auto-detection and supply your own values.
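A quick sanity check of what the container actually reports (a sketch using nproc and /proc/meminfo as stand-ins if hwloc isn't installed; note that a privileged container may still report the whole host's resources, which is exactly the mismatch described above):

```shell
# What do processes inside the container think they have?
# In a privileged LXC container these can still show the host's
# full core count and memory, not the container's allocation.
nproc
awk '/^MemTotal/ {printf "%.1f GiB\n", $2 / 1024 / 1024}' /proc/meminfo
```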


With your dual quad, you have 8 cores and 16 logical cores (hyper-threading), so a good configuration would be


  "--smp 5 --memory 14G --thread-affinity 0"


If you are sharing the machine with something else, you can try


  "--smp 5 --memory 14G --overprovisioned"


and Scylla will be more friendly towards other services running on the machine.  Adjust --smp and especially --memory to leave some resources free.
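For the record, these flags are usually set via SCYLLA_ARGS in the package's sysconfig file rather than passed by hand. A sketch, assuming a Debian/Ubuntu package layout (RHEL-family packages use /etc/sysconfig/scylla-server instead; check where your package puts SCYLLA_ARGS):

```shell
# /etc/default/scylla-server -- the path is distro/package dependent
SCYLLA_ARGS="--smp 5 --memory 14G --overprovisioned"
```

Then restart scylla-server for the change to take effect.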


Other notes:

 - RAID10 is overkill due to Scylla's replication; don't use it in production deployments

 - starting scylla-jmx started scylla-server, which failed; the two problems you saw are the same.


Good luck!
--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To post to this group, send email to scyllad...@googlegroups.com.
Visit this group at https://groups.google.com/group/scylladb-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/6b74da76-43a0-4b8f-9657-452c596f49dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Brandon Lamb

<brandonlamb@gmail.com>
Jan 3, 2017, 3:11:31 PM
to ScyllaDB users
Awesome, thanks for the fast reply! This makes sense since I'm running on the same host, I'll give it a try tonight!

Brandon Lamb

<brandonlamb@gmail.com>
Jan 4, 2017, 4:24:18 AM
to ScyllaDB users
https://brandonlamb.com/2017/01/04/scylladb-on-lxc-lxd-containers-with-zfs/

I threw together a hasty blog post on my learnings. When I got towards the end I realized there is a ton of crap left out, so hopefully I can muster the attention span to come back and break it apart and fill in more.

The gist, though: thanks for the tips, because pinning CPU cores and limiting RAM for the container, together with setting the scylla_args, did the trick.
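For anyone following along, the container-side limits can live in an LXD profile. A sketch with illustrative values (the profile name, core list, and memory cap are all assumptions, sized so three containers don't oversubscribe the host):

```yaml
name: scylla-node
config:
  security.privileged: "true"
  limits.cpu: "0-4"      # pin this container to cores 0-4
  limits.memory: 14GB    # hard cap; leave headroom for the host
description: "illustrative per-node limits for a Scylla container"
devices: {}
```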

Back in business!

Brandon


Brandon Lamb

<brandonlamb@gmail.com>
Jan 4, 2017, 4:29:52 AM
to ScyllaDB users
On a side note, interesting comment about RAID10. I've always used this RAID level for faster reads and the "theoretical" ability to lose half your drives (vs. only 1 or 2), and I've always accepted the cost of that versus calculating parity for RAID 5/6.

Having said that, I'm not much of a hardware guy; I've set up some hardware RAIDs and mdadm software RAIDs and used them in production just fine, but I'm by no means a hardcore tuner. My understanding has typically been that RAID 5/6 gives you more available space at the cost of performance, and that the only real downside of RAID 10 was losing half your available disk space, but if you are okay with that then cool.

I'm totally interested in others' learnings; it's just one of those rabbit-hole topics you can lose four hours googling.

Cheers!

Brandon

Avi Kivity

<avi@scylladb.com>
Jan 4, 2017, 4:51:10 AM
to scylladb-users@googlegroups.com

With the replicated storage provided by Scylla, there is no need for redundancy at the node level, so people typically use RAID-0 (not RAID-5/6).  If you lose a drive, you replace it, then rebuild the node using the other replicas stored across the cluster.


With a typical replication factor of 3, using RAID10 gives you a total storage bloat factor of 6, while providing limited benefit (there is a benefit: rebuilding a RAID10 is less complex than rebuilding a node).
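Spelling out the factor-of-6 arithmetic, for anyone skimming:

```shell
# Every byte is written to 3 replicas across the cluster (RF=3),
# and RAID10 mirrors each replica's disks, so each byte ends up
# stored 3 x 2 = 6 times in total.
rf=3
raid10_mirror=2
echo $(( rf * raid10_mirror ))   # prints 6
```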


I don't recommend using RAID5/6 with Scylla unless it's a hardware RAID device with a battery-backed unit.


Of course, none of this holds for a testing system that backs three nodes with one machine; all of the above is for production deployments.

Brandon Lamb

<brandonlamb@gmail.com>
Jan 4, 2017, 4:58:05 AM
to ScyllaDB users
Ah, okay, that totally makes sense with that use case in mind. I'm running three physical hosts with Docker + LXC, so I have other app containers running on the same systems. If I were running dedicated boxes for ScyllaDB, then I think that is the scenario you are describing.

In a private cloud environment, we may have a bit more control over how the underlying storage is configured, but in a public cloud if Linode for example uses raid10 or raid6, it is what it is.

I am making (hopefully good) assumptions that for my particular use case, I'm not going to have *that* much data; I mostly want the cross-datacenter replication and ease of scaling versus the master/slave RDBMS headache. Hopefully ScyllaDB (or Cassandra, for that matter) has "good enough" performance out of the box until you get to "real big data" levels (millions/billions of things?).

And with that, time to sleep! :)

Dor Laor

<dor@scylladb.com>
Jan 4, 2017, 2:15:07 PM
to ScyllaDB users
On Wed, Jan 4, 2017 at 1:58 AM, Brandon Lamb <brand...@gmail.com> wrote:
> Ah, okay, that totally makes sense with that use case in mind. I'm running three physical hosts with Docker + LXC, so I have other app containers running on the same systems. If I were running dedicated boxes for ScyllaDB, then I think that is the scenario you are describing.
>
> In a private cloud environment, we may have a bit more control over how the underlying storage is configured, but in a public cloud, if Linode for example uses raid10 or raid6, it is what it is.
>
> I am making (hopefully good) assumptions that for my particular use case, I'm not going to have *that* much data; I mostly want the cross-datacenter replication and ease of scaling versus the master/slave RDBMS headache. Hopefully ScyllaDB (or Cassandra, for that matter) has "good enough" performance out of the box until you get to "real big data" levels (millions/billions of things?).


That's the ScyllaDB design scheme.
Adding layers usually hurts performance, but in your case the good LXC bindings will minimize it.
We don't recommend ZFS, since its async I/O isn't as good as XFS's, and with our shard-per-core model we depend on AIO.
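A quick way to check which filesystem currently backs the data directory (a sketch; /var/lib/scylla is the default data path, and the fallback to / is just for illustration):

```shell
# Print the filesystem type behind the Scylla data directory;
# fall back to / if the default path doesn't exist on this box.
stat -f -c %T /var/lib/scylla 2>/dev/null || stat -f -c %T /
```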
 