Allocations filling up the disk when using the exec driver

Theo Hultberg

Nov 29, 2016, 10:35:52 AM
to Nomad
Hi,

We're looking into Nomad as a way to schedule batch jobs, but we've run into a problem. We're using the exec driver because we want to change as little as possible from our current setup; introducing Nomad is already a big step, and we'd like to avoid changing too much at once. The problem is that Nomad appears to copy everything in the chroot_env list (/usr, /etc, /lib, and so on) into each allocation's directory. Since /usr on our systems (AWS Linux 2016.03.3) is 1 GB, each allocation takes 1 GB of disk. Nomad also doesn't seem to be very quick at cleaning up after jobs have run, so we can't run very many jobs before the machines fill their disks and become unusable. I guess we could trim the chroot_env list, but that would also limit what our jobs have access to (I'm not sure at this point how limiting it would be, but it wouldn't be without complications).
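
To put rough numbers on this, here is a minimal Python sketch that measures how large a directory tree is and estimates how many uncollected allocations fit in the free space. The function names are made up for illustration, and the model is an assumption: it treats every allocation as a full copy of the measured tree and ignores everything else the job writes.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of regular files under root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # Unreadable or vanished files are skipped; this is only an estimate.
                pass
    return total


def allocations_until_full(free_bytes, per_alloc_bytes):
    """Number of whole allocations that fit in free_bytes (assumes one full copy each)."""
    if per_alloc_bytes <= 0:
        raise ValueError("per_alloc_bytes must be positive")
    return free_bytes // per_alloc_bytes
```

For example, with a hypothetical 30 GB of free disk and the 1 GB per allocation mentioned above, `allocations_until_full(30 * 1024**3, 1024**3)` gives 30 allocations before the disk is full, unless old ones are cleaned up in the meantime.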

Is this really how the exec driver is supposed to work?

How do people avoid running out of disk?

How quickly can we expect Nomad to clean up old allocations?

Since the java driver seems to work similarly to the exec driver, does it have the same issue?

yours,
Theo

Lowe Schmidt

Nov 29, 2016, 10:53:43 AM
to Theo Hultberg, Nomad
You can change which parts of the host are included in the chroot that Nomad creates by setting chroot_env { } inside client { }.

If you know your dependencies, or have a good way of finding them out, you can cut down the size of each chroot.
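
As a rough illustration, the Python sketch below renders a client stanza with a trimmed chroot_env from a list of host paths. It assumes the map form of chroot_env (host path on the left, path inside the chroot on the right) and maps every path to itself. The path list in the example call is hypothetical, not a recommendation; what a job actually needs has to be worked out per system.

```python
def render_chroot_env(paths):
    """Render a client stanza whose chroot_env maps each host path to the same path in the chroot."""
    lines = ["client {", "  chroot_env {"]
    for host_path in sorted(paths):
        # Source (host) on the left, destination (inside the chroot) on the right.
        lines.append(f'    "{host_path}" = "{host_path}"')
    lines += ["  }", "}"]
    return "\n".join(lines)


# Hypothetical minimal set of directories for a job that only needs basic tools.
print(render_chroot_env(["/bin", "/etc", "/lib", "/lib64", "/usr/bin"]))
```

Generating the stanza from a list keeps the set of directories in one place, which helps if the config is shipped to the clients by a config management tool.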

--
Lowe Schmidt | +46 723 867 157

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/nomad/issues
IRC: #nomad-tool on Freenode
---
You received this message because you are subscribed to the Google Groups "Nomad" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nomad-tool+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nomad-tool/ccbae127-c0ba-43b1-8607-fdf38d6ccc4f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Theo Hultberg

Nov 29, 2016, 1:49:48 PM
to Nomad
Hi Lowe,

As I mentioned in my question, limiting that list is not without complications. Jobs have different requirements, and when new jobs bring new requirements you need to ship new config to all Nomad servers and restart them all (a reload didn't seem to be enough). That's a very heavy-handed way of doing it.

Surely that can't be the only solution to this? Is it really how exec is supposed to work?

T#

Lowe Schmidt

Nov 29, 2016, 2:28:45 PM
to Theo Hultberg, Nomad
Hey again Theo, 

I might sound a bit grumpy now (old sysadmin habits die hard), but it sounds like you've identified a limitation and you're trying to solve it with the wrong tool. If disk space is the issue and you run on AWS EC2, then give the machines more disk (enough that you know the machines will be reprovisioned before they hit the limit).

If restarting Nomad agents is heavy-handed, then I'd suggest investing some time in automating the config management of Nomad configs and triggering agent restarts. I believe Consul with consul-template can do this for you.

All the best,

--
Lowe Schmidt | +46 723 867 157


Michael Schurter

Nov 29, 2016, 3:00:55 PM
to Theo Hultberg, Nomad
Yes, that's how the exec driver is designed to work, but you're running into a problem similar to the one described in https://github.com/hashicorp/nomad/issues/1418. We'll be adding GC on the client soon to fix that issue and ensure clients don't run out of disk space.

The java and exec drivers need to copy files into the chroot to provide isolation from the host while allowing allocations free rein within their chroot. This is similar to the common complaint from Docker users about multi-GB images. We may move to an overlay filesystem in the future to save space, but overlay filesystems have traditionally had issues of their own.

As a workaround until we improve GC, you can force a GC through the API: https://www.nomadproject.io/docs/http/system.html
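
A minimal sketch of forcing a GC from Python, using only the standard library. It assumes the endpoint on that page is a PUT to /v1/system/gc and that the agent listens on the default http://127.0.0.1:4646; both the address and the helper names are assumptions, so check them against your own setup.

```python
import urllib.request


def build_gc_request(nomad_addr="http://127.0.0.1:4646"):
    """Build a PUT request for the system GC endpoint (address is an assumed default)."""
    url = nomad_addr.rstrip("/") + "/v1/system/gc"
    return urllib.request.Request(url, method="PUT")


def force_gc(nomad_addr="http://127.0.0.1:4646", timeout=10):
    """Send the GC request and return the HTTP status code."""
    with urllib.request.urlopen(build_gc_request(nomad_addr), timeout=timeout) as resp:
        return resp.status
```

Calling `force_gc()` periodically, for example from a cron job, is one way to keep old allocations from piling up until client-side GC lands. How quickly disk space is actually freed afterwards is not something this sketch can guarantee.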
