No such file or directory: '/sys/fs/cgroup/cpuset/system.slice/ganeti.service/lxc/vmtest2/cpuset.cpus'

Pál Tóbiás

Oct 8, 2015, 5:58:08 AM
to ganeti
Dear Ganeti gurus,

I added an lxc instance:
$ gnt-instance add -t plain -s 1G --os-type debootstrap+default --debug -H lxc vmtest2

And now I can't remove it:
root@ganeti0:~# gnt-instance remove vmtest2
This will remove the volumes of the instance vmtest2 (including
mirrors), thus removing all the data of the instance. Continue?
y/[n]/?: y
Failure: command execution error:
Could not shutdown instance vmtest2 on node ganeti0: Error while executing backend function: Getting CPU list for instance vmtest2 failed: [Errno 2] No such file or directory: '/sys/fs/cgroup/cpuset/system.slice/ganeti.service/lxc/vmtest2/cpuset.cpus'


The .../ganeti.service/ directory exists, but there is no lxc directory in it:
root@ganeti0:~# ls -l /sys/fs/cgroup/cpuset/system.slice/ganeti.service
total 0
-rw-r--r-- 1 root root 0 Oct  8 16:19 cgroup.clone_children
-rw-r--r-- 1 root root 0 Oct  8 16:19 cgroup.procs
-rw-r--r-- 1 root root 0 Oct  8 16:19 cpuset.cpu_exclusive
-rw-r--r-- 1 root root 0 Oct  8 16:19 cpuset.cpus
-r--r--r-- 1 root root 0 Oct  8 16:19 cpuset.effective_cpus
-r--r--r-- 1 root root 0 Oct  8 16:19 cpuset.effective_mems
-rw-r--r-- 1 root root 0 Oct  8 16:19 cpuset.mem_exclusive
-rw-r--r-- 1 root root 0 Oct  8 16:19 cpuset.mem_hardwall
-rw-r--r-- 1 root root 0 Oct  8 16:19 cpuset.memory_migrate
-r--r--r-- 1 root root 0 Oct  8 16:19 cpuset.memory_pressure
-rw-r--r-- 1 root root 0 Oct  8 16:19 cpuset.memory_spread_page
-rw-r--r-- 1 root root 0 Oct  8 16:19 cpuset.memory_spread_slab
-rw-r--r-- 1 root root 0 Oct  8 16:19 cpuset.mems
-rw-r--r-- 1 root root 0 Oct  8 16:19 cpuset.sched_load_balance
-rw-r--r-- 1 root root 0 Oct  8 16:19 cpuset.sched_relax_domain_level
-rw-r--r-- 1 root root 0 Oct  8 16:19 notify_on_release
-rw-r--r-- 1 root root 0 Oct  8 16:19 tasks


This is the ganeti version I'm using:
root@ganeti0:~# gnt-cluster version
Software version: 2.15.1
Internode protocol: 2150000
Configuration format: 2150000
OS api version: 20
Export interface: 0
VCS version: (ganeti) version 2.15.1-1


The cluster looks ok, except for that instance:
root@ganeti0:~# gnt-cluster verify
Submitted jobs 238, 239
Waiting for job 238 ...
Thu Oct  8 16:50:23 2015 * Verifying cluster config
Thu Oct  8 16:50:24 2015 * Verifying cluster certificate files
Thu Oct  8 16:50:24 2015 * Verifying hypervisor parameters
Thu Oct  8 16:50:24 2015 * Verifying all nodes belong to an existing group
Waiting for job 239 ...
Thu Oct  8 16:50:24 2015 * Verifying group 'default'
Thu Oct  8 16:50:24 2015 * Gathering data (1 nodes)
Thu Oct  8 16:50:24 2015 * Gathering information about nodes (1 nodes)
Thu Oct  8 16:50:25 2015 * Gathering disk information (1 nodes)
Thu Oct  8 16:50:26 2015 * Verifying configuration file consistency
Thu Oct  8 16:50:26 2015 * Verifying node status
Thu Oct  8 16:50:26 2015 * Verifying instance status
Thu Oct  8 16:50:26 2015   - ERROR: instance vmtest2: instance not running on its primary node ganeti0
Thu Oct  8 16:50:26 2015 * Verifying orphan volumes
Thu Oct  8 16:50:26 2015 * Verifying N+1 Memory redundancy
Thu Oct  8 16:50:26 2015 * Other Notes
Thu Oct  8 16:50:26 2015   - NOTICE: 1 non-redundant instance(s) found.
Thu Oct  8 16:50:26 2015 * Hooks Results


How is that /sys/fs/cgroup/cpuset/system.slice/ganeti.service/lxc directory created? Did I miss a step somewhere? There was a reboot before I tried to remove the instance.

Have a nice day,
Paul

ge...@riseup.net

Oct 8, 2015, 6:21:15 AM
to gan...@googlegroups.com
Hi Paul,

Which OS is this? Are you using systemd? If so, could you show the
output of
# systemctl status ganeti.service

Thanks,
Georg

Pál Tóbiás

Oct 8, 2015, 10:37:44 AM
to ganeti, ge...@riseup.net
Hi Georg,

This is on Ubuntu 15.10 Wily.

root@ganeti0:~# systemctl status ganeti.service
● ganeti.service - LSB: Ganeti Cluster Manager
   Loaded: loaded (/etc/init.d/ganeti)
   Active: active (running) since Thu 2015-10-08 14:03:55 ICT; 7h ago
     Docs: man:systemd-sysv-generator(8)
   CGroup: /system.slice/ganeti.service
           ├─2173 /usr/bin/python /usr/sbin/ganeti-noded
           ├─2187 /usr/sbin/ganeti-confd
           ├─2203 /usr/sbin/ganeti-wconfd
           ├─2225 /usr/bin/python /usr/sbin/ganeti-rapi
           ├─2243 /usr/sbin/ganeti-luxid
           ├─2275 /usr/sbin/ganeti-mond
           └─3221 /usr/sbin/ganeti-metad

Oct 08 14:03:53 ganeti0 ganeti[2021]: ...done.
Oct 08 14:03:53 ganeti0 ganeti[2021]: * ganeti-rapi...
Oct 08 14:03:53 ganeti0 ganeti[2021]: ...done.
Oct 08 14:03:53 ganeti0 ganeti[2021]: * ganeti-luxid...
Oct 08 14:03:54 ganeti0 ganeti[2021]: ...done.
Oct 08 14:03:54 ganeti0 ganeti[2021]: * ganeti-kvmd...
Oct 08 14:03:54 ganeti0 ganeti[2021]: ...done.
Oct 08 14:03:54 ganeti0 ganeti[2021]: * ganeti-mond...
Oct 08 14:03:55 ganeti0 ganeti[2021]: ...done.
Oct 08 14:03:55 ganeti0 systemd[1]: Started LSB: Ganeti Cluster Manager.


Ganeti seems to be running, kvm instances work properly.

Lucas, Sascha

Oct 9, 2015, 1:51:10 AM
to ganeti
Hi Paul,

> The .../ganeti.service/ directory exists, but there is no lxc directory in it:

It seems that either Ganeti or lxc (lxc-start etc.) gets confused by the systemd cgroup hierarchy. At least Ganeti expects the "lxc" directory under the hierarchy where its own process lives. Would you mind finding out where the lxc directory is, e.g. with "find /sys/ -type d -name lxc"? AFAIK the lxc directory is created by the lxc CLI tools, not by Ganeti. On my non-systemd system the lxc directory is at the root of the hierarchy (i.e. /sys/fs/cgroup/cpuset/lxc).

Thanks, Sascha.




Pál Tóbiás

Oct 9, 2015, 2:53:50 AM
to ganeti
On Friday, 9 October 2015 12:51:10 UTC+7, sascha wrote:
> Hi Paul,
>
> > The .../ganeti.service/ directory exists, but there is no lxc directory in it:
>
> It seems that either Ganeti or lxc (lxc-start etc.) gets confused by the systemd cgroup hierarchy. At least Ganeti expects the "lxc" directory under the hierarchy where its own process lives. Would you mind finding out where the lxc directory is, e.g. with "find /sys/ -type d -name lxc"? AFAIK the lxc directory is created by the lxc CLI tools, not by Ganeti. On my non-systemd system the lxc directory is at the root of the hierarchy (i.e. /sys/fs/cgroup/cpuset/lxc).
>
> Thanks, Sascha.


Hi Sascha,

I actually made it work, woohoo!

The problem is in `lib/hypervisor/hv_lxc.py` in `_GetCgroupSubsysDir()` https://github.com/ganeti/ganeti/blob/master/lib/hypervisor/hv_lxc.py#L342

For me `cls._GetOrPrepareCgroupSubsysMountPoint(subsystem)` returns `/sys/fs/cgroup/cpuset`

`cls._GetCurrentCgroupSubsysGroups().get(subsystem, "")` returns `system.slice/ganeti.service`

and then `return utils.PathJoin(subsys_dir, base_group, "lxc")` returns `/sys/fs/cgroup/cpuset/system.slice/ganeti.service/lxc`
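For readers following along, the path composition described above can be sketched in a few lines. This is only an illustration, not Ganeti's code: `lxc_cgroup_dir` and its `root_level` switch are hypothetical names, and plain `posixpath.join` stands in for `utils.PathJoin`.

```python
import posixpath


def lxc_cgroup_dir(mount_point, base_group, root_level=False):
    """Compose the directory where the per-instance cgroups are expected.

    mount_point: mount point of the subsystem, e.g. "/sys/fs/cgroup/cpuset"
    base_group:  group of the calling process for that subsystem, as listed
                 in /proc/self/cgroup, e.g. "/system.slice/ganeti.service"
    root_level:  hypothetical switch; if True, base_group is ignored and the
                 "lxc" directory is looked up directly under the mount point
    """
    parts = [mount_point]
    if not root_level:
        # Strip the slashes so the group is appended to the mount point
        # instead of being treated as an absolute path.
        stripped = base_group.strip("/")
        if stripped:
            parts.append(stripped)
    parts.append("lxc")
    return posixpath.join(*parts)


if __name__ == "__main__":
    mount = "/sys/fs/cgroup/cpuset"
    print(lxc_cgroup_dir(mount, "/system.slice/ganeti.service"))
    print(lxc_cgroup_dir(mount, "/system.slice/ganeti.service", root_level=True))
```

With a base group of "/" both variants give the same directory, which would explain why the lookup only fails when the daemon sits in a nested group.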

But my lxc dirs are here:

root@ganeti0:~# find /sys/fs/cgroup/ -name lxc -type d | sort
/sys/fs/cgroup/blkio/lxc
/sys/fs/cgroup/cpu,cpuacct/lxc
/sys/fs/cgroup/cpuset/lxc
/sys/fs/cgroup/devices/lxc
/sys/fs/cgroup/freezer/lxc
/sys/fs/cgroup/hugetlb/lxc
/sys/fs/cgroup/memory/lxc
/sys/fs/cgroup/net_cls,net_prio/lxc
/sys/fs/cgroup/perf_event/lxc
/sys/fs/cgroup/systemd/lxc


So the fix is this:

--- /usr/share/ganeti/2.15/ganeti/hypervisor/hv_lxc.py.orig 2015-10-09 10:56:31.853503230 +0700
+++ /usr/share/ganeti/2.15/ganeti/hypervisor/hv_lxc.py 2015-10-09 13:30:59.603841176 +0700
@@ -351,7 +351,7 @@
     subsys_dir = cls._GetOrPrepareCgroupSubsysMountPoint(subsystem)
     base_group = cls._GetCurrentCgroupSubsysGroups().get(subsystem, "")

-    return utils.PathJoin(subsys_dir, base_group, "lxc")
+    return utils.PathJoin(subsys_dir, "lxc")

   @classmethod
   def _GetCgroupParamPath(cls, param_name, instance_name=None):


I'm not sure why it is looking under `/sys/fs/cgroup/cpuset/system.slice/ganeti.service/lxc`. Is the lxc directory supposed to be there?

Lucas, Sascha

Oct 9, 2015, 4:49:03 AM
to gan...@googlegroups.com
Hi Paul,

> So the fix is this:
> - return utils.PathJoin(subsys_dir, base_group, "lxc")
> + return utils.PathJoin(subsys_dir, "lxc")

Good to hear that you found a workaround.

> I'm not sure why is it looking under `/sys/fs/cgroup/cpuset/system.slice/ganeti.service/lxc` is it supposed to be there?

I think there is a mismatch in cgroup hierarchy concepts between Ganeti and lxc. From my point of view, hierarchies are a fundamental concept in cgroups, as you can see from how systemd does nesting: system -> service. In theory it may be possible to run parallel LXC solutions like docker, LXD or even another instance of Ganeti. With the concept of nested cgroups, every parallel solution gets e.g. a fair share of CPU or block I/O bandwidth, independently of how many "children" it has. Instantiating the lxc group at the root (like lxc seems to do in your case) breaks this fair share. In other words, the share is no longer independent of the number of children (children in Ganeti terms are instances).
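The fair-share argument can be made concrete with a toy calculation. The sketch below is a deliberately simplified model, not how the kernel scheduler is implemented: it assumes every group at a given level has the same weight and is fully busy, and the tree layout and names ("ganeti", "other", "inst1", "ct1") are made up for illustration.

```python
def cpu_fractions(tree, total=1.0):
    """Return {leaf_name: fraction of CPU} for a cgroup tree.

    The tree is given as nested dicts; a leaf is a key whose value is None.
    Siblings are assumed to have equal weight and to be fully busy.
    """
    result = {}
    share = total / len(tree)
    for name, children in tree.items():
        if children is None:
            result[name] = share
        else:
            # A group's share is split again among its own children.
            result.update(cpu_fractions(children, share))
    return result


# Nested: each solution owns a group, its containers live below it.
nested = {
    "ganeti": {"inst1": None, "inst2": None, "inst3": None},
    "other": {"ct1": None},
}

# Flat: every container ends up in one "lxc" group at the root.
flat = {"lxc": {"inst1": None, "inst2": None, "inst3": None, "ct1": None}}


def total_for(fractions, prefix):
    """Sum the fractions of all leaves whose name starts with prefix."""
    return sum(v for k, v in fractions.items() if k.startswith(prefix))


if __name__ == "__main__":
    print("nested:", total_for(cpu_fractions(nested), "inst"))
    print("flat:  ", total_for(cpu_fractions(flat), "inst"))
```

In the nested layout the three instances together get about half of the CPU regardless of their number; in the flat layout they get about three quarters, simply because there are more of them.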

While the above is more theoretical and doesn't matter in real life(TM), Ganeti should be aware that lxc does not seem to respect cgroup hierarchies. Would you mind opening an issue, so that the Ganeti devs know about this?
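One way Ganeti could cope with both layouts is to probe for the directory instead of assuming a location. The following is a minimal sketch of that idea and not a proposed patch: `find_lxc_cgroup_dir` is a hypothetical helper, and whether preferring the nested location is the right policy is an open question.

```python
import os


def find_lxc_cgroup_dir(mount_point, base_group):
    """Return the first existing "lxc" cgroup directory, or None.

    The nested location (below the group of the calling process) is tried
    first, then the root of the subsystem hierarchy.
    """
    candidates = [
        os.path.join(mount_point, base_group.strip("/"), "lxc"),
        os.path.join(mount_point, "lxc"),
    ]
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


if __name__ == "__main__":
    print(find_lxc_cgroup_dir("/sys/fs/cgroup/cpuset",
                              "/system.slice/ganeti.service"))
```

On a host without cgroup v1 mounted at that path the call simply returns None.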

I wonder how Debian jessie with systemd behaves in this case. ATM I don't use Ganeti/lxc with Debian, but I found my ssh service under /sys/fs/cgroup/systemd/system.slice/ssh.service/. So I assume the same problem exists in jessie?

ge...@riseup.net

Oct 9, 2015, 5:24:50 AM
to gan...@googlegroups.com
On 2015-10-09 10:48, Lucas, Sascha wrote:
> I wonder, how debian jessie with systemd works in this case? ATM I
> don't use Ganeti/lxc with debian, but I found my ssh service under
> /sys/fs/cgroup/systemd/system.slice/ssh.service/. So I assume the same
> problem exists in jessie?

Works ootb:

# find /sys/fs/cgroup/ -name lxc -type d | sort
/sys/fs/cgroup/blkio/lxc
/sys/fs/cgroup/cpu,cpuacct/lxc
/sys/fs/cgroup/cpuset/lxc
/sys/fs/cgroup/devices/lxc
/sys/fs/cgroup/freezer/lxc
/sys/fs/cgroup/memory/lxc
/sys/fs/cgroup/net_cls,net_prio/lxc
/sys/fs/cgroup/perf_event/lxc
/sys/fs/cgroup/systemd/lxc

# ls -l /sys/fs/cgroup/systemd/system.slice/ganeti.service/
total 0
-rw-r--r-- 1 root root 0 Oct 9 11:22 cgroup.clone_children
-rw-r--r-- 1 root root 0 Oct 9 11:19 cgroup.procs
-rw-r--r-- 1 root root 0 Oct 9 11:22 notify_on_release
-rw-r--r-- 1 root root 0 Oct 9 11:22 tasks

Not really sure about the internals of cgroups and systemd, so can't
comment on this.

Pál Tóbiás

Oct 9, 2015, 6:01:55 AM
to ganeti, ge...@riseup.net

Georg, is that with ganeti 2.12.4, or with 2.15.1 from jessie-backports?

ge...@riseup.net

Oct 9, 2015, 6:18:18 AM
to gan...@googlegroups.com
(No need to Cc: me, I'm subscribed to the list.)

On 2015-10-09 12:01, Pál Tóbiás wrote:
> Georg, is that with ganeti 2.12.4, or with 2.15.1 from
> jessie-backports?

It's 2.15.1; I didn't have to touch any paths etc.


Lucas, Sascha

Oct 9, 2015, 7:31:07 AM
to gan...@googlegroups.com
Hi,

> > So I assume the same problem exists in jessie?

> Works ootb:

Interesting. The conclusion must be that /proc/self/cgroup differs between Debian 8 and Ubuntu 15.10. Would you (Paul and Georg) mind sending the contents of /proc/pid_of_ganeti_noded/cgroup?

ge...@riseup.net

Oct 9, 2015, 7:46:12 AM
to gan...@googlegroups.com
On 2015-10-09 13:30, Lucas, Sascha wrote:
>> > So I assume the same problem exists in jessie?
>
>> Works ootb:
>
> Interesting. The conclusion must be, that /proc/self/cgroup is
> different in debian 8 and ubuntu 15.10? Would you (Paul and Georg)
> mind sending the contents of /proc/pid_of_ganeti_noded/ cgroup?

# cat /proc/3996/cgroup
9:perf_event:/
8:net_cls,net_prio:/
7:freezer:/
6:devices:/
5:memory:/
4:blkio:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-1000.slice/session-6.scope

Lucas, Sascha

Oct 9, 2015, 8:21:58 AM
to gan...@googlegroups.com
Hi Georg,

> # cat /proc/3996/cgroup
...
> 1:name=systemd:/user.slice/user-1000.slice/session-6.scope

Are you sure that PID 3996 is the one from ganeti-noded? It looks more like a "user" process of UID 1000. You have the right PID if /proc/PID/cmdline is "/usr/bin/python^@/usr/sbin/ganeti-noded^@".

It's also possible to just enter: "cat /proc/$(pidof -x ganeti-noded)/cgroup"
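The "^@" in the cmdline above is just how a terminal shows the NUL byte that separates the arguments. A small sketch of the same check in Python, assuming a Linux /proc; the helper names are made up for this example:

```python
def cmdline_args(raw):
    """Split the raw bytes of /proc/<pid>/cmdline into a list of arguments."""
    return [arg.decode("utf-8", "replace") for arg in raw.split(b"\0") if arg]


def is_noded(raw):
    """True if one of the arguments is a path ending in /ganeti-noded."""
    return any(arg.endswith("/ganeti-noded") for arg in cmdline_args(raw))


def read_cmdline(pid):
    """Read the raw command line of a process (Linux only)."""
    with open("/proc/%d/cmdline" % pid, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    sample = b"/usr/bin/python\0/usr/sbin/ganeti-noded\0"
    print(cmdline_args(sample), is_noded(sample))
```

Checking the arguments rather than only the first element matters here, because the first element is the Python interpreter and not the daemon itself.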

ge...@riseup.net

Oct 9, 2015, 8:30:26 AM
to gan...@googlegroups.com
On 2015-10-09 14:21, Lucas, Sascha wrote:
>> # cat /proc/3996/cgroup
> ...
>> 1:name=systemd:/user.slice/user-1000.slice/session-6.scope
>
> Are you sure, that PID 3996 is the one from ganeti-noded? It looks
> more like a "user" process from UID 1000. You can check for the right
> PID if /proc/PID/cmdline is
> "/usr/bin/python^@/usr/sbin/ganeti-noded^@".

Yes, I'm sure. noded is running as root on my system. Again, it's ootb
like this, didn't change anything in this regard.

> It's also possible to just enter: "cat /proc/$(pidof -x
> ganeti-noded)/cgroup"

The pid changed because of a reboot I just did, apart from this, same
result:

# cat /proc/$(pidof -x ganeti-noded)/cgroup
9:perf_event:/
8:net_cls,net_prio:/
7:freezer:/
6:devices:/
5:memory:/
4:blkio:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-1000.slice/session-2.scope

Pál Tóbiás

Oct 9, 2015, 8:42:46 AM
to ganeti

> Interesting. The conclusion must be, that /proc/self/cgroup is different in debian 8 and ubuntu 15.10? Would you (Paul and Georg) mind sending the contents of /proc/pid_of_ganeti_noded/cgroup?

This is how it looks for me on 15.10 wily:
root@ganeti0:~# cat /proc/$(pidof -x ganeti-noded)/cgroup
10:blkio:/system.slice/ganeti.service
9:perf_event:/system.slice/ganeti.service
8:cpuset:/system.slice/ganeti.service
7:memory:/system.slice/ganeti.service
6:freezer:/system.slice/ganeti.service
5:cpu,cpuacct:/system.slice/ganeti.service
4:hugetlb:/system.slice/ganeti.service
3:devices:/system.slice/ganeti.service
2:net_cls,net_prio:/system.slice/ganeti.service
1:name=systemd:/system.slice/ganeti.service

(Sorry, can't control CC.)

Lucas, Sascha

Oct 9, 2015, 9:14:10 AM
to gan...@googlegroups.com
Hi,

Thanks a lot Georg and Paul. It's like I assumed:

ge...@riseup.net wrote:
> # cat /proc/$(pidof -x ganeti-noded)/cgroup
> 9:perf_event:/
> 8:net_cls,net_prio:/
> 7:freezer:/
> 6:devices:/
> 5:memory:/
> 4:blkio:/
> 3:cpu,cpuacct:/
> 2:cpuset:/
> 1:name=systemd:/user.slice/user-1000.slice/session-2.scope

Pál Tóbiás wrote:
> root@ganeti0:~# cat /proc/$(pidof -x ganeti-noded)/cgroup
> 10:blkio:/system.slice/ganeti.service
> 9:perf_event:/system.slice/ganeti.service
> 8:cpuset:/system.slice/ganeti.service
> 7:memory:/system.slice/ganeti.service
> 6:freezer:/system.slice/ganeti.service
> 5:cpu,cpuacct:/system.slice/ganeti.service
> 4:hugetlb:/system.slice/ganeti.service
> 3:devices:/system.slice/ganeti.service
> 2:net_cls,net_prio:/system.slice/ganeti.service
> 1:name=systemd:/system.slice/ganeti.service

With debian 8 (Georg) the cgroup hierarchy of ganeti-noded is "/" and with ubuntu 15.10 it is "/system.slice/ganeti.service" (as seen in the original post for the cpuset subsystem).
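To compare the two outputs programmatically, each line of /proc/PID/cgroup can be split into its three fields (hierarchy ID, controller list, group path). The sketch below is an illustration only and does not claim to match what `_GetCurrentCgroupSubsysGroups()` does internally; `parse_proc_cgroup` is a made-up name.

```python
def parse_proc_cgroup(text):
    """Map each controller name to the group path of the process.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path";
    co-mounted controllers such as "cpu,cpuacct" get one entry each.
    """
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The path may itself contain colons, so split at most twice.
        _hier_id, controllers, path = line.split(":", 2)
        for name in controllers.split(","):
            groups[name] = path
    return groups


if __name__ == "__main__":
    debian = "2:cpuset:/\n1:name=systemd:/user.slice/user-1000.slice/session-2.scope\n"
    ubuntu = "8:cpuset:/system.slice/ganeti.service\n1:name=systemd:/system.slice/ganeti.service\n"
    print(parse_proc_cgroup(debian)["cpuset"])
    print(parse_proc_cgroup(ubuntu)["cpuset"])
```

The cpuset entry is the one that ends up in the failing path from the original post.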

Don't know if it is really lxc's fault, but at least Ganeti should be aware of this problem.

What is somewhat strange is that on Debian systemd puts it in the user part of the name hierarchy. @Georg: might it be possible that ganeti-noded was started from a shell opened by the user with ID 1000? Situations where this may happen are possibly "cluster init" or "node add", so that ganeti-noded inherits the cgroup hierarchy from its parent process. I wonder whether the name hierarchy would change to "/system.slice/ganeti.service" if you restarted Ganeti via systemctl?

ge...@riseup.net

Oct 9, 2015, 9:36:54 AM
to gan...@googlegroups.com
On 2015-10-09 15:13, Lucas, Sascha wrote:
> Somewhat strange is, that in debian systemd puts it in the user name
> hierarchy. @Georg: might it be possible that ganeti-noded is started from
> a shell initiated by the user with ID 1000? Situations where this may
> happen are possibly "cluster init" or "node add". So that ganeti-noded
> inherits the cgroup hierarchy from its parent process?

- At first I initiated the cluster via: ssh as user && sudo -i &&
gnt-cluster init [...]
- Still, the uid (should have?) changed because of sudo -i
- After this I rebooted, the result was the just posted extract of
/proc/1728/cgroup, especially
1:name=systemd:/user.slice/user-1000.slice/session-2.scope
- ganeti(-noded) was started automatically via systemd.
- (Not sure at this point, why it's (still) associated with the user?)

- Second run, which I just did now: ssh as root && gnt-cluster init
[...] && reboot, results in:
# cat /proc/$(pidof -x ganeti-noded)/cgroup
9:perf_event:/
8:net_cls,net_prio:/
7:freezer:/
6:devices:/
5:memory:/
4:blkio:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/system.slice/ganeti.service

> I wonder if you restart Ganeti via systemctl if the name hierarchy
> would change to "/system.slice/ganeti.service"?

You're somewhat right. Still, I don't understand why this isn't
happening the first way.
But, as I wrote, I've got no clue (yet) about systemd and cgroup
internals.





Ansgar Jazdzewski

Sep 1, 2017, 1:18:25 AM
to ganeti
Hi,

I know this is a bit old, but we are facing the same issue on Ubuntu 16.04. Did someone make some progress or have a patch?

thanks,
Ansgar

Ansgar Jazdzewski

Sep 1, 2017, 4:09:12 AM
to Ganeti
Hi *,

one more finding:

lxc-ls only works if the container config is also in
"/var/lib/lxc/<INSTANCE>/config"

utils.WriteFile(conf_file, data=conf)
try:
  os.stat('/var/lib/lxc/' + instance.name)
except OSError:
  # The directory does not exist yet, so create it.
  os.mkdir('/var/lib/lxc/' + instance.name)
utils.WriteFile('/var/lib/lxc/' + instance.name + '/config', data=conf)

A dirty hack, but now instances are started.

