inotify watches


ya...@vapourforge.com

Jan 18, 2013, 8:50:25 PM
to gan...@googlegroups.com
I just rebooted my cluster and got errors about running out of inotify
watches:
"inotify Too many open files (EMFILE)"

I upped the max watches but it didn't fix it.

I found that upping
/proc/sys/fs/inotify/max_user_instances
from 128 to 500 seems to have fixed it. I don't know if it's permanent or
not yet, but we'll find out soon.
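For anyone following along, a sketch of the commands involved (the 500 value is the one mentioned above; persisting via /etc/sysctl.conf is an assumption about your setup):

```shell
# Inspect the current per-user inotify limits
cat /proc/sys/fs/inotify/max_user_instances   # instances per UID (default 128)
cat /proc/sys/fs/inotify/max_user_watches     # watches per UID

# Raise the instance limit at runtime (as root; lost on reboot)
sysctl -w fs.inotify.max_user_instances=500

# Persist it across reboots
echo 'fs.inotify.max_user_instances = 500' >> /etc/sysctl.conf
sysctl -p
```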

Any idea if this is a problem, or if something has just outgrown something
else?
I did boot a bunch (10 or so) of VMs at the same time.
Running Ganeti 2.5 on a 12.04 host.




Guido Trotter

Jan 19, 2013, 8:05:06 AM
to gan...@googlegroups.com
We haven't seen this problem, I have to say.
Could you tell us a bit more:
1) which hypervisor do you use?
2) did you have anything that was "watching" the jobs as they ran?
3) do you have confd enabled/running?
4) could you check in /proc/<your confd pid>/fd for how many inotify
sockets are open?
5) how about in your /proc/<your masterd pid>/fd?
How about:
find /proc/ -type l -exec ls -l {} \; 2>/dev/null | grep anon_inode:inotify

Who is the culprit? (the process or group of processes with many inotify fds open)
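To make the culprit easier to spot, the find output can be aggregated per PID (a sketch assuming GNU coreutils; restricting to /proc/[0-9]*/fd skips the duplicate per-task entries):

```shell
# List inotify fds, extract the PID from each /proc/<pid>/fd/<n> path,
# and count per process; the top of the list is the likely culprit
find /proc/[0-9]*/fd -type l -exec ls -l {} \; 2>/dev/null \
  | grep anon_inode:inotify \
  | awk -F/ '{print $3}' \
  | sort | uniq -c | sort -rn | head
```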

Thanks,

Guido

Guido Trotter

Jan 19, 2013, 8:13:09 AM
to gan...@googlegroups.com
(note: on my cluster I see a few too many open by ganeti-masterd, how
about on yours? I'll try to investigate this a bit more)

Thanks,

Guido


> Thanks,
>
> Guido

Jake Anderson

Jan 19, 2013, 5:27:21 PM
to gan...@googlegroups.com, Guido Trotter
OK, it looks like this is unrelated to Ganeti, or at least not directly.
The result of that find is this:
lr-x------ 1 root root 64 Jan 20 08:32 /proc/1/task/1/fd/5 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:32 /proc/1/task/1/fd/6 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 19 12:03 /proc/1/fd/5 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 19 12:03 /proc/1/fd/6 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:32 /proc/617/task/617/fd/6 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 19 12:03 /proc/617/fd/6 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:32 /proc/3120/task/3120/fd/5 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 19 12:03 /proc/3120/fd/5 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:33 /proc/12676/task/12676/fd/6 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 19 12:03 /proc/12676/fd/6 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:33 /proc/12863/task/12863/fd/6 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 19 12:03 /proc/12863/fd/6 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:33 /proc/16275/task/16275/fd/7 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:33 /proc/16275/task/16275/fd/8 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:33 /proc/16275/fd/7 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:33 /proc/16275/fd/8 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:33 /proc/16304/task/16304/fd/7 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:33 /proc/16304/task/16304/fd/8 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:33 /proc/16304/fd/7 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:33 /proc/16304/fd/8 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:51 /proc/26616/task/26616/fd/4 ->
anon_inode:inotify
lr-x------ 1 root root 64 Jan 20 08:51 /proc/26616/fd/4 ->
anon_inode:inotify

If I up max_user_instances, then do a tail -f of syslog, it's OK; but
if I close it and re-open it as many times as I increased
max_user_instances by, then it fails.
So it sounds like something isn't releasing something.
I have no clue about inotify; any pointers on where to start looking
would be really helpful.

Guido Trotter

Jan 20, 2013, 1:06:13 AM
to Jake Anderson, gan...@googlegroups.com
Could you try to reproduce the failure and when it's failing do the same check?
Also could you tell us if you can reproduce it?

Thanks,

Guido

Jake Anderson

Jan 20, 2013, 2:00:33 AM
to gan...@googlegroups.com, Guido Trotter
It looks like it's not releasing a watch or something.
watch inotifywatch -v -e access -e modify -t 1 -r ~

gives me

Couldn't initialize inotify. Are you running Linux 2.6.13 or later, and
was the
CONFIG_INOTIFY option enabled when your kernel was compiled? If so,
something mysterious has gone wrong. Please e-mail ro...@mcgovern.id.au
and mention that you saw this message.

To cause it to happen I ran
watch inotifywatch -v -e access -e modify -t 1 -r ~
for a while

Guido Trotter

Jan 20, 2013, 2:16:05 AM
to Jake Anderson, gan...@googlegroups.com
On Sun, Jan 20, 2013 at 8:00 AM, Jake Anderson <ya...@vapourforge.com> wrote:
> watch inotifywatch -v -e access -e modify -t 1 -r ~

No, I mean in Ganeti: if, for example, restarting Ganeti and then redoing
what you did (e.g. starting multiple instances at the same time)
creates the same problem.

Thanks,

Guido

Jake Anderson

Jan 20, 2013, 6:52:43 AM
to gan...@googlegroups.com, Guido Trotter
The problem doesn't appear linked to Ganeti.
Shutting it (Ganeti) down doesn't appear to make any difference.
Something somewhere else is stuffed. I think I'll just blow the host away
and start over, unless you have any suggestions for how to track it down?

Michael Hanselmann

Jan 21, 2013, 5:33:35 AM
to gan...@googlegroups.com
2013/1/19 Guido Trotter <ultr...@gmail.com>:
> On Sat, Jan 19, 2013 at 2:50 AM, <ya...@vapourforge.com> wrote:
>> i found upping
>> /proc/sys/fs/inotify/max_user_instances
>> from 128 to 500 seems to have fixed it, I don't know if its permanent or
>> not yet but we'll find out soon.
>
> We haven't seen this problem, I have to say.

We have seen some inotify problems:
<http://code.google.com/p/ganeti/issues/detail?id=218>. Some other
process(es) on the system are holding onto inotify handles. The
solution is likely to increase the number in
/proc/sys/fs/inotify/max_user_watches permanently.
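A sketch of making that change permanent (the file name under /etc/sysctl.d/ and the 65536 value are illustrative; on 12.04, appending to /etc/sysctl.conf works too):

```shell
# Persist a higher watch limit so it survives reboots
echo 'fs.inotify.max_user_watches = 65536' > /etc/sysctl.d/60-inotify.conf
sysctl -p /etc/sysctl.d/60-inotify.conf
```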

Michael

Jake Anderson

Jan 21, 2013, 4:43:04 PM
to gan...@googlegroups.com

The problem was that increasing it didn't fix it: if I increased it from 128 (or whatever the default was) to 5000, after a day or so it would run out again. So I would increase it to 10000, and in another day it would run out again. Running tail -f repeatedly would cause it to run out.


Sent from Samsung Galaxy Note

gertk

Jan 23, 2013, 2:15:08 AM
to gan...@googlegroups.com
I saw exactly the same problem on my test server (Ubuntu 12.04, Ganeti 2.5.2). After rebooting the server it's gone.

Gert

gertk

Jan 23, 2013, 2:20:35 AM
to gan...@googlegroups.com
Last login: Wed Jan 23 07:42:00 2013 from 10.100.201.5
root@node1:~# gnt-instance add -o snf-image+default --os-parameters img_passwd=Password,img_format=extdump,img_id=debian_base-6.0-5-x86_64 -t plain --disk=0:size=10G --net 0:link=prv5  testvm1
Unhandled protocol error while talking to the master daemon:
Caught exception: Cannot initialize new instance of inotify, Errno=Too many open files (EMFILE)
root@node1:~# /etc/init.d/ganeti restart
 * Restarting Ganeti cluster                                                     * ganeti-confd...                                                       [ OK ] 
 * ganeti-rapi...                                                        [ OK ] 
 * ganeti-masterd...                                                     [ OK ] 
 * ganeti-noded...                                                       [ OK ] 
 * ganeti-noded...                                                       [ OK ] 
 * ganeti-masterd...                                                     [ OK ] 
 * ganeti-rapi...                                                               Error when starting daemon process: 'None (errno=None)'
                                                                         [fail]
 * ganeti-confd...                                                              Error when starting daemon process: 'None (errno=None)'
                                                                         [fail]
root@node1:~# /etc/init.d/ganeti restart
 * Restarting Ganeti cluster                                                     * ganeti-confd...                                                       [ OK ] 
 * ganeti-rapi...                                                        [ OK ] 
 * ganeti-masterd...                                                     [ OK ] 
 * ganeti-noded...                                                       [ OK ] 
 * ganeti-noded...                                                       [ OK ] 
 * ganeti-masterd...                                                     [ OK ] 
 * ganeti-rapi...                                                               Error when starting daemon process: 'None (errno=None)'
                                                                         [fail]
 * ganeti-confd...                                                              Error when starting daemon process: 'None (errno=None)'
                                                                         [fail]
root@node1:~# nano /proc/sys/fs/inotify/max_user_watches 
root@node1:~# reboot

Guido Trotter

Jan 23, 2013, 3:09:21 AM
to gan...@googlegroups.com
On Wed, Jan 23, 2013 at 7:20 AM, gertk <gkar...@gmail.com> wrote:
> Last login: Wed Jan 23 07:42:00 2013 from 10.100.201.5
> root@node1:~# gnt-instance add -o snf-image+default --os-parameters
> img_passwd=Password,img_format=extdump,img_id=debian_base-6.0-5-x86_64 -t
> plain --disk=0:size=10G --net 0:link=prv5 testvm1
> Unhandled protocol error while talking to the master daemon:
> Caught exception: Cannot initialize new instance of inotify, Errno=Too many
> open files (EMFILE)

Could you check which processes had the inotify file descriptors open, here?
And here.

> root@node1:~# nano /proc/sys/fs/inotify/max_user_watches
> root@node1:~# reboot
>

Thanks!

Guido

gertk

Jan 23, 2013, 6:15:03 AM
to gan...@googlegroups.com


On Wednesday, 23 January 2013 at 10:09:21 UTC+2, Guido Trotter wrote:
On Wed, Jan 23, 2013 at 7:20 AM, gertk <gkar...@gmail.com> wrote:
> Last login: Wed Jan 23 07:42:00 2013 from 10.100.201.5
> root@node1:~# gnt-instance add -o snf-image+default --os-parameters
> img_passwd=Password,img_format=extdump,img_id=debian_base-6.0-5-x86_64 -t
> plain --disk=0:size=10G --net 0:link=prv5  testvm1
> Unhandled protocol error while talking to the master daemon:
> Caught exception: Cannot initialize new instance of inotify, Errno=Too many
> open files (EMFILE)

Could you check which processes had the inotify file descriptors open, here?

I would like to, but I do not know anything about inotify or how to check inotify processes.

How can I check them?


root@node1:~# find /proc/ -type l -exec ls -l {} \; 2>/dev/null | grep anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/1/task/1/fd/5 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/1/task/1/fd/6 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/1/fd/5 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/1/fd/6 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/434/task/434/fd/6 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/434/fd/6 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/1224/task/1224/fd/5 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/1224/fd/5 -> anon_inode:inotify
lr-x------ 1 whoopsie whoopsie 64 Jan 23 13:09 /proc/1482/task/1482/fd/3 -> anon_inode:inotify
lr-x------ 1 whoopsie whoopsie 64 Jan 23 13:09 /proc/1482/task/1545/fd/3 -> anon_inode:inotify
lr-x------ 1 whoopsie whoopsie 64 Jan 23 13:09 /proc/1482/fd/3 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/2599/task/2599/fd/6 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/2599/fd/6 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/2914/task/2914/fd/7 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/2914/task/2914/fd/8 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/2914/fd/7 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/2914/fd/8 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/2943/task/2943/fd/7 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/2943/task/2943/fd/8 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/2943/fd/7 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/2943/fd/8 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/3708/task/3708/fd/6 -> anon_inode:inotify
lr-x------ 1 root root 64 Jan 23 13:09 /proc/3708/fd/6 -> anon_inode:inotify


 

gertk

Jan 24, 2013, 4:00:53 PM
to gan...@googlegroups.com
Hello Valen,

What kernel version do you use? After downgrading the kernel to 3.2.0-31-generic on my test node this morning, I have no inotify errors (at least I have not seen them yet). With kernel 3.2.0-36-generic I got errors every hour.

I've been using Ubuntu 12.04 (3.2.0-31-generic) with Ganeti 2.5.2 on a two-node cluster for about half a year and have not seen inotify errors.

Regards,
Gert

On Saturday, 19 January 2013 at 3:50:25 UTC+2, Valen wrote:

Jake Anderson

Jan 24, 2013, 4:38:35 PM
to gan...@googlegroups.com, gertk
3.2.0-36-generic #57-Ubuntu SMP Tue Jan 8 21:44:52 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
I did a format and re-install and it seemed to work.
It was fairly painless actually:
migrate
re-install
--readd
and migrate stuff back

All the LVM disks just needed to be updated rather than starting over; it was quite nice.
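The flow above, sketched as Ganeti commands (node names are hypothetical; check the gnt-node man page for your version before relying on these):

```shell
# Move primary instances off the node that will be reinstalled
gnt-node migrate -f badnode.example.com
# ...reinstall the OS on badnode and set Ganeti up again...
# Re-add the node to the cluster, reusing its existing configuration
gnt-node add --readd badnode.example.com
# Migrate instances back once the node is healthy again
gnt-node migrate -f othernode.example.com
```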

jake anderson

Jan 24, 2013, 9:43:09 PM
to gan...@googlegroups.com
OK, I just saw this happen on the secondary node, which I had promoted to master, running .36.
I'm going to evacuate it and see if I can break it reliably.

jake anderson

Jan 24, 2013, 9:51:46 PM
to gan...@googlegroups.com
Found the blighter.
It's a kernel bug by the look of it:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1101666


On 25/01/2013 8:00 AM, gertk wrote:

gertk

Jan 25, 2013, 3:19:17 AM
to gan...@googlegroups.com
I don't like such kernel updates/upgrades; I hope they will fix it. :(

Jonathan Bayer

Jan 25, 2013, 9:31:29 AM
to gan...@googlegroups.com, jake anderson
Is this specific to Ubuntu, or is it a generic bug?

Any idea if it is in any RedHat kernels?


JBB

gertk

Jan 25, 2013, 11:32:35 AM
to gan...@googlegroups.com, jake anderson

First, thanks to Eugene and Adar for reporting and investigation.
I have had some time now to debug this and I found the reason for the bug. Obviously something went wrong during
the port of the patch series mentioned above from mainline to ubuntu kernel:
There is the function fsnotify_destroy() which is never called in ubuntu. But this function ensures
that all pending events are flushed and thereby ref counts on a fsnotify group held by those events are released.
So what has to be done is call fsnotify_destroy in inotify_release(). Otherwise there will always be references held to
the inotify group and the group will never get destroyed - which sooner or later results in a number of alive groups that
exceeds the allowed max number.
The same flaw can be found in the fanotify code. I will attach a patch that should fix the ref counts for both inotify and fanotify.

gertk

Jan 25, 2013, 11:42:13 AM
to gan...@googlegroups.com, jake anderson
Oh, I accidentally sent the last email before I meant to.
I think it's specific to Ubuntu.