Issue 218 in ganeti: errors running ganeti jobs if /proc/sys/fs/inotify/max_user_watches is too low


gan...@googlecode.com

Feb 24, 2012, 8:32:36 AM
to ganeti...@googlegroups.com
Status: New
Owner: ----

New issue 218 by poberhol...@google.com: errors running ganeti jobs if
/proc/sys/fs/inotify/max_user_watches is too low
http://code.google.com/p/ganeti/issues/detail?id=218

What software version are you running? Please provide the output
of "gnt-cluster --version" and "gnt-cluster version".
Software version: 2.4.5
Internode protocol: 2040000
Configuration format: 2040000
OS api version: 20
Export interface: 0

What distribution are you using?
Debian Squeeze 6.0

What steps will reproduce the problem?
1. lower inotify max_user_watches
(echo 4000 > /proc/sys/fs/inotify/max_user_watches)
2. run gnt-cluster verify
(if no error appears, lower max_user_watches even more)

What is the expected output? What do you see instead?
Ganeti should warn when max_user_watches is too low.
Instead, this error is reported:
Error checking job status: Job with id 6311 lost

Warning that inotifywatch /tmp prints when the limit is too low:
Failed to watch /tmp/; upper limit on inotify watches reached!
Please increase the amount of inotify watches allowed per user via
`/proc/sys/fs/inotify/max_user_watches'.

Please provide any additional information below.
I have Crashplan installed on the same machine as Ganeti. Crashplan uses
inotify as well and already increased the default of 8192 to 16384 in
/etc/sysctl.conf.
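
The kind of early warning requested in this report could be sketched roughly as follows. This is only an illustration, not Ganeti code: the function names and the threshold constant are hypothetical, and the only detail taken from the report is the /proc path.

```python
import sys

# Path taken from the report above.
MAX_USER_WATCHES_PATH = "/proc/sys/fs/inotify/max_user_watches"

# Assumption for illustration only: the report mentions 8192 as the default,
# but nothing in the thread says what minimum Ganeti actually needs.
HYPOTHETICAL_MIN_WATCHES = 8192


def read_max_user_watches(path=MAX_USER_WATCHES_PATH):
    """Return the per-user inotify watch limit stored in 'path' as an int."""
    with open(path) as handle:
        return int(handle.read().strip())


def watch_limit_warning(limit, minimum=HYPOTHETICAL_MIN_WATCHES):
    """Return a warning string if 'limit' is below 'minimum', else None."""
    if limit < minimum:
        return ("inotify max_user_watches is %d, below the assumed minimum "
                "of %d" % (limit, minimum))
    return None


if __name__ == "__main__":
    try:
        message = watch_limit_warning(read_max_user_watches())
    except (IOError, OSError, ValueError) as err:
        # The file is missing on non-Linux systems or may be unreadable.
        message = "could not read %s: %s" % (MAX_USER_WATCHES_PATH, err)
    if message:
        sys.stderr.write(message + "\n")
```

A configured limit says nothing about how many watches other programs (such as Crashplan here) have already consumed, so a check like this can only catch a limit that is too low from the start.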

gan...@googlecode.com

Feb 27, 2012, 11:39:37 AM
to ganeti...@googlegroups.com

Comment #1 on issue 218 by han...@google.com: errors running ganeti jobs if
/proc/sys/fs/inotify/max_user_watches is too low
http://code.google.com/p/ganeti/issues/detail?id=218

Can Crashplan exclude a specific directory? Excluding /var/lib/ganeti/queue
would seem sensible. There's a lot of I/O with atomically replaced files
going on.

gan...@googlecode.com

Mar 1, 2012, 4:15:26 AM
to ganeti...@googlegroups.com

Comment #2 on issue 218 by poberhol...@google.com: errors running ganeti
jobs if /proc/sys/fs/inotify/max_user_watches is too low
http://code.google.com/p/ganeti/issues/detail?id=218

Crashplan can exclude that directory, yes. But that doesn't change the fact
that Ganeti will crash when max_user_watches is set too low.

gan...@googlecode.com

Mar 1, 2012, 6:07:03 AM
to ganeti...@googlegroups.com
Updates:
Status: Accepted
Owner: han...@google.com

Comment #3 on issue 218 by han...@google.com: errors running ganeti jobs if
/proc/sys/fs/inotify/max_user_watches is too low
http://code.google.com/p/ganeti/issues/detail?id=218

Yes, we will look into improving error reporting and handling.

gan...@googlecode.com

Dec 13, 2012, 6:49:52 AM
to ganeti...@googlegroups.com
Updates:
Status: Fixed

Comment #6 on issue 218 by han...@google.com: errors running ganeti jobs if
/proc/sys/fs/inotify/max_user_watches is too low
http://code.google.com/p/ganeti/issues/detail?id=218

commit 383477e
Author: Michael Hanselmann <han...@google.com>
Date: Mon Dec 10 14:32:13 2012 +0100

jqueue: Improve inotify error reporting

This addresses issue 218. When the number of inotify watches is
exhausted, for example by being set too low from the beginning or by
other programs, waiting for a job to change would just report a lost job
(e.g. “Error checking job status: Job with id 7817 lost”).

This patch changes the job watcher to no longer catch
“errors.InotifyError” and, this is by far the larger part of this patch,
adds unittests for this situation.

Signed-off-by: Michael Hanselmann <han...@google.com>
Reviewed-by: Guido Trotter <ultr...@google.com>
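
The behaviour change described in the commit message can be illustrated with a small sketch. It is not the actual jqueue code: InotifyError below is a local stand-in for the errors.InotifyError named in the commit, and failing_wait, wait_swallowing and wait_propagating are hypothetical names chosen for this example.

```python
class InotifyError(Exception):
    """Local stand-in for the errors.InotifyError named in the commit."""


def failing_wait():
    """Hypothetical wait primitive that simulates exhausted inotify watches."""
    raise InotifyError("upper limit on inotify watches reached")


def wait_swallowing(wait_fn):
    """Catch the inotify error, so the caller only sees 'no result'."""
    try:
        return wait_fn()
    except InotifyError:
        # The caller cannot tell this apart from a job that does not exist,
        # which is how a misleading "job lost" message can arise.
        return None


def wait_propagating(wait_fn):
    """Do not catch the inotify error; it reaches the caller unchanged."""
    return wait_fn()


if __name__ == "__main__":
    print(wait_swallowing(failing_wait))
    try:
        wait_propagating(failing_wait)
    except InotifyError as err:
        print("inotify problem reported to caller: %s" % err)
```

With the error propagated, the caller can report the real cause instead of a lost job.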

