I have a normal resource definition to ensure that the user ‘steves’ does not exist:
user { 'steves': ensure => absent }
On one puppet client, even though the user definitely does not exist, puppet still tries to remove it, giving this error:
change from present to absent failed: Could not delete user steves: Execution of '/usr/sbin/userdel steves' returned 6: userdel: user steves does not exist
Why is the puppet agent still trying to delete the user when it definitely is not there? I have checked /etc/shadow, /etc/passwd and /etc/group and it is not mentioned in any of them.
This also affects another user, but not every one! I have a list of about six that are deleted if found on all hosts, but on this host only two of them result in this error.
The Puppet agent is running on RHEL 5, 32-bit.
Further tests show this happens when the /etc/shadow file ends without a final \n (at least, the problem went away when I added one), even though the file got into that state through the normal tools being used to add a new user. I’m not sure why that should make puppet think these users exist. Also, adding a new user locally may well put the file back in the same situation.
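Since the symptom went away once a final newline was added, it may help to check the account files for that condition directly. The Python sketch below is an illustrative helper, not part of Puppet or the OS tools; the function names are made up for this example. Reading or modifying /etc/shadow requires root, and it would be prudent to keep a backup before appending anything.

```python
import os


def ends_with_newline(path: str) -> bool:
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return True  # an empty file has no unterminated last line
        f.seek(-1, os.SEEK_END)  # position on the final byte
        return f.read(1) == b"\n"


def add_final_newline(path: str) -> bool:
    """Append one newline if the file lacks it; return True if the file changed."""
    if ends_with_newline(path):
        return False
    with open(path, "ab") as f:
        f.write(b"\n")
    return True
```

For example, running `ends_with_newline("/etc/shadow")` as root on the affected host would show whether the file is currently in the state described above.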
Steve Shipway wrote:
> Further tests show this happens when the /etc/shadow file ends without a final \n (at least, the problem went away when I added one), though this happens when the normal tools are used to add a new user. Not sure why it should cause puppet to think these users exist. Also, adding a new user locally may well end up in the same situation again.
Ah. That probably reveals the cause of the trouble: you had a corrupt database file, which presumably led to NSS returning the user from getpwent or equivalent, while the tool that puppet invokes failed to delete the entry. That would lead to exactly this pattern of failure.
Out of curiosity, when that problem happened, were you running a long-running puppet daemon, or was a new puppet process started (and exited) for every run?
The problem occurs with both a long-running daemon, and if I run from the command line (puppet agent -t). I’ve tried restarting the daemon a few times, too, though that doesn’t fix it.
The problem is not that the user exists and puppet is unable to delete it, but that the user DOESN’T exist and puppet THINKS it does, so it tries to delete it (and fails, because it doesn’t exist…)
The ‘steves’ user doesn’t exist anywhere that I can see, and all the OS commands (adduser, deluser, usermod etc.) agree that the file is fine and the user doesn’t exist. I would not count this as a corrupt file. However, puppet seems to think that it does, but only when there is no terminating newline in the file.
Since the passwd file is only ever manipulated by the OS tools, this could well happen again. I believe puppet should handle this correctly, and certainly default to ‘user does not exist’ rather than ‘user exists’.
Steve Shipway wrote:
> The problem occurs with both a long-running daemon, and if I run from the command line (puppet agent -t). I’ve tried restarting the daemon a few times, too, though that doesn’t fix it.
Thanks. That helps confirm my expectation that…
> The problem is not that the user exists and puppet is unable to delete it, but that the user DOESN’T exist and puppet THINKS it does, so it tries to delete it (and fails, because it doesn’t exist…)
…this is a problem with the NSS portion of your C library. Puppet delegates directly to the get*nam family of functions to determine whether the named entity exists. In your case the system is convinced that it does, and by confirming that this happens even after restarting the Puppet agent, or when running it from the command line, you showed that the result isn’t cached inside the Puppet process.
(Technically, we delegate to get*nam in the Ruby Etc module, but that doesn’t do much beyond delegate to NSS as far as I know.)
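To see what the C library reports for a given name outside Puppet, the same kind of lookup can be run by hand. The sketch below uses Python’s standard pwd module, which, like Ruby’s Etc module, wraps the libc getpwnam call; it illustrates the check described above and is not Puppet’s actual code. The helper names are hypothetical.

```python
import pwd
from typing import Optional


def lookup_user(name: str) -> Optional[pwd.struct_passwd]:
    """Ask NSS (via libc getpwnam) for a user; None means "does not exist"."""
    try:
        return pwd.getpwnam(name)
    except KeyError:
        # getpwnam found no entry in any configured NSS source
        return None


def user_exists(name: str) -> bool:
    """Existence as the C library sees it, not as the flat files look."""
    return lookup_user(name) is not None
```

If `user_exists("steves")` returns True on the affected host while /etc/passwd contains no such line, the disagreement is happening below Puppet.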
> The ‘steves’ user doesn’t exist anywhere that I can see, and all the OS commands (adduser, deluser, usermod etc.) agree that the file is fine and the user doesn’t exist. I would not count this as a corrupt file. However, puppet seems to think that it does, but only when there is no terminating newline in the file.
libc does; the external tools you reference presumably manipulate the databases directly (which is reasonable, because libc / POSIX have no standard write operations for these databases), and their implementation behaves differently.
I wonder if no-such-user would also be considered valid by Puppet when this problem was in place. :)
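One way to observe this disagreement is to compare the two views of the same name: what getpwnam returns through NSS, and what a plain read of /etc/passwd contains. The sketch below is a hypothetical diagnostic (the function names are invented for this example), and its file parser is deliberately naive; it is not how userdel or glibc actually parse the file.

```python
import pwd
from typing import Dict, Set


def names_in_flat_file(path: str = "/etc/passwd") -> Set[str]:
    """Read the colon-separated file directly, as file-editing tools might."""
    names: Set[str] = set()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:  # the last line is yielded even without a final "\n"
            line = line.rstrip("\n")
            if line:
                names.add(line.split(":", 1)[0])
    return names


def compare_views(name: str, path: str = "/etc/passwd") -> Dict[str, bool]:
    """Report whether `name` is visible via NSS and/or in the flat file."""
    try:
        pwd.getpwnam(name)
        in_nss = True
    except KeyError:
        in_nss = False
    return {"nss": in_nss, "file": name in names_in_flat_file(path)}
```

A result of `{"nss": True, "file": False}` for `compare_views("steves")` would match the symptom reported in this ticket.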
> Since the passwd file is only manipulated by OS tools, this could well happen other times. I believe that puppet should correctly handle this, and certainly default to ‘user does not exist’ rather than ‘user exists’.
We trust the NSS portion of the OS, because we pretty much have to: anything else would miss users that come from, e.g., LDAP or other data sources configured in NSS.
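Which sources count as “the NSS” is configured per database in /etc/nsswitch.conf, for example with a line such as `passwd: files ldap`. A small, hypothetical helper to list them is sketched below; it is a simplified reader of that file’s format, not glibc’s parser. If the passwd entry lists anything besides `files`, a user removed from /etc/passwd may still be returned by another source.

```python
import re
from typing import Dict, List


def nss_sources(path: str = "/etc/nsswitch.conf") -> Dict[str, List[str]]:
    """Map each NSS database (passwd, group, shadow, ...) to its service list."""
    sources: Dict[str, List[str]] = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.split("#", 1)[0].strip()  # drop comments
            if ":" not in line:
                continue
            db, rest = line.split(":", 1)
            # drop [STATUS=action] items, keeping only service names
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            sources[db.strip()] = rest.split()
    return sources
```

On the affected host, `nss_sources()["passwd"]` would show which services getpwnam consults, and in what order.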
I couldn’t replicate this issue by removing the newline at the end of /etc/shadow on a Debian 5.0 VM with a fresh installation of Puppet Enterprise 2.0.0. Steve, are you able to put together a reproducible case? Here’s what I did:
agent2:~# puppet --version
2.7.6 (Puppet Enterprise 2.0.0)
agent2:~# ls -l /etc/shadow*
-rw-r--r-- 1 root root 730 2011-12-07 22:38 /etc/shadow
-rw------- 1 root root 730 2011-12-07 22:38 /etc/shadow-
agent2:~# diff /etc/shadow /etc/shadow-
agent2:~# head -c 729 /etc/shadow > /etc/shadow.nonewline
agent2:~# diff /etc/shadow /etc/shadow.nonewline
26c26
< pe-puppet:*:15315:0:99999:7:::
---
> pe-puppet:*:15315:0:99999:7:::
\ No newline at end of file
agent2:~# mv /etc/shadow.nonewline /etc/shadow
agent2:~# cp /etc/shadow /etc/shadow-
agent2:~# ls -l /etc/shadow*
-rw-r--r-- 1 root root 729 2011-12-07 22:41 /etc/shadow
-rw------- 1 root root 729 2011-12-07 22:42 /etc/shadow-
agent2:~# puppet resource user no_such_user ensure=absent
user { 'no_such_user':
ensure => 'absent',
}
agent2:~#
I’m not able to replicate this at will, and strangely it doesn’t seem to affect ALL ‘nonexistent’ users. The ‘steves’ user that it incorrectly thinks exists did exist once, but was deleted (using the OS tools). Another nonexistent user doesn’t produce this problem.
Since puppet uses the OS calls to make its decisions, and this only seems to affect RHEL4 and RHEL5 (but not RHEL6), I’m willing to agree that it is not puppet’s fault but the OS library’s, though I’m not sure which function call specifically is used! Possibly some vestige of the previously deleted user remains in a file somewhere, but I wouldn’t know where (passwd, shadow and group are all clear).
I did a cursory search for a relevant RHEL bug, but didn’t come up with anything. If you find one, please reopen this ticket and add the URL to track.
Steve Shipway wrote:
> I’m not able to replicate at will, and strangely it doesn’t seem to affect ALL ‘nonexistent’ users. The ‘steves’ user that it incorrectly thinks exists used to exist once but was deleted (using the OS tools). Another nonexistent user doesn’t produce this problem.
> Since puppet uses the OS calls to make its decisions, and this only seems to affect RHEL4 and RHEL5 (but not RHEL6), I’m willing to agree that it is not puppet’s fault but the OS library’s fault, though which function call is specifically used I’m not sure! Possibly there remains some vestige of the previously deleted user in a file somewhere but I wouldn’t know where (passwd, shadow and group are all clear)
Specifically, it is getpwnam, in the NSS component of libc, that you are after there: if it returns data for the user, we agree that the user exists. The same goes for groups (getgrnam) and the other databases.
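Because the same reasoning applies to groups, a diagnostic that prints exactly what getpwnam and getgrnam hand back for a name may help track down where a phantom entry such as ‘steves’ comes from: the returned UID, home directory or group membership could point at the leftover record. This is a hypothetical Python sketch (the function name is invented); the standard pwd and grp modules wrap the corresponding libc calls.

```python
import grp
import pwd
from typing import Dict, Optional


def nss_records(name: str) -> Dict[str, Optional[tuple]]:
    """Return what getpwnam/getgrnam report for `name` (None if not found)."""
    result: Dict[str, Optional[tuple]] = {}
    try:
        pw = pwd.getpwnam(name)
        result["passwd"] = (pw.pw_name, pw.pw_uid, pw.pw_gid, pw.pw_dir, pw.pw_shell)
    except KeyError:
        result["passwd"] = None  # no passwd entry from any NSS source
    try:
        gr = grp.getgrnam(name)
        result["group"] = (gr.gr_name, gr.gr_gid, list(gr.gr_mem))
    except KeyError:
        result["group"] = None  # no group entry from any NSS source
    return result
```

Running `nss_records("steves")` on the affected host and comparing the output with the flat files would show which fields NSS is returning for the supposedly deleted user.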