On Sun, 2021-12-26 at 20:50 -0800, Reik Reid wrote:
> Ah, I think I understand the situation better now. All the error
> messages are generated by glibc code and there is no way to control
> them.
Not *exactly*: the error _messages_ are still generated by bup, but the
_error_ itself comes from the underlying code.
Now, mind you, all I've been saying is still somewhat conjecture, but I
think the reason for the error messages is that getpwuid() fails with
errno==ENOENT. In bup's C/Python integration that turns into an IOError
with errno==ENOENT, which the C code raises by setting the exception and
returning NULL. Then bup's Python code catches the exception and prints
the error message(s).
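
To illustrate the chain I mean, here's a rough sketch. It is not bup's
actual code: lookup_name() and fake_getpwuid() are made-up names, and
the fake just stands in for the C helper failing with ENOENT.

```python
import errno
import sys

def fake_getpwuid(uid):
    # Hypothetical stand-in for the C helper: simulates getpwuid()
    # failing with errno set to ENOENT.
    raise OSError(errno.ENOENT, 'No such file or directory')

def lookup_name(uid, getpwuid=fake_getpwuid, log=sys.stderr.write):
    """Return the user name for uid, or None if the lookup errors out."""
    try:
        return getpwuid(uid).pw_name
    except OSError as e:  # IOError is an alias of OSError in Python 3
        # The message comes from the Python side; the error itself
        # came from the layer below.
        log('error: getpwuid(%d) failed: %s\n' % (uid, e.strerror))
        return None
```

So the caller only ever sees an errno, and whatever gets printed is
made up on the Python side.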
> I think one can make the error messages go away by having
>
> # /etc/nsswitch.conf
> passwd: compat systemd
> AS OPPOSED TO
> passwd: systemd compat
>
> When systemd sssd.service is tried first, but is not running or
> properly configured, it generates the error, then getpwuid() falls
> back on "compat" which I guess means "look up using /etc/passwd
> directly".
I think though this would mean "prefer looking up directly", which is
not what you want? Or maybe it is?
If this is the case, then the sssd integration into NSS is broken in
some way, because shouldn't a fallback happen if it can't figure it
out? OTOH, maybe the intent was that the fallback should only happen if
the entry can't be found, but now you can't even reach the service
that's supposed to resolve it ...
> > I suppose bup might try to do something like "getpwuid(0)" and if
> > that even fails (as I believe it did in your setup) then perhaps we
> > should just give up.
>
> Yeah, that might be reasonable, but I'm not even sure the error is
> detectable to bup because the fallback to "compat" is then used. But
> if you have a way of detecting the error then perhaps an error
> message indicating that sssd.service is not working would be helpful,
> with a hint about maybe changing /etc/nsswitch.conf as a workaround.
Well, clearly bup _was_ getting the error (see my explanation above), so
detecting it should work. The question is whether that's really even
sufficient, because it's possible that 0 is hardcoded anyway to bypass
sssd, etc.?
IOW, trying to detect it based on a special UID like 0 might not be a
good idea, but trying to detect it with any other random UID might be
rather fragile too?
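
Something like the following is what I have in mind for such a probe.
This is only a sketch under assumptions: name_service_broken() is a
made-up name, and I'm assuming a lookup function that raises KeyError
for "no such entry" and OSError for a real lookup failure. Probing both
UID 0 and our own UID is just one guess at reducing the fragility, not a
tested approach.

```python
import os

def name_service_broken(getpwuid, probe_uids=None):
    """Return True if any probe lookup fails with a real error.

    getpwuid is assumed to raise KeyError when there is simply no
    entry, and OSError when the lookup itself failed.
    """
    if probe_uids is None:
        # Assumption: root and the current user are reasonable probes.
        probe_uids = (0, os.getuid())
    for uid in probe_uids:
        try:
            getpwuid(uid)
        except KeyError:
            continue  # no such entry; the service itself answered
        except OSError:
            return True  # the lookup mechanism itself failed
    return False
```

If that returns True at startup, bup could print one warning and skip
further name lookups instead of complaining for every file.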
> I see how this is tricky. I can't help but think that glibc.getpwuid()
> ought at the very least produce a meaningful error message, and also a
> meaningful error/warning code. Something like: "getpwuid() failed when
> attempting to use sssd.service, falling back to plain /etc/passwd
> lookup. Please investigate /etc/nsswitch.conf and the systemd
> sssd.service setup". I'm already significantly outside the limits of
> my expertise so I will leave it to the experts to ponder what can be
> done about glibc.
That'd be nice I guess, but given that we just get an errno and glibc
isn't usually in the business of printing anything, I don't even really
see how this is feasible, unfortunately.
Ultimately, I'm somewhat surprised you even got into this situation,
because if the user/group mechanics in your system aren't working ...
how did it even let you log in etc.?
johannes