bup-0.32 runtime problem with python3.9

44 views
Skip to first unread message

Reik Reid

unread,
Dec 18, 2021, 3:04:46 AM12/18/21
to bup-...@googlegroups.com
Hello all,

I compiled+installed bup-0.32 on ubuntu 21.04 with PYTHON=python3.9 specified. I get a runtime error as follows

 bup -d /extra-im2/bup-linux save -n root-ub20.r3600.`date +%Y-%m%d` /root-ub20 | & tee log.bup.save.root-ub20
Traceback (most recent call last):
  File "/usr/local/lib/bup/cmd/bup-save", line 307, in <module>
    for (transname,ent) in r.filter(extra, wantrecurse=wantrecurse_during):
  File "/usr/local/lib/bup/cmd/../bup/index.py", line 499, in filter
    assert(pe)
AssertionError
Command exited with non-zero status 1
0.18user 0.02system 0:00.20elapsed 99%CPU (0avgtext+0avgdata 36864maxresident)k
0inputs+0outputs (0major+12365minor)pagefaults 0swaps

Any ideas what is going on here?  I did not have much better luck with bup-0.31, it did not compile.

Thx.

Johannes Berg

unread,
Dec 18, 2021, 3:58:05 AM12/18/21
to Reik Reid, bup-...@googlegroups.com
On Sat, 2021-12-18 at 00:04 -0800, Reik Reid wrote:
>
>  bup -d /extra-im2/bup-linux save -n root-ub20.r3600.`date +%Y-%m%d`
> /root-ub20 | & tee log.bup.save.root-ub20
> Traceback (most recent call last):
>   File "/usr/local/lib/bup/cmd/bup-save", line 307, in <module>
>     for (transname,ent) in r.filter(extra,
> wantrecurse=wantrecurse_during):
>   File "/usr/local/lib/bup/cmd/../bup/index.py", line 499, in filter
>     assert(pe)
> AssertionError

I think that just means "/root-ub20" doesn't exist in your index?

Also I think I have a patch somewhere that makes that a bit clearer, but
I'd have to check and probably see if it really makes it much clearer
...

(I remember that I didn't want to see assertions due to some new checks,
and that's why I changed this)

johannes

Reik Reid

unread,
Dec 18, 2021, 8:07:39 AM12/18/21
to Johannes Berg, bup-...@googlegroups.com
You;re right, I  had forgotten to run first the index command

 bup -d /extra-im2/bup-linux index -ux /root-ub20/

However, when I do run the index command, I get a huge number of error messages of the type
[Errno 2] No such file or directory. The output ends like this:

...
...
[Errno 2] No such file or directory                                                                                                                    
[Errno 2] No such file or directory                                                                                                                    
Indexing: 829381, done (7504 paths/s).                                                                                                                  
bup: merging indexes (2079633/2079633), done.                                                                                                          
WARNING: 33416 errors encountered.                                                                                                                      

Command exited with non-zero status 1
134.40user 17.51system 3:10.84elapsed 79%CPU (0avgtext+0avgdata 487860maxresident)k
888992inputs+847288outputs (1807major+75515minor)pagefaults 0swaps
134.406u 17.515s 3:10.84 79.6%  0+0k 888992+847288io 1807pf+0w

Maybe the error is that I used -u since this was the first time I indexed the path /root-ub20/ in this repository? If I repeat the index -ux I get
the same amount of errors, so maybe those are files that are in the index from some other path (say /root-fc25, note to self) but do not exist in /root-ub20?? If the errors are essentially harmless, is there a way to get rid of the error messages?






Reik Reid

unread,
Dec 18, 2021, 11:56:20 AM12/18/21
to Johannes Berg, bup-...@googlegroups.com
I tried strace as follows

% bup -d /extra-im2/bup-linux index -ux /root-ub20/ &
% strace -p PID |& tee log.strace

The log.strace contains a lot of stuff, but in particular many repetitions of these lines
openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
connect(14, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/nss"}, 110) = -1 ENOENT (No such file or directory)
write(2, "[Errno 2] No such file or direct"..., 71) = 71

How many repetitions? Well, to get a rough idea:

 % grep "No such file" log.strace | wc
  13569  152698 1278951
 % grep /var/lib/sss/mc/passwd log.strace | wc
   6460   71060  665380
% grep /var/lib/sss/pipes/nss log.strace | wc
   3406   40872  384878

Does that explain anything to you? If you need the whole log.strace I can probably upload it somewhere./

Thanks
RR

On Sat, Dec 18, 2021 at 5:12 AM Johannes Berg <joha...@sipsolutions.net> wrote:
I need to check, not sure off the top of my head. If I find time tonight I'll take a look.

Nothing indicated which file wasn't found, or what it was processing?

Maybe running the index command under strace might give you a hint, but not sure if that makes it really slow.

johannes
--
Sent from my phone.

Reik Reid

unread,
Dec 24, 2021, 11:47:04 AM12/24/21
to bup-list
I mailed 2-3 additional emails with useful info on this subject, including a more firm diagnosis and a workaround, but those email do not appear here in the mailing list!! Should I repost them from the web interface? Can the admin try and figure out why some emails went missing?

Johannes Berg

unread,
Dec 24, 2021, 4:47:55 PM12/24/21
to Reik Reid, bup-list
On Fri, 2021-12-24 at 08:47 -0800, Reik Reid wrote:
> I mailed 2-3 additional emails with useful info on this subject,
> including a more firm diagnosis and a workaround, but those email do
> not appear here in the mailing list!! Should I repost them from the
> web interface? Can the admin try and figure out why some emails went
> missing?


Oh. I thought at least some of those went to me privately only? :)

Anyway. I'll reply over on the other email - I have an idea of what this
is, just didn't really have a chance to reply.

johannes

Johannes Berg

unread,
Dec 24, 2021, 4:54:20 PM12/24/21
to Reik Reid, bup-...@googlegroups.com
Sorry for the late reply - Christmas time I guess ...

On Sat, 2021-12-18 at 08:56 -0800, Reik Reid wrote:
> I tried strace as follows
>
> % bup -d /extra-im2/bup-linux index -ux /root-ub20/ &
> % strace -p PID |& tee log.strace
>
> The log.strace contains a lot of stuff, but in particular many
> repetitions of these lines
> openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1
> ENOENT (No such file or directory)
> openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1
> ENOENT (No such file or directory)
> connect(14, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/nss"},
> 110) = -1 ENOENT (No such file or directory)
> write(2, "[Errno 2] No such file or direct"..., 71) = 71
>
> How many repetitions? Well, to get a rough idea:
>
>  % grep "No such file" log.strace | wc
>   13569  152698 1278951
>  % grep /var/lib/sss/mc/passwd log.strace | wc
>    6460   71060  665380
> % grep /var/lib/sss/pipes/nss log.strace | wc
>    3406   40872  384878
>
> Does that explain anything to you? If you need the whole log.strace I
> can probably upload it somewhere./

I think this basically has to be getpwnam() or getpwuid(), I think in
index we'd load metadata, so getpwuid().

Now, why does your libc do that? I don't know! And why does it go away
only with a recompile (as you reported), I don't know either...

I guess we could cache errors (i.e. if looking up uid 0 resulted in an
error, don't try again but assume it still will result in an error next
time - we cache only non-error cases today), but ultimately I think if
your name/uid resolver is broken, you probably _want_ to be alerted to
that fact rather loudly?

johannes

Reik Reid

unread,
Dec 25, 2021, 11:11:26 AM12/25/21
to bup-list
(repost lost email number1)

Reik Reid
Sat, Dec 18, 10:34 AM (7 days ago)

to Johannes, bup-list
Hmm, I get a sinking feeling that the error messages are somehow related to systemd. See below log.strace snippet for some context . I just upgraded from ubuntu 20.04.3 to 21.04, and in the process I probably chose to keep my existing systemd config. Which broke something?

Where there is smoke, there is fire, again see below. Anyone know what the actual problem is? It is amazing the ability that systemd has to cost you a day of work. But let's not get into that particular topic any further right now.

Please take a look and see if anyone understands what I have to do to fix it.
RR

Reik Reid

unread,
Dec 25, 2021, 11:12:53 AM12/25/21
to bup-list
(repost of lost mail number2)

Reik Reid
Dec 18, 2021, 1:33 PM (7 days ago

to Johannes, bup-list
UPSHOT: bup-0.32 one way or the other wants to use sssd if there is some indication (at compile time apparently) that ssd exists on your machine. This leads to all kinds of trouble and voluminous  error messages if sssd exists but does not work as required.

RECOMMENDATION: It would be considerably better if bup somewhat quietly did a fallback to whatever alternative method it can use when sssd fails, rather than keep trying thouands of times and reporting errors. There ought to be ONE understandable error message, as well.

THE LONG STORY VERSION:

Alright, so I am slowly coming to the conclusion that bup-0.32 and/or python3.9 or some library  assumes that service.sssd is running (service.sssd is not running on my machine, apparently due to lack of configuration) . This leads to all the error messages.

So, the question is, how can I tell bup (bup-index and bup-save in particular) NOT to try and use service.sssd and not to  assume the existence of related pipe files? The dependency on the existence of pipe files such as

"/var/lib/sss/pipes/nss"
"/var/lib/sss/mc/passwd"

is generating thousand of indistinct error messages that all say


[Errno 2] No such file or directory     
[Errno 2] No such file or directory 

I can't find the word sssd anywhere in the bup source code or configuration files. My ubuntu 20.04.3 does not even have sssd installed, and bup worked fine there using some other mechanism (maybe it just reads /etc/passwd for user data, I dunno).

What's the solution here? I went as far as uninstalling sssd from the 21.04 machine,

% apt remove sssd

and recompiling bup-0.32-gitclone, but the problem is not going away. Okay, try one more thing: AUTOREMOVE, as recommended by apt remove sssd output:

% apt autoremove  
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be REMOVED:
  ldap-utils libbasicobjects0 libcollection4 libdhash1 libini-config5 libipa-hbac0 libnfsidmap2 libnss-sss libpath-utils1 libref-array1 libsss-certmap0
  libsss-idmap0 libsss-nss-idmap0 python3-sss sssd-ad sssd-ad-common sssd-common sssd-ipa sssd-krb5 sssd-krb5-common sssd-ldap sssd-proxy
0 upgraded, 0 newly installed, 22 to remove and 0 not upgraded.
After this operation, 9121 kB disk space will be freed.
Do you want to continue? [Y/n]
(Reading database ... 498232 files and directories currently installed.)
Removing ldap-utils (2.4.57+dfsg-2ubuntu1) ...
Removing sssd-proxy (2.4.0-1ubuntu6.1) ...
Removing sssd-ldap (2.4.0-1ubuntu6.1) ...
Removing sssd-ad (2.4.0-1ubuntu6.1) ...
Removing sssd-krb5 (2.4.0-1ubuntu6.1) ...
Removing sssd-ipa (2.4.0-1ubuntu6.1) ...
Removing libipa-hbac0 (2.4.0-1ubuntu6.1) ...
Removing libnss-sss:amd64 (2.4.0-1ubuntu6.1) ...
Checking NSS setup...
Removing sssd-ad-common (2.4.0-1ubuntu6.1) ...
Removing sssd-krb5-common (2.4.0-1ubuntu6.1) ...
Removing sssd-common (2.4.0-1ubuntu6.1) ...
Removing libini-config5:amd64 (0.6.1-2build1) ...
Removing libbasicobjects0:amd64 (0.6.1-2build1) ...
Removing libcollection4:amd64 (0.6.1-2build1) ...
Removing libdhash1:amd64 (0.6.1-2build1) ...
Removing libnfsidmap2:amd64 (0.25-6) ...
Removing libpath-utils1:amd64 (0.6.1-2build1) ...
Removing libref-array1:amd64 (0.6.1-2build1) ...
Removing libsss-certmap0 (2.4.0-1ubuntu6.1) ...
Removing libsss-idmap0 (2.4.0-1ubuntu6.1) ...
Removing libsss-nss-idmap0 (2.4.0-1ubuntu6.1) ...
Removing python3-sss (2.4.0-1ubuntu6.1) ...
Processing triggers for man-db (2.9.4-2) ...
Processing triggers for libc-bin (2.33-0ubuntu5) ...
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/libpng15.so.15 is not a symbolic link

16.853u 7.914s 0:24.94 99.2%    0+0k 23952+147648io 212pf+0w

Then recompile bup-0.32-gitclone,AND NOW IT WORKS. No more error messages.

CONCLUSION: bup-0.32 one way or the other wants to use sssd if there is some indication (at compile time apparently) that it exists. This leads to all kinds of trouble.

Johannes Berg

unread,
Dec 25, 2021, 5:02:32 PM12/25/21
to Reik Reid, bup-list
So looking at this again now that you reposted (I saw this before, even
if it never made it to the list) ...

> The following packages will be REMOVED:
>   ldap-utils libbasicobjects0 libcollection4 libdhash1 libini-config5
> libipa-hbac0 libnfsidmap2 libnss-sss libpath-utils1 libref-array1
> libsss-certmap0
>   libsss-idmap0 libsss-nss-idmap0 python3-sss sssd-ad sssd-ad-common

I can't really say for sure, but I suspect you had some issues with
libnss-sss or libsss-nss-idmap0 or such, with those configuring sss into
the nss system, and thus bup using it.

I'm not sure why that would've required a _rebuild_ of bup, however, but
maybe you just didn't try without a rebuild, i.e. it may have been fixed
just changing the nss config?


I'm still mulling over whether we should cache errors from the lookups
(per UID of course), you're almost certainly better off having noticed
all these issues, even if it was quite annoying. It might've been worse
to not be able to restore user/group ownership properly (only by IDs,
not by names) ...

johannes

Reik Reid

unread,
Dec 26, 2021, 11:39:13 AM12/26/21
to Johannes Berg, bup-list
In the big picture, It is near impossible for a regular user to tell whether the hailstorm of errors are indicative of a complete failure of bup-save or just an annoyance, as you say. It gravely undermines trust in bup backups when one cannot tell whether an error is fatal or not, or even what the actual problem is.

Given that bup (or some python library used) clearly falls back to some alternate mechanism than sssd when sssd is not present in the system, I think bup also should fall back on the alternate method if sssd is just not running, or if calls to sssd generate errors.

In other words, I think the problem observed should be solved by proper error checking, and switching gracefully to the fallback method if the sssd-based method is failing. Maybe that is what you are already saying, Johannes :-)

PS: (Perhaps not so important in the big picture) I agree it is possible that the rebuiid after the auto-remove of all the sss-related packages didn't actually change the bup code. Quite possibly some code detects that sssd is not present and proceeds to the fallback method.





Johannes Berg

unread,
Dec 26, 2021, 1:22:22 PM12/26/21
to Reik Reid, bup-list
On Sun, 2021-12-26 at 08:39 -0800, Reik Reid wrote:
> In the big picture, It is near impossible for a regular user to tell
> whether the hailstorm of errors are indicative of a complete failure
> of bup-save or just an annoyance, as you say. It gravely undermines
> trust in bup backups when one cannot tell whether an error is fatal or
> not, or even what the actual problem is.
>
> Given that bup (or some python library used) clearly falls back to
> some alternate mechanism than sssd when sssd is not present in the
> system, I think bup also should fall back on the alternate method if
> sssd is just not running, or if calls to sssd generate errors.
>
> In other words, I think the problem observed should be solved by
> proper error checking, and switching gracefully to the fallback method
> if the sssd-based method is failing. Maybe that is what you are
> already saying, Johannes :-)

I agree there's a problem - but I'm not sure we _can_ do much about it,
we really just call getpwuid(). We do that in C code now, not python,
but if that returns an error we don't really have much else to try...

What I was trying to say is that maybe the flurry of errors is actually
better than only printing an error once, since it more clearly indicates
that you have a problem.

It is, however, pretty difficult to diagnose?

I suppose bup might try to do something like "getpwuid(0)" and if that
even fails (as I believe it did in your setup) then perhaps we should
just give up?

> PS: (Perhaps not so important in the big picture) I agree it is
> possible that the rebuiid after the auto-remove of all the sss-related
> packages didn't actually change the bup code. Quite possibly some code
> detects that sssd is not present and proceeds to the fallback method.

No, I'm pretty sure there's no such code - the detection seems like it
must be in (g)libc, and thus behind getpwuid(), and thus no impact on
the code in bup, I'd think.

johannes

Reik Reid

unread,
Dec 26, 2021, 11:50:19 PM12/26/21
to Johannes Berg, bup-list
Ah, I think I understand the situation better now. All the error messages are generated by glibc code and there is no way to control them.
I think one can make the error messages go away by having

# /etc/nsswitch.conf
passwd:         compat systemd
AS OPPOSED TO
passwd:         systemd compat

When systemd sssd.service is tried first, but is not running or properly configured, it generates the error, then getpwuid() falls back on "compat" which I guess means "look up using /etc/passwd directly".

>I suppose bup might try to do something like "getpwuid(0)" and if that
>even fails (as I believe it did in your setup) then perhaps we should
>just give up.

Yeah, that  might be reasonable, but I'm not even sure the error is detectable to bup because the fallback to "compat" is then used. But if you  have a way of detecting the error then perhaps an error message indicating that sssd.service is not working would be helpful, with a hint about maybe changing /etc/nsswitch.conf as a workaround.

I see how this is tricky. I can't help but think that glibc.getpwuid() ought at the very least produce a meaningful error message, and also a meaningful error/warning code. Something like: "getpwuid() failed when attempting to use sssd.service, falling back to plain /etc/passwd lookup. Please investigate /etc/nsswitch.conf and the systemd sssd.service setup". I'm already significantly outside the limits of my expertise so I will leave it to the experts to ponder what can be done about glibc.

RR

Johannes Berg

unread,
Dec 27, 2021, 4:47:04 PM12/27/21
to Reik Reid, bup-list
On Sun, 2021-12-26 at 20:50 -0800, Reik Reid wrote:
> Ah, I think I understand the situation better now. All the error
> messages are generated by glibc code and there is no way to control
> them.

Not *exactly*, the error _messages_ are still generated by bup, but the
_error_ is in the underlying code.

Now, mind you, all I've been saying is sort of still conjecture, but I
think the reason for the error messages is that getpwuid() returns an
error, with errno==ENOENT. In bup's C code/python integration that turns
into an IOError with code==ENOENT, which gets thrown by the C code
returning NULL. Then bup's python code catches the exception and prints
the error message(s).

> I think one can make the error messages go away by having
>
> # /etc/nsswitch.conf
> passwd:         compat systemd
> AS OPPOSED TO
>  passwd:         systemd compat
>
> When systemd sssd.service is tried first, but is not running or
> properly configured, it generates the error, then getpwuid() falls
> back on "compat" which I guess means "look up using /etc/passwd
> directly".

I think though this would mean "prefer looking up directly", which is
not what you want? Or maybe it is?

If this is the case though, then that means that the sssd-nss
integration into NSS is broken in some way though, because shouldn't a
fallback happen if you can't figure it out? OTOH, maybe the intent was
that the fallback should only happen if you can't find it, but now you
can't even reach the service that's supposed to resolve it ...


> > I suppose bup might try to do something like "getpwuid(0)" and if
> > that
> > even fails (as I believe it did in your setup) then perhaps we
> > should
> > just give up.
>
> Yeah, that  might be reasonable, but I'm not even sure the error is
> detectable to bup because the fallback to "compat" is then used. But
> if you  have a way of detecting the error then perhaps an error
> message indicating that sssd.service is not working would be helpful,
> with a hint about maybe changing /etc/nsswitch.conf as a workaround.

Well clearly bup _was_ getting the error (see my explanation above), so
it should work. The question is if that's really even sufficient,
because it's possible that 0 is hardcoded anyway to bypass sssd, etc.?
IOW, trying to detect it based on a special UID like 0 might not be a
good idea, but trying to detect it with any other random UID might be
rather fragile too?

> I see how this is tricky. I can't help but think that glibc.getpwuid()
> ought at the very least produce a meaningful error message, and also a
> meaningful error/warning code. Something like: "getpwuid() failed when
> attempting to use sssd.service, falling back to plain /etc/passwd
> lookup. Please investigate /etc/nsswitch.conf and the systemd
> sssd.service setup". I'm already significantly outside the limits of
> my expertise so I will leave it to the experts to ponder what can be
> done about glibc.

That'd be nice I guess, but given that we just get an errno and glibc
isn't usually in the business of printing anything, I don't even really
see how this is feasible, unfortunately.


Ultimately, I'm somewhat surprised you even got into this situation,
because if your user/group mechanics in the system isn't working ... how
did it even let you log in etc.?

johannes

Reik Reid

unread,
Jan 12, 2022, 5:23:00 PM1/12/22
to Johannes Berg, bup-list
You wrote about how to detect the error. What not simply keep count of how many times you get an IOError with code==ENOENT? If you see more than, say, *10*  of those, print an warning/error message that indicates something informative in the middle and at the end of program execution, so that users have a fighting chance to understand what the problem is. Maybe also keep track of which uids are failing the lookup, on the odd chance that it depends on the uid.

You asked me whether I had trouble logging into my system when sssd.service is not working properly,. I have no problem. There is some sort of graceful fallback in effect for logins,

RR

Reply all
Reply to author
Forward
0 new messages