Hi,
Stefano Babic wrote on Fri, Apr 09, 2021 at 12:29:31PM +0200:
> > I did not name any thread.
> > You said the only thread started but I see four clone() system calls
> > before first and only fork in a strace, so there are five threads in
> > play (four if you exclude the main forking thread)
> >
>
> You are mixing two concepts. There are processes and threads. A fork creates
> a new processes, that is with a copy of the parent's data space and all file
> descriptors are duplicated in the child. The thread shares the same data
> space and the same file descriptors as the caller (the caller of
> pthread_create). So even if the kernel treats them at the same level because
> fork will call clone() and the kernel just sees processes, there is a big
> difference in how they are used.
>
> When SWUpdate starts (main process), it creates a *thread*, the network thread.
> After that, it creates "subprocesses" , they are processes. The number of
> processes depends on your configuration and how SWUpdate is called, but they
> usually do not create threads. Each interface has a separate process:
> Webserver, downloader, suricatta.
I'm not sure what makes you think I misunderstand what threads and
processes are?
> So the "five threads" make no sense, it is not clear enough. I am also not
> sure how musl handle this
I'm not sure why it doesn't make sense to you either. Threads are real,
identifiable entities, so they can be counted?
swupdate starts, creates the following four threads that I will identify
with the function they start:
- notifier_thread
- network_initializer
- network_thread
- progress_bar_thread
(so there are five in total with the "main" thread), then it proceeds to
fork its other helpers.
There are other threads in the code base, but these are all started
*before* the fork (or can be: network_thread is started from
network_initializer, so its timing can vary -- but the other three are
always there before).
As a matter of fact, on Linux, threads and processes are very
similar -- they're both tasks in the scheduler, and the main difference
is that threads happen to share the same memory mappings. glibc actually
uses the same system call (clone) for both on x86_64; I was surprised to
see fork() in strace the other day...
While we are doing trivia, a possible enhancement to make debugging
easier would be to call pthread_setname_np() in each of these functions
to clearly identify each thread.
Doing that would clearly show which thread is which in gdb or other
debugging tools; it would have saved me a few minutes on Friday, and I
would not have been as vague, since threads would really "have names".
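For illustration, here is a minimal sketch of the idea. It uses prctl(PR_SET_NAME), which is what pthread_setname_np(pthread_self(), name) boils down to for the calling thread on Linux; the helper name is made up:

```c
#include <sys/prctl.h>

/* Hypothetical helper: each thread entry point (network_thread,
 * progress_bar_thread, ...) would call this on itself first thing.
 * pthread_setname_np(pthread_self(), name) is equivalent for the
 * calling thread; prctl needs no feature-test macro. */
static int name_current_thread(const char *name)
{
    /* Linux limits thread names to 15 characters plus NUL */
    return prctl(PR_SET_NAME, name, 0, 0, 0);
}
```

The name then shows up in /proc/&lt;pid&gt;/task/&lt;tid&gt;/comm, in `top -H`, and in gdb's `info threads`.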
> > The problem is quite clear to me, what can I do to convince you it is as
> > I say?
>
> The problem (lock on a resource) is clear, the cause not, and even not if
> this is really an issue inside SWUpdate.
The problem is the child blocking.
The cause is a *parent thread* holding the lock when the fork happens in
parallel.
I'm not sure what other cause you could be looking for.
> Let's start : FILE are buffered and their handling is resolved by libc. It
> is duty of the library to resolve locking if the locks are not done by the
> single processes, as in this case.
>
> In other words: a program could also SEGV, exit or whatever - this should
> not lead to a deadlock or to a crash of the whole system.
Locks are per-process; I don't understand why you keep repeating that
other processes can impact the rest of the system.
I'm only referring to the state of the parent at the moment it is
forked, which is when its memory is copied (well, there is copy-on-write,
so lazily copied) by the fork.
> Which is the musl version you are using ? At least 1.2.2 ?
Yes.
> There is a list of bugs in MUSL changelog that can be the cause of problem,
> for example:
>
> "soft deadlock regression in stdio FILE locks with >2 threads contending"
>
> and several others.
>
> We should first detect if there is something in SWUpdate that is not
> correct, and if yes, then fix it, or the cause is in another component (in
> this case, MUSL).
You can argue that musl should register atfork handlers to clear the
locks when forks happen so children can't block like this, as glibc
does, but this is hardly a new problem, so I'm not sure they'd be
interested...
Well, it can't hurt to ask if you think that's the way forward.
> - FILE are handled by the libc and SWUpdate does not set any mutex / lock,
> and in any case on stdio / stdout / stderr.
> - if a task is started and fd (or FILE) are available, it is perfectly
> correct that it make use of them. Cleanup and resource manging is solved by
> the caller, that is inside fork() / pthread_create().
Yes, the child can use the fd or FILE if the parent left them in a
usable state.
The problem is that this is not the case here, due to concurrent access
with fork().
> > Looking at traces the accept() also happens after
> > the fork(), along with unlink/bind on /tmp/sockinstctrl in that order
> > (sorry, didn't think of taking timestamps for strace, but the order
> > should be correct)
> >
> > 1489 fork( <unfinished ...>
> > 1491 socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0 <unfinished ...>
> > 1492 <... socket resumed>) = 5
> > 1492 unlink("/tmp/swupdateprog" <unfinished ...>
> > 1492 bind(5, {sa_family=AF_UNIX, sun_path="/tmp/swupdateprog"}, 110
> > <unfinished ...>
> > 1492 accept(5, <unfinished ...>
> > 1493 socket(AF_UNIX, SOCK_STREAM, 0) = 6
> > 1493 unlink("/tmp/sockinstctrl") = 0
> > 1493 bind(6, {sa_family=AF_UNIX, sun_path="/tmp/sockinstctrl"}, 110) = 0
> > 1493 accept(6, <unfinished ...>
> >
>
> Yes, it seems correct and it is expected.
That is very surprising to me... but I see below
>
> > (bonus: swupdateprog has the same "problem" if children are expected to
> > be able to connect to it -- I don't know if this is the case, but code
> > path can definitely print messages until that thread is in accept state)
> >
> >
> > So nothing prevents the children from trying to connect to swupdateprog
> > before the network thread has done its job, and in highly loaded
> > environment where the linux scheduler does weird things I'm sure I could
> > reproduce a child process that cannot connect once in a while -- for
> > example by making sure the filesystem is very slow by having a hundred
> > other process create/write/destroy files in /tmp in parallel.
>
> But this is not your case here. This is also the reason why the network
> thread is called *earlier*. The clients have a mechanism to retry a
> connection if first time fails (and if not can be added). This avoids to
> have a sync at startup that seems overkilling.
Ok, so yes, there is an ugly loop of 3 attempts with a sleep(1) -- that
explains why downloads sometimes felt sluggish during my tests; I was
wondering why the logs stalled from time to time when running swupdate
in a loop...
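The shape of that retry, as I understand it (a sketch, not the actual swupdate client code):

```c
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Roughly the client-side retry described above: each failed attempt
 * costs a full second, so a slow server start shows up as a 1-3
 * second stall on the client side. */
int connect_with_retry(const struct sockaddr_un *addr)
{
    for (int attempt = 0; attempt < 3; attempt++) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0)
            return fd;
        close(fd);
        sleep(1);   /* blocks the caller, even after the last failure */
    }
    return -1;
}
```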
Well, now that I know there's a retry, there's no harm in simply trying
(I'm not suggesting this as a real fix) to move all the thread creation
after the forks, to illustrate that the problem no longer occurs?
That seems obvious to me, so I would normally not do it, but if it helps
convince you that the problem is other threads accessing resources
during fork, I can spend the time to do it.
> > (Curiosity also got me to look at the other two threads, one is the
> > progress_bar_thread which creates /tmp/swupdateprog and has the same
> > race: this one would either need synchronisation if required for forks
> > or could be moved after forks.
> > The last one is the binds on @"NotifyServer so would be the
> > notifier_thread started in notify_init which also has no synchronisation
> > so in theory could mean logs from children process can get lost)
>
> They should be queued, until UDS socket is full.
I'm not sure what UDS is, but for reference I did try:
- a process does bind on an address + recvfrom
- a process does sendto on same address
with various timings.
The result is:
- messages sent before the bind are just plain lost, as I would have
expected.
- messages sent after bind, before recvfrom are queued properly. If the
queue gets full the sendto blocks.
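Boiled down, the experiment looks like this (the socket path is made up for the demo):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define DEMO_PATH "/tmp/uds_dgram_demo"   /* made-up path for the demo */

/* Returns the length of the datagram received after bind(), or a
 * negative code.  Shows both behaviours observed:
 *  - sendto() before the receiver binds fails outright: the message
 *    is simply lost, there is no socket to queue it on yet;
 *  - sendto() after bind() but before recvfrom() is queued and
 *    delivered once the receiver finally reads. */
int uds_dgram_demo(char *out, size_t outlen)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, DEMO_PATH, sizeof addr.sun_path - 1);
    unlink(DEMO_PATH);

    int tx = socket(AF_UNIX, SOCK_DGRAM, 0);

    /* 1. nobody has bound the path yet: the send fails, message lost */
    if (sendto(tx, "early", 5, 0,
               (struct sockaddr *)&addr, sizeof addr) >= 0)
        return -1;

    int rx = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0)
        return -2;

    /* 2. bound but not yet reading: the message is queued */
    if (sendto(tx, "queued", 6, 0,
               (struct sockaddr *)&addr, sizeof addr) < 0)
        return -3;

    ssize_t n = recvfrom(rx, out, outlen, 0, NULL, NULL);
    close(tx);
    close(rx);
    unlink(DEMO_PATH);
    return (int)n;
}
```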
While doing that I noticed you also bind a socket on the client (with a
different name); I tried that too, with no difference -- I don't think it
would be useful unless the server sent replies acknowledging logs,
perhaps that was done in the past? It could be removed now.
There is also a loop where the server tries different names if
the socket is not available.
With that in mind I see at least two problems:
- the child wins the race and tries to send messages before the parent
process has bound the socket, in which case the messages are lost until
the parent has done its job.
- the child starts with one socket name, then the parent does bind, the
socket name is not free for one reason or another, and the name changes.
Messages from the child go to the original socket name. Depending on
the reason, it could be something like:
one preexisting swupdate process; the new one forks
-> messages go to the older swupdate
at which point the old one exits
-> messages start getting lost.
I am sorry, but I really do think synchronisation between threads before
forking is useful.
> Let's go back: you find the problem (block on a resource), you do not find
> the cause (issue in SWUpdate or in MUSL), you get the symptoms, and you want
> try to work around the symptoms.
>
> Again: it should be not forbidden by a client to log / do something. This
> should not lead to a deadlock. If deadlock happens, it is caused by the one
> who handles the lock, and for stdio / stdout / stderr this is not SWUpdate.
Again, the client can use the file if and only if the lock was not held
by swupdate while the fork happened.
You can argue it's libc doing the actual locking, so it's libc's job to
clean up the lock on fork, and it's certainly possible to do that
cleanup, but for me it is a clear case of "there are certain functions
that can be called while fork happens and others not" from the article I
linked.
--
Dominique