Good, posting a short excerpt is the way to go. Too bad you forgo two
options (-tt and -T), there's no timing info there...
> 9725 write(4, "\0", 1 <unfinished ...>
> 9956 <... select resumed> ) = 1 (in [3])
> 9725 <... write resumed> ) = -1 EAGAIN (Resource
> temporarily unavailable)
> I can send you the complete strace output on your mail id if needed.
You already did, and there, the -tt -T options were not forgotten ;-)
After a bit of reformatting (please use attachments, not text pasting,
those line breaks are a PAIN) and filtering, I extracted the following
from your trace. It is only the read/writes on the notifier pipe.
8633 11:24:47.624788 write(4, "\0", 1 <unfinished ...>
8633 11:24:47.624803 <... write resumed> ) = 1 <0.000009>
8673 11:24:47.624865 read(3, <unfinished ...>
8633 11:24:47.624871 write(4, "\0", 1 <unfinished ...>
8673 11:24:47.624878 <... read resumed> "\0", 1) = 1 <0.000009>
8633 11:24:47.624885 <... write resumed> ) = 1 <0.000008>
8673 11:24:47.624974 read(3, <unfinished ...>
8633 11:24:47.624980 write(4, "\0", 1 <unfinished ...>
8673 11:24:47.624987 <... read resumed> "\0", 1) = 1 <0.000009>
8633 11:24:47.624994 <... write resumed> ) = 1 <0.000009>
8633 11:24:47.625068 write(4, "\0", 1 <unfinished ...>
8633 11:24:47.625083 <... write resumed> ) = 1 <0.000009>
8673 11:24:47.625142 read(3, <unfinished ...>
8673 11:24:47.625154 <... read resumed> "\0", 1) = 1 <0.000008>
8633 11:24:47.625177 write(4, "\0", 1 <unfinished ...>
8633 11:24:47.625192 <... write resumed> ) = 1 <0.000009>
8673 11:24:47.625254 read(3, <unfinished ...>
8633 11:24:47.625260 write(4, "\0", 1 <unfinished ...>
8673 11:24:47.625267 <... read resumed> "\0", 1) = 1 <0.000009>
8633 11:24:47.625274 <... write resumed> ) = 1 <0.000009>
8673 11:24:47.625363 read(3, <unfinished ...>
8633 11:24:47.625369 write(4, "\0", 1 <unfinished ...>
8673 11:24:47.625375 <... read resumed> "\0", 1) = 1 <0.000008>
8633 11:24:47.625383 <... write resumed> ) = -1 EAGAIN (Resource
temporarily unavailable) <0.000009>
As you can see, in the small time of this snapshot (600 microseconds),
one can already see that the read side is slower than the write side
(7 write, 5 reads).
To confirm this analysis, a strace from the beginning (restricted to
reads and writes: strace -tt -T -f -o file.tra -w
trace=read,write ...) would show that the "drift" between reads and
writes reaches one pipe-buffer (typically one page, ie 4k), hence the
EAGAIN. It would be nice if you could verify this.
The next questions, then, are:
(1) what circumstances lead to such an overrun (the core sending
config updates to the notifier thread, at a faster pace than
reasonably sustainable) ?
(2) why doesn't the core use blocking writes here.
Could you please elaborate on your app to investigate (1) ?
I will pursue (2) with other core maintainers.
-Alex