Some of you may remember my latest question where I was having weird node
timeout issues that I couldn't explain and I thought it might be related to
the messages I was passing between my nodes. Well, I pinpointed the problem
to a call to zlib:gzip/1. At first I was really surprised by this, as such
a harmless line of code surely should have nothing to do with the ability
for my nodes to communicate. However, as I dug further I realized gzip was
implemented as a linked-in driver, and I remember reading that one has to
take care with those because they can trash the VM. I
don't remember reading anything about them blocking code, and even if they
do I fail to see why my SMP enabled node (16 cores) would allow this one
thread to block the tick. It occurred to me that maybe the scheduler
responsible for that process is the one blocked by the driver. Do processes
have scheduler affinity? That would make sense, I guess.
I've "fixed" this problem simply by using a plain port (i.e. run in its own
OS process). For my purposes, this actually makes more sense in the
majority of the places I was making use of gzip. Can someone enlighten me
as to exactly what is happening behind the scenes?
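For readers who want to try the plain-port workaround, here is a minimal
sketch. It is an assumption about how such a port could look, not the code
actually used in this thread: the module name gzip_port and the function
gzip_file/1 are hypothetical, and it assumes a gzip executable on the PATH.
It compresses a file rather than an in-memory binary because an Erlang port
cannot close the program's stdin while still reading its stdout.

```erlang
-module(gzip_port).
-export([gzip_file/1]).

%% Hypothetical helper: compress the file at Path by running the external
%% gzip program in its own OS process, so the compression work happens
%% outside the VM's scheduler threads.
gzip_file(Path) ->
    case os:find_executable("gzip") of
        false ->
            {error, gzip_not_found};
        Gzip ->
            %% "-c" makes gzip write the compressed stream to stdout,
            %% which arrives here as {Port, {data, Chunk}} messages.
            Port = open_port({spawn_executable, Gzip},
                             [{args, ["-c", Path]}, binary, stream,
                              use_stdio, exit_status]),
            collect(Port, [])
    end.

%% Gather output chunks until the external program exits.
collect(Port, Acc) ->
    receive
        {Port, {data, Chunk}} ->
            collect(Port, [Chunk | Acc]);
        {Port, {exit_status, 0}} ->
            {ok, iolist_to_binary(lists:reverse(Acc))};
        {Port, {exit_status, Status}} ->
            {error, {gzip_exit, Status}}
    end.
```

Calling gzip_port:gzip_file("rand") from the shell should return {ok, Binary}
with the compressed data, while the calling process simply waits in a receive.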
To reproduce I create a random 1.3GB file:
dd if=/dev/urandom of=rand bs=1048576 count=1365
Then start two named nodes 'foo' and 'bar', connect them, read in the file,
and then compress said file. Sometime later (I think around 60+ seconds)
the node 'bar' will claim that 'foo' is not responding.
[pro...@chinaski.local ~/tmp_code/node_timeout] erl -name foo
Erlang R14B (erts-5.8.1) [source] [64-bit] [smp:2:2] [rq:2]
[async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.8.1 (abort with ^G)
(f...@chinaski.local)1> net_adm:ping('b...@chinaski.local').
pong
(f...@chinaski.local)2> nodes().
['b...@chinaski.local']
(f...@chinaski.local)3> {ok,Data} = file:read_file("rand").
{ok,<<103,5,115,210,177,147,53,45,250,182,51,32,250,233,
39,253,102,61,73,242,18,159,45,185,232,80,33,...>>}
(f...@chinaski.local)4> zlib:gzip(Data).
<<31,139,8,0,0,0,0,0,0,3,0,15,64,240,191,103,5,115,210,
177,147,53,45,250,182,51,32,250,233,...>>
(f...@chinaski.local)5>
[pro...@chinaski.local ~/tmp_code/node_timeout] erl -name bar
Erlang R14B (erts-5.8.1) [source] [64-bit] [smp:2:2] [rq:2]
[async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.8.1 (abort with ^G)
(b...@chinaski.local)1> nodes().
['f...@chinaski.local']
(b...@chinaski.local)2>
=ERROR REPORT==== 18-Jan-2011::17:16:10 ===
** Node 'f...@chinaski.local' not responding **
** Removing (timedout) connection **
Thanks,
-Ryan
Your SMP node seems to be capped at smp:2:2 when it ought to be smp:16. Some resource limit may be holding back the system. That said, zlib should never cause this issue.
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questio...@erlang.org
This is what I have on the actual production machine:
Erlang R14A (erts-5.8) [source] [64-bit] [smp:16:16] [rq:16]
[async-threads:0] [hipe] [kernel-poll:false]
To be certain, I ran the same example (except this time using two physical
machines) and achieved the same result. Namely, the 'bar' node claims 'foo'
is not responding and thus closes the connection. Whatever this is, I've
now easily reproduced it on two different OSs, with 2 different Erlang
versions.
-Ryan
In this case it's a bug in the zlib module (probably by me): gzip should
chunk up the input before invoking the driver.
What happens is that all schedulers go to sleep because there is no work to
do, except the one invoking the driver. A ping is received and wakes up the
"distribution" process, which gets queued up on the only scheduler that is
awake, but that scheduler is blocked in an "eternal" call. The pings are
never processed and the distribution times out.
You can wait for a patch or use the zlib API to chunk up the compression
yourself; see the implementation of gzip in the zlib module.
/Dan
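The chunking suggested above could look roughly like the following sketch.
It is not the eventual patch to the zlib module: the module name
gzip_chunked and the 64 KB default chunk size are assumptions made for
illustration. It uses only the streaming calls of the zlib API (open,
deflateInit, deflate, deflateEnd, close), so each call into the driver
handles one chunk at most and control returns to Erlang code in between.

```erlang
-module(gzip_chunked).
-export([gzip/1, gzip/2]).

%% Bytes handed to the driver per deflate call; an arbitrary choice.
-define(DEFAULT_CHUNK, 65536).

gzip(Bin) ->
    gzip(Bin, ?DEFAULT_CHUNK).

gzip(Bin, ChunkSize)
  when is_binary(Bin), is_integer(ChunkSize), ChunkSize > 0 ->
    Z = zlib:open(),
    try
        %% Window bits of 16 + 15 ask zlib for gzip framing
        %% (header and trailer) instead of a raw zlib stream.
        ok = zlib:deflateInit(Z, default, deflated, 16 + 15, 8, default),
        Out = deflate_chunks(Z, Bin, ChunkSize, []),
        ok = zlib:deflateEnd(Z),
        iolist_to_binary(Out)
    after
        zlib:close(Z)
    end.

%% Feed the input one chunk at a time; the last chunk is sent with the
%% 'finish' flush mode so the stream is terminated properly.
deflate_chunks(Z, Bin, ChunkSize, Acc) when byte_size(Bin) > ChunkSize ->
    <<Chunk:ChunkSize/binary, Rest/binary>> = Bin,
    deflate_chunks(Z, Rest, ChunkSize, [Acc, zlib:deflate(Z, Chunk)]);
deflate_chunks(Z, Bin, _ChunkSize, Acc) ->
    [Acc, zlib:deflate(Z, Bin, finish)].
```

In the reproduction from the start of the thread, gzip_chunked:gzip(Data)
would stand in for zlib:gzip(Data); the result can be checked with
zlib:gunzip/1. A smaller chunk size means shorter individual driver calls at
the cost of more calls overall.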
________________________________________________________________
Thanks for the reply, I'll be sure to chunk my data. I was using the gzip/1
call for convenience.
That said, I'm still a little fuzzy on something you said. Why is it that
the "distribution" process is scheduled on the same scheduler that's running
the call to the driver? Why not schedule it on one of the 15 other
schedulers that are currently sleeping? Does this mean any other message I
send will also be blocked? Dare I ask, how does the scheduling work
exactly?
-Ryan
If I have understood it correctly, it works like this:
If a scheduler does not have any work to do it will be disabled.
It will stay disabled until a live thread discovers it has too much work and
wakes a sleeping scheduler. The run-queues are only checked when processes
are scheduled.
Since in this case the only living scheduler is busy for a very long time,
no queue checking will be done and all the schedulers will be blocked until
the call to the driver is complete.
We had a long discussion during lunch about it, and we didn't agree on how
it should work. :-)
I agree that zlib is broken and should be fixed, but I still believe this
breaks the principle of least astonishment: if I have 16 schedulers and one
is blocked in a long function call, I still expect other code to be invoked.
Rickard's thought is that such a call should never happen and should go
through an async driver or a separate thread. I guess it will take a couple
more lunches to come to a conclusion :-)
/Dan
________________________________________________________________
This was the fear with NIFs. With NIFs, developers have an easy tool to really
destroy the system in order to "increase performance" and implement 3rd
party libs. There are several cases with different impact:
1) destroy soft-real-time properties - reduction count badness
2) destroy concurrency with blocking calls - scheduler badness
3) destroy the system with faulty drivers (seg fault) - pure badness
Some of these issues can be mitigated if the developer implements async
threads, i.e. schedules operations to the async-pool.
I feel that this is not ideal and is a heritage of ancient times.
The problem in this case is that time does not progress in the system. Time
is measured in reductions and each call is a reduction. At least this is the
case with normal code. There are some special cases too, for instance a
message sent "bumps" the reduction count of the sender. Since native code
(NIFs, BIFs and drivers) does not increase reductions during its call but
instead penalizes the process after the call, time does not progress during
the execution (as opposed to beam code). When a process reaches the
reduction-limit it is scheduled out. Why reduction counters instead of time
slices? Supposedly it is much faster (according to Björn). It is a design
decision with trade-offs. The solution is fast and nimble, it has certain
characteristics that are favorable and has some characteristics that are
less favorable. I would favor time-slices since I think it would be fairer
to the system and potentially we could save a register. Exactly how it
should be done is a question for a different time.
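The reduction accounting described above can be observed from Erlang itself.
The following is only a sketch for experimenting: the module name reds and
the function spent/1 are hypothetical, and the numbers it reports depend on
the ERTS version. It reads the calling process's reduction counter before
and after running a fun.

```erlang
-module(reds).
-export([spent/1]).

%% Hypothetical helper: run Fun in the calling process and return
%% {Result, Reductions}, where Reductions is how much the process's
%% reduction counter grew while Fun ran.
spent(Fun) when is_function(Fun, 0) ->
    {reductions, Before} = erlang:process_info(self(), reductions),
    Result = Fun(),
    {reductions, After} = erlang:process_info(self(), reductions),
    {Result, After - Before}.
```

Comparing reds:spent(fun() -> lists:seq(1, 100000) end) with
reds:spent(fun() -> zlib:gzip(Data) end) on a large binary gives a feel for
how much "time" the system believes each call consumed, as opposed to how
long it actually kept a scheduler busy.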
The load balancing in the scheduler is checked when a certain reduction
count is reached for that scheduler. We do not want to check this too often
since it will then become a serialization point.
But fear not, there is a (beautiful) solution that is being discussed in the
erts-team. Hopefully we can agree on the details.
// Björn-Egil
2011/1/22 Dan Gudmundsson <dg...@erlang.org>
It would be even better if NIFs/Drivers were time-limited, not in that they could be stopped (I assume that is impractical), but in that their results would be thrown away and an error raised if they exceed the limit. This would make bad NIFs take the blame they deserve by being treated as errors when they take too long.
I fear a future in which third-party applications with NIFs/drivers become commonplace dependencies for all applications (similar to the frameworks of the Java world), and that the NIFs/Drivers they contain break the soft realtime behavior of Erlang.
Making bad NIFs and Drivers purposely unusable will avoid a gradual erosion of Erlang's soft realtime properties for many users.
We've seen a few references to this forthcoming solution on this list.
How about a hint?
Jeff Schultz
1. No run-queue checking if the only living scheduler (schedulers ?) is blocked.
2. zlib is written in a blocking way.
Both should be fixed, though the first is the more serious. It will also become more serious as NIFs become more widely used. While "hardliner me" says that NIF writers have themselves to blame if they block the system and that they should RTFM, "softliner me" says that we should probably try to help them and make it easier to get it right.
Robert
--
Robert Virding, Erlang Solutions Ltd.
I guess the upshot is to be very careful with linked-in code, not only
because it can crash the VM (which is the common warning) but because it can
block critical processes that will affect the system in unforeseen and
obscure ways.
-Ryan