Some of you may remember my latest question, where I was having weird node timeout issues that I couldn't explain and thought might be related to the messages I was passing between my nodes. Well, I pinpointed the problem to a call to zlib:gzip/1. At first I was really surprised by this, as such a harmless line of code surely should have nothing to do with the ability of my nodes to communicate. However, as I dug further I realized gzip is implemented as a linked-in driver, and I remember reading that one has to take care with them because they can trash the VM. I don't remember reading anything about them blocking code, and even if they do, I fail to see why my SMP-enabled node (16 cores) would allow this one thread to block the tick. It occurred to me that maybe the scheduler responsible for that process is the one blocked by the driver. Do processes have scheduler affinity? That would make sense, I guess.
I've "fixed" this problem simply by using a plain port (i.e. run in its own OS process). For my purposes, this actually makes more sense in the majority of the places I was making use of gzip. Can someone enlighten me as to exactly what is happening behind the scenes?
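The port-based workaround mentioned above might look roughly like the following. This is only a sketch under assumptions: the module name port_gzip and the function gzip_file/1 are made up for illustration, an external gzip executable is assumed to be on the PATH, and the data is assumed to already live in a file (the poster's actual port code is not shown in the thread).

```erlang
-module(port_gzip).
-export([gzip_file/1]).

%% Compress the file at Path by running the external "gzip -c Path"
%% in its own OS process and collecting its stdout.
%% Assumption: a gzip executable is available on the PATH.
gzip_file(Path) ->
    Gzip = os:find_executable("gzip"),
    Port = open_port({spawn_executable, Gzip},
                     [{args, ["-c", Path]}, binary, stream, exit_status]),
    collect(Port, []).

%% Accumulate output chunks until the external program exits.
collect(Port, Acc) ->
    receive
        {Port, {data, Bin}} ->
            collect(Port, [Acc, Bin]);
        {Port, {exit_status, 0}} ->
            {ok, iolist_to_binary(Acc)};
        {Port, {exit_status, Status}} ->
            {error, {exit_status, Status}}
    end.
```

Because the compression runs outside the VM, the calling Erlang process simply waits in a receive and no scheduler thread is tied up inside a long C call.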
To reproduce I create a random 1.3GB file:
dd if=/dev/urandom of=rand bs=1048576 count=1365
Then start two named nodes 'foo' and 'bar', connect them, read in the file, and then compress said file. Sometime later (I think around 60+ seconds) the node 'bar' will claim that 'foo' is not responding.
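The compression step of the reproduction can be sketched as below. The module name gzip_repro and the function compress_file/1 are hypothetical, and the original shell session is not part of the thread, so this is an illustration rather than the exact commands that were run.

```erlang
-module(gzip_repro).
-export([compress_file/1]).

%% Read the whole file into memory and compress it with a single
%% zlib:gzip/1 call, returning {ElapsedMicroseconds, CompressedSize}.
compress_file(Path) ->
    {ok, Data} = file:read_file(Path),
    {Micros, Compressed} = timer:tc(fun() -> zlib:gzip(Data) end),
    {Micros, byte_size(Compressed)}.
```

One way to use it, assuming short node names: start the nodes with "erl -sname foo" and "erl -sname bar", connect them with net_adm:ping/1 from one node to the other, then call gzip_repro:compress_file("rand") on foo and watch bar.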
Your SMP node seems to be capped at smp:2:2 when it ought to be smp:16. Some resource limit may be holding back the system. That said, zlib should never cause this issue.
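The smp:2:2 figure comes from the shell banner, which shows the number of scheduler threads and how many of them are online. The same numbers can be read at runtime; a minimal sketch, where the module name sched_info is made up for illustration:

```erlang
-module(sched_info).
-export([summary/0]).

%% Report how many scheduler threads the VM has, how many are online,
%% and how many logical processors the VM detected (the last value may
%% be the atom 'unknown' on some platforms).
summary() ->
    [{schedulers, erlang:system_info(schedulers)},
     {schedulers_online, erlang:system_info(schedulers_online)},
     {logical_processors, erlang:system_info(logical_processors)}].
```

If the counts are lower than expected, the emulator flag +S (for example "erl +S 16:16") sets the number of schedulers and schedulers online explicitly.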
To be certain, I ran the same example (except this time using two physical machines) and achieved the same result. Namely, the 'bar' node claims 'foo' is not responding and thus closes the connection. Whatever this is, I've now easily reproduced it on two different OSs, with 2 different Erlang versions.
On Tue, Jan 18, 2011 at 6:04 PM, Alain O'Dea <alain.o...@gmail.com> wrote:
> On 2011-01-18, at 18:54, Ryan Zezeski <rzeze...@gmail.com> wrote:
So...can anyone explain to me why zlib:gzip/1 is causing the net_kernel tick to be blocked? Do linked-in drivers block their scheduler like NIFs? I'm really curious on this one :)
All C calls block a scheduler if they are not pushed out to a thread.
In this case it's a bug in the zlib module (probably by me): gzip should chunk up the input before invoking the driver.
What happens is that all schedulers go to sleep because there is no work to do, except the one invoking the driver. A ping is received and wakes up the "distribution" process, which gets queued on the only scheduler that is awake, but that scheduler is blocked in an "eternal" call. The pings never get processed and the distribution times out.
You can wait for a patch or use the zlib API to chunk up the compression yourself; see the implementation of gzip in the zlib module.
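A chunked variant along these lines could look as follows. This is a sketch, not the eventual patch: the module name chunked_gzip and the ChunkSize parameter are assumptions, and the deflateInit/6 arguments (window bits 16 + 15 to request a gzip header) are meant to mirror what zlib:gzip/1 sets up.

```erlang
-module(chunked_gzip).
-export([gzip/2]).

%% gzip-compress Data, feeding the zlib stream at most ChunkSize bytes
%% per call so that no single call into the driver runs for very long.
gzip(Data, ChunkSize)
  when is_binary(Data), is_integer(ChunkSize), ChunkSize > 0 ->
    Z = zlib:open(),
    try
        %% 16 + 15 window bits selects gzip framing instead of raw zlib.
        ok = zlib:deflateInit(Z, default, deflated, 16 + 15, 8, default),
        Out = loop(Z, Data, ChunkSize, []),
        ok = zlib:deflateEnd(Z),
        iolist_to_binary(Out)
    after
        zlib:close(Z)
    end.

%% Deflate one chunk at a time; the last (possibly empty) chunk is
%% passed with the finish flag to flush and terminate the stream.
loop(Z, Data, ChunkSize, Acc) when byte_size(Data) > ChunkSize ->
    <<Chunk:ChunkSize/binary, Rest/binary>> = Data,
    loop(Z, Rest, ChunkSize, [Acc, zlib:deflate(Z, Chunk)]);
loop(Z, Data, _ChunkSize, Acc) ->
    [Acc, zlib:deflate(Z, Data, finish)].
```

Between chunks the calling process returns to Erlang code, so the scheduler gets a chance to run other processes. The chunk size is a tuning knob; something like chunked_gzip:gzip(Data, 65536) is a plausible starting point, not a measured recommendation.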
On Fri, Jan 21, 2011 at 2:48 AM, Ryan Zezeski <rzeze...@gmail.com> wrote:
> So...can anyone explain to me why zlib:gzip/1 is causing the net_kernel tick
> to be blocked? Do linked-in drivers block their scheduler like NIFs? I'm
> really curious on this one :)
> -Ryan
> On Tue, Jan 18, 2011 at 6:53 PM, Ryan Zezeski <rzeze...@gmail.com> wrote:
>> Apologies, the example I copied was run on my mac.
>> This is what I have on the actual production machine:
Thanks for the reply, I'll be sure to chunk my data. I was using the gzip/1 call for convenience.
That said, I'm still a little fuzzy on something you said. Why is it that the "distribution" process is scheduled on the same scheduler that's running the call to the driver? Why not schedule it on one of the 15 other schedulers that are currently sleeping? Does this mean any other message I send will also be blocked? Dare I ask, how does the scheduling work exactly?
On Fri, Jan 21, 2011 at 5:16 AM, Dan Gudmundsson <d...@erlang.org> wrote:
> All C calls block a scheduler if they are not pushed out to a thread.
Rickard, who implemented this, should explain it.
If I have understood it correctly, it works like this: if a scheduler does not have any work to do, it is disabled. It stays disabled until a live thread discovers that it has too much work and wakes a sleeping scheduler. The run queues are only checked when processes are scheduled.
Since in this case the only living scheduler is busy for a very long time, no queue checking will be done and all the schedulers will be blocked until the call to the driver is complete.
We had a long discussion during lunch about it, and we didn't agree how it should work. :-)
I agree that zlib is broken and should be fixed, but I still believe that this breaks the rule of least astonishment: if I have 16 schedulers and one is blocked in a long function call, I still expect other code to be invoked. Rickard's view is that such a call should never happen and should go through an async driver or a separate thread. I guess it will take a couple more lunches to come to a conclusion :-)
On Fri, Jan 21, 2011 at 10:25 PM, Ryan Zezeski <rzeze...@gmail.com> wrote:
> Dan,
I think his argument was that a driver or NIF that does not use an async thread for potentially blocking calls is a seriously broken driver. Consider the non-SMP case: it will halt the beam and prevent important processes from being scheduled. I agree that in the SMP case, one scheduler should not block the other schedulers in damaging calls. If a developer wants to destroy a scheduler with a broken driver, he should be free to do so.
This was the fear with NIFs. With NIFs, developers have an easy tool to really destroy the system in order to "increase performance" and implement 3rd-party libs. There are several cases with different impact:
1) destroy soft-real-time properties - reduction count badness
2) destroy concurrency with blocking calls - scheduler badness
3) destroy the system with faulty drivers (seg fault) - pure badness
Some of these issues can be mitigated if the developer implements async threads, i.e. schedules operations to the async-pool.
I feel that this is not ideal and is a heritage of ancient times.
The problem in this case is that time does not progress in the system. Time is measured in reductions, and each call is a reduction. At least this is the case with normal code. There are some special cases too; for instance, sending a message "bumps" the reduction count of the sender. Since native code (NIFs, BIFs and drivers) does not increase reductions during its call but instead penalizes the process after the call, time does not progress during the execution (as opposed to beam code). When a process reaches the reduction limit it is scheduled out. Why reduction counters instead of time slices? Supposedly it is much faster (according to Björn). It is a design decision with trade-offs. The solution is fast and nimble; it has certain characteristics that are favorable and some that are less favorable. I would favor time slices, since I think it would be fairer to the system and we could potentially save a register. Exactly how it should be done is a question for a different time.
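The reduction accounting described above can be observed from Erlang itself. A small sketch, with the module name reds and the function cost/1 invented for illustration; the numbers it returns depend on the release and are only indicative:

```erlang
-module(reds).
-export([cost/1]).

%% Return how many reductions the calling process was charged while
%% running Fun. For native code the charge is applied around the call,
%% not continuously during it.
cost(Fun) when is_function(Fun, 0) ->
    {reductions, Before} = erlang:process_info(self(), reductions),
    _ = Fun(),
    {reductions, After} = erlang:process_info(self(), reductions),
    After - Before.
```

Comparing, say, reds:cost(fun() -> lists:seq(1, 100000) end) with reds:cost(fun() -> zlib:gzip(Bin) end) for some large binary Bin gives a feel for how differently pure Erlang code and a native call are billed relative to the wall-clock time they take.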
The load balancing in the scheduler is checked when a certain reduction count is reached for that scheduler. We do not want to check this too often since it will then become a serialization point.
But fear not, there is a (beautiful) solution that is being discussed in the erts-team. Hopefully we can agree on the details.
I'm mostly happy so long as the standard distribution is very careful to have its BIFs, NIFs and Port Drivers not run long. It should be hard or impossible by design to supply inputs for these that cause them to run long.
It would be even better if NIFs/Drivers were time-limited, not in that they could be stopped (I assume that is impractical), but in that their results would be thrown away and an error raised if they exceed the limit. This would make bad NIFs take the blame they deserve by being treated as errors when they take too long.
I fear a future in which third-party applications with NIFs/drivers become commonplace dependencies for all applications (similar to the frameworks of the Java world), and that the NIFs/Drivers they contain break the soft realtime behavior of Erlang.
Making bad NIFs and Drivers purposely unusable will avoid a gradual erosion of Erlang's soft realtime properties for many users.
On 2011-01-22, at 14:42, Wallentin Dahlberg <wallentin.dahlb...@gmail.com> wrote:
> // Björn-Egil
> 2011/1/22 Dan Gudmundsson <d...@erlang.org>
> Rickard, who implemented this, should explain it.
> If I have understood it correctly, it works like this: if a scheduler does not have any work to do, it will be disabled. It stays disabled until a live thread discovers it has too much work and wakes a sleeping scheduler. The run-queues are only checked when processes are scheduled.
> Since in this case the only living scheduler is busy for a very long time, no queue checking will be done and all schedulers will be blocked until the call to the driver is complete.
> We had a long discussion during lunch about it, and we didn't agree on how it should work. :-)
> I agree that zlib is broken and should be fixed, but I still believe that it breaks the rule of least astonishment: if I have 16 schedulers and one is blocked in a long function call, I still expect other code to be invoked. Rickard's thought is that such a call should never happen and should be made through an async driver or a separate thread. I guess it will take a couple more lunches to come to a conclusion :-)
> /Dan
> On Fri, Jan 21, 2011 at 10:25 PM, Ryan Zezeski <rzeze...@gmail.com> wrote:
> > Dan,
> > Thanks for the reply, I'll be sure to chunk my data. I was using the gzip/1 call for convenience.
> > That said, I'm still a little fuzzy on something you said. Why is it that the "distribution" process is scheduled on the same scheduler that's running the call to the driver? Why not schedule it on one of the 15 other schedulers that are currently sleeping? Does this mean any other message I send will also be blocked? Dare I ask, how does the scheduling work exactly?
> > -Ryan
> > On Fri, Jan 21, 2011 at 5:16 AM, Dan Gudmundsson <d...@erlang.org> wrote:
> >> All C calls block a scheduler if they are not pushed out to a thread.
> >> In this case it's a bug in the zlib module (probably by me): gzip should chunk up the input before invoking the driver.
> >> What happens is that all schedulers go to sleep because there is no work to do, except the one invoking the driver. A ping is received and wakes up the "distribution" process, which gets queued on the only scheduler that is awake, but that scheduler is blocked in an "eternal" call. The pings never get processed and the distribution times out.
> >> You can wait for a patch or use the zlib API to chunk up the compression yourself; see the implementation of gzip in the zlib module.
> >> /Dan
> >> On Fri, Jan 21, 2011 at 2:48 AM, Ryan Zezeski <rzeze...@gmail.com> wrote:
> >> > So...can anyone explain to me why zlib:gzip/1 is causing the net_kernel tick to be blocked? Do linked-in drivers block their scheduler like NIFs? I'm really curious on this one :)
> >> > -Ryan
> >> > On Tue, Jan 18, 2011 at 6:53 PM, Ryan Zezeski <rzeze...@gmail.com> wrote:
> >> >> Apologies, the example I copied was run on my mac.
> >> >> This is what I have on the actual production machine:
> >> >> To be certain, I ran the same example (except this time using two physical machines) and achieved the same result. Namely, the 'bar' node claims 'foo' is not responding and thus closes the connection. Whatever this is, I've now easily reproduced it on two different OSs, with 2 different Erlang versions.
> >> >> -Ryan
> >> >> On Tue, Jan 18, 2011 at 6:04 PM, Alain O'Dea <alain.o...@gmail.com> wrote:
> >> >>> On 2011-01-18, at 18:54, Ryan Zezeski <rzeze...@gmail.com> wrote:
> >> >>> > Hi everyone,
> >> >>> > Some of you may remember my latest question where I was having weird node timeout issues that I couldn't explain and I thought it might be related to the messages I was passing between my nodes. Well, I pinpointed the problem to a call to zlib:gzip/1. At first I was really surprised by this, as such a harmless line of code surely should have nothing to do with the ability for my nodes to communicate. However, as I dug further I realized gzip was implemented as a linked-in driver and I remember reading things about how one has to take care with them because they can trash the VM with them. I don't remember reading anything about them blocking code, and even if they do I fail to see why my SMP enabled node (16 cores) would allow this one thread to block the tick. It occurred to me that maybe the scheduler responsible for that process is the one blocked by the driver. Do processes have scheduler affinity? That would make sense, I guess.
> >> >>> > I've "fixed" this problem simply by using a plain port (i.e. run in its own OS process). For my purposes, this actually makes more sense in the majority of the places I was making use of gzip. Can someone enlighten me as to exactly what is happening behind the scenes?
> >> >>> > To reproduce I create a random 1.3GB file:
> >> >>> > Then start two named nodes 'foo' and 'bar', connect them, read in the file, and then compress said file. Sometime later (I think around 60+ seconds) the node 'bar' will claim that 'foo' is not responding.
> >> >>> Your SMP node seems to be capped at smp:2:2 when it ought to be smp:16. Some resource limit may be holding back the system. That said, zlib should not ever cause this issue.
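Dan's suggestion in the quoted thread, to chunk the compression yourself through the zlib API, can be sketched roughly as follows. This is a minimal illustration, not the patch Dan mentions: the module name `chunked_gzip`, the function names, and the 64 KB default chunk size are assumptions made for the example. It only relies on the documented `zlib:open/0`, `zlib:deflateInit/6`, `zlib:deflate/2,3`, `zlib:deflateEnd/1` and `zlib:close/1` calls, with a window-bits value of 16 + 15 to request gzip framing.

```erlang
-module(chunked_gzip).
-export([gzip/1, gzip/2]).

%% Hypothetical helper (not part of OTP): gzip a binary by feeding it to
%% zlib in fixed-size chunks, so that no single driver call has to
%% process the whole input in one go.
gzip(Bin) ->
    gzip(Bin, 64 * 1024).   % 64 KB is an arbitrary, assumed default

gzip(Bin, ChunkSize)
  when is_binary(Bin), is_integer(ChunkSize), ChunkSize > 0 ->
    Z = zlib:open(),
    try
        %% WindowBits = 16 + 15 asks zlib for a gzip header and trailer.
        ok = zlib:deflateInit(Z, default, deflated, 16 + 15, 8, default),
        Out = deflate_chunks(Z, Bin, ChunkSize, []),
        ok = zlib:deflateEnd(Z),
        iolist_to_binary(Out)
    after
        zlib:close(Z)
    end.

%% Compress one chunk per call; the accumulator is an iolist kept in
%% input order. The last (possibly empty) chunk is flushed with 'finish'.
deflate_chunks(Z, Bin, ChunkSize, Acc) when byte_size(Bin) > ChunkSize ->
    <<Chunk:ChunkSize/binary, Rest/binary>> = Bin,
    deflate_chunks(Z, Rest, ChunkSize, [Acc, zlib:deflate(Z, Chunk)]);
deflate_chunks(Z, Bin, _ChunkSize, Acc) ->
    [Acc, zlib:deflate(Z, Bin, finish)].
```

A call such as `chunked_gzip:gzip(Data, 65536)` returns control to the emulator between chunks, which is the point of the exercise. Whether that alone keeps the tick alive depends on the chunk size and the ERTS version, so treat it as a starting point to measure rather than a guaranteed fix.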
On Sat, Jan 22, 2011 at 07:12:04PM +0100, Wallentin Dahlberg wrote:
> But fear not, there is a (beautiful) solution that is being discussed in the erts-team. Hopefully we can agree on the details.
We've seen a few references to this forthcoming solution on this list. How about a hint?
Jeff Schultz
________________________________________________________________ erlang-questions (at) erlang.org mailing list. See http://www.erlang.org/faq.html To unsubscribe; mailto:erlang-questions-unsubscr...@erlang.org
There are really two different problems:
1. No run-queue checking if the only living scheduler (schedulers?) is blocked.
2. zlib is written in a blocking way.
Both should be fixed, though the first is the more serious. It will also become serious as NIFs become more used. While "hardliner me" says that NIF writers have themselves to blame if they block the system and that they should RTFM, "softliner me" says that we should probably try to help them and make it easier to get it right.
Interestingly enough, my coworker and I got into a debate about whether this is good behavior or not. He focused on the fact that there is a misbehaving entity doing something it shouldn't be doing (and therefore a "node down" is the correct response), whereas I focused on the semantics of the net_kernel tick. To me, the tick is nothing more than a ping every N seconds. If enough pings go unanswered, the node must be assumed to be down. This makes perfect sense to me.
However, in my case I have two nodes, each running Erlang SMP with 16 schedulers. One node decides to compress a very large file using a linked-in driver all in one go, blocking the only live scheduler. Meanwhile, the other node sends a tick every 15s and waits for a response. It turns out that the tick response process is scheduled on the same scheduler as the long-running, blocking process even though there are 15 idle schedulers waiting in the wings. The compression takes S seconds, which turns out to be greater than the threshold T within which the tick response process had to respond. This causes the other node to consider it down, and the socket is killed.
This is wrong because a) the node was never down, and in fact was still doing work, but the tick response got stuck behind a long-running process in the queue (like getting your license at the MVA :) ) and b) the network between the two was never compromised. Both nodes are still very much alive and doing work, yet they disconnected from each other, which doesn't make any sense to me.
I guess the upshot is to be very careful with linked-in code, not only because it can crash the VM (which is the common warning) but because it can block critical processes and affect the system in unforeseen and obscure ways.
-Ryan
On Sun, Jan 23, 2011 at 11:04 AM, Robert Virding <robert.vird...@erlang-solutions.com> wrote:
> There are really two different problems:
> 1. No run-queue checking if the only living scheduler (schedulers?) is blocked.
> 2. zlib is written in a blocking way.
> Both should be fixed, though the first is the more serious. It will also become serious as NIFs become more used. While "hardliner me" says that NIF writers have themselves to blame if they block the system and that they should RTFM, "softliner me" says that we should probably try to help them and make it easier to get it right.
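The timing Ryan describes (a tick every 15s, and a "not responding" report after roughly 60+ seconds) is consistent with the documented kernel `net_ticktime` behavior, assuming the default of 60 seconds: ticks are sent every quarter of the tick time, and a silent node is detected somewhere between three and five quarters of it. The small module below just works through that arithmetic; `tick_window` is a hypothetical name chosen for the example, and the thread does not state which `net_ticktime` the nodes actually used.

```erlang
-module(tick_window).
-export([window/1]).

%% window(NetTicktime) -> {TickInterval, MinDetect, MaxDetect}, in seconds.
%% Assumes the documented behavior: a tick is sent every NetTicktime/4
%% seconds, and an unresponsive node is detected after a time T with
%% NetTicktime - NetTicktime/4 < T < NetTicktime + NetTicktime/4.
window(NetTicktime) when is_integer(NetTicktime), NetTicktime > 0 ->
    Interval = NetTicktime / 4,
    {Interval, NetTicktime - Interval, NetTicktime + Interval}.
```

With the assumed default, `tick_window:window(60)` evaluates to `{15.0, 45.0, 75.0}`, so any single driver call that keeps the only awake scheduler busy for more than about 45 to 75 seconds can be enough for the peer to drop the connection.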