
asynchat sends data on async_chat.push and .push_with_producer


ludvig....@gmail.com

May 13, 2008, 7:43:02 AM
Hello,

My question concerns asynchat in particular. With the following half-
pseudo code in mind:

class Example(asynchat.async_chat):
    def readable(self):
        if foo:
            self.push_with_producer(ProducerA())
        return asynchat.async_chat.readable(self)

Now, asyncore will call the readable function just before a select(),
and it is meant to determine whether or not to include that asyncore
dispatcher in the select map for reading.

The problem with this code is that it has the unexpected side effect
of _immediately_ trying to send, regardless of whether the async_chat
object is actually writable.

async_chat.push_with_producer (and .push as well)
call .initiate_send(), which in turn calls .send if there's data
buffered. While this might seem logical, it isn't at all.

Suppose that when Example.readable is called, the socket has already
been closed. There are two possible scenarios in which it could have
been closed: a) the remote endpoint closed the connection, or b) the
producer ProducerA somehow closed the connection (my case).

Obviously, calling send on a socket that has been closed will result
in an error - EBADF, "Bad file descriptor".
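The failure described here can be reproduced without asynchat at all. The following is a minimal sketch, assuming a POSIX system and using a local socket pair in place of a real connection; the function name is made up for illustration.

```python
import errno
import socket


def send_on_closed_socket():
    """Return the errno raised by send() on an already-closed socket."""
    left, right = socket.socketpair()
    right.close()
    left.close()  # stands in for "something closed the connection"
    try:
        left.send(b"data")
    except OSError as exc:
        return exc.errno
    return None


# On POSIX systems the result is expected to equal errno.EBADF.
result = send_on_closed_socket()
print(result, errno.errorcode.get(result))
```

Any code path that reaches send() after the close, including one triggered from readable(), would hit the same error.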

So, my question is: Why does asynchat.push* call self.initiate_send?
Nothing in the name "push" suggests that it'll transmit immediately,
disregarding potential "closedness". Removing the two calls
to .initiate_send() in the two push functions would still mean data is
sent, but only when data can be sent - which is, IMO, what should be
done.

Thankful for insights,
Ludvig.

Josiah Carlson

May 13, 2008, 11:59:04 AM
Ludvig,

In a substantial way, I agree with you. Calling initiate_send()
within push or push_with_producer is arguably a misfeature (which you
have argued).

In a pure world, the only writing that is done would be within the
handle_write() callbacks within the select loop. Then again, in a
perfect world, calling readable() and writable() would have no strange
side effects (as your example below has), and all push*() calls would
be made within the handle_*() methods.

We do not live in a pure world, Python isn't pure (practicality beats
purity), and by attempting to send some data each time a .push*()
method is called, there are measurable increases in transfer rates.

In the particular case you are looking at (and complaining about ;) ),
if you want to bypass the initiate_send() call, you can dig into the
particular implementation of asynchat you are using (the internals may
change in 2.6 and 3.x versus 2.5 and previous), and append your output
to the outgoing queue. You could even abstract out the push*() calls
for a non-auto-sending version (easy), write your own initiate_send()
method that checks the stack to verify that it's being called from
handle_write() (also easy), or any one of many other work-arounds.
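The "non-auto-sending version" idea can be sketched without depending on asynchat internals, which differ between versions. The class below is a hypothetical toy stand-in, not the asynchat API: QueueOnlyChannel and poll_once are invented names, and the queue is a plain deque. It only shows the control flow where push() queues data and the select pass does the writing.

```python
import collections
import select
import socket


class QueueOnlyChannel:
    """Toy channel whose push() never touches the socket (hypothetical)."""

    def __init__(self, sock):
        self.sock = sock
        self.sock.setblocking(False)
        self.outgoing = collections.deque()

    def push(self, data):
        # Queue only; sending is left to the select pass.
        self.outgoing.append(data)

    def writable(self):
        return bool(self.outgoing)

    def handle_write(self):
        data = self.outgoing.popleft()
        sent = self.sock.send(data)
        if sent < len(data):
            # Keep the unsent tail at the front of the queue.
            self.outgoing.appendleft(data[sent:])


def poll_once(channel, timeout=0.1):
    """One select pass: write only if select reports the socket writable."""
    if not channel.writable():
        return False
    _, ready, _ = select.select([], [channel.sock], [], timeout)
    if ready:
        channel.handle_write()
        return True
    return False
```

With this shape, calling push() from readable() has no side effect on the socket; the cost is that nothing is sent until the next pass through the loop.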

Yes, it would be convenient to not have push*() actually send data
when called in some cases, but in others, the increase in data
transfer rates and/or reduction in latency is substantial.

- Josiah

ludvig....@gmail.com

May 13, 2008, 12:35:14 PM
> In a pure world, the only writing that is done would be within the
> handle_write() callbacks within the select loop.  Then again, in a
> perfect world, calling readable() and writable() would have no strange
> side effects (as your example below has), and all push*() calls would
> be made within the handle_*() methods.

It wouldn't have those side-effects if push really just pushed. :-P

> We do not live in a pure world, Python isn't pure (practicality beats
> purity), and by attempting to send some data each time a .push*()
> method is called, there are measurable increases in transfer rates.

-- 8< --

> Yes, it would be convenient to not have push*() actually send data
> when called in some cases, but in others, the increase in data
> transfer rates and/or reduction in latency is substantial.

If it increases transfer speed that much, the calling application
almost has to be broken, or at least not designed as it should be - of
course there are such applications, but you know...

Anyway, I went for a subclassing way of dealing with it, and it works
fine.

Thanks for the reply though, hadn't considered possibly "flawed"
applications where the asyncore loop isn't revisited as often as it
should. :->
Ludvig

Giampaolo Rodola'

May 13, 2008, 2:50:30 PM
On 13 May, 17:59, Josiah Carlson <josiah.carl...@gmail.com> wrote:

> We do not live in a pure world, Python isn't pure (practicality beats
> purity), and by attempting to send some data each time a .push*()
> method is called, there are measurable increases in transfer rates.

Good point. I'd like to ask a question: if we had a default
asyncore.loop timeout of (say) 0.01 ms instead of 30 seconds, could we
avoid this problem?
I've always found it weird that asyncore has such a high default timeout
value.
Twisted, for example, uses a default of 0.01 ms for all its reactors.

Giampaolo Rodola'

May 13, 2008, 2:52:03 PM
On 13 May, 18:35, "ludvig.eric...@gmail.com"
<ludvig.eric...@gmail.com> wrote:

> Anyway, I went for a subclassing way of dealing with it, and it works
> fine.

As Josiah already stated, pay attention to the changes that will be
applied to asyncore internals in Python 2.6 and 3.0 (for details, see
the patch provided in bug #1736190).
Your subclass might not work on all implementations.

--- Giampaolo
http://code.google.com/p/pyftpdlib

Jean-Paul Calderone

May 13, 2008, 4:16:13 PM
to pytho...@python.org

I'm not sure this is right. What timeout are we talking about? Twisted
only wakes up when necessary.

Jean-Paul

Giampaolo Rodola'

May 13, 2008, 5:05:43 PM

I'm talking about the asyncore.loop timeout parameter which defaults
to 30 (seconds).
I don't think that Twisted only wakes up when necessary (surely not by
using the select reactor).
Think about the scheduled calls feature (reactor.callLater).
To have that work I guess that a continuous loop must always be kept
alive, regardless of the reactor used.

--- Giampaolo
http://code.google.com/p/pyftpdlib

Jean-Paul Calderone

May 13, 2008, 5:25:39 PM
to pytho...@python.org

To support scheduling calls, you just have to know when the next call is
going to happen. Then, you can wake up at exactly that time. This is
what Twisted does, even for select reactor. ;)

An exception to this is that on Windows, in order to support ^C, the
reactor will wake up more often, because ^C does not interrupt select()
in Python on Windows (although this could probably be fixed).

Jean-Paul

Giampaolo Rodola'

May 13, 2008, 7:44:11 PM

Yes but how do you know when it's the time to fire up a call without
using a thread?
You are forced to call time.time() periodically and check if that time
had come every time.
Take a look at twisted/internet/base/ReactorBase.runUntilCurrent.
That's where that should happen.


--- Giampaolo
http://code.google.com/p/pyftpdlib

Jean-Paul Calderone

May 13, 2008, 8:56:03 PM
to pytho...@python.org
On Tue, 13 May 2008 16:44:11 -0700 (PDT), Giampaolo Rodola' <gne...@gmail.com> wrote:
>
> [snip]

>>
>> To support scheduling calls, you just have to know when the next call is
>> going to happen. Then, you can wake up at exactly that time. This is
>> what Twisted does, even for select reactor. ;)
>
>Yes but how do you know when it's the time to fire up a call without
>using a thread?
>You are forced to call time.time() periodically and check if that time
>had come every time.
>Take a look at twisted/internet/base/ReactorBase.runUntilCurrent.
>That's where that should happen.
>

Why? Isn't this why subtraction exists? If there is a call scheduled to
happen at T1 and the current time is T2, then I know that after (T1 - T2)
elapses, it will be time to run the call. Why do I have to do any checks
at all? I just tell select() to wait that long. Presumably this is just
what someone will do if they want to use asyncore with timed calls. Call
asyncore.loop() in a loop, always passing (T1 - T2) as the timeout value.

So, actually, I'm not sure what the disagreement is about. ;) The
default value for the timeout parameter to the loop function seems
somewhat irrelevant. If someone wants timed events in their loop,
asyncore isn't standing in their way. On the other hand, didn't
this thread (or maybe just this part of the thread) start out with
a question about asyncore throughput? I have no idea what that
might have to do with this.

Jean-Paul

Jean-Paul Calderone

May 13, 2008, 8:59:28 PM
to pytho...@python.org
On Tue, 13 May 2008 20:56:03 -0400, Jean-Paul Calderone <exa...@divmod.com> wrote:
>On Tue, 13 May 2008 16:44:11 -0700 (PDT), Giampaolo Rodola'
><gne...@gmail.com> wrote:
>>
>>[snip]
>>>
>>>To support scheduling calls, you just have to know when the next call is
>>>going to happen. Then, you can wake up at exactly that time. This is
>>>what Twisted does, even for select reactor. ;)
>>
>>Yes but how do you know when it's the time to fire up a call without
>>using a thread?
>>You are forced to call time.time() periodically and check if that time
>>had come every time.
>>Take a look at twisted/internet/base/ReactorBase.runUntilCurrent.
>>That's where that should happen.
>
>Why? Isn't this why subtraction exists? If there is a call scheduled to
>happen at T1 and the current time is T2, then I know that after (T1 - T2)
>elapses, it will be time to run the call. Why do I have to do any checks
>at all? I just tell select() to wait that long. Presumably this is just
>what someone will do if they want to use asyncore with timed calls. Call
>asyncore.loop() in a loop, always passing (T1 - T2) as the timeout value.

Ah, of course this is wrong, since asyncore.loop... loops. :P I meant to
say asyncore.poll() here.
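The (T1 - T2) idea can be sketched with plain select() rather than asyncore. This is an illustration under assumptions, not Twisted's or asyncore's implementation: TimedCalls, call_later, and run are hypothetical names, and a heap holds the pending calls so that the earliest deadline is always at the front.

```python
import heapq
import itertools
import select
import time


class TimedCalls:
    """Pending timed calls, ordered by deadline (hypothetical helper)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal deadlines

    def call_later(self, delay, func):
        when = time.monotonic() + delay
        heapq.heappush(self._heap, (when, next(self._counter), func))

    def timeout(self):
        """Seconds until the next call (T1 - T2), or None if none pending."""
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - time.monotonic())

    def run_due(self):
        now = time.monotonic()
        while self._heap and self._heap[0][0] <= now:
            _, _, func = heapq.heappop(self._heap)
            func()


def run(calls, sockets=()):
    """Loop until no calls remain, sleeping exactly until the next deadline."""
    while True:
        timeout = calls.timeout()
        if timeout is None:
            break
        if sockets:
            select.select(list(sockets), [], [], timeout)
        else:
            time.sleep(timeout)  # nothing to watch; just wait
        calls.run_due()
```

Because the timeout is recomputed on every pass, a call scheduled from inside a callback is taken into account on the next pass of this loop. The sketch does not handle a call scheduled from another thread while select() is already blocked.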

Giampaolo Rodola'

May 13, 2008, 9:20:29 PM
On 14 May, 02:56, Jean-Paul Calderone <exar...@divmod.com> wrote:

> Why?  Isn't this why subtraction exists?  If there is a call scheduled to
> happen at T1 and the current time is T2, then I know that after (T1 - T2)
> elapses, it will be time to run the call.  Why do I have to do any checks
> at all?  I just tell select() to wait that long.  Presumably this is just
> what someone will do if they want to use asyncore with timed calls.  Call
> asyncore.loop() in a loop, always passing (T1 - T2) as the timeout value.

That doesn't work if I decide to schedule one or more calls AFTER the
loop has started (see: "I already told select what timeout to use").
As far as I've understood by reading the Twisted core what happens is
that there's a heap of scheduled calls and a loop that keeps calling
time.time() to check the scheduled functions due to expire soonest.
I've also proposed a patch for asyncore using the same approach:
http://bugs.python.org/issue1641

> So, actually, I'm not sure what the disagreement is about. ;)  

The current disagreement is about how Twisted timed events are
implemented. :)

> On the other hand, didn't
> this thread (or maybe just this part of the thread) start out with
> a question about asyncore throughput?  I have no idea what that
> might have to do with this.

Nothing, we just finished OT. :)


--- Giampaolo
http://code.google.com/p/pyftpdlib

Josiah Carlson

May 13, 2008, 9:29:16 PM
On May 13, 9:35 am, "ludvig.eric...@gmail.com"

<ludvig.eric...@gmail.com> wrote:
> > In a pure world, the only writing that is done would be within the
> > handle_write() callbacks within the select loop. Then again, in a
> > perfect world, calling readable() and writable() would have no strange
> > side effects (as your example below has), and all push*() calls would
> > be made within the handle_*() methods.
>
> It wouldn't have those side-effects if push really just pushed. :-P
>
> > We do not live in a pure world, Python isn't pure (practicality beats
> > purity), and by attempting to send some data each time a .push*()
> > method is called, there are measurable increases in transfer rates.
>
> -- 8< --
>
> > Yes, it would be convenient to not have push*() actually send data
> > when called in some cases, but in others, the increase in data
> > transfer rates and/or reduction in latency is substantial.
>
> If it increases transfer speed that much, the calling application
> almost has to be broken, or at least not designed as it should be - of
> course there are such applications, but you know...

It's not a matter of being broken at all, it's a matter of control
flow. When we immediately try to send whenever a .push() call is
made, the underlying TCP/IP stack will accept a reasonably large
amount of data before it actually fills up (the most recent FreeBSD,
from what I understand, will accept up to 1 meg, which is how they are
able to saturate 10Gbit links), and by tossing the data into the
TCP/IP buffer early, the data gets sent earlier, thus reducing
latency.

Further, because we are making more actual calls to socket.send(),
assuming the underlying TCP/IP buffer isn't filled (which may or may
not be a good assumption), and assuming that the link has more
capacity than is being used (usually the case on LANs and high-speed
internet links), putting more data into the buffer to be handled by
the underlying link layers will also increase transfer speeds.

When the socket.send() calls are delayed until the next pass through
the loop, and we aren't doing an initial send, then we don't get the
benefit of the underlying TCP/IP socket layer buffering.
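The eager behaviour described above can be sketched as follows. This is a toy model, not the asynchat source: EagerChannel is a hypothetical class, and unlike the real initiate_send() it keeps sending until the kernel buffer refuses more data. It shows how a push() on a non-blocking socket hands data to the TCP/IP stack right away and falls back to queueing when the buffer is full.

```python
import collections
import socket


class EagerChannel:
    """Toy channel whose push() tries to send immediately (hypothetical)."""

    def __init__(self, sock):
        self.sock = sock
        self.sock.setblocking(False)
        self.outgoing = collections.deque()

    def push(self, data):
        self.outgoing.append(data)
        self.initiate_send()

    def initiate_send(self):
        while self.outgoing:
            data = self.outgoing[0]
            try:
                sent = self.sock.send(data)
            except BlockingIOError:
                return  # kernel buffer full; a later select pass retries
            if sent < len(data):
                # Partial send: keep the tail queued and stop for now.
                self.outgoing[0] = data[sent:]
                return
            self.outgoing.popleft()
```

Small pushes leave the process before the loop is revisited; only the part that does not fit in the kernel buffer waits for the next writable event.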

In my experience over high-speed connections (LANs, Gbit WAN links,
local machine connections), I have found that increasing block sizes
to 32k significantly improves performance for bandwidth-constrained
applications, as there are far fewer blocks to toss to the underlying
layers, less Python code execution (Python 2.5 has a default block
size of 512 bytes, which means 64x as much Python execution to send the
same amount of data, and one of the proposed 2.6 changes is to up this
to a more reasonable 4096 bytes), and more effective use of the TCP/IP
buffers (which are typically 64k or larger).
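The 64x figure is simple arithmetic on block counts. A small sketch, using an arbitrary 1 MiB payload as the example size:

```python
def send_calls(payload_size, block_size):
    """Number of blocks (and so send attempts) needed, rounding up."""
    return -(-payload_size // block_size)


PAYLOAD = 1024 * 1024  # 1 MiB, an arbitrary example size

for block_size in (512, 4096, 32 * 1024):
    print(block_size, send_calls(PAYLOAD, block_size))
# 512 -> 2048 blocks, 4096 -> 256 blocks, 32768 -> 32 blocks
```

Going from 512-byte to 32k blocks cuts the number of trips through the Python-level send path by a factor of 2048 / 32 = 64.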

> Anyway, I went for a subclassing way of dealing with it, and it works
> fine.
>
> Thanks for the reply though, hadn't considered possibly "flawed"
> applications where the asyncore loop isn't revisited as often as it
> should. :->
> Ludvig

Again, it's not about the application being flawed, it's a matter of
control flow. ;) Also, it's not a matter of any timeouts in the
select/poll loops (as Giampaolo suggested); if any socket is readable
or writable, those calls will return immediately (a few hundred
microseconds per call isn't bad).

- Josiah

Jean-Paul Calderone

May 14, 2008, 6:20:05 AM
to pytho...@python.org
On Tue, 13 May 2008 18:20:29 -0700 (PDT), Giampaolo Rodola' <gne...@gmail.com> wrote:
>On 14 May, 02:56, Jean-Paul Calderone <exar...@divmod.com> wrote:
>
>> Why? Isn't this why subtraction exists? If there is a call scheduled to
>> happen at T1 and the current time is T2, then I know that after (T1 - T2)
>> elapses, it will be time to run the call. Why do I have to do any checks
>> at all? I just tell select() to wait that long. Presumably this is just
>> what someone will do if they want to use asyncore with timed calls. Call
>> asyncore.loop() in a loop, always passing (T1 - T2) as the timeout value.
>
>That doesn't work if I decide to schedule one or more calls AFTER the
>loop has started (see: "I already told select what timeout to use").
>As far as I've understood by reading the Twisted core what happens is
>that there's a heap of scheduled calls and a loop that keeps calling
>time.time() to check the scheduled functions due to expire soonest.
>I've also proposed a patch for asyncore using the same approach:
>http://bugs.python.org/issue1641

This isn't how it works. You're misreading the Twisted implementation. I
don't know how to explain it any more clearly.

Jean-Paul
