Hello Donald, all,
Some thoughts inline below.
> On 06 May 2016, at 18:11, Donald Stufft <don...@stufft.io> wrote:
>
> For an example, in traditional HTTP servers where you have an open connection
> associated with whatever view code you're running whenever the client
> disconnects you're given a few options of what you can do, but the most common
> option in my experience is that once the connection has been lost the HTTP
> server cancels the execution of whatever view code it had been running [1].
> This allows a single process to serve more by shedding the load of connections
> that have since been disconnected for some reason, however in ASGI since
> there's no way to remove an item from the queue or cancel it once it has begun
> to be processed by a worker process you lose out on this ability to shed the
> load of processing a request once it has already been scheduled.
In theory this effect is possible. However I don't think it will make a
measurable difference in practice. A Python server will usually process
requests quickly and push the response to a reverse-proxy. It should have
finished processing the request by the time it's reasonable to assume the
client has timed out.
This would only be a problem when serving extremely large responses in Python,
which is widely documented as a performance anti-pattern that must be avoided
at all costs. So if this effect happens, you have far worse problems :-)
> This additional complexity incurred by the message bus also ends up requiring
> additional complexity layered onto ASGI to try and re-invent some of the
> "natural" features of TCP and/or HTTP (or whatever the underlying protocol is).
> An example of this would be the ``order`` keyword in the WebSocket spec,
> something that isn't required and just naturally happens whenever you're
> directly connected to a websocket because the ``order`` is just whatever bytes
> come in off the wire.
I'm somewhat concerned by this risk. Out-of-order processing of messages
coming from a single connection could cause surprising bugs. This is likely one
of the big tradeoffs of the async-to-sync conversion that channels performs. I
assume it will have to be documented.
Could someone confirm that this doesn't happen for regular HTTP/1.1 requests?
I suppose channels encodes each HTTP/1.1 request as a single message.
Note that out-of-order processing is already possible without channels, e.g.
due to network latency or high load on a worker.
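To make the ordering question concrete, here is roughly what I understand a
channels-style ``websocket.receive`` message to look like, with the ``order``
counter a worker has to consult if it cares about processing frames in arrival
order (a sketch from my reading of the ASGI draft; field names may not be
exact):

    # Hypothetical websocket.receive message as a worker might see it.
    message = {
        "reply_channel": "websocket.send!a1b2c3",  # where to send frames back
        "path": "/chat/",
        "order": 3,       # connect is 0, then each frame increments it
        "text": "hello",  # or "bytes" for a binary frame
    }

    # A worker that needs strict ordering has to track the last ``order`` it
    # processed per reply_channel and buffer messages that arrive early,
    # instead of relying on the transport to deliver frames in sequence.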
The design of channels seems similar to HTTP/2 — a bunch of messages sent in
either direction with no pretense to synchronize communications. This is a
scary model but I guess we'll have to live with it anyway...
Does anyone know if HTTP/2 allows sending responses out of order? This would
make sub-optimal handling of HTTP/1.1 pipelining less of a concern going
forwards. We could live with a less efficient implementation.
> I believe the introduction of a message bus here makes things inherently more
> fragile. In order to reasonable serve web sockets you're now talking about a
> total of 3 different processes that need to be run (Daphne, Redis, and Django)
> each that will exhibit its own failure conditions and introduces additional
> points of failure. Now this in itself isn't the worst thing because that's
> often times unavoidable anytime you scale beyond a single process, but ASGI
> adds that complication much sooner than more traditional solutions do.
Yes, that’s my biggest concern with channels. However I haven’t seen anyone
suggesting fewer than three systems:
- frontend + queue + worker (e.g. channels)
- regular HTTP + websockets + pub/sub (e.g. what Mark Lavin described)
I share Mark’s concerns about handling short- and long-lived connections in
the same process. Channels solves this elegantly by converting long-lived
connections to a series of events to handle.
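For those who haven't looked at the API yet, the "series of events" model looks
roughly like this (a minimal sketch from my reading of the channels docs;
details may differ):

    # consumers.py -- each websocket frame arrives as a separate message and
    # is handled by plain synchronous code running in a worker process.
    def ws_message(message):
        text = message.content.get("text", "")
        # Push a reply back over the channel layer for this connection.
        message.reply_channel.send({"text": text.upper()})

    # routing.py -- map channel names to consumers, much like URL routing.
    from channels.routing import route

    channel_routing = [
        route("websocket.receive", ws_message),
    ]

The worker never holds the socket; it only consumes events and pushes responses
back through the channel layer.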
This sounds a lot like the proof-of-concept I demonstrated at DjangoCon US
2013, eventually reaching the conclusion that this wasn't a workable model,
mainly due to:
- the impossibility of mixing async and sync code in Python, because of the
explicit nature of async code written on top of asyncio (which I still
believe is the right choice even though it's a problem for Django).
- the great difficulty of implementing the ORM's APIs on top of an async
solution (although I came up with new ideas since then; also Amber Brown
showed an interesting proof-of-concept on top of Twisted at Django under
the Hood 2015).
I think it's important to keep a straightforward WSGI backend in case we crack
this problem and build an async story that depends on asyncio after dropping
support for Python 2.
I don't think merging channels as it currently stands hinders this possibility
in any way, on the contrary. The more Django is used for serving HTTP/2 and
websockets, the more we can learn.
Sorry Andrew, that was yet another novel to read… I hope it helps anyway…
ISTM that the strongest argument in favor is that I think it _is_
significantly easier for a casual user to build and deploy their first
websockets app using Channels than using any other currently-available
approach with Django. Both channels and Django+whatever-async-server
require managing multiple servers, but channels makes a lot of decisions
for you and makes it really easy to keep all your code together. And (as
long as we still support plain WSGI) it doesn't remove the flexibility
for more advanced users who prefer different tradeoffs to still choose
other approaches. There's a lot to be said for that combination of
"accessible for the new user, still flexible for the advanced user", IMO.
In short, I think that the message bus adds an additional layer of complexity
that makes everything a bit more complex and complicated for very little actual
gain over other possible, but less complex solutions. This message bus also
removes a key part of the amount of control that the server which is *actually*
receiving the connection has over the lifetime and process of the eventual
request.
For an example, in traditional HTTP servers where you have an open connection
associated with whatever view code you're running whenever the client
disconnects you're given a few options of what you can do, but the most common
option in my experience is that once the connection has been lost the HTTP
server cancels the execution of whatever view code it had been running [1].
This allows a single process to serve more by shedding the load of connections
that have since been disconnected for some reason. However, in ASGI, since
there's no way to remove an item from the queue or cancel it once a worker
process has begun handling it, you lose this ability to shed the load of
processing a request once it has already been scheduled.
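To make the load shedding concrete, here is a toy sketch of the idea using
asyncio (purely illustrative; no real server is this simple): when the process
that owns the socket also owns the task running the view, cancelling work on
disconnect is trivial.

    import asyncio

    class CancellingProtocol(asyncio.Protocol):
        """Toy sketch: tie the "view" task's lifetime to the client socket."""

        def connection_made(self, transport):
            self.transport = transport
            self.task = None

        def data_received(self, data):
            # Run the (potentially slow) request handling as a task owned by
            # this connection.
            self.task = asyncio.ensure_future(self.handle_request(data))

        def connection_lost(self, exc):
            # The client went away: stop doing work on its behalf.
            if self.task is not None and not self.task.done():
                self.task.cancel()

        async def handle_request(self, data):
            await asyncio.sleep(5)  # stand-in for expensive view code
            self.transport.write(b"HTTP/1.1 200 OK\r\n"
                                 b"Content-Length: 2\r\n\r\nok")
            self.transport.close()

    loop = asyncio.get_event_loop()
    loop.run_until_complete(
        loop.create_server(CancellingProtocol, "127.0.0.1", 8080))
    loop.run_forever()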
This additional complexity incurred by the message bus also ends up requiring
additional complexity layered onto ASGI to try and re-invent some of the
"natural" features of TCP and/or HTTP (or whatever the underlying protocol is).
An example of this would be the ``order`` keyword in the WebSocket spec,
something that isn't required and just naturally happens whenever you're
directly connected to a websocket because the ``order`` is just whatever bytes
come in off the wire. This also gets exposed in other features, like
backpressure, where ASGI doesn't currently have a concept of allowing the queue
to apply back pressure to the web connection. Andrew has now started to come
around to the idea of adding a bound to the queue (which is good!), but if the
indirection of the message bus hadn't been added, then backpressure would have
occurred naturally whenever enough things were processing that new connections
stopped being ``accept``-ed, which would eventually fill up the backlog and
make new connections block while waiting to connect. It's good that Andrew is
adding the ability to bound the queue, but that is something that is going to
require care to tune in each individual deployment (and will need to be
regularly re-evaluated) rather than something that just occurs naturally as a
consequence of the design of the system.
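To make the tuning point concrete, the difference is roughly between a number
someone has to pick and something the kernel already does (a toy sketch; the
bound and the function name are made up):

    import queue

    # Toy model of the bounded-queue approach: the protocol server blocks once
    # workers fall behind, and someone has to pick (and keep re-picking) the
    # bound for each deployment.
    channel_layer = queue.Queue(maxsize=100)  # hypothetical bound

    def protocol_server_got_request(request):
        channel_layer.put(request)  # blocks once 100 requests are queued

    # Without the bus, the same effect falls out of the TCP accept backlog:
    # once every worker is busy, accept() stops being called, the listen()
    # backlog fills, and new clients queue at the kernel level with nothing
    # extra to configure.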
Anytime you add a message bus you need to make a few trade-offs. The particular
trade-off that ASGI made is that it should prefer "at most once" delivery of
messages and low latency over guaranteed delivery. This choice is likely one of
the sanest ones you can make in regards to which trade-offs you make for the
design of ASGI, but in that trade-off you end up with new problems that don't
exist otherwise. For example, HTTP/1 has the concept of pipelining which allows
you to make several HTTP requests on a single HTTP connection without waiting
for the responses before sending each one. Given the nature of ASGI it would be
very difficult to actually support this feature without either violating the
RFC or forcing Daphne or the queue to buffer potentially huge responses while
waiting for an earlier request to finish. Again, you get this for free using
either async IO (you just don't await the result of the second request until
the first request has been processed) or WSGI with generators (you just don't
iterate over the result until you're ready for it).
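Roughly what I mean by getting it for free, sketched with asyncio streams (the
"parser" and "view" here are made-up stand-ins, just to show the shape):

    import asyncio

    async def read_request(reader):
        # Stand-in for a real HTTP parser: one line == one pipelined request.
        return await reader.readline()

    async def run_view(request):
        await asyncio.sleep(1)  # stand-in for the (slow) first view
        return b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"

    async def handle_connection(reader, writer):
        # Pipelined requests arrive back to back, but responses go out in
        # request order simply because we don't start writing the second one
        # until the first has been written -- no buffering layer required.
        while True:
            request = await read_request(reader)
            if not request:
                break
            writer.write(await run_view(request))
            await writer.drain()
        writer.close()

The WSGI analogue is the generator case above: the server simply doesn't
iterate the second response until it has finished sending the first.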
ASGI purports to make it easier to gracefully restart your servers by making it
possible to restart the worker servers (since there are no long-lived open
connections to them) and simply spin up new ones. However, that's not really
the whole story, because while that is true, it only holds as long as your
code changes don't touch something that Daphne needs to be aware of in order
to process incoming requests. As soon as Daphne needs to be restarted you're
back in the same boat of needing another solution for graceful restarts, and
since Daphne depends on project-specific code, it's going to need to be
restarted much more frequently than other solutions that don't. It appears to
me like it would be difficult to be able to automatically determine whether or
not Daphne needs a restart on any particular deployment, so it will be common
for people to just restart the whole stack anyway.
On May 6, 2016, at 1:45 PM, Andrew Godwin <and...@aeracode.org> wrote:

Want to just cover a few more things I didn't in my reply to Aymeric.

On Fri, May 6, 2016 at 9:11 AM, Donald Stufft <don...@stufft.io> wrote:
In short, I think that the message bus adds an additional layer of complexity
that makes everything a bit more complex and complicated for very little actual
gain over other possible, but less complex solutions. This message bus also
removes a key part of the amount of control that the server which is *actually*
receiving the connection has over the lifetime and process of the eventual
request.

True; however, having a message bus/channel abstraction also removes a layer of complexity that is caring about socket handling and sinking your performance by even doing a slightly blocking operation.

In an ideal world we'd have some magical language that let us all write amazing async code and that detected all possible deadlocks or livelocks before they happened, but that's not yet the case, and I think the worker model has been a good substitute for it in software design generally.
For an example, in traditional HTTP servers where you have an open connection
associated with whatever view code you're running whenever the client
disconnects you're given a few options of what you can do, but the most common
option in my experience is that once the connection has been lost the HTTP
server cancels the execution of whatever view code it had been running [1].
This allows a single process to serve more by shedding the load of connections
that have since been disconnected for some reason. However, in ASGI, since
there's no way to remove an item from the queue or cancel it once a worker
process has begun handling it, you lose this ability to shed the load of
processing a request once it has already been scheduled.

But as soon as you introduce a layer like Varnish into the equation, you've lost this anyway, as you're no longer seeing the true client socket. Abandoned requests are an existing problem with HTTP and WSGI; I see them in our logs all the time.
ASGI purports to make it easier to gracefully restart your servers by making it
possible to restart the worker servers (since there are no long-lived open
connections to them) and simply spin up new ones. However, that's not really
the whole story, because while that is true, it only holds as long as your
code changes don't touch something that Daphne needs to be aware of in order
to process incoming requests. As soon as Daphne needs to be restarted you're
back in the same boat of needing another solution for graceful restarts, and
since Daphne depends on project-specific code, it's going to need to be
restarted much more frequently than other solutions that don't. It appears to
me like it would be difficult to be able to automatically determine whether or
not Daphne needs a restart on any particular deployment, so it will be common
for people to just restart the whole stack anyway.

Daphne only depends on one tiny piece of project code, the channel layer configuration. I don't imagine that changing nearly as often as actual business logic. You're right that once there's a new Daphne version or that config changes, it needs a restart too, but that's not going to be very common.
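For reference, the project code in question is roughly this (a sketch from
memory of the docs; details may vary):

    # settings.py -- the channel layer configuration Daphne needs to know about.
    CHANNEL_LAYERS = {
        "default": {
            "BACKEND": "asgi_redis.RedisChannelLayer",
            "CONFIG": {"hosts": [("localhost", 6379)]},
            "ROUTING": "myproject.routing.channel_routing",
        },
    }

    # asgi.py -- what Daphne is pointed at, e.g. myproject.asgi:channel_layer
    import os
    from channels.asgi import get_channel_layer
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
    channel_layer = get_channel_layer()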
I'm not saying my solution is perfect; I'm saying it's pragmatic given our current position and likely future position. Channels adds a spectrum to Django where you can run it on anything from a single process, to a single machine (with the IPC channel layer), to a cluster of machines.

I look forward to Python async being in a better place in five to ten years so we can revisit this and improve things (but hopefully keep a similar end-developer API, which I think is quite nice to use and reflects URL routing and view writing in a nice way), but I believe we need something that works well now, which means taking a few tradeoffs along the way; after all, it's not going to be forced on anyone, WSGI will still be there for a long time to come*.

(*At least until I get around to working out what an in-process asyncio WSGI replacement with WebSocket support might look like)

Andrew
On 06 May 2016, at 19:59, Donald Stufft <don...@stufft.io> wrote:

On May 6, 2016, at 1:45 PM, Andrew Godwin <and...@aeracode.org> wrote:

On Fri, May 6, 2016 at 9:11 AM, Donald Stufft <don...@stufft.io> wrote:
So what sort of solution would I personally advocate had I the time or energy
to do so? I would look towards what sort of pure Python API (like WSGI itself)
could be added to allow a web server to pass websockets down into Django.
I agree with wanting to use things like HAProxy in the stack, but I think your idea of handling WebSockets natively in Django is far more difficult and fragile than Channels is, mostly due to our ten-year history of synchronous code. We would have to audit a large amount of the codebase to ensure it was all async compatible, not to mention drop Python 2 support, before we'd even get close.

You don’t need to write it asynchronously. You need an async server, but that async server can execute synchronous code just fine using something like deferToThread. That’s how twistd -n web --wsgi works today. It gets a request and it deferToThread’s it to synchronous WSGI code.
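Roughly the pattern I mean, sketched with Twisted's thread-pool bridge
(illustrative only; the real twistd WSGI machinery is more involved):

    from twisted.internet import reactor
    from twisted.internet.threads import deferToThread
    from twisted.web.resource import Resource
    from twisted.web.server import NOT_DONE_YET, Site

    def run_sync_view(path):
        # Ordinary synchronous code (ORM calls, etc.), run in a thread pool.
        return ("you asked for %s" % path).encode("utf-8")

    class SyncBridge(Resource):
        isLeaf = True

        def render_GET(self, request):
            # Hand the blocking work to a thread; the reactor stays free.
            d = deferToThread(run_sync_view, request.path.decode("utf-8"))
            d.addCallback(lambda body: (request.write(body), request.finish()))
            return NOT_DONE_YET

    reactor.listenTCP(8000, Site(SyncBridge()))
    reactor.run()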
On May 6, 2016, at 3:49 PM, Aymeric Augustin <aymeric....@polytechnique.org> wrote:

Sure, this works for WSGI, but barring significant changes to Django, it doesn’t make it convenient to handle WSGI synchronously and WebSockets asynchronously with the same code base, let alone in the same process.
On 06 May 2016, at 21:56, Donald Stufft <don...@stufft.io> wrote:

On May 6, 2016, at 3:49 PM, Aymeric Augustin <aymeric....@polytechnique.org> wrote:

Sure, this works for WSGI, but barring significant changes to Django, it doesn’t make it convenient to handle WSGI synchronously and WebSockets asynchronously with the same code base, let alone in the same process.

User-level code would not be handling WebSockets asynchronously; that would be left up to the web server (which would call the user-level code using deferToThread each time a websocket frame comes in). Basically similar to what’s happening now, except instead of using the network and a queue to allow calling sync user code from an async process, you just use the primitives provided by the async framework.
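Concretely, the shape I have in mind is something like this sketch using
Autobahn on Twisted (the user-level hook is made up; this is not a worked-out
proposal):

    from autobahn.twisted.websocket import (
        WebSocketServerFactory,
        WebSocketServerProtocol,
    )
    from twisted.internet import reactor
    from twisted.internet.threads import deferToThread

    def user_ws_handler(text):
        # Synchronous user code -- free to hit the ORM; runs in a thread pool.
        return text.upper()

    class SyncWSProtocol(WebSocketServerProtocol):
        def onMessage(self, payload, isBinary):
            # Each incoming frame becomes one deferToThread call into sync
            # code -- the same idea as a channels worker, but in-process.
            d = deferToThread(user_ws_handler, payload.decode("utf-8"))
            d.addCallback(lambda reply: self.sendMessage(reply.encode("utf-8")))

    factory = WebSocketServerFactory()
    factory.protocol = SyncWSProtocol
    reactor.listenTCP(9000, factory)
    reactor.run()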