Channels integration plan, first draft

Andrew Godwin

Dec 17, 2015, 6:35:18 AM
to django-d...@googlegroups.com
Hi everyone,

One of the first steps I want to get done for Channels is to get a rough plan in place for how things are going to work in terms of where code goes, supported versions, etc. I've written up my thoughts on this into a first draft of what I'm calling the "integration plan":


Feedback is much welcomed - I'd particularly like to hear people's thoughts on the ideas of releasing both natively in Django 1.10 and as a third-party addon for 1.8/1.9, plus the new (for Django) concept of releasing part of it as a separate codebase, though still under the Django umbrella.

Andrew

Mark Lavin

Dec 17, 2015, 9:05:49 AM
to Django developers (Contributions to Django itself)
I have concerns about what "built-in feature" means for existing applications (and future applications, really). Under "Preserving Simplicity" you note that you should be able to run Django as simply as you do now. Is there a consideration to run "classic" WSGI applications without channels? This in-memory channel isn't free. It still requires serializing the incoming HTTP request to JSON, putting it into the channel queue, deserializing the JSON, <do work to generate response>, serializing the response to JSON, putting it into the response channel, deserializing the response, and sending it to the client. That's two JSON serialization round trips which don't currently exist, with no gain for existing applications. As you note later on, this currently breaks non-root mounted applications (those using SCRIPT_NAME), but there doesn't appear to be a plan to resolve it.

Best,

Mark

Carl Meyer

Dec 17, 2015, 11:50:39 AM
to django-d...@googlegroups.com
Hi Andrew,
Thanks for your work on this. A few quick thoughts on first reading:

- On the packaging side, we don't really support or offer instructions
for any installation technique other than pip -- even if you manually
download a tarball, you should still install it with pip. Even our
"install the dev version from git" instructions use "pip install -e .".
So I think you should simplify the packaging/code-reuse plan to
"interface server and channel backends will be separate packages, listed
as dependencies where needed." AFAIK we've said for a while now that
we're ready for required dependencies, just waiting for a case where we
actually need them. I think this is that case, and it's a better option
than adding a bunch of build complexity where sometimes things are
bundled and sometimes they are not.

- I share Mark's concern about the performance (latency, specifically)
implications for projects that want to keep deploying old-style, given
all the new serialization that would now be in the request path. I think
some further discussion of this, with real benchmark numbers to refer
to, is a prerequisite to considering Channels as a candidate for Django
1.10. To take a parallel from Python, Guido has always said that he
won't consider removing the GIL unless it can be done without penalizing
single-threaded code. If you think a different approach makes sense here
(that is, that it's OK to penalize the simple cases in order to
facilitate the less-simple ones), can you explain your reasons for that
position?

- Re imports and builtin vs third-party, I think we should avoid magic
and just require try/except for portable code. It's a bit of
boilerplate, but only authors of reusable apps are likely to need it,
and it keeps things simple and explicit. The boilerplate can be easily
packaged up in a `compat` module if someone cares enough.

- I also share Mark's concern about the SCRIPT_NAME change. Non-root
deploys are a key use case for many people (and a lot of work over the
years has gone into making them work!). I don't think it's an option to
just shrug and break that entirely in Django 1.10, which is what it
currently sounds like you're proposing. What's the technical obstacle
here, exactly?
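
The try/except boilerplate mentioned in the imports point could look roughly like the sketch below. The built-in import location (`django.channels`) is an assumption made for illustration, since the thread does not settle where the code would live; `CHANNELS_SOURCE` is a hypothetical helper name:

```python
# compat.py (hypothetical module in a reusable app)
try:
    # Assumed location if Channels ships inside Django itself.
    from django import channels
    CHANNELS_SOURCE = "builtin"
except ImportError:
    try:
        # Third-party package, e.g. for Django 1.8/1.9.
        import channels
        CHANNELS_SOURCE = "third-party"
    except ImportError:
        # Channels is not available at all; callers must check for None.
        channels = None
        CHANNELS_SOURCE = None
```

Reusable apps would then import `channels` from this one module instead of repeating the fallback everywhere.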

Carl


Florian Apolloner

Dec 17, 2015, 1:24:15 PM
to Django developers (Contributions to Django itself)
It would be interesting to add a few sentences about file uploads and how they are going to work with the new system.

Florian Apolloner

Dec 17, 2015, 1:27:20 PM
to Django developers (Contributions to Django itself)
On Thursday, December 17, 2015 at 5:50:39 PM UTC+1, Carl Meyer wrote:
> - I share Mark's concern about the performance (latency, specifically)
> implications for projects that want to keep deploying old-style, given
> all the new serialization that would now be in the request path.

It would be worth investigating more involved protocols like https://capnproto.org/ and benchmark those in comparison to JSON. While JSON is nice and good, I am not sure it is the perfect transport structure for what we are trying to achieve (especially not for in memory transport).

Cheers,
Florian

Anssi Kääriäinen

Dec 17, 2015, 1:32:33 PM
to django-d...@googlegroups.com
On Thursday, December 17, 2015, Carl Meyer <ca...@oddbird.net> wrote:
> Hi Andrew,
>
> - I share Mark's concern about the performance (latency, specifically)
> implications for projects that want to keep deploying old-style, given
> all the new serialization that would now be in the request path. I think
> some further discussion of this, with real benchmark numbers to refer
> to, is a prerequisite to considering Channels as a candidate for Django
> 1.10. To take a parallel from Python, Guido has always said that he
> won't consider removing the GIL unless it can be done without penalizing
> single-threaded code. If you think a different approach makes sense here
> (that is, that it's OK to penalize the simple cases in order to
> facilitate the less-simple ones), can you explain your reasons for that
> position?

We would also need some form of streamed messages for streamed http responses.

Is it possible to handle old-style http the way it has always been handled?

 - Anssi 

Andrew Godwin

Dec 17, 2015, 1:50:56 PM
to django-d...@googlegroups.com
To address the points so far:

- I'm not yet sure whether "traditional" WSGI mode would actually run through the in-memory backend or just be plugged directly into the existing code path; it really depends on how much code would need to be moved around in either case. I'm pretty keen on keeping a raw-WSGI path around for performance/compatibility reasons, and so we can hard fail if you try *any* channels use (right now the failure mode for trying to use channels with the WSGI emulation is silent failure).

- Streaming HTTP responses are already in the channels spec as chunked messages; you just keep sending response-style messages with a flag saying "there's more".

- File uploads are more difficult, due to the nature of the worker model (you can't guarantee all the messages will go to the same worker). My current plan here is to revise the message spec to allow infinite size messages and make the channel backend handle chunking in the best way (write to shared disk, use lots of keys, etc), but if there are other suggestions I'm open. This would also let people return large http responses without having to worry about size limits.

- Alternative serialisation formats will be looked into; it's up to the channel backend what to use. I just chose JSON as our previous research into this at work showed that it was actually the fastest overall due to the fact it has a pure C implementation, but that's a year or two old. Whatever is chosen needs large support and forwards compatibility, however. The message format is deliberately specified as JSON-capable structures (dicts, lists, strings) as it's assumed any serialisation format can handle this, and so it can be portable across backends.

- I thought SCRIPT_NAME was basically unused by anyone these days, but hey, happy to be proved wrong. Do we have any usage numbers on it to know if we'd need it for a new standalone server to implement? It's really not hard to add it into the request format, just thought it was one of those CGI remnants we might finally be able to kill.
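
To illustrate the chunked-message idea from the streaming point above, here is a rough sketch that splits a response body into several response-style messages, each carrying a "there's more" flag. The key names (`status`, `content`, `more_content`) and both function names are assumptions for illustration, not the message spec itself:

```python
def chunk_response(body, chunk_size, status=200):
    """Yield response-style dicts; every message except the last sets more_content=True."""
    if not body:
        yield {"status": status, "content": "", "more_content": False}
        return
    for start in range(0, len(body), chunk_size):
        message = {
            "content": body[start:start + chunk_size],
            # True while at least one more chunk follows this one.
            "more_content": start + chunk_size < len(body),
        }
        if start == 0:
            # Only the first message carries the status code.
            message["status"] = status
        yield message

def reassemble(messages):
    """Join chunk contents until a message says there is no more content."""
    parts = []
    for message in messages:
        parts.append(message["content"])
        if not message["more_content"]:
            break
    return "".join(parts)

messages = list(chunk_response("abcdefghij", 4))
```

The messages stay plain dicts of strings, integers and booleans, so any serialisation format that handles JSON-capable structures could carry them.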

Andrew

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CALMtK1Gz%3DaYMLyFW2da2C6Wo_-c_V2T_4p6K9eh0vwrKB91dKw%40mail.gmail.com.

For more options, visit https://groups.google.com/d/optout.

Carl Meyer

Dec 17, 2015, 2:27:34 PM
to django-d...@googlegroups.com
On 12/17/2015 11:50 AM, Andrew Godwin wrote:
> - I thought SCRIPT_NAME was basically unused by anyone these days, but
> hey, happy to be proved wrong. Do we have any usage numbers on it to
> know if we'd need it for a new standalone server to implement? It's
> really not hard to add it into the request format, just thought it was
> one of those CGI remnants we might finally be able to kill.

I'll admit to not being an expert on this use case at all, since I don't
generally do it, but AFAIK SCRIPT_NAME remains pretty key for
transparently deploying a Django site at non-root URL paths. If you grep
for SCRIPT_NAME in Django, you'll see that Django itself pays attention
to it (in order to support this use case) in the core WSGIHandler and in
the url reverser. Although it may be that passing SCRIPT_NAME in META to
user view code isn't actually critical to continuing to support non-root
deploys. Needs exploration.

Carl

Andrew Godwin

Dec 17, 2015, 2:34:57 PM
to django-d...@googlegroups.com
Yeah, I definitely see the need for it, and approaches like having the webserver transparently rewrite the paths aren't going to work, since it can't rewrite output. At least we can rename it and call it request['root_path'] or something.
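
As a rough illustration of how a `root_path` key could play the role SCRIPT_NAME plays today, the sketch below joins it with the request path. The `path` key and the `full_path` helper are hypothetical names chosen for this example:

```python
def full_path(request):
    """Join the mount point (root_path) and the app-relative path."""
    root = request.get("root_path", "").rstrip("/")
    path = request.get("path", "/")
    if not path.startswith("/"):
        path = "/" + path
    return root + path

mounted = full_path({"root_path": "/blog", "path": "/posts/1/"})
at_root = full_path({"path": "/"})
```

A URL reverser could prepend `root_path` in the same way so that generated links keep working under a non-root mount.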

Raphael Michel

Dec 17, 2015, 2:38:08 PM
to Carl Meyer, django-d...@googlegroups.com
Hi,

On Thu, 17 Dec 2015 12:27:07 -0700
> I'll admit to not being an expert on this use case at all, since I
> don't generally do it, but AFAIK SCRIPT_NAME remains pretty key for
> transparently deploying a Django site at non-root URL paths.

I used this before and can confirm that it is currently very
conveniently possible to deploy a Django application on a non-root URL
using SCRIPT_NAME without modifying the application. It is not really
important to me, but I think it would be worth keeping it in, if it is
not too much hassle.

Cheers
Raphael

Anssi Kääriäinen

Dec 17, 2015, 4:02:02 PM
to django-d...@googlegroups.com
Is the idea that a large site using the classic request-response architecture would receive requests at interface servers, which would then push the HTTP requests through channels to worker processes, which process the message and push the response through the channel backend back to the interface server, and from there back to the client?

 - Anssi

Andrew Godwin

Dec 17, 2015, 4:48:31 PM
to django-d...@googlegroups.com
Yes, that is the idea. While it obviously adds overhead (a millisecond or two in my first tests), it also adds natural load balancing between workers and then lets us have the same architecture for websockets and normal HTTP.

(The interface server does do all the HTTP parsing, so what gets sent over is slightly less verbose than normal HTTP and needs less work to use, but it's not a big saving)

Andrew

Sam Willis

Dec 17, 2015, 6:13:34 PM
to Django developers (Contributions to Django itself)
Hi,

To support file uploads or a large message body, the http.request message could have a file_channel or body_channel (much like its reply_channel). These would be something like http.request.file.Dj3Hd9J and would stream chunked file or body content in the same way as the http.response message, with a more_content attribute.

Sam

Andrew Godwin

Dec 17, 2015, 7:24:41 PM
to django-d...@googlegroups.com
The problem is that even with a specially named channel you can't guarantee the messages will all go to the same worker so you can re-assemble them on local disk, which kind of makes it still a hard problem. It works for responses because there's only one interface server listening to any response channel.
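
A small simulation of that problem, assuming (purely for illustration) that free workers take messages from a channel in strict round-robin order. Real scheduling would be less regular, but the effect is the same: no single worker sees every chunk of an upload.

```python
from collections import deque
from itertools import cycle

def distribute(chunks, worker_names):
    """Hand out queued chunks to workers one at a time, in turn."""
    channel = deque(chunks)
    received = {name: [] for name in worker_names}
    for name in cycle(worker_names):
        if not channel:
            break
        # Whichever worker is next simply takes the oldest message.
        received[name].append(channel.popleft())
    return received

seen = distribute(["chunk-1", "chunk-2", "chunk-3", "chunk-4"], ["worker-a", "worker-b"])
```

Here `worker-a` ends up with chunks 1 and 3 and `worker-b` with chunks 2 and 4, so neither can re-assemble the file on its local disk.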

The current plan I think is best is to change the message spec to allow for any size message, and then have the channel backend handle chunking itself (then, the backend can do what is best for the way it's designed, and stream bigger messages to disk before presenting them to the client).

It does mean going a bit beyond the normal dictionary interface for a message to allow a way to stream the contents of a key rather than just read it, but I think that would be easy-ish to do.

Andrew

Anssi Kääriäinen

Dec 18, 2015, 2:34:47 AM
to django-d...@googlegroups.com
I have a gut feeling this isn't going to work that well. The reasons include:
  - Backwards compatibility: how is a large site going to upgrade from 1.9 to 1.10?
  - Complexity of setup.
  - Error conditions: for example, what happens when an interface server sends a request to worker, and then dies (that is, the response channel has zero listeners). Similarly for chunked messages.
  - Does the architecture really scale enough? The channel backend is going to be a bottleneck, it needs the ability to handle a huge amount of data and a huge amount of individual messages. In particular, the request channel is going to be contested. We know classic http scales, but is the same true for interface server architecture?
  - Performance. Each request and response needs two additional network roundtrips. One to save to the channel server, one to fetch from the channel server. If the messages are large, this adds a lot of latency.
  - Untested architecture: does any big site use this kind of architecture for all http handling?

A realistic test for this is to push a scalable amount of scalable sized requests through the stack. The stack should recover even if you shut down parts of the network, any single interface server, channel backend server or worker server. Of course, when a server or a part of the network recovers, the stack would need to recover from that. Compare the performance, simplicity of setup and ability to recover from error conditions to a setup with only classic Django http servers.

I'm sorry if this feels negative. But, you are planning to change the very core of what a Django server is, and I feel we need to know with certainty that the new architecture really works. And not only that, it needs to be at least as good as classic http handling for existing users.

 - Anssi

Andrew Godwin

Dec 18, 2015, 5:02:29 AM
to django-d...@googlegroups.com
On Fri, Dec 18, 2015 at 7:34 AM, Anssi Kääriäinen <akaa...@gmail.com> wrote:
> I have a gut feeling this isn't going to work that well. The reasons include:
> - Backwards compatibility: how is a large site going to upgrade from 1.9 to 1.10?

None of the core view API will change. A 1.9 codebase will boot and work on 1.10/channels with no code changes.
 
> - Complexity of setup.

Could you elaborate? If you don't want it, you don't need to configure anything, and if you do, for most people it's just getting a Redis server running and pointing a setting at it.
 
> - Error conditions: for example, what happens when an interface server sends a request to worker, and then dies (that is, the response channel has zero listeners). Similarly for chunked messages.

Then the response is dropped, the same way a WSGI worker drops a connection if it dies. Chunked responses will time out after a certain period and be dropped too, in the same way that a deadlocked normal server would.
 
> - Does the architecture really scale enough? The channel backend is going to be a bottleneck, it needs the ability to handle a huge amount of data and a huge amount of individual messages. In particular, the request channel is going to be contested. We know classic http scales, but is the same true for interface server architecture?

I believe it will - have you read through the sharding and scaling plan in the docs? Channels is carefully designed to have no state in anything but interface servers, all workers handling all message types and queuing of messages exactly so you can scale horizontally; you can divide a very large site into several clusters of interfaces and workers fronted by load balancers, and each cluster would have multiple Redis (e.g.) backends with requests equally sharded across them using consistent hashing.
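
For readers unfamiliar with consistent hashing, the sketch below shows one common way to map channel names onto several backends. It is an illustrative assumption about how such sharding could work, not the Channels implementation; the class name, the `redis-N` labels and the `replicas` parameter are all hypothetical:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=64):
        ring = []
        for node in nodes:
            # Several points per node spread the load more evenly.
            for i in range(replicas):
                ring.append((self._hash("%s:%d" % (node, i)), node))
        ring.sort()
        self._ring = ring
        self._keys = [point for point, _ in ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, channel):
        """Return the first node clockwise from the channel's hash."""
        index = bisect.bisect(self._keys, self._hash(channel)) % len(self._ring)
        return self._ring[index][1]

ring = HashRing(["redis-1", "redis-2", "redis-3"])
backend = ring.node_for("http.request")
```

The useful property is that removing one backend only remaps the channels that lived on it; every other channel keeps its backend.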
 
> - Performance. Each request and response needs two additional network roundtrips. One to save to the channel server, one to fetch from the channel server. If the messages are large, this adds a lot of latency.

All Django requests already involve multiple round trips to database servers, and the Redis backend at least is much quicker on the processing side. You could make the same argument about not having separate load balancers and nginx serving static files - after all, you're just adding another network roundtrip to send traffic onward from the other server to Django.
 
> - Untested architecture: does any big site use this kind of architecture for all http handling?

I agree with you here - see below for my justification. I've seen it used for other things (data update networks, service calls), but not direct HTTP.
 

> A realistic test for this is to push a scalable amount of scalable sized requests through the stack. The stack should recover even if you shut down parts of the network, any single interface server, channel backend server or worker server. Of course, when a server or a part of the network recovers, the stack would need to recover from that. Compare the performance, simplicity of setup and ability to recover from error conditions to a setup with only classic Django http servers.
>
> I'm sorry if this feels negative. But, you are planning to change the very core of what a Django server is, and I feel we need to know with certainty that the new architecture really works. And not only that, it needs to be at least as good as classic http handling for existing users.

That's why a decent part of my proposal to Mozilla for funding was to help us fund hardware and time for extensive performance and scale testing. I've seen this architecture work at scale before for non-HTTP traffic, and I believe that it will work as well for HTTP and WebSockets.

Don't get me wrong - I don't believe this is a magical panacea to solve all problems, and we're going to have to do plenty of testing and development work to get the solution to the level of existing HTTP handling, but remember, it also brings positive results:

 - Downtime-less code deploys (if you stop workers, requests will just wait for new ones to appear until they hit timeout)
 - Ability to add and remove processing capacity live without loadbalancer reconfiguration
 - Lets you run different parts of the site on different Python runtimes, if you want (e.g. one part on PyPy, one part on CPython 2, one part on CPython 3)
 - Background task processing
 - And, of course, WebSockets/HTTP2/long-poll HTTP/other non-request-response protocol support

It's never going to be a solution that works for everyone, but the whole nice part about channels is that, like all the best parts of Django, you can just ignore it and not use it if you don't want it; we're not going to get rid of WSGI support, and the default shipping version in 1.10 isn't going to make you find a Redis server before you can even boot it up; it'll just work like it does now, with extra flexibility there if you want to go turn it on and read through the next part of the tutorial/docs.

I don't expect to get this past the community, core and technical board and approved into a release until it's proven itself to run and work at scale, and I already have several offers of testbeds to help prove this out with realistic web loads, which we can combine with synthetic load tests.

Put it this way - I do not see any other way to handle WebSockets that is as feasible as this. Most solutions either require us to run the whole of Django in an async Python environment, which comes with its own set of issues, or they're more stateful proxy servers or run-alongside-servers that don't seem to have a story for scaling them to hundreds of thousands of connections without blowing up every packet received into HTTP requests.

I think Django absolutely has to adapt to the modern web environment and move away from just rendering templates when browsers request them, and this to me is part of that. If there are other solutions to the same problems I think we should consider them as well; I've just not run across any that work as well in the two years I've been planning this out before I brought it out to be talked about.

Andrew

Mark Lavin

Dec 18, 2015, 9:00:27 AM
to django-d...@googlegroups.com
Anssi's criticisms are fair and I feel that some of these responses are glossing over the details. You've claimed this is the same or equivalent to a forked worker model but it isn't, because there is no process management/link between the interface and worker and because you've chosen to make this network transparent. As much as you'd like to claim this isn't like Celery, the same issues that exist when trying to (ab)use Celery for blocking RPC calls are what you will have here.

> Then the response is dropped, the same way a WSGI worker drops a connection if it dies.

It isn't entirely the same because in this channel case the worker doesn't know the client/interface dropped the connection. It's still working hard to generate a response which will sit in the response channel (Redis, memory, etc) until it expires (assuming that all backends expire channel messages). That doesn't happen in the current WSGI interface.
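
The expiry behaviour assumed in that paragraph can be sketched as follows. This is a hypothetical in-memory channel, not a real backend: the class name, the `expiry_seconds` parameter and the explicit `now` argument (used so the behaviour can be checked without sleeping) are all illustrative assumptions.

```python
import time
from collections import deque

class ExpiringChannel:
    """In-memory FIFO channel whose messages are dropped after a deadline."""

    def __init__(self, expiry_seconds=60.0):
        self.expiry_seconds = expiry_seconds
        self._messages = deque()  # entries are (deadline, message)

    def send(self, message, now=None):
        now = time.monotonic() if now is None else now
        self._messages.append((now + self.expiry_seconds, message))

    def receive(self, now=None):
        """Return the oldest unexpired message, or None if there is none."""
        now = time.monotonic() if now is None else now
        while self._messages:
            deadline, message = self._messages.popleft()
            if deadline > now:
                return message
            # Expired: nobody listened in time, so the response is dropped.
        return None

channel = ExpiringChannel(expiry_seconds=10)
channel.send({"status": 200}, now=0)
in_time = channel.receive(now=5)
channel.send({"status": 200}, now=0)
too_late = channel.receive(now=11)
```

Note that the worker has already spent the time building the response by the point it is discarded, which is the cost being pointed out here.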

> All Django requests already involve multiple round trips to database servers, and the Redis backend at least is much quicker on the processing side.

This isn't true. Database round trips are not a requirement for Django's current architecture. I'll concede that many views touch the DB but that's a choice of the developer, not Django's. When using channels the views will still need to make the same DB calls, though that processing happens at the worker. Putting Redis in between doesn't make it faster.

> You could make the same argument about not having separate load balancers and nginx serving static files - after all, you're just adding another network roundtrip to send traffic onward from the other server to Django.

There are notable differences here. These are persistent HTTP connections and don't suffer from the same problem of client/server drops previously noted. They are also cacheable in a known way allowing round-trips or bandwidth to be avoided. Many Django applications will continue to use load balancers with channels. It can't be denied that channels introduce two more network round trips that didn't exist before. Trying to paint this as "we already talk over the network, so what's a couple more" is not a compelling argument to me.

Andrew Godwin

Dec 18, 2015, 10:08:15 AM
to django-d...@googlegroups.com
On Fri, Dec 18, 2015 at 2:00 PM, Mark Lavin <markd...@gmail.com> wrote:
> Anssi's criticisms are fair and I feel that some of these responses are glossing over the details.

I'm sorry if it comes across like that - a lot of these are things I've been considering for a while and so I can forget to provide context.
 
> You've claimed this is the same or equivalent to a forked worker model but it isn't, because there is no process management/link between the interface and worker and because you've chosen to make this network transparent. As much as you'd like to claim this isn't like Celery, the same issues that exist when trying to (ab)use Celery for blocking RPC calls are what you will have here.

I never claimed this isn't like Celery; it is quite a bit like it, but with specific changes (no guaranteed delivery, no single-response mechanism) that make it better at throughput.
 

>> Then the response is dropped, the same way a WSGI worker drops a connection if it dies.
>
> It isn't entirely the same because in this channel case the worker doesn't know the client/interface dropped the connection. It's still working hard to generate a response which will sit in the response channel (Redis, memory, etc) until it expires (assuming that all backends expire channel messages). That doesn't happen in the current WSGI interface.

True, though from what I understand of the WSGI spec you also don't know the client has disconnected until you try to write out content to it. Most views would still run the entire thing and make a response, only to drop it.

There is also a http.disconnect message type planned to be implemented for when a client disconnects before the response is entirely read, but that's more for long-poll usage where you want to keep track of who has an open connection.
 

>> All Django requests already involve multiple round trips to database servers, and the Redis backend at least is much quicker on the processing side.
>
> This isn't true. Database round trips are not a requirement for Django's current architecture. I'll concede that many views touch the DB but that's a choice of the developer, not Django's. When using channels the views will still need to make the same DB calls, though that processing happens at the worker. Putting Redis in between doesn't make it faster.

Right, and channels will not be a requirement for Django's future architecture. They'll just be a thing that I expect most people to turn on as they provide a feature set a lot of types of sites need. And I never said it would be faster - you'll see me repeatedly say channels provides no performance gain - at best, it might help smooth response times with the way the workers load balance jobs based on when they're free.
 

>> You could make the same argument about not having separate load balancers and nginx serving static files - after all, you're just adding another network roundtrip to send traffic onward from the other server to Django.
>
> There are notable differences here. These are persistent HTTP connections and don't suffer from the same problem of client/server drops previously noted. They are also cacheable in a known way allowing round-trips or bandwidth to be avoided. Many Django applications will continue to use load balancers with channels. It can't be denied that channels introduce two more network round trips that didn't exist before. Trying to paint this as "we already talk over the network, so what's a couple more" is not a compelling argument to me.

Sorry, I think my tone came across wrong there. I'm more just saying that it's equivalent to adding an extra layer of that kind of infrastructure, in that it also uses persistent connections and should only be a few milliseconds of delay, and for most Django sites a few milliseconds is perhaps a percentage point of their response time. Again, if someone wants higher performance, they don't have to use channels and can just connect directly.

You seem to be assuming I'm here to foist a brand new middle layer on everyone; I'm not. I'm here to make one that fits neatly into Django, that I think most people will want to turn on, and that provides a lot of value in exchange for a slight round-trip performance hit - my goal is sub-5ms, and preferably sub-3. If it starts being 10/20/30 milliseconds of cost, then we'll have to change our approach until it's acceptable.

If you don't want the new features and the resulting change in stack, Django as it is now will be there for you, but if you're that sensitive to performance then Django maybe isn't for you already unless you're heavily modifying it.

Andrew
 

Mark Lavin

Dec 18, 2015, 10:44:14 AM
to django-d...@googlegroups.com
> You seem to be assuming I'm here to foist a brand new middle layer on everyone; I'm not. I'm here to make one that fits neatly into Django, that I think most people will want to turn on, and that provides a lot of value in exchange for a slight round-trip performance hit - my goal is sub-5ms, and preferably sub-3. If it starts being 10/20/30 milliseconds of cost, then we'll have to change our approach until it's acceptable.

Yes that's how I read this plan and that's why I think it needs some clarity. I didn't mean for this to turn into a long discussion about performance. This was meant to be a discussion about the transition plan. To go back to my original message, I see no gain for existing WSGI applications to have this on by default, even using the in-memory channel, when they upgrade to 1.10 (or whenever this lands). The current plan reads as though it will. From what you are saying above this sounds more like a django.contrib.channels than a django.core.channels. Either way I feel the plan should provide more clarity in that regard.

Andrew, I thank you for your patience and civility in these discussions. I know this is something you've been working hard on and I'm not trying to be needlessly critical of your work.

Andrew Godwin

Dec 18, 2015, 11:17:12 AM
to django-d...@googlegroups.com
On Fri, Dec 18, 2015 at 3:44 PM, Mark Lavin <markd...@gmail.com> wrote:
You seem to be assuming I'm here to foist a brand new middle layer on everyone; I'm not. I'm here to make one that fits neatly into Django, that I think most people will want to turn on, and that provides a lot of value in exchange for a slight round-trip performance hit - my goal is sub-5ms, and preferably sub-3. If it starts being 10/20/30 milliseconds of cost, then we'll have to change our approach until it's acceptable.

Yes that's how I read this plan and that's why I think it needs some clarity. I didn't mean for this to turn into a long discussion about performance. This was meant to be a discussion about the transition plan. To go back to my original message, I see no gain for existing WSGI applications to have this on by default, even using the in-memory channel, when they upgrade to 1.10 (or whenever this lands). The current plan reads as though it will.

I agree - that was my original intention and how the current version of channels works, but it's no longer my plan, and I should update the integration plan to be more specific and discuss things like introducing different HttpRequest subclasses other than WSGIRequest. 

To be clear, 1.10 might have a different request handling stack than 1.9 that isn't so WSGI-native, but there's still going to be a way to get a request in and a response out directly without hitting Channels. It's almost enforced by the channels design, actually, as the entire Django URL and view system would have to run as a single consumer there anyway.
 
From what you are saying above this sounds more like a django.contrib.channels than a django.core.channels. Either way I feel the plan should provide more clarity in that regard.

Also true - it's not immediately apparent why it shouldn't be contrib or even entirely separate, but there are good reasons, chief among them being that channels needs some low-level changes to the HTTP subclasses and session framework; it's somewhat similar to migrations in this regard. The 1.8/1.9 app versions are going to be full of monkeypatches and won't perform as well, since I won't be able to eliminate all the duplicate code being run.

I also have a philosophical belief that we should highlight the channels model as the new "base layer" of Django - teaching people how everything else builds on it to do view handling, socket handling, background tasks, etc. - but that isn't at odds with direct-WSGI handling; an in-memory backend and direct handling are basically indistinguishable from the outside.

I'll work on another draft that more clearly highlights WSGI and direct request handling in the "keeping Django the same" part - hopefully that will help make things clearer.
 

Andrew, I thank you for your patience and civility in these discussions. I know this is something you've been working hard on and I'm not trying to be needlessly critical of your work.

I honestly appreciate your feedback here too - it's very hard to see these things from the outside, so it's important to have these discussions. There's a reason I wanted to get a draft up and get comments!

Andrew
 

Anssi Kääriäinen

Dec 18, 2015, 12:28:45 PM
to django-d...@googlegroups.com
On Friday, December 18, 2015, Andrew Godwin <and...@aeracode.org> wrote:


On Fri, Dec 18, 2015 at 3:44 PM, Mark Lavin <markd...@gmail.com> wrote:
You seem to be assuming I'm here to foist a brand new middle layer on everyone; I'm not. I'm here to make one that fits neatly into Django, that I think most people will want to turn on, and that provides a lot of value in exchange for a slight round-trip performance hit - my goal is sub-5ms, and preferably sub-3. If it starts being 10/20/30 milliseconds of cost, then we'll have to change our approach until it's acceptable.

Yes that's how I read this plan and that's why I think it needs some clarity. I didn't mean for this to turn into a long discussion about performance. This was meant to be a discussion about the transition plan. To go back to my original message, I see no gain for existing WSGI applications to have this on by default, even using the in-memory channel, when they upgrade to 1.10 (or whenever this lands). The current plan reads as though it will.

I agree - that was my original intention and how the current version of channels works, but it's no longer my plan, and I should update the integration plan to be more specific and discuss things like introducing different HttpRequest subclasses other than WSGIRequest. 


My concern (and criticism) was about running everything through channels.

Back to the original question about release schedule. Is the 1.8 and 1.9 external package going to require changes to Django core? If so, are the changes going to be substantial?

For 1.10, my vote goes to time based releases, that is, we don't decide at this point what must be in 1.10.

 - Anssi

Marc Tamlyn

Dec 18, 2015, 12:34:41 PM
to django-d...@googlegroups.com

[Note: I have not read all the channels docs, sorry if some of these points are covered there.]

On a packaging note, is there a way to use django[channels] type syntax like flask does? I'm not familiar with the restrictions of this but it may remove the need for try/except imports.

I'm also curious about what a "small scale" deployment of channels would look like. It has features which are useful to small sites which deploy on a single server or Heroku dyno. It would be great to be able to run a complete channels stack with in-memory communication with a single process start - something as easy as gunicorn wsgi.py. I know you are intending this to be possible for users not using channels, but when it has features so useful for modern web apps like tasks and websockets, getting a no-setup deploy of this working would be a huge win over celery/other systems. Obviously you would lose the no-downtime deploys here.

I've said this in person to you, but I think a REDIS_SERVERS setting like DATABASES would be a hugely useful feature for django independently of channels, especially if it supported tests well. I'm yet to find a third party app which does this well.

Testing in general is an interesting question - how do you envisage a test environment would run? Would self.client go straight to the consumer? How would you test the channel setup? Will there be test utilities to test, say, that a given message has been sent to a given channel without consuming it?

You talk about being able to load balance between backend tasks and requests. Is there an easy way to not do this in a multi-server setup, where say image processing is on a box with far more RAM than is needed for a request processing box?

Have you considered a means of asking the channels system how much load it is under so that systems could do intelligent autoscaling?

Marc

Andrew Godwin

Dec 18, 2015, 12:39:06 PM
to django-d...@googlegroups.com
It won't require changes - monkeypatching can solve it all - though of course there are some small ones that could be made that would improve things. Not sure I want to make dot releases just to support channels, though.
 

For 1.10, my vote goes to time based releases, that is, we don't decide at this point what must be in 1.10.

Yes, I'll update the draft to indicate that 1.10 is the goal, not the definite plan. My hope is that it'll be definitely far enough along by feature freeze to include; the fact that the interface server will be separate helps there too.

Andrew
 

 - Anssi


Donald Stufft

Dec 18, 2015, 1:01:35 PM
to django-d...@googlegroups.com
That syntax allows you to add extra, opt-in lists of dependencies to install. It does not pass through to runtime.
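As a rough illustration of that point, here is a minimal sketch of how an extra might be declared and why a runtime check is still needed. The `channels` extra and the package names in it are assumptions made up for this example, not Django's actual packaging metadata; `extras_require` is the setuptools keyword such a dictionary would be passed to.

```python
# Hypothetical packaging sketch: the "channels" extra and the packages it
# lists are assumptions for illustration, not Django's real metadata.
import importlib.util

# What a setup.py might pass as setuptools' extras_require argument, so that
# `pip install "django[channels]"` also installs the listed packages.
EXTRAS_REQUIRE = {
    "channels": ["channels", "daphne"],
}


def channels_installed():
    """Runtime check: the chosen extra is not visible to the running code,
    so it still has to probe for the package itself."""
    return importlib.util.find_spec("channels") is not None


if __name__ == "__main__":
    print("channels importable:", channels_installed())
```

In other words, the extra only changes what pip installs; the code still needs something like the probe above (or a try/except import) to find out what is available.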


Florian Apolloner

Dec 18, 2015, 1:04:23 PM
to Django developers (Contributions to Django itself)
On Friday, December 18, 2015 at 6:34:41 PM UTC+1, Marc Tamlyn wrote:
I've said this in person to you, but I think a REDIS_SERVERS setting like DATABASES would be a hugely useful feature for django independently of channels, especially if it supported tests well. I'm yet to find a third party app which does this well.

And I'll again vote for a generic CONNECTIONS setting instead of adding REDIS_SERVERS :D
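To make the comparison concrete, a generic setting of this kind might look roughly like the fragment below, by analogy with DATABASES. This is purely a sketch: neither REDIS_SERVERS nor CONNECTIONS is an existing Django setting, and every key, alias and value here is invented for illustration.

```python
# Hypothetical settings fragment: CONNECTIONS is not an existing Django
# setting; the aliases, keys and values are invented for illustration.
CONNECTIONS = {
    "default": {"BACKEND": "redis", "HOST": "localhost", "PORT": 6379, "DB": 0},
    "search": {"BACKEND": "elasticsearch", "HOST": "localhost", "PORT": 9200},
}


def connections_for(backend):
    """Return the aliases configured for one kind of backend."""
    return {
        alias: config
        for alias, config in CONNECTIONS.items()
        if config["BACKEND"] == backend
    }
```

A Redis-only REDIS_SERVERS would then just be the subset returned by `connections_for("redis")`.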

Cheers,
Florian

Andrew Godwin

Dec 18, 2015, 1:17:38 PM
to django-d...@googlegroups.com
On Fri, Dec 18, 2015 at 5:34 PM, Marc Tamlyn <marc....@gmail.com> wrote:

[Note: I have not read all the channels docs, sorry if some of these points are covered there.]

On a packaging note, is there a way to use django[channels] type syntax like flask does? I'm not familiar with the restrictions of this but it may remove the need for try/except imports.

I'm not familiar with the concept, but are you suggesting overloading __getitem__ on the root Django module? That seems unwise.
 

I'm also curious about what a "small scale" deployment of channels would look like. It has features which are useful to small sites which deploy on a single server or Heroku dyno. It would be great to be able to run a complete channels stack with in-memory communication with a single process start - something as easy as gunicorn wsgi.py. I know you are intending this to be possible for users not using channels, but when it has features so useful for modern web apps like tasks and websockets, getting a no-setup deploy of this working would be a huge win over celery/other systems. Obviously you would lose the no-downtime deploys here.

Yes, the in-memory backend combined with threads will do the trick - this is how runserver will work too. Basically, a couple of worker threads, and a copy of the interface server (daphne) in its own thread. Obviously, this is only if channels + daphne can work; otherwise it'll fall back to the normal WSGI setup - internally, I'll separate them into two submodules and the runserver command will pick at runtime.

For a production deploy, you'd probably want to use very similar code, but with some settings changed. Maybe it can be another command?
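The single-process shape described above can be sketched with nothing but the standard library. This is only an illustration of the idea, not how channels or daphne are implemented: the in-memory "channel layer" is a plain `queue.Queue`, and `http_consumer`, `worker` and `run` are hypothetical names.

```python
import queue
import threading

# Hypothetical in-memory "channel layer": one thread-safe queue per channel.
channels = {"http.request": queue.Queue()}
responses = queue.Queue()


def http_consumer(message):
    # Stand-in for the Django view stack handling one request message.
    responses.put({"status": 200, "path": message["path"]})


def worker(channel_name, consumer):
    # Pull messages off the channel until a None sentinel arrives.
    while True:
        message = channels[channel_name].get()
        if message is None:
            break
        consumer(message)


def run(paths, worker_count=2):
    threads = [
        threading.Thread(target=worker, args=("http.request", http_consumer))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    # The interface server's role: turn incoming requests into messages.
    for path in paths:
        channels["http.request"].put({"path": path})
    for _ in threads:
        channels["http.request"].put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    results = []
    while not responses.empty():
        results.append(responses.get())
    return results
```

Calling `run(["/a", "/b"])` handles both requests on the worker threads inside one process; the order of the returned responses depends on thread scheduling.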

I've said this in person to you, but I think a REDIS_SERVERS setting like DATABASES would be a hugely useful feature for django independently of channels, especially if it supported tests well. I'm yet to find a third party app which does this well.

Agreed, but I'm not trying to tackle that here for scope reasons (plus, redis is just one of the backends, albeit the "main" one). Would not be hard to add this in later.
 

Testing in general is an interesting question - how do you envisage a test environment would run? Would self.client go straight to the consumer? How would you test the channel setup? Will there be test utilities to test, say, that a given message has been sent to a given channel without consuming it?

The existing test client will just go straight into the view stack as always, skipping channels - it skips WSGI right now, so I don't see why that would change.

Testing individual consumers will likely involve some kind of test channel backend, where you call a consumer with a message and see what other messages it sends out after the call completes; mock-like, basically.

Inspecting what's on a channel isn't possible in the general case, but I imagine the backend that records messages for playback fulfills the same need?
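A mock-like test backend of the kind described could look something like this. It is a sketch under assumptions: `RecordingBackend`, its `send()` method, the consumer signature and the channel names are all hypothetical, not the actual channels test API.

```python
class RecordingBackend:
    """Hypothetical test backend: records sends instead of delivering them."""

    def __init__(self):
        self.sent = []

    def send(self, channel, message):
        self.sent.append((channel, message))


def thumbnail_consumer(message, backend):
    # Example consumer under test: reports completion on another channel.
    backend.send("image.done", {"id": message["id"], "ok": True})


# Call the consumer with a message, then inspect what it sent out.
backend = RecordingBackend()
thumbnail_consumer({"id": 7}, backend)
assert backend.sent == [("image.done", {"id": 7, "ok": True})]
```

Because the recorded list is never consumed, a test can check that a given message went to a given channel without a worker picking it up.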
 

You talk about being able to load balance between backend tasks and requests. Is there an easy way to not do this in a multi-server setup, where say image processing is on a box with far more RAM than is needed for a request processing box?

Yes, one of two ways (not sure what to go with yet):

 - Make separate channel clusters (it supports multiple backends), one for images and one for requests. Sending a message to another cluster is as easy as putting the backend alias as a keyword argument to it.

 - Make it so you can make workers listen to a subset of channel names; right now, they just look for all channel names they have consumers registered for, but this is easy to change. When you start a worker, you'd then give a pattern to match against for names (e.g. "http.*")
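The second option can be sketched as a simple filter over the registered channel names. This assumes shell-style patterns such as "http.*" and uses the standard library's `fnmatch`; the function name and the channel names other than "http.request" are made up for the example, and the real worker might match differently.

```python
from fnmatch import fnmatchcase


def channels_for_worker(registered, patterns):
    """Keep only the registered channel names matching any given pattern."""
    return [
        name
        for name in registered
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical set of channels that have consumers registered.
registered = ["http.request", "http.disconnect", "image.resize", "websocket.receive"]

# A worker on the high-RAM box would listen only to the image channels.
image_worker = channels_for_worker(registered, ["image.*"])
web_worker = channels_for_worker(registered, ["http.*", "websocket.*"])
```

With this split, the image box never picks up request messages, and the request boxes never pick up image jobs.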
 

Have you considered a means of asking the channels system how much load it is under so that systems could do intelligent autoscaling?

No, but that's probably a good idea. It's easy to get the queue length for a given channel in both the redis and database backends, which is a good rough metric, though in systems like this I do prefer to know queue time (i.e. how long are items on the queue before consumption) - that's a lot harder though.

Might be a good idea to carve out a channel_length() API on backends for seeing how long a given channel is, then people can tie up metrics to log important ones like "http.request".
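As a minimal sketch of both metrics, here is an in-memory backend that exposes a `channel_length()` method and also reports how long each message waited before being consumed. Everything here is hypothetical: only the `channel_length()` name comes from the proposal above, and the injectable clock exists purely to make the sketch testable.

```python
import time
from collections import defaultdict, deque


class InMemoryBackend:
    """Hypothetical backend that tracks queue length and queue time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._queues = defaultdict(deque)

    def send(self, channel, message):
        # Store the enqueue time next to the message.
        self._queues[channel].append((self._clock(), message))

    def receive(self, channel):
        # Return the oldest message and how long it sat on the queue.
        enqueued_at, message = self._queues[channel].popleft()
        return message, self._clock() - enqueued_at

    def channel_length(self, channel):
        # Rough load metric: number of messages waiting on the channel.
        return len(self._queues[channel])
```

An autoscaler could poll `channel_length("http.request")` cheaply, while the wait time returned by `receive()` gives the queue-time figure, which is harder to get from a shared backend like Redis.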

Andrew

 