After somewhat hijacking another thread https://groups.google.com/d/msg/django-developers/t_zuh9ucSP4/eJ4TlEDMCAAJ I thought it was best to start fresh and clearly spell out my feelings about the Channels proposal. To start, this discussion of “Django needs a websocket story” reminds me very much of the discussions about NoSQL support. There were proofs of concept made and sky-is-falling arguments about how Django would fail without MongoDB support. But in the end the community concluded that `pip install pymongo` was the correct way to integrate MongoDB into a Django project. In the same way, it has been possible for quite some time to incorporate websockets into a Django project by running a separate server dedicated to handling those connections in a framework such as Twisted, Tornado, or aiohttp, and establishing a clear means by which the two servers communicate with one another as needed by the application. This is admittedly vague and ad hoc, but it does work. To me this is the measuring stick by which to judge Channels: in what ways is it better or worse than running a separate server process for long-lived vs. short-lived HTTP connections?
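For concreteness, the ad hoc communication between the two servers usually amounts to an agreed message format over something like Redis pub/sub. A minimal sketch of such a shared contract follows; all of the names here are hypothetical, not taken from any particular project:

```python
import json

# Hypothetical contract shared by the Django process and a separate
# websocket server (Tornado/aiohttp/Twisted). The Django side would
# publish with e.g. redis.Redis().publish(NOTIFY_CHANNEL, raw), and the
# async server would subscribe and push frames to connected clients.
NOTIFY_CHANNEL = "notifications"  # assumed Redis pub/sub channel name

def encode_notification(user_id, payload):
    """Serialize a notification destined for the websocket server."""
    return json.dumps({"user_id": user_id, "payload": payload})

def decode_notification(raw):
    """Parse a message on the websocket server side."""
    message = json.loads(raw)
    return message["user_id"], message["payload"]

raw = encode_notification(42, {"unread": 3})
print(decode_notification(raw))  # (42, {'unread': 3})
```

The point is only that the contract is small and application-specific; the two processes otherwise stay completely independent.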
At the application development level, Channels has the advantage of a clearly defined interprocess communication layer which would otherwise need to be written. However, the Channel API is built around a simple queue/list rather than a full messaging layer. The choices of backend are currently limited to in-memory (not suitable for production), the ORM DB (not suitable for production), and Redis. While Redis PUB/SUB is nice for fanout/broadcast messaging, it isn’t a proper message queue, and it doesn’t support TLS out of the box. For groups/broadcast the Redis Channel backend doesn’t actually use PUB/SUB but instead emulates that feature; it likely can’t use PUB/SUB due to the choice of sharding. This seemingly ignores robust existing solutions like Kombu, which is designed around AMQP concepts. Kombu supports far more transports than the Channel backends while emulating the same features, such as groups/fanout, plus more: topic exchanges, QoS, message acknowledgement, compression, and additional serialization formats.
Architecturally, both of these approaches require running two processes. The current solution would run a WSGI server for short-lived connections and an async server for long-lived connections; Channels runs a front-end interface server, Daphne, and back-end worker servers. Which is more scalable? That’s hard to say: they both scale the same way, by adding more processes.

In my experience, long-lived and short-lived HTTP connections have different scaling needs, so it is helpful to be able to scale them independently, as one can without Channels. That distinction can’t be made with Channels, since all HTTP connections are handled by the interface servers.

Channels has an explicit requirement of a backend/broker server, which requires its own resources. While not required in the separate-server setup, it’s likely that there is some kind of message broker between the servers, so at best we’ll call this a wash in terms of resources. The same is not true for latency, however. Channels will handle the same short-lived HTTP connections by serializing the request, putting it into the backend, deserializing the request and processing it in the worker, serializing the response, putting it into the backend, then deserializing the response and sending it to the client. This is a fair bit of extra work for no real gain, since there is no concept of priority or backpressure.

This latency also exists for the websocket message handling. While Channels may try to claim that it’s more resilient/fault tolerant because of this messaging layer, it promises only “at most once” delivery, which means that a message might never be delivered; I don’t think the resiliency claim has much merit. As noted in previous discussions, sending all HTTP requests unencrypted through the Channel backend (such as Redis) raises a number of potential security/regulatory issues which have yet to be addressed.
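To make that round trip concrete, here is a toy model of the hops (plain Python, not actual Channels code); a traditional WSGI setup performs none of these intermediate steps:

```python
import json
from collections import deque

# Toy model (not actual Channels code) of the extra hops a short-lived
# HTTP request takes under Channels: interface server -> backend ->
# worker -> backend -> interface server, with a (de)serialization at
# each hop. The deques stand in for the Redis-backed channel layer.
request_channel = deque()
response_channel = deque()

def interface_send(request):
    request_channel.append(json.dumps(request))          # serialize, enqueue

def worker_step(view):
    request = json.loads(request_channel.popleft())      # dequeue, deserialize
    response_channel.append(json.dumps(view(request)))   # serialize, enqueue

def interface_receive():
    return json.loads(response_channel.popleft())        # dequeue, deserialize

interface_send({"path": "/"})
worker_step(lambda req: {"status": 200, "body": "ok"})
print(interface_receive())  # {'status': 200, 'body': 'ok'}
```

Four serialization steps and two queue traversals per request, all of which are pure overhead for an ordinary request/response cycle.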
One key difference to me is that pushing Channels as the new Django standard makes Django’s default deployment story much more complicated. Currently this complication is the exception, not the rule. Deployment is a frequent complaint, and not just from people new to Django. Deployment of Python apps is a pain, and Channels requires running two of them even if you aren’t using websockets. To me that is a huge step in the wrong direction for Django in terms of ease of deployment and required system resources.
Channels claims to have a better zero-downtime deployment story. However, in practice I’m not convinced that will be true. A form of graceful reload is supported by the most popular WSGI servers, so it isn’t really better than what we currently have. The Channel docs note that you only need to restart the workers when deploying new code, so you won’t drop HTTP connections. But the interface application definition and the worker code live in the same code base. It will be difficult to determine whether you need to restart the interface on a given deployment, so many people will likely err on the side of restarting the interface as well.

With a separate async server, likely in a separate code base, it would be easy to deploy them independently and only restart the websocket connections when needed.

Also, it’s better if your application can gracefully handle disconnections/reconnections for the websocket case anyway, since you’ll have to deal with that reality on mobile data connections and terrible wifi.
There is an idea floating around of using Channels for background jobs, as a Celery replacement. It is not one, and should not be. Message delivery is not guaranteed and there is no retry support; this is explicitly outside the stated design goals of the project. Allowing this idea to continue in any form does a disservice to the Django community, who may use Channels in this way. It’s also a slap in the face to the Celery authors, who’ve worked for years to build a robust system which is superior to this naive implementation.
So Channels is at best on par with the existing available approaches, and at worst adds a bunch of latency, potentially dropped messages, and new points of failure, while taking up more resources and locking everyone into using Redis. It does provide a clear messaging framework, but in my opinion it’s too naive to be useful. Given the complexity in this space, I don’t trust anything built from the ground up without a meaningful production deployment to prove it out. It has taken Kombu many years to mature, and I don’t think it can be rewritten easily.
I see literally no advantage to pushing all HTTP requests and responses through Redis. What this does enable is that you can continue to write synchronous code. To me that’s based on the idea that async code is too hard for the average Django dev to write or understand, or that nothing can be done to make parts of Django play nicer with existing async frameworks, neither of which I believe is true. Python 3.4 makes writing async Python pretty elegant, and async/await in 3.5 makes it even better.
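As a small illustration of that last point, here is a hypothetical echo handler written with asyncio; it is not tied to any framework, and with async/await the control flow reads almost like synchronous code:

```python
import asyncio

# Hypothetical handler, for illustration only. The asyncio.sleep(0)
# stands in for awaiting a real socket read/write.
async def echo_handler(message):
    await asyncio.sleep(0)
    return "echo: " + message

async def main():
    # Concurrently handle two messages on one event loop.
    return await asyncio.gather(*(echo_handler(m) for m in ("a", "b")))

loop = asyncio.new_event_loop()
results = loop.run_until_complete(main())
loop.close()
print(results)  # ['echo: a', 'echo: b']
```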
Sorry this is so long. Those who saw the DjangoCon author’s panel know that quickly writing walls of unreadable text is my forte. It’s been building for a long time. I have an unsent draft to Andrew from when he wrote his first blog post about this idea. I deeply regret not sending it and beginning to engage in this discussion earlier. It’s hard for me to separate this work from the process by which it was created. Russ touched on my previous experience with the DEP process and I will admit that has jaded many of my interactions with the core team. Building consensus is hard and I’m posting this to help work towards the goal of community consensus. Thanks for taking the time to read this all the way through and I welcome any feedback.
Best,
On Thu, May 5, 2016 at 12:34 PM, Mark Lavin <markd...@gmail.com> wrote:

The main gains are (in my opinion):

- The same server process can serve both HTTP and WebSockets without path prefixing (auto-negotiation based on the Upgrade header); without this you need an extra web layer in front to route requests to the right backend server
- HTTP long-polling is supported via the same mechanism (like WebSockets, it does not fit inside the WSGI paradigm in a performant way)
- You get to run fewer processes overall
Firstly, nothing in channels uses pub/sub - channels deliver to a single reader of a queue, and thus cannot be built on a broadcast solution like pub/sub.
I've always tried to be clear that it is not a Celery replacement but instead a way to offload some non-critical task if required.
> So Channels is at best on par with the existing available approaches and at worst adds a bunch of latency, potentially dropped messages, and new points of failure while taking up more resources and locks everyone into using Redis. It does provide a clear message framework but in my opinion it’s too naive to be useful. Given the complexity in the space I don’t trust anything built from the ground up without having a meaningful production deployment to prove it out. It has taken Kombu many years to mature and I don’t think it can be rewritten easily.
a) ASGI does not lock everyone into using Redis; it just so happens that is the first backend I have written. It is designed to run against other suitable datastores or socket protocols and we have the money to fund such an endeavour.

b) Kombu solves a different problem - that of abstracting task queues - and it would still be my first choice for that; I have used it for many years and it would continue to be my choice for task queuing.
ASGI is essentially meant to be an implementation of the CSP/Go style of message-passing interprocess communication, but cross-network rather than merely cross-thread or cross-process as I believe that network transparency makes for a much better deployment story and the ability to build a more resilient infrastructure.
> It’s hard for me to separate this work from the process by which it was created. Russ touched on my previous experience with the DEP process and I will admit that has jaded many of my interactions with the core team. Building consensus is hard and I’m posting this to help work towards the goal of community consensus. Thanks for taking the time to read this all the way through and I welcome any feedback.
I will put my hand up and say that this sidestepped the DEP process, and that's entirely my fault. It was not my intention; I've been working on this for over two years, and only last year did I go public with my semi-final design and start asking for feedback; I should probably have taken it into a DEP then, but failed to.

The problem is likely that I kept discussing channels with various members of the core team and other people I know in the Django community, and always received implicit approval, which is a terrible way to go about being transparent.

That said, I hope that my efforts over the last year to publicise and talk about this in every available avenue have gone somewhat towards alleviating the lack of a DEP; I have never tried to smuggle this in or be quiet about it, in fact very much the contrary. I've had the ASGI spec (which I potentially would like to push as a PEP) up for a while now, too, and have been trying to actively get feedback on it from both the Django and the wider Python community.

I hope we can resolve our differences on this and both walk away happy; you have some very valid points about deployment, reliability, and the newness of all this code, but I also believe that the path from here to having this deployed widely will be a good one.

I have been working on this problem for a long time, and between experiments both by myself and internally at Eventbrite, where our engineers tried a large number of different messaging backends for message transport (in our case, for a SOA layer, though it performs a similar function and requires similarly low latency), Redis seemed like the best choice for a first and canonical transport implementation (AMQP, Kafka, and enterprise message buses all have different problems).

I don't expect people to adopt Channels overnight and switch to running Daphne in front of all their traffic; if anything, I expect a lot of people will run it just for WebSockets (I likely would at the moment if faced with a very large deployment). That said, I believe it is certainly at the point where it can be included in Django, if nothing else because the very design of channels and ASGI means that the interface servers and transport layer are both improvable and swappable out of the context of Django core.

The patch to Django core is mostly routing and consumer design - an API I've tried hard to refine to make accessible for beginners while having flexibility for more advanced cases - and that's the only part that will be directly locked in stone for the future. The other components - interface servers and transport layers - exist outside the Django release cycle and have the potential for large improvement or complete replacement as the community starts using Channels and we start getting the feedback and communal knowledge that only large deployments of this kind of thing can bring.

Sorry about circumventing the DEP process and pulling this off in a very strange way; I feel particularly guilty now it's been highlighted to me and I know that you are yourself working on a DEP, and it probably seems like I've abused my position on the core team to pull this off; please understand that was not my intention, and I've always wanted to have an open, frank discussion about channels in Django. In many ways, I'm glad someone has finally brought up all the things I thought would be valid counter-arguments but haven't really been advanced yet.

Andrew
Thank you for your comments and I have some brief replies.
If I'm understanding it correctly, groups are an emulated broadcast. I'm saying it would be an advantage for them to use pub/sub, but they do not.
> I've always tried to be clear that it is not a Celery replacement but instead a way to offload some non-critical task if required.

I don't agree that this has been clear. That is my primary criticism here. I don't think this should be encouraged. Ryan's reply continues with this confusion.
Yes the lock-in is an exaggeration, however, given the poor support/upkeep for third-party DB backends, I doubt the community will have better luck with Channel backends not officially supported by the Django core team. I'd be happy to be wrong here.
Kombu is not to be confused with Celery. Kombu is a general-purpose AMQP/messaging abstraction library. I don't think we agree on its potential role here. Perhaps it's better stated that I think Channels' minimalist API is too minimal. I would prefer if additional AMQP-like abstractions existed, such as topic routing and QoS.
> ASGI is essentially meant to be an implementation of the CSP/Go style of message-passing interprocess communication, but cross-network rather than merely cross-thread or cross-process as I believe that network transparency makes for a much better deployment story and the ability to build a more resilient infrastructure.

Again I don't agree with this argument and I don't see anything in Channels which backs up this claim. I believe this is where we likely have a fundamental disagreement. I see this network transparency as additional latency. I see the addition of the backend/broker as another moving part to break.
What's done is done and I don't want to start another process discussion at this point. Maybe another day. I'm doing my best to focus on the technical aspects of the proposal. That isn't to say that I'm without bias and I'm trying to own that. The fact is I have looked into Channels, the docs and the code, and I remain unconvinced this should be the blessed solution for websockets and I've tried to make it clear why. I'd much prefer to continue to run Tornado/aiohttp for the websocket process. That's not a personal attack. I just don't see Channels as a meaningful improvement over that direction.
Hi Andrew,
On 05/05/2016 02:19 PM, Andrew Godwin wrote:
> I will put my hand up and say that this sidestepped the DEP process, and
> that's entirely my fault. It was not my intention; I've been working on
> this for over two years, and only last year did I go public with my
> semi-final design and start asking for feedback; I should probably have
> taken it into a DEP then, but failed to.
This isn't a past-tense question; it's not too late to write a DEP, and
I personally think that a DEP should be written and approved by the
technical board before the channels patch is merged. I actually assumed
that one was still on its way; perhaps I missed some communication at
some point that said it wouldn't be.
I think channels, multiple-template-engines, and
reworked-middleware (and migrations, for that matter) are all
rethinkings of long-standing core aspects of how Django works, which in
my mind makes them prime DEP candidates.
Yes I agree with the value of a standardized way of communicating between these processes and I listed that as a highlight of Channels, though it quickly shifted into criticism. I think that's where we are crossing paths with relation to Kombu/AMQP as well. I find the messaging aspect of Channels far more interesting and valuable than ASGI as a larger specification. Messaging I do think needs to be network transparent. I just don't like that aspect tied into the HTTP handling. At this point I'm not sure how to decouple the messaging aspect from the HTTP layer since I feel they are very tightly bound in ASGI.
Honestly I don't think Django *needs* tightly integrated websocket support but I do see the value in it so we aren't at a complete impasse. I suppose that's why it's my general preference to see a third-party solution gain traction before it's included. I played with integrating Django + aiohttp a few months ago. Nothing serious and I wouldn't call it an alternate proposal. It's barely a proof of concept: https://github.com/mlavin/aiodjango. My general inclination is that (insert wild hand waving) django.contrib.aiohttp/django.contrib.twisted/django.contrib.tornado would be the way forward for Django + websockets without a full scale rewrite of the WSGI specification.
Thank you, Mark, for starting this discussion. I, too, found myself simply accepting that channels was the right way to go, despite having the same questions you do. I realize this shouldn't be, so I've chimed in on some of your comments.
> On May 5, 2016, at 2:34 PM, Mark Lavin <markd...@gmail.com> wrote:
>
> [snip]
>
> The Channel API is built more around a simple queue/list rather than a full messaging layer. [snip] Kombu supports [snip].
The API was purposefully limited, because channels shouldn't need all those capabilities. All this is spelled out in the documentation, which I know you already understand because you've mentioned it elsewhere. I think that the choice to use a more limited API makes sense, though that doesn't necessarily mean that it is the right choice.
>
> [snip description of architecture]
First off, the concerns you mention make a lot of sense to me, and I've been thinking along the same lines.
I've been considering an alternative to Daphne that would use channels only for websockets, but WSGI for everything else. Or some alternative split where some requests would be ASGI and some WSGI. I've done a bit of testing of the latency overhead that using channels adds (on my local machine, even), and it's not insignificant. I agree that it's important to find a solution that doesn't so drastically slow down the requests we've already worked hard to optimize. I'm not yet sure of the right way to do that.
As far as scaling, it is apparent to me that it will be very important to have the workers split out, in a similar way to how we have different celery instances processing different queues. This allows us to scale those queues separately. While it doesn't appear to exist in the current implementation, the channel names are obviously suited to such a split, and I'd expect channels to grow the feature of selecting which channels a worker should be processing (forgive me if I've just missed this capability, Andrew).
>
> [[ comments on how this makes deployment harder ]]
ASGI is definitely more complex than WSGI. It's this complexity that gives it power. However, to the best of my knowledge, there's not a push to be dropping WSGI. If you're doing a simple request/response site, then you don't need the complexity, and you probably should be using WSGI. However, if you need it, having ASGI standardized in Django will help the community build on the power that it brings.
>
> Channels claims to have a better zero-downtime deployment story. However, in practice I’m not convinced that will be true. [snip]
I've been concerned about this as well. On Heroku my web dynos don't go down, because the new ones are booted up while the old ones are running, and then a switch is flipped to have the router use the new dynos. Worker dynos, however, do get shut down. Daphne won't be enough to keep my site functioning. This is another reason I was thinking of a hybrid WSGI/ASGI server.
>
> There is an idea floating around of using Channels for background jobs/Celery replacement. It is not/should not be. [snip reasons]
It's not a Celery replacement. However, this simple interface may be good enough for many things. Anything that doesn't use celery's `acks_late` is a candidate, because in those cases even Celery doesn't guarantee delivery, and ASGI is a simpler interface than the powerful, glorious behemoth that is Celery.
There's an idea that something like Celery could be built on top of it. That may or may not be a good idea, since Celery uses native protocol features of AMQP to make things work well, and those may not be available or easy to replicate accurately with ASGI. I'll be sticking with Celery for all of those workloads, personally, at least for the foreseeable future.
>
> [snip] locks everyone into using Redis.
Thankfully, I know you're wrong about this. Channel layers can be built for other things, but Redis is a natural fit, so that's what he's written. I expect we'll see other channel layers for queues like AMQP before too long.
>
> I see literally no advantage to pushing all HTTP requests and responses through Redis.
It seems like a bad idea to push _all_ HTTP requests through Redis given the latency it adds, but for long-running requests it can still be a good idea, because it separates the HTTP interface from the long-running code. This can be good, if used carefully.
> What this does enable is that you can continue to write synchronous code. To me that’s based around some idea that async code is too hard for the average Django dev to write or understand. Or that nothing can be done to make parts of Django play nicer with existing async frameworks which I also don’t believe is true. Python 3.4 makes writing async Python pretty elegant and async/await in 3.5 makes that even better.
Async code is annoying, at best. It can be done, and it's getting much more approachable with async/await, etc. But even when you've done all that, there's stuff that, for many reasons, either cannot be written async (using a non-async library), or isn't IO-bound and async could actually _hurt_ the performance. ASGI doesn't force you to write anything synchronous _or_ asynchronous, and that's part of the beauty: it doesn't care.
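A toy version of that point follows; the stub classes only mimic the general shape of Channels' websocket messages, and this is not Channels code:

```python
# A consumer is just a callable fed messages from the channel layer, so
# it can stay plain synchronous code (CPU-bound work, non-async
# libraries) without the event loop caring. Stub classes for the sketch:
class ReplyChannel:
    def __init__(self):
        self.sent = []

    def send(self, content):
        self.sent.append(content)

class Message:
    def __init__(self, content):
        self.content = content
        self.reply_channel = ReplyChannel()

def ws_message(message):
    # Ordinary blocking code; no async/await required of the author.
    text = message.content["text"]
    message.reply_channel.send({"text": text.upper()})

msg = Message({"text": "hello"})
ws_message(msg)
print(msg.reply_channel.sent)  # [{'text': 'HELLO'}]
```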
On May 6, 2016, at 7:21 AM, Mark Lavin <markd...@gmail.com> wrote:

Ryan,

Sorry if you felt I was ignoring your reply to focus on the discussion with Andrew. You both made a lot of the same points at about the same time but I did want to touch on a couple things.
On Thursday, May 5, 2016 at 4:21:59 PM UTC-4, Ryan Hiebert wrote:
> [snip] Anything that doesn't use celery's `acks_late` is a candidate, because in those cases even Celery doesn't guarantee delivery, and ASGI is a simpler interface than the powerful, glorious behemoth that is Celery.

This isn't the place for a long discussion about the inner workings of Celery but I don't believe this is true. [snip]
All of the examples I've seen have pushed all HTTP requests through Redis. I think some of the take-aways from this conversation will be to move away from that and recommend Channels primarily for websockets and not for WSGI requests.