Good to hear.
> Correct me if I'm wrong, but it seems like this punts on the response-
> altering middleware issue. This seems like the trickiest aspect of
> this whole issue, and is probably the thing I'd have the strongest
> opinions about.
It's not so much that the design punts on the issue as that I punted
on writing up my thoughts in the original post (:
One aspect I did mention is that introducing asynchronous aspects to
an app should not break existing synchronous aspects, including those
that use response-altering middleware. I do expect there to be
"middleware" that interacts with async responses, but I expect this
middleware will have to be aware of async responses and have
async-specific implementations, since the asynchronous model is so
fundamentally different from the synchronous model.
"Middleware" for async responses will be a rather different beast than
middleware for sync responses. Whereas the latter is a matter of data
transformation, the former could implement *event* manipulation -
things like suppression/interception, injection, transformation,
coalescing, aliasing, etc.
In some cases one might want to apply the behavior of existing
synchronous middleware to async responses. In these cases I suspect
simple generalizations of the existing middleware will be sufficient.
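To make the event-manipulation idea concrete, here is a minimal sketch of what such async "middleware" might look like, assuming the `:reactor` key and the `{:type ... :data ...}` event-map shape discussed elsewhere in this thread; all names are illustrative, not part of any settled proposal:

```clojure
;; Sketch: async "middleware" as event manipulation. Wraps a handler so
;; that outbound :chunk events have their data transformed by f; other
;; events pass through untouched. The :reactor/event-map shape is an
;; assumption taken from this thread's proposal.
(defn wrap-chunk-transform [handler f]
  (fn [req]
    (let [resp (handler req)]
      (if-let [reactor (:reactor resp)]
        (assoc resp :reactor
               (fn [send]
                 ;; hand the inner reactor a send fn that rewrites chunks
                 (reactor
                  (fn [event]
                    (send (if (= :chunk (:type event))
                            (update-in event [:data] f)
                            event))))))
        resp))))
```

Suppression, coalescing, and the rest would follow the same pattern: the wrapper interposes on the stream of event maps rather than on a single response value.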
But you said you have strong thoughts here - care to share?
- Mark
Ring hasn't been around that long. Neither has Clojure. Thus, I think
it would be reasonable now (moreso than any future time) to take all
of the existing Clojure Ring middleware that doesn't tolerate a
streaming response, and to break it. Now. :-) Or rather, I think
such middleware is already fundamentally broken, because it doesn't
let HTTP be HTTP; HTTP is a streaming protocol.
I haven't dug through your reactor proposal (or the referenced sites)
in detail yet; I will. In the meantime, though, I wonder if another
core Clojure element could serve us well here.
Could the "data" part of the response payload be a lazy seq? All the
layers could pull the next piece of it off as they need it, which
would enable streaming. Yet many kinds of trivial middleware would
remain just as trivial as they are without streaming. Beyond that,
some kinds of middleware (imagine something that wraps some data
around the start/end of the response) wouldn't require any more or
different code, they would "just work" with streaming.
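As a sketch of that last point (all names here are hypothetical), middleware that wraps data around the start and end of a seq-valued body needs no streaming-specific code at all, because concat is lazy and never forces the body:

```clojure
;; Hypothetical middleware that wraps a prefix and suffix around a
;; seq-valued :body. Because concat is lazy, this works unchanged
;; whether the body is fully realized or streamed piece by piece.
(defn wrap-envelope [handler header footer]
  (fn [req]
    (let [resp (handler req)]
      (update-in resp [:body] #(concat [header] % [footer])))))

;; usage: a handler whose body is a (possibly lazy) seq of strings
(def handler
  (wrap-envelope
   (fn [req] {:status 200, :headers {}, :body (map str (range 3))})
   "<begin>" "<end>"))
```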
Regarding the notion of an alternative means (reactor / non-reactor)
for handling all-at-once vs async response, that is obviously
excellent from the point of view of backward compatibility. However, I
fear that this would lead to most middleware in the wild supporting
only the plain sync approach.
It all really depends on how important streaming is.
--
Kyle Cordes
http://kylecordes.com
Seqs are insufficiently general to represent arbitrary asynchronous
responses for both HTTP and duplex-style protocols like WebSockets.
You can actually use Seqs for :body in the current Ring, but it
interacts poorly with binding and error handling and creates
additional impedance mismatch. If you really felt like seqs were the
best way to represent your async responses, you could write a trivial
middleware that emitted :chunk events as items were pulled off the
head.
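That trivial adapter might look something like this (the `{:type ... :data ...}` event shape is an assumption taken from the proposal in this thread):

```clojure
;; Sketch of the trivial seq->events adapter Mark describes: each item
;; pulled off the seq becomes a :chunk event, followed by a :close.
;; The event-map keys are assumptions, not settled spec.
(defn emit-seq [send body-seq]
  (doseq [item body-seq]
    (send {:type :chunk, :data item}))
  (send {:type :close}))
```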
> Regarding the notion of an alternative means (reactor / non-reactor)
> for handling all-at-once vs async response, that is obviously
> excellent from the point of view of backward compatibility. However, I
> fear that this would lead to most middleware in the wild supporting
> only the plain sync approach.
If they did, that would be because the authors felt that the
combination of the pervasiveness of synchronous responses and the
relative ease of writing synchronous middleware outweighed the loss of
generality of supporting both sync and async responses. That's a
reasonable decision to make and one that I'd like to allow.
- Mark
Ah, indeed, I was thinking only of a streaming response over time
(where a seq could probably be made to do the job), not of duplex
WebSockets. Ugh.
I quite agree. While I'm all for async, I don't want it to turn into a
requirement
for any middleware implementation.
Sometimes sync is the easiest, least complex approach - why pay
for the complexity if you don't need it?
--
Omnem crede diem tibi diluxisse supremum.
This seems to be the consensus. But I think to some extent it will be
a regrettable path; I am sure there is another path where the async
comes "for free" for most middleware as the default, and only gets
squashed to sync (or implemented as async with extra work) in the
cases where it would add complexity to do so.
I'm probably reaching too far for 2010 though.
There might well be a better answer than the initial proposal. We're
just beginning the process of finding a good abstraction for async
behaviour in Ring. Don't resign yourself to accepting something you don't
think is ideal. If you think of any particular approaches that might
afford more generality, be sure to let us know here. Meanwhile I'll be
continuing to consider alternatives as well.
- Mark
Very good point. I agree that the request and response objects should
"go all the way through".
> For the synchronous use case, I think this is unlikely to ever
> happen. But since the asynchronous API is rather low-level, the
> chance of a framework that attempts to hide it away completely is
> much higher. Without a careful design for the middleware, Aleph
> would become such a framework, which I think would be a shame. I
> could make a pitch here for using a higher-level abstraction over the
> asynchronous API, but I'm not sure that's the correct approach in this
> situation.
I'm not sure either, but I think it would be helpful on a variety of
levels for you to articulate your particular ideas about a higher
level async abstraction. Even if Ring doesn't adopt such an
abstraction, it may help shape the one that Ring does adopt.
> I will note, though, that a lower-level API will likely
> cause fragmentation, since there will inevitably be useful middleware
> that targets a higher-level abstraction.
I think it would be helpful for us to write down some examples of
async middleware that one would prefer to write at a higher level of
abstraction and what it would look like at the high vs low levels. It
sounds like you've thought about async response middleware rather more
than I have, so would you be willing to start? Or is this more a
general design concern than a reflection of specific middleware ideas?
> Either way, I'm glad to be having the discussion. I look forward to
> the final result.
Thanks again for the comments.
Mark, the more I think about your protocol proposals, the less I see
wrong with them. Your proposals are always annoyingly well thought out
;)
I was initially uncertain whether it was a good idea to divide the
protocol up into :http and :websocket types. Websockets are
tantalizingly close to HTTP in design, it seems a shame to split them
up.
However, on reflection I think I agree with you. The Websocket
handshake requires transmitting an exact sequence of bytes. HTTP
transports like the Java Servlet specification are usually not precise
enough for that. I think we do need both :http and :async.
I'm not too sure about your choice of the :async key, but I can't
think of anything better to name it. :protocol is more descriptive,
but then you lose the part of the map that informs the user it's an
asynchronous response.
Presumably the :headers message can be sent more than once? If so,
then I think :chunk should be :body. It doesn't make much sense to use
:chunk if you can send chunks of headers, or chunks of the body. The
fact you're sending the data in pieces is implied by the fact it is
asynchronous. Also, calling it :body ties in nicely with the
synchronous request map.
It occurs to me that this implementation would benefit from a
ring.util.async namespace, similar in nature to the
ring.util.response namespace. One could have functions like
"websocket" and "disconnect" that provide elegant ways of specifying
reactor and message maps:
(defn handler [req]
  (websocket
    :connect
    (fn [send]
      (send (message "Hello World"))
      (send disconnect))))
Or perhaps:
(defn handler [req]
  (websocket
    :connect
    (fn [channel]
      (send channel "Hello World")
      (disconnect channel))))
I guess you've probably had similar ideas.
I don't think we should worry about making current middleware
obsolete, for the following reasons:
1. Any middleware that works only on the request map will work on
synchronous and asynchronous handlers. This includes wrap-params,
wrap-cookies etc.
2. Any middleware that adds a binding won't work under any
implementation, because asynchronous responses don't necessarily all
take place in the same thread as the handler.
3. The few middleware functions that do alter the response don't
really apply to asynchronous responses. For example, HTTP sessions and
CSS-formatted HTML stacktraces have little place in a websocket
stream.
4. In general, middleware designed to interact with a duplex stream is
going to have fundamentally different goals than one that returns a
static response.
That's all I can think of at the moment.
- James
HTTP is streaming by necessity, not by design. Historically,
well-behaved HTTP clients were supposed to download the response as
fast as possible, then close the connection.
We're also not really talking about streaming here. Ring already
supports streaming in the form of InputStreams and lazy seqs.
What this proposal provides is a way of opening a duplex channel,
either by abusing HTTP, or by using Websockets.
- James
Yes, that's true... It would be nice if the JVM supported
continuations. They'd come in handy for cases like this.
- James
> I think Rich changed this in 1.3. Bindings are now visible across
> thread boundaries.
Right, binding semantics are definitely being improved in 1.3.
> I'm not sure how this will impact Ring middleware
> under this new proposal. If the reactor function is called in the same
> thread as the request it should allow bindings to be visible within
> this function and in any threads that are started by this function. In
> the receive function, I am not sure where the thread boundaries are. I
> would guess that this function is called in a different thread. In any
> case, you can get the information you need before creating the receive
> function.
It's still not clear to me how exactly bindings in 1.2 vs 1.3 and
scopes in 1.3 or later will interact with async responses, mostly
because I don't know the details of the new functionality. I'll
definitely need to study this before moving the proposal into serious
consideration.
If someone has any insights on the new binding and scopes semantics,
I'd love to hear your thoughts.
> Mark, great job writing up this proposal. You have given us a lot to
> think about. I will try to spend some time using this before I give
> you my feedback.
Looking forward to it,
- Mark
> Thanks,
> Brenton
Ordering semantics are definitely important for reasoning about
inbound event handling. However, they are orthogonal to the use of
events or queues at the application layer: this is a choice that can
be made independently of the underlying event semantics. In
particular, one can accept and process unordered events directly,
accept unordered events and push them onto a queue to process later,
accept ordered events and process them directly, or accept ordered
events and push them onto a queue to process later.
The most natural approach for ordered protocols like HTTP and
WebSockets is to use the synchronization already required at the
parsing/framing level to emit strongly ordered events to the
application. The application can then choose to block the IO thread
and handle that event synchronously, to put the associated work into
another thread pool and handle it asynchronously - freeing the IO
thread at the risk of losing the event order, or to synchronously put
the associated work into a queue/ordered thread pool to process it
asynchronously while both preserving order and freeing the IO thread.
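The third option - synchronously enqueue, asynchronously consume - can be sketched in a few lines. This illustrates the pattern only; none of it is proposed Ring API:

```clojure
;; Sketch of "accept ordered events and push them onto a queue": the
;; returned callback (invoked on the IO thread) only enqueues, so the
;; IO thread is freed immediately, while a single daemon worker thread
;; consumes the events one at a time, in arrival order.
(import 'java.util.concurrent.LinkedBlockingQueue)

(defn queued-callback
  "Returns an event callback that enqueues events for a dedicated
  worker thread, which runs process over them in order."
  [process]
  (let [q (LinkedBlockingQueue.)]
    (doto (Thread. #(loop []
                      (process (.take q))
                      (recur)))
      (.setDaemon true)
      (.start))
    (fn [event] (.put q event))))
```

Because `.put` is itself synchronized, the strong ordering of the callbacks carries over directly into the queue, which is the point made above.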
For example, the Jetty WebSocket implementation used by
ring-jetty-async takes this emit-strongly-ordered-events approach. The
WebSocket protocol parser/decoder is activated in a synchronized way
by the NIO framework as IO becomes available on the connection's
channel; when the parser has accumulated enough state in its buffer to
present the application with a WebSocket-appropriate chunk of input
data, it makes a blocking call to the application code. Only when this
call returns does the parser continue on, with the possibility of
later making other calls back into application code.
How does Aleph+Lamina achieve strong ordering of events in the receive
callbacks? Does it rely on the strong ordering of incoming events or
does it do its own framing/ordering outside of the underlying
protocols?
- Mark
The consolation here is that the shape of the reactor is the same in
all cases (long-poll, chunked stream, websockets, maybe comet later).
> However, on reflection I think I agree with you. The Websocket
> handshake requires transmitting an exact sequence of bytes. HTTP
> transports like the Java Servlet specification are usually not precise
> enough for that. I think we do need both :http and :async.
>
> I'm not too sure about your choice of the :async key, but I can't
> think of anything better to name it. :protocol is more descriptive,
> but then you lose the part of the map that informs the user it's an
> asynchronous response.
I liked the idea of
(if (:async resp)
async-stuff
sync-stuff)
While also not needing to have a separate key for the async type (e.g.
{:async true :protocol :websockets :reactor (fn [x]...)}). Though
writing it down now it occurs to me that this could turn against us if
we wanted to add more keys to the async response - these new keys
would have explicit key names while that for the protocol would be
implicit (:async).
> Presumably the :headers message can be sent more than once? If so,
> then I think :chunk should be :body. It doesn't make much sense to use
> :chunk if you can send chunks of headers, or chunks of the body. The
> fact you're sending the data in pieces is implied by the fact it is
> asynchronous. Also, calling it :body ties in nicely with the
> synchronous request map.
My thinking here was that :body could be provided in middleware by
emitting the data of that event as a :chunk and then a :close event.
On the one hand :body seems symmetric with the synchronous map, but
I'm not sure about giving the same name to things that can be sent
only once in the sync case and any number of times in the async case.
I'm open to the idea of :header being the primitive instead of :headers,
with :headers as middleware that implements a series of :header events.
Would it be useful to emit individual headers like that? I'd want to
check that implementations could be asked to flush individual header
lines before committing to that interface.
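For illustration, the expansion itself is trivial; the open question is only whether adapters can flush header lines individually. All key names below are invented for this sketch:

```clojure
;; Hypothetical: if :header were the primitive event, a :headers map
;; could be expanded into a series of :header events by a utility or
;; middleware. The event keys here are assumptions, not settled spec.
(defn send-headers [send headers]
  (doseq [[k v] headers]
    (send {:type :header, :name k, :value v})))
```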
I share these sentiments.
> That's all I can think of at the moment.
Thanks for the feedback!
- Mark
> - James
>
Right, so (1) definitely needs to be in place or we're going nowhere.
On (2) we have different understandings of the IO model. My
understanding is that the IO level will in general have multiple IO
threads and that callbacks for events on a single connection may come
from any number of these threads. However, callback 2 on thread B can
only be invoked strictly after callback 1 on thread A has returned.
This really has to be the case if you are going to use these callbacks
to enqueue events into a queue/channel for later orderly consumption:
if there was a race in using these callbacks without a channel then
there would be a race in enqueuing the items into a channel.
Given the strong semantics on (2) that we can get with a reactor,
there is nothing especially tricky about (3). You can assume the first
message that you process is the first message in the stream, and that
other events risk being processed concurrently or before that
first event only if you leave the callback threads in an undisciplined
way, which is the same case as one would have with receiving events
from a channel.
It seems like a lot hinges on (2) - am I misunderstanding something there?
- Mark
I'm not sure I completely understand this argument.
Support for something like Lamina's channels could be provided via
middleware, as far as I can see.
This may lead to reimplementing functionality that already exists in
the library the adapter is using. However, Ring already does this for
parameters, cookies, sessions, etc. To my mind, a simpler
specification is worth some redundancy.
Is there any other disadvantage to implementing channels as standard
middleware, rather than baking them into the adapter?
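A rough sketch of what that middleware could look like, assuming (purely for illustration) that a reactor receives a send fn and returns a receive fn, and using a blocking queue to stand in for a real channel library like Lamina; the `:channel-fn` key is invented here:

```clojure
;; Hypothetical wrap-channel middleware: a handler that returns a
;; :channel-fn gets adapted to the reactor interface. The :channel-fn
;; key, the queue-as-channel, and the reactor calling convention are
;; all assumptions made for this sketch.
(import 'java.util.concurrent.LinkedBlockingQueue)

(defn wrap-channel [handler]
  (fn [req]
    (let [resp (handler req)]
      (if-let [channel-fn (:channel-fn resp)]
        (assoc resp :reactor
               (fn [send]
                 (let [in (LinkedBlockingQueue.)]
                   ;; run the app's channel consumer on its own thread
                   (doto (Thread. #(channel-fn in send))
                     (.setDaemon true)
                     (.start))
                   ;; inbound events are enqueued in arrival order
                   (fn [event] (.put in event)))))
        resp))))
```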
- James
You suspect that a lot of reactors will want channel-like behavior;
let's start by picking one or a few of those and seeing what the
reactors would look like with and without channels, how bad the
fragmentation would be if two participants didn't both natively speak
one or the other protocol, how it would be improved by having a
native vs. third-party channel implementation, etc. Have any
particular examples in mind?
- Mark
Yes, I think something like this should be standard middleware in
ring.core (or perhaps ring.async).
My feeling is that the Ring SPEC should represent the minimum
necessary interface. If something can be factored out into a library,
it should be. For example, URL-encoded parameters can be derived from
the query-string and body of the HTTP request; therefore, parameters
are implemented via the wrap-params middleware, rather than part of
the SPEC.
This approach has a few advantages:
1. The SPEC is simpler to understand and simpler to implement.
2. It doesn't force users (even a minority of users) to include
functionality they might not need.
3. Alternative implementations can be supported (e.g. JSON parameters
instead of URL-encoded parameters)
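As an illustration of point 3, a JSON variant of params middleware is only a few lines; everything here is hypothetical, and a real version would use a JSON library such as clojure.data.json rather than a passed-in parse fn:

```clojure
;; Hypothetical sketch of an alternative params implementation: JSON
;; request bodies parsed into :json-params. parse-json is passed in
;; rather than tied to any particular JSON library.
(defn wrap-json-params [handler parse-json]
  (fn [req]
    (let [params (if (= "application/json" (:content-type req))
                   (parse-json (slurp (:body req)))
                   {})]
      (handler (assoc req :json-params params)))))
```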
- James
We have a chat client, where each client first sends their name, and
then the chat room they want to join. After that, all messages will
be sent to everyone else in the channel with their name prepended. I
think the reactor implementation will be the same no matter what
assumptions we make about (2), so really this is just a test of (3) -
that is, how easy is it to program against a pure async model?
Does anyone have a strong opinion about the suitability of this
example? I can come up with some others, but mine will probably be
overfit for channels, since that's how I've been thinking about things
for the last few months.
Zach
> I'm not sure it's a great acid test for async middleware, but here's
> an example I use in the Aleph documentation:
>
> We have a chat client, where each client first sends their name, and
> then the chat room they want to join. After that, all messages will
> be sent to everyone else in the channel with their name prepended. I
> think the reactor implementation will be the same no matter what
> assumptions we make about (2), so really this is just a test of (3) -
> that is, how easy is it to program against a pure async model?
>
> Does anyone have a strong opinion about the suitability of this
> example? I can come up with some others, but mine will probably be
> overfit for channels, since that's how I've been thinking about things
> for the last few months.
I think your example is a good one.
I've got a couple more...
Warning: I don't have a lot to say about how this should be implemented in Ring or Aleph, other than I'd like it to be easy for *me* to use :-)
The first example is a tiny bit different and happens to be something I'm working on right now. I am building a staff scheduling system. It has a UI using a javascript 2D engine (raphael) to represent the combined problem definition and current solution. The user can manipulate aspects of the problem definition by mucking about with things in the UI (e.g. moving a shift, changing demand etc). These changes will cause a stream of data to be sent to the server over a websocket (say). It is important that the changes be processed in order.

As a simplification, the responses from the server are of three kinds: minimal-feedback-to-user, update-problem-definition, and update-current-solution. All three will be 'deferred' in some way: minimal-feedback-to-user by maybe a fraction of a second, so it might reflect several changes; update-problem-definition will come after the user stops changing things, so will definitely reflect a perhaps large number of changes; and update-current-solution could take quite a long time (conceivably minutes, hopefully not hours, so maybe there are intermediate UI updates).

If the user isn't logged in then no socket connection can be allowed. There could be a lot of users solving *different* problems; I don't think 'team scheduling' is going to be supported :-) This means that the source of the information needs to be matched to the problem being solved. The processing of each change from the user could take longer than the interval between changes, and since order of change matters, these changes will have to be serialised.
A second example, maybe more on the non-channel side of things... Many years ago I worked on a monitoring system. Basically this thing watched over a lot of hardware by analysing many streams of data (status reports) and looking for bad things (which could be quite complex situations involving the state over time, or over several sources, or both). The analysis was done by a bunch of CPUs working in parallel, then filtering and queuing results for a second pass (well, there were more than 2 passes) that was order dependent.

In our case, time-sensitive status reports would be separated by enough time that we could guarantee (this was a seriously hard realtime system) processing was complete before the second status report appeared. The concern is only that the related-but-time-separated data be processed in order; no ordering on any other processing is required (in this system there were all kinds of latencies, so you don't know the real order anyway). All output would go to something that 'raised the alarm'.
So, to translate to async... A bunch of websockets are opened that stream a lot of data in, really fast, and expect no response at all. I would think that a pool of threads would pick up the data as they arrive and do some filtering and processing. The processing can be done quickly enough that it'll be done before the next time-sensitive data arrives (this requirement isn't of concern for us; take it as a 'given'). The filtered results could be passed on to another websocket (or websockets) or a queue, and so on to a second round of processing where the order does matter.
I think the first stage of this is most interesting. It doesn't need the data ordered, but it might be easier/better to implement (for *me*) as a channel? Is thread management going to be a problem if you don't have a channel? There could be a lot of websockets to be watched over in this scenario. In the actual problem the input streams were multiplexed onto something like 64 sockets (think 64 different NICs) carrying many hundreds of thousands of data sources.
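For what it's worth, the serialisation requirement in the scheduling scenario maps nicely onto Clojure's agents, which already guarantee that actions sent to a single agent run one at a time, in the order sent. A sketch, with all names invented:

```clojure
;; Sketch for the scheduling scenario: one agent per problem session.
;; send-off runs actions on a given agent serially and in order, off
;; the IO thread - the serialization described above, without blocking
;; the websocket callback.
(def session (agent {:changes []}))

(defn on-change
  "Called from the websocket callback; returns immediately."
  [session change]
  (send-off session update-in [:changes] conj change))
```

Different users solving different problems would simply get different agents, which also gives the source-to-problem matching for free.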
Cheers,
Bob
A few more interesting pages:
* Channels proposal for Clojure Contrib:
http://dev.clojure.org/display/design/Channels
* .NET Reactive Extensions for async data: http://rxwiki.wikidot.com/start
In the former, Rich says of the latter: "No one should try to do any
design work for async in Clojure without thoroughly understanding what
Erik Meijer has done with .Net Reactive Extensions."
I know I have some reading to do!
- Mark
> == Proposal
>
> I propose a variation on James' original channels interface for
> general async support in Ring.
>
> In the proposed design, synchronous handlers, middlewares, and
> adapters are completely unchanged and require no modification. If an
> application wants to respond asynchronously, it returns a response map
> indicating the type of async operation desired and the 'reactor'
> that will implement async event handling:
>
> (defn handler [req]
>   {:async :websocket
>    :reactor <reactor>})
>
> In particular, the following event types may be expected:
>
> * websocket inbound - :connect, :message, :disconnect, :error
> * websocket outbound - :message, :disconnect
> * http inbound - :error
> * http outbound - :status, :headers, :chunk, :close
>
> I've implemented ring.adapter.jetty-async, a Ring adapter that uses
> the new Jetty 8 library to support both asynchronous HTTP and
> WebSockets through the proposed asynchronous Ring interface. The code
> is in a branch of the main Ring repo:
>
> http://github.com/mmcgrana/ring/tree/jetty-async
>
> I've also created a demo project that shows how one could use the
> proposed interface to combine synchronous HTTP, long-poll HTTP,
> streaming HTTP, and websockets in the same application. The demo is
> available as a Git repo:
>
> git clone git://gist.github.com/657694.git try-jetty-async
> cd try-jetty-async
> lein deps
>
> # in one terminal
> lein run -m server
>
> # in another terminal
> curl -i http://localhost:8080/sync
> curl -i http://localhost:8080/poll
> curl -i http://localhost:8080/stream
> ruby wsclient.rb ws://localhost:8080/websocket?name=bob
> => hi
> => quit
>
> The file src/server.clj implements all of these endpoints and is a
> good example of how asynchronous Ring handlers might be constructed
> and composed with synchronous handlers and middlewares.
>
>
> == Feedback
>
> I look forward to your feedback and criticisms. I'm interested in
> feedback both on the specific proposal and on the general goals that I
> laid out earlier.
>
> James, I'd be particularly interested in your feedback as the original
> author of the channel spec on which this proposal is based. Zach, I'd
> be interested in hearing how you think this would or wouldn't mesh
> with Aleph. In general I'd love to hear from people who are currently
> using async operations in production apps - Clojure, Java, or
> otherwise.
>
> That said, all comments are welcome and encouraged!
>
> - Mark
>
Zach
I quite like this syntax. The :protocol and :async keys are more
descriptive than just :async.
> My thinking here was that :body could be provided in middleware by
> emitting the data of that event as a :chunk and then a :close event.
> One the one hand :body seems symmetric with the synchronous map, but
> I'm not sure about giving the same name to things that can be sent
> only once in the sync case and any number of times in the async case.
Ah, I see your reasoning now. I think a utility function would be more
suited than middleware in this case, e.g.
(defn chunk [send data]
  (send {:type :chunk, :data data}))

(defn close [send]
  (send {:type :close}))

(defn body [send data]
  (doto send
    (chunk data)
    (close)))
But I wonder whether "body-and-close" wouldn't be a more descriptive function:
(defn body [send data]
  (send {:type :chunk, :data data}))

(defn body-and-close [send data]
  (doto send
    (chunk data)
    (close)))
> I'm open to the idea of :header being the primitive instead :headers,
> with :header as middleware that implements a series of :header events.
> Would it be useful to emit individual headers like that? I'd want to
> check that implementations could be asked to flush individual header
> lines before committing to that interface.
I'm not certain how useful it would be, but it would have a certain
symmetry with the normal request maps.
Although again, I think utility functions would be better suited than
middleware.
- James
--
You received this message because you are subscribed to the Google Groups "Ring" group.