I like this project! Thanks for resurrecting it!
Some thoughts:
Methods in HTTP are extensible. The type RequestMethod should probably have a "catchall" constructor:
| Method B.ByteString
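A minimal sketch of such a type, assuming the eight methods defined in RFC 2616 get their own constructors. The helper names parseMethod and renderMethod are hypothetical and not part of the proposal:

```haskell
import qualified Data.ByteString.Char8 as B

-- The eight RFC 2616 methods, plus a catch-all for extension methods
-- (e.g. WebDAV's PROPFIND).
data RequestMethod
  = OPTIONS | GET | HEAD | POST | PUT | DELETE | TRACE | CONNECT
  | Method B.ByteString
  deriving (Eq, Show)

-- Hypothetical helper: method names are case-sensitive in HTTP, so the
-- bytes are matched exactly and anything unknown is kept verbatim.
parseMethod :: B.ByteString -> RequestMethod
parseMethod m = case B.unpack m of
  "OPTIONS" -> OPTIONS
  "GET"     -> GET
  "HEAD"    -> HEAD
  "POST"    -> POST
  "PUT"     -> PUT
  "DELETE"  -> DELETE
  "TRACE"   -> TRACE
  "CONNECT" -> CONNECT
  _         -> Method m

-- Hypothetical helper: the inverse, for writing a request line back out.
-- The derived Show of a nullary constructor is exactly its method name.
renderMethod :: RequestMethod -> B.ByteString
renderMethod (Method m) = m
renderMethod known      = B.pack (show known)
```

Because unknown methods round-trip unchanged, a server never has to reject a request merely because its method is unfamiliar.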
Other systems (the WAI proposal on the Wiki, Hack, etc.) have broken the path into two parts: scriptName and pathInfo. While I'm not particularly fond of those names, they do split the path into "traversed" and "non-traversed" portions of the URL. This is very useful for achieving "location independence" of one's code. While this API is trying to stay agnostic to the web framework, some degree of traversal is pretty universal, and I think the API would benefit from including it.
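A minimal sketch of the traversal idea, assuming both fields are kept as lists of already-split path segments (the Wiki proposal and Hack may represent them differently). The names traverseOne, splitPath, and mountPoint are hypothetical:

```haskell
-- Hypothetical request fields, kept as lists of path segments.
data Request = Request
  { scriptName :: [String]  -- "traversed": the part of the path already consumed
  , pathInfo   :: [String]  -- "non-traversed": the part still to dispatch on
  } deriving (Eq, Show)

-- Split "/blog/2010/01" into ["blog","2010","01"]. Empty segments are
-- dropped, so a trailing slash is not preserved.
splitPath :: String -> [String]
splitPath = filter (not . null) . go
  where
    go s = case break (== '/') s of
      (seg, [])       -> [seg]
      (seg, _ : rest) -> seg : go rest

-- Consume one segment: move it from pathInfo to scriptName and return it.
-- A sub-application mounted under that segment only inspects pathInfo,
-- so it does not need to know where it was mounted.
traverseOne :: Request -> Maybe (String, Request)
traverseOne req = case pathInfo req of
  []         -> Nothing
  seg : rest -> Just (seg, req { scriptName = scriptName req ++ [seg]
                               , pathInfo   = rest })

-- The application's own location, e.g. for building links back to itself.
mountPoint :: Request -> String
mountPoint = concatMap ('/' :) . scriptName
```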
The fields serverPort, serverName, and urlScheme are typically used by an application only to "reconstruct" URLs for inclusion in the response. This is a constant source of bugs in many web sites. It is also a problem when creating modular web frameworks, since the application can't be unaware of its context (unless the server interprets and rewrites HTML and other content on the fly, which isn't realistic). Perhaps a better solution would be to pass a "URL generating" function in the Request and hide all this. Of course, web frameworks *could* use these data to dispatch on "virtual host"-like configurations. Though perhaps that is the province of the server side of this API? I don't have a concrete proposal here, just a gut feeling that including these fields breaks some of the encapsulation we'd like to achieve for Applications.
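One possible shape for the "URL generating" function, sketched under the assumption that the server builds it from the scheme, host, port, and mount point it already knows. The field mkUrl and the functions urlBuilder and linkToPost are hypothetical, and the sketch does no percent-encoding:

```haskell
-- Hypothetical: instead of serverName/serverPort/urlScheme, the request
-- carries a function that turns application-relative path segments into
-- an absolute URL. The server fills it in; the application just calls it.
data Request = Request
  { pathInfo :: [String]
  , mkUrl    :: [String] -> String
  }

-- How a server might build that function from what it already knows.
-- No percent-encoding is done; a real version would need it.
urlBuilder :: String    -- ^ scheme, "http" or "https"
           -> String    -- ^ host name
           -> Int       -- ^ port
           -> [String]  -- ^ where the application is mounted
           -> [String] -> String
urlBuilder scheme host port mount segs =
  scheme ++ "://" ++ host ++ portPart ++ concatMap ('/' :) (mount ++ segs)
  where
    isDefault = (scheme == "http" && port == 80)
             || (scheme == "https" && port == 443)
    portPart  = if isDefault then "" else ':' : show port

-- Application code: no host, port, or scheme in sight.
linkToPost :: Request -> String -> String
linkToPost req slug = mkUrl req ["posts", slug]
```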
The HTTP version information seems to have been dropped from Request. Alas, this is often needed when deciding what response headers to generate. I'm in favor of a simple data type for this:
data HttpVersion = Http09 | Http10 | Http11
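A sketch of how a server or application might consult such a type when choosing response headers. parseVersion, sendsHeaders, canUseChunked, and persistentByDefault are hypothetical helpers, and treating any other version token as unparseable is a simplifying assumption:

```haskell
import qualified Data.ByteString.Char8 as B

data HttpVersion = Http09 | Http10 | Http11
  deriving (Eq, Ord, Show)

-- Hypothetical parser for the last token of the request line. An HTTP/0.9
-- request line ("GET /path") has no version token at all, hence the Maybe
-- argument; unrecognised tokens yield Nothing.
parseVersion :: Maybe B.ByteString -> Maybe HttpVersion
parseVersion Nothing = Just Http09
parseVersion (Just v)
  | v == B.pack "HTTP/1.0" = Just Http10
  | v == B.pack "HTTP/1.1" = Just Http11
  | otherwise              = Nothing

-- HTTP/0.9 responses are just the body: no status line, no headers.
sendsHeaders :: HttpVersion -> Bool
sendsHeaders = (/= Http09)

-- The chunked transfer-coding was introduced in HTTP/1.1, so for older
-- clients a body of unknown length must be ended by closing the connection.
canUseChunked :: HttpVersion -> Bool
canUseChunked = (>= Http11)

-- Persistent connections are the default only in HTTP/1.1.
persistentByDefault :: HttpVersion -> Bool
persistentByDefault = (== Http11)
```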
I find using ByteString for all the non-body values awkward. Take headers, for example. The header names are going to come from a list of about 50 well-known ones. It seems a shame that applications will be littered with expressions like:
[(B.pack "Content-Type", B.pack "text/html;charset=UTF-8")]
It seems to me that it would be highly beneficial to include a module, say Network.WAI.Header, that defines these things, so that one could write:
[(Hdr.contentType, Hdr.mimeTextHtmlUtf8)]
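A sketch of what such a module might contain. The names contentType and mimeTextHtmlUtf8 come from the example above; the other constants are illustrative. The module header is omitted so the sketch runs as a single file, but in practice it would be a module such as Network.WAI.Header, imported qualified as Hdr:

```haskell
import qualified Data.ByteString.Char8 as B

-- Header-name constants. As top-level values they are packed once,
-- rather than with B.pack at every use site.
contentType, contentLength, location, cacheControl :: B.ByteString
contentType   = B.pack "Content-Type"
contentLength = B.pack "Content-Length"
location      = B.pack "Location"
cacheControl  = B.pack "Cache-Control"

-- Common header values.
mimeTextHtmlUtf8, mimeTextPlainUtf8 :: B.ByteString
mimeTextHtmlUtf8  = B.pack "text/html;charset=UTF-8"
mimeTextPlainUtf8 = B.pack "text/plain;charset=UTF-8"

-- An application's response headers then read like the example above.
htmlHeaders :: [(B.ByteString, B.ByteString)]
htmlHeaders = [(contentType, mimeTextHtmlUtf8)]
```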
Further, since non-fixed headers will be built up out of many little String bits, I'd just as soon have the packing and unpacking be done by the server side of this API, and let the applications deal with Strings for these little snippets both in the Request and the Response.
For header names, in particular, it might be beneficial (and faster) to treat them like RequestMethod and make them a data type with nullary constructors for all 47 defined headers, and one ExtensionHeader String constructor.
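A sketch of the nullary-constructor approach, showing only a handful of headers rather than all 47. parseHeaderName and renderHeaderName are hypothetical. Since header field names are case-insensitive, parsing lowercases before matching:

```haskell
import Data.Char (toLower)

-- A few header names as nullary constructors; a full version would list
-- all of them. Anything unrecognised becomes an ExtensionHeader.
data HeaderName
  = Accept | AcceptEncoding | ContentLength | ContentType | Host | UserAgent
  | ExtensionHeader String
  deriving (Eq, Show)

-- Compare in lower case, but keep the original spelling of extension headers.
parseHeaderName :: String -> HeaderName
parseHeaderName s = case map toLower s of
  "accept"          -> Accept
  "accept-encoding" -> AcceptEncoding
  "content-length"  -> ContentLength
  "content-type"    -> ContentType
  "host"            -> Host
  "user-agent"      -> UserAgent
  _                 -> ExtensionHeader s

-- Canonical spelling for output.
renderHeaderName :: HeaderName -> String
renderHeaderName h = case h of
  Accept            -> "Accept"
  AcceptEncoding    -> "Accept-Encoding"
  ContentLength     -> "Content-Length"
  ContentType       -> "Content-Type"
  Host              -> "Host"
  UserAgent         -> "User-Agent"
  ExtensionHeader s -> s
```

One wrinkle: the derived Eq compares ExtensionHeader names case-sensitively, so a real version would also need to normalise those.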
Finally, note that HTTP/1.1 actually does define the character encoding of these parts of the protocol quite well. It is a bit hard to find in the spec, but the request line, status line, and headers are all transmitted in ISO-8859-1 (with some restrictions), with characters outside that set encoded as per RFC 2047 (MIME Message Header Extensions). Mind you, I believe that most web servers *don't* do the RFC 2047 decoding, and either a) pass the strings through as ISO-8859-1, or b) decode them to native Unicode strings.
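For the ISO-8859-1 part, Data.ByteString.Char8 already does the right thing: unpack maps each byte to the Char with the same code point, which is ISO-8859-1 decoding. A sketch with hypothetical names decodeLatin1 and encodeLatin1; RFC 2047 encoded-words are not handled:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (ord)

-- Decoding: Char8.unpack turns each byte into the Char with the same code
-- point (0..255), which is exactly ISO-8859-1.
decodeLatin1 :: B.ByteString -> String
decodeLatin1 = B.unpack

-- Encoding: Char8.pack silently truncates each Char to 8 bits, so refuse
-- anything outside ISO-8859-1 instead of corrupting it. (Producing RFC 2047
-- encoded-words for such text is not attempted here.)
encodeLatin1 :: String -> Maybe B.ByteString
encodeLatin1 s
  | all ((< 256) . ord) s = Just (B.pack s)
  | otherwise             = Nothing
```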
- Mark
Mark Lentczner
http://www.ozonehouse.com/mark/
IRC: mtnviewmark
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
* Are you sure that a strict ByteString is fine for the request body? Can it
be used to upload large data? If so, why not use a lazy one, not (only)
for its laziness but because it is a list of chunks that fit well into
memory caches...
* I would call ResponseBody a ResponseReceiver, given its use and the
signatures of its methods.
* Why not use sendLazyByteString to implement sendFile as a default method?
This would fix your "TODO", since I believe the lazy ByteString chunk size
would be a good one.
* Maybe a ResponseReceiver instance for Handle could be provided, since it
requires no new data type definition and would be an orphan instance
anywhere else. Maybe one for sockets would make sense as well.
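A sketch of the last three suggestions together. The class signatures are assumptions (the actual ones in the WAI repo may differ): a ResponseReceiver whose sendFile defaults to sendLazyByteString, plus the Handle instance. S.hPut, L.hPut, and L.hGetContents are the bytestring library's functions; everything else is hypothetical:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import System.IO (Handle, IOMode (..), withBinaryFile)

-- Assumed shape of the class, renamed from ResponseBody as suggested.
class ResponseReceiver r where
  -- Send one strict chunk; the only method an instance must define.
  sendByteString :: r -> S.ByteString -> IO ()

  -- Default: send a lazy ByteString one chunk at a time.
  sendLazyByteString :: r -> L.ByteString -> IO ()
  sendLazyByteString r = mapM_ (sendByteString r) . L.toChunks

  -- Default, per the suggestion: read the file lazily (chunks of up to
  -- about 32 KB) and hand it to sendLazyByteString. This relies on
  -- sendLazyByteString consuming the whole string before withBinaryFile
  -- closes the handle.
  sendFile :: r -> FilePath -> IO ()
  sendFile r path = withBinaryFile path ReadMode $ \h ->
    L.hGetContents h >>= sendLazyByteString r

-- The Handle instance: no new data type needed, and defining it next to
-- the class avoids an orphan instance.
instance ResponseReceiver Handle where
  sendByteString     = S.hPut
  sendLazyByteString = L.hPut
```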
> * This might just be bikeshedding, but renamed RequestMethod to Method to
> make names slightly shorter and more consistent.
Good for me
> * I implemented Mark's suggestions of adding support for arbitrary request
> methods and information on HTTP version.
Nice
> I've been having some off-list discussions about WAI, and have a few issues
> to bring up. The first is relatively simple: what do we do about consuming
> the entire request body? Do we leave that as a task to the application, or
> should the server ensure that the entire request body is consumed?
Good question. Is there something in the HTTP spec about this? I don't think
so, and I think it would make sense to give up early if you consider the
input to be garbage.
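Whatever the spec says, a server that wants to reuse a persistent connection has to either read past any unread body bytes or close the connection. A sketch of the first option, assuming the body is exposed as the IO (Maybe ByteString) action from Michael's update; drainBody and bodyFromChunks are hypothetical:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.IORef

-- Assumed request-body interface: each call yields the next chunk, or
-- Nothing once the body is exhausted.
type RequestBody = IO (Maybe B.ByteString)

-- Hypothetical server-side helper: read and discard whatever the
-- application left unread, returning how many bytes were skipped.
drainBody :: RequestBody -> IO Int
drainBody next = go 0
  where
    go n = do
      mc <- next
      case mc of
        Nothing -> return n
        Just c  -> let n' = n + B.length c in n' `seq` go n'

-- Hypothetical test helper: a RequestBody backed by a list of chunks.
bodyFromChunks :: [B.ByteString] -> IO RequestBody
bodyFromChunks chunks = do
  ref <- newIORef chunks
  return $ atomicModifyIORef' ref $ \cs -> case cs of
    []       -> ([], Nothing)
    c : rest -> (rest, Just c)
```

A server could also cap how much it is willing to drain and simply close the connection beyond that, which is one way of giving up early on garbage input.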
> Next, I have made the ResponseBodyClass typeclass specifically with the goal
> of allowing optimizations for lazy bytestrings and sending files. The former
> seems far-fetched; the latter provides the ability to use a sendfile system
> call instead of copying the file data into memory. However, in the presence
> of gzip encoding, how useful is this optimization?
It is useful anyway.
> Finally, there is a lot of discussion going on right now about enumerators.
> The question is whether the WAI protocol should use them. There are two
> places where they could replace the current offering: request body and
> response body.
>
> In my opinion, there is no major difference between the Hyena definition of
> an enumerator and the current response body sendByteString method. The
> former provides two extra features: there's an accumulating parameter passed
> around, and a method for indicating early termination. However, the
> accumulating parameter seems unnecessary to me in general, and when needed we
> can accomplish the same result with MVars. Early termination seems like
> something that would be unusual in the response context, and could be
> handled with exceptions.
IORefs could be sufficient (instead of MVars), but this seems a bit ugly
compared to the accumulator. On the other hand, sometimes you don't need
the accumulator and so just pass a dummy unit. If we live in IO, yes,
exceptions could do that. However, the point of the Either type is to remind
you that you have two cases to handle.
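To make the trade-off concrete, a sketch contrasting the two styles. The Enumerator type is only loosely modeled on Hyena's (an assumption, not its actual definition): the iteratee threads an accumulator and returns Left to stop early or Right to continue. countWithIORef does the same count with a plain callback and an IORef. All names are hypothetical:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.IORef

-- Loosely Hyena-style (an assumption): the iteratee receives the
-- accumulator and a chunk, and returns Left acc to stop or Right acc
-- to keep going.
type Enumerator a = (a -> B.ByteString -> IO (Either a a)) -> a -> IO a

-- Enumerate a fixed list of chunks, standing in for a response body.
enumChunks :: [B.ByteString] -> Enumerator a
enumChunks chunks f = go chunks
  where
    go [] acc = return acc
    go (c : cs) acc = do
      r <- f acc c
      case r of
        Left done  -> return done   -- early termination requested
        Right acc' -> go cs acc'

-- An iteratee that counts bytes and stops once it has seen `limit` of them.
countUpTo :: Int -> Int -> B.ByteString -> IO (Either Int Int)
countUpTo limit n c =
  let n' = n + B.length c
  in return (if n' >= limit then Left n' else Right n')

-- The same count with a plain "send each chunk" callback: the state moves
-- into an IORef, and stopping early would need an exception.
countWithIORef :: ((B.ByteString -> IO ()) -> IO ()) -> IO Int
countWithIORef send = do
  ref <- newIORef 0
  send (\c -> modifyIORef' ref (+ B.length c))
  readIORef ref
```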
> For the request body, there is a significant difference. However, I think
> that the current approach (called imperative elsewhere) is more in line with
> how most people would expect to program. At the same time, I believe there
> is no performance issue going either way, and am open to community input.
Why would an imperative approach be more in line with expectations when
using a purely functional language?
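One way to see that the two styles are not far apart: a left fold with early termination can be layered on top of the pull-style IO (Maybe ByteString) body in a few lines. The names foldPull and pullFromList are hypothetical:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.IORef

-- The pull-style body from the proposal: Nothing marks the end.
type Pull = IO (Maybe B.ByteString)

-- A left fold with early termination, built on top of a pull source.
-- The consumer returns Left acc to stop or Right acc to continue.
foldPull :: Pull -> (a -> B.ByteString -> IO (Either a a)) -> a -> IO a
foldPull next f = go
  where
    go acc = do
      mc <- next
      case mc of
        Nothing -> return acc
        Just c  -> f acc c >>= either return go

-- Hypothetical helper: a pull source over a list of chunks.
pullFromList :: [B.ByteString] -> IO Pull
pullFromList chunks = do
  ref <- newIORef chunks
  return $ atomicModifyIORef' ref $ \cs -> case cs of
    []       -> ([], Nothing)
    c : rest -> (rest, Just c)
```

Going the other way, from a push-style enumerator to a pull source, generally needs a separate thread (or some other way to suspend the enumerator mid-stream), which may be one consideration when choosing which primitive WAI exposes.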
Regards,
--
Nicolas Pouillard
http://nicolaspouillard.fr
On Sat, 23 Jan 2010 21:31:47 +0200, Michael Snoyman <mic...@snoyman.com> wrote:
> Just as an update, I've made the following changes to my WAI git repo (
> http://github.com/snoyberg/wai):
>
> * I removed the RequestBody(Class) bits, and replaced them with "IO (Maybe
> ByteString)". This is a good example of tradeoffs versus the enumerator
> approach (see below).
> * This might just be bikeshedding, but renamed RequestMethod to Method to
> make names slightly shorter and more consistent.
> * I implemented Mark's suggestions of adding support for arbitrary request
> methods and information on HTTP version.
> I've been having some off-list discussions about WAI, and have a few issues
> to bring up. The first is relatively simple: what do we do about consuming
> the entire request body? Do we leave that as a task to the application, or
> should the server ensure that the entire request body is consumed?
[--snip--]
> Next, I have made the ResponseBodyClass typeclass specifically with the goal
> of allowing optimizations for lazy bytestrings and sending files. The former
> seems far-fetched; the latter provides the ability to use a sendfile system
> call instead of copying the file data into memory. However, in the presence
> of gzip encoding, how useful is this optimization?
[--snip--]
I'm hoping that the "Web" bit in your project title doesn't literally
mean that WAI is meant to be restricted to solely serving content to
browsers. With that caveat in mind:
For non-WWW HTTP servers it can be extremely useful to have sendfile. An
example is my Haskell UPnP Media Server (hums) application. It's sending
huge files (AVIs, MP4s, etc.) over the network and since these files are
already compressed as much as they're ever going to be, gzip would be
useless. The CPU load of my hums server went from 2-5% to 0% when
streaming files just from switching from a Haskell I/O based solution to
proper sendfile.
Lack of proper support for sendfile() was indeed one of the reasons that
I chose to roll my own HTTP server for hums. I should note that this was
quite a while ago and I haven't really gone back to reevaluate that
choice -- there are too many HTTP stacks to choose from right now and I
don't have the time to properly evaluate them all.
For this type of server, response *streaming* is also extremely
important for those cases where you cannot use sendfile, so I'd hate to
see a standard WAI interface preclude that. (No, lazy I/O is NOT an
option -- the HTTP clients in a typical UPnP media client behave so
badly that you'll run out of file descriptors in no time. Trust me, I've
tried.)
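For the cases where sendfile isn't available, a sketch of streaming without lazy I/O: fixed-size strict chunks read with hGetSome inside withBinaryFile, so the file descriptor is released as soon as the transfer ends or fails. streamFile is a hypothetical name; real sendfile(2) support needs an OS-specific binding outside base:

```haskell
import qualified Data.ByteString as S
import System.IO

-- Hypothetical helper: stream a file to `send` in strict chunks of at most
-- `chunkSize` bytes (assumed > 0). withBinaryFile closes the handle when
-- the loop finishes or `send` throws (e.g. because the client went away),
-- so descriptors are not left open waiting for the garbage collector.
streamFile :: Int -> FilePath -> (S.ByteString -> IO ()) -> IO ()
streamFile chunkSize path send = withBinaryFile path ReadMode loop
  where
    loop h = do
      chunk <- S.hGetSome h chunkSize
      if S.null chunk
        then return ()  -- hGetSome returns an empty string only at EOF
        else send chunk >> loop h
```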
Cheers,
Good reason indeed.
> For this type of server, response *streaming* is also extremely
> important for those cases where you cannot use sendfile, so I'd hate to
> see a standard WAI interface preclude that. (No, lazy I/O is NOT an
> option -- the HTTP clients in a typical UPnP media client behave so
> badly that you'll run out of file descriptors in no time. Trust me, I've
> tried.)
Is the experiment easy to redo? I would like to try it using safe-lazy-io
instead.
--
Nicolas Pouillard
http://nicolaspouillard.fr