[Haskell-cafe] PROPOSAL: Web application interface

Michael Snoyman

Jan 17, 2010, 11:01:03 AM
to Haskell Cafe
Following up on the previous thread, I've started a github project for some ideas for a web application interface. It borrows from both Hyena and Hack, with a few ideas of its own. The project is available at http://github.com/snoyberg/wai, and the Network.Wai module is available at http://github.com/snoyberg/wai/blob/master/Network/Wai.hs. The repository also includes a port of hack-handler-simpleserver, and an incredibly simple webapp to demonstrate usage. I intend to make the demonstration slightly more sophisticated. Finally, the repository is not yet cabalized.

I currently consider this to be a straw-man proposal, intended to highlight the issues of contention that may arise. It would be wonderful if we could get the major players in the Haskell web space to get behind a single WAI.

The entire Network.Wai module right now weighs in at only 74 lines, so I do not consider this to be a heavy-weight proposal. Here are some design notes:

  • Most important point: RequestBody and ResponseBody. I will explain below.
  • I've renamed "Env" in Hack and "Environment" in Hyena to "Request." This seems more consistent with other technologies out there. However, I have no feelings on this subject at all, and can easily bend to public demand.
  • I've stuck with UrlScheme from Hack, while Hyena called it protocol. Similarly, the RequestMethod constructors are ALLCAPS like Hack, unlike Hyena's Uppercase. Once again, no strong feelings.
  • I've sided with Hyena as far as making all representations in ByteString. Current exception is remoteHost, which is a Hack-only variable in any event.
  • Instead of representing the response as a tuple à la Hyena, I created a data type like Hack.
  • The only dependency for this module is bytestring. It might be tempting to represent RequestBody and ResponseBody with a ReaderT IO monad, but this would introduce a dependency on either mtl or transformers, which I would consider a Very Bad Idea.

The main complaint against Hack is its lack of an enumerator interface for the request and response body. However, this simply words the complaint incorrectly; I don't think anyone is married to the need for an enumerator. Rather, we want to be able to efficiently handle arbitrary-length content in constant space, without necessarily resorting to unsafeInterleaveIO (i.e., lazy I/O).

There are a number of issues with left-fold enumerators IMO. They basically promote an inversion of control. This is often valuable. However, making this the *only* interface precludes other use cases. The most basic one I ran into was wanting to interleave with read processes. I do not mean to say that it's impossible to interleave reads in such a manner, but I think it's more natural in the approach advocated by wai.

I consider RequestBody and ResponseBody to be mirroring the CGI protocol. Essentially, each handler (CGI, simpleserver, FastCGI, happstack server, etc) will define data types which instantiate RequestBodyClass and ResponseBodyClass. RequestBodyClass provides a single method, receiveByteString, to extract a chunk of data from the request body. ResponseBodyClass provides (currently) three methods, for sending strict bytestrings, lazy bytestrings, and files. While default implementations are provided for the last two based on the first, implementations can provide more efficient versions of them if desired. For example, sendFile might be replaced by a system call to sendfile.
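
The two classes described above could look roughly like the following. This is only a sketch under assumptions: the method names receiveByteString, sendByteString, sendLazyByteString and sendFile appear in this thread, but the signatures, the Maybe result, the 4096-byte chunk size and the Collector toy instance are made up for illustration and are not taken from Network.Wai.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Assumed shape: pull one chunk of the request body, Nothing at the end.
class RequestBodyClass a where
  receiveByteString :: a -> IO (Maybe B.ByteString)

-- Assumed shape: only sendByteString is mandatory. The other two methods
-- have defaults built on it, which a handler is free to override.
class ResponseBodyClass a where
  sendByteString :: a -> B.ByteString -> IO ()

  sendLazyByteString :: a -> L.ByteString -> IO ()
  sendLazyByteString a = mapM_ (sendByteString a) . L.toChunks

  -- Default: strict chunked reads. A handler could instead call sendfile.
  sendFile :: a -> FilePath -> IO ()
  sendFile a fp = withBinaryFile fp ReadMode loop
    where
      loop h = do
        chunk <- B.hGetSome h 4096
        if B.null chunk
          then return ()
          else sendByteString a chunk >> loop h

-- Hypothetical toy handler that just records what was sent.
newtype Collector = Collector (IORef [B.ByteString])

instance ResponseBodyClass Collector where
  sendByteString (Collector ref) chunk = modifyIORef ref (chunk :)

-- Everything sent so far, in order.
collected :: Collector -> IO B.ByteString
collected (Collector ref) = fmap (B.concat . reverse) (readIORef ref)
```

A real handler would write to a socket or to stdout rather than to an IORef; the point is only that one mandatory method is enough to get the other two.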

Let me know your thoughts. I'm purposely leaving out many of my reasons for the decisions I've made for brevity, since this e-mail is long enough as is. I'm happy to answer any questions as to why I went in a certain direction. It's also possible that I simply overlooked a detail.

Michael

Mark Lentczner

Jan 18, 2010, 1:54:16 AM
to Haskell Cafe
I like this project! Thanks for resurrecting it!

Some thoughts:

Methods in HTTP are extensible. The type RequestMethod should probably have a "catchall" constructor
| Method B.ByteString
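
As a sketch of how such a catch-all might be used, assuming a RequestMethod type with ALLCAPS constructors: the list of named constructors and the helper names parseMethod and renderMethod are illustrative, only the Method constructor comes from the suggestion above.

```haskell
import qualified Data.ByteString.Char8 as B8

-- The named constructors are an illustrative selection.
data RequestMethod
  = GET
  | POST
  | PUT
  | DELETE
  | HEAD
  | OPTIONS
  | TRACE
  | CONNECT
  | Method B8.ByteString -- catch-all for extension methods
  deriving (Eq, Show)

-- Known methods map to their constructor, anything else is kept verbatim.
parseMethod :: B8.ByteString -> RequestMethod
parseMethod bs = case B8.unpack bs of
  "GET" -> GET
  "POST" -> POST
  "PUT" -> PUT
  "DELETE" -> DELETE
  "HEAD" -> HEAD
  "OPTIONS" -> OPTIONS
  "TRACE" -> TRACE
  "CONNECT" -> CONNECT
  _ -> Method bs

-- Inverse of parseMethod for the wire format.
renderMethod :: RequestMethod -> B8.ByteString
renderMethod (Method bs) = bs
renderMethod m = B8.pack (show m)
```

With this shape, an unknown method such as PATCH survives a round trip through the interface instead of being rejected by the type.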

Other systems (the WAI proposal on the Wiki, Hack, etc...) have broken the path into two parts: scriptName and pathInfo. While I'm not particularly fond of those names, they do break the path into "traversed" and "non-traversed" portions of the URL. This is very useful for achieving "location independence" of one's code. While this API is trying to stay agnostic to the web framework, some degree of traversal is pretty universal, and I think it would benefit being in here.

The fields serverPort, serverName, and urlScheme are typically only used by an application to "reconstruct" URLs for inclusion in the response. This is a constant source of bugs in many web sites. It is also a problem in creating modular web frameworks, since the application can't be unaware of its context (unless the server interprets and re-writes HTML and other content on the fly - which isn't realistic.) Perhaps a better solution would be to pass a "URL generating" function in the Request and hide all this. Of course, web frameworks *could* use these data to dispatch on "virtual host" like configurations. Though, perhaps that is the province of the server side of this API? I don't have a concrete proposal here, just a gut feeling that the inclusion of these breaks some amount of encapsulation we'd like to achieve for the Applications.

The HTTP version information seems to have been dropped from Request. Alas, this is often needed when deciding what response headers to generate. I'm in favor of a simple data type for this:
data HttpVersion = Http09 | Http10 | Http11

Using ByteString for all the non-body values I find awkward. Take headers, for example. The header names are going to come from a list of about 50 well known ones. It seems a shame that applications will be littered with expressions like:

[(B.pack "Content-Type", B.pack "text/html;charset=UTF-8")]

Seems to me that it would be highly beneficial to include a module, say Network.WAI.Header, that defined these things:

[(Hdr.contentType, Hdr.mimeTextHtmlUtf8)]

Further, since non-fixed headers will be built up out of many little String bits, I'd just as soon have the packing and unpacking be done by the server side of this API, and let the applications deal with Strings for these little snippets both in the Request and the Response.

For header names, in particular, it might be beneficial (and faster) to treat them like RequestMethod and make them a data type with nullary constructors for all 47 defined headers, and one ExtensionHeader String constructor.
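
A minimal sketch of that idea, showing three of the well-known names plus the extension constructor. ExtensionHeader is the name proposed above; HeaderName, the three named constructors and the two conversion functions are hypothetical. Matching is done case-insensitively because HTTP header field names are case-insensitive.

```haskell
import qualified Data.ByteString.Char8 as B8
import Data.Char (toLower)

-- Only three of the defined headers are listed here, for brevity.
data HeaderName
  = ContentType
  | ContentLength
  | Host
  | ExtensionHeader String
  deriving (Eq, Show)

-- Server side: turn the bytes from the wire into a constructor.
headerFromBytes :: B8.ByteString -> HeaderName
headerFromBytes bs = case map toLower raw of
  "content-type" -> ContentType
  "content-length" -> ContentLength
  "host" -> Host
  _ -> ExtensionHeader raw
  where
    raw = B8.unpack bs

-- Server side: turn a constructor back into bytes for the wire.
headerToBytes :: HeaderName -> B8.ByteString
headerToBytes ContentType = B8.pack "Content-Type"
headerToBytes ContentLength = B8.pack "Content-Length"
headerToBytes Host = B8.pack "Host"
headerToBytes (ExtensionHeader s) = B8.pack s
```

Comparing two nullary constructors is cheaper than comparing two byte strings, which is the "faster" part of the argument; the cost is a type that has to track the list of registered headers.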

Finally, note that HTTP/1.1 actually does define the character encoding of these parts of the protocol. It is a bit hard to find in the spec, but the request line, status line and headers are all transmitted in ISO-8859-1 (with some restrictions), with characters outside the set encoded as per RFC 2047 (MIME Message Header extensions). Mind you, I believe that most web servers *don't* do the 2047 decoding, and only either a) pass the strings as ISO-8859-1 strings, or b) decode that to native Unicode strings.
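
For the ISO-8859-1 part, a small sketch of what decoding and encoding look like with bytestring alone. The function names decodeLatin1 and encodeLatin1 are hypothetical, and RFC 2047 decoding is deliberately not attempted. Data.ByteString.Char8.unpack maps each byte to the Char with the same code point, which is exactly ISO-8859-1 decoding.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Char (ord)

-- Every byte becomes the Char with the same code point (0 to 255).
decodeLatin1 :: B.ByteString -> String
decodeLatin1 = B8.unpack

-- Encoding only succeeds when every character fits in one byte.
-- B8.pack alone would silently truncate larger code points.
encodeLatin1 :: String -> Maybe B.ByteString
encodeLatin1 s
  | all ((< 256) . ord) s = Just (B8.pack s)
  | otherwise = Nothing
```

The asymmetry is the interesting part: decoding never fails, while encoding a String can, so an interface that takes Strings for headers has to decide what to do with characters outside the set.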

- Mark

Mark Lentczner
http://www.ozonehouse.com/mark/
IRC: mtnviewmark

_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Michael Snoyman

Jan 18, 2010, 6:48:37 AM
to Mark Lentczner, Haskell Cafe
Mark, thanks for the response, it's very well thought out. Let me state two things first to explain some of my design decisions.

Firstly, I'm shooting for lowest-common-denominator here. Right now, I see that as the intersection between the CGI backend and a standalone server backend; I think anything contained in both of those will be contained in all other backends. If anyone has a contrary example, I'd be happy to see it.

Secondly, the WAI is *not* designed to be "user friendly." It's designed to be efficient and portable. People looking for a user-friendly way to write applications should be using some kind of frontend, either a framework, or something like hack-frontend-monadcgi.

That said, let's address your specific comments.


On Mon, Jan 18, 2010 at 8:54 AM, Mark Lentczner <ma...@glyphic.com> wrote:
> I like this project! Thanks for resurrecting it!
>
> Some thoughts:
>
> Methods in HTTP are extensible. The type RequestMethod should probably have a "catchall" constructor
>        | Method B.ByteString

Seems logical to me.
 
> Other systems (the WAI proposal on the Wiki, Hack, etc...) have broken the path into two parts: scriptName and pathInfo. While I'm not particularly fond of those names, they do break the path into "traversed" and "non-traversed" portions of the URL. This is very useful for achieving "location independence" of one's code. While this API is trying to stay agnostic to the web framework, some degree of traversal is pretty universal, and I think it would benefit being in here.

Going to the standalone vs CGI example: in a CGI script, scriptName is a well defined variable. However, it has absolutely no meaning to a standalone handler. I think we're just feeding rubbish into the system. I'm also not certain how one could *use* scriptName in any meaningful manner, outside of trying to reconstruct a URL (more on this topic below).
 
> The fields serverPort, serverName, and urlScheme are typically only used by an application to "reconstruct" URLs for inclusion in the response. This is a constant source of bugs in many web sites. It is also a problem in creating modular web frameworks, since the application can't be unaware of its context (unless the server interprets and re-writes HTML and other content on the fly - which isn't realistic.) Perhaps a better solution would be to pass a "URL generating" function in the Request and hide all this. Of course, web frameworks *could* use these data to dispatch on "virtual host" like configurations. Though, perhaps that is the province of the server side of this API? I don't have a concrete proposal here, just a gut feeling that the inclusion of these breaks some amount of encapsulation we'd like to achieve for the Applications.

I think it's impossible to ever reconstruct a URL for a CGI application. I've tried it; once you start dealing with mod_rewrite, anything could happen. Given that I think we should encourage users to make pretty URLs via mod_rewrite, I oppose inserting such a function. When I need this kind of information (many of my web apps do), I've put it in a configuration file.

However, I don't think it's a good idea to hide information that is universal to all webapps. urlScheme in particular seems very important to me; for example, maybe when serving an app over HTTPS you want to use a secure static-file server as well. Frankly, I don't have a use case for serverName and serverPort that doesn't involve reconstructing URLs, but my gut feeling is that it's better to leave them in the protocol in case they do have a use case.
 
> The HTTP version information seems to have been dropped from Request. Alas, this is often needed when deciding what response headers to generate. I'm in favor of a simple data type for this:
>        data HttpVersion = Http09 | Http10 | Http11

I had not thought of that at all, and I like it. However, do we want to hard-code in all possible HTTP versions? In theory, there could be more standards in the future. Plus, isn't Google currently working on a more efficient approach to HTTP that would affect this?
 
> Using ByteString for all the non-body values I find awkward. Take headers, for example. The header names are going to come from a list of about 50 well known ones. It seems a shame that applications will be littered with expressions like:
>
>        [(B.pack "Content-Type", B.pack "text/html;charset=UTF-8")]
>
> Seems to me that it would be highly beneficial to include a module, say Network.WAI.Header, that defined these things:
>
>        [(Hdr.contentType, Hdr.mimeTextHtmlUtf8)]

This approach would make WAI much more top-heavy and prone to becoming out-of-date. I don't oppose having this module in a separate package, but I want to keep WAI itself as light as possible.
 
> Further, since non-fixed headers will be built up out of many little String bits, I'd just as soon have the packing and unpacking be done by the server side of this API, and let the applications deal with Strings for these little snippets both in the Request and the Response.

As I stated at the beginning of this response, there should be a framework or frontend sitting between WAI and the application. And given that the actual data on the wire will be represented as a stream of bytes, I'd rather stick with that.

> For header names, in particular, it might be beneficial (and faster) to treat them like RequestMethod and make them a data type with nullary constructors for all 47 defined headers, and one ExtensionHeader String constructor.

Same comment about top-heaviness.
 
> Finally, note that HTTP/1.1 actually does define the character encoding of these parts of the protocol. It is a bit hard to find in the spec, but the request line, status line and headers are all transmitted in ISO-8859-1 (with some restrictions), with characters outside the set encoded as per RFC 2047 (MIME Message Header extensions). Mind you, I believe that most web servers *don't* do the 2047 decoding, and only either a) pass the strings as ISO-8859-1 strings, or b) decode that to native Unicode strings.

Thanks for that information, I was unaware. However, I think it still makes sense to keep WAI as low-level as possible, which would mean a sequence of bytes.

Michael

Michael Snoyman

Jan 23, 2010, 2:31:47 PM
to Mark Lentczner, Haskell Cafe
Just as an update, I've made the following changes to my WAI git repo (http://github.com/snoyberg/wai):

* I removed the RequestBody(Class) bits, and replaced them with "IO (Maybe ByteString)". This is a good example of tradeoffs versus the enumerator approach (see below).
* This might just be bikeshedding, but I renamed RequestMethod to Method to make names slightly shorter and more consistent.
* I implemented Mark's suggestions of adding support for arbitrary request methods and information on HTTP version.
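
To illustrate the first change, here is a sketch of an application consuming a body exposed as "IO (Maybe ByteString)" in constant space. The type synonym RequestBodySource and both function names are hypothetical; sourceFromChunks only exists so the example can run without a server.

```haskell
import qualified Data.ByteString as B
import Data.IORef (atomicModifyIORef, newIORef)

-- Hypothetical name for the action the server hands to the application.
type RequestBodySource = IO (Maybe B.ByteString)

-- Count the bytes in the body without keeping any chunk alive.
bodyLength :: RequestBodySource -> IO Int
bodyLength next = go 0
  where
    go n = do
      mchunk <- next
      case mchunk of
        Nothing -> return n
        Just chunk ->
          let n' = n + B.length chunk
           in n' `seq` go n'

-- Stand-in for a server: yields the given chunks, then Nothing forever.
sourceFromChunks :: [B.ByteString] -> IO RequestBodySource
sourceFromChunks chunks = do
  ref <- newIORef chunks
  return (atomicModifyIORef ref pop)
  where
    pop [] = ([], Nothing)
    pop (c : rest) = (rest, Just c)
```

The application decides when to call the action, so it can stop early, which is what makes the question of who drains the rest of the body come up at all.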

I've been having some off-list discussions about WAI, and have a few issues to bring up. The first is relatively simple: what do we do about consuming the entire request body? Do we leave that as a task to the application, or should the server ensure that the entire request body is consumed?

Next, I have made the ResponseBodyClass typeclass specifically with the goal of allowing optimizations for lazy bytestrings and sending files. The former seems far-fetched; the latter provides the ability to use a sendfile system call instead of copying the file data into memory. However, in the presence of gzip encoding, how useful is this optimization?

Finally, there is a lot of discussion going on right now about enumerators. The question is whether the WAI protocol should use them. There are two places where they could replace the current offering: request body and response body.

In my opinion, there is no major difference between the Hyena definition of an enumerator and the current response body sendByteString method. The former provides two extra features: there's an accumulating parameter passed around, and a method for indicating early termination. However, the accumulating parameter seems unnecessary to me in general, and when needed we can accomplish the same result with MVars. Early termination seems like something that would be unusual in the response context, and could be handled with exceptions.
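
The comparison can be sketched side by side. The Enumerator type below is an assumption about the Hyena style (a left fold whose step function returns Left to stop early), and the Sender synonym and all function names are hypothetical. A mutable reference stands in for the accumulator on the callback side; an IORef is used here for brevity, an MVar would work the same way.

```haskell
import qualified Data.ByteString as B
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Assumed enumerator shape: a left fold over chunks. Left means stop.
type Enumerator a = (a -> B.ByteString -> IO (Either a a)) -> a -> IO a

-- Callback shape: the body is handed a send function and calls it.
type Sender = (B.ByteString -> IO ()) -> IO ()

-- The same chunks exposed through each interface.
enumChunks :: [B.ByteString] -> Enumerator a
enumChunks [] _ acc = return acc
enumChunks (c : cs) step acc = do
  r <- step acc c
  case r of
    Left done -> return done
    Right acc' -> enumChunks cs step acc'

sendChunks :: [B.ByteString] -> Sender
sendChunks cs send = mapM_ send cs

-- Total size using the accumulating parameter.
totalViaEnum :: Enumerator Int -> IO Int
totalViaEnum enum = enum (\n c -> return (Right (n + B.length c))) 0

-- Total size using a mutable reference in place of the accumulator.
totalViaSender :: Sender -> IO Int
totalViaSender body = do
  ref <- newIORef 0
  body (\c -> modifyIORef' ref (+ B.length c))
  readIORef ref
```

Both versions compute the same result; the difference is whether the state is threaded through the fold or kept in a reference next to it.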

For the request body, there is a significant difference. However, I think that the current approach (called imperative elsewhere) is more in line with how most people would expect to program. At the same time, I believe there is no performance issue going either way, and am open to community input.

Michael

Nicolas Pouillard

Jan 23, 2010, 7:38:18 PM
to Michael Snoyman, Mark Lentczner, Haskell Cafe
On Sat, 23 Jan 2010 21:31:47 +0200, Michael Snoyman <mic...@snoyman.com> wrote:
> Just as an update, I've made the following changes to my WAI git repo (
> http://github.com/snoyberg/wai):
>
> * I removed the RequestBody(Class) bits, and replaced them with "IO (Maybe
> ByteString)". This is a good example of tradeoffs versus the enumerator
> approach (see below).

* Are you sure that a strict bytestring is fine here? Can the request body
be used to upload large data? If so why not use a lazy one, not (only)
for its laziness but for being a list of chunks that fits well into memory
caches...

* I would call ResponseBody a ResponseReceiver given its use and the sigs
of the methods.

* Why not use sendLazyByteString in sendFile as a default method? This
would fix your "TODO", since I believe the chunk size would be a good one.

* Maybe a ResponseReceiver Handle instance could be provided, since it
requires no data type definition and would otherwise be an orphan instance
elsewhere. Maybe one for sockets would make sense as well.

> * This might just be bikeshedding, but renamed RequestMethod to Method to
> make names slightly shorter and more consistent.

Good for me

> * I implemented Mark's suggestions of adding support for arbitrary request
> methods and information on HTTP version.

Nice

> I've been having some off-list discussions about WAI, and have a few issues
> to bring up. The first is relatively simple: what do we do about consuming
> the entire request body? Do we leave that as a task to the application, or
> should the server ensure that the entire request body is consumed?

Good question. Is there something in the HTTP spec about this? I don't think
so, and I think it would make sense to give up early if you consider the
input as garbage.

> Next, I have made the ResponseBodyClass typeclass specifically with the goal
> of allowing optimizations for lazy bytestrings and sending files. The former
> seems far-fetched; the latter provides the ability to use a sendfile system
> call instead of copying the file data into memory. However, in the presence
> of gzip encoding, how useful is this optimization?

It is useful anyway.

> Finally, there is a lot of discussion going on right now about enumerators.
> The question is whether the WAI protocol should use them. There are two
> places where they could replace the current offering: request body and
> response body.
>
> In my opinion, there is no major difference between the Hyena definition of
> an enumerator and the current response body sendByteString method. The
> former provides two extra features: there's an accumulating parameter passed
> around, and a method for indicating early termination. However, the
> accumulating parameter seems unnecessary to me in general, and when needed we
> can accomplish the same result with MVars. Early termination seems like
> something that would be unusual in the response context, and could be
> handled with exceptions.

IORefs could be sufficient (instead of MVars), but this seems a bit ugly
compared to the accumulator. On the other hand, sometimes you don't need
the accumulator and so just pass a dummy unit. If we live in IO, then yes,
exceptions could do that. However, the point of the Either type is to remind
you that you have two cases to handle.

> For the request body, there is a significant difference. However, I think
> that the current approach (called imperative elsewhere) is more in line with
> how most people would expect to program. At the same time, I believe there
> is no performance issue going either way, and am open to community input.

Why would an imperative approach be more in line when using a purely
functional language?

Regards,

--
Nicolas Pouillard
http://nicolaspouillard.fr

Michael Snoyman

Jan 24, 2010, 1:12:58 AM
to Nicolas Pouillard, Haskell Cafe
On Sun, Jan 24, 2010 at 2:38 AM, Nicolas Pouillard <nicolas....@gmail.com> wrote:
> On Sat, 23 Jan 2010 21:31:47 +0200, Michael Snoyman <mic...@snoyman.com> wrote:
> > Just as an update, I've made the following changes to my WAI git repo (
> > http://github.com/snoyberg/wai):
> >
> > * I removed the RequestBody(Class) bits, and replaced them with "IO (Maybe
> > ByteString)". This is a good example of tradeoffs versus the enumerator
> > approach (see below).
>
> * Are you sure that a strict bytestring is fine here? Can the request body
>  be used to upload large data? If so why not use a lazy one, not (only)
>  for its laziness but for being a list of chunks that fits well into memory
>  caches...

Sorry, this is where I should have put in some documentation. The IO (Maybe ByteString) returns chunks of strict bytestring until it encounters the end of the body. If we were to use a lazy bytestring, we would either need lazy I/O or to read everything into memory (which is what we're trying to avoid).

The handler has the prerogative to determine chunk size.
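
A sketch of the handler side, under the assumption that the handler already holds the raw body (a real one would pull from a socket or from stdin instead). The name chunkedSource is hypothetical. The handler picks the chunk size and the application only ever sees strict chunks followed by Nothing.

```haskell
import qualified Data.ByteString as B
import Data.IORef (atomicModifyIORef, newIORef)

-- Build the "IO (Maybe ByteString)" action for a body, cutting it into
-- chunks of at most 'size' bytes (at least 1, to guarantee progress).
chunkedSource :: Int -> B.ByteString -> IO (IO (Maybe B.ByteString))
chunkedSource size whole = do
  ref <- newIORef whole
  return (atomicModifyIORef ref next)
  where
    n = max 1 size
    next rest
      | B.null rest = (rest, Nothing)
      | otherwise =
          let (chunk, rest') = B.splitAt n rest
           in (rest', Just chunk)
```

No lazy I/O is involved: each call performs exactly one step and returns a fully evaluated strict chunk.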

> * I would call ResponseBody a ResponseReceiver given its use and the sigs
>  of the methods.
>
> * Why not use sendLazyByteString in sendFile as a default method? This
>  would fix your "TODO", since I believe the chunk size would be a good one.
>
> * Maybe a ResponseReceiver Handle instance could be provided, since it
>  requires no data type definition and would otherwise be an orphan instance
>  elsewhere. Maybe one for sockets would make sense as well.

Sorry, I added a few more patches since sending this e-mail. I did away with ResponseBody as well, and replaced it with Either FilePath ((ByteString -> IO ()) -> IO ()). This is *very* close to the Hyena version in my opinion, with three differences (I think I've written these elsewhere, so sorry if I'm repeating myself).

1) It provides the option of providing optimized file sending, as per the Happstack sendfile system call. I was concerned at first that we might wish to provide sending multiple files, but I think people will prefer the simplicity that comes without having a typeclass. I'm completely open to revisiting this issue, as I have no strong feelings.
2) There is no "accumulating parameter" as there is with Hyena. In the general case, I don't think it's necessary, and when it is, we can use MVars.
3) There is no built-in way to force early termination. I think this is a better approach, since early termination would be an exceptional situation. Forcing the application to check a return value each time would be overhead that would rarely be used, and we can achieve the same effect with an exception.
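
The three points can be sketched together. The type "Either FilePath ((ByteString -> IO ()) -> IO ())" is quoted from the message above; the synonym ResponseBody, the ClientGone exception, the 4096-byte fallback loop and all function names are hypothetical. A real handler would presumably use a sendfile call in the Left case instead of the read loop shown here.

```haskell
import Control.Exception (Exception, throwIO, try)
import qualified Data.ByteString as B
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO (IOMode (ReadMode), withBinaryFile)

type ResponseBody = Either FilePath ((B.ByteString -> IO ()) -> IO ())

-- Hypothetical exception a send function raises to stop the body early.
data ClientGone = ClientGone deriving (Show)

instance Exception ClientGone

-- How a handler might run a body against its own send function.
runBody :: (B.ByteString -> IO ()) -> ResponseBody -> IO ()
runBody send (Right gen) = gen send
runBody send (Left fp) = withBinaryFile fp ReadMode loop
  where
    loop h = do
      chunk <- B.hGetSome h 4096
      if B.null chunk then return () else send chunk >> loop h

-- True if the whole body was sent, False if sending was cut short.
runBodyChecked :: (B.ByteString -> IO ()) -> ResponseBody -> IO Bool
runBodyChecked send body = do
  r <- try (runBody send body)
  case r of
    Left ClientGone -> return False
    Right () -> return True

-- Toy send function: accepts 'limit' chunks, then throws ClientGone.
sendAtMost :: Int -> IORef Int -> B.ByteString -> IO ()
sendAtMost limit ref _chunk = do
  n <- readIORef ref
  if n >= limit then throwIO ClientGone else modifyIORef' ref (+ 1)
```

The application code never checks a return value; early termination only costs something on the path where it actually happens.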

Sorry for not sending this update earlier, but I only finished at about 2:30 last night. I found it difficult to write coherently.

> > * This might just be bikeshedding, but renamed RequestMethod to Method to
> > make names slightly shorter and more consistent.
>
> Good for me
>
> > * I implemented Mark's suggestions of adding support for arbitrary request
> > methods and information on HTTP version.
>
> Nice
>
> > I've been having some off-list discussions about WAI, and have a few issues
> > to bring up. The first is relatively simple: what do we do about consuming
> > the entire request body? Do we leave that as a task to the application, or
> > should the server ensure that the entire request body is consumed?
>
> Good question. Is there something in the HTTP spec about this? I don't think
> so, and I think it would make sense to give up early if you consider the
> input as garbage.

What do you mean by this? That we don't need to consume that input at all, or that the server should be held responsible for "/dev/null"ing the data?


Because I don't think it really *is* an imperative approach. For that matter, enumerators are frankly also an "imperative approach." It's frankly a silly distinction IMO. The question is whether this is a *good* approach. I think passing in an output function fits very nicely with Haskell.

The question to me lies more on the request side than the response side. Basically, should the application provide a caller or a callee for reading the request body? Most of the time, the latter is simpler to write I believe.
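
The two styles can be sketched as follows, with hypothetical names throughout. In the pull style the application calls the source whenever it wants the next chunk; in the push style the application supplies a step function and something else drives it. Going from pull to push is a short loop, while the reverse direction generally needs a thread or a coroutine.

```haskell
import qualified Data.ByteString as B
import Data.IORef (atomicModifyIORef, newIORef)

-- Pull: the application is the caller.
type Pull = IO (Maybe B.ByteString)

-- Push: the application is the callee (a fold without early exit).
type Push a = (a -> B.ByteString -> IO a) -> a -> IO a

-- Any pull source can drive a step function.
pullToPush :: Pull -> Push a
pullToPush next step = go
  where
    go acc = do
      m <- next
      case m of
        Nothing -> return acc
        Just c -> step acc c >>= go

-- Pull style makes "read n bytes, then stop" a plain loop. For brevity
-- the unused tail of the last chunk is dropped rather than pushed back.
takeBytes :: Int -> Pull -> IO B.ByteString
takeBytes n next = go n []
  where
    go k acc
      | k <= 0 = return (B.concat (reverse acc))
      | otherwise = do
          m <- next
          case m of
            Nothing -> return (B.concat (reverse acc))
            Just c ->
              let c' = B.take k c
               in go (k - B.length c') (c' : acc)

-- Stand-in source for trying the functions out.
fromChunks :: [B.ByteString] -> IO Pull
fromChunks chunks = do
  ref <- newIORef chunks
  return (atomicModifyIORef ref pop)
  where
    pop [] = ([], Nothing)
    pop (c : rest) = (rest, Just c)
```

Writing takeBytes against the push interface would mean encoding "how many bytes are still wanted" in the accumulator and ignoring every chunk after the limit, which is the kind of inversion the linked article describes.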

Ha! I finally found the article I'd read a while ago demonstrating this point in C. You can obviously disagree with the sentiment there, but I've found the point to be true in Haskell as well: http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

Michael

Bardur Arantsson

Jan 24, 2010, 6:23:46 AM
to haskel...@haskell.org
Michael Snoyman wrote:

[--snip--]


> Next, I have made the ResponseBodyClass typeclass specifically with the goal
> of allowing optimizations for lazy bytestrings and sending files. The former
> seems far-fetched; the latter provides the ability to use a sendfile system
> call instead of copying the file data into memory. However, in the presence
> of gzip encoding, how useful is this optimization?

[--snip--]

I'm hoping that the "Web" bit in your project title doesn't literally
mean that WAI is meant to be restricted to solely serving content to
browsers. With that caveat in mind:

For non-WWW HTTP servers it can be extremely useful to have sendfile. An
example is my Haskell UPnP Media Server (hums) application. It's sending
huge files (AVIs, MP4s, etc.) over the network and since these files are
already compressed as much as they're ever going to be, gzip would be
useless. The CPU load of my hums server went from 2-5% to 0% when
streaming files just from switching from a Haskell I/O based solution to
proper sendfile.

Lack of proper support for sendfile() was indeed one of the reasons that
I chose to roll my own HTTP server for hums. I should note that this was
quite a while ago and I haven't really gone back to reevaluate that
choice -- there's too many HTTP stacks to choose from right now and I
don't have the time to properly evaluate them all.

For this type of server, response *streaming* is also extremely
important for those cases where you cannot use sendfile, so I'd hate to
see a standard WAI interface preclude that. (No, lazy I/O is NOT an
option -- the HTTP clients in a typical UPnP media client behave so
badly that you'll run out of file descriptors in no time. Trust me, I've
tried.)

Cheers,

Nicolas Pouillard

Jan 24, 2010, 1:45:25 PM
to Bardur Arantsson, haskel...@haskell.org
On Sun, 24 Jan 2010 12:23:46 +0100, Bardur Arantsson <sp...@scientician.net> wrote:
> Michael Snoyman wrote:
>
> [--snip--]
> > Next, I have made the ResponseBodyClass typeclass specifically with the goal
> > of allowing optimizations for lazy bytestrings and sending files. The former
> > seems far-fetched; the latter provides the ability to use a sendfile system
> > call instead of copying the file data into memory. However, in the presence
> > of gzip encoding, how useful is this optimization?
> [--snip--]
>
> I'm hoping that the "Web" bit in your project title doesn't literally
> mean that WAI is meant to be restricted to solely serving content to
> browsers. With that caveat in mind:
>
> For non-WWW HTTP servers it can be extremely useful to have sendfile. An
> example is my Haskell UPnP Media Server (hums) application. It's sending
> huge files (AVIs, MP4s, etc.) over the network and since these files are
> already compressed as much as they're ever going to be, gzip would be
> useless. The CPU load of my hums server went from 2-5% to 0% when
> streaming files just from switching from a Haskell I/O based solution to
> proper sendfile.
>
> Lack of proper support for sendfile() was indeed one of the reasons that
> I chose to roll my own HTTP server for hums. I should note that this was
> quite a while ago and I haven't really gone back to reevaluate that
> choice -- there's too many HTTP stacks to choose from right now and I
> don't have the time to properly evaluate them all.

Good reason indeed.

> For this type of server, response *streaming* is also extremely
> important for those cases where you cannot use sendfile, so I'd hate to
> see a standard WAI interface preclude that. (No, lazy I/O is NOT an
> option -- the HTTP clients in a typical UPnP media client behave so
> badly that you'll run out of file descriptors in no time. Trust me, I've
> tried.)

Is the experiment easily re-doable? I would like to try using safe-lazy-io
instead.

--
Nicolas Pouillard
http://nicolaspouillard.fr

Michael Snoyman

Jan 24, 2010, 4:00:23 PM
to Bardur Arantsson, haskel...@haskell.org
Both sendfile and response streaming are among the top priorities in the WAI proposal. As far as "web," I think the term is just a synonym for HTTP here.

I'd be especially interested to hear input from people using Haskell for non-standard HTTP applications, because I want WAI to be as general as possible. Please let me know if you see anything that you would like added. The code is all available at http://github.com/snoyberg/wai

Michael

Michael Snoyman

Jan 24, 2010, 6:30:00 PM
to haskel...@haskell.org
Minor spec question: what should be the defined behavior when an application requests that a file be sent and it does not exist?