outstanding jsgi spec issues


Dean Landolt

Sep 1, 2009, 6:14:50 PM
to comm...@googlegroups.com
There's been a lot of talk about jsgi spec issues (perhaps too much), so in the interest of wrapping it all up and making it easier on jsgi server maintainers, here is a list of the issues that have been brought up and the options I'm aware of...


Response object vs. triple:

We could tally the votes on the last thread, but it seemed like the object fans won by a landslide. I believe Tom is changing jack to return objects today -- are there any final objections?



PATH_INFO *MUST NOT* be decoded:

Some time ago I'd proposed that PATH_INFO MUST NOT be decoded [1]. This one isn't just a shed color -- it's burned wsgi devs and will be changed in wsgi 2.0. Not being able to tell the difference between a raw and encoded slash could be very limiting for some applications.
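
To make the raw vs. encoded slash problem concrete, here is a minimal sketch in plain JavaScript. The path is a made-up example and is not taken from any server's behavior; it only shows why decoding before the app sees PATH_INFO loses information.

```javascript
// Hypothetical request path: the second segment contains an encoded slash.
var rawPath = "/files/reports%2F2009/summary";

// If PATH_INFO is decoded before the app sees it, the encoded slash becomes
// indistinguishable from a real path separator.
var decodedFirst = decodeURIComponent(rawPath).split("/").slice(1);

// If the app receives the raw value, it can split first and decode each
// segment afterwards, which preserves the original segment boundaries.
var splitFirst = rawPath.split("/").slice(1).map(function (segment) {
    return decodeURIComponent(segment);
});
```

With the raw value the app still sees three segments, the middle one being "reports/2009"; with the pre-decoded value it sees four and cannot tell which slash was encoded.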



Required jsgi.* keys should not use the dot separator:

I'd also proposed changing the jsgi keys to avoid dot notation [2]. I gave some valid reasons in the thread, but I recognize it's not a huge deal, just a mild inconvenience that sets a bad precedent for library authors. It doesn't make as much sense in jsgi as it did when originally put forth in wsgi. Here are three options:


(1) jsgi.input (no change)
(2) jsgi_input (to match CGI_KEY form)
(3) jsgiInput (more javascripty)

Any other proposals?



Response header case:

Should we spec a consistent case to avoid setting duplicate keys with differing cases?


(A) no change
(B) lower_case
(C) UPPER_CASE
(D) Mixed_Case (perhaps based on some authority list for edge cases like ETag)
(E) require and specify a HashP implementation
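
As a rough illustration of the duplicate-key problem and of option (B), here is a minimal sketch. The helper name `normalizeHeaders` and its last-one-wins collision rule are assumptions made for the example, not something the spec or jack defines.

```javascript
// Two pieces of middleware set the same header with different casing.
var headers = {};
headers["Content-Type"] = "text/plain";
headers["content-type"] = "text/html";

// A plain object treats these as two distinct keys.
var keyCountBefore = Object.keys(headers).length;

// Hypothetical helper: fold every key to lower case (option B).
// On a collision the later key wins; a real spec would have to pick a rule.
function normalizeHeaders(h) {
    var out = {};
    Object.keys(h).forEach(function (key) {
        out[key.toLowerCase()] = h[key];
    });
    return out;
}

var normalized = normalizeHeaders(headers);
```

Whatever case is chosen, the point is only that the server ends up with a single entry per header name.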



2E for me.


Other peripheral issues:

Should it be specified whether an env can be submitted to an app multiple times in one request context (e.g. as jack's cascade middleware does)? This seems to break in persevere's jsgi implementation, but it's not spec'd so I couldn't say if it's a bug.

Should it be specified whether the input stream MUST be rewindable? Rack specifies this, but after mulling it over it seems like unnecessary overhead. If you want to peek at the input stream you could always accumulate it into a ByteString and replace jsgi.input.
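
A minimal sketch of the "accumulate and replace" idea, under stated assumptions: the `read()`/`rewind()` interface, the `makeOneShotStream` stand-in and the `bufferInput` helper are all hypothetical, and plain strings stand in for a ByteString.

```javascript
// Stand-in for a one-shot input stream; a real server would supply its own.
function makeOneShotStream(chunks) {
    var i = 0;
    return {
        read: function () { return i < chunks.length ? chunks[i++] : null; }
    };
}

// Hypothetical helper: drain env["jsgi.input"] once, then replace it with an
// in-memory stream that can be rewound by whoever needs to re-read it.
function bufferInput(env) {
    var input = env["jsgi.input"], parts = [], chunk;
    while ((chunk = input.read()) !== null) {
        parts.push(chunk);
    }
    var data = parts.join(""); // a real implementation would build a ByteString
    var consumed = false;
    env["jsgi.input"] = {
        read: function () {
            if (consumed) { return null; }
            consumed = true;
            return data;
        },
        rewind: function () { consumed = false; }
    };
    return data;
}

var env = { "jsgi.input": makeOneShotStream(["name=", "jsgi"]) };
var peeked = bufferInput(env);
var firstRead = env["jsgi.input"].read();
env["jsgi.input"].rewind();
var secondRead = env["jsgi.input"].read();
```

Only the middleware that wants to peek pays the buffering cost; servers are not forced to make every input rewindable.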



[1] http://groups.google.com/group/commonjs/browse_thread/thread/dbb8c27664a6c778
[2] http://groups.google.com/group/commonjs/browse_thread/thread/7662580bb89fa991/4fa8c91f6b560b8a

Isaac Z. Schlueter

Sep 1, 2009, 11:45:55 PM
to CommonJS
> PATH_INFO *MUST NOT* be decoded:

+1 We're decoding PATH_INFO? If so, that has to stop.

You're absolutely right. This isn't trivial. This is kind of a big
deal. I can say from painful personal experience that this will
surprise you with obnoxious errors if it hasn't already.

> (1) jsgi.input (no change)
> (2) jsgi_input (to match CGI_KEY form)
> (3) jsgiInput (more javascripty)

Minor issue, I think. 2 or 3 are a bit prettier, I think. If we go
with 2, then I'd prefer lower_snake_case so that it's clear which are
CGI keys and which are JSGI. That being said, I certainly don't want
to bother doing the work to change everything over, so IMO the jsgi
implementation maintainers should have the final word on this. If
they want to leave it as jsgi.input, then that's fine with me.

I think that requiring a specific HashP implementation in JSGI is a
mistake, either for this or for the response header case. JSGI should
be principally a data exchange protocol, and should not specify
algorithms.

--i

Tom Robinson

Sep 2, 2009, 12:01:09 AM
to comm...@googlegroups.com

On Sep 1, 2009, at 3:14 PM, Dean Landolt wrote:

> There's been a lot of talk about jsgi spec issues (perhaps too
> much), so in an interest in wrapping it all up and making it easier
> on jsgi server maintainers, here is a list of the issues that have
> been brought up and the options I'm aware of...
>
>
> Response object vs. triple:
>
> We could tally the votes on the last thread, but it seemed like the
> object fans won by a landslide. I believe Tom is changing jack to
> return objects today -- are there any final objections?

+1, it was the overwhelming majority, and I'm starting to think it's a
good change.

I'm working on converting Jack now. I'll update the spec on jackjs.org
too.

Jack users: would you like me to include temporary shims in all
the middleware/handlers that auto-convert array responses to objects
and warn?

Pros: your apps will still work without modification. jack will
provide warnings where applicable.
Cons: litters the code with unnecessary cruft. slight performance
impact (1 function call + "if(Array.isArray())" per middleware/handler
per request)
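
For reference, a shim of this kind could look roughly like the sketch below. The function name, the [status, headers, body] order of the triple and the `status`/`headers`/`body` property names are assumptions for illustration, not Jack's actual code.

```javascript
// Collected warnings; a real shim would more likely log them.
var warnings = [];

// Hypothetical shim: accept either the old array triple or the new object
// form and always hand back an object.
function toResponseObject(response) {
    if (Array.isArray(response)) {
        warnings.push("array responses are deprecated; return an object instead");
        return { status: response[0], headers: response[1], body: response[2] };
    }
    return response;
}

var legacy = toResponseObject([200, { "Content-Type": "text/plain" }, ["hello"]]);
var modern = toResponseObject({ status: 404, headers: {}, body: [] });
```

This is the "1 function call + Array.isArray()" cost mentioned above, paid once per middleware/handler per request.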

> PATH_INFO *MUST NOT* be decoded:
>
> Some time ago I'd proposed that PATH_INFO MUST NOT be decoded [1].
> This one isn't just a shed color -- it's burned wsgi devs and will
> be changed in wsgi 2.0. Not being able to tell the difference
> between a raw and encoded slash could be very limiting for some
> applications.

+1

> Required jsgi.* keys should not use the dot separator:
>
> I'd also proposed changing the jsgi keys to avoid dot notation [2].
> I gave some valid reasons in the thread, but I recognize it's not a
> huge deal, just a mild inconvenience and sets a bad precedent for
> library authors. It doesn't make as much sense in jsgi like it did
> originally as put forth in wsgi. Here are three options:
>
>
> (1) jsgi.input (no change)
> (2) jsgi_input (to match CGI_KEY form)
> (3) jsgiInput (more javascripty)
>
> Any other proposals?

The "." is meant to delimit the "namespace", so if anything I think we
should use _ rather than mixedCase. #2 gets my vote.

> Response header case:
>
> Should we spec a consistent case to avoid setting duplicate keys
> with differing cases?
>
>
> (A) no change
> (B) lower_case
> (C) UPPER_CASE
> (D) Mixed_Case (perhaps based on some authority list for edge cases
> like ETag)
> (E) require and specify a HashP implementation
>
> 2E for me.

+1

> Other periphery issues:
>
> Should it be specified whether an env can be submitted to an app
> multiple times in one request context (e.g. like jack's cascade
> middleware). This seems to break in persevere's jsgi implementation,
> but it's not spec'd so I couldn't say if it's a bug.

Yeah, probably a good idea to specify that it should be allowed. But
does that include rewinding input...?

> Should it be specified whether the input stream MUST be rewindable?
> Rack specifies this, but after mulling it over it seems like
> unnecessary overhead. If you want to peek at the input stream you
> could always accumulate it into a ByteString and replace jsgi.input.

I'm not sure about this one. It might be required for reusing the env
(see above), but for large requests it could be inefficient (I believe
Rack provides a rewindable input wrapper which buffers the request on
disk... which could be a problem on platforms where you don't have
disk access, like AppEngine)

-tom

George Moschovitis

Sep 3, 2009, 1:52:27 AM
to CommonJS
> Required jsgi.* keys should not use the dot separator:
>
> I'd also proposed changing the jsgi keys to avoid dot notation [2]. I gave
> some valid reasons in the thread, but I recognize it's not a huge deal, just
> a mild inconvenience and sets a bad precedent for library authors. It
> doesn't make as much sense in jsgi like it did originally as put forth in
> wsgi. Here are three options:
>
> (1) jsgi.input (no change)
> (2) jsgi_input (to match CGI_KEY form)
> (3) jsgiInput (more javascripty)

+1 but I cannot decide between 2 and 3 :(

It's not a big deal to me though...

-g.

George Moschovitis

Sep 3, 2009, 1:54:20 AM
to CommonJS
> Jack users: would you like me to include the temporary shims in all  
> the middleware/handlers that auto-converts array responses to objects  
> and warns?
>
>         Pros: your apps will still work without modification. jack will  
> provide warnings where applicable.
>         Cons: litters the code with unnecessary cruft. slight performance  
> impact (1 function call + "if(Array.isArray())" per middleware/handler  
> per request)

no shims please!

> The "." is meant to delimit the "namespace", so if anything I think we  
> should use _ rather than mixedCase. #2 gets my vote.

+1

Mike Wilson

Sep 3, 2009, 5:57:52 AM
to comm...@googlegroups.com
Dean Landolt wrote:

There's been a lot of talk about jsgi spec issues (perhaps too much), so in the interest of wrapping it all up and making it easier on jsgi server maintainers, here is a list of the issues that have been brought up and the options I'm aware of...
[...] 
 
Required jsgi.* keys should not use the dot separator:

I'd also proposed changing the jsgi keys to avoid dot notation [2]. I gave some valid reasons in the thread, but I recognize it's not a huge deal, just a mild inconvenience that sets a bad precedent for library authors. It doesn't make as much sense in jsgi as it did when originally put forth in wsgi. Here are three options:


(1) jsgi.input (no change)
(2) jsgi_input (to match CGI_KEY form)
(3) jsgiInput (more javascripty)

Any other proposals? 
Following on from the discussion about the "B" (Java/JavaScript descriptive-style) alternative for API naming, I could envision something less coupled to CGI and more inspired by the way Java's servlet API does things, i.e. promoting the most important variables to individual properties. Of course, with JavaScript you can argue that object (map) members are always properties, but I'm thinking in a more logical sense, i.e. whether you access them with ["propname"] or .propname.
 
Using this scheme, instead of:
 
  function(env) {
    env["REQUEST_METHOD"]
    env["PATH_INFO"]
    env["QUERY_STRING"]
    env["HTTP_CONTENT_TYPE"]
    env["HTTP_CONTENT_LENGTH"]
    env["HTTP_<*>"]
      ...
    env["jsgi.input"]
  }
 
we would get something like:
 
  function(req) {
    req.method
    req.pathInfo
    req.queryString
    req.contentType
    req.contentLength
    req.headers{} // actual literal HTTP headers
      req.headers["Content-Type"]
      req.headers["Content-Length"]
      req.headers["Other-Header-Not-Exposed-As-Property"]
      ...
    req.inputStream
  }
 
That said, if so much descriptive style is not desired I'd put my vote on using xxx_xxx or XXX_XXX for anything that is meant to be interpreted as a "variable in a map". If it is desired to indicate namespacing I'd rather do that with a real namespacing object:
 
  env["jsgi"]["input"]
 
so that dot notation may be used in a natural way:
 
  env.jsgi.input
 
Best regards
Mike Wilson

Hannes Wallnoefer

Sep 3, 2009, 6:17:58 AM
to comm...@googlegroups.com
2009/9/3 Mike Wilson <mik...@hotmail.com>:
I would welcome this change. It would likely induce frameworks to
extend the env/req object instead of completely replacing it, which I
argue is a good thing (less resources wasted in replicating
properties, more common ground).

> That said, if so much descriptive style is not desired I'd put my vote on
> using xxx_xxx or XXX_XXX for anything that is meant to be interpreted as a
> "variable in a map". If it is desired to indicate namespacing I'd rather do
> that with a real namespacing object:
>
>   env["jsgi"]["input"]
>
> so that dot notation may be used in a natural way:
>
>   env.jsgi.input

I think the difference between the flat

env.jsgi_input

and the nested

env.jsgi.input

is mostly cosmetic. I prefer both to the pseudo-nested status quo.

Hannes

> Best regards
> Mike Wilson
> >
>

Wes Garland

Sep 3, 2009, 9:33:16 AM
to comm...@googlegroups.com
Mike:

I don't know Java's servlet API at all, but your suggestion also strongly mirrors the "environment" available when writing modules for Apache. (Apache modules can interleave in a way similar to JSGI, BTW).

 I find this style of data organization much more logical than the CGI-inspired mess that PHP uses, which is a lot like what I've seen in JSGI thus far.
 
  function(req) {
    req.method
    req.pathInfo
    req.queryString
    req.contentType
    req.contentLength
    req.headers{} // actual literal HTTP headers
      req.headers["Content-Type"]
      req.headers["Content-Length"]
      req.headers["Other-Header-Not-Exposed-As-Property"]
      ...
    req.inputStream
  }

^^^^ how do I vote for this? :)

Although, we don't want /literal/ HTTP headers: we want case-corrected HTTP headers. Subtle detail, important though.
 
Speaking of Apache modules, one lesson from those guys -- it's important to have something like a "notes field" to jot down ad-hoc notes for sharing between pieces of "friend"ly middleware.  Apache uses an apr_table_t for this, which is extremely similar to a plain JavaScript object.  Apache puts it in the request object, which is passed from module to module during the processing chain.

Wes


--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Dean Landolt

Sep 3, 2009, 10:20:33 AM
to comm...@googlegroups.com
On Thu, Sep 3, 2009 at 9:33 AM, Wes Garland <w...@page.ca> wrote:
Mike:

I don't know Java's servlet API at all, but your suggestion also strongly mirrors the "environment" available when writing modules for Apache. (Apache modules can interleave in a way similar to JSGI, BTW).

 I find this style of data organization much more logical than the CGI-inspired mess that PHP uses, which is a lot like what I've seen in JSGI thus far.
 
  function(req) {
    req.method
    req.pathInfo
    req.queryString
    req.contentType
    req.contentLength
    req.headers{} // actual literal HTTP headers
      req.headers["Content-Type"]
      req.headers["Content-Length"]
      req.headers["Other-Header-Not-Exposed-As-Property"]
      ...
    req.inputStream
  }

^^^^ how do I vote for this? :)

While I like this a little better, my fear is that once we start going down this road we'll either end up bike shedding or end up with jack.request. In cases where a nicely formatted request is helpful, middleware could require jack (or some other library with a request implementation) to get this (plus some other niceties). If you tack it onto the env, other middleware requiring a request obj could also take advantage of it.
 

Although, we don't want /literal/ HTTP headers: we want case-corrected HTTP headers. Subtle detail, important though.
 
Speaking of Apache modules, one lesson from those guys -- it's important to have something like a "notes field" to jot down ad-hoc notes for sharing between pieces of "friend"ly middleware.  Apache uses an apr_table_t for this, which is extremely similar to a plain JavaScript object.  Apache puts it in the request object, which is passed from module to module during the processing chain.

What stops friendly middleware from just adding anything they want to the env? If you have, for instance, authentication middleware and authorization middleware that needs info from the authentication, is there anything stopping the authentication middleware from just adding whatever fields (or objects) it wants to the env?

Dean Landolt

Sep 3, 2009, 10:27:11 AM
to comm...@googlegroups.com

I hadn't considered the nesting approach (and aesthetically I prefer it), but I agree with Hannes that there's not much of a difference -- I'd go further and say the old saw "flat is better than nested" is more than cosmetic: you can easily loop through a flat request environ -- things get hairier if we start nesting objects.

Kevin Dangoor

Sep 3, 2009, 10:35:25 AM
to comm...@googlegroups.com
On Thu, Sep 3, 2009 at 10:20 AM, Dean Landolt <de...@deanlandolt.com> wrote:

While I like this a little better, my fear is that once we start going down this road we'll either end up bike shedding or end up with jack.request. In cases where having a nicely formatted request is helpful middleware could require jack (or some other library w/ a request implementation) to get this (plus some other niceties). If you tack it onto the env other middleware requiring a request obj could also take advantage of it.

Actually, for a little bit of possibly useful history: in Python, I would guess that a majority of the web frameworks other than Django are using the request object from WebOb (which is really just request/response objects on top of WSGI).

On the one hand, I agree with the notion of using clearer names, rather than sticking with historical names for the sake of historical names.

On the other, if JSGI uses the common names and existing definitions of those names, it seems likely that web server adapters will come along more quickly, built atop the code written for Rack and WSGI.

Kevin

--
Kevin Dangoor

work: http://labs.mozilla.com/
email: k...@blazingthings.com
blog: http://www.BlueSkyOnMars.com

Wes Garland

Sep 3, 2009, 11:24:16 AM
to comm...@googlegroups.com
> What stops friendly middleware from just adding anything they want to the env?
> If you have, for instance, authentication middleware and authorization middleware that needs info from the authentication,
> is there anything stopping the authentication middleware from just adding whatever fields (or objects) it wants to the env?

Nothing at all. This is in fact very useful in many cases where middleware is a "helper" of some description, like, say, mod_gzip.

If you're talking about the "notes" idea, having an object full of notes is saner than just spewing them at the request object; it's your basic namespace problem.

Wes

On Thu, Sep 3, 2009 at 10:20 AM, Dean Landolt <de...@deanlandolt.com> wrote:

Dean Landolt

Sep 3, 2009, 11:41:24 AM
to comm...@googlegroups.com
On Thu, Sep 3, 2009 at 11:24 AM, Wes Garland <w...@page.ca> wrote:
> What stops friendly middleware from just adding anything they want to the env?
> If you have, for instance, authentication middleware and authorization middleware that needs info from the authentication,
> is there anything stopping the authentication middleware from just adding whatever fields (or objects) it wants to the env?

Nothing at all. This is in fact very useful in many cases where middleware is a "helper" of some description, like, say, mod_gzip.

If you're talking about the "notes" idea, having an object full of notes is saner than just spewing them at the request object; it's your basic namespace problem.


If I understand correctly you're proposing an env.notes object (by whatever name) to capture vars that may have relevance up and down the middleware chain? I agree we don't want to vomit randomly-named variables all over the env, but I think this can best be accomplished by convention. Libraries that need to tuck variables into the env should use a common key prefix (or key, if you prefer nesting in the env). WSGI's a good example -- library authors prefix their packages with "<library>.*" (or deeper for things like the repoze project: "repoze.who.*").

It doesn't make much of a difference in JSGI how keys are accessed: such as env.repoze_what_credentials, env.repoze.what.credentials or as it is now: env['repoze.what.credentials'] -- in all cases different libraries can, by convention, carve out a reasonable namespace in the env. As far as I can tell, adding a notes key just pushes this possible key contention down a level so it becomes e.g. env.notes.repoze_what_credentials. As long as people prefix by some convention (preferably that which is laid out by the JSGI spec with the jsgi.* vars) there's no problem.
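
The convention can be sketched as follows. The `myauth` namespace, the middleware names and the response object shape (`status`, `headers`, `body`) are hypothetical and chosen only for illustration; the nested key is used here, but a prefixed flat key would work the same way.

```javascript
// Hypothetical authentication middleware: tucks its result under its own key.
function authenticate(app) {
    return function (env) {
        env.myauth = { user: "alice" }; // the namespace is owned by this library
        return app(env);
    };
}

// Hypothetical authorization middleware: reads what authentication left behind.
function authorize(app) {
    return function (env) {
        if (!env.myauth || !env.myauth.user) {
            return { status: 403, headers: {}, body: ["Forbidden"] };
        }
        return app(env);
    };
}

function hello(env) {
    return { status: 200, headers: {}, body: ["hello " + env.myauth.user] };
}

var allowed = authenticate(authorize(hello))({});
var denied = authorize(hello)({});
```

Nothing in the env is reserved for this; the two libraries simply agree on a key that other libraries are unlikely to pick.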

Isaac Z. Schlueter

Sep 3, 2009, 1:47:13 PM
to CommonJS
-1 to env.notes. If friends want to pass notes, they can do it on the
env object. Convention will protect us here.

-1 to temporary shims. Jack has the change on a branch right now, and
I think the correct approach is to give Jack users a period of time to
update their code, and then merge the change into the master branch.
Shims hide errors, add overhead, and set a bad precedent. Someday,
Jack 3.2 may need to make some bloated accommodation so as to not
break apps built on Jack 3.0, but at this stage, it's just not a good
idea. JSGI is young enough that I think everyone knows they're
building against a moving target.

+0.2 to env.jsgi.etc. Namespacing in JavaScript can be done more
easily with nested objects. While it makes it just a tiny bit less
trivial to iterate over env, it makes it easier to iterate over
env.jsgi (no key.substr(0,5)=="jsgi_" guard for when you just want
those). Iterating over the env is an edge case anyhow, and I don't
think it's a big deal.
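
A small sketch of the two iteration styles being compared. The env contents, including the `version` key, are made up for the example.

```javascript
// Flat env: picking out the jsgi keys needs a prefix guard.
var flatEnv = { REQUEST_METHOD: "GET", jsgi_input: "<stream>", jsgi_version: [0, 2] };
var flatJsgiKeys = [];
for (var key in flatEnv) {
    if (key.substr(0, 5) == "jsgi_") {
        flatJsgiKeys.push(key);
    }
}

// Nested env: the jsgi keys already live in their own object.
var nestedEnv = { REQUEST_METHOD: "GET", jsgi: { input: "<stream>", version: [0, 2] } };
var nestedJsgiKeys = Object.keys(nestedEnv.jsgi);
```

The price of nesting is on the other side: a loop over the whole env now has to recurse into `env.jsgi` to see every value.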


On Sep 3, 2:57 am, "Mike Wilson" <mike...@hotmail.com> wrote:
> Following on to the discussion about the "B" (Java/JavaScript
> descriptive-style) alternative for API naming, I could envision something
> less coupled to CGI and more inspired by the way Java's servlet API does
> things, ie promoting the most important variables as individual properties.

-1 to this. It's a good idea, but doesn't belong in the JSGI spec.
Turning env.REQUEST_METHOD into req.method is a job for helper classes
and/or middleware, and you can do this now. Otherwise it feels a
little too magical for my liking.

function (env) {
    var req = MyRequestHelper(env);
    // now use req.method, req.queryString, req.pathInfo, etc.
}
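
One possible shape for such a helper, as a sketch only: `MyRequestHelper` is the placeholder name used above, and the property names and the lower-casing of header names are assumptions, not part of any spec.

```javascript
// Hypothetical helper: wraps a CGI-style env in a friendlier request object.
function MyRequestHelper(env) {
    var headers = {};
    Object.keys(env).forEach(function (key) {
        // HTTP_ACCEPT_LANGUAGE -> accept-language
        if (key.indexOf("HTTP_") === 0) {
            headers[key.slice(5).toLowerCase().replace(/_/g, "-")] = env[key];
        }
    });
    return {
        method: env.REQUEST_METHOD,
        pathInfo: env.PATH_INFO,
        queryString: env.QUERY_STRING,
        headers: headers
    };
}

var req = MyRequestHelper({
    REQUEST_METHOD: "GET",
    PATH_INFO: "/a",
    QUERY_STRING: "x=1",
    HTTP_ACCEPT: "text/html"
});
```

Because the helper only reads from the env, it can live in a library or middleware without any change to the spec.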


Wow, -2.8. Feeling contrary today, I guess.

--i

Wes Garland

Sep 3, 2009, 2:03:12 PM
to comm...@googlegroups.com
Hi, Dean!

You understand my proposal almost correctly. What you're missing is that I was trying to mis-apply a pattern because I forgot that JSGI passes around env between pieces of middleware and not request.  Passing around request is what Apache does, and I believe JSGI's env passing is the right solution.

'env.notes' is of course silly. I think the only thing that needs to be "done" is a gentle style note somewhere in the great ether reminding folks to namespace their notes on the env object.

> As long as people prefix by some convention (preferably that which is laid out by the JSGI spec with the jsgi.* vars) there's no problem.

We're in complete agreement here.  Thanks for the clarified thinking.

Wes

Dean Landolt

Sep 3, 2009, 2:15:45 PM
to comm...@googlegroups.com

+0.2 to env.jsgi.etc.  Namespacing in Javascript can be done more
easily with nested objects.  While it makes it just a tiny bit less
trivial to iterate over env, it makes it easier to iterate over
env.jsgi (no key.substr(0,5)=="jsgi_" guard for when you just want
those.)  Iterating over the env is an edge case anyhow, and I don't
think it's a big deal.

+0.2? Ha -- you must really kinda sorta like it. I buy your argument that looping over the whole env is an edge case (and easily accomplished with a helper). Plus being able to test for the fingerprints of an upstream middleware (has jack.auth touched the env yet?) would be a lot nicer this way.

+0.8 to nested env keys for me.

Mike Wilson

Sep 3, 2009, 5:56:33 PM
to comm...@googlegroups.com
[several responses inline]
 
 
Wes Garland wrote:

> I don't know Java's servlet API at all, but your suggestion also strongly mirrors the "environment" available when writing modules for Apache. (Apache modules can interleave in a way similar to JSGI, BTW).

Yes, I think the difference is that Apache (and Java Servlets) regard their request object as just that, a structured request object partitioned into different parts that is made for actually being consumed by other code.
 
The CGI API is made for creating a new process for every request, where responses are generated by a separate executable, and thus has to resort to transferring request information through operating system environment variables.
Thus the flat list of environment variables that is made for being pushed through the narrow channel of a process environment, and not for being consumed by code.
 
Excuse the rant, but I think the CGI API *stinks* from top to bottom, all the way from the catastrophic scalability down to the mangling of names ("Content-Type" -> "HTTP_CONTENT_TYPE") and the mixing of vastly different information inside one flat list of strings. I'm always skeptical when I notice that a platform uses the CGI scheme for web access.
FastCGI improves the scalability bit, but I still think an official JavaScript API should design something sane and nice which is not customized for a broken technology from the mid '90s. It should be possible to design a modern API that itself can easily be implemented on top of [Fast]CGI, if desired. The key is that this API should be translatable to CGI but not identical to it. I'd rather optimize the API for the native module interfaces of Apache and others (which are not at all like CGI).

> Although, we don't want /literal/ HTTP headers: we want case-corrected HTTP headers. Subtle detail, important though.

Could you explain what you mean by case-corrected headers? (I was referring to the raw header names as used on the wire, in contrast to the CGI-mangled names.)

> Speaking of Apache modules, one lesson from those guys -- it's important to have something like a "notes field" to jot down ad-hoc notes for sharing between pieces of "friend"ly middleware.  Apache uses an apr_table_t for this, which is extremely similar to a plain JavaScript object.  Apache puts it in the request object, which is passed from module to module during the processing chain.

In Java speak this would be filters sending along request attributes down the chain.
 
 
Dean Landolt wrote:

On Thu, Sep 3, 2009 at 9:33 AM, Wes Garland <w...@page.ca> wrote:
 
  function(req) {
    req.method
    req.pathInfo
    req.queryString
    req.contentType
    req.contentLength
    req.headers{} // actual literal HTTP headers
      req.headers["Content-Type"]
      req.headers["Content-Length"]
      req.headers["Other-Header-Not-Exposed-As-Property"]
      ...
    req.inputStream
  }

^^^^ how do I vote for this? :)
While I like this a little better, my fear is that once we start going down this road we'll either end up bike shedding or end up with jack.request. In cases where having a nicely formatted request is helpful middleware could require jack (or some other library w/ a request implementation) to get this (plus some other niceties). If you tack it onto the env other middleware requiring a request obj could also take advantage of it.
 
Yes, you are most correct that the Java Servlets API is actually both the web server interface and "request middleware"; there's no separation there and they chose to standardize on that level. For example, handling chunked responses and reading x-www-form-urlencoded POSTs is part of the standard.
 
Still, the prototypical Java-inspired API above is actually much more similar to, e.g., Apache's native request format than CGI is (see below).
 
 
Isaac Z. Schlueter wrote:
>
> On Sep 3, 2:57 am, "Mike Wilson" <mike...@hotmail.com> wrote:
> > Following on to the discussion about the "B" (Java/JavaScript
> > descriptive-style) alternative for API naming, I could
> > envision something
> > less coupled to CGI and more inspired by the way Java's
> servlet API does
> > things, ie promoting the most important variables as
> individual properties.
>
> -1 to this. It's a good idea, but doesn't belong in the JSGI spec.
> Turning env.REQUEST_METHOD into req.method is a job for helper classes
> and/or middleware, and you can do this now. Otherwise it feels a
> little too magical for my liking.
 
On the contrary I think env.REQUEST_METHOD is the magical bit ;-).
 
But it probably depends on your background; if you're used to CGI implementations then I guess REQUEST_METHOD and HTTP_-prefixed variables all make sense. But if you're used to (like me) looking at the HTTP specs and how web servers' native APIs are designed (no "REQUEST_METHOD" in any of them) then the CGI stuff just adds an extra level that detours from the real purpose.
I'd say that a natural low-level model is something like this:
 
request:
    // request line
    method
    uri
    httpVersion
    // headers
    headers{} // raw, no CGI mangling
    // something stream-like for reading body
    input
 
response:
    // different principles, discussed below
 
connection:
    // http info
    keepalive status, etc...
    // tcp info
    ip addresses, ports, etc...
    // link info
    mac addresses, etc...
 
server:
    url mapping info, base path, etc...
 
This maps very well to, e.g., the API you use when implementing an Apache Content Generator (native module), with most of the above info tucked inside a structured data object called "request".
 
Note that the API above is also quite similar to the Java API, sans "promoted" properties. It's because it is very close to the HTTP model.
I'll point out again how good it is to avoid name mangling; let's say I'm going to implement some HTTP cache handling and read up on the HTTP spec how to do some If-Modified-Since detection.
With the CGI API I would have to remember "oh darn, I can't use the header names from the HTTP spec, as I'm on JSGI which inherits from CGI", and have to convert the real header name to "HTTP_IF_MODIFIED_SINCE". Plus there will be code in the CGI API that does the reverse conversion. Two wrongs make a right, right? :-)
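
The two conversions look roughly like this. Both function names are hypothetical, and the reverse mapping is a naive capitalize-each-word guess, which is exactly why it cannot recover every original header name.

```javascript
// Hypothetical: HTTP header name -> CGI-style key.
function toCgiKey(headerName) {
    return "HTTP_" + headerName.toUpperCase().replace(/-/g, "_");
}

// Hypothetical: CGI-style key -> best-guess HTTP header name.
function fromCgiKey(cgiKey) {
    return cgiKey.slice(5).toLowerCase().split("_").map(function (word) {
        return word.charAt(0).toUpperCase() + word.slice(1);
    }).join("-");
}

var cgiKey = toCgiKey("If-Modified-Since");
var roundTrip = fromCgiKey(cgiKey);

// The mangling is lossy: the original capitalization cannot be recovered.
var lossy = fromCgiKey(toCgiKey("ETag"));
```

Here `cgiKey` is "HTTP_IF_MODIFIED_SINCE" and the round trip happens to restore "If-Modified-Since", but "ETag" comes back as "Etag".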
 
 
Instead, I would like to propose the following:
 
1) Specify a nice and modern low-level web server interface that maps HTTP, not CGI. This is what web servers do in their native models and it would look something like the listing above. To be clear, here I'm talking about strictly projecting HTTP and not mixing in any middleware features.
 
2) If there are people that want to use the CGI model in their applications, specify an optional shim looking like CGI (funky string conversions) that calls through (1).
 
3) For web server integrations that want to connect through that server's CGI model, or through FastCGI, create an implementation of (1) that calls to CGI.
 
 
Lastly, I'd just like to check a few things with you that are more knowledgeable in current JSGI:
 
Filters and attributes:
Regarding the discussion about sending along additional data between middleware layers, I wonder if JSGI will ever be used to implement filters in the web server's sense?
I guess not, and that would mean that passing extra data does not concern the web server API at all, but should be deferred to how middleware layers communicate with, or delegate to, each other.
For example, will there be standard filter chain handling in CommonJS, or is that deferred to each "request middleware" to define?
 
HTTP responses:
There are several ways to design how you write data to the HTTP response. The way I interpret current JSGI is as a passive return, where the framework later "pulls" the data out of the supplied data structure.
An Apache Content Generator on the other hand uses a "push" model, ie the data is explicitly written to the request such as in:
    ap_rputs("<html>...", req); // write string
    ap_send_fd(file, req); // write contents of file
Java Servlets also use "push" by writing to an output stream, but separate request and response into different objects, so you have, e.g., distinct header collections.
Has there been any deep thinking about the "pull" vs "push" model?
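
To make the two models concrete, here is a minimal sketch of each. Everything in it is hypothetical: the response object shape, the `setHeader`/`write` methods and the tiny stand-in "servers" exist only for this example and do not describe JSGI, Apache or the Servlet API.

```javascript
// "Pull": the app returns a data structure; the server reads it afterwards.
function pullApp(env) {
    return { status: 200, headers: { "Content-Type": "text/plain" }, body: ["hello"] };
}

// "Push": the app writes into a response object supplied by the server.
function pushApp(env, res) {
    res.setHeader("Content-Type", "text/plain");
    res.write("hello");
}

// Stand-in server for the pull style: call the app, then drain the body.
function servePull(app) {
    var response = app({});
    return response.body.join("");
}

// Stand-in server for the push style: hand the app something to write to.
function servePush(app) {
    var out = [], headers = {};
    app({}, {
        setHeader: function (name, value) { headers[name] = value; },
        write: function (chunk) { out.push(chunk); }
    });
    return out.join("");
}

var pulled = servePull(pullApp);
var pushed = servePush(pushApp);
```

Both produce the same bytes; the difference is who drives the writing, which matters once the body is large or arrives over time.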
 
Comet:
Web servers of all kinds need to adapt to a model where a massive number of connections are kept open simultaneously for asynchronous delivery of data ("Comet"). There have been a couple of different server solutions for a few years in the Java space, and now things seem to be standardizing on "asynchronous servlets". Last time I looked, Apache had its own problems here, as the default threaded MPM consumes a thread per open connection, causing scalability problems, and the Event MPM needed to solve that problem didn't have the full feature set of the standard MPM.
I haven't read up on these subjects lately, but have any of you, and have you verified that the JSGI request/response design will work nicely with the upcoming asynchronous offerings?
 
 
Best regards
Mike Wilson

Daniel Friesen

Sep 3, 2009, 7:27:14 PM
to comm...@googlegroups.com
Mike Wilson wrote:
> ...
> /HTTP,/ not CGI. This is what web servers do in their native models
> and it would look something like the listing above. To be clear, here
> I'm talking about strictly projecting HTTP and not mixing in any
> middleware features.
>
> 2) If there are people that want to use the CGI model in their
> applications, specify an optional shim looking like CGI (funky string
> conversions) that calls through (1).
>
> 3) For web server integrations that want to connect through that
> server's CGI model, or through FastCGI, create an implementation of
> (1) that calls to CGI.
It's not my primary purpose for responding, so I'll keep it short. But I
too would prefer something tied properly to HTTP rather than the CGI
model. Why must strings be mangled when they don't need to be? It's not
like we program using the CGI model; Request objects are just going to
add yet another step where we're unnecessarily converting between
HTTP and CGI formats.

Under the current model (! indicates that one of the sides forces that
model):
Using CGI to communicate (i.e. a CGI/FastCGI application):
Client <!http> Webserver <cgi!> CGI Handler <!cgi!> JSGI <!cgi> Request object <http!> App
Using a server handler, i.e. an Apache module, where you'll note the HTTP
model is used:
Client <!http> Webserver <!http> mod_* Handler <cgi!> JSGI <!cgi> Request object <http!> App
Using an HTTP server internal to the app:
Client <!http> Appserver+Handler <cgi!> JSGI <!cgi> Request object <http!> App

Using an HTTP-based model instead:
Using CGI to communicate:
Client <!http> Webserver <cgi!> CGI Handler <!cgi> JSGI <!http> Request object <http!> App
Using a server handler, i.e. an Apache module, where you'll note the HTTP
model is used:
Client <!http> Webserver <!http> mod_* Handler <http!> JSGI <!http> Request object <http!> App
Using an HTTP server internal to the app:
Client <!http> Appserver+Handler <http!> JSGI <!http> Request object <http!> App

The only case where even touching CGI is necessary is when you are
actually using CGI. When using the CGI model in JSGI and nice Request
objects in apps we end up converting back and forth between http and cgi
in cases where we never actually need to make the conversion in the
first place.

>
> Lastly, I'd just like to check a few things with you that are more
> knowledgeable in current JSGI:
>
> Filters and attributes:
> Regarding the discussion about sending along additional data between
> middleware layers, I wonder if JSGI will ever be used to implement
> filters in the /web server's/ sense?
> I guess not, and that would mean that passing extra data does not
> concern the web server API at all, but should be deferred to how
> middleware layers communicate with, or delegate to, each other.
> F ex, will there be a standard filter chain handling in CommonJS, or
> is that deferred to each "request middleware" to define?
>
> HTTP responses:
> There are several ways of how to design how you write data to the HTTP
> response. The way I interpret current JSGI is a passive return where
> the framework later "pulls" the data out from the supplied data structure.
> An Apache Content Generator on the other hand uses a "push" model, ie
> the data is explicitly written to the request such as in:
> ap_rputs("<html>...", req); // write string
> ap_send_fd(file, req); // write contents of file
> Java Servlets also uses "push" by writing to an output stream, but
> separate request and response into different objects, so you f ex have
> distinct header collections.
> Has there been any deep thinking about the "pull" vs "push" model?
>
> Comet:
> Web servers of all kinds need to adapt to a model where massive number
> of connections are kept open simultaneously for asynchronous delivery
> of data ("Comet"). There's been a couple of different server solutions
> for a few years in the Java space, and now things seem to standardize
> on "asynchronous servlets". Last time I looked, Apache had its own
> problems here as the default threaded MPM consumes a thread per /open
> connection/, causing scalability problems, and the Event MPM needed to
> solve that problem hadn't the full feature set of the standard MPM.
> I haven't read up on these subjects lately, but have any of you, and
> verified that the JSGI request/response design will work nicely with
> the upcoming asynchronous offerings?
>
>
> Best regards
> Mike Wilson

I believe this is relevant to one of my comments elsewhere:
http://groups.google.com/group/commonjs/msg/18b3369466d1929b
http://groups.google.com/group/commonjs/msg/a473b49eaf52124a

Trying to figure out how to explain it, I ended up talking about
blocking. Push vs. pull was probably what I was trying to talk about.
I don't see how the pull model can work with async code.
From what I can understand, you can simulate a pull model using a push
model, but you can't go the other way, at least not with forEach. Using
forEach appears to completely lock us into blocking HTTP requests, where
one request must finish before another can be handled.
To handle multiple requests, the JSGI handler itself would have to start
up a separate thread for each and every request in order to be able to
stream data out without waiting. However, using a nonblocking stream-based
model rather than forEach, I believe it's possible to sequentially
handle requests that are quick to output headers but stream data slowly:
call the app for a request, put the resulting stream in a queue, then
asynchronously go through that queue and the thing responsible for
accepting requests, asking whether there are any new requests or chunks
of data, handling each one, and moving on to the next request or chunk.
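
A minimal sketch of the queue idea described above, under heavy assumptions: `makeResponse` and `drain` are hypothetical names, chunks are pre-computed strings, and a real server would ask the socket whether it is writable and poll for new connections instead of looping over an in-memory array. The point is only that one thread can interleave chunks from several responses.

```javascript
// Hypothetical non-blocking response: next() returns the next chunk,
// or null once the body is finished.
function makeResponse(id, chunks) {
    var i = 0;
    return {
        id: id,
        next: function () { return i < chunks.length ? chunks[i++] : null; }
    };
}

// Single-threaded loop: write one chunk per response per pass, so a
// long response never holds up the others.
function drain(queue, write) {
    while (queue.length > 0) {
        var res = queue.shift();
        var chunk = res.next();
        if (chunk !== null) {
            write(res.id, chunk);
            queue.push(res); // not finished yet: back of the line
        }
    }
}

var log = [];
drain(
    [makeResponse("a", ["a1", "a2", "a3"]), makeResponse("b", ["b1"])],
    function (id, chunk) { log.push(chunk); }
);
// log is now ["a1", "b1", "a2", "a3"]: response "b" completes before
// response "a" has finished streaming.
```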

In the end, inside MonkeyScript I'm probably going to support a
push-based model, and support the forEach pull-based model around that
(i.e. you can stick with the JSGI forEach model, but are also free to use
a push-based Stream if you feel it necessary), as I don't want to stop
people from writing nice async apps without an excess of threads using
newer technologies (and I hope that when I write a WebSocketServer I can
use a design as close as possible to the one used for web requests).
Whether this is a non-standard feature I'm adding or the standard is up
to the group.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Tom Robinson

Sep 3, 2009, 8:22:24 PM9/3/09
to comm...@googlegroups.com
<snip>

JSGI uses the same *well-established* names for the HTTP request
properties as CGI, FastCGI, SCGI, PHP, WSGI, Rack, probably others.

The alternative is to invent yet another set of names that developers
will need to learn. IMO the place for that is in your frameworks, not
in the low level web server interface.

Also, the purpose of standardizing the request header names to
uppercase (aside from matching CGI) is to make lookups efficient,
otherwise you'd need to do a case-insensitive string comparison on
*every* header name.


-tom

Dean Landolt

Sep 3, 2009, 8:37:42 PM9/3/09
to comm...@googlegroups.com

I think Mike's gripe was less about case and more about the hyphen-to-underscore translation (and perhaps the "HTTP_" prefix). But as you say, *well-established* should hold some sway. The only well-established alternatives to CGI (like Apache's or Java servlet objects) would likely offer an impedance mismatch to js, so we'd be back to rolling our own -- certainly possible, but it would be a slow and painful process.
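
For reference, the translation under discussion (upper-casing, hyphen to underscore, and the "HTTP_" prefix) can be sketched as follows. `toCgiName` is a hypothetical helper name for this illustration only; the special-casing of Content-Type and Content-Length follows the CGI convention that those two variables carry no prefix.

```javascript
// Convert an HTTP header name, as seen on the wire, to its CGI-style
// environment key.
function toCgiName(headerName) {
    var name = headerName.toUpperCase().replace(/-/g, "_");
    // CGI exposes these two without the HTTP_ prefix.
    if (name === "CONTENT_TYPE" || name === "CONTENT_LENGTH") {
        return name;
    }
    return "HTTP_" + name;
}

// toCgiName("If-Modified-Since") gives "HTTP_IF_MODIFIED_SINCE"
// toCgiName("content-type")      gives "CONTENT_TYPE"
```

Note that the mapping is lossy: "X-Foo" and "X_Foo" both become "HTTP_X_FOO", so the original wire name cannot always be recovered from the CGI key.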

Isaac Z. Schlueter

Sep 3, 2009, 8:52:38 PM9/3/09
to CommonJS

On Sep 3, 5:22 pm, Tom Robinson <tlrobin...@gmail.com> wrote:
> JSGI uses the same *well-established* names for the HTTP request  
> properties as CGI, FastCGI, SCGI, PHP, WSGI, Rack, probably others.
>
> The alternative is to invent yet another set of names that developers  
> will need to learn. IMO the place for that is in your frameworks, not  
> in the low level web server interface.

Yes. You are correct. UPPER_CASE hurts my eyes, too, but it's the
standard, whether it makes sense or not. Changing it is not a good
idea.

I think JSGI has been stretched enough recently. Let's let it breathe
for a little while.

--i

Mike Wilson

Sep 4, 2009, 7:09:20 AM9/4/09
to comm...@googlegroups.com
Daniel Friesen wrote:
> The only case where even touching CGI is necessary is
> when you are actually using CGI. When using the CGI
> model in JSGI and nice Request objects in apps we end
> up converting back and forth between http and cgi in
> cases where we never actually need to make the
> conversion in the first place.

Yes, this is one of the points I am making. Taking the
detour over CGI is unnecessary.

Yes, I already read those and it's from there I started
suspecting that the current JSGI design maybe needs
improving for these cases. So consider my post a +1 for
your comments.

> I don't see how the pull model can work with async code.
> From what I can understand, you can simulate a pull model
> using a push model, but you can't go the other way. At
> least forEach wise. Using forEach appears to completely
> lock us into blocking http requests where one http request
> must finish before another http request can be handled.

Yes, the current model seems to lock the web server into
reserving the current thread for this request until it
closes the response. This is the model that everyone is
trying to move away from due to the scalability problems.

> To handle multiple requests the jsgi handler itself would
> have to start up a separate thread for each and every
> request

Yes, and I think the whole threading/multi-tasking thing
needs to be settled before the HTTP API can be reliably
designed. How should a JSGI provider handle concurrent
requests, and how could information pass between requests
and potential "worker threads"?

IMO, there is too much of CGI-thinking in script programming
communities - ie, "fire up a new request process and run like
hell with no delegation or coordination with the state of
the rest of the system, and hope for the best".

> In the end, inside MonkeyScript I'm probably going to
> support a push based model, and support the forEach pull
> based model around that. (ie: You can stick with the JSGI
> forEach model, but are also free to use a push based
> Stream if you feel necessary) as I don't want to stop
> people from writing nice async apps without an excess of
> threads using newer technologies (And I hope that when I
> write a WebSocketServer I can use as close a design as
> used for web requests)... Whether this is a non-standard
> feature I'm adding, or is the standard is up to the group.

Whichever way this comes out, I think the current JSGI API
will fail in several areas. I think sooner or later there
will be enough momentum to specify a better API, but how
soon that happens depends on which apps want to be
based on CommonJS, I think.

Best regards
Mike

Mike Wilson

Sep 4, 2009, 9:29:35 AM9/4/09
to comm...@googlegroups.com

Tom Robinson wrote:
> JSGI uses the same *well-established* names for the
> HTTP request properties as CGI, FastCGI, SCGI, PHP,
> WSGI, Rack, probably others.

Well-established within CGI that is, as all your examples are CGI-based. If you are designing a CGI interface, that's fine, but as I understand it you aren't. And these CGI names are not request properties, they are environment variable names, although JS allows you to access them as properties.

I would say that there are *well-established* names for the same things if you look outside the CGI world such as in HTTP specs, Apache module APIs, Java Servlets and .NET's System.Web library. And these are not designed for use as environment variables, but for structured object-based data, a perfect fit for the "B" alternative decided on for File API.

> The alternative is to invent yet another set of
> names that developers will need to learn.

There is a lot of prior art (mentioned above) that actually matches the task at hand (HTTP communication) better than CGI does.

> Also, the purpose of standardizing the request
> header names to uppercase (aside from matching
> CGI) is to make lookups efficient, otherwise
> you'd need to do a case-insensitive string
> comparison on *every* header name.

The cost of user code potentially doing a couple of case-insensitive lookups is lower than converting *every* string to uppercase, prefixing with "HTTP_", and adding to a new collection. But really, I think you agree that neither is even close to being a problem performance-wise.


Dean Landolt wrote:
> The only well-established alternatives to CGI (like Apache's or Java servlet objects) would likely offer an impedance mismatch to js
How do you come to that conclusion? Apache, Java, and .NET all model their HTTP requests using object/struct based data structures, just like JS. It's CGI that has the impedance mismatch.
 
 
Isaac Z. Schlueter wrote:
> I think JSGI has been stretched enough recently.  Let's let it breathe
> for a little while.
Speaking of which, I am a bit surprised how easily everyone accepted a non-CGI construct for the response:
 
    { status:..., headers:..., body:... }
 
but at the same time considers a symmetric construct for the request "magical", "non-standard" and whatnot:
 
    { method:..., url:..., headers:..., body:... }
 
 
If CGI is preferred I think that CommonJS should state up-front that "this is the CommonJS CGI interface", and maybe name it accordingly, to make room for a real low-level web interface in the future.
 
But what also strikes me is that JSGI, as specced, doesn't adhere to CGI for the response cycle, but instead specifies a lot of magic. Compare CGI:
 
    function handler() {
        var path = getenv("PATH_INFO");
        var clientver = getenv("HTTP_IF_MODIFIED_SINCE");
        printf("Content-type: text/html\r\n" +
            "\r\n" +
            "<html>...</html>");
    }
 
with JSGI:
 
    function handler(env) {
        var path = env["PATH_INFO"];
        var clientver = env["HTTP_IF_MODIFIED_SINCE"];
        return {
            status: 200,
            headers: {
                "Content-Type": "text/html"
            },
            body: [
                {
                    toBinaryString: function() {
                        return "<html>...</html>";
                    }
                }
            ]
        };
    }
 
and see how the response handling differs:
- CGI uses the output stream for everything, no magical properties
- CGI uses a push model, not pull
 
Also, JSGI uses a magical response object with a purpose unknown to me. Is it designed with background tasking in mind? Such as being able to return an open file and having the lower layers copy its contents to the output stream? If it is, I think this needs to be spelled out in the spec, as I am sure there will need to be other API surfaces to make things like this possible (ref. the multi-tasking notes in my previous mail). If not, I can only say that this part looks quite magical and more like something you would find in higher-level frameworks.
 
So, to sum up, after reviewing JSGI I can see ambitions for providing low-level web server connectivity (mainly spoken ambition), a CGI API (not equal to low-level web server APIs as CGI builds upon those), and some kind of high-level response framework (the return object), while failing at all three.
To address this I think the following should be done:
- rebrand/rename the API so it is clear its purpose is to provide a CGI model
- move the high-level response body stuff to frameworks
- switch to pushing responses, like CGI
- make response status and headers handling adhere more to CGI
 
Best regards
Mike

Hannes Wallnoefer

Sep 4, 2009, 10:53:30 AM9/4/09
to comm...@googlegroups.com
2009/9/4 Mike Wilson <mik...@hotmail.com>:
> Tom Robinson wrote:
>> JSGI uses the same *well-established* names for the
>> HTTP request properties as CGI, FastCGI, SCGI, PHP,
>> WSGI, Rack, probably others.

Depends on where you come from. I think the last time I worked within
a CGI named environment was in 1996 (and it was in fact Perl + CGI).

I think it's quite easy to explain. Most scripting languages such as
Perl, Python, PHP, Ruby etc started their life on the web behind a CGI
interface, so later interfaces tended to emulate the established one.
In contrast, platforms that started from scratch (Java, Apache)
naturally tended to follow their own platform naming conventions.

The question for us is: Should we carry on a convention for the sake
of continuity (CONTENT_LENGTH), or should we design something that
adheres to the coding conventions of our language (contentLength) like
Java and Apache have done. For me, it's clearly the latter, but I can
live with the old style names if it really must be.

> So, to sum up, after reviewing JSGI I can see ambitions for
> providing low-level web server connectivity (mainly spoken ambition), a CGI
> API (not equal to low-level web server APIs as CGI builds upon those), and
> some kind of high-level response framework (the return object), while
> failing at all three.
> To address this I think the following should be done:
> - rebrand/rename the API so it is clear its purpose is to provide a CGI
> model
> - move the high-level response body stuff to frameworks
> - switch to pushing responses, like CGI
> - make response status and headers handling adhere more to CGI

I think you're throwing out the baby with the bathwater. After all,
it's mostly about naming.

It's true that the response mechanism is quite complex (at least at
first view). But it's also true that it quite smartly exploits a
method that is readily available (Array.forEach) to allow both push
and pull responses.
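
As a rough illustration of that point, a body does not have to be an array: any object with a forEach method will do, and inside that method the app is effectively pushing chunks at the server's callback. This is a sketch only; `countdownApp` and `serve` are made-up names, and chunks are plain strings here rather than objects providing toByteString as the spec requires.

```javascript
function countdownApp(env) {
    return {
        status: 200,
        headers: { "Content-Type": "text/plain" },
        body: {
            // The server calls forEach(write); from then on the app
            // "pushes" chunks by calling write whenever it has data.
            forEach: function (write) {
                for (var n = 3; n > 0; n--) {
                    write(n + "\n");
                }
                write("liftoff\n");
            }
        }
    };
}

// Toy server: pulls the response object, then hands the body a
// write callback and collects whatever is pushed into it.
function serve(app, env) {
    var out = [];
    var response = app(env);
    response.body.forEach(function (chunk) { out.push(chunk); });
    return out.join("");
}
```

The catch is that forEach does not return until the body is done, so the calling thread stays occupied for the whole response.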

Of course, the mechanism also has its pitfalls and inefficiencies. For
example, directly returning a ByteString as body will loop over all
bytes and write them out one by one.

Even if you return an array, if you put a ByteArray into it, it will
needlessly be converted to a ByteString before being sent to the
client, and a ByteString will needlessly create a copy of itself
(because that's what the spec says ByteString.toByteString() does).

(Side note: Do we really need the ByteString/ByteArray duality? At
least we should introduce a toBinary() that can either return a
ByteArray or ByteString without requiring a copy...)

Still, I think JSGI is a good starting point as long as we focus on
the actual issues and don't get lost in bikeshedding.

Hannes

> Best regards
> Mike
> >
>

Hannes Wallnoefer

Sep 4, 2009, 11:11:13 AM9/4/09
to comm...@googlegroups.com
2009/9/4 Hannes Wallnoefer <han...@gmail.com>:

>
> Of course, the mechanism also has its pitfalls and inefficiencies. For
> example, directly returning a ByteString as body will loop over all
> bytes and write them out one by one.

Oops, ByteString doesn't even have a forEach. Still, the problem
described below regarding needless copying still applies.

Wes Garland

Sep 4, 2009, 1:01:42 PM9/4/09
to comm...@googlegroups.com
The key is that this API should be translatable to CGI but not identical to it. I'd rather optimize the API for Apache's and others native modules (that are not at all like CGI).

You know, an HTTP-oriented JSGI could have a piece of middleware that "does" CGI -- this would help transition certain types of applications.
Although, we don't want /literal/ HTTP headers: we want case-corrected HTTP headers. Subtle detail, important though. 
Could you explain what you mean with case-corrected headers? (I was referring to the raw header names as used on the wire, in contrast to the CGI-mangled names.)

Yes. It's technically allowable for a client to send "content-type" rather than "Content-Type".   Using the literal headers, as on the wire, will make it difficult for middle users to "find" headers reliably. We should case-correct so that "content-type" can only ever be spelled one way. I don't care if it's "Content-Type", "content-type" or "Content-type", so long as it's consistent.
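
One possible shape for such case-correction, as a sketch: `normalizeHeaders` is a hypothetical helper, and lower case is an arbitrary pick among the spellings mentioned above.

```javascript
// Returns a copy of `headers` with every name lower-cased, so each
// header can only ever be spelled one way.
function normalizeHeaders(headers) {
    var result = {};
    for (var name in headers) {
        if (Object.prototype.hasOwnProperty.call(headers, name)) {
            result[name.toLowerCase()] = headers[name];
        }
    }
    return result;
}

// Whatever the client sent, lookups now use a single spelling:
var h = normalizeHeaders({ "Content-TYPE": "text/html", "accept": "*/*" });
// h["content-type"] is "text/html"
```

If a client sends the same header twice with different casing, the later one silently wins in this sketch; a real implementation would need a policy for that.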

In Java speak this would be filters sending along request attributes down the chain.

Apache-2 speak, I think, as well. I've read Nick Kew's lightsaber book, but haven't actually done any Apache-2 development. I have done a LOT of Apache-1.3 development, though. But not in the last ~ 5-7 years.  Will probably be back at it soon enough, though; my rough timetable has GPSEE going into Apache-2 in November or so.
 


Note that the API above is also quite similar to the Java API, sans "promoted" properties. It's because it is very close to the HTTP model.

I believe this observation to be awfully critical: JSGI should really be thinking more about HTTP operations than what-CGI-does.  Doubly so because it has a completely different way of returning results!
 
I'll point out again how good it is to avoid name mangling;

Let me take this opportunity to point something interesting out: I have been developing with CGI on a moderately regular (and sometimes very heavy) basis since early 1997. That's more than 12 years.  EVERY TIME I need to look at something coming from the browser -- Accept-Language, Content-Type, whatever -- I have to hit a debug CGI I keep on my server to find out what the heck it's going to come out as.  I don't pretend to be some kind of savant, but you'd think after 12 years even a total moron would be able to keep the environment variables straight if they were well-named.
 
Instead, I would like to propose the following:
 
1) Specify a nice and modern low-level web server interface that maps HTTP, not CGI. This is what web servers do in their native models and it would look something like the listing above. To be clear, here I'm talking about strictly projecting HTTP and not mixing in any middleware features.

+1
 
2) If there are people that want to use the CGI model in their applications, specify an optional shim looking like CGI (funky string conversions) that calls through (1).
 
3) For web server integrations that want to connect through that server's CGI model, or through FastCGI, create an implementation of (1) that calls to CGI.
 

+1 -- I came to both these conclusions, too.
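
A rough sketch of what the optional shim in (2) might look like, assuming an HTTP-shaped request of the form { method, url, headers, body } as floated earlier in the thread. Everything here is hypothetical: `cgiShim` is an invented name, SCRIPT_NAME and the server variables are left out, and the path is passed through undecoded.

```javascript
// Wraps an app that expects a CGI-style env so it can be called with
// an HTTP-style request object instead.
function cgiShim(cgiApp) {
    return function (request) {
        var env = {};
        var parts = String(request.url).split("?");
        env["REQUEST_METHOD"] = request.method;
        env["PATH_INFO"] = parts[0];
        env["QUERY_STRING"] = parts.slice(1).join("?");
        for (var name in request.headers) {
            if (!Object.prototype.hasOwnProperty.call(request.headers, name)) {
                continue;
            }
            // The "funky string conversions": upper-case, hyphen to
            // underscore, HTTP_ prefix (except the two CGI special cases).
            var key = name.toUpperCase().replace(/-/g, "_");
            if (key !== "CONTENT_TYPE" && key !== "CONTENT_LENGTH") {
                key = "HTTP_" + key;
            }
            env[key] = request.headers[name];
        }
        return cgiApp(env);
    };
}
```

With this layering the conversion cost is paid only by apps that ask for the CGI view.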
 
Last time I looked, Apache had its own problems here as the default threaded MPM consumes a thread per open connection, causing scalability problems, and the Event MPM needed to solve that problem hadn't the full feature set of the standard MPM.
I haven't read up on these subjects lately, but have any of you, and verified that the JSGI request/response design will work nicely with the upcoming asynchronous offerings?

I honestly haven't given anything beyond Apache threaded MPM much thought. The threaded MPM -- for better or worse -- maps well onto SpiderMonkey, which is my engine of choice.  This is because, very approximately, one JavaScript program per thread is about all you can do without jumping through some pretty serious hoops. Of the "on fire" variety.

I'm not even sure what an asynchronous piece of web content delivery middleware would look like, either.  Maybe I suffer from excessive CGI and PHP exposure. They are both a little like heavy metals.  Would certainly enjoy reading some high-level pseudocode.

Wes
 

Wes Garland

Sep 4, 2009, 1:08:43 PM9/4/09
to comm...@googlegroups.com
Tom:

On Thu, Sep 3, 2009 at 8:22 PM, Tom Robinson <tlrob...@gmail.com> wrote:
> JSGI uses the same *well-established* names for the HTTP request
> properties as CGI, FastCGI, SCGI, PHP, WSGI, Rack, probably others.
>
> The alternative is to invent yet another set of names that developers
> will need to learn.

Using the names established within the HTTP protocol
1. Does not require re-learning  (since they are already used for output -- why not input)
2. Does not require inventing
3. Uses names which are at least as well-established
4. Is predictably forward-compatible with changes and extensions to HTTP

> Also, the purpose of standardizing the request header names to
> uppercase (aside from matching CGI) is to make lookups efficient,
> otherwise you'd need to do a case-insensitive string comparison on
> *every* header name.

I'm not sure uppercase is optimal, but certainly, exactly one spelling is required.

Wes

--

Mike Wilson

Sep 4, 2009, 2:29:49 PM9/4/09
to comm...@googlegroups.com
Hannes Wallnoefer wrote:
> 2009/9/4 Mike Wilson <mik...@hotmail.com>:

> > So, to sum up, after reviewing JSGI I can see ambitions for
> > providing low-level web server connectivity (mainly spoken
> ambition), a CGI
> > API (not equal to low-level web server APIs as CGI builds
> upon those), and
> > some kind of high-level response framework (the return
> object), while
> > failing at all three.
> > To address this I think the following should be done:
> > - rebrand/rename the API so it is clear its purpose is to
> provide a CGI
> > model
> > - move the high-level response body stuff to frameworks
> > - switch to pushing responses, like CGI
> > - make response status and headers handling adhere more to CGI
>
> I think you're throwing out the baby with the bathwater. After all,
> it's mostly about naming,

To me it's not mainly about naming, although this is one of the
components. The main things are predictability, symmetry,
expressiveness, extendability and modularization/separation.

> It's true that the response mechanism is quite complex (at
> least at first view). But it's also true that it quite
> smartly exploits a method that is readily available
> (Array.forEach) to allow both push and pull responses.

Oh don't get me wrong, this very design may well suit
perfectly on a higher layer, but it has nothing to do
with web server interfaces, and is quite asymmetric
with the CGI-adhering request input. (though I would
like to see how it fares with async/Comet)

> Still, I think JSGI is a good starting point as long
> as we focus on the actual issues and don't get lost
> in bikeshedding.

If the goal is to get something out fast, and to have
CGI as the model, then that's probably fine. I would
prefer to clearly state the CGI goal for the API in
that case.
For an API that can support a wider range of
solutions there will be need for a bike-shedding phase,
I believe (and I regard that as a good thing in this
case).

Best regards
Mike

Tom Robinson

Sep 4, 2009, 2:46:51 PM9/4/09
to comm...@googlegroups.com

On Sep 4, 2009, at 7:53 AM, Hannes Wallnoefer wrote:

>
> 2009/9/4 Mike Wilson <mik...@hotmail.com>:
>> Tom Robinson wrote:
>>> JSGI uses the same *well-established* names for the
>>> HTTP request properties as CGI, FastCGI, SCGI, PHP,
>>> WSGI, Rack, probably others.
>
> Depends on where you come from. I think the last time I worked within
> a CGI named environment was in 1996 (and it was in fact Perl + CGI).
>
> I think it's quite easy to explain. Most scripting languages such as
> Perl, Python, PHP, Ruby etc started their life on the web behind a CGI
> interface, so later interfaces tended to emulate the established one.
> In contrast, platforms that started from scratch (Java, Apache)
> naturally tended to follow their own platform naming conventions.

FWIW many of the Java servlet request property names are simply camel
case versions of the CGI variables.

Can everyone at least agree that the current "env" contains the
appropriate data, irrespective of their names? For example, the "url"
property Mike proposed is inadequate, often you want to distinguish
between "SCRIPT_NAME" and "PATH_INFO" rather than the entire url
combined.
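
To illustrate why the split matters, here is a sketch of a mounting helper in the style of WSGI/Rack URL mapping. `mount` is a hypothetical name for this example: it moves the matched prefix from PATH_INFO onto SCRIPT_NAME, so the inner app sees only the part of the path it owns while still being able to rebuild full URLs.

```javascript
function mount(prefix, app) {
    return function (env) {
        var path = env["PATH_INFO"] || "";
        var rest = path.slice(prefix.length);
        // Match only on a whole path segment ("/blog" must not
        // capture "/blogger").
        if (path.indexOf(prefix) !== 0 ||
                (rest !== "" && rest.charAt(0) !== "/")) {
            return { status: 404, headers: {}, body: [] };
        }
        // Copy env so the caller's object is left untouched.
        var inner = {};
        for (var key in env) {
            if (Object.prototype.hasOwnProperty.call(env, key)) {
                inner[key] = env[key];
            }
        }
        inner["SCRIPT_NAME"] = (env["SCRIPT_NAME"] || "") + prefix;
        inner["PATH_INFO"] = rest;
        return app(inner);
    };
}
```

For a request to /blog/post/1 handled by mount("/blog", app), the inner app sees SCRIPT_NAME "/blog" and PATH_INFO "/post/1". A single combined "url" property would lose that boundary.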

> The question for us is: Should we carry on a convention for the sake
> of continuity (CONTENT_LENGTH), or should we design something that
> adheres to the coding conventions of our language (contentLength) like
> Java and Apache have done. For me, it's clearly the latter, but I can
> live with the old style names if it really must be.

I'm not entirely opposed to using camel case versions of the CGI names
(and putting headers in their own object), but I do think there should
be an obvious 1-to-1 correspondence with the CGI properties, as there
is in servlets (and WSGI, Rack, etc).

Look at this code from the Jack servlet handler and tell me servlets
weren't also inspired by CGI...

    env["SCRIPT_NAME"] = String(request.getServletPath() || "");
    env["PATH_INFO"] = String(request.getPathInfo() || "");

    env["REQUEST_METHOD"] = String(request.getMethod() || "");
    env["SERVER_NAME"] = String(request.getServerName() || "");
    env["SERVER_PORT"] = String(request.getServerPort() || "");
    env["QUERY_STRING"] = String(request.getQueryString() || "");
    env["HTTP_VERSION"] = String(request.getProtocol() || "");

    env["REMOTE_HOST"] = String(request.getRemoteHost() || "");

Mike Wilson

Sep 4, 2009, 2:50:26 PM9/4/09
to comm...@googlegroups.com
Wes Garland wrote:
Although, we don't want /literal/ HTTP headers: we want case-corrected HTTP headers. Subtle detail, important though. 
Could you explain what you mean with case-corrected headers? (I was referring to the raw header names as used on the wire, in contrast to the CGI-mangled names.)

Yes. It's technically allowable for a client to send "content-type" rather than "Content-Type".   Using the literal headers, as on the wire, will make it difficult for middle users to "find" headers reliably. We should case-correct so that "content-type" can only ever be spelled one way. I don't care if it's "Content-Type", "content-type" or "Content-type", so long as it's consistent.
Ah right, of course.
(An alternative is to provide a get[Header]() method that does a case-insensitive match.)
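A sketch of that alternative, with the wire spelling left untouched and the matching done at lookup time. The standalone `getHeader(headers, name)` signature is an assumption made for this example; it could equally be a method on a request object.

```javascript
// Case-insensitive header lookup over an object keyed by the header
// names exactly as received.
function getHeader(headers, name) {
    var wanted = name.toLowerCase();
    for (var key in headers) {
        if (Object.prototype.hasOwnProperty.call(headers, key) &&
                key.toLowerCase() === wanted) {
            return headers[key];
        }
    }
    return undefined;
}

// getHeader({ "content-TYPE": "text/html" }, "Content-Type") gives "text/html"
```

This trades a linear scan per lookup for not having to rewrite every header name up front.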
In Java speak this would be filters sending along request attributes down the chain.

Apache-2 speak, I think, as well. I've read Nick Kew's lightsaber book, but haven't actually done any Apache-2 development. 
Same here!
I'll point out again how good it is to avoid name mangling;

Let me take this opportunity to point something interesting out: I have been developing with CGI on a moderately regular (and sometimes very heavy) basis since early 1997. That's more than 12 years.  EVERY TIME I need to look at something coming from the browser -- Accept-Language, Content-Type, whatever -- I have to hit a debug CGI I keep on my server to find out what the heck it's going to come out as.  I don't pretend to be some kind of savant, but you'd think after 12 years even a total moron would be able to keep the environment variables straight if they were well-named.
:-) 
Instead, I would like to propose the following:
 
1) Specify a nice and modern low-level web server interface that maps HTTP, not CGI. This is what web servers do in their native models and it would look something like the listing above. To be clear, here I'm talking about strictly projecting HTTP and not mixing in any middleware features.

+1
 
2) If there are people that want to use the CGI model in their applications, specify an optional shim looking like CGI (funky string conversions) that calls through (1).
 
3) For web server integrations that want to connect through that server's CGI model, or through FastCGI, create an implementation of (1) that calls to CGI.
 

+1 -- I came to both these conclusions, too.
 
Last time I looked, Apache had its own problems here as the default threaded MPM consumes a thread per open connection, causing scalability problems, and the Event MPM needed to solve that problem hadn't the full feature set of the standard MPM.
I haven't read up on these subjects lately, but have any of you, and verified that the JSGI request/response design will work nicely with the upcoming asynchronous offerings?

I honestly haven't given anything beyond Apache threaded MPM much thought. The threaded MPM -- for better or worse -- maps well onto SpiderMonkey, which is my engine of choice.  This is because, very approximately, one JavaScript program per thread is about all you can do without jumping through some pretty serious hoops. Of the "on fire" variety.
Yes, it would be nice to know what is possible, difficult, or downright impossible wrt the threading bit. JS on the server, especially web servers, will have to deal with threads sooner or later, and it would be nice if it was possible to build advanced server apps with the CommonJS offerings.
Any Mozilla or ES folks on the list that work on the "concurrent programming" bits? :
Tom:

On Thu, Sep 3, 2009 at 8:22 PM, Tom Robinson <tlrob...@gmail.com> wrote:
JSGI uses the same *well-established* names for the HTTP request
properties as CGI, FastCGI, SCGI, PHP, WSGI, Rack, probably others.

The alternative is to invent yet another set of names that developers
will need to learn.

Using the names established within the HTTP protocol
1. Does not require re-learning  (since they are already used for output -- why not input)
2. Does not require inventing
3. Uses names which are at least as well-established
4. Is predictably forward-compatible with changes and extensions to HTTP 
+1
 
Best regards
Mike

Tom Robinson

Sep 4, 2009, 2:53:23 PM
to comm...@googlegroups.com
On Sep 4, 2009, at 11:29 AM, Mike Wilson wrote:


Hannes Wallnoefer wrote:
2009/9/4 Mike Wilson <mik...@hotmail.com>:

It's true that the response mechanism is quite complex (at
least at first view). But it's also true that it quite
smartly exploits a method that is readily available
(Array.forEach) to allow both push and pull responses.

Oh don't get me wrong, this very design may well suit
perfectly on a higher layer, but it has nothing to do
with web server interfaces, and is quite asymmetric
with the CGI-adhering request input. (though I would
like to see how it fares with async/Comet)

The "push" style APIs don't work well with middleware, which IMO is one of the strengths of the WSGI/Rack/JSGI model.
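For readers less familiar with the forEach-based body under discussion, here is a minimal sketch of why it serves both styles. It is an illustration only, not text from the JSGI spec; `collectBody`, `arrayApp` and `lazyApp` are made-up names. The consumer only ever calls `body.forEach(write)`, so a plain Array works for a buffered response, and any object with its own `forEach` can produce chunks on demand.

```javascript
// Hypothetical helper: what a server (or a middleware) does with a body.
function collectBody(response) {
  var chunks = [];
  response.body.forEach(function (chunk) {
    chunks.push(String(chunk));
  });
  return chunks.join("");
}

// Buffered style: the body is a plain Array.
function arrayApp(env) {
  return {
    status: 200,
    headers: { "Content-Type": "text/plain" },
    body: ["Hello, ", "world"]
  };
}

// Generated style: the body is an object whose forEach writes chunks
// as it produces them.
function lazyApp(env) {
  return {
    status: 200,
    headers: { "Content-Type": "text/plain" },
    body: {
      forEach: function (write) {
        for (var i = 1; i <= 3; i++) {
          write("line " + i + "\n");
        }
      }
    }
  };
}
```

Because status and headers are ordinary properties of the returned object, a middleware can inspect or change them without touching the body at all.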


Still, I think JSGI is a good starting point as long
as we focus on the actual issues and don't get lost
in bikeshedding.

If the goal is to get something out fast, and to have
CGI as the model, then that's probably fine. I would
prefer to clearly state the CGI goal for the API in
that case.

Please stop referring to JSGI as following the "CGI model". Things JSGI has in common with CGI:

1) They are both webserver interfaces
2) They share request property names

That's it.

Mike Wilson

Sep 4, 2009, 3:25:44 PM
to comm...@googlegroups.com
Tom Robinson wrote:
> FWIW many of the Java servlet request property names are
> simply camel case versions of the CGI variables.

Certainly, no-one is debating or denying that. Both the .NET
and Java web APIs have inspiration from CGI (and from each other).

> Can everyone at least agree that the current "env" contains the
> appropriate data, irrespective of their names? For example,
> the "url"
> property Mike proposed is inadequate, often you want to distinguish
> between "SCRIPT_NAME" and "PATH_INFO" rather than the entire url
> combined.

You are referring to this:

request:
    // request line
    method
    uri
    httpVersion
    // headers
    headers{} // raw, no CGI mangling
    // something stream-like for reading body
    input

and it was just a sketch/example to indicate what a
minimal request structure would look like. See this
annotated HTTP request;

            <method> <uri>              <httpVersion>
            POST     /pages/mypage.html HTTP/1.1
    ^       Host: mysite.com
<headers>   Referer: ...
    v       ...

<body>      a=1&b=2...

Personally, I want the pathInfo, queryString members,
etc, as indicated in my initial mail. But as I got
comments that these members belonged in middleware or
frameworks I made this minimal illustration.

Thus, I am not sure there is consensus on the list
on what level the HTTP URI should actually be split
into its parts. I would certainly want to have them
split for me at some level (together with providing
the path my application is "mounted" at etc), but I can
imagine several alternatives on how to partition this
logic over API layers. There is talk about low-level
APIs, middleware, frameworks, but I'm not sure
everybody has the same picture on what that means,
and what roles they have with respect to each other.
(Something that easily happens if this is not defined
is that all problems are solved by adding a few more
keys to the CGI environment as quick-fix solutions,
ending up with a spaghetti system with hard-coded
dependencies to these keys all over the place.)
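One possible way to do the splitting, wherever it ends up living, is sketched below. This is only an illustration under assumptions: `splitRequestUri` and its `mountPoint` argument are hypothetical names, and a real container would know the mount point from its own configuration. The path part is deliberately left percent-encoded, so an encoded slash (`%2F`) stays distinguishable from a real one.

```javascript
// Hypothetical helper: split a raw request URI into the parts discussed
// above. `mountPoint` is the path prefix the application is mounted at.
function splitRequestUri(uri, mountPoint) {
  var queryStart = uri.indexOf("?");
  var path = queryStart === -1 ? uri : uri.slice(0, queryStart);
  var queryString = queryStart === -1 ? "" : uri.slice(queryStart + 1);

  var rest = path.slice(mountPoint.length);
  var underMount = path.indexOf(mountPoint) === 0 &&
    (rest === "" || rest.charAt(0) === "/");
  if (!underMount) {
    throw new Error("URI " + uri + " is not under mount point " + mountPoint);
  }

  return {
    scriptName: mountPoint,
    pathInfo: rest,            // still percent-encoded on purpose
    queryString: queryString
  };
}
```

For example, `splitRequestUri("/wiki/pages/a%2Fb?x=1", "/wiki")` yields a `pathInfo` of `/pages/a%2Fb`; decoding it any earlier would make that indistinguishable from `/pages/a/b`.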

> Look at this code from the Jack servlet handler and
> tell me servlets weren't also inspired by CGI...
>
> env["SCRIPT_NAME"] =
> String(request.getServletPath() ||
> "");
> env["PATH_INFO"] =
> String(request.getPathInfo() || "");
>
> env["REQUEST_METHOD"] = String(request.getMethod() || "");
> env["SERVER_NAME"] = String(request.getServerName() ||
> "");
> env["SERVER_PORT"] = String(request.getServerPort() ||
> "");
> env["QUERY_STRING"] =
> String(request.getQueryString() ||
> "");
> env["HTTP_VERSION"] =
> String(request.getProtocol() || "");
>
> env["REMOTE_HOST"] = String(request.getRemoteHost() ||
> "");

In my initial mail I provided a similar observation:

[JSGI]


function(env) {
    env["REQUEST_METHOD"]
    env["PATH_INFO"]
    env["QUERY_STRING"]
    env["HTTP_CONTENT_TYPE"]
    env["HTTP_CONTENT_LENGTH"]
    env["HTTP_<*>"]
    ...
    env["jsgi.input"]
}

[Java-inspired API]

function(req) {
    req.method
    req.pathInfo
    req.queryString
    req.contentType
    req.contentLength
    req.headers{} // actual literal HTTP headers
    req.headers["Content-Type"]
    req.headers["Content-Length"]
    req.headers["Other-Header-Not-Exposed-As-Property"]
    ...
    req.inputStream
}

To me, this is a strength in the Java and .NET APIs,
carrying over the (selected) legacy of CGI into a
more modern API structure. Just as I think should be
done in Common JS.
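As a rough sketch of how the two listings relate, a small shim could project the first onto the second. The names `requestFromEnv` and `headerNameFromCgiKey` are invented for this example, and the env keys follow the JSGI listing above. Note that the header names it rebuilds are only a guess at the original spelling, since the CGI-style keys have already lost case information (`HTTP_ETAG` comes back as `Etag`).

```javascript
// "HTTP_ACCEPT_LANGUAGE" -> "Accept-Language" (best-effort reconstruction)
function headerNameFromCgiKey(key) {
  return key.slice("HTTP_".length).toLowerCase().split("_").map(function (part) {
    return part.charAt(0).toUpperCase() + part.slice(1);
  }).join("-");
}

// Hypothetical shim: build the property-style request from a CGI-style env.
function requestFromEnv(env) {
  var headers = {};
  Object.keys(env).forEach(function (key) {
    if (key.indexOf("HTTP_") === 0) {
      headers[headerNameFromCgiKey(key)] = env[key];
    }
  });
  return {
    method: env["REQUEST_METHOD"],
    pathInfo: env["PATH_INFO"],
    queryString: env["QUERY_STRING"],
    contentType: headers["Content-Type"],
    contentLength: headers["Content-Length"],
    headers: headers,
    inputStream: env["jsgi.input"]
  };
}
```

A shim in the opposite direction is just as short, which is one argument for treating the choice of naming as a layering question rather than a capability question.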

Best regards
Mike

Wes Garland

Sep 4, 2009, 4:45:23 PM
to comm...@googlegroups.com
> Yes, it would be nice to know what is possible, difficult, or downright
> impossible wrt the threading bit.


Threads which share JavaScript objects (via JavaScript) are downright impossible to run reliably on current versions of SpiderMonkey.

Sharing JSON via C strings is totally safe.  Sharing JS Strings between threads in a single runtime is probably safe. Running multiple runtimes in different threads is totally safe.  I'm pretty sure running multiple programs on a single runtime is safe as well.

Multiple runtimes vs. single runtime -- tradeoffs are start time (can pool to mitigate), memory usage (strings are shared in a runtime, memory arenas are per runtime, memory limits are per runtime), garbage collection (GC pauses all threads for a particular runtime).

Daniel Friesen

Sep 4, 2009, 5:30:57 PM
to comm...@googlegroups.com
Wes Garland wrote:
>
> Last time I looked, Apache had its own problems here as the
> default threaded MPM consumes a thread per /open connection/,

> causing scalability problems, and the Event MPM needed to solve
> that problem hadn't the full feature set of the standard MPM.
> I haven't read up on these subjects lately, but have any of you,
> and verified that the JSGI request/response design will work
> nicely with the upcoming asynchronous offerings?
>
>
> I honestly haven't given anything beyond Apache threaded MPM much
> thought. The threaded MPM -- for better or worse -- maps well onto
> SpiderMonkey, which is my engine of choice. This is because, very
> approximately, one JavaScript program per thread is about you can do
> without jumping through some pretty serious hoops. Of the "on fire"
> variety.
>
> I'm not even sure what an asynchronous piece of web content delivery
> middleware would look like, either. Maybe I suffer from excessive CGI
> and PHP exposure. They are both a little like heavy metals. Would
> certainly enjoy reading some high-level pseudocode.
>
> Wes
Does this dummy app using the push model asynchronously work for you?
http://gist.github.com/181131
It basically asynchronously writes the current datetime to the stream 5
times 1 second apart.

The upper one is a model where one would construct a simple stream and
hand it off as a body. The lower one is a model where we have a
jsgi.output like the jsgi.input.
The difference is in implementation and middleware.
The one where you pass a stream, would be implemented by asynchronously
pulling chunks out of the stream while checking for new requests. And
middleware would modify the contents by asynchronously reading chunks
from the stream, and writing them out to another.
The one using a stream directly on env is a little fuzzy in
implementation to me. You could just give it direct access to whatever
output stream there is, but you'd have to find a way to make sure that
headers are written before output. Middleware would modify contents by
replacing jsgi.output in the env they send to the app with some other
type of faux stream or stream interface implementing thing that would
modify then write to the /real/ jsgi.output they were given (even if
that is actually another fake layer given by another piece of middleware)

Here's some fake dummyware (middleware that does nothing but read what
the app wrote, then write it out without modifying it; you'd never
actually do this honestly)
http://gist.github.com/181145

A good async-friendly jsgi spec and implementation should be able to
run that example in 3 separate browsers at the same time, even when
restricted to a single process and a single thread, without showing a
cascading effect where one browser is waiting for another.

Hannes Wallnoefer

Sep 4, 2009, 6:05:31 PM
to comm...@googlegroups.com
2009/9/4 Tom Robinson <tlrob...@gmail.com>:
>
> On Sep 4, 2009, at 7:53 AM, Hannes Wallnoefer wrote:
>
>> The question for us is: Should we carry on a convention for the sake
>> of continuity (CONTENT_LENGTH), or should we design something that
>> adheres to the coding conventions of our language (contentLength) like
>> Java and Apache have done. For me, it's clearly the latter, but I can
>> live with the old style names if it really must be.
>
> I'm not entirely opposed to using camel case versions of the CGI names
> (and putting headers in their own object), but I do think there should
> be an obvious 1-to-1 correspondence with the CGI properties, as there
> is in servlets (and WSGI, Rack, etc).

That's fine with me! For me, it's all about the naming conventions,
which feel very strange to me in JSGI. I have no issues with the names
themselves.

Hannes

Kris Zyp

Sep 4, 2009, 6:30:00 PM
to comm...@googlegroups.com
This problem is essentially solved with existing mechanisms we have
already proposed in CommonJS. If a server wants to offer support for
scalable long-lived connection handling for Comet style applications, it
can do so by simply supporting promises being returned from a request
handler, just as any other asynchronously enabled function would do.
http://gist.github.com/181131 rewritten with what we already have in
https://wiki.mozilla.org/ServerJS/Promises would look like:

function app(env) {
    /* the proposal actually just defines the interface; presumably there
       would be promise implementations that would make this easier */
    var promise = new Promise();
    var finished, progress;
    promise.then = function(callback, errorHandler, progressHandler){
        finished = callback;
        progress = progressHandler;
    };
    var times = 5;
    setTimeout(function t() {
        times--;
        progress((new Date).toString());
        progress("\n");
        if ( times <= 0 )
            finished("");
        else
            setTimeout(t, 1000);
    }, 1000);
    return { status: 200, headers: { 'Content-Type': 'text/plain' }, body: promise };
}

Note that for the majority of use cases of using asynchronous handling,
we would not need to use promise progress events, just the normal
callback/fulfillment since usually Comet apps use one-shot responses to
long-poll due to the lack of cross-browser support for streaming (IE is
the culprit of course). If a server just supported one-shot asynchronous
responses, it would be more reasonable to have the entire response
object (instead of just the body) be represented by the promise.
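A minimal sketch of that one-shot variant, where the handler returns a promise for the whole response, might look like the following. Everything here is an assumption for illustration: `makeDeferred` is a stand-in for whatever promise implementation a platform provides (it is not the API from the Promises proposal), and `handleRequest`, `longPollApp` and `publish` are made-up names.

```javascript
// Hypothetical minimal deferred, standing in for a real promise library.
function makeDeferred() {
  var callbacks = [], settled = false, value;
  return {
    promise: {
      then: function (callback) {
        if (settled) { callback(value); } else { callbacks.push(callback); }
      }
    },
    resolve: function (result) {
      if (settled) { return; }
      settled = true;
      value = result;
      callbacks.forEach(function (callback) { callback(result); });
    }
  };
}

// Server side: accept either a response object or a promise for one.
function handleRequest(app, env, send) {
  var result = app(env);
  if (result && typeof result.then === "function") {
    result.then(send);   // async: send once the response is available
  } else {
    send(result);        // sync: unchanged behaviour
  }
}

// A long-poll style app: nothing is sent until an event arrives.
var pending = [];
function longPollApp(env) {
  var deferred = makeDeferred();
  pending.push(deferred);
  return deferred.promise;
}

// Resolve every waiting request with the same one-shot response.
function publish(message) {
  pending.splice(0).forEach(function (deferred) {
    deferred.resolve({
      status: 200,
      headers: { "Content-Type": "text/plain" },
      body: [message]
    });
  });
}
```

Synchronous apps are unaffected by this, since the server only takes the promise path when the returned value has a `then` method.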

Thanks,
Kris

Mike Wilson

Sep 7, 2009, 10:58:59 AM
to comm...@googlegroups.com
Tom Robinson wrote:

On Sep 4, 2009, at 11:29 AM, Mike Wilson wrote:
Hannes Wallnoefer wrote:
2009/9/4 Mike Wilson <mik...@hotmail.com>:

It's true that the response mechanism is quite complex (at
least at first view). But it's also true that it quite
smartly exploits a method that is readily available
(Array.forEach) to allow both push and pull responses.

Oh don't get me wrong, this very design may well suit
perfectly on a higher layer, but it has nothing to do
with web server interfaces, and is quite asymmetric
with the CGI-adhering request input. (though I would
like to see how it fares with async/Comet)
The "push" style APIs don't work well with middleware, which IMO is one of the strengths of the WSGI/Rack/JSGI model. 
I agree with you that the current JSGI model is more friendly when a middleware wants to examine the headers or status of its "child", as it can just look in the appropriate properties of the response. In a "push everything" model (as CGI) it would have to hook itself into the output stream and catch headers while being written.
 
But for middleware processing the body contents I don't see any big difference, as both models will force you to hook into, and pump, something stream-like and inspect bytes as they pass through. A notable difference is also that in JSGI an application needs to provide its own lazy or buffering stream-like object, while push-based APIs provide you with it.
Or am I missing something?
 
ISTM a push-based solution with support for direct access to headers and status of the child would be as middleware-friendly as current JSGI?
Still, I think JSGI is a good starting point as long
as we focus on the actual issues and don't get lost
in bikeshedding.

If the goal is to get something out fast, and to have
CGI as the model, then that's probably fine. I would
prefer to clearly state the CGI goal for the API in
that case.
Please stop referring to JSGI as following the "CGI model". Things JSGI has in common with CGI:

1) They are both webserver interfaces
2) They share request property names

That's it. 
No, I'd say that the request side of JSGI is almost identical to CGI, apart from env being provided in a parameter and not accessed by getenv calls. JSGI's response side is of course a different story as we are discussing elsewhere. My main gripes with CGI are:
  1. Mangling of request header names
  2. Different naming schemes for request and response headers
  3. Properties of all different kinds flattened into one big namespace, forcing creative naming for future extensions (f ex what if we want to add some property beginning with "HTTP" that is not a header?)
and all these seem to match JSGI.
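To make gripes 1 and 3 concrete, here is a small sketch of the usual CGI-style mangling (uppercase the name, turn "-" into "_", prefix with "HTTP_"). The function name `cgiKeyFromHeaderName` is made up for this example. The mapping is not reversible, and a request header literally named "Version" would land on the same "HTTP_VERSION" key that the Jack servlet handler quoted earlier uses for the protocol version.

```javascript
// CGI-style mangling of a request header name into an env key.
function cgiKeyFromHeaderName(name) {
  return "HTTP_" + name.toUpperCase().replace(/-/g, "_");
}

// Two different header names, one env key: the original spelling is lost.
var a = cgiKeyFromHeaderName("X-Foo-Bar");   // "HTTP_X_FOO_BAR"
var b = cgiKeyFromHeaderName("X_Foo-Bar");   // "HTTP_X_FOO_BAR"

// A header named "Version" collides with a non-header property.
var c = cgiKeyFromHeaderName("Version");     // "HTTP_VERSION"
```

Keeping headers in their own object, under their HTTP names, avoids both problems without changing what data is available.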
 
If you still think this is not a striking similarity between JSGI and CGI, then I'm happy to agree to disagree here, and ask you to respect my opinion. I'm not telling you what to say or do on this list, and I'll ask you to do the same for me.
 
Best regards
Mike Wilson

Mike Wilson

Sep 7, 2009, 4:46:46 PM
to comm...@googlegroups.com
Thanks for the write-up, Wes.
 
This is just really loud thinking, but I wonder if something should be said (or promises given) in some specification wrt this area. Questions to address could include:
  • When running a JSGI application in a CommonJS container, can it expect concurrent requests or not? [I most certainly guess it should]
  • So if this is the case, should these request threads expect to execute within the same runtime or in different runtimes? [Maybe this should be up to containers and be left "explicitly undefined"?]
  • Etc...
Also, it would be nice if there was some opening for extending the model to keep session data in the application (and not just in the database), but with concurrent requests that necessitates data sharing between threads, which is hard. Maybe this belongs to middleware or frameworks.
 
Best regards
Mike


From: comm...@googlegroups.com [mailto:comm...@googlegroups.com] On Behalf Of Wes Garland
Sent: den 4 september 2009 22:45
To: comm...@googlegroups.com
Subject: [CommonJS] Re: outstanding jsgi spec issues

Isaac Z. Schlueter

Sep 8, 2009, 2:19:19 PM
to CommonJS
On Sep 4, 6:29 am, "Mike Wilson" <mike...@hotmail.com> wrote:
> { method:..., url:..., headers:..., body:... }

You watching my commits, or is this just a case of gmta? http://j.mp/3HXkss
^_^

I would adore that structure for expressing the incoming request, and
you make a pretty strong case for abandoning the CGI-style variable
names, at least for the request headers.

What about something like this?

{
  method : string, the request method
  url : string, the url that was requested
  headers : {
    case-insensitive key/value pairs. MAY be all UC/LC at the whim of
    the implementation.
    Perhaps specify swapping css-case for camelCase or UPPER_SNAKE?
    Needs bikeshedding.
  },
  body : forEach-able collection of toByteString-able things.
  Also needed: some place to put non-header stuff, like queryString and
  server environment stuff and whatnot. "env" seems as good a key as
  any other.
}

My main concern is that we don't make changes capriciously, or break
from rack and wsgi except where it makes sense given the nature of the
languages. (For instance, in the case mentioned, using an Object
instead of Array makes sense in Javascript, because JS's Arrays are
not the same as Ruby's lists.) Although I am pleased by the symmetry
in this suggestion, I worry that it's a change without a lot of
benefit.
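Whatever case ends up being specified for the headers object, case-insensitive access can be layered on top with a couple of helpers. This is only a sketch; `getHeader` and `setHeader` are hypothetical names, not part of any proposal in this thread.

```javascript
// Look up a header regardless of how its name was capitalised.
function getHeader(headers, name) {
  var wanted = name.toLowerCase();
  var keys = Object.keys(headers);
  for (var i = 0; i < keys.length; i++) {
    if (keys[i].toLowerCase() === wanted) {
      return headers[keys[i]];
    }
  }
  return undefined;
}

// Set a header, first removing any existing key that differs only by
// case, so the object never carries duplicates such as "ETag" and "Etag".
function setHeader(headers, name, value) {
  var wanted = name.toLowerCase();
  Object.keys(headers).forEach(function (key) {
    if (key.toLowerCase() === wanted) {
      delete headers[key];
    }
  });
  headers[name] = value;
}
```

The cost is a linear scan per access; an implementation that normalises keys once, on the way in, would avoid that.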

--i

Dean Landolt

Sep 8, 2009, 2:33:30 PM
to comm...@googlegroups.com

I really like this suggestion and I do think it has some benefit. But JSGI, as it's defined now, gets me what I need. Is there room for both? If so, how? Otherwise, it seems like we'll be bikeshedding for quite some time before we'll be in that same place we are with JSGI now, with multiple, interoperable implementations.

Tom Robinson

Sep 8, 2009, 3:23:20 PM
to comm...@googlegroups.com
Let's do a quick vote to see if we should continue discussion on this.
If the consensus is it should change then I'd rather get it over with
now.

Are you in favor of:

1) Keeping JSGI close to WSGI, Rack, *CGI, etc (also WSAPI for Lua
http://wsapi.luaforge.net/ and PSGI for Perl
http://bulknews.typepad.com/blog/2009/09/psgi-perl-wsgi.html)
2) Changing the "env" object properties and headers from the CGI-like
variables to something prettier.

While we're at it, there's the pending issue of changing the
"namespace delimiter" from "." to "_", i.e. env["jsgi.foo"] vs
env["jsgi_foo"] (which allows for env.jsgi_foo)

A) "."
B) "_"

Mike Wilson

Sep 8, 2009, 3:58:15 PM
to comm...@googlegroups.com
[excuse the long post, you can skip the Example part
and just read Requirements and Basic idea if you are
short of time]

Daniel Friesen wrote:
> Wes
> Does this dummy app using the push model asynchronously work for you?
> http://gist.github.com/181131
> It basically asynchronously writes the current datetime to
> the stream 5
> times 1 second apart.

> [...]


> The one using a stream directly on env is a little fuzzy in
> implementation to me. You could just give it direct access to
> whatever
> output stream there is, but you'd have to find a way to make
> sure that
> headers are written before output. Middleware would modify
> contents by
> replacing jsgi.output in the env they send to the app with some other
> type of faux stream or stream interface implementing thing that would
> modify then write to the /real/ jsgi.output they were given (even if
> that is actually another fake layer given by another piece of
> middleware)

I have also been trying to think a bit on what a "push"
solution could look like. I can't say that I have a
100% finished solution, but I'd like to share my
unfinished thoughts so far, to maybe kick off some new
ideas.

REQUIREMENTS

First, these are the requirements I had in mind when
sketching on the solution:
- a middleware/app should be able to |read| data that
it wants for input, and |write| data that it outputs,
without having to provide its own handling of lazy
responses
- there should be an easy way to make sure that headers
are output at the right time
- attempts to set headers after the response has begun
transmission should result in exceptions, and not be
silently ignored
- a middleware ("filter") should be able to alter
headers and body contents on the incoming request
before it reaches the "child" middleware/app.
- a middleware ("filter") should be able to alter
headers and body contents of the outgoing response
before it reaches the "parent" middleware/app or
container.

BASIC IDEA

All the different ideas that popped into my head seem
to revolve around something pipeline- or maybe even
Actor-oriented in some sense.

The main feature is that apps/middlewares wouldn't call
each other directly, but instead be connected through a
container-provided pipeline, a bit like UNIX pipelines.
This allows one side of the pipeline to read and the
other to write, while hiding away scheduling or
buffering internals in container code.

So, instead of:

server <-> middleware <-> app

we get:

server <-> pipe <-> middleware <-> pipe <-> app

I'm mentioning Actors because in some designs this
pipeline could resemble an object's "asynchronous
mailbox", typical of Actors.

EXAMPLE SOLUTION

Here's one sketch, one of many that are possible along
the lines of this basic idea.

Note that I'm writing these examples with blocking reads
so it gets a bit easier to see the rest of the code. This
should probably look somewhat different in real life.

An application would look like this:

function app(pipe) {
    var mystuff = pipe.request.getHeader("My-Stuff");
    pipe.response = new Response;
    pipe.response.status = 200;
    pipe.response.setHeader("Last-Modified", "...");
    var line;
    while((line = pipe.read()) != null)
        pipe.write(line.toUpperCase());
}

(I'm assuming ES5 getters and setters so that "response="
and "status=" can throw if the body contents is already
being transmitted when they are set.)

The Response class represents response properties (status
and headers but not body) and is used to control
"freezing" of these. It also has methods that deal with
case insensitive headers.
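A rough sketch of such a Response class follows, assuming ES5 accessors as mentioned above. The `freeze` method and the exact error message are inventions of this example; in the design described here the pipeline would call `freeze` on the first body write.

```javascript
// Sketch of a Response whose status and headers lock once the body starts.
function Response() {
  var status = 200;
  var headers = Object.create(null); // lower-cased name -> { name, value }
  var frozen = false;

  function assertWritable() {
    if (frozen) {
      throw new Error("Response already started; status and headers are frozen");
    }
  }

  Object.defineProperty(this, "status", {
    enumerable: true,
    get: function () { return status; },
    set: function (value) { assertWritable(); status = value; }
  });

  // Header names are matched case-insensitively.
  this.setHeader = function (name, value) {
    assertWritable();
    headers[name.toLowerCase()] = { name: name, value: value };
  };
  this.getHeader = function (name) {
    var entry = headers[name.toLowerCase()];
    return entry ? entry.value : undefined;
  };

  // Hypothetical hook: called by the pipeline on the first body write.
  this.freeze = function () { frozen = true; };
}
```

Late attempts to change the status or a header then fail loudly instead of being silently dropped, which is the third requirement listed earlier.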

Now, when the container wants to run "app" it would make
a new pipeline based on the request properties, push in
body contents and then wait for the response:

var req = {<request props>};
pipe = new Pipeline(app, req, respCb);
pipe.write(<request body>);
function respCb() {
    ... = pipe.response.status; // response props
    while(pipe.read()...) // response body
}

The response callback (respCb) is triggered when "app"
makes its first write to the pipeline. This freezes the
Response object, so the caller can safely take action on
the response properties. The caller then reads the
response body from the pipeline.

The Pipeline could look something like this:

function Pipeline(appfunc, request, responseCb) {
    this._buffer = [];
    this.response = null;
    var innerpipe = {
        request: request,
        response: null,
        read: function...,
        write: function... /* on first call freezes response
            and copies to outer pipe, then triggers responseCb */
    };
    appfunc(innerpipe);
}
Pipeline.prototype = {
    write: function...
    read: function...
};

As can be seen the Pipeline has an outer API and an inner
API, for the caller and callee, respectively. Body contents
that is written to the outer side is read from the inner
side and vice versa. The inner side should also allow the
response to be assigned, while the outer side should only
allow it to be examined.

Using the Pipeline, we can also build a middleware filter
that wraps another app or middleware. This example shows
conversion of data in request and response body contents:

function filter(pipe) {
    var childpipe = new Pipeline(app, pipe.request, respCb);

    // Pump and convert request body
    var line;
    while((line = pipe.read()) != null)
        childpipe.write(line.toLowerCase());

    // Pump and convert response body
    function respCb() {
        pipe.response = childpipe.response;
        var line;
        while((line = childpipe.read()) != null)
            pipe.write(line.toLowerCase());
    }
}

Note that the request and response objects are reused
between the two pipelines, and that parallel transmission
of request and response is possible if the pipelines are
scheduled in some nice way.

So, what do you think about these ideas? Have I missed any
use cases or requirements?

Best regards
Mike Wilson

Daniel Friesen

Sep 8, 2009, 4:28:17 PM
to comm...@googlegroups.com
I'd rather have some cleaner, saner names.
As for env namespacing. _ just doesn't strike me as a namespacing
character... It feels like a space replacement. I'd prefer actual
namespaces myself.

Actually, if we do use prettier names meaning .method, .headers,
.body... Then doesn't that already mean we're now already using a
multi-level object?
If that's the case, why not go all the way and have an option C) for
actual namespaces (js objects).

{
    method: "GET",
    ...,
    headers: { ... },
    body: undefined,
    jsgi: {
        version: [0,3],
        url_scheme: 'http', // instead of scheme why not proto:/protocol: or ssl: false?
        multithread: false,
        multiprocess: false,
        run_once: false
    }
}
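For comparison with the flat forms, here is a small sketch of how an env using dotted keys (option A) could be converted to the nested-object form (option C). The function name `nestNamespacedKeys` is hypothetical, and the sketch only handles one level of nesting, splitting at the first dot.

```javascript
// Hypothetical converter: { "jsgi.version": ... } -> { jsgi: { version: ... } }
function nestNamespacedKeys(flatEnv) {
  var env = {};
  Object.keys(flatEnv).forEach(function (key) {
    var dot = key.indexOf(".");
    if (dot === -1) {
      env[key] = flatEnv[key];      // plain key, copied as-is
      return;
    }
    var namespace = key.slice(0, dot);
    var name = key.slice(dot + 1);
    if (typeof env[namespace] !== "object" || env[namespace] === null) {
      env[namespace] = {};
    }
    env[namespace][name] = flatEnv[key];
  });
  return env;
}
```

With the nested form, `env.jsgi.url_scheme` works with plain dot access, and a middleware can claim its own top-level key without inventing a delimiter convention.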

One note... url? Are we going to reconstruct that in the jsgi app?
- Normally that tends to be done app level, or at least request object
level.
- Apps don't necessarily need to live in the site root, they can be part
of a subfolder.
-- Rack handles this alright with the protocol (however I'm not quite a
fan of how the Request/Response objects don't give one ounce of help in
handling relative urls in a nice and easy way).

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Hannes Wallnoefer

Sep 8, 2009, 4:34:22 PM
to comm...@googlegroups.com
2009/9/8 Tom Robinson <tlrob...@gmail.com>:
>
> Are you in favor of:
>
> 1) Keeping JSGI close to WSGI, Rack, *CGI, etc (also WSAPI for Lua http://wsapi.luaforge.net/
>  and PSGI for Perl http://bulknews.typepad.com/blog/2009/09/psgi-perl-wsgi.html
>  )
> 2) Changing the "env" object properties and headers from the CGI-like
> variables to something prettier.

2)

> While we're at it, there's the pending issue of changing the
> "namespace delimiter" from "." to "_", i.e. env["jsgi.foo"] vs env
> ["jsgi_foo"] (which allows for env.jsgi_foo)
>
> A) "."
> B) "_"

B) (unless env.jsgi is actually a nested object)

hannes

Kris Zyp

Sep 8, 2009, 4:50:56 PM
to comm...@googlegroups.com

I would like to add a couple:
- The 99% use case for JSGI will still be sync. The sync users shouldn't
need a more complicated API to pay for the async users. If possible, I
think we should maintain the current API, and provide additive support
for async.
- We should reuse async mechanisms. The whole premise of promises is to
provide a reusable API for async, it doesn't seem helpful to reinvent
async APIs here.

I believe that all of your requirements as well as mine can be met by
the current promise (the low level and the promise manager) API in
conjunction with JSGI, by allowing the forEach to return a promise if
async behavior is needed. This provides a simple extension to the
current API, is easy to use for middleware, and allows for syntactical
sugar for environments that wish to provide generic async help. An
example top-ware:
function saySomethingAfterOneSec(env){
    return {status:200, headers:{}, body:{forEach: function(write){
        // if we have a promise that can be fulfilled by calling fulfill
        var promise = new FulfillablePromise();
        setTimeout(function(){
            write("had to think about it for a sec");
            promise.fulfill(); // now we are all done
        },1000);
        return promise;
    }}};
}

Now middleware can easily handle this as well. Let's look at how we could
add async support to a simple middleware app that wraps a response in
quotes (this uses the current API and expects sync behavior):
function Quote(nextApp){
    return function(env){
        var response = nextApp(env);
        var body = response.body;
        response.body = {forEach : function(write) {
            write("(");
            body.forEach(write);
            write(")");
        }};
        return response;
    };
}

To make this handle async (or sync) with a promise manager, we would do:
function Quote(nextApp){
    return function(env){
        var response = nextApp(env);
        var body = response.body;
        response.body = {forEach : function(write) {
            write("(");
            return Q.when(body.forEach(write), function(){
                write(")");
            });
        }};
        return response;
    };
}
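For anyone who has not used a promise manager, the behaviour relied on here can be sketched in a few lines. This is a simplified stand-in, not the actual promise manager API: the `when` function below only distinguishes "has a then method" from "plain value", and `syncApp`, `asyncApp` and `finishAsyncBody` are made-up names used to exercise both paths.

```javascript
// Simplified stand-in for a promise manager's when(): if `value` is a
// thenable, run `callback` once it is fulfilled; otherwise run it now.
function when(value, callback) {
  if (value && typeof value.then === "function") {
    return value.then(callback);
  }
  return callback(value);
}

function Quote(nextApp) {
  return function (env) {
    var response = nextApp(env);
    var body = response.body;
    response.body = {
      forEach: function (write) {
        write("(");
        return when(body.forEach(write), function () {
          write(")");
        });
      }
    };
    return response;
  };
}

// Sync body: Array.prototype.forEach returns undefined, so ")" follows at once.
function syncApp(env) {
  return { status: 200, headers: {}, body: ["hi"] };
}

// Async body: forEach returns a bare-bones thenable that is fulfilled later
// by calling finishAsyncBody().
var finishAsyncBody = null;
function asyncApp(env) {
  return { status: 200, headers: {}, body: { forEach: function (write) {
    return { then: function (callback) {
      finishAsyncBody = function () {
        write("later");
        callback();
      };
    } };
  } } };
}
```

The same middleware source therefore serves both kinds of apps: with `syncApp` the closing parenthesis is written immediately, and with `asyncApp` it is held back until the body signals completion.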

To make this async in an environment that provides a promise manager to
leverage generator syntax, it becomes even easier to add async (just add
a yield statement and async wrapper):
function Quote(nextApp){
    return function(env){
        var response = nextApp(env);
        var body = response.body;
        response.body = {forEach : async(function(write) {
            write("(");
            yield body.forEach(write);
            write(")");
        })};
        return response;
    };
}

There is no reason implementations can't throw errors at appropriate
times when headers are set after a response is started.
Kris

Isaac Z. Schlueter

Sep 8, 2009, 5:30:52 PM
to CommonJS

> 1) Keeping JSGI close to WSGI, Rack, *CGI, etc (also WSAPI for Lua http://wsapi.luaforge.net/
> and PSGI for Perl http://bulknews.typepad.com/blog/2009/09/psgi-perl-wsgi.html
> )
> 2) Changing the "env" object properties and headers from the CGI-like
> variables to something prettier.

So very torn on this. 2 is so nice, and so symmetrical and
beautiful. But it's also totally something that an application could
monkeypatch without very much trouble, and 1 greatly reduces the
friction with other *SGI incarnations.

> A) "."
> B) "_"
C) Nested objects. (IE, env.jsgi.foo). The spec can then say which
keys are off-limits on the env object, and stipulate that every
application should only hang data on their own namespace. (Or just
say that new keys added to env must be lower-case, and cannot be
"jsgi".)

--i

Mike Wilson

Sep 8, 2009, 6:07:36 PM
to comm...@googlegroups.com
Kris Zyp wrote:
> - We should reuse async mechanisms. The whole premise of
> promises is to provide an reusable API for async, it doesn't
> seem helpful to reinvent async APIs here.

Thanks for making examples with Promise. I was going
to write that I am not fluent with this API and maybe
was missing something, but forgot.

> function Quote(nextApp){
> return function(env){
> var response = nextApp(env);
> var body = response.body;
> response.body = {forEach : function(write) {
> write("(");
> body.forEach(write);
> write(")");
> }};
> };
> }

Interesting, I didn't realize forEach was supplied the
write function. Is this always the case, and are
there any other interesting args? And would the above
example be a typical way to design "pushing" code, with
most of the code in the response object?
(I'll try to rethink things based on this information
when I get some time tomorrow)

How would you best use the above pattern to:
1) Make an app that "copies" input stream straight to
output stream.
2) Make a middleware that processes both input stream
and output stream (f ex toLowerCase() on both).
?

> There is no reason implementations can't throw errors at
> appropriate times when headers are set after a response
> is started.

I'm not sure I grokked the double negation - could you
rephrase? :-)

Best regards
Mike

Kris Zyp

Sep 8, 2009, 6:07:48 PM
to comm...@googlegroups.com

Tom Robinson wrote:
> Let's do a quick vote to see if we should continue discussion on this.
> If the consensus is it should change then I'd rather get it over with
> now.
>
> Are you in favor of:
>
> 1) Keeping JSGI close to WSGI, Rack, *CGI, etc (also WSAPI for Lua http://wsapi.luaforge.net/
> and PSGI for Perl http://bulknews.typepad.com/blog/2009/09/psgi-perl-wsgi.html
> )
> 2) Changing the "env" object properties and headers from the CGI-like
> variables to something prettier.
>

2


> While we're at it, there's the pending issue of changing the
> "namespace delimiter" from "." to "_", i.e. env["jsgi.foo"] vs env
> ["jsgi_foo"] (which allows for env.jsgi_foo)
>
> A) "."
> B) "_"
>
>

B (unless env.jsgi is actually a nested object, like Hannes said)
Kris

Tom Robinson

Sep 8, 2009, 7:12:04 PM
to comm...@googlegroups.com

On Sep 8, 2009, at 2:30 PM, Isaac Z. Schlueter wrote:

>> 1) Keeping JSGI close to WSGI, Rack, *CGI, etc (also WSAPI for Lua http://wsapi.luaforge.net/
>> and PSGI for Perl http://bulknews.typepad.com/blog/2009/09/psgi-perl-wsgi.html
>> )
>> 2) Changing the "env" object properties and headers from the CGI-like
>> variables to something prettier.
>
> So very torn on this. 2 is so nice, and so symmetrical and
> beautiful. But it's also totally something that an application could
> monkeypatch without very much trouble, and 1 greatly reduces the
> friction with other *SGI incarnations.

Me too.

This quote comes to mind:

"I suspect there’s a corollary to Zawinski’s Law at work here: every
gateway interface expands until it looks sort of like a framework API."

http://www.b-list.org/weblog/2009/aug/10/wsgi/

Granted, that article is all about WSGI's shortcomings. Several of the
criticisms don't really apply to JSGI, but one of the complaints is
that every framework has to reinvent the higher level request and
response objects (many use WebOb, but it's not standard).

I admit using JSGI as both a gateway interface and very simple
framework foundation, built around middleware, is appealing to me.

We have already essentially unified the JSGI response and Jack's
Response object. The latter is the former plus some helper functions.
Perhaps we should do the same with JSGI's env and Jack's Request.

There could be two "levels" to JSGI:

"Level 0": the very minimal "gateway" API, which would be equivalent
to what we have now, perhaps with camelcase names, and headers in
their own hash. JSGI compatible webservers only need to implement
Level 0.

"Level 1": a standard set of request/response helper APIs, which would
be compatible with Level 0 but also offer higher level APIs,
essentially like Jack's Response/Request. Frameworks would not be
required to use Level 1 for their request/response objects, but are
encouraged to if it fits.

At various points in the chain of middleware/apps, anyone is allowed
to replace the Level 0 "env" with a (backwards compatible) Level 1
"request" and all downstream middleware/apps will get that
functionality. The Level 1 Request object should of course be smart
enough to only do the replacement once.

(I haven't thought this through completely, just throwing the idea out
there)
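
A rough sketch of how the "replace once" step might look. Everything here is an assumption made for illustration: the Request constructor, the upgrade and withRequest names, and the camelCase "method" and "headers" properties on the Level 0 env are not defined by any spec or by Jack.

```javascript
// Hypothetical Level 1 request: carries every Level 0 property, plus helpers.
function Request(env) {
    for (var key in env) {
        if (Object.prototype.hasOwnProperty.call(env, key)) {
            this[key] = env[key]; // stay backwards compatible with Level 0
        }
    }
}

// Example helper; assumes a camelCase "method" property on the env.
Request.prototype.isGet = function () {
    return this.method === "GET";
};

// Upgrade a Level 0 env, or hand back an existing Level 1 request untouched.
function upgrade(env) {
    return env instanceof Request ? env : new Request(env);
}

// Middleware: everything downstream of this point receives a Level 1 request.
function withRequest(app) {
    return function (env) {
        return app(upgrade(env));
    };
}
```

Because upgrade checks instanceof first, stacking withRequest several times in one chain still wraps the env only once.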

>
>> A) "."
>> B) "_"
> C) Nested objects. (i.e., env.jsgi.foo). The spec can then say which
> keys are off-limits on the env object, and stipulate that every
> application should only hang data on their own namespace. (Or just
> say that new keys added to env must be lower-case, and cannot be
> "jsgi".)


My problem with nested objects is that it then requires null checking
at every namespace level. It's not a problem for the "jsgi" namespace
since it's guaranteed to be there, but for say
"jack.request.form_input" you end up with code like

if (env.jack && env.jack.request && env.jack.request.form_input)

instead of

if (env.jack_request_form_input)

The former is verbose, error prone, and potentially slower. It's
pretty common to check for the existence of these properties in
middleware.
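
To make the comparison concrete, here is a small sketch. The getNested helper is hypothetical, the kind of utility nested namespaces would push middleware authors to write; the flat key needs no helper at all.

```javascript
// Hypothetical helper: walk a dotted path, returning undefined if any level is missing.
function getNested(obj, path) {
    var parts = path.split(".");
    var current = obj;
    for (var i = 0; i < parts.length; i++) {
        if (current === null || typeof current !== "object") {
            return undefined;
        }
        current = current[parts[i]];
    }
    return current;
}

var nestedEnv = { jack: { request: { form_input: "a=1" } } };
var flatEnv = { jack_request_form_input: "a=1" };
var emptyEnv = {};

// Nested style: every lookup goes through the helper (or a chain of && checks).
var fromNested = getNested(nestedEnv, "jack.request.form_input");
var missingNested = getNested(emptyEnv, "jack.request.form_input");

// Flat style: a single property access, safe even when the key is absent.
var fromFlat = flatEnv.jack_request_form_input;
var missingFlat = emptyEnv.jack_request_form_input;
```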


-tom

Mike Wilson

Sep 8, 2009, 7:24:28 PM9/8/09
to comm...@googlegroups.com
Isaac Z. Schlueter wrote:
> On Sep 4, 6:29 am, "Mike Wilson" <mike...@hotmail.com> wrote:
> > { method:..., url:..., headers:..., body:... }
>
> You watching my commits, or is this just a case of gmta?
> http://j.mp/3HXkss
> ^_^

:-)

> What about something like this?
>
> {
> method : string, the request method
> url : string, the url that was requested

This may depend on what kind of url this refers to.
If it is the "Request-URI" on the "Request-Line" (per the HTTP spec),
which includes path+queryString but not protocol+host+port,
then prior art says:
Java: .requestUri
.NET: .rawUrl

If it should be a more "complete" URL, then I think it should
be designed and named to blend in with the other props inherited
from CGI (queryString etc).

> headers : {
> case-insensitive key/value pairs. MAY be all UC/LC at the whim of
> the implementation.

Fwiw, Apache Tomcat converts all incoming header names to
lowercase, and then through its JEE getHeader(name) method
uses this for quick lookup by first converting the supplied
name to lowercase.

As we discussed in another thread though, it has a
different behaviour for response headers, as it preserves
casing for them. As this is probably desired for response
headers maybe the same solution should be applied to request
headers. That would demand method access to the headers
collection, and not "plain js object" modification.
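
A minimal sketch of what such method access could look like: lookups ignore case, while the casing supplied by the caller is kept for output. The Headers name and its set/get/forEach methods are hypothetical, not part of JSGI, Jack, or Tomcat.

```javascript
// Hypothetical header collection: case-insensitive lookup, original casing preserved.
function Headers() {
    // lower-cased name -> { name: casing as last set, value: string }
    this._entries = Object.create(null);
}

Headers.prototype.set = function (name, value) {
    this._entries[name.toLowerCase()] = { name: name, value: String(value) };
};

Headers.prototype.get = function (name) {
    var entry = this._entries[name.toLowerCase()];
    return entry ? entry.value : undefined;
};

// Iterate using the preserved casing, e.g. when writing headers to the wire.
Headers.prototype.forEach = function (callback) {
    for (var key in this._entries) {
        callback(this._entries[key].name, this._entries[key].value);
    }
};
```

Setting "ETag" and later "etag" overwrites the same entry, so duplicate keys that differ only in case cannot occur.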

> Perhaps specify swapping css-case for camelCase or UPPER_SNAKE?

I would prefer to have the headers as "literal" as possible.

> Also needed: some place to put non-header stuff, like queryString
> and server environment stuff and whatnot. "env" seems as good a
> key as any other.

Looking at Java and .NET, many CGI-inspired properties have
been put on the "root" of the request object, f ex
queryString, pathInfo, pathTranslated, remoteHost, remoteUser,
contentType and contentLength.
I'd say that the most important, or required, properties
should go in the root, and the rest are probably headers or
"attributes" (in Java this is key/value pairs that any code
may add). env/notes/extras/attributes/variables/properties are
all possible names for this collection.

Best regards
Mike

Mike Wilson

Sep 8, 2009, 7:24:42 PM9/8/09
to comm...@googlegroups.com
Hi Tom,

I'm mostly with Daniel here, with a couple of inline
comments:

Daniel Friesen wrote:
> I'd rather have some cleaner, saner names.

+1

> As for env namespacing. _ just doesn't strike me as a namespacing
> character... It feels like a space replacement. I'd prefer actual
> namespaces myself.

I can live with both
request.attributes/env/etc["JSGI_X"]
request.attributes/env/etc["JSGI_Y"]
and
request.jsgi.x
request.jsgi.y
or even
request.jsgiX
request.jsgiY
if the namespace doesn't have so many properties and they are all
required.

> Actually, if we do use prettier names meaning .method, .headers,
> .body... Then doesn't that already mean we're now already using a
> multi-level object?
> If that's the case, why not go all the way and have an option C) for
> actual namespaces (js objects).
>
> {
> method: "GET",
> ...,
> headers: { ... },
> body: undefined,
> jsgi: {
> version: [0,3],
> url_scheme: 'http', // instead of scheme why not proto:/protocol: or ssl: false?
> multithread: false,
> multiprocess: false,
> run_once: false
> }
> }

I would move the "scheme" to a root property as this has more to
do with the