FWIW, wsgi_http2env is more or less an exact copy of a similar
routine in Apache itself, used by its mod_cgi modules to generate
the equivalent variable names for CGI. WSGI basically adheres to
CGI for that encoding convention.
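
For illustration only, here is a rough Python sketch of that kind of conversion. It is not a transcription of the C code in mod_wsgi or Apache; the function name http2env and the exact rule (prefix with HTTP_, upper-case ASCII letters and digits, turn every other character into an underscore) are assumptions made for the example.

```python
def http2env(header_name: str) -> str:
    """Map an HTTP header name to a CGI-style environ key (illustrative only)."""
    # Assumption: ASCII letters and digits are upper-cased and kept,
    # and every other character is replaced by an underscore.
    converted = "".join(
        ch.upper() if ch.isascii() and ch.isalnum() else "_"
        for ch in header_name
    )
    return "HTTP_" + converted


if __name__ == "__main__":
    print(http2env("User-Agent"))
    print(http2env("X-Foo%Bar"))
```

Under that assumed rule a '%' in a header name would not survive the conversion, and two different header names can end up with the same environ key.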
Graham
2011/8/11 Antony Chazapis <chaz...@gmail.com>:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
When you step through the various standards, you end up with:
The field-name must be composed of printable ASCII characters
(i.e., characters that have values between 33. and 126.,
decimal, except colon).
That only covers the header name in HTTP, though.
Either way, you definitely can't have arbitrary characters in it, so it
couldn't carry the byte string version of a Unicode string.
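
As a small sketch of the quoted field-name rule (printable ASCII, values 33 to 126, excluding colon), the check below works on raw bytes. The function name is_valid_field_name is made up for the example, and it implements only the rule as quoted above, nothing stricter.

```python
def is_valid_field_name(name: bytes) -> bool:
    """True if name is non-empty and every byte is in 33..126 and is not a colon."""
    return len(name) > 0 and all(
        33 <= b <= 126 and b != ord(":") for b in name
    )


if __name__ == "__main__":
    print(is_valid_field_name(b"X-Custom-Header"))
    # UTF-8 encoding of a non-ASCII name produces bytes above 126.
    print(is_valid_field_name("X-\u00dcber".encode("utf-8")))
```

The second call shows why the encoded form of a non-ASCII Unicode string cannot pass: its bytes fall outside the permitted range.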
For WSGI, that header name gets converted to a CGI meta variable name
as defined in:
http://www.ietf.org/rfc/rfc3875
as:
    meta-variable-name = "AUTH_TYPE" | "CONTENT_LENGTH" |
                         "CONTENT_TYPE" | "GATEWAY_INTERFACE" |
                         "PATH_INFO" | "PATH_TRANSLATED" |
                         "QUERY_STRING" | "REMOTE_ADDR" |
                         "REMOTE_HOST" | "REMOTE_IDENT" |
                         "REMOTE_USER" | "REQUEST_METHOD" |
                         "SCRIPT_NAME" | "SERVER_NAME" |
                         "SERVER_PORT" | "SERVER_PROTOCOL" |
                         "SERVER_SOFTWARE" | scheme |
                         protocol-var-name | extension-var-name
    protocol-var-name  = ( protocol | scheme ) "_" var-name
    scheme             = alpha *( alpha | digit | "+" | "-" | "." )
    var-name           = token
    extension-var-name = token
Where working back for token you get:
    alpha    = lowalpha | hialpha
    lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
               "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
               "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
               "y" | "z"
    hialpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
               "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
               "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
               "Y" | "Z"
    digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
               "8" | "9"
    alphanum = alpha | digit
    OCTET    = <any 8-bit byte>
    CHAR     = alpha | digit | separator | "!" | "#" | "$" |
               "%" | "&" | "'" | "*" | "+" | "-" | "." | "`" |
               "^" | "_" | "{" | "|" | "}" | "~" | CTL
    CTL      = <any control character>
    token    = 1*<any CHAR except CTLs or separators>
So, technically the code borrowed from Apache could well be too
restrictive, as on first read that grammar would appear to allow '%'
in a variable name.
I would have to do some more investigation as to why Apache does it that
way. Since for CGI the name becomes a process environment variable, maybe
there is some restriction for the sake of cross-platform compatibility.
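
To make the gap concrete, here is a hedged Python sketch that builds the set of token characters from the CHAR rule above and compares it with a letters, digits and dash only rule. The names TOKEN_CHARS, EXTRA_CHARS and is_token are invented for the example. One assumption not visible in the excerpt: the RFC's separator list includes "{" and "}", so those two are left out of the token set here.

```python
import string

# Characters from the CHAR rule that are neither CTLs nor separators.
# Assumption: "{" and "}" count as separators and are therefore excluded.
TOKEN_CHARS = frozenset(
    string.ascii_letters + string.digits + "!#$%&'*+-.`^_|~"
)

# Characters the grammar permits but an alphanumeric-plus-dash rule does not.
EXTRA_CHARS = sorted(
    TOKEN_CHARS - set(string.ascii_letters + string.digits + "-")
)


def is_token(name: str) -> bool:
    """True if name is a non-empty string of token characters."""
    return bool(name) and all(ch in TOKEN_CHARS for ch in name)


if __name__ == "__main__":
    print(is_token("X-Foo%Bar"))
    print("".join(EXTRA_CHARS))
```

On this reading '%' is a legal token character, which is the mismatch with the stricter alphanumeric handling discussed above.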
As far as accepted practice goes, I have never seen anyone use
anything in header names other than alphanumerics and dash.
Graham
2011/8/14 Antony Chazapis <chaz...@gmail.com>: