Message:
Hello golan...@googlegroups.com (cc: golan...@googlegroups.com),
I'd like you to review this change to
https://go.googlecode.com/hg/
Description:
http: content-type negotiation and Accept header parsing.
This introduces two functions and one type. The type is Accept
which represents a clause in an Accept header. The functions
are ParseAccept which parses such a header and Negotiate
which, given a header and a list of alternatives content types
returns the most appropriate result.
It is possible that the Negotiate should default by returning
the first of the alternative in the case that the provided
header is empty. This doesn't appear to happen often in
practice and in any case perhaps it is the calling function's
job to make sure that the header passed in is valid.
Please review this at http://codereview.appspot.com/4528086/
Affected files:
M src/pkg/http/Makefile
A src/pkg/http/autoneg.go
A src/pkg/http/autoneg_test.go
type Accept struct{}
type AcceptList []Accept
func (al AcceptList) Negotiate(contentType ...string) string {
// ...
}
func (r *Request) Accept() AcceptList {
}
?
* right now there is an accept_slice, which is like AcceptList. It exists for
using the sort package and I purposely did not want to expose the sort
implementation publicly, not least because it is backwards - sorts from
largest to smallest - which could be confusing if someone tried to use it
and I can't imagine why they would try to use it anyways. So probably
it makes sense to keep the private accept_slice and make AcceptList
a separate type that exists only for the purpose of the Negotiate method.
Doing this rather than just having a function is just a matter of style and
I'm happy to defer to your judgement.
* Maybe it should handle errors in the Accept header more strictly. With an
invalid header right now it will try to do the best it can,
skipping over anything
it doesn't understand how to parse. Returning an error also from Accept
would be a stronger statement about the completeness of the implementation
then the best effort that it makes now.
* Likewise for corner cases in the Negotiate function. What should it do if
the AcceptList is empty? What should it do if the list of alternatives is
empty? On a high level what should happen in the former case is probably
return the first alternative, and the second should result in a 406 Not
Acceptable. Right now it returns an empty string in both cases which doesn't
distinguish. So also returning an error here would probably make sense?
* Hiding the explicit parse function is probably ok. I can't really see why it
would be desirable to want to parse such a header string outside of the
context of an http.Request.
I'll work on rejigging it now...
Cheers,
-w
--
William Waites
Email: wwa...@gmail.com
UK tel: +44 131 516 3563
UK mob: +44 789 798 9965
Please take another look.
struct HeaderField type {
Value string
Q int
Params map[string]string
}
and ParseHeaderFields that parses and sorts it. It could be reused in
parsing Accept-Encoding, Accept-Charset, and Accept-Language.
http://codereview.appspot.com/4528086/diff/9001/src/pkg/http/autoneg.go
File src/pkg/http/autoneg.go (right):
http://codereview.appspot.com/4528086/diff/9001/src/pkg/http/autoneg.go#newcode14
src/pkg/http/autoneg.go:14:
No empty line here.
http://codereview.appspot.com/4528086/diff/9001/src/pkg/http/autoneg.go#newcode25
src/pkg/http/autoneg.go:25: // Structure to represent a clause in an
HTTP Accept Header
. and the end of sentences (everywhere)
http://codereview.appspot.com/4528086/diff/9001/src/pkg/http/autoneg.go#newcode33
src/pkg/http/autoneg.go:33: type accept_sorter []Accept
acceptSorter
http://codereview.appspot.com/4528086/diff/9001/src/pkg/http/autoneg.go#newcode64
src/pkg/http/autoneg.go:64: func (al AcceptList) Negotiate(alternatives
...string) (content_type string, err os.Error) {
contentType
http://codereview.appspot.com/4528086/diff/9001/src/pkg/http/autoneg.go#newcode108
src/pkg/http/autoneg.go:108: media_range := mrp[0]
mediaRange or something shorter.
http://codereview.appspot.com/4528086/diff/9001/src/pkg/http/autoneg.go#newcode122
src/pkg/http/autoneg.go:122: err = os.ErrorString(errs)
err = os.NewError("autoneg: invalid media range in " + part)
The thing is, the sorting algorithm is related but not the same. For
Accept we have type/subtype with facilities for wildcards, for Accept-Language
we have lang[-country]. I think but am not sure without checking that
Accept-Encoding and Accept-Charset would be the same, but still
different from the other two. Similarly with the matching algorithm.
If you have time to generalise the implementation in such a way as to
handle all of these cases, please feel free.
Cheers,
-w
> The thing is, the sorting algorithm is related but not the same. For
> Accept we have type/subtype with facilities for wildcards, for
Accept-Language
> we have lang[-country]. I think but am not sure without checking that
> Accept-Encoding and Accept-Charset would be the same, but still
> different from the other two. Similarly with the matching algorithm.
If sorting doesn't apply, we can parse first into the list of
HeaderField and then apply different sortings for AcceptList,
LanguageList, etc.
The reason I didn't do this originally is that it would mean that the
Accept (or HeaderField) contains only partially parsed values - I
thought it cleaner to have the parsing code in the parsing function
and the sorting and matching code in the sorting and matching
functions.
Looking at RFC2616 at the other negotiation headers, it is indeed
the case that Charset and Encoding would be the same. However
none of Charset, Encoding or Language have any concept of extra
parameters.
So the choices seem to be:
* Have three different-but-similar types and three different-but-similar
parsers
* Have one type and move some of the parsing into the sort/match
functions which have to be different anyways
* Replace HeaderField.Value with an interface that can be used for
sorting and a two-pass parser and have a special type for Accept
that perhaps embeds HeaderField and adds a map for params
The types are not complicated (Accept is actually the most
complicated) and neither are the parsers. The redundancy implied by
the first option (more or less where we are now) is not that great and
it keeps the code simple to understand.
The second option would be the straightforward way of implenting what
you suggest, but having parsing in the sorter is ugly as I have mentioned.
The third stands to remove all possible redundancy in the code and might
ultimately be the nicest but the effort/benefit tradeoff for removing such
a small amount of redundancy (and adding structural complexity to an
otherwise straightforward implementation) seems wrong to me, but I
wouldn't object if someone else wanted to do it.
ParseHeaderFields(headerValue string) []HeaderField
It would return just an unsorted slice of header fields:
ParseHeaderFields("text/html;q=0.7, text/*")
=>
[&HeaderField{Value: "text/html", Q: 0.7, ...}, &HeaderField{Value:
"text/*", Q: 1, ...}]
ParseHeaderFields("en-us;q=0.8, de;q=0.1")
=>
[&HeaderField{Value: "en-us", Q: 0.8, ...}, &HeaderField{Value: "de", Q:
0.1, ...}]
Then []Accept, []Languages, []Charsets thing can sort them as needed and
return their own values.
This way we need to write parser only once, minimizing the effect of the
"Don't parse" rule (http://cr.yp.to/qmail/guarantee.html) :-)
All of these headers are quite well structured and rigorously defined
in the RFC.
Moreover we have to parse them to do the sorting and matching because those
operations depend on the internal structure of things you're suggesting to leave
unparsed.
Right now the parsing stages and sorting stages *are* separate.
Except, you're receiving an untrusted input.
> Moreover we have to parse them to do the sorting and matching because
those
> operations depend on the internal structure of things you're
suggesting to leave
> unparsed.
You'll get what you need for Accept sorting in HeaderField.Value.
> Right now the parsing stages and sorting stages *are* separate.
But I won't be able to use this parser to implement []Languages, so I'll
have to write another one.
And if it can't be parsed, an error is returned which the application can
handle as it wishes/
> You'll get what you need for Accept sorting in HeaderField.Value.
Which I then have to parse (for Accept and Language) inside the
sorting function... And the matching function.
>> Right now the parsing stages and sorting stages *are* separate.
>
> But I won't be able to use this parser to implement []Languages, so I'll
> have to write another one.
Its a pretty simple parser. The format of the string being parsed is
different. I don't think this is much of a burden.
> And if it can't be parsed, an error is returned which the application
can
> handle as it wishes/
Provided that the parser code is correct.
> > You'll get what you need for Accept sorting in HeaderField.Value.
> Which I then have to parse (for Accept and Language) inside the
> sorting function... And the matching function.
You do basically the same, but instead of the long parsing method, you
have to care only about individual values.
parseAccept(fields []HeaderField) []Accept {
...
}
> >> Right now the parsing stages and sorting stages *are* separate.
> >
> > But I won't be able to use this parser to implement []Languages, so
I'll
> > have to write another one.
> Its a pretty simple parser. The format of the string being parsed is
> different. I don't think this is much of a burden.
I agree that it's simple, but is it different?
value;q=...;extension=..., *
Extension does not exist for Charset, Language, Encoding.
> Extension does not exist for Charset, Language, Encoding.
And thus, ignored when filling
type Language struct {
Script string
Region string
Q float32
}
from HeaderField
So you don't want to parse the media type in the generic parser
but you do want to parse the extension parameters which are
illegal in the other headers?
I've just updated with your cosmetic change suggestions.
It seems we have been doing the same thing during the weekend by pure chance.
type Variant struct {
// contains filtered or unexported fields
}
The http.Variant is an immutable resource representation option.
func NewVariant(mediaType string, languages ...string) *Variant
Gets a new http.Variant.
The mediaType may contain a charset parameter to specify its
character encoding.
If either the mediaType or languages are malformed NewVariant
returns nil.
func SelectVariant(available []Variant, request *Request, response
ResponseWriter) *Variant
SelectVariant returns the most appropriate Variant or nil if no match was
found.
When nil is returned then no futher action to the ResponseWriter is required.
func (v *Variant) Apply(header Header)
Applies the appropriate content headers for the response message.
The Accept, Accept-Charset and Accept-Language header parsing is done and
tested but the rest is still under development. I can send you a copy if you
want to?
Idealy I'd like to end up with func (mux *ServeMux) HandleVariant(pattern
string, handler Handler, content *Variant). What do you think?
Actually, discussing unrelated things in IRC, what would be *really* nice is
for the ebnf package's Grammar to be able to unmarshal things in a similar
way to the json and xml packages. That would be a nice general solution.
http://codereview.appspot.com/4528086/diff/9001/src/pkg/http/autoneg.go#newcode163
src/pkg/http/autoneg.go:163: // such a client would be quite broken...
isn't this a comma-separated field? multiple headers should mean the
same thing according to RFC 2616, no?
That is:
Accept: application/xml,application/xhtml+xml
and:
Accept: application/xml
Accept: application/xhtml+xml
Are the same, aren't they?