The existing behaviour is that when the http package is writing the
response headers, if no explicit Content-Type has been set,
"text/html; charset=utf-8" is used.
I think that's a mistake, and my arguments are below.
(1) Setting a default Content-Type, while convenient, is not Go-like.
It is backward-looking, not forward-looking.
One of, if not the most important argument put forth that the
Content-Type should be defaulted to this is that older browsers
(particularly Internet Explorer) do a bad job of content-sniffing when
they don't receive a Content-Type.
However, newer browsers tend to be better behaved, *except* when you
give them the wrong Content-Type (see point 2). We're optimising for a
dying breed of clients, and *only* when the programmer doesn't declare
what Content-Type they are generating.
If we care this much that a Content-Type is set, perhaps the http
package should throw errors if content is written without a
Content-Type being set.
(2) This particular default, "text/html; charset=utf-8" is not almost
always the right one.
When we dropped the laddr argument to net.Dial, that was a useful
change because the programmer almost always wants laddr to be "", and
there were other options for the rare instance where the programmer
wanted something else. That's not the case with Content-Type.
It's true that many uses of the http package will be sending UTF-8
encoded HTML back, but it's only a majority case, and probably only a
slim majority at that. Other responses include image/*, text/plain,
application/json, application/octet-stream, and so on. It would be
better for there to be *no* Content-Type sent with those responses
than an *incorrect* Content-Type for many reasons, not least of which
that browsers behave unpredictably when given an incorrect
Content-Type.
A small anecdote: I was a teaching assistant at Google I/O BootCamp
this year, and I came across one attendee who was horribly confused.
Their tiny HTTP handler looked something like this:
    func serve(w http.ResponseWriter, r *http.Request) {
        t := T{"something", 4}
        fmt.Fprintf(w, "{ %s , %d }", t.Name, t.Age)
    }
Their browser (I can't remember; it might have been Firefox) was
throwing up an obscure XML error message trying to parse the response,
and it was because the Content-Type was silently set to "text/html;
charset=utf-8". That's not a good first experience, and it wasn't easy
to explain.
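For comparison, a minimal sketch of the same handler with the media type declared explicitly. The struct T and its field names are assumed from the snippet above, the choice of text/plain is an assumption about what the attendee intended, and contentTypeOf is a hypothetical helper that uses the present-day net/http/httptest package rather than the 2011 http package.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// T mirrors the struct from the anecdote; the field names are assumed.
type T struct {
	Name string
	Age  int
}

// serve declares its media type before writing the body, so no
// default or sniffed Content-Type is involved.
func serve(w http.ResponseWriter, r *http.Request) {
	t := T{"something", 4}
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	fmt.Fprintf(w, "{ %s , %d }", t.Name, t.Age)
}

// contentTypeOf (hypothetical helper) runs a handler against an
// in-memory recorder and returns the recorded Content-Type.
func contentTypeOf(h http.HandlerFunc) string {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest("GET", "/", nil))
	return rec.Result().Header.Get("Content-Type")
}

func main() {
	fmt.Println(contentTypeOf(serve))
}
```

The one extra line makes the declared type visible in the handler itself, which is the point argued under (5).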
In short, the Web is not the entirety of the Internet, and HTML is not
the only thing sent over HTTP.
(3) Bad programs are still going to get it wrong.
A program that doesn't care (or forgets) to explicitly set a
Content-Type header is not guaranteed to be generating valid
UTF-8-encoded HTML.
(4) We're violating the RFC.
The HTTP RFC specifies that a Content-Type header SHOULD be included,
and that a client MAY guess if the header isn't there. That's the
protocol supported by an increasing majority of browsers; while we're
trying to be clever to work around bad behaviour of older, dying
clients, we're mucking up the behaviour of newer, well-behaved
clients. RFCs aren't the be-all and end-all, but standards are most
useful when things conform to them, and short of compelling reasons
(old IE support not being one of those) we should follow the standard.
(5) It's magical.
I expect a HTTP package to do the right kinds of protocol work and
header formatting for me, and maybe even set things like
Content-Length that it can perfectly deduce. I don't expect a HTTP
package to declare to the world what my Content-Type is, especially
when it is a static default. It's not the way that the vast majority
of other widely used HTTP packages work, and it's surprising.
Dave.
> (1) Setting a default Content-Type, while convenient, is not Go-like.
> It is backward-looking, not forward-looking.
It is very Go like to make the API as simple as possible.
The most common thing you want to do in a web server
is serve web pages, and there the content type should
be text/html; charset=utf-8, hence the default.
The charset=utf-8 is important, because it is standard in Go.
Most people who write these servers will forget that part.
That's why it's there automatically.
> One of, if not the most important argument put forth that the
> Content-Type should be defaulted to this is that older browsers
> (particularly Internet Explorer) do a bad job of content-sniffing when
> they don't receive a Content-Type.
> However, newer browsers tend to be better behaved, *except* when you
> give them the wrong Content-Type (see point 2). We're optimising for a
> dying breed of clients, and *only* when the programmer doesn't declare
> what Content-Type they are generating.
I don't buy this at all. Not setting Content-Type is equivalent
to using an uninitialized variable: it might happen to work out,
but it's not guaranteed. The safe thing is to initialize the variable
to a defined default, and then you'll get consistent behavior
everywhere.
> (2) This particular default, "text/html; charset=utf-8" is not almost
> always the right one.
>
> It's true that many uses of the http package will be sending UTF-8
> encoded HTML back, but it's only a majority case, and probably only a
> slim majority at that. Other responses include image/*, text/plain,
> application/json, application/octet-stream, and so on. It would be
> better for there to be *no* Content-Type sent with those responses
> than an *incorrect* Content-Type for many reasons, not least of which
> that browsers behave unpredictably when given an incorrect
> Content-Type.
At least they behave the same. Sure there are other possible
content-types. That's why it's not hard-coded. Most handlers
people write send HTML.
> A small anecdote: I was a teaching assistant at Google I/O BootCamp
> this year, and I came across one attendee who was horribly confused.
> Their tiny HTTP handler looked something like this:
>     func serve(w http.ResponseWriter, r *http.Request) {
>         t := T{"something", 4}
>         fmt.Fprintf(w, "{ %s , %d }", t.Name, t.Age)
>     }
> Their browser (I can't remember; it might have been Firefox) was
> throwing up an obscure XML error message trying to parse the response,
> and it was because the Content-Type was silently set to "text/html;
> charset=utf-8". That's not a good first experience, and it wasn't easy
> to explain.
Huh? How is `{ something, 4 }` not valid HTML?
> (3) Bad programs are still going to get it wrong.
>
> A program that doesn't care (or forgets) to explicitly set a
> Content-Type header is not guaranteed to be generating valid
> UTF-8-encoded HTML.
A program that sets it is not guaranteed to do so either.
This is not a valid argument.
> (4) We're violating the RFC.
>
> The HTTP RFC specifies that a Content-Type header SHOULD be included,
> and that a client MAY guess if the header isn't there. That's the
> protocol supported by an increasing majority of browsers; while we're
> trying to be clever to work around bad behaviour of older, dying
> clients, we're mucking up the behaviour of newer, well-behaved
> clients. RFCs aren't the be-all and end-all, but standards are most
> useful when things conform to them, and short of compelling reasons
> (old IE support not being one of those) we should follow the standard.
You have a very different interpretation of the RFC than I do.
My reading of those words is that setting Content-Type is
preferable to not setting it.
> (5) It's magical.
>
> I expect a HTTP package to do the right kinds of protocol work and
> header formatting for me, and maybe even set things like
> Content-Length that it can perfectly deduce. I don't expect a HTTP
> package to declare to the world what my Content-Type is, especially
> when it is a static default. It's not the way that the vast majority
> of other widely used HTTP packages work, and it's surprising.
It's not magical; it's a default setting.
Russ
Why do you change the content type for html5?
I thought you were supposed to write <!DOCTYPE html> ?
> I wrote the current code, so just to give the rationale...
>
>> (1) Setting a default Content-Type, while convenient, is not Go-like.
>> It is backward-looking, not forward-looking.
>
> It is very Go like to make the API as simple as possible.
It's Go-like in its simplicity, but not in its practicality. And it
seems to be tilted towards older, dying browsers, rather than newer,
rising browsers; *that* is what this point was about.
It's Go-like to be explicit about things; an extra line of code isn't
going to kill people, and we demand it in many situations. Providing
this default in this circumstance stands out.
>> One of, if not the most important argument put forth that the
>> Content-Type should be defaulted to this is that older browsers
>> (particularly Internet Explorer) do a bad job of content-sniffing when
>> they don't receive a Content-Type.
>> However, newer browsers tend to be better behaved, *except* when you
>> give them the wrong Content-Type (see point 2). We're optimising for a
>> dying breed of clients, and *only* when the programmer doesn't declare
>> what Content-Type they are generating.
>
> I don't buy this at all. Not setting Content-Type is equivalent
> to using an uninitialized variable: it might happen to work out,
> but it's not guaranteed. The safe thing is to initialize the variable
> to a defined default, and then you'll get consistent behavior
> everywhere.
It's nothing like using an uninitialized variable. It's more like
using a zero value. It's well-defined, and the fact that older, dying
browsers misbehave is orthogonal to that.
You're right that the safe thing to do is to initialise it to the
right thing, but it's the programmer who knows best what that right
thing is, and the right thing is *not* always "text/html;
charset=utf-8".
>> (2) This particular default, "text/html; charset=utf-8" is not almost
>> always the right one.
>>
>> It's true that many uses of the http package will be sending UTF-8
>> encoded HTML back, but it's only a majority case, and probably only a
>> slim majority at that. Other responses include image/*, text/plain,
>> application/json, application/octet-stream, and so on. It would be
>> better for there to be *no* Content-Type sent with those responses
>> than an *incorrect* Content-Type for many reasons, not least of which
>> that browsers behave unpredictably when given an incorrect
>> Content-Type.
>
> At least they behave the same. Sure there are other possible
> content-types. That's why it's not hard-coded. Most handlers
> people write send HTML.
No, they don't behave the same. IE does some sniffing, and will ignore
Content-Type if it looks too incorrect for some classes of MIME types.
Firefox throws weird errors. Chrome usually takes the Content-Type at
face value.
The HTML case is probably a majority, but I'd wager it's more like a
60% majority than a 99% majority. We should make it easy, but I
disagree it should be the default.
>> A small anecdote: I was a teaching assistant at Google I/O BootCamp
>> this year, and I came across one attendee who was horribly confused.
>> Their tiny HTTP handler looked something like this:
>>     func serve(w http.ResponseWriter, r *http.Request) {
>>         t := T{"something", 4}
>>         fmt.Fprintf(w, "{ %s , %d }", t.Name, t.Age)
>>     }
>> Their browser (I can't remember; it might have been Firefox) was
>> throwing up an obscure XML error message trying to parse the response,
>> and it was because the Content-Type was silently set to "text/html;
>> charset=utf-8". That's not a good first experience, and it wasn't easy
>> to explain.
>
> Huh? How is `{ something, 4 }` not valid HTML?
It's not valid HTML. HTML starts with a tag, whether <HTML> or a <!DOCTYPE>.
I suspect the browser was guessing that it might have been JavaScript.
>> (3) Bad programs are still going to get it wrong.
>>
>> A program that doesn't care (or forgets) to explicitly set a
>> Content-Type header is not guaranteed to be generating valid
>> UTF-8-encoded HTML.
>
> A program that sets it is not guaranteed to do so either.
> This is not a valid argument.
A program that sets it is much more likely to get it right, because
the value is visible to the programmer, and they at least had to do
something to put it there. If they get it wrong, and something
misbehaves, they will see the Content-Type value they set, as opposed
to having to memorise the default that the http package applies
silently.
>> (4) We're violating the RFC.
>>
>> The HTTP RFC specifies that a Content-Type header SHOULD be included,
>> and that a client MAY guess if the header isn't there. That's the
>> protocol supported by an increasing majority of browsers; while we're
>> trying to be clever to work around bad behaviour of older, dying
>> clients, we're mucking up the behaviour of newer, well-behaved
>> clients. RFCs aren't the be-all and end-all, but standards are most
>> useful when things conform to them, and short of compelling reasons
>> (old IE support not being one of those) we should follow the standard.
>
> You have a very different interpretation of the RFC than I do.
> My reading of those words is that setting Content-Type is
> preferable to not setting it.
Setting it to a correct value, yes, but for a good portion of the time
"text/html; charset=utf-8" is *not* the correct value.
>> (5) It's magical.
>>
>> I expect a HTTP package to do the right kinds of protocol work and
>> header formatting for me, and maybe even set things like
>> Content-Length that it can perfectly deduce. I don't expect a HTTP
>> package to declare to the world what my Content-Type is, especially
>> when it is a static default. It's not the way that the vast majority
>> of other widely used HTTP packages work, and it's surprising.
>
> It's not magical; it's a default setting.
A default setting that, incidentally, is not documented.
But even if it were, it's still unusual amongst HTTP packages, and
surprising to me and others.
Dave.
So let me get this straight. You state that:
- most of the time the present default is correct,
- sometimes it is not,
- people might not set the content-type,
- therefore we should force them to set it every time.
IMO, your proposed change merely increases the likelihood of people
getting it wrong.
It is reasonable to expect people to set the content-type in the
minority cases, but why should I have to add (literally) hundreds of
lines to my existing web projects? It is boilerplate and that sucks.
> A default setting that, incidentally, is not documented.
It should be documented, then.
+1 to everything Russ said.
Andrew
> So let me get this straight. You state that:
> - most of the time the present default is correct,
I don't even know that UTF-8-encoded HTML is "most of the time".
As for some concrete data, I went to
http://www.google.com/codesearch?hl=en&lr=&q=lang%3Ago+func.*http.ResponseWriter&sbtn=Search
to get a feel for what types of data people are generating. Here's a summary:
1. Ambiguous. It's a framework.
2. Produces an incorrect HTTP response, because it's writing plain
text, not HTML.
3. Ambiguous, but certainly gets it wrong in error cases because
it's writing plain text, not HTML.
4. Gets it wrong, because it's not producing HTML.
5. Aah, finally some HTML. This benefits from the default.
6. Gets it right in one place, because it explicitly sets a
Content-Type, but gets it wrong in almost every other because it isn't
HTML.
7. Has a helper to explicitly set a Content-Type correctly.
I got tired after those seven, but they seem like they might be
representative. You'll notice that only 1 in 7 benefits from the
default. So I retract my statement that HTML is the majority case,
because it seems not to be.
> IMO, your proposed change merely increases the likelihood of people
> getting it wrong.
At the moment, the average programmer gets it 100% wrong in the
non-HTML case; I think that's a big case, if not a strict majority.
With my proposal, the average programmer will get it 0% wrong if they
don't set a Content-Type, and will usually get it close to right if
they do set it.
Not including the "charset=utf-8" is *not* getting it wrong; it's just
suboptimal. And if a programmer writes code that sets Content-Type to
image/png, but then produces JSON, then there's not much we can do to
stop them.
Dave.
> It is boilerplate and that sucks.
Oh, and it might feel like boilerplate, but it's not. It's a
legitimate HTTP response to not include Content-Type, so you don't
*have* to include it.
Dave.
You are adding complexity. Any server can use a handler that does
whatever it likes before passing the buck to other handlers.
Russ
To put it a different way, can you point to any authority that
says that omitting Content-Type in HTTP responses is the
new Right Thing To Do?
Russ
FWIW, when I was reviewing the http/fcgi package, I made (or was going
to make, I forget exactly) a comment that it shouldn't add
Content-Type by default, and was surprised to learn that the http
package does.
I would still prefer a naked HTTP response, but I can see Russ'
rationale for the current behavior, and am willing to let him pick the
bikeshed color.
You're not really stating my position quite accurately. Here's a list,
from best to worst, in my opinion:
- setting the Content-Type correctly
- not setting the Content-Type
- setting the Content-Type incorrectly
I'd prefer the second over the third, and the third occurs much more
regularly with the current default.
RFC 2616 section 7.2.1 says
Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body. If
and only if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of its
content and/or the name extension(s) of the URI used to identify the
resource. If the media type remains unknown, the recipient SHOULD
treat it as type "application/octet-stream".
I'd take it as implicit in that first sentence that the media type
should be the correct one.
And as far as RFCs are concerned, SHOULD is not MUST. I think an
incorrect value is worse than no value, and I reckon that at least Jon
Postel would agree.
Dave.
> Is it possible under the current setup to explicitly NOT send a
> content-type?
It's currently impossible. That bit us while implementing the
blobstore API for App Engine, incidentally, though that's a bit more
of a niche situation.
Dave.
> * RFCs in general, and especially RFC 2616, are often accidentally,
> optimistically, or delusionally wrong. Reality trumps spec ambiguity.
Reality is that this default is incorrect for what seems to be a
majority of code. The RFC reference is but one of my points.
> * Changing this would break a lot of code. What's your proposed migration
> path? I can't think of a good one.
I can't see how this would break any code. It would, in fact, fix the
HTTP response of what seems to be a majority of code that does not set
an explicit Content-Type.
> * So that leaves people who forget to set it. Now somebody has to sniff.
> Is that browsers (n outcomes) or Go (1 outcome). If anybody is sniffing, I
> would prefer it be us.
Browsers are already sniffing. Why do you necessarily think we'd do a
better job?
Incidentally, I'm not opposed to us adding some sniffing on the Go
side, if that's what people really, truly want. But even then, if the
sniffing isn't confident it should still default to nothing.
Dave.
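A minimal sketch of "sniff, but default to nothing when not confident", assuming a present-day net/http, whose http.DetectContentType falls back to "application/octet-stream" when it recognises nothing. Treating that fallback as "not confident" is this sketch's own assumption, and sniffedType is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniffedType (hypothetical) reports a media type only when sniffing
// found something more specific than the generic binary fallback.
func sniffedType(body []byte) (string, bool) {
	ct := http.DetectContentType(body)
	if ct == "application/octet-stream" {
		// Assumption: the fallback means "don't know", so send nothing.
		return "", false
	}
	return ct, true
}

func main() {
	samples := [][]byte{
		[]byte("<html><body>hello</body></html>"),
		[]byte("\x89PNG\r\n\x1a\n"),
		{0x00, 0x01, 0x02},
	}
	for _, s := range samples {
		ct, ok := sniffedType(s)
		fmt.Printf("%q confident=%v\n", ct, ok)
	}
}
```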
On Thu, Jun 2, 2011 at 3:35 PM, Brad Fitzpatrick <brad...@golang.org> wrote:
> So we gave them confidence to ship their buggy code. Yay us. No, I'd
> rather we break their Content-Type-less JSON immediately and force them to
> do the right thing. Not get bitten later.
So if a Content-Type isn't set, we should log loudly or even panic.
Setting Content-Type to "text/html; charset=utf-8" when the content is
JSON is going to behave differently on different browsers, so we'd
have to hope that they're testing every response their (who?) server
makes on every browser.
>> > * So that leaves people who forget to set it. Now somebody has to
>> > sniff.
>> > Is that browsers (n outcomes) or Go (1 outcome). If anybody is
>> > sniffing, I
>> > would prefer it be us.
>>
>> Browsers are already sniffing. Why do you necessarily think we'd do a
>> better job?
>
> When I write code in Go, I expect it to run the same on Linux, Mac, FreeBSD,
> Windows, Chrome, Firefox, or MSIE.
> Portability means not leaving API behavior up to the environment, be that
> system calls or content-type sniffing.
Then you shouldn't want this default Content-Type. It results in HTTP
responses that do *not* behave the same across browsers.
Dave.
Thank you for sharing with us.
But I'm still not sure what the *real* problem you are trying to fix is,
so I'm trying to figure it out.
> (1) Setting a default Content-Type, while convenient, is not Go-like.
> It is backward-looking, not forward-looking.
Package http API design issue?
(if so I have no preference)
> (2) This particular default, "text/html; charset=utf-8" is not almost
> always the right one.
Interoperability issue?
(seems like not)
> (3) Bad programs are still going to get it wrong.
Language dissemination issue?
(if so I have no preference)
> (4) We're violating the RFC.
I think it doesn't matter unless the http package breaks communication
between users of package http.
> (5) It's magical.
[...]
I guess words like "right/wrong" or "correct/incorrect" are very subjective;
more practical terms would help me understand your issue.
-- Mikio
> You said earlier you didn't care about MSIE, but if you do, let's just set
> "X-Content-Type-Options: nosniff" on all our responses:
> http://blogs.msdn.com/b/ie/archive/2008/09/02/ie8-security-part-vi-beta-2-update.aspx
That doesn't help IE6 or IE7. And if we're getting it wrong so much of
the time then we need to rely upon the browser to get the sniffing
right.
>> >> > * So that leaves people who forget to set it. Now somebody has to
>> >> > sniff.
>> >> > Is that browsers (n outcomes) or Go (1 outcome). If anybody is
>> >> > sniffing, I
>> >> > would prefer it be us.
>> >>
>> >> Browsers are already sniffing. Why do you necessarily think we'd do a
>> >> better job?
>> >
>> > When I write code in Go, I expect it to run the same on Linux, Mac,
>> > FreeBSD,
>> > Windows, Chrome, Firefox, or MSIE.
>> > Portability means not leaving API behavior up to the environment, be
>> > that
>> > system calls or content-type sniffing.
>>
>> Then you shouldn't want this default Content-Type. It results in HTTP
>> responses that do *not* behave the same across browsers.
>
> I remain unconvinced. I'm off to sleep now, but surprise me with a
> standalone Go server in the morning that demonstrates the problem in various
> browsers.
A quick Google search found this:
https://developer.mozilla.org/en/Properly_Configuring_Server_MIME_Types
Browsers based on Gecko 2 will stop accepting different-origin CSS
files with the wrong MIME type.
You say that you expect your Go code to run the same in lots of
environments, and that you don't want to leave API behaviour up to the
environment. When you write something that speaks a protocol, then,
it's your job to follow that protocol. And, to a first degree, your
program is *not* speaking browser, it's speaking HTTP. The Go http
package is currently lying about the Content-Type in many situations,
and violating that protocol.
We've all had to deal with systems that don't follow the rules and are
broken in some regard. Why on earth would we want Go to be that kind
of system?
Furthermore, by propagating this notion that "text/html;
charset=utf-8" is a sensible default when we don't even try to see
whether it's HTML we're producing, we're telling HTTP clients: "Hey,
don't trust our Content-Type, especially if it says text/html. You
better sniff the content and take a stab in the dark."
To turn it around: what's the benefit of having this default?
- it saves one line of (strictly speaking, optional) code, that,
without it, can sometimes confuse IE6.
That's it? Seriously?
Dave.
No, we are telling clients "believe what I say" so that they all
behave the same instead of some guessing right and some
guessing wrong.
Also, there are two guesses involved here: text/html and
charset=utf-8. While it might be easy (but not always)
to tell whether something is HTML, it is often very difficult
in predominantly ASCII pages to tell UTF-8 from other encodings.
I care much more about getting the charset tag out than
I do about the text/html part. That's one line I don't have
to look up every time I want to remember how to spell it
(which I did for years before writing this package).
Russ
>> Furthermore, by propagating this notion that "text/html;
>> charset=utf-8" is a sensible default when we don't even try to see
>> whether it's HTML we're producing, we're telling HTTP clients: "Hey,
>> don't trust our Content-Type, especially if it says text/html. You
>> better sniff the content and take a stab in the dark."
>
> No, we are telling clients "believe what I say" so that they all
> behave the same instead of some guessing right and some
> guessing wrong.
If I use a server, and it's telling me Content-Type=text/html for
things that are definitely not HTML, then I stop believing what the
server is saying. That's what's going on here.
I repeat: HTML is not the 99% case; it's probably not even a majority.
> Also, there are two guesses involved here: text/html and
> charset=utf-8. While it might be easy (but not always)
> to tell whether something is HTML, it is often very difficult
> in predominantly ASCII pages to tell UTF-8 from other encodings.
> I care much more about getting the charset tag out than
> I do about the text/html part. That's one line I don't have
> to look up every time I want to remember how to spell it
> (which I did for years before writing this package).
I can get behind having a charset default. Something like this would
be fine with me:
    ct := r.Header().Get("Content-Type")
    if strings.HasPrefix(ct, "text/") && strings.Index(ct, ";") == -1 {
        ct += "; charset=utf-8"
    }
Dave.
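The charset rule in the fragment above can be pulled out into a standalone function, as a minimal sketch. The name withDefaultCharset is hypothetical, and where such a function would be called inside the http package is left open.

```go
package main

import (
	"fmt"
	"strings"
)

// withDefaultCharset (hypothetical) appends "; charset=utf-8" to text/*
// media types that carry no parameters, and leaves everything else alone.
func withDefaultCharset(ct string) string {
	if strings.HasPrefix(ct, "text/") && !strings.Contains(ct, ";") {
		ct += "; charset=utf-8"
	}
	return ct
}

func main() {
	for _, ct := range []string{
		"text/plain",
		"text/html; charset=iso-8859-1",
		"image/png",
		"",
	} {
		fmt.Printf("%q -> %q\n", ct, withDefaultCharset(ct))
	}
}
```

Under this rule an unset Content-Type stays unset; only a text/* type the handler chose itself gains a charset.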
People who serve non-HTML from a web server expect that
they have to set Content-Type. And they do.
Russ
They don't. See my straw poll from further up thread. We're even
getting it wrong in the standard library.
Dave.
I think this is the root of the disagreement.
It is mind-boggling to me that people expect they can just
spit out any content at all and let the browsers figure it out.
Again, can you point at any reference that says this is
the new Right Way To Do It?
Russ
It's the old Right Way To Do It, per RFC 2616
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7.2.1).
"Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body. If and
only if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of its
content and/or the name extension(s) of the URI used to identify the
resource. If the media type remains unknown, the recipient SHOULD
treat it as type "application/octet-stream"."
We may be interpreting that differently, so let me give my interpretation.
The first sentence says: Content-Type should be set to what the media
type is. (not: Content-Type should be set at all costs, even if
incorrect)
The second sentence says: The browser may sniff the contents if the
Content-Type header is missing. (that implies that the Content-Type
header is optional)
The third sentence says: application/octet-stream is the true default type.
In reality, a majority of browsers implement sniffing. IE implements
more aggressive sniffing than what the standard permits.
I think it would be reasonable for us to log something angrily if a
http response is written and a Content-Type header was not explicitly
set. I think that would have a good corrective reaction for Go
servers.
Dave.
Setting charset on text is something I considered and may
even have done originally but that seemed much more magical
than having a default. It also doesn't handle the case where
the handler doesn't set text/html. I have seen enough UTF-8
mangled as Latin-1 on the web that I think this is important
enough to make sure it happens without any effort.
We all know what the default is.
We all know how to override the default.
Let's move on.
Russ