I think Ring's character encoding capabilities could use a little
love, and I'd like a little feedback on a couple of potential options.
The problem looks like this:
If I define a handler function like
(def a (constantly {:status 200
                    :headers {"Content-Type" "text/html"}
                    :body "<html></body>hεllo world</body></html>"}))
and serve it up using Ring's run-jetty function, I get a response like:
< HTTP/1.1 200 OK
< Date: Mon, 05 Apr 2010 15:47:49 GMT
< Content-Type: text/html; charset=iso-8859-1
< Content-Length: 47
< Server: Jetty(6.1.14)
<
<html></body>h?llo world</body></html>
That is to say, Jetty (or the servlet API?) doesn't serve responses
using UTF-8 by default. I can of course just set the Content-Type to
"text/html; charset=utf-8", but this becomes problematic if, say, I'd
like to define middleware that sets the charset to UTF-8 by default: it
would have to parse the current Content-Type header, look for a
charset, and conditionally set it (preserving the MIME type). It would
also be somewhat prone to being accidentally overridden by other
Content-Type-munging middleware down the road.
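For illustration, the default-charset middleware described above might
look roughly like the sketch below. The name wrap-default-charset is
hypothetical and not part of Ring, and the sketch assumes the header
key is spelled exactly "Content-Type" and that only text/* types should
get a charset.

```clojure
(require '[clojure.string :as str])

(defn wrap-default-charset
  "Hypothetical middleware (not part of Ring): appends `; charset=<charset>`
  to a text/* Content-Type that does not already declare a charset.
  Responses without a Content-Type, or with a non-text type, pass through."
  [handler charset]
  (fn [request]
    (let [response (handler request)
          ctype    (get-in response [:headers "Content-Type"])]
      (if (and ctype
               (str/starts-with? (str/lower-case ctype) "text/")
               ;; leave an explicitly chosen charset alone
               (not (re-find #"(?i);\s*charset\s*=" ctype)))
        (assoc-in response [:headers "Content-Type"]
                  (str ctype "; charset=" charset))
        response))))
```

Any middleware that runs after this one and replaces the whole
Content-Type value would still silently drop the charset, which is the
fragility mentioned above.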
The servlet API has a solution to this in
HttpServletResponse#setCharacterEncoding,
but ring.util.servlet/update-servlet-response doesn't currently do
anything with it. Ideally, I'd be able to set some key in the Ring
response, and update-servlet-response would call setCharacterEncoding
with this value.
One possible solution to this is here:
http://github.com/travis/ring/commit/51f61d5f9721b93ffa4825b5aeb528e80353fc9d
As Mark very correctly pointed out, however, this would be an addition
to Ring's response spec, and probably shouldn't be undertaken lightly.
I see two potential solutions:
1) Add something to the response spec. Mark informally suggested
:character-encoding and :content-type, which would be consistent with
the underlying setCharacterEncoding and setContentType methods in the
Servlet spec, but moderately weird from a raw HTTP point of view (i.e.,
why should this one header be allotted two keys in the top-level
response object?)
2) Add functions like get-charset and set-charset to
ring.util.response that grok the Content-Type header format and can be
used to safely modify the charset param. The big downside here is the
potential for accidental overrides (i.e., '(assoc headers
"Content-Type" "text/html")'), which could be confusing.
I have a slight preference for (1) as I think it provides a
more robust solution to the particular problem I'm trying to solve
(setting utf-8 as the default charset for my whole app), but I'm one
voice of many. Any thoughts?
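To make option (2) concrete, here is a rough sketch of what get-charset
and set-charset could look like. These functions are only proposed in
this thread, so the signatures, the regex, and the assumption that the
header key is exactly "Content-Type" are all illustrative.

```clojure
(require '[clojure.string :as str])

;; Matches a `; charset=...` parameter, quoted or unquoted, in any letter case.
(def charset-param #"(?i);\s*charset\s*=\s*\"?([^\s;\"]+)\"?")

(defn get-charset
  "Returns the charset declared in the response's Content-Type header,
  or nil if there is no header or no charset parameter."
  [response]
  (when-let [ctype (get-in response [:headers "Content-Type"])]
    (second (re-find charset-param ctype))))

(defn set-charset
  "Replaces (or adds) the charset parameter, preserving the MIME type and
  any other parameters. Responses without a Content-Type are returned as-is."
  [response charset]
  (if-let [ctype (get-in response [:headers "Content-Type"])]
    (let [bare (str/trim (str/replace ctype charset-param ""))]
      (assoc-in response [:headers "Content-Type"]
                (str bare "; charset=" charset)))
    response))

;; The accidental override: a later plain assoc discards the charset again.
(-> {:headers {"Content-Type" "text/html"}}
    (set-charset "utf-8")
    (assoc-in [:headers "Content-Type"] "text/html")
    get-charset)
;; => nil
```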
Thanks!
Travis
Hi,
please keep in mind that only character-based resource representations like HTML, JSON, or XML have the concept of a character encoding. For image/png, for example, a character encoding makes no sense.
That's why I wonder how such middleware would differentiate between character-based and binary resources!
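One possible answer, sketched here purely as an assumption, is a
predicate over the MIME type. The name textual-content-type? and the
list of non-text/* types treated as character-based are hypothetical
and certainly incomplete.

```clojure
(require '[clojure.string :as str])

(defn textual-content-type?
  "Heuristic: true when the MIME type (ignoring parameters) looks
  character-based. The set of application/* types is an illustrative guess."
  [ctype]
  (boolean
    (when ctype
      (let [mime (-> ctype (str/split #";") first str/trim str/lower-case)]
        (or (str/starts-with? mime "text/")
            (contains? #{"application/json" "application/xml"} mime)
            (str/ends-with? mime "+xml")
            (str/ends-with? mime "+json"))))))

(textual-content-type? "text/html; charset=utf-8") ;; => true
(textual-content-type? "image/png")                ;; => false
```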
-billy
There are 2 problems here:
1. (.setCharacterEncoding response "UTF-8") for all responses modifies
even binary responses
2. As it is currently implemented, this occurs for handlers reified as
servlets via ring.util.servlet/servlet, but not for handlers run
directly with ring.adapter.jetty/run-jetty.
I think we should remove this line in the next point release; any objections?
I'm still looking into the broader issue of the content-type /
character-encoding API for Ring responses.
- Mark
I gave this issue some more thought. Although the Servlet API javadoc
at
http://java.sun.com/javaee/5/docs/api/javax/servlet/ServletResponse.html#setCharacterEncoding(java.lang.String)
does not say so explicitly, I suppose that the character encoding is
used when the response body is sent via the Writer returned by
ServletResponse#getWriter. When using the java.io.OutputStream#write
methods on the ServletOutputStream returned by
ServletResponse#getOutputStream, no encoding will occur. For the
print/println methods defined on ServletOutputStream, I think character
encoding will occur again.
This is definitely a very messy part of the Servlet API because it
mixes issues of character encoding with general (binary) I/O. A cleaner
separation, like the one between java.io.OutputStream and
java.io.PrintWriter, would have been nice.
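The Writer-versus-stream distinction can be seen with plain java.io
classes, without a servlet container. The sketch below is only an
analogy for what a container presumably does behind getWriter; the
helper name encode-via-writer is made up for this example.

```clojure
(import '(java.io ByteArrayOutputStream OutputStreamWriter))

(defn encode-via-writer
  "Writes string s through an OutputStreamWriter configured with the named
  charset and returns the resulting bytes as a vector of unsigned ints."
  [^String s ^String charset]
  (let [out (ByteArrayOutputStream.)]
    (with-open [w (OutputStreamWriter. out charset)]
      (.write w s))
    (mapv #(bit-and % 0xff) (.toByteArray out))))

;; "hεllo": the epsilon has no ISO-8859-1 mapping, so it is replaced by "?" (63).
(encode-via-writer "h\u03B5llo" "ISO-8859-1") ;; => [104 63 108 108 111]
;; In UTF-8 the same character becomes the two bytes 0xCE 0xB5.
(encode-via-writer "h\u03B5llo" "UTF-8")      ;; => [104 206 181 108 108 111]
```

The first result matches the "h?llo" seen in the response at the top of
the thread. Bytes written directly to an OutputStream bypass this step
entirely.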
In conclusion, I would support a middleware for setting the character
set; however, it should correctly handle character set negotiation if
this was signalled in the HTTP request with the "Accept-Charset" header
as specified at http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
I cannot determine for sure how to correctly handle Accept-Charset for a
binary representation of a resource (say image/png) that doesn't support
character sets. I suppose ignoring Accept-Charset will be fine.
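A simplified sketch of such negotiation follows. The function names are
hypothetical, and the sketch deliberately leaves out parts of RFC 2616
section 14.2, such as the implicit acceptability of ISO-8859-1 and
responding with 406 when nothing matches.

```clojure
(require '[clojure.string :as str])

(defn parse-accept-charset
  "Parses an Accept-Charset value into [charset q] pairs, highest q first.
  Entries without a q parameter default to 1.0."
  [header]
  (->> (str/split (or header "") #",")
       (map str/trim)
       (remove str/blank?)
       (map (fn [entry]
              (let [[charset & params] (map str/trim (str/split entry #";"))
                    q (some #(second (re-matches #"(?i)q\s*=\s*([01](?:\.[0-9]{0,3})?)" %))
                            params)]
                [(str/lower-case charset) (if q (Double/parseDouble q) 1.0)])))
       (sort-by second >)))

(defn choose-charset
  "Picks the client's most preferred charset among `supported` (a seq of
  names in server preference order). Returns nil if none is acceptable,
  and the server's first choice if the header is absent."
  [header supported]
  (let [supported (map str/lower-case supported)
        prefs     (parse-accept-charset header)
        rejected  (set (for [[cs q] prefs :when (zero? q)] cs))
        allowed   (remove rejected supported)]
    (if (empty? prefs)
      (first supported)
      (some (fn [[cs q]]
              (when (pos? q)
                (if (= cs "*")
                  (first allowed)
                  (some #{cs} allowed))))
            prefs))))

(choose-charset "iso-8859-5, utf-8;q=0.8" ["utf-8" "iso-8859-1"]) ;; => "utf-8"
```

For binary responses such as image/png, the middleware would simply
skip this step, in line with the suggestion to ignore Accept-Charset.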
As far as I can see, ring.middleware.file_info and
ring.util.response/file-response will need some rework too. One must be
able to specify the character set the file on disk is encoded in. In the
end, Ring must be able to respond with a Content-Type of
"text/html;charset=iso-8859-15" while the file on disk is encoded in
"MacRoman".
-billy.
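The transcoding step this implies can be sketched with plain JDK calls.
The function name transcode is hypothetical, and the example uses
ISO-8859-1 and UTF-8 because which other charset names (MacRoman, for
instance) are available depends on the JVM in use.

```clojure
(defn transcode
  "Decodes the byte array bs using from-charset, then re-encodes the
  resulting string using to-charset. Loads everything into memory, so it
  is only a sketch for small files."
  ^bytes [^bytes bs ^String from-charset ^String to-charset]
  (.getBytes (String. bs from-charset) to-charset))

;; "héllo" stored as ISO-8859-1 is 5 bytes; re-encoded as UTF-8 it is 6.
(let [on-disk (.getBytes "h\u00E9llo" "ISO-8859-1")
      on-wire (transcode on-disk "ISO-8859-1" "UTF-8")]
  [(count on-disk) (count on-wire)])
;; => [5 6]
```

A file-response helper would need the on-disk charset as an extra
argument to do this, since it cannot be reliably detected from the
bytes alone.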
http://www.wsgi.org/wsgi/Specifications/unicode_handling
This is interesting - I'm gonna noodle on this a bit more. Thanks!
t