charset in Ring


travis vachon

Apr 5, 2010, 12:11:47 PM
to Ring
Hi folks

I think Ring's character encoding capabilities could use a little
love, and I'd like a little feedback on a couple of potential options.
The problem looks like this:


If I define a handler function like

(def a (constantly {:status 200
                    :headers {"Content-Type" "text/html"}
                    :body "<html></body>hεllo world</body></html>"}))

and serve it up using Ring's run-jetty function, I get a response like:

< HTTP/1.1 200 OK
< Date: Mon, 05 Apr 2010 15:47:49 GMT
< Content-Type: text/html; charset=iso-8859-1
< Content-Length: 47
< Server: Jetty(6.1.14)
<
<html></body>h?llo world</body></html>

This is to say, Jetty (or the servlet API?) doesn't serve responses
using UTF-8 by default. I can of course just set the Content-Type to
"text/html; charset=utf-8", but this becomes problematic if, say, I'd
like to define middleware that sets the charset to UTF-8 by default: it
would have to parse the current Content-Type header, look for a
charset and conditionally set it (preserving the MIME type). It would
also be somewhat prone to being accidentally overridden by another
Content-Type munging middleware down the road.

The servlet API has a solution to this in
HttpServletResponse#setCharacterEncoding,
but ring.util.servlet/update-servlet-response doesn't currently do
anything with it. Ideally, I'd be able to set some key in the Ring
response, and update-servlet-response would call setCharacterEncoding
with this value.

One possible solution to this is here:

http://github.com/travis/ring/commit/51f61d5f9721b93ffa4825b5aeb528e80353fc9d

As Mark very correctly pointed out, however, this would be an addition
to Ring's response spec, and probably shouldn't be undertaken lightly.

I see two potential solutions:

1) Add something to the response spec. Mark informally suggested
:character-encoding and :content-type, which would be consistent with
the underlying setCharacterEncoding and setContentType methods in the
Servlet spec, but moderately weird from a raw HTTP point of view (i.e.,
why should this header be allotted two values in the top-level
response object?)

2) Add functions like:

get-charset
set-charset

to ring.util.response that grok the Content-Type header format and
can be used to safely modify the charset param. The big downside here
is the potential for accidental overrides (e.g., '(assoc headers
"Content-Type" "text/html")'), which could be confusing.
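
To make option (2) concrete, here is a minimal sketch of what such helpers might look like. The names get-charset and set-charset come from the list above; their signatures and behavior are assumptions for illustration, not anything that exists in ring.util.response, and quoted charset values are not handled.

```clojure
(require '[clojure.string :as str])

(defn get-charset
  "Returns the charset parameter of a Content-Type value, or nil if
  there is none. Hypothetical helper; does not handle quoted values."
  [content-type]
  (when content-type
    (second (re-find #"(?i);\s*charset\s*=\s*([^\s;]+)" content-type))))

(defn set-charset
  "Returns content-type with its charset parameter set to charset,
  keeping the MIME type and any other parameters. Hypothetical helper."
  [content-type charset]
  (let [parts (map str/trim (str/split content-type #";"))
        ;; drop any existing charset parameter before appending the new one
        kept  (remove #(re-find #"(?i)^charset\s*=" %) parts)]
    (str/join "; " (concat kept [(str "charset=" charset)]))))

;; (get-charset "text/html; charset=iso-8859-1")         ; => "iso-8859-1"
;; (set-charset "text/html; charset=iso-8859-1" "utf-8") ; => "text/html; charset=utf-8"
```

Note that these only protect the charset when callers go through them; a plain assoc on the header still overwrites it.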


I have a slight preference for (1) as I think it provides a
more robust solution to the particular problem I'm trying to solve
(setting utf-8 as the default charset for my whole app), but I'm one
voice of many. Any thoughts?

Thanks!

Travis

Philipp Meier

Apr 5, 2010, 1:25:14 PM
to ring-c...@googlegroups.com

Hi,

Please keep in mind that only character-based resource representations like HTML, JSON or XML have the concept of a character encoding. For image/png, for example, a character encoding makes no sense.

That's why I wonder how such middleware would differentiate between character-based and binary resources!
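
One way such a middleware could make the distinction is to look at the MIME type and only touch text-like responses. The following is a rough sketch, not an existing Ring middleware: wrap-default-charset, text-type? and the allow-list of types are all made up for illustration, and the header key is assumed to be spelled exactly "Content-Type".

```clojure
(require '[clojure.string :as str])

(def text-like-types
  "Hypothetical allow-list of non-text/* types that still carry characters."
  #{"application/json" "application/xml" "application/javascript"})

(defn text-type?
  "True if the MIME type of content-type looks character-based."
  [content-type]
  (let [mime (-> content-type (str/split #";") first str/trim str/lower-case)]
    (or (str/starts-with? mime "text/")
        (contains? text-like-types mime))))

(defn wrap-default-charset
  "Adds charset to text-like responses that do not declare one.
  Binary responses and responses with an explicit charset pass through."
  [handler charset]
  (fn [request]
    (let [response (handler request)
          ctype    (get-in response [:headers "Content-Type"])]
      (if (and ctype
               (text-type? ctype)
               (not (re-find #"(?i);\s*charset\s*=" ctype)))
        (assoc-in response [:headers "Content-Type"]
                  (str ctype "; charset=" charset))
        response))))
```

With this, a handler returning "text/html" would come out as "text/html; charset=utf-8", while "image/png" would be left alone.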

-billy




Mark McGranaghan

Apr 7, 2010, 12:35:38 PM
to ring-c...@googlegroups.com
Billy makes a good point. Reflecting on his comment, I think the
following may actually be a bug:

http://github.com/mmcgrana/ring/blob/393ab57106d34f5c632391622f4618fdcbad1a24/ring-servlet/src/ring/util/servlet.clj#L112

There are 2 problems here:
1. (.setCharacterEncoding response "UTF-8") for all responses modifies
even binary responses
2. As it is currently implemented, this occurs for handlers reified as
servlets via ring.util.servlet/servlet, but not for handlers run
directly with ring.adapter.jetty/run-jetty.

I think we should remove this line in the next point release; any objections?

I'm still looking into the broader issue of the content-type /
character-encoding API for Ring responses.

- Mark

Philipp Meier

Apr 7, 2010, 6:48:56 PM
to ring-c...@googlegroups.com
On 07.04.10 18:35, Mark McGranaghan wrote:

> Billy makes a good point. Reflecting on his comment, I think the
> following may actually be a bug:
>
> http://github.com/mmcgrana/ring/blob/393ab57106d34f5c632391622f4618fdcbad1a24/ring-servlet/src/ring/util/servlet.clj#L112
>
> There are 2 problems here:
> 1. (.setCharacterEncoding response "UTF-8") for all responses modifies
> even binary responses
> 2. As it is currently implemented, this occurs for handlers reified as
> servlets via ring.util.servlet/servlet, but not for handlers run
> directly with ring.adapter.jetty/run-jetty.
>
> I think we should remove this line in the next point release; any objections?
>
> I'm still looking into the broader issue of the content-type /
> character-encoding API for Ring responses.

I gave this issue some more thought. Although the Servlet API javadoc
at
http://java.sun.com/javaee/5/docs/api/javax/servlet/ServletResponse.html#setCharacterEncoding(java.lang.String)
does not say so explicitly, I suppose that the character encoding is used
when the response body is sent via the Writer returned by
ServletResponse#getWriter. When using the java.io.OutputStream#write
methods on the ServletOutputStream returned by
ServletResponse#getOutputStream, no encoding will occur. For the
print/println methods defined on ServletOutputStream, I think character
encoding will occur again.

This is definitely a very messy part of the Servlet API because it
mixes issues of character encoding with general (binary) I/O. A cleaner
separation, like the one between java.io.OutputStream and
java.io.PrintWriter, would have been nice.

In conclusion, I would support a middleware for setting the character
set; however, it should correctly handle character set negotiation if
this was signalled in the HTTP request with the "Accept-Charset" header,
as specified at http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

I cannot determine for sure how to correctly handle Accept-Charset for a
binary representation of a resource (say image/png) that doesn't support
character sets. I suppose ignoring Accept-Charset will be fine.
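
As a rough illustration of what that negotiation could involve, here is a simplified sketch. The function names are invented for this example, and it deliberately ignores the RFC 2616 rule that ISO-8859-1 is implicitly acceptable when it is not mentioned; it only handles explicit entries, q-values and "*".

```clojure
(require '[clojure.string :as str])

(defn parse-accept-charset
  "Parses an Accept-Charset value into [charset q] pairs. Charsets are
  lower-cased and q defaults to 1.0. Hypothetical helper."
  [header]
  (for [entry (str/split header #",")
        :let [[cs & params] (map str/trim (str/split entry #";"))
              q (some #(second (re-find #"(?i)^q\s*=\s*(\d(?:\.\d*)?)$" %))
                      params)]
        :when (seq cs)]
    [(str/lower-case cs) (if q (Double/parseDouble q) 1.0)]))

(defn choose-charset
  "Picks the charset from available (in server preference order) with
  the highest client q-value, or nil if none is acceptable. With no
  Accept-Charset header, the server's first choice wins."
  [accept-charset available]
  (if (str/blank? accept-charset)
    (first available)
    (let [prefs (into {} (parse-accept-charset accept-charset))
          q-for (fn [cs] (get prefs (str/lower-case cs) (get prefs "*" 0.0)))]
      (->> available
           (map (fn [cs] [cs (q-for cs)]))
           (filter (fn [[_ q]] (pos? q)))
           (sort-by second >)   ; stable sort, so ties keep server order
           ffirst))))

;; (choose-charset "iso-8859-5, utf-8;q=0.8" ["utf-8" "iso-8859-1"]) ; => "utf-8"
```

A nil result would correspond to the case where the server cannot satisfy the request's Accept-Charset at all.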

As far as I can see, ring.middleware.file_info and
ring.util.response/file-response will need some rework too. One must be
able to specify the character set the file on disk is encoded in. In the
end, Ring must be able to respond with a Content-Type of
"text/html;charset=iso-8859-15" while the file on disk is encoded in
"MacRoman".

-billy.

Travis Vachon

Apr 7, 2010, 7:20:16 PM
to ring-c...@googlegroups.com
Definitely very good points. Here's a related discussion from WSGI:

http://www.wsgi.org/wsgi/Specifications/unicode_handling

This is interesting - I'm gonna noodle on this a bit more. Thanks!

t

Sebastián Galkin

Apr 18, 2010, 3:02:24 AM
to Ring
> As a conclusion I would support a middleware for setting the character
> set, however it should correctly handle character set negotiation if
> this was signalled in the HTTP request with the "Accept-Charset" Header
> as specified at http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

You may be interested in this implementation of the Accept*
negotiation for Rack:
http://github.com/mjijackson/rack-accept



Adam Schmideg

Apr 21, 2010, 2:53:55 PM
to Ring
In my experience, you need to call setCharacterEncoding explicitly.
If you set only the Content-Type to "text/html; charset=utf-8", Jetty
will still encode the response as iso-8859-1.
My suggestion is (without a deeper understanding of the issue) to
include all setXXX methods from the Servlet API as :xxx in the Ring
response spec.

- adam

> I'd like a little feedback on a couple potential options. The problem
> looks like this:'
>
> If I define a handler function like
>
> (def a (constantly {:status 200 :headers {"Content-Type" "text/html"}
>                              :body "<html></body>hεllo world</body></
> html>"}))
>
> and serve it up using Ring's run-jetty macro, I get a response like:
>
> < HTTP/1.1 200 OK
> < Date: Mon, 05 Apr 2010 15:47:49 GMT
> < Content-Type: text/html; charset=iso-8859-1
> < Content-Length: 47
> < Server: Jetty(6.1.14)
> <
> <html></body>h?llo world</body></html>
>
> This is to say, Jetty (or the servlet api?) doesn't serve requests
> using UTF-8 by default. I can of course just set the Content-Type
> "text/html; charset=utf-8",


James Reeves

Apr 21, 2010, 3:02:10 PM
to ring-c...@googlegroups.com
I'm not sure I like the idea of extending the Ring spec just to get
around problems with specific Java servers.

Why not parse the charset from the content type header and use
setCharacterEncoding on that?

- James

Philipp Meier

Apr 21, 2010, 3:35:23 PM
to Ring


On 21 Apr., 20:53, Adam Schmideg <a...@schmideg.net> wrote:
> In my experience, you need to call setCharacterEncoding explicitly.
> If you set only the content-type to "text/html; charset=utf-8", jetty
> will still encode  the response to iso-8859-1.

That's correct if you use the Writer returned by getWriter(). The encoding
that you specify with setCharacterEncoding tells the servlet container
how to serialize the Java Strings passed to the Writer into the binary
(octet) stream that goes over the wire.

As I said before, when you use the OutputStream from getOutputStream, you
are responsible for the correct encoding of the Strings (or anything else)
yourself.
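
To illustrate what "doing the encoding yourself" means, here is a small sketch that uses a ByteArrayOutputStream as a stand-in for the servlet's output stream. The helper names write-body! and encoded-bytes are made up for this example; only String#getBytes and the java.io classes are real APIs.

```clojure
(import 'java.io.ByteArrayOutputStream)

(defn write-body!
  "Encodes the string body with the named charset and writes the
  resulting bytes to out. Hypothetical helper."
  [^java.io.OutputStream out ^String body ^String charset]
  (let [^bytes bs (.getBytes body charset)]
    (.write out bs)))

(defn encoded-bytes
  "Returns the bytes that would go over the wire, as a vector of
  signed byte values."
  [body charset]
  (let [out (ByteArrayOutputStream.)]
    (write-body! out body charset)
    (vec (.toByteArray out))))

;; "h\u03b5llo" is "hεllo"; ε has no ISO-8859-1 representation,
;; so it is replaced with '?' (byte 63):
;; (encoded-bytes "h\u03b5llo" "ISO-8859-1") ; => [104 63 108 108 111]
;; In UTF-8 it becomes the two bytes 0xCE 0xB5:
;; (encoded-bytes "h\u03b5llo" "UTF-8")      ; => [104 -50 -75 108 108 111]
```

The ISO-8859-1 case matches the "h?llo" seen in the response at the start of this thread.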

> My suggestion is (without a deeper understanding of the issue) to
> include all setXXX methods from the Servlet api as :xxx in the ring
> response spec.

I'd prefer the solution pointed out by James: if there is an encoding
given in the Content-Type header, then call setCharacterEncoding with it.

-billy.