Chinese characters in Hiccup

limux

Aug 10, 2010, 10:32:58 AM
to Clojure
Hi!

I am building a real-life demo with Ring, Compojure, Hiccup, and
database access.
There are some Chinese characters in the tables. I want to display them
with Hiccup, but the browser shows those Chinese characters as ???.
Printing them to the console with prn works fine. I am confused, since
I am a newbie in Clojure and not very familiar with Java yet.
Also, as far as I know, there are not many people in China who use or
are familiar with Clojure. Even the well-known IT sites and blogs have
few articles about Clojure, so there is nowhere and no one I can turn
to for help.

I use Windows 7 as my dev platform. Heartfelt thanks for any advice!

Regards,
Limux

Joop Kiefte

Aug 10, 2010, 11:23:01 AM
to clo...@googlegroups.com
Try using
[:meta {:http-equiv "Content-Type" :content "text/html; charset=utf-8"}]
inside your [:head]

(can this be done with Jetty?)

2010/8/10 limux <liumen...@gmail.com>:

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

--
Communication is essential. So we need decent tools when communication
is lacking, when language capability is hard to acquire...

- http://esperanto.net  - http://esperanto-jongeren.nl

Linux-user #496644 (http://counter.li.org) - first touch of linux in 2004

Rasmus Svensson

Aug 10, 2010, 12:23:44 PM
to clo...@googlegroups.com
2010/8/10 limux <liumen...@gmail.com>:

> There are some Chinese characters in the tables. I want to display them
> with Hiccup, but the browser shows those Chinese characters as ???.

I spoke to him on #clojure. Here is what I could tell from some
experiments I asked him to run:

(map int "刘孟江") -> (21016 23391 27743)
=> no source file encoding issues

(.name (java.nio.charset.Charset/defaultCharset)) -> "GBK"
=> his OS default encoding is GBK ("GBK is an extension of the
   GB2312 character set for simplified Chinese characters, used in the
   People's Republic of China.")
   Some libs might erroneously assume that the OS default encoding is
   UTF-8 or something else.

(defroutes app
  (GET "/" []
    (java.io.ByteArrayInputStream.
      (.getBytes (str "<html><head><meta http-equiv='Content-Type' "
                      "content='text/html; charset=UTF-8'></head>"
                      "<body>刘孟江</body></html>")
                 "UTF-8"))))
-> showed up correctly
=> works, since we do the encoding ourselves

(defroutes app
  (GET "/" []
    (str "<html><head><meta http-equiv='Content-Type' "
         "content='text/html; charset=UTF-8'></head>"
         "<body>刘孟江</body></html>")))
-> showed up correctly
=> Ring uses UTF-8 as the default encoding no matter what the OS
   default is. A very reasonable behavior, since the result is then
   always deterministic.

This leads me to the conclusion that the error is somehow caused by
Hiccup (which he also used), since everything seems to work fine
without it. I might look into this later this evening to see if I can
reproduce the error that occurred for him.
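
For anyone who wants to repeat the first two checks, here is a minimal
sketch using only JDK classes. The helper name `describe-string` is made
up for illustration, and the name is written with \u escapes so that the
result does not depend on how the source file itself is encoded.

```clojure
(import 'java.nio.charset.Charset)

(defn describe-string
  "Returns the UTF-16 code units of s together with the JVM default charset."
  [s]
  {:code-units      (mapv int s)
   :default-charset (.name (Charset/defaultCharset))})

;; "\u5218\u5b5f\u6c5f" is the same string as the literal used above.
(describe-string "\u5218\u5b5f\u6c5f")
```

If :code-units comes back as [21016 23391 27743], the string reached the
JVM intact; :default-charset will vary from machine to machine.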

// raek, your encoding wizard

Rasmus Svensson

Aug 10, 2010, 4:33:29 PM
to clo...@googlegroups.com
I looked into the source of hiccup and tried entering the string "刘孟江"
at various places, but I couldn't reproduce the error or find any code
that did any form of encoding.

When playing around with the repl, I was reminded that JLine (used by
lein repl) does not support multibyte encodings (including UTF-8 and
GBK). Could this be the problem, Limux?

// raek

Joop Kiefte

Aug 10, 2010, 5:36:00 PM
to clo...@googlegroups.com
What do you get if you use the meta line with GBK encoding? If that
gives the right output, it is an input problem (i.e. the input is not
UTF-8).

2010/8/10 Rasmus Svensson <ra...@lysator.liu.se>:

Nebojsa Stricevic

Aug 11, 2010, 2:41:05 AM
to Clojure
Hi,

This looks similar to a problem I had with Clojure + Compojure +
Enlive and Serbian characters. I'm not sure it is the same issue, but
the solution to my problem may be helpful. See this mailing list thread:
http://tiny.cc/3cmrx

Greets,

--
Nebojša Stričević

limux

Aug 12, 2010, 9:33:40 AM
to Clojure
The solution in http://tiny.cc/3cmrx is useful, thanks.
The cause of the issue seems to be Compojure. That thread is dated
June 6, and Compojure still hasn't fixed it.

James Reeves

Aug 12, 2010, 10:55:47 AM
to clo...@googlegroups.com
On 12 August 2010 14:33, limux <liumen...@gmail.com> wrote:
> The solution in http://tiny.cc/3cmrx is useful, thanks.
> The cause of the issue seems to be Compojure. That thread is dated
> June 6, and Compojure still hasn't fixed it.

The solution you mention is some middleware that sets the content-type
charset header to a specific value.

Has this fixed the issue? I was under the impression from Rasmus's
post that raw strings worked fine, and it was just an issue with
Hiccup.

However, that in itself is odd, as Hiccup only uses raw strings and
the str function to join them together. I believe this should maintain
the correct string encoding. So assuming both str and literal strings
work, Hiccup should work.

I guess we need to determine whether the string itself has the wrong
encoding, or whether an incorrect encoding has been specified in the
content type.
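
A small sketch of that reasoning (not Hiccup's actual code): a JVM
string is a sequence of UTF-16 code units and carries no byte encoding
of its own, so joining fragments with str cannot change the characters.
The helper `join-fragments` is a hypothetical stand-in for what a
templating library does internally.

```clojure
(defn join-fragments
  "Joins HTML fragments with str, the way a simple templating step might."
  [& fragments]
  (apply str fragments))

;; Same string as the literal used earlier in the thread, written with
;; \u escapes so the source file encoding does not matter.
(def person "\u5218\u5b5f\u6c5f")

(def page (join-fragments "<body>" person "</body>"))

;; The characters survive concatenation unchanged; an encoding is only
;; chosen later, when the string is turned into bytes for the response.
(= person (subs page 6 9)) ; "<body>" is 6 characters long
```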

- James

Rasmus Svensson

Aug 12, 2010, 2:25:08 PM
to clo...@googlegroups.com
From what I can tell, the problem he had was caused by compojure's
default content type "text/html" being replaced by "text/html;
charset=iso8859-1". If he added the charset attribute (with the
middleware proposed in the link) the problem went away.

I asked him to check the page info in firefox to see what content-type
the web server served. Without the middleware, it was "text/html;
charset=iso8859-1" and with it, it was "text/html; charset=utf-8", as
expected. The only value existing in the compojure code is
"text/html", iirc.

It appears that Jetty rewrites any text/html content type it serves
and adds a charset attribute (maybe with the dreaded "OS default
charset" as its value) if there isn't one.

Time for some tests, maybe?

// Rasmus

limux

Aug 12, 2010, 10:36:46 PM
to Clojure
Perhaps Jetty adds a charset of iso-8859-1 if there isn't one in the
response.
Meanwhile, Compojure adds no charset when a string is rendered; the
headers contain exactly {"Content-Type" "text/html"} and nothing more.
So perhaps Jetty then falls back to the legacy charset iso-8859-1.
Maybe that's all there is to it?

Regards.

limux

Aug 12, 2010, 10:50:20 PM
to Clojure
Almost all of the Ring samples I have seen use a headers map of
{"Content-Type" "text/html"} without any charset.

I will spend some time testing Rack to see whether it has the same
issue. Concretely, I want to see whether Rack adds a charset when the
response has none, or whether it behaves like Ring and passes the
response through unchanged.
As far as I know, Ring resembles Rack.

ngocdaothanh

Aug 13, 2010, 5:05:08 AM
to Clojure
> Perhaps Jetty adds a charset of iso-8859-1 if there isn't one in the response

I think this behavior is specified in the servlet spec: "If no charset
is specified, ISO-8859-1 will be used"
http://download.oracle.com/docs/cd/E17802_01/products/products/servlet/2.5/docs/servlet-2_5-mr2/javax/servlet/ServletResponse.html
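
That rule would explain the ??? symptom. A minimal sketch using only
JDK classes (the helper name `round-trip` is made up for illustration):
ISO-8859-1 has no bytes for Chinese characters, so encoding them in that
charset replaces each one with a question mark, whereas UTF-8 preserves
them.

```clojure
(defn round-trip
  "Encodes s in the named charset and decodes the bytes with the same charset."
  [^String s ^String charset]
  (String. (.getBytes s charset) charset))

;; Same string as the literal used earlier in the thread.
(def person "\u5218\u5b5f\u6c5f")

(round-trip person "UTF-8")      ; the original three characters
(round-trip person "ISO-8859-1") ; "???"
```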

limux

Aug 13, 2010, 6:31:20 AM
to Clojure
Then, if Ring doesn't take care of the charset either, none of Jetty,
Ring, or Compojure will handle it, and that leaves only me. Many people
from all kinds of countries and regions use Ring, Jetty, and Compojure.
Should each of them have to set the right charset manually with a
wrapper? I don't think that is a good idea.

limux

Aug 13, 2010, 6:38:23 AM
to Clojure
In short: why should we let Jetty set the default charset to
iso-8859-1? Why can't Compojure set the default charset to utf-8? Isn't
utf-8 a better choice? Is iso-8859-1 better than utf-8?

James Reeves

Aug 13, 2010, 6:40:24 AM
to clo...@googlegroups.com
2010/8/13 limux <liumen...@gmail.com>:

> Then, if Ring doesn't take care of the charset either, none of Jetty,
> Ring, or Compojure will handle it, and that leaves only me. Many people
> from all kinds of countries and regions use Ring, Jetty, and Compojure.
> Should each of them have to set the right charset manually with a
> wrapper? I don't think that is a good idea.

Well, the charset could potentially be set to the default encoding of
the JVM, but that might produce inconsistent results. If you develop
on a JVM with a default encoding of X, but your production machine has
a default encoding of Y, you'll run into problems.

Another option is to have a default charset, such as UTF-8. I don't
think Ring should have a default charset, because it's too "low
level". But Compojure could be set up with a default charset. However,
this won't help people who, say, use Shift-JIS.

I think it would be worth adding some charset setting middleware to
Ring, though, and perhaps document this behaviour. Github has new
Wikis that I'd like to try out :)
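
As a rough sketch of what such middleware could look like (this is not
code from Ring or Compojure; the name `wrap-default-charset` and its
exact behaviour are assumptions for illustration), a Ring-style handler
can be wrapped so that any text response without a charset gets one
appended:

```clojure
(require '[clojure.string :as str])

(defn wrap-default-charset
  "Hypothetical middleware: appends `; charset=<charset>` to text/*
  Content-Type headers that do not already name a charset."
  [handler charset]
  (fn [request]
    (let [response     (handler request)
          content-type (get-in response [:headers "Content-Type"])]
      (if (and content-type
               (str/starts-with? (str/lower-case content-type) "text/")
               (not (re-find #"(?i);\s*charset=" content-type)))
        (assoc-in response [:headers "Content-Type"]
                  (str content-type "; charset=" charset))
        ;; No Content-Type, a non-text type, or a charset already set:
        ;; pass the response through untouched.
        response))))

;; Usage with a plain function standing in for a real application.
(def app
  (wrap-default-charset
    (fn [_request]
      {:status  200
       :headers {"Content-Type" "text/html"}
       :body    "<html>...</html>"})
    "utf-8"))

(get-in (app {}) [:headers "Content-Type"])
```

Wrapping the whole application keeps the charset choice in one place,
and someone who needs Shift-JIS can pass a different value.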

- James

limux

Aug 13, 2010, 7:49:26 AM
to Clojure
You are right that there should be such a middleware in Ring.

Steve Purcell

Aug 13, 2010, 8:11:39 AM
to clo...@googlegroups.com
On 13 Aug 2010, at 11:40, James Reeves wrote:

> I think it would be worth adding some charset setting middleware to
> Ring, though, and perhaps document this behaviour.


+1 -- character encoding is exactly the kind of thing one would want to set up application-wide.

-Steve
