mdash output as question mark


David Smith

Dec 22, 2013, 6:38:56 PM
to enliv...@googlegroups.com
I am transforming an h1 tag's content to separate the user's Organization from the Name with an mdash (U+2014), and the Name from the page name with a right double angle bracket (U+00BB). While the double angle brackets display correctly, the mdash always displays as a question mark (a plain question mark, not the odd "I don't know how to display that unicode character" question mark).

To narrow down the variables, I've done some testing using David Nolen's tutorial "template1", inserting various Unicode characters into the "change" message thusly:


   ["change"] (fn [req] (render-to-response
                         (index {:message "We changed \u2014 the message! \u00bb"})))

In this example, the mdash is also rendered as a ?, while the right double angle brackets display correctly. I've tested a variety of other characters; some display correctly and others display as ?. This happens in every browser I've tried: the current Chrome and Firefox on Debian, Chrome and Safari on OS X, and Chrome and IE11 on Windows 7.

Our application is running on Debian squeeze with Java 1.7.0_17, and the tutorial test was run on Debian wheezy with Java 1.7.0_13.

Has anyone else seen this behavior? Am I missing something obvious?

Thanks for any help.

David Smith

Linus Ericsson

Dec 24, 2013, 12:38:44 PM
to enliv...@googlegroups.com
The best way to get non-ASCII characters into HTML is probably to use numeric character references (&#nnnn;), as noted here:

http://www.ascii.cl/htmlcodes.htm
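A minimal sketch of that approach, using a hypothetical helper `->ncr` (not part of Clojure or Enlive) that rewrites every non-ASCII character as a decimal reference. One caveat: if Enlive handles text nodes as values, as Christophe's explanation quoted in this thread says, a string like this placed in a text node would have its `&` escaped again, so this only helps where raw HTML is written out.

```clojure
;; Hypothetical helper: replace every non-ASCII character with a decimal
;; numeric character reference such as &#8212;.
;; Characters outside the Basic Multilingual Plane (surrogate pairs) are
;; not handled correctly by this simple per-char version.
(defn ->ncr [^String s]
  (apply str
         (map (fn [c]
                (if (< (int c) 128)
                  (str c)
                  (str "&#" (int c) ";")))
              s)))

(->ncr "We changed \u2014 the message! \u00bb")
;; => "We changed &#8212; the message! &#187;"
```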


If you really want to use the characters directly:

Can you confirm that your JVM is using UTF-8 as its default charset? What does (Charset/defaultCharset) return?
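Both values can be read from a Clojure REPL with plain Java interop. A small sketch; `charset-report` is just an illustrative name:

```clojure
(import '(java.nio.charset Charset))

;; Collect the JVM's default charset and the file.encoding property
;; (on Java 7 the former is normally derived from the latter).
(defn charset-report []
  {:default-charset (str (Charset/defaultCharset))
   :file-encoding   (System/getProperty "file.encoding")})

;; Both values should read "UTF-8" on a JVM configured for UTF-8.
(charset-report)
```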

Does your HTML document contain <meta charset="utf-8">?

Can you try building the string as (str "we changed " (char 0x2014) " the message " (char 0x00bb))? Clojure *should* read the \u escapes correctly, but the &#nnnn; form is better.
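A quick REPL check of that suggestion, using nothing beyond clojure.core: the string built from code points should be identical to the one written with \u escapes, which rules out a reader problem.

```clojure
;; Build the string from code points rather than \u escapes.
(def built
  (str "We changed " (char 0x2014) " the message! " (char 0x00bb)))

;; The reader turns \u2014 and \u00bb into the same characters,
;; so the two strings are equal.
(= built "We changed \u2014 the message! \u00bb")
;; => true
```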

/Linus
--
You received this message because you are subscribed to the Google Groups "Enlive" group.
To unsubscribe from this group and stop receiving emails from it, send an email to enlive-clj+...@googlegroups.com.
To post to this group, send email to enliv...@googlegroups.com.
Visit this group at http://groups.google.com/group/enlive-clj.
For more options, visit https://groups.google.com/groups/opt_out.

David Smith

Dec 25, 2013, 9:15:36 AM
to enliv...@googlegroups.com
Thank you, Linus, for your kind response.

When I began using Enlive I used HTML entities as you suggest, but Christophe (in response to my February 14th question to this group) explained: "Text nodes in Enlive are handled as their values and not as their representations. It means that entities are replaced by their corresponding unicode characters." So it's Unicode or nothing.

I'm sort of a front-end guy, so directly querying my JVM is a bit above my pay grade. Is this REPL interaction useful for testing the JVM's character set?

user=> (get (System/getProperties) "file.encoding")
"UTF-8"
user=> (println "\u2014")
nil
user=> (println "\u00bb")
»
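println output also passes through the terminal's own encoding, so what shows up there is not conclusive. A sketch of a more direct check, with a hypothetical `hex-bytes` helper, is to look at the bytes a string becomes under a given charset:

```clojure
(import '(java.nio.charset Charset))

;; Hypothetical helper: the bytes a string encodes to under the named
;; charset, shown as unsigned hex so the terminal cannot interfere.
(defn hex-bytes [^String s ^String charset-name]
  (map #(format "%02X" (bit-and % 0xFF))
       (.getBytes s (Charset/forName charset-name))))

(hex-bytes "\u2014" "UTF-8") ;; => ("E2" "80" "94")
(hex-bytes "\u00bb" "UTF-8") ;; => ("C2" "BB")
```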

In addition, I've added "export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8" to my .bash_profile.

By the way, I've also tried using the literal em-dash (—) instead of \u2014 in our code, but the "?" is still what ends up being sent, both with our own stack and with the tutorial's stack.

For now, I suppose I'll suppress my typographic prejudices and use a hyphen rather than either the literal em-dash (—) or the unicode \u2014 in our code and just remain puzzled by the inconsistent rendering of the unicode characters.

David


Linus Ericsson

Dec 25, 2013, 4:47:56 PM
to enliv...@googlegroups.com
I made a very simple example; see [1]. It converts a simple template straight to an HTML file, and when I open that file in Firefox (Mac OS X) the encoding works as expected. Without the <meta charset="utf-8"> tag it did not work in my browser.

Can you reproduce this?

If not, maybe your web server (Tomcat? Jetty?) is messing things up. The server can send an HTTP response header [2] with a charset; perhaps it overrides the encoding declared in the document?
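When reading the headers, the part that matters is the charset parameter of Content-Type. A small sketch for pulling it out, with a hypothetical `content-type-charset` helper and a deliberately simple regex (it does not handle every quoting form HTTP allows):

```clojure
;; Hypothetical helper: the charset parameter of a Content-Type header
;; value, or nil when none is declared. Deliberately simple: quoted
;; parameter values are not handled.
(defn content-type-charset [content-type]
  (when content-type
    (second (re-find #"(?i)charset\s*=\s*([^;\s]+)" content-type))))

(content-type-charset "text/html; charset=ISO-8859-1") ;; => "ISO-8859-1"
(content-type-charset "text/html")                     ;; => nil
```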

The response headers can be shown with the Firefox add-on HTTPfox [3] (I have not tested this particular tool myself, though).

/Linus



David Smith

Dec 26, 2013, 2:05:41 PM
to enliv...@googlegroups.com
Linus - 

That was it: nginx is new to us, and it turns out that it's sending charset=ISO-8859-1.

Thanks so much!
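This would also explain why only some characters broke. U+00BB is in ISO-8859-1 but U+2014 is not, and Java's `String.getBytes(Charset)` replaces characters the target charset cannot represent with that charset's replacement byte, which is `?` for ISO-8859-1. A plain `?`, rather than several garbled characters (the usual result of UTF-8 bytes decoded as Latin-1), suggests the character was lost while the body was being encoded on the JVM side, not while the browser was decoding it. A minimal sketch of that round trip, assuming the server encodes with the standard JVM encoder; `latin1-round-trip` is a hypothetical name:

```clojure
(import '(java.nio.charset StandardCharsets))

;; Encode as ISO-8859-1 and decode back. String.getBytes(Charset) replaces
;; any character Latin-1 cannot represent with the charset's default
;; replacement byte, which is ? (0x3F) for ISO-8859-1.
(defn latin1-round-trip [^String s]
  (String. (.getBytes s StandardCharsets/ISO_8859_1)
           StandardCharsets/ISO_8859_1))

;; U+00BB survives (it is byte 0xBB in Latin-1); U+2014 does not.
(latin1-round-trip "We changed \u2014 the message! \u00bb")
;; => "We changed ? the message! »"
```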

Meikel Brandmeyer

Dec 25, 2013, 9:52:43 AM
to enliv...@googlegroups.com
Hi,

On 25.12.2013 15:15, David Smith wrote:

> For now, I suppose I'll suppress my typographic prejudices and use a
> hyphen rather than either the literal em-dash (—) or the unicode \u2014
> in our code and just remain puzzled by the inconsistent rendering of the
> unicode characters.

How do you inspect the string that shows up as "?"? Is it in the server
response? Does the server correctly declare the response as
"Content-Type: text/html;charset=utf-8"?
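For a Ring handler, that header lives in the response map. A minimal sketch, assuming a plain Ring response map (`utf8-response` is an illustrative name). In a servlet container such as Jetty, the charset declared this way typically also decides how the body string is encoded, since the servlet API falls back to ISO-8859-1 when no charset is given.

```clojure
;; A Ring response is a plain map. Declaring the charset in Content-Type
;; tells the browser how to decode the body.
(def utf8-response
  {:status  200
   :headers {"Content-Type" "text/html; charset=utf-8"}
   :body    "<p>We changed \u2014 the message! \u00bb</p>"})
```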

Kind regards
Meikel

--
Meikel Brandmeyer
Clojure Trainings
Kastellstraße 3
63526 Erlensee
http://kotka.de
USt.-Id: DE 285 667 417


David Smith

Dec 27, 2013, 10:11:40 AM
to enliv...@googlegroups.com
Meikel - 

Thanks. As it happens, our HTML templates have the <meta charset=utf-8> tag and our server (nginx 1.4.2) is configured to deliver UTF-8, but its response headers declare the Content-Type charset as ISO-8859-1.

I guess we'll now look at Friend/Ring/Jetty as the source of this oddity.
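If the missing charset turns out to come from the Clojure side, one option is a small middleware that fills it in. A sketch with a hypothetical `wrap-default-charset`; it assumes the handler sets the header under the exact key "Content-Type" and leaves any explicitly declared charset alone:

```clojure
;; Hypothetical middleware: append "; charset=utf-8" to a text/* Content-Type
;; that declares no charset. Assumes the header key is exactly "Content-Type".
(defn wrap-default-charset [handler]
  (fn [request]
    (let [response (handler request)
          ct       (get-in response [:headers "Content-Type"])]
      (if (and ct
               (.startsWith ^String ct "text/")
               (not (re-find #"(?i)charset=" ct)))
        (assoc-in response [:headers "Content-Type"]
                  (str ct "; charset=utf-8"))
        response))))

;; Usage with a stand-in handler:
(def app
  (wrap-default-charset
    (fn [_] {:status 200 :headers {"Content-Type" "text/html"} :body "ok"})))

(get-in (app {}) [:headers "Content-Type"])
;; => "text/html; charset=utf-8"
```

Wrapping at the outermost layer means every handler underneath gets the same default without touching each route.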

David