Returned UTF-8 encoded Japanese text is displayed as raw Unicode escapes


Danny Trieu

Jan 31, 2014, 1:04:25 AM
to spray...@googlegroups.com
Hi All,

I am new to Spray and not very familiar with Unicode and UTF-8 encoding. I have GET and PUT requests that submit Japanese text with charset=UTF-8. After receiving the text, I replace some of the Japanese words with the ASCII character '*' and return the text as a JSON string. When the returned JSON string is displayed in the browser, it shows up as raw Unicode escapes like this: \u0646\u0635\u064a\u060c
My Chrome default encoding is UTF-8.

Any help will be appreciated....

Thanks,
--danny

Danny Trieu

Jan 31, 2014, 2:41:08 AM
to spray...@googlegroups.com
I did a simple test that returns the submitted Japanese text unchanged, and the response displayed the Japanese characters properly.

However, in my actual setup, after my router receives the submitted Japanese text, the router (an actor) makes an ask call to another actor (for processing), wrapping the text inside a message (a case class). The wrapped text arrives at the processing actor correctly, and I can print its Japanese text to the console. The processing actor then returns the Japanese text inside a response message (a case class), which the router returns to the client browser as a JSON string. When that JSON string reaches the browser, I can no longer see the proper Japanese text. All I see are the Unicode escape representations.

Can someone point out what went wrong?

Thanks,

--danny

Martijn Hoekstra

Jan 31, 2014, 3:55:29 AM
to spray...@googlegroups.com


According to the spec, any character in a string may be escaped in JSON notation, so this behavior is allowed. It seems spray-json currently escapes more than the spec demands. I'm not sure why it does this, possibly to preempt faulty parsers consuming the JSON, or maybe to preempt faulty encoding later on: every character output by spray-json is in the 7-bit ASCII range.

It does make the output larger and less easily human readable. Spray folks, is this behavior still desired?
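
To make the behavior described above concrete, here is a minimal sketch in plain Java of an escaper that writes every non-ASCII character as a backslash-u escape. It is an illustration only, not spray-json's actual code; the class name `AsciiOnlyJsonEscape` and the method `escapeAsciiOnly` are hypothetical. The sample input in `main` is the Japanese word for "Japanese language" (U+65E5 U+672C U+8A9E) followed by three asterisks, written with source-level escapes so the file stays pure ASCII.

```java
public class AsciiOnlyJsonEscape {
    // Hypothetical helper: renders s as a JSON string literal in which
    // every character outside printable ASCII becomes a 4-digit hex escape.
    static String escapeAsciiOnly(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\b': sb.append("\\b"); break;
                case '\f': sb.append("\\f"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20 || c > 0x7E) {
                        // Control characters and everything non-ASCII are escaped.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Three Japanese characters, a space, and three asterisks.
        System.out.println(escapeAsciiOnly("\u65e5\u672c\u8a9e ***"));
    }
}
```

The printed literal contains only ASCII: three backslash-u escapes (65e5, 672c, 8a9e) followed by ` ***`. A JSON parser turns those escapes back into the original characters, but a browser showing the raw response body displays them as they are, which matches what Danny saw.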



Johannes Rudolph

Feb 3, 2014, 3:46:17 AM
to spray...@googlegroups.com
FYI: Martijn created a PR to change the default encoding behavior:

https://github.com/spray/spray-json/pull/83

Thanks for that!

IMO as the encoding for JSON is fixed to UTF-8 by the spec it makes
sense to output allowed unicode characters as UTF-8 and only \u encode
what needs to be encoded by the spec.
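
As a rough illustration of that idea (not the code in the PR), the following plain-Java sketch escapes only what the JSON grammar requires inside a string: the quotation mark, the backslash, and control characters below U+0020. Everything else is passed through and left to the UTF-8 encoder. The names `MinimalJsonEscape` and `escapeMinimal` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MinimalJsonEscape {
    // Hypothetical helper: escapes only the characters JSON requires
    // to be escaped and leaves all other characters untouched.
    static String escapeMinimal(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters need a hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as source escapes to keep the file ASCII.
        String json = escapeMinimal("\u65e5\u672c\u8a9e");
        byte[] utf8 = json.getBytes(StandardCharsets.UTF_8);
        // 11 bytes: two quotes plus three characters of three bytes each.
        System.out.println(utf8.length);
    }
}
```

For comparison, escaping the same three characters as 6-character hex escapes would take 20 bytes including the quotes, so the pass-through form is both smaller and readable as-is in a browser.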

Johannes



--
Johannes

-----------------------------------------------
Johannes Rudolph
http://virtual-void.net

Martijn Hoekstra

Feb 3, 2014, 3:58:02 AM
to spray...@googlegroups.com



The spec is actually quite odd in regards to encoding. It defaults to UTF-8, but its \u escapes only cover the Basic Multilingual Plane. Escaping anything beyond that takes a UTF-16 surrogate pair, even though UTF-8 is perfectly capable of encoding those code points directly. It works well for the JVM though: a JVM char is a 16-bit UTF-16 code unit, so characters beyond the basic plane are already held as surrogate pairs. That makes for a nice 1-1 mapping without having to split those characters into surrogate pairs in the implementation.
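
A small plain-Java sketch of that 1-1 mapping, using U+1F600 as an arbitrary example of a code point outside the basic plane. The class `SurrogatePairDemo` and the helper `escapeAllUnits` are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;

public class SurrogatePairDemo {
    // Hypothetical helper: writes each UTF-16 code unit of s as a
    // JSON-style 4-digit hex escape, one escape per Java char.
    static String escapeAllUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Build a string holding the single code point U+1F600.
        String s = new String(Character.toChars(0x1F600));

        System.out.println(s.length());                      // 2 chars (a surrogate pair)
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.println(escapeAllUnits(s));               // two escapes: d83d, then de00
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 4 bytes in UTF-8
    }
}
```

Because the string already holds the high surrogate (d83d) and the low surrogate (de00) as separate chars, escaping char by char produces exactly the surrogate pair that the JSON escape syntax expects, with no extra splitting step.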


Johannes Rudolph

Feb 3, 2014, 4:32:19 AM
to spray...@googlegroups.com

Thanks for that clarification. Yes, that's really fortunate.

Danny Trieu

Feb 3, 2014, 1:56:59 PM
to spray...@googlegroups.com
Thanks Johannes,
I overrode printString in JsonPrinter to skip the Unicode escaping for now, and it works.

--danny


