Converting escaped unicode to utf8


JohnGB

Dec 24, 2016, 7:54:48 PM
to golang-nuts
I have an application where I am processing an HTTP request with a JSON body. However, the JSON in the body has been Unicode-escaped, so instead of `wasn't`, I get `wasn\u0027t`.

What is the simplest way to unescape this text back to UTF-8 encoded text, i.e. convert `wasn\u0027t` to `wasn't`? Please note that I'm only using this as an example; I'd like all Unicode-escaped characters to be converted to their UTF-8 equivalents.

Matt Harden

Dec 25, 2016, 12:14:22 AM
to JohnGB, golang-nuts
If it has a JSON body, are you using encoding/json to parse / decode it? That will handle the unescaping for you.
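
As a rough illustration of that point, here is a minimal sketch of decoding a JSON body with encoding/json. The `message` type, its `text` field, and the `decodeText` helper are hypothetical names made up for this example; only `json.Unmarshal` comes from the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message is a hypothetical request payload used only for illustration.
type message struct {
	Text string `json:"text"`
}

// decodeText parses a JSON body and returns the decoded "text" field.
// Any \uXXXX escapes inside the JSON string are unescaped by the decoder.
func decodeText(body []byte) (string, error) {
	var m message
	if err := json.Unmarshal(body, &m); err != nil {
		return "", err
	}
	return m.Text, nil
}

func main() {
	// A raw string literal keeps the backslash sequence intact for JSON.
	body := []byte(`{"text": "wasn\u0027t"}`)
	text, err := decodeText(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(text) // prints: wasn't
}
```

This only helps when the body really is valid JSON; a malformed body makes `json.Unmarshal` return an error instead.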

JohnGB

Dec 26, 2016, 2:23:02 AM
to golang-nuts, jgbe...@gmail.com
It *should* have a JSON body, but in this case I can't be sure that it will: I'm logging requests when errors occur, and those errors are quite often in the request itself. So although the body is sometimes decoded later in the handler chain, I need to be able to log the string in human-readable form before any decoding happens.

If it's unescaped as part of json.Unmarshal(), is there an exposed function in the encoding/json package which only unescapes a string?  I've tried looking, but I can't find anything that I can call directly.

roger peppe

Dec 26, 2016, 11:32:56 AM
to JohnGB, golang-nuts
If you've got a JSON string, then unmarshaling into a string should work.

https://play.golang.org/p/AThEIUZnX4
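
The linked playground code is not reproduced in the thread, so the following is only a sketch of the idea as described: treat the text as a JSON string literal (double quotes included) and unmarshal it into a Go string. The helper name `unescapeJSONString` is an assumption for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unescapeJSONString decodes a single JSON string literal, surrounding
// double quotes included, into its UTF-8 form.
func unescapeJSONString(quoted string) (string, error) {
	var s string
	if err := json.Unmarshal([]byte(quoted), &s); err != nil {
		return "", err
	}
	return s, nil
}

func main() {
	// The input must already be wrapped in double quotes to be valid JSON.
	s, err := unescapeJSONString(`"wasn\u0027t"`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(s) // prints: wasn't
}
```

If the text arrives without surrounding quotes, they have to be added first, and any bare `"` inside the text will make the input invalid JSON.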

John Beckett

Dec 26, 2016, 11:41:15 AM
to roger peppe, golang-nuts
Thanks Roger.  That is a really elegant solution.

C Banning

Dec 26, 2016, 1:34:16 PM
to golang-nuts, rogp...@gmail.com
If you're processing anonymous messages and don't know where the Unicode escapes will occur, you might want something like this: https://play.golang.org/p/M-21sy_en5
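
The playground code behind that link is not shown in the thread, so this is not that code. It is a separate sketch of one way to replace `\uXXXX` escapes wherever they appear in arbitrary text, using a regular expression. The names `escapeRE` and `unescapeAnywhere` are made up for this example, and the sketch assumes escapes always have exactly four hex digits.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// escapeRE matches one \uXXXX escape with exactly four hex digits.
var escapeRE = regexp.MustCompile(`\\u[0-9a-fA-F]{4}`)

// unescapeAnywhere replaces every \uXXXX escape in s with the UTF-8
// encoding of that code point and leaves all other text untouched.
// Limitations of this sketch: surrogate pairs are not combined (each half
// becomes U+FFFD), and an escaped backslash such as \\u0027 is not
// treated specially.
func unescapeAnywhere(s string) string {
	return escapeRE.ReplaceAllStringFunc(s, func(m string) string {
		n, err := strconv.ParseUint(m[2:], 16, 32)
		if err != nil {
			return m // not expected given the pattern; keep the text as-is
		}
		// Convert to a rune, not a byte, so code points above U+00FF survive.
		return string(rune(n))
	})
}

func main() {
	fmt.Println(unescapeAnywhere(`not JSON: \u044d\u0442\u043e "quoted" text`))
}
```

Because it does not rely on the input being a well-formed JSON or Go string literal, this approach tolerates bare quotes and other junk in the text. Converting the parsed value to a `rune` matters: keeping only the low byte of U+044D, for instance, would yield `M` rather than `э`.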

JohnGB

Dec 28, 2016, 8:25:49 AM
to golang-nuts, rogp...@gmail.com
That looks like a nice solution. I'll have to benchmark the methods and see which one wins out in the end.

JohnGB

Dec 28, 2016, 9:00:44 AM
to golang-nuts, jgbe...@gmail.com
I tried this with a variety of messages, and it doesn't work with most Unicode-escaped strings. For example, the Unicode-escaped string '\u044d\u0442\u043e \u0442\u0435\u0441\u0442 \u0441\u043e\u043e\u0431\u0449\u0435\u043d\u0438\u0435' should unescape to 'это тест сообщение', but it unescapes to 'MB> B5AB A>>1I5=85' instead.

JohnGB

Dec 28, 2016, 9:02:52 AM
to golang-nuts, jgbe...@gmail.com
My response should have read that it produces an error of "invalid character 'Ñ' looking for beginning of value".  I got the responses from the two methods mixed up.
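
The failing code is not shown in the thread, so the cause of this error is a guess. One way to get an error of this shape is to hand `json.Unmarshal` text that is not wrapped in double quotes and whose escapes were already decoded, for example because the test string was written as an interpreted (double-quoted) Go literal. The sketch below contrasts that with a raw literal wrapped in JSON quotes; `decodeBare` and `decodeQuoted` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeBare passes s to json.Unmarshal as-is, without adding quotes.
func decodeBare(s string) (string, error) {
	var out string
	err := json.Unmarshal([]byte(s), &out)
	return out, err
}

// decodeQuoted wraps s in double quotes first, making it a JSON string.
func decodeQuoted(s string) (string, error) {
	var out string
	err := json.Unmarshal([]byte(`"`+s+`"`), &out)
	return out, err
}

func main() {
	// Interpreted literal: the Go compiler has already turned the escapes
	// into UTF-8 bytes, and there are no JSON quotes around the text.
	_, err := decodeBare("\u044d\u0442\u043e")
	fmt.Println(err)

	// Raw literal: the backslashes are kept for the JSON decoder to handle.
	s, err := decodeQuoted(`\u044d\u0442\u043e`)
	fmt.Println(s, err)
}
```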


On Monday, 26 December 2016 17:32:56 UTC+1, rog wrote:

JohnGB

Dec 28, 2016, 9:03:21 AM
to golang-nuts, rogp...@gmail.com
I tried this with a variety of messages, and it doesn't work with most Unicode-escaped strings. For example, the Unicode-escaped string '\u044d\u0442\u043e \u0442\u0435\u0441\u0442 \u0441\u043e\u043e\u0431\u0449\u0435\u043d\u0438\u0435' should unescape to 'это тест сообщение', but it unescapes to 'MB> B5AB A>>1I5=85' instead.

On Monday, 26 December 2016 19:34:16 UTC+1, C Banning wrote:

Jakob Borg

Dec 28, 2016, 6:44:07 PM
to JohnGB, golang-nuts
Both json.Unmarshal and strconv.Unquote seem to handle this fine:

https://play.golang.org/p/mexmJNGQyh

//jb
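
The linked playground code is not reproduced here, so the following is a sketch of what such a comparison might look like, using the Cyrillic example from earlier in the thread. The helper names `viaJSON` and `viaUnquote` are assumptions; both wrap the text in double quotes before decoding.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// escaped is the sample input from the thread, kept as a raw literal so
// the backslash sequences reach the decoders unchanged.
const escaped = `\u044d\u0442\u043e \u0442\u0435\u0441\u0442 \u0441\u043e\u043e\u0431\u0449\u0435\u043d\u0438\u0435`

// viaJSON wraps s in double quotes and decodes it as a JSON string.
func viaJSON(s string) (string, error) {
	var out string
	err := json.Unmarshal([]byte(`"`+s+`"`), &out)
	return out, err
}

// viaUnquote wraps s in double quotes and decodes it as a Go string literal.
func viaUnquote(s string) (string, error) {
	return strconv.Unquote(`"` + s + `"`)
}

func main() {
	a, errA := viaJSON(escaped)
	b, errB := viaUnquote(escaped)
	fmt.Println(a, errA)
	fmt.Println(b, errB)
}
```

The two functions agree on this input, but they follow different grammars: JSON string escapes in one case, Go string-literal escapes in the other. For instance, `\/` is a valid JSON escape but not a valid Go one, so the results can differ on other inputs.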

Ben Song

Feb 2, 2023, 2:53:51 PM
to golang-nuts
strconv.Unquote returns an "invalid syntax" error if the string contains `"`. It is better to quote the string before calling Unquote.
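
A small sketch of the failure described above, together with one possible way to cope with it when the goal is only logging: fall back to the original text if decoding fails. The helper `unescapeForLog` is a hypothetical name, and the fallback is a design choice for this example rather than something proposed in the thread.

```go
package main

import (
	"fmt"
	"strconv"
)

// unescapeForLog tries to decode escapes in s by treating it as the body
// of a double-quoted Go string literal. If s cannot be decoded (for
// example because it contains a bare double quote), the original text is
// returned unchanged so that something is still logged.
func unescapeForLog(s string) string {
	out, err := strconv.Unquote(`"` + s + `"`)
	if err != nil {
		return s
	}
	return out
}

func main() {
	// A bare double quote inside the literal makes Unquote fail.
	_, err := strconv.Unquote(`"say "hi" \u0027"`)
	fmt.Println(err) // prints: invalid syntax

	fmt.Println(unescapeForLog(`wasn\u0027t`))     // prints: wasn't
	fmt.Println(unescapeForLog(`say "hi" \u0027`)) // input returned unchanged
}
```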
