JSON invalid character

Nico

Aug 29, 2011, 10:26:29 AM
to golang-nuts
Hi there,

I just discovered Go and decided to port a little program to it.

The program reads JSON data from a URL and processes it. The Go
port has worked well so far.

I don't have any influence on the JSON data, so sometimes there are
control characters in it and my program fails with "invalid character
'\x12' in string literal".

Here is the relevant code from my program:

http_return, err := http.Get(newurl)
var http_body []byte
if err == nil {
	http_body, err = ioutil.ReadAll(http_return.Body)
	http_return.Body.Close()
}
param_info := make(map[string]interface{})
param_err := json.Unmarshal(http_body, &param_info)

This Unmarshal call fails with "invalid character '\x12' in string
literal".

How do I remove this or any similar control character from the JSON
data? I don't need those characters, so they can simply be removed.

Thanks
Nico

Kyle Lemons

Aug 29, 2011, 1:56:03 PM
to Nico, golang-nuts
If you don't mind them being replaced with spaces, you could do something like:

for i, ch := range http_body {
  switch {
  case ch > '~':   http_body[i] = ' '
  case ch == '\r':
  case ch == '\n':
  case ch == '\t':
  case ch < ' ':   http_body[i] = ' '
  }
}

~K
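
A minimal sketch of the loop above wrapped into a complete program, for anyone who wants to try it. The helper name blankControlBytes and the sample input are assumptions made for illustration; they are not from the thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blankControlBytes (hypothetical name) applies the suggested loop in place:
// tab, newline and carriage return are kept, every other byte outside the
// printable ASCII range ' '..'~' is overwritten with a space.
func blankControlBytes(body []byte) []byte {
	for i, ch := range body {
		switch {
		case ch == '\r', ch == '\n', ch == '\t':
			// leave JSON whitespace untouched
		case ch < ' ', ch > '~':
			body[i] = ' '
		}
	}
	return body
}

func main() {
	// Sample input with a raw 0x12 byte inside a string literal.
	raw := []byte("{\"name\": \"a\x12b\"}")

	var v map[string]interface{}
	fmt.Println("before:", json.Unmarshal(raw, &v))

	err := json.Unmarshal(blankControlBytes(raw), &v)
	fmt.Println("after:", err, v["name"])
}
```

Two caveats with this approach: the ch > '~' case also overwrites every byte of a multi-byte UTF-8 character, so non-ASCII text in the data is lost, and a raw tab or newline inside a JSON string literal is left in place and will still be rejected by the decoder.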

Richard Crowley

Aug 29, 2011, 2:10:57 PM
to Nico, golang-nuts
On Mon, Aug 29, 2011 at 7:26 AM, Nico <n4...@hotmail.com> wrote:

The JSON format requires control characters inside strings to be
escaped, for example as \uXXXX Unicode escapes, so replacing the byte
0x12 in your input with the six-character string "\u0012" (that's a
literal backslash, so in Go source it'd be "\\u0012") would create
valid JSON that means the same thing. See the goofy syntax diagrams on
http://json.org/ for a more complete explanation.
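
One way to sketch this replacement is with bytes.ReplaceAll from the standard library (available since Go 1.12, so newer than this thread). The helper name escape0x12 and the sample input are assumptions for illustration, and the sketch handles only the single byte 0x12.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// escape0x12 (hypothetical name) returns a copy of body in which every raw
// 0x12 byte is replaced by the six-byte JSON escape \u0012.
func escape0x12(body []byte) []byte {
	return bytes.ReplaceAll(body, []byte{0x12}, []byte(`\u0012`))
}

func main() {
	raw := []byte("{\"name\": \"a\x12b\"}")

	var v map[string]string
	if err := json.Unmarshal(escape0x12(raw), &v); err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	// The decoded value contains the original control character again.
	fmt.Printf("%q\n", v["name"])
}
```

Unlike replacing the byte with a space, this keeps the information: the decoder turns the escape back into the character 0x12 in the resulting Go string.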

Nico

Aug 30, 2011, 5:19:59 AM
to golang-nuts
Thanks, that seems to work the way I want.

Nico

On Aug 29, 7:56 pm, Kyle Lemons <kev...@google.com> wrote:

Nico

Aug 30, 2011, 5:29:24 AM
to golang-nuts
If I try to check for '\x12', I get compiler errors like:

missing '
syntax error: unexpected name, expecting := or = or : or comma
missing '
newline in string
empty character literal or unescaped ' in character literal
missing '

if I use "case ch == '\\x12' : http_body[i] = '\\u0012'"

or

cannot convert "\\x12" to type uint8
invalid operation: ch == "\\x12" (mismatched types uint8 and string)
cannot use "\\u0012" (type string) as type uint8 in assignment

if I use "case ch == "\\x12" : http_body[i] = "\\u0012"

or

missing '
illegal character 0x12
missing '

if I use "case ch == '\\\x12' : http_body[i] = '\\\u0012'"


Nico
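
A likely reading of these errors, sketched in code: '\x12' with a single backslash is a valid character literal and can be compared with a byte, while '\\x12' holds several characters and "\\x12" is a string, not a byte. The escape \u0012 is six bytes long, so it cannot be assigned to the single slot http_body[i]; a new slice has to be built instead. The helper name escapeByte is an assumption for illustration.

```go
package main

import "fmt"

// escapeByte (hypothetical name) copies body into a new slice, writing esc
// in place of every occurrence of target. An in-place assignment cannot do
// this because esc is longer than one byte.
func escapeByte(body []byte, target byte, esc string) []byte {
	out := make([]byte, 0, len(body)+len(esc))
	for _, ch := range body {
		if ch == target {
			out = append(out, esc...)
		} else {
			out = append(out, ch)
		}
	}
	return out
}

func main() {
	body := []byte("a\x12b")

	// One backslash, then two hex digits: a single character with value 0x12.
	fmt.Println(body[1] == '\x12')

	// The interpreted string "\\u0012" and the raw string `\u0012` are the
	// same six bytes: backslash, u, 0, 0, 1, 2.
	fmt.Println(len("\\u0012"), "\\u0012" == `\u0012`)

	fmt.Println(string(escapeByte(body, '\x12', `\u0012`)))
}
```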


On Aug 29, 8:10 pm, Richard Crowley <r...@rcrowley.org> wrote:

Ian Lance Taylor

Aug 30, 2011, 9:27:30 AM
to Nico, golang-nuts
Nico <n4...@hotmail.com> writes:

> if I try to check for '\x12' I get compiler errors like:
>
> missing '
> syntax error: unexpected name, expecting := or = or : or comma
> missing '
> newline in string
> empty character literal or unescaped ' in character literal
> missing '

Show us your code.

Ian

peterGo

Aug 30, 2011, 9:38:59 AM
to golang-nuts
Nico,

I read the suggested escape mechanism to be something like:
"Hello,\t世界\n\x12" is escaped to "Hello,\u0009世界\u000a\u0012". For example,

package main

import (
	"encoding/hex"
	"fmt"
)

func EscapeCtrl(ctrl []byte) (esc []byte) {
	u := []byte(`\u0000`)
	for i, ch := range ctrl {
		if ch <= 31 {
			if esc == nil {
				esc = append(make([]byte, 0, len(ctrl)+len(u)), ctrl[:i]...)
			}
			esc = append(esc, u...)
			hex.Encode(esc[len(esc)-2:], ctrl[i:i+1])
			continue
		}
		if esc != nil {
			esc = append(esc, ch)
		}
	}
	if esc == nil {
		return ctrl
	}
	return esc
}

func main() {
	body := []byte("Hello,\t世界\n\x12")
	fmt.Println(body, string(body))
	body = EscapeCtrl(body)
	fmt.Println(body, string(body))
	body = []byte("Hello, 世界")
	fmt.Println(body, string(body))
	body = EscapeCtrl(body)
	fmt.Println(body, string(body))
}

Peter
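
One thing to watch when applying EscapeCtrl to a whole JSON document rather than to a single string value: it escapes every byte up to 31, including newlines and tabs used as whitespace between tokens, and \u000a is not valid JSON outside a string literal. The following is a rough sketch of a variant that escapes control bytes only inside string literals. The name escapeCtrlInStrings and the sample input are assumptions for illustration, and the quote tracking is deliberately simple.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// escapeCtrlInStrings (hypothetical name) copies body and replaces control
// bytes (< 0x20) with \u00XX escapes only while inside a JSON string literal.
// Whitespace between tokens is copied through unchanged.
func escapeCtrlInStrings(body []byte) []byte {
	const hexDigits = "0123456789abcdef"
	out := make([]byte, 0, len(body))
	inString, escaped := false, false
	for _, ch := range body {
		switch {
		case inString && ch < 0x20:
			// Raw control byte inside a string: write \u00XX instead.
			out = append(out, '\\', 'u', '0', '0', hexDigits[ch>>4], hexDigits[ch&0xf])
			escaped = false
			continue
		case inString && escaped:
			// Character following a backslash: copy it, it ends the escape.
			escaped = false
		case inString && ch == '\\':
			escaped = true
		case ch == '"':
			// Unescaped quote: enter or leave a string literal.
			inString = !inString
		}
		out = append(out, ch)
	}
	return out
}

func main() {
	// Newline and tab between tokens, plus raw 0x12 and tab inside a string.
	raw := []byte("{\n\t\"name\": \"a\x12b\tc\"\n}")

	var v map[string]string
	if err := json.Unmarshal(escapeCtrlInStrings(raw), &v); err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Printf("%q\n", v["name"])
}
```

The trade-off is extra state to track; if the input never contains whitespace control characters between tokens, the simpler EscapeCtrl above is enough.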

Nico

Aug 31, 2011, 8:07:47 AM
to golang-nuts
Thanks, that works. I still have to understand exactly what you are doing.

It looks like a more complicated version of Kyle Lemons' suggestion.

Kyle Lemons

Sep 1, 2011, 6:08:01 PM
to Nico, golang-nuts
My suggestion just replaces all nonprinting characters (except tab, newline, and carriage return) with spaces. If you actually need to keep the improperly escaped bytes, you'll have to use something more complicated, like what Peter posted.
~K

2011/8/31 Nico <n4...@hotmail.com>