encoding/json character escaping question

Poussier William

Oct 3, 2019, 11:11:31 AM

to golang-nuts

Hello

The encoding/json package escapes 0xA (line feed), 0xD (carriage return) and 0x9 (horizontal tab) using the short backslash forms (\n, \r, \t). However, for 0x8 (backspace) and 0xC (form feed), it uses the Unicode escape sequence of the form '\uXXXX'.

Reproducer: https://play.golang.org/p/jihv9sZUjvY
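
For readers who cannot open the playground link, here is a minimal sketch of the same kind of check. It is not the original reproducer; encodeString is a hypothetical helper introduced only for this illustration. No expected output is listed because the encoding of \b and \f may differ between Go releases, so run it against the version you care about.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeString is a hypothetical helper: it returns the JSON text that
// encoding/json produces for a single Go string.
func encodeString(s string) string {
	b, err := json.Marshal(s)
	if err != nil {
		// Marshaling a plain string is not expected to fail.
		panic(err)
	}
	return string(b)
}

func main() {
	for _, s := range []string{"\n", "\r", "\t", "\b", "\f"} {
		// %q shows the Go input, %s shows the raw JSON output.
		fmt.Printf("%q -> %s\n", s, encodeString(s))
	}
}
```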

I can't really grasp the reason behind this difference for characters < 0x20. Even though the output is perfectly valid JSON, I expected to see \f and \b.

Does anyone know the reason that led to this, if there is one?

Thanks

David Finkel

Oct 5, 2019, 2:48:11 PM

to Poussier William, golang-nuts

It looks like the encoder only emits a few of the special two-character escapes from RFC 8259, section 7:

https://github.com/golang/go/blob/go1.13.1/src/encoding/json/encode.go#L975-L994

Digging around the CLs linked from blame entries in that code-block, I found this comment from rsc@ on the CL that added handling for \r and \n:

\r and \n is good.
let's leave \b and \f out.
no one cares about \f
and more people know \b as
word boundary than as backspace.

Note that using the two-character escapes is optional according to the RFC. The relevant section:

   Alternatively, there are two-character sequence escape
   representations of some popular characters.  So, for example, a
   string containing only a single reverse solidus character may be
   represented more compactly as "\\".

   To escape an extended character that is not in the Basic Multilingual
   Plane, the character is represented as a 12-character sequence,
   encoding the UTF-16 surrogate pair.  So, for example, a string
   containing only the G clef character (U+1D11E) may be represented as
   "\uD834\uDD1E".

      string = quotation-mark *char quotation-mark

      char = unescaped /
          escape (
              %x22 /          ; "    quotation mark  U+0022
              %x5C /          ; \    reverse solidus U+005C
              %x2F /          ; /    solidus         U+002F
              %x62 /          ; b    backspace       U+0008
              %x66 /          ; f    form feed       U+000C
              %x6E /          ; n    line feed       U+000A
              %x72 /          ; r    carriage return U+000D
              %x74 /          ; t    tab             U+0009
              %x75 4HEXDIG )  ; uXXXX                U+XXXX

      escape = %x5C              ; \

      quotation-mark = %x22      ; "

      unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
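
The surrogate-pair rule in the quoted RFC text can be checked against encoding/json with a short sketch like the one below. decodeJSONString is a hypothetical helper name used only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString is a hypothetical helper that decodes one JSON string
// literal into a Go string.
func decodeJSONString(lit string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(lit), &s)
	return s, err
}

func main() {
	// The 12-character escape from the RFC example (G clef, U+1D11E).
	s, err := decodeJSONString(`"\uD834\uDD1E"`)
	if err != nil {
		panic(err)
	}
	// Report whether the pair was combined into the single code point
	// U+1D11E, and how many runes the decoded string holds.
	fmt.Println(s == "\U0001D11E", len([]rune(s)))
}
```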

On the other hand, on the decoding side it looks like the full set of escapes is supported: https://github.com/golang/go/blob/b17fd8e49d24eb298c53de5cd0a8923f1e0270ba/src/encoding/json/decode.go#L1284-L1316
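
A quick way to confirm the decoding side is to feed the decoder every short escape from the RFC grammar and compare the result with the equivalent Go string. This is a sketch only; decodeString is a hypothetical helper, not part of encoding/json.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeString is a hypothetical helper that decodes one JSON string literal.
func decodeString(lit string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(lit), &s)
	return s, err
}

func main() {
	// Every two-character escape from RFC 8259, section 7.
	short, err := decodeString(`"\b\f\n\r\t\/\\\""`)
	if err != nil {
		panic(err)
	}
	// The \uXXXX spellings of backspace and form feed.
	long, err := decodeString(`"\u0008\u000c"`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", short)
	// Both spellings should decode to the same bytes.
	fmt.Println(long == "\b\f")
}
```

So a producer that emits \b and \f and one that emits \u0008 and \u000c yield the same Go string after decoding.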

Thanks

Poussier William

Oct 5, 2019, 7:05:56 PM

to golang-nuts

Thank you David for the detailed answer.

On Saturday, October 5, 2019 at 20:48:11 UTC+2, David Finkel wrote:

nanmu42

Oct 6, 2019, 12:37:28 AM

to golang-nuts

You can disable this behavior. This link may help: https://go-review.googlesource.com/c/go/+/21796/
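
Assuming the linked change is the one behind Encoder.SetEscapeHTML (an assumption; the thread does not say), the sketch below shows what that switch affects. SetEscapeHTML controls the escaping of <, > and &. Control characters such as backspace must still be escaped for the output to be valid JSON, so it may not change the \b and \f case asked about above. encodeWith is a hypothetical helper written for this illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// encodeWith is a hypothetical helper that encodes v with HTML escaping
// switched on or off and returns the JSON text.
func encodeWith(v interface{}, escapeHTML bool) string {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(escapeHTML)
	if err := enc.Encode(v); err != nil {
		panic(err)
	}
	// Encode appends a newline; trim it for easier comparison.
	return strings.TrimSuffix(buf.String(), "\n")
}

func main() {
	in := "<b>&\b"
	fmt.Println(encodeWith(in, true))  // HTML-sensitive characters escaped
	fmt.Println(encodeWith(in, false)) // <, > and & left as-is
}
```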
