encoding/json character escaping question


Poussier William

Oct 3, 2019, 11:11:31
to golang-nuts
Hello

The encoding/json package escapes 0xA (line feed), 0xD (carriage return) and 0x9 (horizontal tab) using the two-character form introduced by the escape character '\' (\n, \r, \t). However, when it comes to 0x8 (backspace) and 0xC (form feed), it uses the Unicode escape sequence of the form '\uXXXX'.
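To see the behaviour described above, here is a minimal sketch that marshals each of the five control characters on its own. encodeString is a hypothetical helper name, not part of encoding/json. No expected output is shown for \b and \f on purpose: at the time of this thread they came out as \u0008 and \u000c, but what you see depends on the Go release you run it with.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeString (hypothetical helper) marshals s with encoding/json
// and returns the resulting JSON text.
func encodeString(s string) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err) // marshaling a plain string is not expected to fail
	}
	return string(b)
}

func main() {
	// The five control characters from the question, encoded one at a time.
	for _, s := range []string{"\n", "\r", "\t", "\b", "\f"} {
		fmt.Printf("%U -> %s\n", []rune(s)[0], encodeString(s))
	}
}
```

Either form is valid JSON, so a conforming decoder should read both back to the same string.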


I can't really grasp the reason behind this difference for characters < 0x20. Even though the output is perfectly valid JSON, I expected to see \f and \b.

Does anyone know the reason that led to this, if there is one?

Thanks

David Finkel

Oct 5, 2019, 14:48:11
to Poussier William, golang-nuts
It looks like only a few of the RFC 8259 sec 7 special two-byte escapes are supported:

Digging around the CLs linked from blame entries in that code-block, I found this comment from rsc@ on the CL that added handling for \r and \n:
\r and \n is good.
let's leave \b and \f out.
no one cares about \f
and more people know \b as
word boundary than as backspace.



Note that using the two-character substitutions is optional according to the RFC. The relevant section:
   Alternatively, there are two-character sequence escape
   representations of some popular characters.  So, for example, a
   string containing only a single reverse solidus character may be
   represented more compactly as "\\".

   To escape an extended character that is not in the Basic Multilingual
   Plane, the character is represented as a 12-character sequence,
   encoding the UTF-16 surrogate pair.  So, for example, a string
   containing only the G clef character (U+1D11E) may be represented as
   "\uD834\uDD1E".

      string = quotation-mark *char quotation-mark

      char = unescaped /
          escape (
              %x22 /          ; "    quotation mark  U+0022
              %x5C /          ; \    reverse solidus U+005C
              %x2F /          ; /    solidus         U+002F
              %x62 /          ; b    backspace       U+0008
              %x66 /          ; f    form feed       U+000C
              %x6E /          ; n    line feed       U+000A
              %x72 /          ; r    carriage return U+000D
              %x74 /          ; t    tab             U+0009
              %x75 4HEXDIG )  ; uXXXX                U+XXXX

      escape = %x5C              ; \

      quotation-mark = %x22      ; "

      unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
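For illustration, the grammar above can be turned into a small hand-written escaper that uses every two-character form and falls back to \uXXXX for the remaining control characters. This is only a sketch: quoteJSON and shortEscapes are hypothetical names, this is not how encoding/json is implemented, and the solidus is left unescaped because the grammar allows it to appear as-is.

```go
package main

import (
	"fmt"
	"strings"
)

// shortEscapes holds the two-character escapes from the RFC 8259 grammar.
// The solidus (/) is omitted: escaping it is permitted but not required.
var shortEscapes = map[rune]string{
	'"':  `\"`,
	'\\': `\\`,
	'\b': `\b`,
	'\f': `\f`,
	'\n': `\n`,
	'\r': `\r`,
	'\t': `\t`,
}

// quoteJSON (hypothetical helper) returns s as a JSON string literal.
// Invalid UTF-8 in s is replaced by U+FFFD, because ranging over a
// string yields that rune for malformed bytes.
func quoteJSON(s string) string {
	var sb strings.Builder
	sb.WriteByte('"')
	for _, r := range s {
		if esc, ok := shortEscapes[r]; ok {
			sb.WriteString(esc) // compact two-character form
		} else if r < 0x20 {
			fmt.Fprintf(&sb, `\u%04x`, r) // other control characters
		} else {
			sb.WriteRune(r) // "unescaped" range of the grammar
		}
	}
	sb.WriteByte('"')
	return sb.String()
}

func main() {
	fmt.Println(quoteJSON("a\bb\fc\x01")) // prints "a\bb\fc\u0001"
}
```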

On the other hand, it looks like the full complement is supported on the decoding side: https://github.com/golang/go/blob/b17fd8e49d24eb298c53de5cd0a8923f1e0270ba/src/encoding/json/decode.go#L1284-L1316
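A quick way to check the decoding side is to unmarshal a JSON literal that uses every two-character escape from the grammar, plus the surrogate-pair example from the RFC text. decodeString is a hypothetical helper name used only for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeString (hypothetical helper) unmarshals a JSON string literal
// into a Go string.
func decodeString(lit string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(lit), &s)
	return s, err
}

func main() {
	// Every two-character escape from the RFC 8259 grammar, in one literal.
	s, err := decodeString(`"\"\\\/\b\f\n\r\t"`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", s) // prints 22 5c 2f 08 0c 0a 0d 09

	// The G clef example: a UTF-16 surrogate pair decodes to one code point.
	g, err := decodeString(`"\uD834\uDD1E"`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%U\n", []rune(g)) // prints [U+1D11E]
}
```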

 

Thanks

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/f3c65b8c-c612-4b75-852a-fda7b246a77e%40googlegroups.com.

Poussier William

Oct 5, 2019, 19:05:56
to golang-nuts
Thank you David for the detailed answer.


nanmu42

Oct 6, 2019, 00:37:28
to golang-nuts
You can disable this behavior; this link may help: https://go-review.googlesource.com/c/go/+/21796/
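A hedged note on the suggestion above: assuming the linked change is the one behind Encoder.SetEscapeHTML (the thread itself does not say so), that switch only governs the escaping of <, > and &. The sketch below, with the hypothetical helper encodeNoHTMLEscape, turns it off and encodes a string that also contains \b and \f, so you can check whether the control characters are affected on your Go release.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// encodeNoHTMLEscape (hypothetical helper) encodes v with HTML escaping
// switched off via Encoder.SetEscapeHTML.
func encodeNoHTMLEscape(v interface{}) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // leave <, > and & as they are
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	return buf.String(), nil // Encode appends a trailing newline
}

func main() {
	out, err := encodeNoHTMLEscape("<b>\b\f")
	if err != nil {
		panic(err)
	}
	// The angle brackets come out literally; the backspace and form feed
	// are still written as escape sequences, never as raw bytes.
	fmt.Print(out)
}
```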