code review 8926046: encoding/json: document that marshaling invalid utf-8 s... (issue 8926046)

97 views
Skip to first unread message

minu...@gmail.com

unread,
Apr 29, 2013, 11:19:48 AM4/29/13
to golan...@googlegroups.com, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Reviewers: golang-dev1,

Message:
Hello golan...@googlegroups.com (cc: golan...@googlegroups.com),

I'd like you to review this change to
https://code.google.com/p/go/


Description:
encoding/json: document that marshaling invalid utf-8 sequence will
return error
Fixes issue 5360.

Please review this at https://codereview.appspot.com/8926046/

Affected files:
M src/pkg/encoding/json/encode.go


Index: src/pkg/encoding/json/encode.go
===================================================================
--- a/src/pkg/encoding/json/encode.go
+++ b/src/pkg/encoding/json/encode.go
@@ -39,8 +39,8 @@
//
// Floating point, integer, and Number values encode as JSON numbers.
//
-// String values encode as JSON strings, with each invalid UTF-8 sequence
-// replaced by the encoding of the Unicode replacement character U+FFFD.
+// String values encode as JSON strings, InvalidUTF8Error will be returned
+// if invalid UTF-8 sequence is encountered.
// The angle brackets "<" and ">" are escaped to "\u003c" and "\u003e"
// to keep some browsers from misinterpreting JSON output as HTML.
//


a...@golang.org

unread,
Apr 29, 2013, 12:05:40 PM4/29/13
to minu...@gmail.com, golan...@googlegroups.com, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com

https://codereview.appspot.com/8926046/diff/4001/src/pkg/encoding/json/encode.go
File src/pkg/encoding/json/encode.go (right):

https://codereview.appspot.com/8926046/diff/4001/src/pkg/encoding/json/encode.go#newcode43
src/pkg/encoding/json/encode.go:43: // if invalid UTF-8 sequence is
encountered.
s/if invalid/if an invalid/

https://codereview.appspot.com/8926046/

minu...@gmail.com

unread,
Apr 29, 2013, 12:22:11 PM4/29/13
to golan...@googlegroups.com, a...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com

minu...@gmail.com

unread,
Apr 29, 2013, 12:23:17 PM4/29/13
to golan...@googlegroups.com, a...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Please wait a moment, i will take the chance to document
InvalidUTF8Error.

https://codereview.appspot.com/8926046/

minu...@gmail.com

unread,
Apr 29, 2013, 12:29:46 PM4/29/13
to golan...@googlegroups.com, a...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
PTAL.

Now also added docs for InvalidUTF8Error.

https://codereview.appspot.com/8926046/

r...@golang.org

unread,
Apr 29, 2013, 6:17:00 PM4/29/13
to minu...@gmail.com, golan...@googlegroups.com, a...@golang.org, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
LGTM


https://codereview.appspot.com/8926046/diff/16001/src/pkg/encoding/json/encode.go
File src/pkg/encoding/json/encode.go (right):

https://codereview.appspot.com/8926046/diff/16001/src/pkg/encoding/json/encode.go#newcode42
src/pkg/encoding/json/encode.go:42: // String values encode as JSON
strings, InvalidUTF8Error will be returned
s/,/./

https://codereview.appspot.com/8926046/

minu...@gmail.com

unread,
Apr 29, 2013, 11:22:02 PM4/29/13
to minu...@gmail.com, golan...@googlegroups.com, a...@golang.org, r...@golang.org, re...@codereview-hr.appspotmail.com
*** Submitted as
https://code.google.com/p/go/source/detail?r=6aab8f011070 ***

encoding/json: document that marshaling invalid utf-8 sequence will
return error
Also added docs for InvalidUTF8Error.
Fixes issue 5360.

R=golang-dev, adg, r
CC=golang-dev
https://codereview.appspot.com/8926046


https://codereview.appspot.com/8926046/

minux

unread,
May 1, 2013, 2:31:25 PM5/1/13
to tyl...@gmail.com, golan...@googlegroups.com, a...@golang.org, r...@golang.org, re...@codereview-hr.appspotmail.com

On Wed, May 1, 2013 at 10:38 PM, <tyl...@gmail.com> wrote:
Is there anything in the spec that actually says which way this should behave? 
AFAIK, no. the RFC only states that utf-8 is the default encoding, but any unicode encoding
is acceptable if i read correctly. however, it do say that strings should be comprised of valid
unicode characters.

As an application developer, the (originally) documented behavior is much less annoying - an invalid UTF rune isn't really worth barfing over. Especially annoying since there doesn't appear to be any handy stdlib to clean a UTF-8 string.
i think for marshaling this behavior is more useful that silently corrupting the string (potentially
binary data) with the unicode replacement character.

Russ Cox

unread,
May 1, 2013, 3:18:20 PM5/1/13
to minux, tyl...@gmail.com, golang-dev, Andrew Gerrand, Rob Pike, re...@codereview-hr.appspotmail.com
It's too late to change this behavior. We're just clarifying the docs.

Reply all
Reply to author
Forward
0 new messages