JSON UTF8 and Unmarshal


Johann Höchtl

Feb 10, 2013, 6:06:09 AM
to golan...@googlegroups.com
JSON is required to be in UTF-8 per the spec. Is there any way to check after Unmarshal whether, e.g., a string contained invalid UTF-8 characters?

http://play.golang.org/p/NW4b3LLMPy


Patrick Mylund Nielsen

Feb 10, 2013, 6:09:32 AM
to Johann Höchtl, golang-nuts
Check before unmarshaling and reject the input if it's not valid UTF-8, like you're doing? If some of it is bad, you probably don't want to accept a payload from a bad encoder anyway.
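
A minimal sketch of the "check before" approach, assuming the whole payload is available as a byte slice. The names unmarshalStrict and errInvalidUTF8 are hypothetical and exist only for this illustration; utf8.Valid and json.Unmarshal are the standard library calls.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidUTF8 is a hypothetical sentinel error for this sketch.
var errInvalidUTF8 = errors.New("input is not valid UTF-8")

// unmarshalStrict (hypothetical helper) rejects input that is not valid
// UTF-8 before handing it to json.Unmarshal.
func unmarshalStrict(data []byte, v interface{}) error {
	if !utf8.Valid(data) {
		return errInvalidUTF8
	}
	return json.Unmarshal(data, v)
}

func main() {
	var v map[string]string

	// "\xd6" is "Ö" in Latin-1 and is not valid UTF-8 on its own.
	fmt.Println(unmarshalStrict([]byte("{\"k\": \"\xd6\"}"), &v))

	// The same character encoded as UTF-8 passes the check.
	fmt.Println(unmarshalStrict([]byte(`{"k": "Ö"}`), &v), v["k"])
}
```

Validating the raw bytes up front keeps the decision in one place, at the cost of one extra pass over the input.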


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Johann Höchtl

Feb 10, 2013, 6:19:13 AM
to Patrick Mylund Nielsen, golang-nuts
On 02/10/2013 12:09 PM, Patrick Mylund Nielsen wrote:
> Check before and reject it if it's not UTF8 like you're doing? If some
> of it's bad, you probably don't want to accept a payload from a bad
> encoder anyway.
>
>
Of course that's possible, but what does the byte sequence

[]byte{0xef, 0xbf, 0xbd}

mean? That's what I get from json.Unmarshal for the Latin-1-encoded "Ö"
(= \xD6)


Jan Mercl

Feb 10, 2013, 6:35:44 AM
to Johann Höchtl, Patrick Mylund Nielsen, golang-nuts
On Sun, Feb 10, 2013 at 12:19 PM, Johann Höchtl
<johann....@gmail.com> wrote:
> Of course that's possible, but what does the byte sequence
>
> []byte{0xef, 0xbf, 0xbd}
>
> mean? That's what I get from json.Unmarshal for the latin-1 encoded "Ö" (=
> \xD6 )

http://play.golang.org/p/8XC5rqwl3k

-j
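
The contents of the playground link are not reproduced in this thread. As a rough sketch of what the three bytes are: 0xEF 0xBF 0xBD is the UTF-8 encoding of U+FFFD, the Unicode replacement character, which unicode/utf8 exposes as utf8.RuneError. The encoding/json documentation states that Unmarshal substitutes it for invalid UTF-8 inside quoted strings. The helper decodeName and the "name" field are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// decodeName (hypothetical helper) unmarshals a JSON object with a single
// "name" field and returns the decoded string.
func decodeName(data []byte) (string, error) {
	var v struct{ Name string }
	err := json.Unmarshal(data, &v)
	return v.Name, err
}

func main() {
	// A lone 0xD6 byte (Latin-1 "Ö") inside a JSON string is invalid UTF-8.
	name, err := decodeName([]byte("{\"name\": \"\xd6\"}"))

	fmt.Println(err)                            // <nil>: no error is reported
	fmt.Printf("% x\n", name)                   // ef bf bd
	fmt.Println(name == string(utf8.RuneError)) // true
}
```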

Johann Höchtl

Feb 10, 2013, 7:00:23 AM
to Jan Mercl, Patrick Mylund Nielsen, golang-nuts
Thank you, that was very helpful. So, assuming that sensible UTF-8 input
will not contain the replacement sequence, I can detect erroneous input
even after Unmarshal.

Which makes me wonder why Unmarshal doesn't return an InvalidUTF8Error.
(NB: I like the way it behaves, but isn't InvalidUTF8Error made for
exactly these cases?)

Johann


minux

Feb 10, 2013, 2:09:48 PM
to Johann Höchtl, Jan Mercl, Patrick Mylund Nielsen, golang-nuts

On Sun, Feb 10, 2013 at 8:00 PM, Johann Höchtl <johann....@gmail.com> wrote:
> Which makes me wonder why Unmarshal doesn't return an InvalidUTF8Error. (NB: I like the way it behaves, but isn't InvalidUTF8Error made for exactly these cases?)

Currently InvalidUTF8Error doesn't have any docs, and it only applies to encoding.
We should at least fix the docs to make this clear (we can't change decoding to return
that error, or we would break Go 1 API compatibility).

Please file an issue.
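
A small sketch for probing the encoding side on whatever Go version is installed. The helper marshalHitsInvalidUTF8 is hypothetical. Note that the encoding/json documentation for later releases says that, as of Go 1.2, Marshal replaces invalid bytes with U+FFFD instead of returning InvalidUTF8Error, so on a current toolchain both calls below are expected to print false.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalHitsInvalidUTF8 (hypothetical helper) reports whether json.Marshal
// fails with *json.InvalidUTF8Error for the given string.
func marshalHitsInvalidUTF8(s string) bool {
	_, err := json.Marshal(s)
	_, ok := err.(*json.InvalidUTF8Error)
	return ok
}

func main() {
	fmt.Println(marshalHitsInvalidUTF8("Ö"))    // valid UTF-8
	fmt.Println(marshalHitsInvalidUTF8("\xd6")) // lone Latin-1 byte
}
```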

Johann Höchtl

Feb 10, 2013, 3:13:11 PM
to minux, golang-nuts