Re: handling of NULL bytes in lift-json

Joni Freeman

Jun 6, 2012, 11:52:06 AM
to lif...@googlegroups.com
Hi,

What do you mean by NULL bytes, can you give an example?

Cheers Joni

On Wednesday, June 6, 2012 10:07:48 AM UTC+3, mmliu wrote:
I noticed that Ruby's active_support::JSON simply ignores NULL bytes when they appear in a JSON string,

while lift-json seems to keep them. Which one is right (or are both)?

mmliu

Jun 11, 2012, 1:30:49 AM
to lif...@googlegroups.com
By NULL bytes I mean a string like "endwith\u0000"

active_support::JSON ignores the '\u0000', while lift-json seems to keep it.

P.S. Sorry for the late reply. I thought Google would send me mail when there was a new reply, but it didn't.

Joni Freeman

Jun 11, 2012, 3:17:55 AM
to lif...@googlegroups.com
Hi,

The spec says (http://www.ietf.org/rfc/rfc4627.txt):

"All Unicode characters may be placed within the quotation marks except for the characters that must be
escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)."

I think the spec means that a JSON string "endwith\u0000" should not be parseable at all. Trying to
parse that should then be a parse error.

I tried some other JSON parsers:

Text.JSON (Haskell) - works like lift-json, keeps the control characters
Jackson (Java) - throws an exception, parse error
jQuery (Javascript) - throws an exception, parse error

While the spec is not very clear on what the parser should do with unquoted control characters,
I think Jackson's and jQuery's behavior is the most sane one.

Cheers Joni

Jeppe Nejsum Madsen

Jun 11, 2012, 5:28:15 AM
to lif...@googlegroups.com
mmliu <diveinto...@gmail.com> writes:

> By saying NULL bytes I mean string like "endwith\u0000"
>
> active_support::JSON will ignore the '\u0000',while lift-json seems will
> keep it.

Keeping the NUL seems to be in line with the spec:

"All Unicode characters may be placed within the quotation marks...."

http://www.ietf.org/rfc/rfc4627.txt

/Jeppe

Jeppe Nejsum Madsen

Jun 11, 2012, 5:42:51 AM
to lif...@googlegroups.com
On Mon, Jun 11, 2012 at 9:17 AM, Joni Freeman <freema...@gmail.com> wrote:
> Hi,
>
> The spec says (http://www.ietf.org/rfc/rfc4627.txt):
>
> "All Unicode characters may be placed within the quotation marks except for
> the characters that must be
> escaped: quotation mark, reverse solidus, and the control characters (U+0000
> through U+001F)."
>
> I think the spec means that a JSON string "endwith\u0000" should not be
> parseable at all. Trying to
> parse that should then be a parse error.

My previous response came too late :-) I don't think this is what the
spec says; it only says that NUL must always be escaped, which it is
here. E.g. further in the spec:

"Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point."


So \u0000 is a perfectly valid Unicode character and, imo, should be
parsed as such.
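
To make the escaped-versus-raw distinction concrete, here is a minimal sketch using Python's standard json module. Python is only a stand-in here: it is not one of the parsers discussed in this thread, and the sketch says nothing about how lift-json or active_support::JSON behave. The variable names are purely illustrative.

```python
import json

# Escaped form: the JSON text contains the six characters \u0000.
# This is the case the "Any character may be escaped" passage covers.
escaped_text = '"endwith\\u0000"'
value = json.loads(escaped_text)  # the parsed string ends with a real NUL

# Serializing the parsed value escapes the NUL again.
round_tripped = json.dumps(value)

# Raw form: an actual U+0000 character sits between the quotation marks.
# This is the case the "must be escaped" passage rules out.
raw_text = '"endwith\x00"'
try:
    json.loads(raw_text)
    raw_accepted = True
except json.JSONDecodeError:
    raw_accepted = False
```

With Python's default strict mode, the escaped form parses and keeps the NUL, while the raw form is rejected as an invalid control character. That matches the reading that the spec only forbids unescaped control characters inside a string.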