wish: accept // or /* javascript-like comment in JSON

Basile Starynkevitch

Feb 22, 2011, 6:29:58 PM
to jansso...@googlegroups.com
Hello All,

A wish for Jansson would be to permit comments with a Javascript syntax.
I am aware that such comments are legally outside of JSON specification
(RFC4627).

For what it is worth, YAJL (http://lloyd.github.com/yajl/) handles
them; see
http://lloyd.github.com/yajl/yajl-1.0.11/structyajl__parser__config.html


At parsing time, comments would be skipped and treated like whitespace
(between lexemes, i.e. outside of JSON strings).

To write comments from their program into a JSON file, people would use
json_dumpf for JSON generation and fprintf for comments generation.

Concretely, the API could for instance:

define a new flag JSON_WITH_COMMENTS;

have json_loads, json_loadf, and json_load_file accept that flag.
When enabled, JavaScript comments, that is a slash slash up to the end of
the line, or a slash star up to a star slash, are silently skipped (outside
of JSON strings).
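
A minimal sketch of the skipping rule described above, written as a standalone pass over a buffer rather than as a change to Jansson's lexer. The function name blank_js_comments is hypothetical, and JSON_WITH_COMMENTS remains only a proposed flag. The sketch overwrites each comment with spaces, leaves string contents alone, and treats only CR and LF as ending a // comment, which is a simplifying assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Jansson: overwrite JavaScript-style
 * comments in a NUL-terminated buffer with spaces so that a strict JSON
 * parser can read the result. Text inside JSON strings is left alone.
 * Returns 0 on success, -1 if a block comment is never closed. */
static int blank_js_comments(char *s)
{
    size_t i = 0;

    assert(s != NULL);
    while (s[i] != '\0') {
        if (s[i] == '"') {
            /* Skip a JSON string, honoring backslash escapes. */
            i++;
            while (s[i] != '\0' && s[i] != '"') {
                if (s[i] == '\\' && s[i + 1] != '\0')
                    i++;
                i++;
            }
            if (s[i] == '"')
                i++;
        } else if (s[i] == '/' && s[i + 1] == '/') {
            /* Line comment: blank up to, but not including, CR or LF. */
            size_t n = strcspn(s + i, "\r\n");
            memset(s + i, ' ', n);
            i += n;
        } else if (s[i] == '/' && s[i + 1] == '*') {
            /* Block comment: blank through the closing star-slash,
             * keeping line breaks so the line count is unchanged. */
            s[i++] = ' ';
            s[i++] = ' ';
            for (;;) {
                if (s[i] == '\0')
                    return -1;
                if (s[i] == '*' && s[i + 1] == '/') {
                    s[i++] = ' ';
                    s[i++] = ' ';
                    break;
                }
                if (s[i] != '\n' && s[i] != '\r')
                    s[i] = ' ';
                i++;
            }
        } else {
            i++;
        }
    }
    return 0;
}
```

After the call, the buffer could be handed to a strict parser such as json_loads. Because comments are blanked rather than removed, positions reported in parse errors still refer to the original text.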

Use cases for comments in JSON files include:

1. JSON configuration files edited by humans. Obviously, one can want to
add comments inside them, exactly as one adds # comments in many
configuration files inside /etc (like /etc/hosts or others).


2. A small single-line generated comment intended for humans, e.g.
// file foo.json generated by myprog on 2011-Feb-21

3. A bigger multiline generated comment, like the mandatory GPL comment
/****
© 2011 Basile Starynkevitch

this file data/firstspace.json is part of IaCa

IaCa is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

IaCa is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with IaCa. If not, see <http://www.gnu.org/licenses/>.
****/
The use case of a GPL notice in data is quite simple. Imagine that you
dump some internal representation of a compiler or interpreter in JSON
format (a quite sensible thing to do). Then, for persisting GPL
licensed software, it makes sense to emit a GPL comment like above.


Your comments are welcome about this idea. I might even perhaps propose
a patch for it.

By the way, I believe it would be nice if the current git master of
jansson had a JSON_VERSION like "1.99"; leaving "1.3" there is
misleading!

Regards.

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***

Petri Lehtinen

Feb 23, 2011, 6:09:00 AM
to jansso...@googlegroups.com
Basile Starynkevitch wrote:
> Hello All,
>
> A wish for Jansson would be to permit comments with a Javascript syntax.
> I am aware that such comments are legally outside of JSON specification
> (RFC4627).
>
[snip]

>
> Your comments are welcome about this idea. I might even perhaps propose
> a patch for it.

You know, I've already rejected a proposal like this before. My main
concern was the choice of a comment style, i.e. should it be #, //,
/* */, ;, or what?

But supporting the JavaScript comments actually makes sense, as JSON
is a subset of JavaScript. You're welcome to provide a patch, but
it'll have to wait for 2.1. I'm releasing 2.0 soon, as it has already
been in the making for too long.

> By the way, I believe it would be nice if the current git master of
> jansson had a JSON_VERSION like "1.99"; leaving "1.3" there is
> misleading!

Hey, that's a good idea. I've thought about this, but couldn't find a
good way to do it. Using 99 as the minor or micro sounds nice, I'll
implement this right after 2.0.

Petri

Deron Meranda

Feb 23, 2011, 2:58:49 PM
to jansso...@googlegroups.com
I personally don't like comments in JSON, because it then is no longer
JSON, and that can lead to compatibility issues and encourage
fragmentation of the standard.

That being said, if comment-support is allowed, it should:

1. Be an option that must be explicitly selected. The default usage
should remain strict JSON and reject comments.


2. The lexical parsing should follow precisely that of JavaScript,
i.e., ECMAScript.

See http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf
Section 7.4 (page 16)

Note that this means that section 7.3 (line terminators) should also
be followed, as the // style comments depend on it. This is more than
just recognizing the common CR or LF characters.

It's also not clear whether comment-support should be read-only
(ignoring them as whitespace), or if it is desirable to be able to
inspect them or even write them as well.
--
Deron Meranda
http://deron.meranda.us/

Petri Lehtinen

Feb 24, 2011, 2:01:27 AM
to jansso...@googlegroups.com
Deron Meranda wrote:
> I personally don't like comments in JSON, because it then is no longer
> JSON, and that can lead to compatibility issues and encourage
> fragmentation of the standard.
>
> That being said, if comment-support is allowed, it should:
>
> 1. Be an option that must be explicitly selected. The default usage
> should remain strict JSON and reject comments.

I agree. Non-standard extensions must not be the default behavior.

> 2. The lexical parsing should follow precisely that of JavaScript,
> i.e., ECMAScript.
>
> See http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf
> Section 7.4 (page 16)
>
> Note that this means that section 7.3 (line terminators) should also
> be followed, as the // style comments depend on it. This is more than
> just recognizing the common CR or LF characters.

Section 7.4 is not relevant for JSON as line terminators are "just
whitespace" and there's no automatic semicolon insertion.

The JSON RFC is not clear on whether U+2028 and U+2029 should be
considered whitespace, but I don't think it hurts anyone if they are.

> It's also not clear whether comment-support should be read-only
> (ignoring them as whitespace), or if it is desirable to be able to
> inspect them or even write them as well.

I'd keep it simple and just ignore them with no further support.

Petri

Deron Meranda

Feb 24, 2011, 3:14:40 AM
to jansso...@googlegroups.com
On Thu, Feb 24, 2011 at 2:01 AM, Petri Lehtinen <pe...@digip.org> wrote:
> Section 7.4 is not relevant for JSON as line terminators are "just
> whitespace" and there's no automatic semicolon insertion.
>
> The JSON RFC is not clear on whether U+2028 and U+2029 should be
> considered whitespace, but I don't think it hurts anyone if they are.

The JSON RFC does specify exactly what is considered whitespace,
and it is a small subset of what JavaScript does. In JavaScript pretty
much any character that has a Unicode category of Zs, Zl, or Zp is
whitespace. But JSON allows for only 4 specific characters.


A more subtle issue though comes to the treatment of U+2028 and
U+2029 if you do permit comments.

Consider this snippet of text:

[ 123 // hello <U+2028> world <LF> ]

In strict JSON that's an error as soon as you see "//".

And in JavaScript it is also an error, because the token "world"
follows "123", which is a syntax error.

But in an extended JSON that allows comments, if you don't
also extend the definition of line separator then you would parse
that as valid (as "world" would still be within the lexical scope
of the comment). Certainly, we'd never want to accept something
as valid which even JavaScript would error on (*see below).

So clearly, if you are to accept comments, you must also accept
the expanded definition of line terminator sequences. I think this
agrees with what you were saying, but I wanted to make it more
clear.


The question then becomes, should you just accept the expanded
set of line terminators (and possibly even all whitespace as Z*) all
the time, even when you don't allow comments? That may make the
code easier, I don't know. I'm still of the opinion though that in the
default mode, it should be very pedantic and never allow anything
which is not strictly per the JSON RFC ... which means failing
if it saw a U+2028 outside a quoted string.


(*) Note: Actually there IS a subtle incompatibility between JSON
and JavaScript. Consider the string literal:
"hello<U+2028>world"
This is valid in JSON, but is illegal in JavaScript! It is an unfortunate
artifact of the specifications, and nothing that we can hope to solve.


>> It's also not clear whether comment-support should be read-only
>> (ignoring them as whitespace), or if it is desirable to be able to
>> inspect them or even write them as well.
>
> I'd keep it simple and just ignore them with no further support.

I agree, unless others can posit a reasonable argument to do more with them.

Deron Meranda

Feb 24, 2011, 3:32:30 AM
to jansso...@googlegroups.com
> A more subtle issue though comes to the treatment of U+2028 and
> U+2029 if you do permit comments.
> ...

I probably made that sound a lot more complicated than it is. It's
useful discussion for those who want to know the "why", but
in a practical sense the solution is easy:

Inside the new code that recognizes and "skips over" a //-style
comment (e.g., a single-line comment), it must recognize the
end of the comment under these conditions:

* Reaches end of text, or
* Sees any of the four Unicode characters:
U+000A, U+000D, U+2028, U+2029

Anything beyond that might be nice (such as allowing U+2028
outside of comments too), but not as necessary.
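
The two conditions above can be sketched as a small scanner over UTF-8 input. The function name line_comment_end and its signature are hypothetical; the only assumption beyond the list above is the UTF-8 encoding, in which U+2028 and U+2029 are the byte sequences E2 80 A8 and E2 80 A9:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given UTF-8 text and the offset just past the "//",
 * return the offset at which the single-line comment ends. The comment ends
 * at the end of the text or at U+000A, U+000D, U+2028 or U+2029. */
static size_t line_comment_end(const char *text, size_t len, size_t pos)
{
    const unsigned char *p = (const unsigned char *)text;

    assert(text != NULL && pos <= len);
    while (pos < len) {
        /* U+000A (LF) and U+000D (CR) are single bytes in UTF-8. */
        if (p[pos] == 0x0A || p[pos] == 0x0D)
            break;
        /* U+2028 and U+2029 are three bytes each: E2 80 A8 / E2 80 A9. */
        if (p[pos] == 0xE2 && pos + 2 < len && p[pos + 1] == 0x80 &&
            (p[pos + 2] == 0xA8 || p[pos + 2] == 0xA9))
            break;
        pos++;
    }
    return pos;
}
```

For the earlier example [ 123 // hello <U+2028> world <LF> ], the scan stops at U+2028, so the rest of the line is no longer hidden inside the comment and the input is rejected, as it would be in JavaScript.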

Petri Lehtinen

Feb 24, 2011, 3:56:38 AM
to jansso...@googlegroups.com
Deron Meranda wrote:
> On Thu, Feb 24, 2011 at 2:01 AM, Petri Lehtinen <pe...@digip.org> wrote:
> > Section 7.4 is not relevant for JSON as line terminators are "just
> > whitespace" and there's no automatic semicolon insertion.
> >
> > The JSON RFC is not clear on whether U+2028 and U+2029 should be
> > considered whitespace, but I don't think it hurts anyone if they are.
>
> The JSON RFC does specify exactly what is considered whitespace,
> and it is a small subset of what JavaScript does. In JavaScript pretty
> much any character that has a Unicode category of Zs, Zl, or Zp is
> whitespace. But JSON allows for only 4 specific characters.

You're right. Only \n, \r, \t and space are specified in the JSON RFC.
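
As a sketch of what strict mode means for whitespace, the predicate below accepts exactly those four characters. The names is_json_whitespace and skip_json_whitespace are hypothetical, not Jansson functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The four whitespace characters allowed between JSON tokens by RFC 4627:
 * space, horizontal tab, line feed and carriage return. */
static bool is_json_whitespace(unsigned char c)
{
    return c == 0x20 || c == 0x09 || c == 0x0A || c == 0x0D;
}

/* Advance past strict JSON whitespace and return the new offset. Any other
 * byte, including the first byte of a UTF-8 encoded U+2028, stops the scan
 * and is left for the caller to tokenize or reject. */
static size_t skip_json_whitespace(const char *text, size_t len, size_t pos)
{
    assert(text != NULL && pos <= len);
    while (pos < len && is_json_whitespace((unsigned char)text[pos]))
        pos++;
    return pos;
}
```

A U+2028 outside a string stops the scan at its first byte (0xE2), so a strict parser built on this would report it as an unexpected character.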

> A more subtle issue though comes to the treatment of U+2028 and
> U+2029 if you do permit comments.
>
> Consider this snippet of text:
>
> [ 123 // hello <U+2028> world <LF> ]
>
> In strict JSON that's an error as soon as you see "//".
>
> And in JavaScript it is also an error, because the token "world"
> follows "123", which is a syntax error.
>
> But in an extended JSON that allows comments, if you don't
> also extend the definition of line separator then you would parse
> that as valid (as "world" would still be within the lexical scope
> of the comment). Certainly, we'd never want to accept something
> as valid which even JavaScript would error on (*see below).
>
> So clearly, if you are to accept comments, you must also accept
> the expanded definition of line terminator sequences. I think this
> agrees with what you were saying, but I wanted to make it more
> clear.

True. But does having JavaScript-like comments imply that we conform
to the JavaScript specification? It should be quite easy to implement,
but should we really implement it?

JSON is a subset of JavaScript, but this doesn't mean that JSON should
be parsed using eval() in JavaScript, right?

> The question then becomes, should you just accept the expanded
> set of line terminators (and possibly even all whitespace as Z*) all
> the time, even when you don't allow comments? That may make the
> code easier, I don't know. I'm still of the opinion though that in the
> default mode, it should be very pedantic and never allow anything
> which is not strictly per the JSON RFC ... which means failing
> if it saw a U+2028 outside a quoted string.

Yes, the default mode should be strictly conforming to the JSON RFC.

It's interesting how just yesterday Mr. Crockford commented on some
issues about JSON and Unicode in the JSON mailing list [1]. His
message was very much like "JSON doesn't care about whether Unicode is
valid or not".

[1] Full thread: http://tech.groups.yahoo.com/group/json/message/1582

> (*) Note: Actually there IS a subtle incompatibility between JSON
> and JavaScript. Consider the string literal:
> "hello<U+2028>world"
> This is valid in JSON, but is illegal in JavaScript! It is an unfortunate
> artifact of the specifications, and nothing that we can hope to solve.

Yep. Apart from the quotation mark and the backslash, the JSON RFC only
requires escaping U+0000 to U+001F inside strings.

I'm unsure about what to do about this, will have to think. I clearly
see that allowing comments inside JSON may be beneficial for some use
cases.

Petri Lehtinen

Mar 17, 2011, 11:00:24 AM
to jansso...@googlegroups.com
Petri Lehtinen wrote:
> I'm unsure about what to do about this, will have to think. I clearly
> see that allowing comments inside JSON may be beneficial for some use
> cases.

I decided not to implement comments, once again. The main reason is
that they are not in the JSON spec, and there should only be one JSON
spec out there.

Comments may be useful in some use cases, but the fact is that they're
not JSON. Maybe JSON is not the correct data format for those use
cases at all?

Deron Meranda

Mar 17, 2011, 7:37:51 PM
to jansso...@googlegroups.com, Petri Lehtinen
> I decided not to implement comments, once again. The main reason is
> that they are not in the JSON spec, and there should only be one JSON
> spec out there.

It's your call, but I tend to agree. JSON's primary benefit is that
it is widely interchangeable, and we don't want to encourage any
fragmentation. Also, another problem with comments is that they are
not preserved; e.g., they won't survive a decode-then-encode round
trip. I'm of the opinion that an encode(decode(s)) should always
result in the same semantically-equivalent data.


However, for the benefit of those who may desire something like
comments, but not necessarily "JavaScript" comments, there are some
options...

One pattern I particularly use when I have control or flexibility over
the JSON schema being used, is to just use strings inside an
object/dictionary, e.g.,

{
"comment" : "This is a comment string",
... }

Another option, for those who can pre-process the JSON input before
parsing, is to use a less lexically-involved comment syntax that can
be easily stripped out, such as the common '#'-style one-line comments
often used in scripting languages.

You could even implement a //-style comment in a preprocessor easily
enough, as long as you restricted it to just recognize comments that
start at the first non-blank character of a line. Handling //-comments
that appear at the end of a line following other stuff can be very
challenging, and would probably have to be done inside the main JSON
parser and not as an independent preprocessor.
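
A sketch of the restricted preprocessor described above. The function name blank_whole_line_comments is hypothetical. It relies on the fact that a JSON string cannot contain a raw line feed, so a line whose first non-blank character is '#' or the start of '//' cannot be inside a string; it assumes LF or CRLF line endings:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical preprocessor: blank out, in place, every line whose first
 * non-blank character begins a '#' or '//' comment. Comments that follow
 * other content on the same line are deliberately not handled. */
static void blank_whole_line_comments(char *s)
{
    assert(s != NULL);
    while (*s != '\0') {
        size_t line_len = strcspn(s, "\n");   /* line length without the LF */
        size_t indent = strspn(s, " \t");     /* leading spaces and tabs */

        if (indent < line_len &&
            (s[indent] == '#' ||
             (s[indent] == '/' && s[indent + 1] == '/')))
            memset(s, ' ', line_len);         /* keep the LF, drop the text */

        s += line_len;
        if (*s == '\n')
            s++;
    }
}
```

The blanked buffer is plain JSON again, provided the input contained no other extensions, and can then be passed to a strict parser.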

Deron

Petri Lehtinen

Mar 18, 2011, 2:08:23 AM
to jansso...@googlegroups.com
Deron Meranda wrote:
> One pattern I particularly use when I have control or flexibility over
> the JSON schema being used, is to just use strings inside an
> object/dictionary, e.g.,
>
> {
> "comment" : "This is a comment string",
> ... }
>

I've used this, too. This makes it a bit hard or at least ugly to have
multi-line comments, though.

xgbi

Apr 3, 2012, 12:00:53 PM
to jansso...@googlegroups.com
I'm digging up this old thread, but...

Could you tell me how you make a multi-line string in a JSON structure, using Jansson?

I guess something like this:

{
    "comment": [ 
         "line 1 ....",
         "line 2 ...."
    ]
}

Am I right?
I find this quite bad, because on the C side, you have to concatenate each line of the array (which is far from efficient...) to get back the original.
Does the JSON spec have something to say about this?
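
For what it's worth, the concatenation on the C side can be done with one allocation and two passes over the lines. A minimal sketch, where join_lines is a hypothetical helper; with Jansson, each element could be read with json_array_get and json_string_value before being passed in:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join `count` C strings with '\n' into one newly
 * allocated string. Returns NULL on allocation failure; the caller frees
 * the result. */
static char *join_lines(const char *const *lines, size_t count)
{
    size_t total = 1;                       /* room for the final NUL */

    assert(lines != NULL || count == 0);

    /* First pass: compute the exact size, one separator between lines. */
    for (size_t i = 0; i < count; i++)
        total += strlen(lines[i]) + (i + 1 < count ? 1 : 0);

    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    /* Second pass: copy each line, then the separator if more follow. */
    char *p = out;
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(lines[i]);
        memcpy(p, lines[i], n);
        p += n;
        if (i + 1 < count)
            *p++ = '\n';
    }
    *p = '\0';
    return out;
}
```

The cost is linear in the total length of the text, which is usually negligible for a comment or a configuration value.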

On the other hand, I'm wondering: why is comment parsing a "won't implement", while the decoding of single JSON values is allowed in Jansson (JSON_DECODE_ANY in 2.3, which goes against the RFC as much as comment parsing does)?

My opinion is that both of them are important, and the developer should be left with the choice of using them or not. Some other people seem to have that same opinion on comments: http://blog.getify.com/json-comments/
I think Basile's proposition is sound: Jansson can be used in a lot of different contexts, many of which include configuration files, in which comments are crucial.

I'm not advocating for the inclusion of JSON comments in the encode/output part, only in the decode/input process, because as you said (and getify gets it also), the raw JSON format should not have comments when transmitted to another party.

Anyway, I might dive in this, maybe propose a patch if I find the time to do it.

Guillaume

Deron Meranda

Apr 3, 2012, 2:10:12 PM
to jansso...@googlegroups.com
> Could you tell me how you make multi line string in a JSON structure, using
> Jansson?

Your problem is that JSON itself does not provide any multi-line
string syntax, short of embedding newlines, which has the effect of
multiple lines but not the visual appearance:

{ "comment": "line one\nline two\nline three" }

That's as good as you can do in JSON.


> I guess something like this:
>
> {
>     "comment": [
>          "line 1 ....",
>          "line 2 ...."
>     ]
> }

You can certainly do that too. Technically there are multiple strings,
but perhaps that's okay for your application. You can always design a
format where you can simulate comments, say by always ignoring any
object keys that start with underscores or some such.
But that's an application decision: it's not in JSON and therefore not
in Jansson.


> On the other hand, I'm wondering: why is the comment parsing a "won't
> implement", but the decoding of single JSON values is allowed (
> JSON_DECODE_ANY in 2.3, against RFC as much as comments parsing ) in
> Jansson?

That's because the JSON_DECODE_ANY is there to allow parsing of JSON
fragments. But those fragments must still be valid according to the
JSON spec. It's a good question, but there is a slight difference.
Comments have never been legal in JSON in any form.


> Jansson can be used in a lot of
> different contexts, many of which include configuration files, in which
> comments are crucial.

I'm not speaking for Petri, but I think the feeling is that Jansson
should do JSON and only JSON. The problem with providing libraries
that allow the user to go beyond what the spec says is that it
promotes sloppiness and endangers data exchange. If people using JSON
could produce output with JavaScript-like comments or multiline string
concatenation, then that data would not be portable: it would
no longer be guaranteed to be readable by other JSON-compliant
software, perhaps in different languages. If you give people the easy
power to abuse JSON, they will, perhaps even unknowingly, and then
you've destroyed the primary goal and advantage of using JSON.

The whole idea of JSON is data portability across implementations, so
it would be reckless to start supporting features not in JSON. The
same should also be said of other JSON implementations in other
languages. Some are quite good at adhering only to the spec while
others, alas, are not.


If you must have comments or multi-line string concatenation or
whatever other extra-JSON extension you want, you have a few options:

* Modify the Jansson code according to its open source license. You
could even make a fork I guess, just follow the licence;
* Do your own preprocessing, say to strip out comments or concatenate
strings in your non-JSON input to convert it to JSON first (this is
not trivial in a general sense, but could be in an
application-specific setting);
* Use a different library; perhaps even look at some Javascript
parsers or engines;
* Look at a different richer data format, perhaps YAML.

As far as accepting a patch, that's up to Petri, but he's taken a
pretty solid stance on this in the past.

Petri Lehtinen

Apr 3, 2012, 3:15:16 PM
to jansso...@googlegroups.com

This is exactly why I don't want comment support...

> If you must have comments or multi-line string concatenation or
> whatever other extra-JSON extension you want, you have a few options:
>
> * Modify the Jansson code according to its open source license. You
> could even make a fork I guess, just follow the licence;
> * Do your own preprocessing, say to strip out comments or concatenate
> strings in your non-JSON input to convert it to JSON first (this is
> not trivial in a general sense, but could be in an
> application-specific setting);

... and these two are the best options you have if you really want
comments. Either modify Jansson for your needs or do some
preprocessing to strip comments.

The most fundamental problem with comments is that (because they're
not in the spec) they cannot be implemented in a way that would be
compatible with other existing libraries.

Petri

xgbi

Apr 3, 2012, 3:34:52 PM
to jansso...@googlegroups.com

After posting my question, I browsed the jansson-users archives for a while, and found that this question has been asked quite a few times.
I read your answers thoroughly, and I guess I now better understand what the problem with comments is: if you want all the existing JSON libraries to be compatible with each other, you have to stick to the JSON spec, and only this spec.

So, yeah, I guess I'll have to do something myself. The preprocessing solution is quite appealing (and easier I guess), so I think I'll stick to that.

Thanks for both your answers,
Guillaume