GSON: Go Simple Object Notation (Need suggestions)

Hǎiliàng

Feb 21, 2014, 3:29:40 AM
to golan...@googlegroups.com, Rolf Veen
Hi gophers,

Rolf and I are trying to define a specification of a Go-friendly, human readable data format based on his OGDL. The spec is under development and the implementation has not started yet. Here is the draft of the spec:

https://github.com/ogdl/gson/blob/master/spec.md

The goals include:
* Simple and readable
* Go-friendly: support complete Go data types
* Support cyclic reference

Currently we are not quite satisfied with the syntax for cyclic reference and some of the details. So we'd like to ask for valuable feedback from all the gophers on golang-nuts. Any suggestions or criticisms are welcome.

Hǎiliàng

egon

Feb 21, 2014, 4:08:42 AM
to golan...@googlegroups.com, Rolf Veen
Using white-space for nesting is a bad idea, especially in data-formats.

For references use identifiers and links... the idea would be something like:

alpha:uuid &{
foo 123
bar "123"
}
beta #uuid

In the example above, & marks that the field points to a structure: field alpha is uniquely named by uuid and points to a structure containing a number and a string, and field beta has the same value as #uuid. Obviously this needs to work only for pointer types, to protect against memory overflows. And obviously, adjust it to your syntax.

+ egon

Hailiang Wang

Feb 21, 2014, 9:09:07 PM
to egon, golang-nuts, Rolf Veen
Hi Egon,

Thanks for your valuable feedback.

> Using white-space for nesting is a bad idea, especially in data-formats.

Could you please explain why it is bad?

One of the major goals of GSON is readability. Curly braces are for
machines, which is why we don't have many curly braces in English
sentences.

Indentation-based formats are being accepted by developers, e.g. the
configuration file of travis-ci is YAML.

GSON is based on OGDL, which is much simpler than YAML and does not
allow mixing tab and space characters, reducing the risk of a wrongly
formatted file.

I'm not saying that it has no weakness, but it has its strength and position.

Hǎiliàng

Dan Kortschak

Feb 21, 2014, 9:16:57 PM
to Hailiang Wang, egon, golang-nuts, Rolf Veen
White-space is less fault resistant since it's not paired. Braces, brackets and parentheses are paired.
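
A minimal Go sketch of this point, assuming JSON as the brace-delimited format. The helper name `complete` is made up for illustration; it only wraps `json.Valid` from the standard library. A document that loses its tail also loses its closing braces, so the damage is detectable, whereas an indentation-only document cut at a line boundary would still look well-formed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// complete reports whether s is a syntactically complete JSON document.
// It stands in for "does every opening brace have its closing partner?".
func complete(s string) bool {
	return json.Valid([]byte(s))
}

func main() {
	full := `{"foo": {"bar": 1, "baz": 2}}`
	cut := full[:len(full)-2] // simulate a transfer that lost the last two bytes

	fmt.Println(complete(full)) // true
	fmt.Println(complete(cut))  // false: the unmatched braces expose the truncation
}
```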

Hailiang Wang

Feb 21, 2014, 9:54:33 PM
to Dan Kortschak, egon, golang-nuts, Rolf Veen
> White-space is less fault resistant since it's not paired. Braces, brackets and parentheses are paired.
>

Good point! Thanks.

So it is harder to tell from the text whether it is complete or not.

However, this could be solved by:
1. An explicit end of stream symbol (OGDL already defines it).
2. Various verifications from each level of protocol (TCP/IP, HTTP)
and hash verification from version control systems.

Hǎiliàng

John Waycott

Feb 22, 2014, 9:27:58 AM
to golan...@googlegroups.com, Dan Kortschak, egon, Rolf Veen
I think egon was pointing out that if the whitespace gets mangled (easy to do when code is cut/pasted from forums, blogs, email, etc.), it is difficult to recover the indentation. Since one of your goals is a human-readable format, you can assume people will post examples. Python code examples get mangled this way and it can be a pain to restore the indentation.

John

Hailiang Wang

Feb 22, 2014, 9:40:22 AM
to John Waycott, golang-nuts, Dan Kortschak, egon, Rolf Veen
Good point! Thanks.

Python allows a mixture of tabs and spaces but OGDL does not. So
a valid OGDL file contains only tabs or only spaces, which could partly
solve this problem.

Hǎiliàng

Hotei

Feb 22, 2014, 9:53:37 AM
to golan...@googlegroups.com, John Waycott, Dan Kortschak, egon, Rolf Veen
I've written a ton of Python and been bitten by the indentation bug more than a few times. Cut and paste of Python snippets can indeed be a pain to sort out. With a "brace-matching" editor I can sort out the same type of problem (if there even is one) in Go code in a few seconds. Gophers are comfortable with braces and brackets. To take them out would actually _sacrifice_ readability for most of us.

egon

Feb 22, 2014, 9:55:03 AM
to golan...@googlegroups.com, Dan Kortschak, egon, Rolf Veen
Well, white-space is terrible for lots of reasons :)

1. Less fault-tolerant, as Kortschak said. It is impossible to recover the correct structure if the spacing is off.

2. It will be copy/pasted, as John pointed out. This means you'll get spacing errors, even if you only use a single format... think about Makefile problems. Also, do you really want to explain to non-programmers that they copied the wrong type of nothing? If you lose an indentation level when copy/pasting, the result may still be interpreted as "correct" input although it contains a mistake.

3. And, most important, you assume that white-space is more human-readable than braces... which may not be the case.

Expanding on point 3... Making non-programmers understand significant nesting with white-space is more difficult than making them understand braces. (I also work as a teacher, and I absolutely hate Python for its significant white-space for that reason... for a beginner it can be difficult to understand although it may look cleaner.)

e.g. this looks like nested things

foo
  bar
  baz

this looks like bar is nested under foo and baz is a separate thing... although, by the spec, it is probably the same case as before...

foo
  bar

  baz

Although using white-space is less noisy, it is not more readable (based on my experience).
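
To make point 2 concrete, here is a small Go sketch. It uses a toy indentation parser written only for this illustration (the `parents` function is hypothetical, is not OGDL, and assumes space-only indentation). It shows that a paste which drops one indentation level still parses without any error, but yields a different tree.

```go
package main

import (
	"fmt"
	"strings"
)

// parents maps each node name to the name of its parent ("" for top level),
// using leading spaces as the only nesting signal. Blank lines are ignored.
func parents(text string) map[string]string {
	type entry struct {
		indent int
		name   string
	}
	result := make(map[string]string)
	var stack []entry
	for _, line := range strings.Split(text, "\n") {
		name := strings.TrimLeft(line, " ")
		if name == "" {
			continue
		}
		indent := len(line) - len(name)
		// Pop everything that is not shallower than the current line.
		for len(stack) > 0 && stack[len(stack)-1].indent >= indent {
			stack = stack[:len(stack)-1]
		}
		parent := ""
		if len(stack) > 0 {
			parent = stack[len(stack)-1].name
		}
		result[name] = parent
		stack = append(stack, entry{indent, name})
	}
	return result
}

func main() {
	original := "foo\n  bar\n    baz\n"
	pasted := "foo\n  bar\n  baz\n" // the paste lost two spaces on the last line

	fmt.Println(parents(original)["baz"]) // bar
	fmt.Println(parents(pasted)["baz"])   // foo, and nothing warns about the change
}
```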

+ egon

Hailiang Wang

Feb 22, 2014, 10:50:18 AM
to egon, golang-nuts, Dan Kortschak, Rolf Veen
Thanks, everyone! I promise I will take all your opinions into serious
consideration.

Has anyone considered that it will be more fault-tolerant if both
indent and curly braces are used and checked against each other?

Hǎiliàng

Benjamin Measures

Feb 22, 2014, 10:57:29 AM
to golan...@googlegroups.com, egon, Dan Kortschak, Rolf Veen
On Saturday, 22 February 2014 15:50:18 UTC, Hǎiliàng wrote:
> Has anyone considered that it will be more fault-tolerant if both
> indent and curly braces are used and checked against each other?

Wouldn't that end up being the worst of both worlds if, in addition to matching braces (which is sufficient), you had to ensure whitespace was also preserved when copying?

egon

Feb 22, 2014, 11:04:34 AM
to golan...@googlegroups.com, egon, Dan Kortschak, Rolf Veen
Adding more checks always makes things fail early, but does not necessarily make them more fault-tolerant.
You should be tolerant of accidental, hard-to-notice mistakes in human input, e.g.

a {
  b,
  c,
  d
}

If the commas are enforced this way, like in JSON, then it's really annoying to use... e.g. add a line... try to execute... fix the last line... retry...
Just don't make subtle rules; use simple, consistent rules... everything between these braces is nested, every new line is a new item, if it has " or { it can span multiple lines... or something similar.

TL;DR: the additional indentation rules add complexity where a person can make mistakes (i.e. you have copy/paste problems) without adding much benefit.

Try to figure out the main uses (i.e. as configuration, as setup etc.) and how it will be used (user copies part A from X and then copies part B from Y together) for the format and then try to optimize for those cases.

You could look at edn.. imho, it has been very well designed... and it's readable enough.

+ egon


Benjamin Measures

Feb 22, 2014, 11:07:11 AM
to golan...@googlegroups.com, Rolf Veen
On Friday, 21 February 2014 08:29:40 UTC, Hǎiliàng wrote:
> https://github.com/ogdl/gson/blob/master/spec.md
>
> The goals include:
> * Simple and readable
> * Go-friendly: support complete Go data types
> * Support cyclic reference

Wasn't the main attraction of JSON that it was [mostly] a subset of the Javascript grammar?

Doesn't Go have composite literals <http://golang.org/ref/spec#Composite_literals>? With a name like GSON, don't you think that people might expect the grammar to at least have a passing resemblance to Go?

Hailiang Wang

Feb 22, 2014, 11:19:47 AM
to Benjamin Measures, golang-nuts, Rolf Veen
How do you represent a cyclic reference with only composite literal?
How do you initialize a string/int pointer with only composite literal?

Hǎiliàng

Jan Mercl

Feb 22, 2014, 11:28:07 AM
to Hailiang Wang, Benjamin Measures, golang-nuts, Rolf Veen
On Sat, Feb 22, 2014 at 5:19 PM, Hailiang Wang <hwan...@gmail.com> wrote:
> How do you represent a cyclic reference with only composite literal?
> How do you initialize a string/int pointer with only composite literal?

Any graph can be converted to an [acyclic] tree by naming/numbering
the necessary nodes and expressing the "missing" edges as references
to those node names/numbers. Pointers are essentially the same thing.
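
As a rough Go sketch of that idea (the types `Node` and `Record` and the functions `flatten` and `rebuild` are names invented for this example, not part of any proposed format): each node gets a number, and the pointer that would close the cycle is written as a reference to that number. The flat list of records is acyclic and could be serialized in any syntax.

```go
package main

import "fmt"

// Node is the in-memory form; Next may point back to an earlier node.
type Node struct {
	Val  string
	Next *Node
}

// Record is the acyclic form: the pointer is replaced by a node number.
type Record struct {
	ID   int
	Val  string
	Next int // ID of the next node, or -1 for a nil pointer
}

// flatten walks the chain from start, numbering each node on first visit.
func flatten(start *Node) []Record {
	ids := make(map[*Node]int)
	var order []*Node
	for n := start; n != nil; n = n.Next {
		if _, seen := ids[n]; seen {
			break // reached a node we already numbered: the cycle is closed
		}
		ids[n] = len(order)
		order = append(order, n)
	}
	recs := make([]Record, len(order))
	for i, n := range order {
		next := -1
		if n.Next != nil {
			next = ids[n.Next]
		}
		recs[i] = Record{ID: i, Val: n.Val, Next: next}
	}
	return recs
}

// rebuild restores the pointers; it assumes record IDs equal slice indexes.
func rebuild(recs []Record) *Node {
	if len(recs) == 0 {
		return nil
	}
	nodes := make([]*Node, len(recs))
	for i, r := range recs {
		nodes[i] = &Node{Val: r.Val}
	}
	for i, r := range recs {
		if r.Next >= 0 {
			nodes[i].Next = nodes[r.Next]
		}
	}
	return nodes[0]
}

func main() {
	a := &Node{Val: "a"}
	b := &Node{Val: "b", Next: a}
	a.Next = b // a -> b -> a

	got := rebuild(flatten(a))
	fmt.Println(got.Next.Next == got) // true: the cycle survived the round trip
}
```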

-j

Hailiang Wang

Feb 22, 2014, 11:36:15 AM
to Jan Mercl, Benjamin Measures, golang-nuts, Rolf Veen
Could you give an example of the syntax?

type T struct {
    Val  string
    PVal *string
}

How do you initialize it with ONLY a composite literal without
declaring a string first?
How do you initialize PVal with &Val with ONLY a composite literal?

Hǎiliàng

Jan Mercl

Feb 22, 2014, 12:12:37 PM
to Hailiang Wang, Benjamin Measures, golang-nuts, Rolf Veen
On Sat, Feb 22, 2014 at 5:36 PM, Hailiang Wang <hwan...@gmail.com> wrote:
> Could you give an example of the syntax?
>
> type struct {
> Val string
> PVal *string
> }
>
> How do you initialize it with ONLY a composite literal without
> declaring a string first?
> How do you initialize PVal with &Val with ONLY a composite literal?

Why the 'without' requirement? What matters is the final data structure, right?

Syntax is mostly irrelevant, let me use some pseudo code (not meant as
a syntax proposal):

#1234: "Hello";
T{Val: #1234, PVal: &#1234};

(Semicolons could get optional; field names could get elided if the
type is declared explicitly etc.).

-j

Hailiang Wang

Feb 22, 2014, 12:29:24 PM
to Jan Mercl, Benjamin Measures, golang-nuts, Rolf Veen
>
> Why the 'without' requirement? What matters is the final data structure, right?
>

No, what matters is exactly the syntax. Benjamin asks why I'm not
considering a syntax like Go composite literal, and I'm just trying to
make a point by asking those questions. Composite literal's syntax
does not support pointer to basic type and circular reference, which
is why.

And there are other reasons, e.g. JSON contains no types but only
values. Composite literal is much more verbose.
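
For readers who want to see the pointer and cycle limitation in actual Go, here is a short sketch. `strPtr` and `newSelfRef` are helper names made up for this example. In real Go, `&"world"` does not compile because a string literal is not addressable, and a field that points into its own struct has to be set in a second statement.

```go
package main

import "fmt"

type T struct {
	Val  string
	PVal *string
}

// strPtr works around the fact that &"literal" is not valid Go:
// the parameter s is an addressable variable, so &s is allowed.
func strPtr(s string) *string { return &s }

// newSelfRef builds a T whose PVal points at its own Val field.
// A single composite literal cannot express this; a follow-up
// assignment is required.
func newSelfRef() *T {
	t := &T{Val: "hello"}
	t.PVal = &t.Val
	return t
}

func main() {
	a := T{Val: "hello", PVal: strPtr("world")}
	b := newSelfRef()
	fmt.Println(*a.PVal, *b.PVal, b.PVal == &b.Val) // world hello true
}
```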

Hǎiliàng

Jan Mercl

Feb 22, 2014, 12:46:34 PM
to Hailiang Wang, Benjamin Measures, golang-nuts, Rolf Veen
On Sat, Feb 22, 2014 at 6:29 PM, Hailiang Wang <hwan...@gmail.com> wrote:
>>
>> Why the 'without' requirement? What matters is the final data structure, right?
>>
>
> No, what matters is exactly the syntax.

Let's have two different syntaxes and two different sentences in them
which both result in the same data structure in memory after being
processed/parsed/interpreted. From the resulting in-memory data you
cannot distinguish which sentence in which language was used. That's a
sufficient condition to prove that syntax is irrelevant, apart from
subjective criteria like elegance or objective things like
performance/speed/space used etc.

> And there are other reasons, e.g. JSON contains no types but only
> values. Composite literal is much more verbose.

When eliding struct fields, not at all.

-j

Dan Kortschak

Feb 22, 2014, 3:30:43 PM
to Jan Mercl, Hailiang Wang, Benjamin Measures, golang-nuts, Rolf Veen
Yes, I like the idea of something that is pretty much Go syntax like that, with the option of type definition blocks. A question that comes out of that is the case of field names: since you would not be able to set unexported fields, there is no need to distinguish case in the object notation, but you might want to.

This is what goon[1] almost is if you get rid of the type annotation in the values.

[1]https://github.com/shurcooL/goon

Benjamin Measures

Feb 22, 2014, 7:57:41 PM
to golan...@googlegroups.com, Jan Mercl, Benjamin Measures, Rolf Veen
On Saturday, 22 February 2014 16:36:15 UTC, Hǎiliàng wrote:
> Could you give an example of the syntax?
>
> type T struct {
>     Val  string
>     PVal *string
> }
>
> How do you initialize it with ONLY a composite literal without
> declaring a string first?

It need only match the grammar, not necessarily be executable or compilable. Now liberated, we can:
T{
  Val:  "hello",
  PVal: &"world",  // cannot normally take the address of a literal 
}

> How do you initialize PVal with &Val with ONLY a composite literal?

T{
  Val:  "hello",
  PVal: &Val,      // resolve the address in GSON.Unmarshal
}

Hailiang Wang

Feb 23, 2014, 12:12:50 AM
to Benjamin Measures, golang-nuts, Jan Mercl, Rolf Veen
>
> T{
> Val: "hello",
> PVal: &Val, // resolve the address in GSON.Unmarshal
> }
>


&Val only works within the same struct. How to reference a value outside of it?

Hǎiliàng

Kevin Gillette

Feb 23, 2014, 9:24:55 AM
to golan...@googlegroups.com, Rolf Veen
I can also confirm that structural indentation eventually causes problems in anything intended for transmission; problems that, e.g. JSON does not have.

Also, it's problematic to have multiple syntactic forms for the same thing, e.g. people are less inclined to implement a parser when

1, 2, 3

means the same thing as:

1
2
3

Further, just as you can't assume whitespace won't get removed, you can't assume whitespace, such as newlines, won't get added, such as when this data is copied into an email message, or anything else that enforces a maximum line length. I suspect that a parse would fail when some part of the transmission transforms the first example into:

1, 2,
3

The explicit structure of JSON, for example, is both syntactically simpler and more resistant to these issues. If you want it all on one line, it's:

[1, 2, 3]

and on multiple lines:

[ 1
, 2
, 3
]

Visually, it's unambiguous that you're dealing with a JSON array, and since whitespace is unimportant between JSON tokens, the above two examples can be treated as being syntactically and semantically identical.
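
A quick Go check of that claim, using the standard `encoding/json` package (the `decode` helper is just a name chosen for this sketch): both layouts decode without error to the same slice.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// decode parses a JSON array of integers.
func decode(s string) ([]int, error) {
	var v []int
	err := json.Unmarshal([]byte(s), &v)
	return v, err
}

func main() {
	oneLine, err1 := decode(`[1, 2, 3]`)
	multiLine, err2 := decode("[ 1\n, 2\n, 3\n]")

	// Whitespace between tokens, including newlines, does not matter.
	fmt.Println(err1, err2, reflect.DeepEqual(oneLine, multiLine)) // <nil> <nil> true
}
```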

As per points mentioned earlier in this thread, GSON is clearly a reference to JSON, and JSON is valid JavaScript; because of this, I'd recommend against calling it GSON unless it is valid Go, or has a substantial syntactic overlap with Go.

How important is the ability to specify cyclic structures truly? Even in typical Go code, cycles are fairly rare. If you need such references, one option is to go the JSON Pointer or XPath route. For example:

{ x: / }

(slash in this example meaning 'whole document') could represent a document in which the struct contains a reference to itself, such that if it were expanded once and then serialized, it would look like:

{ x: { x: /x } }

and if expanded twice and serialized, would look like:

{ x: { x: { x: /x/x } } }
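
A rough Go sketch of how such path references might be resolved after parsing. Everything here is an assumption for illustration: the `Obj` type, the `resolve` function, and the simplified path syntax (plain slash-separated keys, without the escaping rules that JSON Pointer defines).

```go
package main

import (
	"fmt"
	"strings"
)

// Obj is a minimal document node: a set of named child objects.
type Obj struct {
	Fields map[string]*Obj
}

// resolve follows a slash-separated path from root. "/" alone names the
// whole document. It returns nil if any step of the path is missing.
func resolve(root *Obj, path string) *Obj {
	cur := root
	for _, key := range strings.Split(path, "/") {
		if key == "" {
			continue // skip the empty pieces around slashes
		}
		if cur == nil {
			return nil
		}
		cur = cur.Fields[key]
	}
	return cur
}

func main() {
	// The document "{ x: / }": after parsing, the reference "/" is
	// replaced by a pointer to the document itself.
	doc := &Obj{Fields: map[string]*Obj{}}
	doc.Fields["x"] = resolve(doc, "/")

	fmt.Println(resolve(doc, "/x") == doc)   // true
	fmt.Println(resolve(doc, "/x/x") == doc) // true: any expansion depth is the same node
}
```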

Closing note: if you're looking for an indent-structured data format with cyclic capabilities and the ability to specify explicit, user-defined type annotations, then YAML already has all those features.

Hailiang Wang

Feb 23, 2014, 10:36:40 AM
to Kevin Gillette, golang-nuts, Rolf Veen
Hi Kevin,

Thanks for taking the effort to write so much explaining why
indentation is so bad.

Your response forced me to think about what the requirements are, and
I've made my decision:

1. Stop calling it GSON; it is really a misleading name. It has nothing
to do with JSON and not enough to do with Go. The name Typed OGDL is
good enough.

2. I won't try to introduce (optional) curly braces into it like YAML.
Just as simple as possible: an OGDL with just enough extensions to
marshal/unmarshal all Go types.

So Typed OGDL will be a YAML style but much simpler data format,
supporting Go types with first priority.

Regards,

Hǎiliàng

Jan Mercl

Feb 23, 2014, 11:54:05 AM
to Hailiang Wang, Kevin Gillette, golang-nuts, Rolf Veen
Never heard about OGDL before reading this thread. The Wikipedia
article, started 9 years ago, has _two_ sentences to say about it, after
circa 20 or so edits through all those years.

It might be the case that the format didn't catch on with users. It might
be the case that it's because of the decision to use white space to carry
semantics. I'm obviously not the only one considering that
unfortunate. Oh, and also that space vs comma thing.

-j

Hǎiliàng

Feb 23, 2014, 12:59:41 PM
to golan...@googlegroups.com, Rolf Veen
Thanks everybody!

You never know until you have tried.

Let's close this thread.

Hǎiliàng

Rolf Veen

Feb 23, 2014, 2:10:34 PM
to Jan Mercl, Hailiang Wang, Kevin Gillette, golang-nuts
Let me comment on this thread since I'm the author of OGDL. While
working on a Java parser for YAML (that was long ago, when none was
available), I got frustrated by the growing complexity of the language,
so OGDL was born. It is not a substitute for any data format, it is
just another one, a very simple way of displaying text for both humans
and machines. I would for example not use JSON or XML for
configuration files, I would use YAML or OGDL. With minimal
modifications you could print the output of common unix commands in
OGDL and then they would be parseable and still readable. Those are the
kinds of niches for OGDL, YAML and other indentation-based languages. I
would not use indentation for code as Python does.

I'm obviously not trying to sell OGDL, hence more than 10 years have
produced only 2 lines in the Wikipedia. But now, thanks to Go, I could
write a parser [1] (announced here some weeks ago, Jan) with which
I'm satisfied for the first time (even if it may still contain
bugs).

Cheers,
Rolf.

[1] http://godoc.org/github.com/rveen/ogdl

bugpowder

Feb 24, 2014, 11:38:54 AM
to golan...@googlegroups.com, Rolf Veen
On Monday, February 24, 2014 1:59:41 AM UTC+8, Hǎiliàng wrote:
> Thanks everybody!
>
> You never know until you have tried.

Well, most of us HAVE tried. 

I've been programming Python professionally for around 15 years. 

I'd rather it had braces.

And I've tried YAML -- after seeing the mess in parsers for it, I stuck with JSON and XML.