Preserving key order in encoding/json


Peter Waldschmidt

Mar 21, 2015, 12:43:27 AM
to golan...@googlegroups.com
I've run into a limitation of the encoding/json package when decoding JSON that is not mapped to a struct. I need to determine the original order in which the JSON object keys were specified (in the parsed bytes). However, since the Unmarshal and Decode APIs return a generic JSON object as a map[string]interface{}, we lose initial ordering information as a consequence of the randomized ordering of maps.

It's really not too hard to solve, but I wanted to get some feedback before submitting code. 

I think the simplest solution would be to follow the example of json.Decoder.UseNumber(). The proposed types would look something like the following.

type ObjectHandler interface {
    Set(key string, value interface{})
}

// UseObjectHandler registers a function that creates the object used in place
// of map[string]interface{} for free-form JSON.
func (d *Decoder) UseObjectHandler(newObj func() ObjectHandler)


The semantics would be unchanged from today's implementation with the following exceptions.

1. When using a json.Decoder, one can specify a function (via UseObjectHandler) that returns an ObjectHandler.
2. If UseObjectHandler was called on a Decoder, then each time a map[string]interface{} would have been created, the result of the newObj function is used in its place. If newObj returns nil, the decoder falls back to the default behavior (i.e. map[string]interface{}).
3. If, when parsing a JSON object, the decoder gets an ObjectHandler then the Decoder will call ObjectHandler.Set(key,value) rather than placing each JSON object key/value pair into a map.
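
To make the three points above concrete, here is a minimal sketch of a type that could satisfy the proposed interface. ObjectHandler and UseObjectHandler exist only as a proposal in this thread and are not part of encoding/json; orderedObject, member, newOrderedObject and keys are hypothetical names chosen for illustration, and main merely simulates the Set calls a decoder would make.

```go
package main

import "fmt"

// ObjectHandler mirrors the interface proposed above.
// It is NOT part of encoding/json.
type ObjectHandler interface {
	Set(key string, value interface{})
}

// member is a hypothetical key/value pair.
type member struct {
	Key   string
	Value interface{}
}

// orderedObject (hypothetical) records pairs in the order Set is called.
type orderedObject struct {
	members []member
}

// Set appends the pair, so insertion order is preserved.
func (o *orderedObject) Set(key string, value interface{}) {
	o.members = append(o.members, member{Key: key, Value: value})
}

// keys returns the recorded keys in insertion order.
func (o *orderedObject) keys() []string {
	ks := make([]string, 0, len(o.members))
	for _, m := range o.members {
		ks = append(ks, m.Key)
	}
	return ks
}

// newOrderedObject is the kind of factory that would be handed to the
// proposed UseObjectHandler.
func newOrderedObject() ObjectHandler {
	return &orderedObject{}
}

func main() {
	// Simulate the calls a decoder would make for {"zebra":1,"apple":2}.
	h := newOrderedObject()
	h.Set("zebra", 1)
	h.Set("apple", 2)
	fmt.Println(h.(*orderedObject).keys())
}
```

Because the handler only ever sees Set calls, the key order it records is exactly the order in which the decoder encountered the keys.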

This should be a relatively simple change with no impact on the existing API. On the other hand, it provides a reasonable and simple way to get access to the order information of keys.
Thoughts??

Peter Waldschmidt

As an aside, we could extend this to the Unmarshal function as well by allowing the user to pass an ObjectHandler as the v parameter. The danger with that would be inconsistency: if there are nested objects, presumably they'd have to fall back to a regular map, since there's no way to tell Unmarshal how to construct a new ObjectHandler without adding another argument.


Dave Cheney

Mar 21, 2015, 1:08:46 AM
to Peter Waldschmidt, golang-dev
Hi Peter,

You haven't explained why preserving the ordering is important.
Please work on the use case before jumping to the solution.

Thanks

Dave

Peter Waldschmidt

Mar 21, 2015, 1:19:47 AM
to Dave Cheney, golang-dev

I am parsing a large JSON document.

The data in the source document is laid out in a fairly organized way, such that it makes sense to read it. Related fields are clustered together, etc...

The way that I render this data makes the most sense to the end user if it is organized in the same order as it appears in the JSON source data.

Currently, the data is being rendered in a confusing way because the keys are jumbled by the map. Even worse, they are not the same from one run to another. Simply being able to get the original key order information from the decoder would make all the difference.

Martin Gallagher

Mar 21, 2015, 5:21:08 AM
to golan...@googlegroups.com, da...@cheney.net
https://www.ietf.org/rfc/rfc4627.txt

JSON itself doesn't guarantee object (maps) ordering.

Egon Elbre

Mar 21, 2015, 8:47:23 AM
to golan...@googlegroups.com, da...@cheney.net


On Saturday, 21 March 2015 07:19:47 UTC+2, Peter Waldschmidt wrote:

I am parsing a large JSON document.

The data in the source document is laid out in a fairly organized way, such that it makes sense to read it. Related fields are clustered together, etc...

The way that I render this data makes the most sense to the end user if it is organized in the same order as it appears in the JSON source data.

Why are you showing JSON to the end-user in the first place?

Is the JSON some sort of configuration file? JSON is not ideal for end-user configurations... something like ini/toml works better.

Peter Waldschmidt

Mar 21, 2015, 9:43:53 AM
to golan...@googlegroups.com, da...@cheney.net

This is not a question of whether the current implementation conforms to the spec (it does). I'm also not arguing that the JSON encoder/decoder should preserve order in every case, but rather that there should be some way that the developer can determine the original order and write out an object in a predictable order.

There are many cases where JSON is used as a (semi) human readable config format, for example. Having the file scramble itself every time you pass it through the decoder/encoder is not user friendly.

In my case, I am provided with arbitrary complex JSON data, which I don't have control over. I'm not displaying that information to the end user directly, but I transform it and display it back to the end user in a form in which the structure of the rendered information is similar in structure to the original JSON (i.e. ordering, clustering, naming). The original JSON has a logical order (guaranteed or not), and the end user benefits by seeing their output ordered in the same general way.

I don't think this is an uncommon scenario. 

  1. Imagine if you were writing a configuration file editor (possibly a web page), where the end user could use a friendly interface to edit fields that were stored in an underlying JSON file. If the order of the fields were randomly changing during the process, that would be pretty unworkable as a solution.
  2. Another case, let's say someone has serialized data from a database table as an array of objects, each record is an object with each field in the object rendered as a JSON key/value pair. If you are displaying this data to the end user, showing them records in which the columns are randomly in different orders would be a real problem.

Guaranteed or not, the order of data in a JSON document is sometimes important. Why should we make it impossible for a developer to get order information from the json package, especially when it's so simple? This proposal doesn't change anything in the current API or implementation semantics. The code impact is minuscule, no more than a handful of lines. And there's no other way to accomplish this with the current encoding/json package short of rewriting it on your own.

I know we are still talking API philosophy, but I had some thoughts about a more natural API implementation than I originally suggested. I'll go ahead and throw it out there.


type Member struct {
    Key   string
    Value interface{}
}

type OrderedObject []Member

func (*Decoder) UseOrderedObject()


If UseOrderedObject is called on the Decoder, then the Decoder will return an OrderedObject rather than a map[string]interface{} when each JSON object is decoded.

Similarly, when the Encoder marshals an OrderedObject, it would write out the JSON object preserving order in the output stream.

Egon Elbre

Mar 21, 2015, 10:27:56 AM
to golan...@googlegroups.com, da...@cheney.net
On Saturday, 21 March 2015 15:43:53 UTC+2, Peter Waldschmidt wrote:

This is not a question of whether the current implementation conforms to the spec (it does). I'm also not arguing that the JSON encoder/decoder should preserve order in every case, but rather that there should be some way that the developer can determine the original order and write out an object in a predictable order.

There are many cases where JSON is used as a (semi) human readable config format, for example. Having the file scramble itself every time you pass it through the decoder/encoder is not user friendly.

Is using JSON as a config file a good decision in the first place?

The problem I see with this is that this fix needs to be implemented in multiple levels until it's properly fixed. Imagine that you wish to swap out your current interface with SPA - i.e. JavaScript makes a request to the server. Now you use JSON.parse(data)... now you need to start fixing V8 and other implementations.

In my case, I am provided with arbitrary complex JSON data, which I don't have control over. I'm not displaying that information to the end user directly, but I transform it and display it back to the end user in a form in which the structure of the rendered information is similar in structure to the original JSON (i.e. ordering, clustering, naming). The original JSON has a logical order (guaranteed or not), and the end user benefits by seeing their output ordered in the same general way.

So:

* Can you change from JSON to something else?
Maybe xml or edn is better suited for this?

* Can you change the structure of JSON?
If the order is important use a JSON array.

I agree that displaying information in that way and grouping things together provide benefit. But the problem is in JSON rather than the API Go is providing.

Although, I know there are cases where you need to make the best out of the worst case. Basically, I don't think it should be part of encoding/json - it simply pretends to fix the problem - but in reality the problem is that JSON objects are unordered and trying to keep them ordered at all levels means essentially creating your own custom data format.

If you really have no other option, i.e. changing the structure/format, I would suggest forking the json package to json2/json-ordered or something.

There are also tons of json packages (http://godoc.org/?q=json); maybe one of them already solves your problem, although I didn't notice one.

Alternatively, there is a pending issue for a json Tokenizer (https://github.com/golang/go/issues/6050). With a Tokenizer, implementing your use case would become easier.

+ Egon

I don't think this is an uncommon scenario. 

  1. Imagine if you were writing a configuration file editor (possibly a web page), where the end user could use a friendly interface to edit fields that were stored in an underlying JSON file. If the order of the fields were randomly changing during the process, that would be pretty unworkable as a solution.
  2. Another case, let's say someone has serialized data from a database table as an array of objects, each record is an object with each field in the object rendered as a JSON key/value pair. If you are displaying this data to the end user, showing them records in which the columns are randomly in different orders would be a real problem.

You can sort them by name to display them. For example, html/template does that when you display map values.

Ugorji

Mar 21, 2015, 10:59:46 AM
to golan...@googlegroups.com, da...@cheney.net
A valid use-case for this is to support canonical encodings i.e. an object is always encoded into the same sequence of bytes. This is important in cryptography, etc (although json is not a good candidate format for this).

github.com/ugorji/go/codec package supports this well with the "Canonical" Encode Option (See http://ugorji.net/blog/go-codec-primer#canonical-encoding-of-values ). This package supports other encodings which mandate canonical support, but the option is available to all encodings it supports (json, cbor, msgpack, binc, simple).

Peter Waldschmidt

Mar 21, 2015, 3:23:14 PM
to golang-dev, Egon Elbre
Is using JSON as a config file a good decision in the first place?

* Can you change from JSON to something else?
Maybe xml or edn is better suited for this?

* Can you change the structure of JSON?
If the order is important use a JSON array.

I agree with you, that JSON has disadvantages as a config file. My case doesn't have anything to do with config. I was trying to relate to other use cases. 

I do not have control over the format. In fact, there is no definite format. I have to be able to read essentially any valid JSON document.

If you really have no other option, i.e. changing the structure/format, I would suggest forking the json package to json2/json-ordered or something.

There are also tons of json packages (http://godoc.org/?q=json); maybe one of them already solves your problem, although I didn't notice one.

Well, I've got to build it one way or the other. I was just hoping to add some value to the json package, since it's a lot of value for changing a few lines of code.
 
Alternatively, there is a pending issue for a json Tokenizer (https://github.com/golang/go/issues/6050). With a Tokenizer, implementing your use case would become easier.

I agree, while the Tokenizer is more than I need, it would solve the problem. I may give it a go. 

 

Peter Waldschmidt

Mar 22, 2015, 5:18:14 PM
to golang-dev
I've sent a CL with the changes. It's very simple.

https://go-review.googlesource.com/#/c/7930/

Brad Fitzpatrick

Mar 22, 2015, 7:32:38 PM
to Peter Waldschmidt, golang-dev
I've replied on the CL.


roger peppe

Mar 24, 2015, 9:00:02 AM
to Peter Waldschmidt, Dave Cheney, golang-dev
On 21 March 2015 at 05:19, Peter Waldschmidt <pe...@waldschmidt.com> wrote:
> Currently, the data is being rendered in a confusing way because the keys
> are jumbled by the map. Even worse, they are not the same from one run to
> another.

I don't believe this is true. When marshaling a map to JSON, the values are
ordered lexically by key. When marshaling a struct, the values are ordered in
the same order they are in the struct. So the output should be predictable at
any rate.
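
The two encoding rules described here can be checked with a short program. This is only an illustration: the record type and the marshalString helper are hypothetical names, and the sketch covers the encoding side only, so it says nothing about recovering the key order of a decoded document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record is a hypothetical struct used only for this illustration.
type record struct {
	Zebra int `json:"zebra"`
	Apple int `json:"apple"`
}

// marshalString marshals v and returns the JSON text. It panics on error
// to keep the example short.
func marshalString(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Map keys are sorted when marshaled, regardless of insertion order.
	fmt.Println(marshalString(map[string]int{"zebra": 1, "apple": 2}))
	// prints {"apple":2,"zebra":1}

	// Struct fields are written in declaration order.
	fmt.Println(marshalString(record{Zebra: 1, Apple: 2}))
	// prints {"zebra":1,"apple":2}
}
```

So the output is stable from run to run, but a map round trip replaces the source order with sorted order.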

Kamil Dziedzic

May 5, 2017, 7:35:48 AM
to golang-dev
I'm another person facing this problem:/

I'm gonna throw one more example to the topic: pseudo json in MongoDB shell.
Some commands in MongoDB require order of the keys

> db.runCommand({ find: "myColl", filter: { category: "cafe" } })
{
    "cursor" : {
        "id" : NumberLong(0),
        "ns" : "test.myColl",
        "firstBatch" : [ ]
    },
    "ok" : 1
}
> db.runCommand({ filter: { category: "cafe" }, find: "myColl" })
{
    "ok" : 0,
    "errmsg" : "no such command: 'filter', bad cmd: '{ filter: { category: \"cafe\" }, find: \"myColl\" }'",
    "code" : 59,
    "codeName" : "CommandNotFound"
}

So if you Marshal the command, and Unmarshal it again you will get it in unpredictable order, which means it can't be used again as mgo cmd.

Marshaling can be done outside of the json library with something as ugly as the code below (D is a slice):

func (d D) MarshalJSON() ([]byte, error) {
    // A nil D encodes as JSON null.
    if d == nil {
        return []byte("null"), nil
    }

    var b bytes.Buffer
    b.WriteByte('{')

    for i, v := range d {
        // separate members with commas
        if i > 0 {
            b.WriteByte(',')
        }

        // marshal key
        key, err := json.Marshal(v.Name)
        if err != nil {
            return nil, err
        }
        b.Write(key)
        b.WriteByte(':')

        // marshal value
        val, err := json.Marshal(v.Value)
        if err != nil {
            return nil, err
        }
        b.Write(val)
    }

    b.WriteByte('}')

    return b.Bytes(), nil
}


but Unmarshaling seems to be impossible without changes in json package.

So, has anybody tried to solve this problem?


Best, Kamil Dziedzic

roger peppe

May 5, 2017, 10:36:39 AM
to Kamil Dziedzic, golang-dev
On 5 May 2017 at 10:13, Kamil Dziedzic <arv...@gmail.com> wrote:
> I'm another person facing this problem:/
>
> I'm gonna throw one more example to the topic: pseudo json in MongoDB shell.
> Some commands in MongoDB require order of the keys
>
>> db.runCommand({ find: "myColl", filter: { category: "cafe" } })
> {
> "cursor" : {
> "id" : NumberLong(0),
> "ns" : "test.myColl",
> "firstBatch" : [ ]
> },
> "ok" : 1
> }
>> db.runCommand({ filter: { category: "cafe" }, find: "myColl" })
> {
> "ok" : 0,
> "errmsg" : "no such command: 'filter', bad cmd: '{ filter: { category:
> \"cafe\" }, find: \"myColl\" }'",
> "code" : 59,
> "codeName" : "CommandNotFound"
> }
>
> So if you Marshal the command, and Unmarshal it again you will get it in
> unpredictable order, which means it can't be used again as mgo cmd.

Ignoring the fact that this is a terrible feature of MongoDB,
this is a large part of why gopkg.in/mgo.v2/bson provides the bson.D type - it
can be used to provide keys in an arbitrary order.

If you're going to talk to MongoDB in Go, I'd strongly suggest using
the above package rather than producing commands for the mongodb
shell (strictly speaking the shell doesn't even produce/accept JSON,
as it has extra types for its values).
You could probably use the Decoder type if you really needed to do this.

roger peppe

May 5, 2017, 10:38:29 AM
to Kamil Dziedzic, golang-dev
One more thing: structs do preserve ordering of keys, and I'm not
aware of any values printed by the mongo shell where ordering
matters, so assuming you know what you're wanting to produce,
you should be OK just declaring a struct type for the value you
want to print.

Kamil Dziedzic

May 5, 2017, 12:03:16 PM
to roger peppe, golang-dev
On 5 May 2017 at 16:36, roger peppe <rogp...@gmail.com> wrote:
Ignoring the fact that this is a terrible feature of MongoDB,

Yes, exactly, but they won't fix that probably ever.
So I have no influence over this external resource, but I need to deal with it.
 
this is a large part of why gopkg.in/mgo.v2/bson provides the bson.D type - it
can be used to provide keys in an arbitrary order.

Yes, I'm using it and I tried to use bson.D but it won't solve my problem for two reasons:

1. Marshaling bson.D produces json array `[]` and there is no way to Unmarshal json object `{}` to bson.D
2. Even if I would be ok with Marshaling bson.D to json array, current implementation is broken and doesn't work with nested objects

In any case 2. is not really acceptable to me as I need to Unmarshal from:
`{ find: "myColl", filter: { category: "cafe" } }`
not from
`[{"Name":"find","Value":"myColl"},{"Name":"filter","Value":[{"Name":"category","Value":"cafe"}]}]`

Off topic: bson includes the source of the `json` package as an internal package, with some modifications applied. I'm curious how much it has diverged from the original `json` package over time. They probably did that because extending the json package was impossible (I have the same issue with allowing ordered objects), so they forked and modified it; in other words, the two now duplicate a lot of code. I'm quite skeptical that the fork is still in sync with the original package, and it seems odd to me that the two packages don't share code.

 
If you're going to talk to MongoDB in Go, I'd strongly suggest using
the above package rather than producing commands for the mongodb
shell (strictly speaking the shell doesn't even produce/accept JSON,
as it has extra types for its values).


Correct, but if I already have JSON that I want to use, I first need to Unmarshal it to bson.D, which is impossible at the moment because you can't Unmarshal a JSON object as a map and keep the order of keys. It's the same code for bson and json. Neither supports this. Same issue.
 
You could probably use the Decoder type if you really needed to do this.

Hah, I took a second look and realized I had missed the Token() and More() funcs - I will dig into this and see if it helps - thanks!

Still though I'm surprised so many funcs in json package are internal, it would be pretty easy to extend the code if some functionality was declared as public. It's also hard to trust that bson is up to date with json package since it cloned it... well easy to check:
https://github.com/golang/go/tree/master/src/encoding/json last change February 2017
https://github.com/go-mgo/mgo/tree/v2-unstable/internal/json May 2016
 


Best, Kamil Dziedzic

Russ Cox

May 5, 2017, 12:26:20 PM
to Kamil Dziedzic, roger peppe, golang-dev
On Fri, May 5, 2017 at 12:03 PM, Kamil Dziedzic <arv...@gmail.com> wrote:
Still though I'm surprised so many funcs in json package are internal, it would be pretty easy to extend the code if some functionality was declared as public.

And then we'd never be able to change those internal details again.

Russ

Kamil Dziedzic

May 6, 2017, 6:31:38 PM
to roger peppe, golang-dev
You could probably use the Decoder type if you really needed to do this.

Has anyone actually tried to do this using Decoder?

I see the patch https://go-review.googlesource.com/c/7930/ got rejected in favor of Decoder https://github.com/golang/go/issues/6050 but I don't see how one can use Decoder to do this.

The problem is that both Token() and Decode() consume the stream.

I thought I would use Token() to check whether it's equal to Delim('{') and then range over the elements and Decode them into bson.D. But if the value of one of the elements is another nested JSON object, Decode will decode it into map[string]interface{}, so all nested objects will again be unordered. If Decode() didn't consume the stream, I could decode first, check whether the result is a map[string]interface{}, and if so re-decode it into bson.D. However, since the first Decode has already consumed the stream, I can't retry the operation.

Or, if Token() didn't consume the stream, I could check whether the next element is an object and, if so, Unmarshal/Decode it as a whole into bson.D.

Sadly, both functions consume the stream, so there is no way to check whether the next element is an object/array and then decide whether to parse it manually or let Decode() do its job.

Am I missing something? Has anyone actually verified that Decoder really supersedes this patch: https://go-review.googlesource.com/c/7930 ?
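
One possible way around the peeking problem is to never call Decode() at all and build every value from tokens, recursing whenever Token() returns an opening delimiter. The sketch below is an assumption about how that could look, not something posted in this thread: Member and OrderedObject follow the shapes proposed earlier, decodeValue and decodeString are hypothetical helpers, and only Decoder.Token and Decoder.More from encoding/json are used. It handles strict JSON only, so the MongoDB shell syntax with unquoted keys would not parse.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Member and OrderedObject follow the shapes proposed earlier in the thread;
// they are not part of encoding/json.
type Member struct {
	Key   string
	Value interface{}
}

type OrderedObject []Member

// decodeValue reads exactly one JSON value from dec using only Token and More.
// Objects become OrderedObject (at every nesting level), arrays []interface{}.
func decodeValue(dec *json.Decoder) (interface{}, error) {
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	delim, ok := tok.(json.Delim)
	if !ok {
		return tok, nil // string, float64, bool or nil
	}
	switch delim {
	case '{':
		obj := OrderedObject{}
		for dec.More() {
			keyTok, err := dec.Token()
			if err != nil {
				return nil, err
			}
			key, ok := keyTok.(string)
			if !ok {
				return nil, fmt.Errorf("unexpected object key %v", keyTok)
			}
			val, err := decodeValue(dec)
			if err != nil {
				return nil, err
			}
			obj = append(obj, Member{Key: key, Value: val})
		}
		if _, err := dec.Token(); err != nil { // consume the closing '}'
			return nil, err
		}
		return obj, nil
	case '[':
		arr := []interface{}{}
		for dec.More() {
			val, err := decodeValue(dec)
			if err != nil {
				return nil, err
			}
			arr = append(arr, val)
		}
		if _, err := dec.Token(); err != nil { // consume the closing ']'
			return nil, err
		}
		return arr, nil
	}
	return nil, fmt.Errorf("unexpected delimiter %v", delim)
}

// decodeString is a convenience wrapper for the example.
func decodeString(s string) (interface{}, error) {
	return decodeValue(json.NewDecoder(strings.NewReader(s)))
}

func main() {
	v, err := decodeString(`{"find":"myColl","filter":{"z":1,"a":[true,null]}}`)
	if err != nil {
		panic(err)
	}
	for _, m := range v.(OrderedObject) {
		fmt.Printf("%s: %v\n", m.Key, m.Value)
	}
}
```

The trade-off is that struct decoding, UseNumber-style options and the like are not available on this path; converting the result into bson.D would be a separate step.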


Best, Kamil



Matt Harden

May 6, 2017, 7:11:47 PM
to Kamil Dziedzic, roger peppe, golang-dev
Why do you care about nested objects in this case? Isn't it just the top-level object that needs the order preserved, and is it just the first key/value pair that has to come first? That seems relatively easy to handle.
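
If only the top-level order matters, a sketch along these lines might be enough: read the opening brace and each key with Token(), then let Decode() capture each value as a json.RawMessage. Field and topLevelFields are hypothetical names, and the example assumes the input is strict JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Field is a hypothetical name for one top-level key/value pair.
type Field struct {
	Key   string
	Value json.RawMessage // raw bytes of the value, left undecoded
}

// topLevelFields returns the members of a JSON object in source order.
func topLevelFields(input string) ([]Field, error) {
	dec := json.NewDecoder(strings.NewReader(input))

	// The first token must be the opening brace of an object.
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return nil, fmt.Errorf("expected a JSON object, got %v", tok)
	}

	var fields []Field
	for dec.More() {
		// Keys arrive as string tokens.
		keyTok, err := dec.Token()
		if err != nil {
			return nil, err
		}
		key, ok := keyTok.(string)
		if !ok {
			return nil, fmt.Errorf("unexpected object key %v", keyTok)
		}
		// Decode consumes exactly one value; RawMessage keeps its bytes as-is.
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			return nil, err
		}
		fields = append(fields, Field{Key: key, Value: raw})
	}
	if _, err := dec.Token(); err != nil { // consume the closing '}'
		return nil, err
	}
	return fields, nil
}

func main() {
	fields, err := topLevelFields(`{"find":"myColl","filter":{"category":"cafe"}}`)
	if err != nil {
		panic(err)
	}
	for _, f := range fields {
		fmt.Printf("%s = %s\n", f.Key, f.Value)
	}
}
```

Since each value stays as raw bytes, nested objects keep their source text until something decodes them further.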
