Lack of hash/map type


Golly

Feb 17, 2012, 4:01:39 AM
to JSON Schema
Hi all, first time poster here :)

So today I read through the spec and started using it, and after a
little frustration I have a proposal.

I think schema definitions would benefit greatly from a new type for
hashes/maps: to be precise, a collection of keys and values where the
keys are unique. Every programming language has this in some form, and
I see that the array type is in the spec.
Is there a good reason a hash type was omitted?

I've been working around this by writing schemas like this:

{
  "type": "object",
  "patternProperties": {
    "^\\d+$": {"type": "string", "required": true}
  },
  "additionalProperties": false
}

That's a simple example, but once you start trying to define nested
hashes it becomes quite unclear (see the sketch below). I think this
could be far clearer and easier if we had keys/values properties and,
admittedly of less importance, a type: hash.
Example:

{
  "type": "hash",
  "keys": {"type": "integer"},
  "values": {"type": "string"}
}
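
(To make the nesting complaint concrete: here's what a simple two-level case, integer-like keys mapping to maps of integer-like keys to strings, looks like under the current workaround. A sketch written as a Python dict for readability; the keywords are the real draft-03 ones from above, the data shape is made up.)

# A "hash of hashes" under the current workaround: every level repeats
# the same patternProperties / additionalProperties boilerplate.
nested_workaround = {
    "type": "object",
    "patternProperties": {
        "^\\d+$": {
            "type": "object",
            "patternProperties": {
                "^\\d+$": {"type": "string"}
            },
            "additionalProperties": False
        }
    },
    "additionalProperties": False
}

With keys/values this would collapse to one hash declaration nested inside another.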

What are your thoughts?

Thanks,
DB

Francis Galiegue

Feb 18, 2012, 12:50:38 PM
to json-...@googlegroups.com

This:

{
    4: "a"
}

is not legal JSON; keys in an object must be strings.

On the other hand:

{
    "a": "b",
    "a": "c"
}

_is_ valid JSON. What the JSON spec does NOT say is which value should
be taken into account. Some JSON parser implementations will take the
first, or the last, or any other in between -- I even recall one JSON
parser aggregating such values into an array (["b", "c"])!

Basically, it means you can NOT rely on the JSON parser behaving in
one particular way.
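
(As a concrete illustration, in Python: CPython's json module happens to keep the last value, but that is an implementation detail, not something the JSON spec guarantees.)

import json

# CPython's json module silently keeps the *last* value for a duplicated
# key. Other parsers may keep the first, reject the document, or
# aggregate the values; the spec does not say which is correct.
print(json.loads('{"a": "b", "a": "c"}'))  # -> {'a': 'c'}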

--
Francis Galiegue, fgal...@gmail.com
"It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries" (Stéphane Faroult, in "The
Art of SQL", ISBN 0-596-00894-5)

Xample

Feb 18, 2012, 4:35:26 PM
to JSON Schema
Could you provide a real-life usage of this?
In most cases, a hash is used for optimization, and nothing prevents
you from using a key as your hash. Moreover, when using a string as
the key, most JSON engines are already smart enough to index it using
the hash of the string (so that its value can be retrieved quickly).

David

Feb 20, 2012, 1:05:01 AM
to JSON Schema
Thanks for the input Francis and Xample.
I didn't realise that property names were restricted to strings. Oops!
Nor did I realise JSON itself isn't clear on how exactly to handle
multiple values per property.

Ok now I have a new, different proposal :D
The schema spec contains a uniqueItems attribute. I think we need
another kind of uniqueness attribute.

Let me explain: the reason I was so keen on using hashes is that they
guarantee (key) uniqueness, and that a lot of the data I deal with has
some form of unique id relating to a bunch of supplementary
information.
For example, user contact details (here's that real-life example, Xample):

  "user_contacts": {
    "<user_id>": {
      "he...@gmail.com": {"types": ["email","home"]},
      "he...@hotmail.com": {"types": ["email","work"]},
      "+61434555555": {"types": ["phone","mobile"], "priority": 50}
    },
    ...
  }

Now if I rearrange this according to this new information I end up
with this:

  "user_contacts": [
    {
      "user_id": 3,
      "contacts": [
        {"contact": "he...@gmail.com", "types": ["email","home"]},
        {"contact": "he...@hotmail.com", "types": ["email","work"]},
        {"contact": "+61434555555", "types": ["phone","mobile"], "priority": 50}
      ]
    },
    ...
  ]

Now, in reality, I do have real-life constraints regarding the schema:
* user_ids can only be specified once (i.e. my app won't aggregate
separate blocks for the same user id).
* Each kind of contact can only be specified once, for the reasons
mentioned above.
Thus the following JSON would be invalid:
   ...
      {"contact": "he...@gmail.com", "types": ["email","home"]},
      {"contact": "he...@gmail.com", "types": ["email"], "priority": 30},
   ...

Currently there doesn't seem to be a way of declaring those kinds of
real-life constraints, even though supporting such constraints does
seem to be a design goal (i.e. attributes such as minimum and
uniqueItems exist).
What do you think about a new, similar attribute to address this
problem? i.e.
"contact" shall be unique within the scope of the "contacts" array.
"user_id" shall be unique within the scope of the "user_contacts"
array.

David

Feb 20, 2012, 1:09:31 AM
to JSON Schema
A quick & nasty solution idea:
Uniqueness with variable scope could be declared using the
slash-delimited fragment resolution of section 6.2.1
(http://tools.ietf.org/html/draft-zyp-json-schema-03#section-6.2.1).

Example:
uniqueWithin: "#/foo/anArray"
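
(To sketch it against my contacts example; uniqueWithin is entirely hypothetical, it exists in no draft, and its exact semantics would need pinning down. Written as a Python dict for readability.)

# Hypothetical keyword: "uniqueWithin" is just the proposal above, not
# part of any JSON Schema draft. The intent: the value validated here
# must be unique among everything reachable under the named scope.
user_contacts_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "user_id": {
                "type": "integer",
                # no two items under #/user_contacts may share a user_id
                "uniqueWithin": "#/user_contacts"
            }
        }
    }
}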

Xample

Feb 20, 2012, 9:39:28 AM
to json-...@googlegroups.com
It looks like you come from the XML world :-) Try to think as if you had to code a class with refs to other classes, e.g.:

var users = [
  {
    "id": 1234,
    "contacts": [123, 456, 789],
    "emails": {
      "home": ["fi...@mail.com", "sec...@mail.com"],
      "work": null
    },
    "phones": {
      "home": ["+123456789"],
      "work": ["+1987654321"]
    }
  },
  {… another user object …}
]
Here each contact has its own information and references to other contacts. That said, you can also use JSON Pointers to make those id maps (still, do it wisely, and prefer storing the contacts in an object rather than an array, as array indexes could change). This would give (assuming all the users are referenced under their id in an object "users"):

{
  "contacts": ["#/users/123", "#/users/456", "#/users/789"]
}
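
(If it helps, resolving such pointers programmatically is straightforward. A sketch assuming the third-party Python jsonpointer package; note the leading "#" is just the URI fragment marker, the pointer proper starts at the "/".)

from jsonpointer import resolve_pointer  # assumed: pip install jsonpointer

doc = {"users": {"123": {"id": 123, "emails": {"work": None}}}}

# The fragment "#/users/123" corresponds to the pointer "/users/123":
user = resolve_pointer(doc, "/users/123")
print(user["id"])  # -> 123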



David

Feb 21, 2012, 11:01:43 PM
to JSON Schema
Haha no, not the XML world. Of XML, I am not a fan :P

Thanks for the example, but my issue isn't really that I'm unhappy
with the way I'm representing my data in JSON; it's that the current
JSON Schema spec allows uniqueness to be made a requirement, but only
in the context of an array. What I'd like is a way to broaden the
scope of that uniqueness requirement, hence my suggestion.

You showed some very different approaches in your example, which is
cool, but unfortunately it still suffers from the same problem. Even
if you declared all email addresses in a unique array and used JSON
Pointers like you mentioned, there's nothing stopping a single email
address from being referenced, or pointed to, twice.

Xample

Feb 24, 2012, 3:20:21 AM
to json-...@googlegroups.com
Does it really matter? An email can be used in several domains at the same time.
Otherwise, since the keys in a JSON object are supposed to be unique, I would have done something like:

emails:
{
    "a...@email.com": {"place": "work"},
    "ano...@email.com": {"place": "home"}
}

Try to use the uniqueness of keys as much as possible instead of dealing with arrays; you can still enumerate all the keys of a dictionary the same way as an array.
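
(A quick sketch of that point in Python, with made-up addresses: a dict gives you key uniqueness for free, and enumerating it is no harder than enumerating an array.)

# Made-up example addresses. A dict cannot hold the same key twice, so
# uniqueness is structural rather than something a validator must check.
emails = {
    "alice@example.com": {"place": "work"},
    "bob@example.com": {"place": "home"},
}

# Enumerating the keys works just like looping over an array:
for address, info in emails.items():
    print(address, info["place"])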


David

Feb 27, 2012, 5:09:29 PM
to JSON Schema
In my example with email addresses it doesn't look like it matters,
but I'm just trying to give any kind of example here.
Forgetting this example though, there are definitely cases where it
does matter, especially when unique ids are involved.

Imagine an array of objects where each object has a unique id, and you
want to ensure that each object is only included in the array once.
We can't use uniqueItems because the objects are all different; what
we need is some way of saying that the id of each object must be
unique (see the sketch after the list below).
Without this ability, two problems can arise:
1) If two objects share the same unique id, the behaviour of the
consumer becomes non-deterministic. Which one gets processed, the
first or the last? It becomes unpredictable unless...
2) The burden of handling duplicate objects is pushed onto the
consumer when, in this case, they'd simply prefer to reject the
document. Instead they need to:
i - check for duplicates and reject manually;
ii - check for duplicates and ignore manually; or
iii - do nothing, at the cost of performance degradation (e.g. if they
are persisting the objects, they open a transaction, write each object
to the db, and commit, only for the next duplicate object to overwrite
all the work they just did).
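
(Here's the sketch: a minimal demonstration, assuming Python's third-party jsonschema validator and made-up data, that boolean uniqueItems cannot catch a duplicated id when the surrounding objects differ.)

import jsonschema  # assumed available: pip install jsonschema

schema = {"type": "array", "uniqueItems": True}

# The two entries share id 1, but as whole objects they differ, so
# uniqueItems compares them as unequal and validation passes.
data = [
    {"id": 1, "contact": "a@example.com"},
    {"id": 1, "contact": "b@example.com"},
]

jsonschema.validate(data, schema)  # raises nothing: the duplicate id slips through
print("valid according to uniqueItems")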

Seeing as the JSON Schema spec already has uniqueItems, we just need
to increase the scope of that check; it's currently hardcoded to whole
array elements. If we had a "uniqueWithin" attribute that allowed a
user-defined scope (either relative, "../../", or absolute,
"#/my/stuff"), then many valid real-world validations would be easy.

Francis Galiegue

Feb 28, 2012, 3:25:23 AM
to json-...@googlegroups.com
On Mon, Feb 27, 2012 at 23:09, David <japg...@gmail.com> wrote:
> In my example with email addresses it doesn't look like it matters,
> but I'm just trying to give any kind of example here.
> Forgetting this example though, there are definitely cases where it
> does matter, especially when unique ids are involved.
>
> Imagine an array of objects where each object has a unique id, and
> you want to ensure that each object is only included in the array once.
> We can't use uniqueItems because the objects are all different; what
> we need is some way of saying that the id of each object must be unique.

Yep, that'd be nice to have... The problem is how to specify it.

Maybe using JSON Pointer, we could do something: complement
uniqueItems and add uniqueProperties for objects.

Let's take uniqueItems as an example. We could specify:

* uniqueItems: false (the current default);
* uniqueItems: # (equivalent of uniqueItems: true);
* uniqueItems: #/some/path

So, for instance, if you have items like:

{
    "email": "so...@value.here"
}

in an array and want email to be unique, you could write:

{
    "type": "array",
    "items": {
        "type": "object",
        // etc.
    },
    "uniqueItems": "#/email"
}

The rule would also be that if a JSON Pointer is dangling for a value
(an item in an array, or a value in an object), the instance is
considered invalid.
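
(To check I've understood the proposed semantics, here is a rough Python sketch, entirely my own reading of the rule above, of what a validator might do for the pointer-valued form, dangling-pointer rule included.)

# A sketch of the proposed semantics, not anything from a spec draft:
# "uniqueItems" holding a JSON Pointer means "resolve that pointer
# against each item; all resolved values must be distinct, and a
# dangling pointer makes the instance invalid".

def resolve(item, pointer):
    """Resolve a '#/a/b'-style fragment against one item; None if dangling.

    Simplification: a legitimate JSON null is indistinguishable from a
    dangling pointer here; a real validator would need a proper sentinel.
    """
    if pointer == "#":
        return item  # "#" designates the whole item (uniqueItems: true)
    node = item
    for token in pointer.lstrip("#/").split("/"):
        if not (isinstance(node, dict) and token in node):
            return None
        node = node[token]
    return node

def unique_items_ok(array, pointer):
    seen = []  # a list, since resolved values (dicts, lists) may be unhashable
    for item in array:
        value = resolve(item, pointer)
        if value is None or value in seen:
            return False  # dangling pointer or duplicate value -> invalid
        seen.append(value)
    return True

print(unique_items_ok([{"email": "x"}, {"email": "y"}], "#/email"))  # True
print(unique_items_ok([{"email": "x"}, {"email": "x"}], "#/email"))  # False (duplicate)
print(unique_items_ok([{"email": "x"}, {}], "#/email"))              # False (dangling)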

Comments?

--
Francis Galiegue, fgal...@gmail.com

Xample

Feb 29, 2012, 1:06:26 PM
to JSON Schema
> * uniqueItems: false (the current default);
> * uniqueItems: # (equivalent of uniqueItems: true);
> * uniqueItems: #/some/path

Ok, however I do not like the way we define a rule "from outside"; I
think we should keep the rules within the same schema scope.

MongoDB uses a map-reduce for this kind of job:
http://cookbook.mongodb.org/patterns/unique_items_map_reduce/

Does anyone know if something similar exists in JSON Schema?

Would something like the below (i.e. creating content on validation)
be relevant?

{
    "$content": "list#/some/path", // would create an array of all the nodes at the right path
    "uniqueItems": true
}

