desktopcouch schemas and validation


eric casteleijn

Nov 5, 2009, 2:01:43 PM
to desktop...@googlegroups.com
I was just reminded of an issue I would like everyone interested in
desktopcouch to weigh in on, after (re)stumbling across the following
question:

https://answers.edge.launchpad.net/desktopcouch/+question/79479

Where Benoît brings up the very good point that we currently don't
really have a way of telling whether a particular json structure
conforms to any of our desktopcouch schemas defined here:

http://www.freedesktop.org/wiki/Specifications/desktopcouch/


It's an issue we have pretty much willfully ignored for the initial
release. We have defined the schemas in a very informal way and mostly
by example, and we've already run into predictable problems with this
approach, where different applications have very different ideas of how
to represent datetimes, for instance.

I think we need to formalize the schemas slightly more and also come up
with a more formal "meta" schema, to live at the above top-level URL,
which defines what any desktopcouch record (no matter what type of record
it actually is) should conform to, so that we can use it as a template
for new record type schemas.

I emphatically don't want to take this to the level of detail of schema
languages like dtd/xsd/relaxng/schematron because if we do that we might
as well have picked an XML database. I want to be extremely pragmatic,
to the point of maybe having some simple validator code for an informal
specification be the only formal decider of whether or not something is
an instance of that specification. Nevertheless I would like to
*somewhat* formalize the informal specs we currently have, so that:

"last_change_date": "<last modification date, in UTC>",

would become something like:

"last_change_date": string, containing ISO 8601 datetime representation,
extended, UTC (of the form YYYY'-'MM'-'DD['T'HH[':'MM[':'SS]]'Z']) - the
date(time) the record was last modified.

and for a simpler field:

"title": "<note's title>",

would become:

"title": string - the title of the record

and then the page with this semiformal schema would also provide
validator code.
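
A minimal sketch of what such validator code could look like for the two
fields above, written in present-day Python. The function names are made up
for illustration, and the regular expression is only one reading of the
YYYY'-'MM'-'DD['T'HH[':'MM[':'SS]]'Z'] form given above:

```python
import re
from datetime import datetime

# Date only, or date plus 'T', hours, optional minutes and seconds, then 'Z'.
_ISO_UTC = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"(?:T([0-9]{2})(?::([0-9]{2})(?::([0-9]{2}))?)?Z)?"
)


def is_iso8601_utc(value):
    """Return True if value is a string in the extended UTC form above."""
    if not isinstance(value, str):
        return False
    match = _ISO_UTC.fullmatch(value)
    if match is None:
        return False
    # Missing time components default to 0 for the range check.
    parts = [int(g) if g is not None else 0 for g in match.groups()]
    try:
        datetime(*parts)  # rejects month 13, hour 25, and so on
    except ValueError:
        return False
    return True


def validate_note_fields(record):
    """Hypothetical helper: return a list of error strings (empty if valid)."""
    errors = []
    if not isinstance(record.get("title"), str):
        errors.append("title: expected a string")
    if not is_iso8601_utc(record.get("last_change_date")):
        errors.append("last_change_date: expected ISO 8601 extended UTC")
    return errors
```

Returning a list of messages rather than a bare boolean is a design choice
here, so that an application can report every problem in one pass.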

We'll also need to have a way to describe the mergeable list type we
invented. Something like, instead of:

"phone_numbers": {
    "00b10d79-cf9e-459b-bbb1-3a01727114ed": {
        "description": "home",
        "number": "myHomePhone"
    },
    "3d56e68f-c314-43bf-8020-58f7682065b6": {
        "description": "mobile",
        "number": "myMobilePhone"
    }
}

have:

"phone_numbers": mergeable list of:
{"description": string - description of the phone number,
"number": string - the phone number}
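
As a rough sketch, a check for such a mergeable list could look like the
following. It assumes the representation shown in the example above (a JSON
object whose keys are UUIDs and whose values are objects with string
fields), and it assumes that every key must be a UUID; the function names
are hypothetical:

```python
import uuid


def is_uuid(key):
    """Return True if key is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(key)) == key.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def validate_mergeable_list(value, item_fields):
    """Return a list of error strings for a mergeable list of objects.

    item_fields names the string fields every item must have.
    """
    if not isinstance(value, dict):
        return ["expected an object keyed by UUID"]
    errors = []
    for key, item in value.items():
        if not is_uuid(key):
            errors.append("%r: key is not a UUID" % (key,))
        if not isinstance(item, dict):
            errors.append("%r: item is not an object" % (key,))
            continue
        for field in item_fields:
            if not isinstance(item.get(field), str):
                errors.append("%r: %s must be a string" % (key, field))
    return errors


phone_numbers = {
    "00b10d79-cf9e-459b-bbb1-3a01727114ed": {
        "description": "home",
        "number": "myHomePhone",
    },
    "3d56e68f-c314-43bf-8020-58f7682065b6": {
        "description": "mobile",
        "number": "myMobilePhone",
    },
}
errors = validate_mergeable_list(phone_numbers, ["description", "number"])
```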


I think desktopcouch.records should have a Validator base class that
takes care of checking whether there is a record_type, and perhaps, when
"application_annotations" is present, whether it is well-formed.

Anyone creating a new schema would be burdened with writing the informal
specification, and more importantly, a working validator.

The validators should probably validate json, not instantiated
desktopcouch objects.
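
To make the idea concrete, here is a rough sketch of such a base class. It
is not existing desktopcouch.records code: the class and method names are
hypothetical, the record type URL is a placeholder, and "well-formed" for
"application_annotations" is assumed to mean an object that maps application
names to objects:

```python
import json


class Validator:
    """Hypothetical base class that checks what every record must have."""

    def validate(self, raw_json):
        """Validate a JSON string; return a list of error strings."""
        try:
            record = json.loads(raw_json)
        except (TypeError, ValueError) as exc:
            return ["not valid JSON: %s" % exc]
        if not isinstance(record, dict):
            return ["record must be a JSON object"]
        errors = []
        record_type = record.get("record_type")
        if not isinstance(record_type, str) or not record_type:
            errors.append("record_type: missing or not a string")
        if "application_annotations" in record:
            annotations = record["application_annotations"]
            # Assumed shape: {"<application name>": {...}, ...}
            if not isinstance(annotations, dict) or not all(
                isinstance(v, dict) for v in annotations.values()
            ):
                errors.append("application_annotations: not well-formed")
        errors.extend(self.validate_fields(record))
        return errors

    def validate_fields(self, record):
        """Hook for schema authors; the base class adds no field checks."""
        return []


class NoteValidator(Validator):
    """Example subclass for a record type with a string title."""

    def validate_fields(self, record):
        if isinstance(record.get("title"), str):
            return []
        return ["title: expected a string"]


validator = NoteValidator()
ok = validator.validate(
    '{"record_type": "http://example.com/note", "title": "Shopping"}'
)
```

Working on the JSON text rather than on instantiated objects means the same
validator can be applied to documents written by any application.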

These are very rough and initial thoughts, meant to be shot full of
holes, so have at it!

--
- eric casteleijn
https://launchpad.net/~thisfred
http://www.canonical.com

Rodrigo Moya

Nov 5, 2009, 4:57:55 PM
to desktop...@googlegroups.com
On Thu, 2009-11-05 at 14:01 -0500, eric casteleijn wrote:
> I was just reminded of an issue I would like everyone interested in
> desktopcouch to weigh in on, after (re)stumbling across the following
> question:
>
> https://answers.edge.launchpad.net/desktopcouch/+question/79479
>
> Where Benoît brings up the very good point that we currently don't
> really have a way of telling whether a particular json structure
> conforms to any of our desktopcouch schemas defined here:
>
> http://www.freedesktop.org/wiki/Specifications/desktopcouch/
>
>
> It's an issue we have pretty much willfully ignored for the initial
> release. We have defined the schemas in a very informal way and mostly
> by example, and we've already run into predictable problems with this
> approach, where different applications have very different ideas of how
> to represent datetimes, for instance.
>
right, indeed :-)

>
> I emphatically don't want to take this to the level of detail of schema
> languages like dtd/xsd/relaxng/schematron because if we do that we might
> as well have picked an XML database. I want to be extremely pragmatic,
> to the point of maybe having some simple validator code for an informal
> specification be the only formal decider of whether or not something is
> an instance of that specification. Nevertheless I would like to
> *somewhat* formalize the informal specs we currently have, so that:
>
> "last_change_date": "<last modification date, in UTC>",
>
> would become something like:
>
> "last_change_date": string, containing ISO 8601 datetime representation,
> extended, UTC (of the form YYYY'-'MM'-'DD['T'HH[':'MM[':'SS]]'Z']) - the
> date(time) the record was last modified.
>
> and for a simpler field:
>
> "title": "<note's title>",
>
> would become:
>
> "title": string - the title of the record
>

this sounds great to me

> and then the page with this semiformal schema would also provide
> validator code.
>
> We'll also need to have a way to describe the mergeable list type we
> invented. Something like, instead of:
>
> "phone_numbers": {
>     "00b10d79-cf9e-459b-bbb1-3a01727114ed": {
>         "description": "home",
>         "number": "myHomePhone"
>     },
>     "3d56e68f-c314-43bf-8020-58f7682065b6": {
>         "description": "mobile",
>         "number": "myMobilePhone"
>     }
> }
>
> have:
>
> "phone_numbers": mergeable list of:
> {"description": string - description of the phone number,
> "number": string - the phone number}
>

right, much better

>
> I think desktopcouch.records should have a Validator base class that
> takes care of checking whether there is a record_type, and perhaps, when
> "application_annotations" is present, whether it is well-formed.
>
> Anyone creating a new schema would be burdened with writing the informal
> specification, and more importantly, a working validator.
>
> The validators should probably validate json, not instantiated
> desktopcouch objects.
>
> These are very rough and initial thoughts, meant to be shot full of
> holes, so have at it!
>

I really like it, so if there are no complaints, I'll start changing the
existing schemas to follow your proposal, so that we can start writing
the validators.

Manuel de la Peña

Nov 6, 2009, 2:33:07 AM
to desktop...@googlegroups.com
Regarding the descriptions of telephone numbers, addresses, email addresses, etc.: it would be nice not only to add more explicit definitions like the one you mentioned ("description": string - description of the phone number) but perhaps also to provide a list of the possible strings.

If you check the vCard definition at http://www.imc.org/pdi/vcard-21.txt you will see that it provides values for the different "descriptions" of this kind of data. From my point of view it would be a good idea to follow that specification, because it would simplify the mapping between vCards and records, at least on the face of it. With that done, you could easily write code to transform records to vCards and vice versa (I even volunteer to write it!).

Maybe changing the description to (example for a telephone number) "description": string - description of the phone number; value: home | work | cell | car | pager would simplify the use of the records by other applications.
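
A small sketch of how a validator could enforce such a value list, using
the five values from the example above. The names PHONE_DESCRIPTIONS and
check_choice are made up for illustration:

```python
# Allowed values taken from the example above.
PHONE_DESCRIPTIONS = frozenset({"home", "work", "cell", "car", "pager"})


def check_choice(field, value, allowed):
    """Return [] if value is one of the allowed strings, else one error."""
    if isinstance(value, str) and value in allowed:
        return []
    choices = ", ".join(sorted(allowed))
    return ["%s: expected one of %s" % (field, choices)]


errors = check_choice("description", "fax", PHONE_DESCRIPTIONS)
```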

Kr,

Manuel


eric casteleijn

Nov 6, 2009, 8:05:56 AM
to desktop...@googlegroups.com
Manuel de la Peña wrote:
> Regarding the descriptions of telephone numbers, addresses, email
> addresses, etc.: it would be nice not only to add more explicit
> definitions like the one you mentioned (*"description": string -
> description of the phone number*) but perhaps also to provide a list of
> the possible strings.
>
> If you check the vCard definition at http://www.imc.org/pdi/vcard-21.txt
> you will see that it provides values for the different "descriptions"
> of this kind of data. From my point of view it would be a good idea to
> follow that specification, because it would simplify the mapping between
> vCards and records, at least on the face of it. With that done, you could
> easily write code to transform records to vCards and vice versa (I even
> volunteer to write it!).
>
> Maybe changing the description to (example for a telephone number)
> *"description": string - description of the phone number; value: home |
> work | cell | car | pager* would simplify the use of the records by
> other applications.

Sounds good to me, at least as a possible way to limit values of a
field. Each of the schemas will then be free to choose whether and for
which fields they want to use these.

Rodrigo Moya

Nov 17, 2009, 4:33:54 PM
to desktop...@googlegroups.com
On Thu, 2009-11-05 at 22:57 +0100, Rodrigo Moya wrote:
>
> >
> > I think desktopcouch.records should have a Validator base class that
> > takes care of checking whether there is a record_type, and perhaps, when
> > "application_annotations" is present, whether it is well-formed.
> >
> > Anyone creating a new schema would be burdened with writing the informal
> > specification, and more importantly, a working validator.
> >
> > The validators should probably validate json, not instantiated
> > desktopcouch objects.
> >
> > These are very rough and initial thoughts, meant to be shot full of
> > holes, so have at it!
> >
> I really like it, so if there are no complaints, I'll start changing the
> existing schemas to follow your proposal, so that we can start writing
> the validators.
>
I just made some changes to the wiki page to include most of Eric's
suggestions. It still needs better explanations (especially of what
a mergeable list is), but I think it is at least much clearer now.
