New list of proposals

Frederick Giasson

Nov 11, 2009, 4:05:49 PM

to bib...@googlegroups.com

Hi everybody,

Many things have been said recently, and this email tries to sum up most
of them in the form of proposals.

The main thinking behind these proposals is the following: put as
little burden as possible on the shoulders of the data publisher.

This means having as few conventions as possible in the "dataset"
file (where records are described).

The way we reach this goal is by putting more logic in the "structure"
schema.

Here is what we would propose to do....

=================

Cardinality constraints:
-----------------------------

Cardinality constraints specify the minimum and maximum number of values
that can be specified for an attribute. "minValues" and "maxValues" are
the two attributes proposed for this; they are part of the structure schema.
The schema description of "author" would look like:

"author": {
    "prefLabel": "author",
    "description": "The name(s) of the author(s) (in the case of more than one author, separated by and)",

    "allowedType": "Document",
    "allowedValue": ["String", "Person"],

    "minValues": "1"
}

With such a structure schema, one or more values can
be defined for this attribute.

Note: no requirement statement is entailed by the minValues or
maxValues statements. More about requirements below.
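
As a rough illustration of how a processing system might apply these two attributes, here is a minimal Python sketch. The function name check_cardinality is hypothetical and not part of the proposal. It assumes that a single non-list value counts as one value, and that an absent attribute is not a cardinality problem, since minValues and maxValues entail no requirement.

```python
def check_cardinality(attribute_schema, record, name):
    """Return a list of cardinality problems for one attribute of a record."""
    # Assumption: an absent attribute is not a cardinality error, because
    # minValues/maxValues entail no requirement statement.
    if name not in record:
        return []

    value = record[name]
    # Assumption: a single non-list value counts as exactly one value.
    values = value if isinstance(value, list) else [value]

    problems = []
    # The proposal writes the bounds as strings ("1"), so convert them.
    min_values = attribute_schema.get("minValues")
    max_values = attribute_schema.get("maxValues")
    if min_values is not None and len(values) < int(min_values):
        problems.append(
            f"{name}: expected at least {min_values} value(s), got {len(values)}"
        )
    if max_values is not None and len(values) > int(max_values):
        problems.append(
            f"{name}: expected at most {max_values} value(s), got {len(values)}"
        )
    return problems


author_schema = {"minValues": "1"}
print(check_cardinality(author_schema, {"author": ["David Aldous", "James Pitman"]}, "author"))
print(check_cardinality(author_schema, {"author": []}, "author"))
```

The first call reports no problems; the second reports one, because the attribute is present but holds fewer values than minValues allows.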

Ordering constraints:
--------------------------

Much discussion arose around "ordering of arrays". Ordering is not
something that should be defined when we describe records. Ordering
constraints should be defined in the structure schema that defines the
usage of attributes that describe records. That way, we don't have to
add complex structural and syntactic conventions that would burden the
data publisher.

So, what we propose is to add an "orderedValues" structure schema attribute
that can have two values: "ordered" and "unordered". By default, the
values of any attribute are *unordered*. Let's take the "author" example.

In the structure schema, we would have this definition for the "author"
attribute:

"author": {
    "prefLabel": "author",
    "description": "The name(s) of the author(s) (in the case of more than one author, separated by and)",

    "allowedType": "Document",
    "allowedValue": ["String", "Person"],

    "minValues": "1",

    "orderedValues": "ordered"
}

This means that if the data publisher defines multiple authors for the
"author" attribute, the system that processes that information has to
keep the order of these authors, since the "author" attribute specifies
that if more than one value is given for this attribute, the order
has to be preserved.

Let's take this record example:

{
    "id": "ap79",
    "type": "Article",
    "author": [
        {
            "name": "David Aldous",

            "ref": "@Aldous_David"
        },
        {
            "name": "James Pitman",

            "ref": "@Pitman_jim"
        }
    ],

    "series": {
        "name": "Annals of Probability",

        "ref": "@Annals_of_Probability"
    },

    "title": "On the zero-one law for exchangeable events",
    "year": "1979",
    "subject": "Markov chain, Coupling, Zero one law Exchangeable events",
    "volume": "7",
    "pages": "704-723",
    "mrClass": "60F20 (60J10)",
    "mrid": "MR537216"
}

In this example, the list of "author" values is ordered. So the system that
ingests this data has to keep that order when it saves and
manipulates the data, since that is what has been specified in the
structure schema.

If no order is specified in the schema, then lists of values are
*unordered* by default.
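
To make the difference concrete, here is a small Python sketch of how a system could compare two lists of values depending on the schema. The function name same_values is hypothetical; the only parts taken from the proposal are the "orderedValues" attribute and its "unordered" default. Unordered lists are compared through a canonical JSON form, which is just one possible choice.

```python
import json


def same_values(attribute_schema, left, right):
    """Compare two value lists, honoring the attribute's ordering constraint."""
    # Per the proposal, values are unordered unless the schema says otherwise.
    ordered = attribute_schema.get("orderedValues", "unordered") == "ordered"
    if ordered:
        # Order matters: the lists must match position by position.
        return left == right

    # Order does not matter: compare sorted canonical JSON strings, which
    # also works for object values such as {"name": ..., "ref": ...}.
    def canonical(values):
        return sorted(json.dumps(v, sort_keys=True) for v in values)

    return canonical(left) == canonical(right)


authors = [
    {"name": "David Aldous", "ref": "@Aldous_David"},
    {"name": "James Pitman", "ref": "@Pitman_jim"},
]
swapped = list(reversed(authors))

print(same_values({"orderedValues": "ordered"}, authors, swapped))
print(same_values({}, authors, swapped))
```

With "orderedValues" set to "ordered", the swapped list is a different value; without it, the two lists are treated as the same set of authors.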

Note: I won't go forward with the "0", "1" proposal we discussed
in earlier threads on this forum. It puts too much burden
on the shoulders of the data publishers for no real gain. This really
has to be part of the description of the schema.

Requirement constraints:
-------------------------------

There has been some discussion about the possibility of stating that a list
of attributes is required to describe a particular type of record.
Requirements are described at the level of the "type". So, we have to
specify that when a data publisher publishes records of certain type(s),
using a certain structure schema, the publisher has to use a certain set of
attributes to describe these records in order to comply with the schema.

Here is an example of the definition of a "Person" record type for a
specific structure schema:

"Person": {
    "prefLabel": "person",
    "description": "A person, usually in the role of creator, editor or maintainer of a document",

    "requiredAttribute": ["name", "gender"]
}

This means that for a given dataset that links to such a structure
schema, all Person records of this dataset have to be described with *at
least* the "name" and "gender" attributes.
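
A minimal Python sketch of such a check could look like the following. The function name missing_required_attributes and the shape of the schema dictionary are assumptions made for illustration; only "requiredAttribute" comes from the proposal. The sketch only tests whether an attribute is present, not whether its value is meaningful.

```python
def missing_required_attributes(type_definitions, record):
    """Return the required attributes that a record does not define."""
    # Look up the definition of the record's type in the structure schema.
    type_definition = type_definitions.get(record.get("type"), {})
    required = type_definition.get("requiredAttribute", [])
    # Assumption: "described with" means the attribute key is present.
    return [name for name in required if name not in record]


type_definitions = {
    "Person": {
        "prefLabel": "person",
        "requiredAttribute": ["name", "gender"],
    }
}

print(missing_required_attributes(type_definitions, {"type": "Person", "name": "David Aldous"}))
```

A record whose type has no "requiredAttribute" list, or whose type is unknown to the schema, simply yields an empty result in this sketch.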

=================

Tell me what you think about these propositions!

Thanks,

Take care,

Fred

Benjamin Kalish

Nov 11, 2009, 4:35:42 PM

to bib...@googlegroups.com

Hi Fred,

I think these look like good solutions. I like the idea of specifying
whether lists are ordered or unordered in the schema.

We will have to be careful in specifying how we wish empty strings,
lists, and objects to be handled when considering both minValues and
requiredAttributes. Is

{ "type" : "Person", "name" : "", "gender" : "" }

a valid Person according to your example? And what about an author
attribute set as:

"author" : [ "" ]

Perhaps this would be the kind of record for which parsers would
generate a warning but not an error.
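
One possible reading of the warning idea, sketched in Python under stated assumptions: the function name empty_value_warnings is hypothetical, and treating empty strings, lists and objects as warnings rather than errors is only the suggestion made here, not a settled rule.

```python
def empty_value_warnings(record):
    """Return one warning message per attribute that holds an empty value."""
    warnings = []
    for name, value in record.items():
        # Treat a single value like a one-element list.
        values = value if isinstance(value, list) else [value]
        # An empty list, or any empty string/list/object inside it, is flagged.
        if not values or any(v in ("", [], {}) for v in values):
            warnings.append(f"attribute '{name}' has an empty value")
    return warnings


print(empty_value_warnings({"type": "Person", "name": "", "gender": ""}))
print(empty_value_warnings({"author": [""]}))
```

Both examples above would produce warnings for the empty attributes, while the records themselves would still be accepted.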

Benjamin Kalish
4 Lawn Ave, Apt 2L
Northampton, MA 01060-2221
Phone: 413-687-7738
Email: bka...@gmail.com

Frederick Giasson

Nov 11, 2009, 4:39:51 PM

to bib...@googlegroups.com

Hi!

> I think these look like good solutions. I like the idea of specifying
> whether lists are ordered or unordered in the schema.
>
> We will have to be careful in specifying how we wish empty strings,
> lists, and objects to be handled when considering both minValues and
> requiredAttributes. Is
>
> { "type" : "Person", "name" : "", "gender" : "" }
>
> a valid Person according to your example? And what about an author
> attribute set as:
>
> "author" : [ "" ]
>
> Perhaps this would be the kind of record for which parsers would
> generate a warning but not an error.
>

Agreed.

I think the solutions outlined here, and the fact that all these things
are defined in the schema, strike a good balance between simplicity,
usability and expressiveness.

Thanks,

Take care,

Fred

Jack Alves

Nov 12, 2009, 3:19:06 PM

to bib...@googlegroups.com

One thing I'm unsure about is whether cardinality and order constraints should be specified in the attribute. This means that any type that uses the attribute is constrained in the same way.

I'm not used to having attribute definitions independent of type definitions. I don't know that it is bad. I'm just not sure what limitations come with this model. This discussion thread highlights related issues. "requiredAttribute" is the first case where a type definition explicitly refers to independently defined attributes.

My question is whether it is important to support metadata specific to the binding of a type and attribute. I don't know if this functionality is important for bibliographic schema. It may also be possible to add the functionality later if necessary using techniques like what is used for "requiredAttribute".

jack

Frederick Giasson

Nov 12, 2009, 3:38:09 PM

to bib...@googlegroups.com

Hi Jack!

> One thing I'm unsure about is whether cardinality and order
> constraints should be specified in the attribute. This means that any
> type that uses the attribute is constrained in the same way.
>
> I'm not used to having attribute definitions independent of type
> definitions. I don't know that it is bad. I'm just not sure what
> limitations come with this model. This discussion thread highlights
> related issues. "requiredAttribute" is the first case where a type
> definition explicitly refers to independently defined attributes.

This is a good observation, and here is some supplemental information
to help understand the tradeoffs.

If my memory is good, this is what XML Schema does: it defines
complex types, and a complex type is defined by its attributes and all
their properties, such as whether they are required. But there is a reason
why these things are called complex types... because they can become quite
complex :)

Right now, the attributes are described in such a way that it is the
definition of these attributes that "constrains" their usage (with the
allowedValue and allowedType attributes, and now minValues, maxValues and
orderedValues). The logic is certainly different from that of systems
such as XML Schema.

I think that the beauty of proceeding that way is that users and system
implementers will use smaller and simpler constructs to create their
schemas.

But I hear what you say here: let's say we have an attribute A, with some
cardinality and order constraints. You can use that attribute to define
types X and Y. However, you create a new type W, which would like to
re-use attribute A, but for a different purpose, with different
cardinality and order constraints. What do you do?

Well, my personal thinking here is: if the constraints are
that different, is A really the right attribute to describe W? Maybe it
is just a matter of creating another attribute B with these constraints.

That way, all defined attributes are purposefully created while keeping
the schema as simple as possible (even if fewer attributes). You don't
have to create and maintain these complex types, which is a major benefit.

I really think these are two different modeling practices, each with its
good and bad sides.

As for the "requiredAttribute" attribute that is "attached" to a type,
this is another question. The attribute allowedType is enough to
know how to use a vocabulary, because you know how an attribute can be
used, and to describe what kind of record. However, there are use cases where
something like "requiredAttribute" is helpful when a system ingests datasets.

Let's say you ingest some dataset on the BKN People node. You are
expecting all people to have at least a name and a homepage, and all
documents to have a title, an abstract and some kind of ID. Without
something like "requiredAttribute", we have nothing to model this,
except if you hard-code it in the system. And if such "requirement"
needs are not specified somewhere, some UI pages can look funky because
of missing key pieces of data.

Otherwise, would you have something similar/related to suggest? These
are still proposals :)

> My question is whether it is important to support metadata specific to
> the binding of a type and attribute. I don't know if this
> functionality is important for bibliographic schema. It may also be
> possible to add the functionality later if necessary using techniques
> like what is used for "requiredAttribute".

So, you are questioning the utility (at the moment), of only
requiredAttribute, or all the others as well?

Thanks,

Take care,

Fred

Jack Alves

Nov 12, 2009, 7:44:49 PM

to bib...@googlegroups.com

My comments are meant to expose questions that may not be obvious to everyone. I'm poking at the model and raising questions as I notice things. I don't have a strong bias toward a specific design. Generally, simple is better as long as we don't set a trap.

Frederick Giasson

Nov 12, 2009, 7:51:31 PM

to bib...@googlegroups.com

Hi Jack!

> My comments are meant to expose questions that may not be obvious to
> everyone. I'm poking at the model and raising questions as I notice
> things. I don't have a strong bias toward a specific design.
> Generally, simple is better as long as we don't set a trap.

Oh sure, don't get me wrong, I appreciate these new perspectives and
these kinds of meaningful questions that always put a model in a different
light. These were my current observations based on other
models; maybe you have a different viewpoint because of your work with
Freebase, so if something doesn't hold, or if you see more important
issues, or ways to make things even simpler or more effective, then I am
open to anything :)

Thanks, and please continue to raise such questions and perspectives!

Take care,

Fred
