alternate display for current schema


Jack Alves

Nov 12, 2009, 8:04:54 PM
to bib...@googlegroups.com
I know the current schema is in review and changes are planned. While I was reviewing the schema I thought it would be helpful to see attributes displayed with their associated type. So I wrote (hacked together) a little script. The output shows each type with its attributes, and the attributes of its supertypes. Types are listed in alphabetical order. The first type is,

TYPE: academic article (AcademicArticle) - A scholarly academic article, typically published in a journal.
subTypeOf: [Article]
subTypeOf: [Document]
...

This shows that academic article is a subTypeOf Article, which is a subTypeOf Document. The listing then displays all the attributes of Document.
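As a rough illustration of how such a listing can be produced, here is a minimal sketch. It is not the attached script: the `TYPES` and `ATTRIBUTES` tables are hypothetical, trimmed-down stand-ins for the schema file, and the function names are made up for this example. Only the keys `prefLabel`, `subTypeOf`, `allowedType` and `allowedValue` come from the schema as discussed in this thread.

```python
# Hypothetical, trimmed-down schema tables; the real file layout may differ.
TYPES = {
    "Document": {"prefLabel": "document"},
    "Article": {"prefLabel": "article", "subTypeOf": "Document"},
    "AcademicArticle": {"prefLabel": "academic article", "subTypeOf": "Article"},
}
ATTRIBUTES = {
    "title": {"prefLabel": "title", "allowedType": "Document", "allowedValue": "String"},
}


def as_list(value):
    """Accept a missing value, a single string, or a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def supertypes(name, types):
    """Return the supertypes of `name`, nearest first, following subTypeOf."""
    chain, queue = [], as_list(types.get(name, {}).get("subTypeOf"))
    while queue:
        current = queue.pop(0)
        if current not in chain:
            chain.append(current)
            queue.extend(as_list(types.get(current, {}).get("subTypeOf")))
    return chain


def describe(name, types, attributes):
    """Build the listing lines for one type: header, supertypes, attributes."""
    parents = supertypes(name, types)
    lines = [f"TYPE: {types[name].get('prefLabel', name)} ({name})"]
    lines += [f"subTypeOf: [{parent}]" for parent in parents]
    usable = {name, *parents}
    # Attributes without an allowedType are not listed by this sketch.
    for attr, spec in sorted(attributes.items()):
        if usable & set(as_list(spec.get("allowedType"))):
            values = ", ".join(as_list(spec.get("allowedValue")))
            lines.append(f"    {spec.get('prefLabel', attr)} ({attr}) <{values}>")
    return lines


print("\n".join(describe("AcademicArticle", TYPES, ATTRIBUTES)))
```

An attribute is listed under a type when its `allowedType` names either the type itself or any type reached through `subTypeOf`.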


One type I don't understand is,

TYPE: website (Website) - A group of Webpages accessible on the Web.
subTypeOf: [Collection]
    owner (owner)                                     <Person, Organization>
    editor (editor)                                   <String, Person>

I expected website to have a URL as a property of the type or a subtype. I don't know how this type is intended to be used. I haven't reviewed the schema in detail yet; I'll look for things I don't understand. The schramm dataset doesn't use many of the types in the schema. I know everyone is waiting for the next rev of the spec with a dataset, but I think some review of the current spec is valuable. I don't expect the entire structure to change. It would be best to expose as many questions as possible now.

I attached the output, my funky python script, and the bibjson_schema.json I used to generate the output.

jack






bibout.txt
bibjson_format_schema.py
bibjson_schema.json

Frederick Giasson

Nov 13, 2009, 10:55:52 AM
to bib...@googlegroups.com
Hi Jack!
> I know the current schema is in review and changes are planned. While
> I was reviewing the schema I thought it would be helpful to see
> attributes displayed with their associated type.
Certainly
> So I wrote (hacked together) a little script. The output shows each
> type with its attributes, and the attributes of its supertypes. Types
> are listed in alphabetical order. The first type is,
>
> TYPE: academic article (AcademicArticle) - A scholarly academic
> article, typically published in a journal.
> subTypeOf: [Article]
> subTypeOf: [Document]
> ...
>
This is great!

You clearly understood the usage of the allowedType attribute. So, all
the attributes listed under each type are the attributes one can use to
describe a record of that type.

If we take an example such as:

TYPE: library (Library) - A library organization
subTypeOf: [Organization]
fax (fax) <String : phone>
name (name) <String>
address (address) <String>
postal code (postalCode) <String>

Then it means that the attributes "fax", "name", "address" and
"postalCode" can be used to describe a record of type "Library". This is
what we can find out by *only* using the current BibJSON schema.
However, we could think about linking to other vocabularies that would
better define organizations such as a Library. This is where the concept
of vocabulary re-use comes in.


This output also points to issues in the current description of the
schema. Let's take a Magazine. Just by quickly looking at this file, I
noticed that something was wrong with this one:


TYPE: magazine (Magazine) - A periodical of magazine Articles. A
magazine is a publication that is issued periodically, usually bound in
a paper cover, and typically contains essays, stories, poems, etc., by
many writers, and often photographs and drawings, frequently
specializing in a particular subject or area, as hobbies, news, or sports.
subTypeOf: [Periodical]
subTypeOf: [Collection]
owner (owner) <Person, Organization>
editor (editor) <String, Person>


A Magazine is a periodical, and a collection of articles. However, it is
not normal that we only have owner and editor for this type of record!

It should at least have a title, a creation date, etc. (but not an
author, since the author is defined at the level of the articles that
are part of the magazine).

If we check how the "title" attribute has been defined:

"title": {
"prefLabel": "title",
"description": "The title of the work",

"allowedType": "Document",
"allowedValue": "String"
},


We see that the "allowedType" is Document only. So, we should enhance
the vocabulary to include Collection as well (since a collection of
things can have a title). We would have something like:


"title": {
"prefLabel": "title",
"description": "The title of the work",

"allowedType": ["Document", "Collection"],
"allowedValue": "String"
},


Now, since a Magazine is a Periodical and a Periodical is a
Collection, "title" would appear in the list of attributes usable to
describe a magazine record.
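To make the effect of that change concrete, here is a minimal sketch that checks whether an attribute definition applies to a type, before and after "Collection" is added to "allowedType". The `TYPES` table is a hypothetical excerpt of the hierarchy described above, and `is_a` and `usable_on` are illustrative names, not part of any BibJSON tool.

```python
# Hypothetical excerpt of the type hierarchy described in this thread.
TYPES = {
    "Collection": {},
    "Periodical": {"subTypeOf": "Collection"},
    "Magazine": {"subTypeOf": "Periodical"},
}

# The "title" definition as it is now, and with the proposed change.
TITLE_BEFORE = {"allowedType": "Document", "allowedValue": "String"}
TITLE_AFTER = {"allowedType": ["Document", "Collection"], "allowedValue": "String"}


def is_a(type_name, target, types):
    """True if type_name equals target or reaches it through subTypeOf."""
    seen = set()
    pending = [type_name]
    while pending:
        current = pending.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        parents = types.get(current, {}).get("subTypeOf", [])
        pending.extend([parents] if isinstance(parents, str) else parents)
    return False


def usable_on(spec, type_name, types):
    """True if any allowedType of the attribute covers type_name."""
    allowed = spec["allowedType"]
    allowed = [allowed] if isinstance(allowed, str) else allowed
    return any(is_a(type_name, target, types) for target in allowed)


print(usable_on(TITLE_BEFORE, "Magazine", TYPES))  # False
print(usable_on(TITLE_AFTER, "Magazine", TYPES))   # True
```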


So, this is a good example of how the current BibJSON vocabulary has to
be enhanced.



> One type I don't understand is,
>
> TYPE: website (Website) - A group of Webpages accessible on the Web.
> subTypeOf: [Collection]
> owner (owner) <Person, Organization>
> editor (editor) <String, Person>
>
> I expected website to have a URL as a property of the type or a
> subtype. I don't know how this type is intended to be used. I haven't
> reviewed the schema in detail yet; I'll look for things I don't
This comes from BIBO, and may be unnecessary for BibJSON.

A Website is an aggregation (collection) of Webpages. A webpage is
considered to be a document, and a website is a collection of such
documents. A Website doesn't really have any URL attribute to describe
it. It is more a logical container defined by its ID, which we could
consider to be its domain name. But each Webpage has (and must have) a
URL identifier.

So, let's take this example:


{
"id": "page_1",
"title": "Web page A",

"isPartOf": "@website"
}

{
"id": "page_2",
"title": "Web page B",

"isPartOf": "@website"
}

{
"id": "website",
"title": "Website AB"
}



This is how it is intended to be used.
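The example records can be tied together with a small lookup. The sketch below assumes, based only on the example, that a string value starting with "@" refers to the record whose "id" follows the "@"; the `resolve` and `parts_of` helpers are hypothetical names used for illustration.

```python
# The three records from the example, gathered in one list.
records = [
    {"id": "page_1", "title": "Web page A", "isPartOf": "@website"},
    {"id": "page_2", "title": "Web page B", "isPartOf": "@website"},
    {"id": "website", "title": "Website AB"},
]


def resolve(value, by_id):
    """Return the referenced record for an '@id' string, else the value itself."""
    if isinstance(value, str) and value.startswith("@"):
        return by_id.get(value[1:])
    return value


def parts_of(container_id, records):
    """List the ids of the records whose isPartOf points at container_id."""
    by_id = {record["id"]: record for record in records}
    parts = []
    for record in records:
        target = resolve(record.get("isPartOf"), by_id)
        if isinstance(target, dict) and target.get("id") == container_id:
            parts.append(record["id"])
    return parts


print(parts_of("website", records))  # ['page_1', 'page_2']
```

The Website record carries no URL of its own; the pages point at it, and anything page-specific stays on the page records.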

The benefit of this granularity is that you can describe any
information you want about the actual web pages (URL, content, creator,
creation date, last update date, etc.).

Then, you can describe information about the logical container of these
web pages (the Website), like its general focus, its relation to the
organization that maintains it, etc.



> understand. The schramm dataset doesn't use many of the types in the
> schema.
No. BibJSON is meant to be expressive enough to describe general
bibliographic data use cases.
> I know everyone is waiting for the next rev of the spec with a dataset
> but I think some review of the current spec is valuable.
I totally agree
> I don't expect the entire structure to change. It would be best to
> expose as many questions as possible now.
>
Sure, but it always comes back to keeping one distinction in mind:


(1) defining the vocabulary (what we are talking about here)

and

(2) defining the notation

I think we are pretty close to having the proper notation in mind. As
soon as we agree on the notation, development can start on updating the
current implementation in the parsers, creating the validator, etc.

The description of the vocabulary doesn't have to be finished before
starting these efforts. In fact, starting these efforts will help
define the vocabulary, as you have shown with your tools. These are two
different activities. Also, I don't think the BibJSON vocabulary
will ever be finished. A vocabulary always evolves over time :)




Also, about your note in the bibout.txt file "NOTE: Type "Object" is not
explicitly defined in the BibJSON schema.
The "Object" type in this document is included to show attributes that
refer to it.".

This comes from the specification itself. "Object" is the root "type".
If "Object" is mentioned, then it means that any kind of record can be
used (in the context of the attributes allowedValue and allowedType for
example).

If allowedType or allowedValue is not specified in the schema, "Object"
is assumed for the missing allowedType or allowedValue.
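A minimal sketch of that defaulting rule follows. The `with_defaults` and `allows_any_type` helpers and the `note` definition are hypothetical, written only to illustrate the rule; they are not taken from the specification or from any parser.

```python
def as_list(value):
    """Wrap a single string in a list; copy anything that is already a list."""
    return [value] if isinstance(value, str) else list(value)


def with_defaults(spec):
    """Copy an attribute definition, filling in "Object" for a missing
    allowedType or allowedValue."""
    filled = dict(spec)
    for key in ("allowedType", "allowedValue"):
        filled[key] = as_list(spec.get(key, "Object"))
    return filled


def allows_any_type(spec):
    """True if the attribute can describe a record of any type."""
    return "Object" in with_defaults(spec)["allowedType"]


# Hypothetical definition that leaves allowedType out.
note = {"prefLabel": "note", "allowedValue": "String"}

print(with_defaults(note)["allowedType"])  # ['Object']
print(allows_any_type(note))               # True
```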



Thanks!


Take care,


Fred

Benjamin Kalish

Nov 13, 2009, 6:53:09 PM
to bib...@googlegroups.com
Hey all,

I'm going to take this opportunity to share some of the nitpicky type
notes about attributes that I took when I first read the Spec. I took
a very different approach from Jack, but I think the spirit is
similar. First though, how many of us have access to Google Wave? I
suspect it would be a good environment for discussing details of this
type.

1. The attribute 'author' should have 'Person', 'Array(Person)',
'Organization', 'Array(Organization)', and 'Array(String)' as allowed
values, in addition to 'String'.
2. The attribute 'associatedDepartment' should have 'String',
'Array(String)' and 'Array(Department)' as allowed values, in addition
to 'Department'.
3. The attribute 'booktitle' should have 'Array(String)' as an
allowed value in addition to 'String'. A method (perhaps a separate
attribute) should be provided for specifying the authoritative title.
4. The attribute 'chapter' should not be allowed for the type
'Document' but only for the types 'Chapter' and perhaps
'DocumentPart'.
5. The attribute 'cites' should have 'Array(Document)' as an
allowed value in addition to 'Document'.
6. The attribute 'editor' should have 'Array(Person)' and
'Array(String)' as allowed values in addition to 'Person' and
'String'.
7. The attribute 'hasPart' should have 'Array(Object)' as an
allowed value in addition to 'Object'.
8. 'interviewer' and 'interviewee' are listed as attributes of a
person but they should be attributes of 'Interview' or
'InterviewPage'.
9. The attribute 'institution' should have 'Array(String)' and
'Array(Organization)' as allowed values in addition to 'String' and
'Organization'. (A document might have a printer, a distributor, and a
sponsor, for example.)
10. The attribute 'name' should have 'Array(String)' as an allowed
value in addition to 'String'.
11. The attribute 'note' should be allowed for any type of object.
12. The attribute 'organization' should not be allowed for
'Document'. Its allowed type should instead be 'Event'.
13. The attribute 'presents' should have 'Array(Document)' as an
allowed value in addition to 'Document'.
14. The attribute 'subject' should have 'Array(Object)' and
'Array(String)' as allowed values in addition to 'Object' and
'String'.
15. The attribute 'title' should have 'Array(String)' as an allowed
value in addition to 'String'.
16. The attribute 'prefUrl' should not have the allowed value
'Array(String)'. Allowing multiple values for this attribute makes it
redundant with the 'href' attribute.
17. The attribute 'uri' should not have the allowed value
'Array(String)'. The description for this attribute specifies that it
is to be used for canonical URIs. Canonical URIs are, by definition,
unique.

Ben

Frederick Giasson

Nov 16, 2009, 11:02:37 AM
to bib...@googlegroups.com
Hi!
> 1. The attribute 'author' should have 'Person', 'Array(Person)',
> 'Organization', 'Array(Organization)', and 'Array(String)' as allowed
> values, in addition to 'String'.
>
Good.

Note: cardinality is handled (at least based on our recent discussions)
by the cardinality attributes of the schema.

So, the only missing allowedValue is Organization.
> 2. The attribute 'associatedDepartment' should have 'String',
> 'Array(String)' and 'Array(Department)' as allowed values, in addition
> to 'Department'.
>
Yup, "String" should be added if you want to be able to describe it with
a literal (and not only by reference to a department record).

> 4. The attribute 'chapter' should not be allowed for the type
> 'Document' but only for the types 'Chapter' and perhaps
> 'DocumentPart'.
>
Good
> 6. The attribute 'editor' should have 'Array(Person)' and
> 'Array(String)' as allowed values in addition to 'Person' and
> 'String'.
>
Ok

> 8. 'interviewer' and 'interviewee' are listed as attributes of a
> person but they should be attributes of 'Interview' or
> 'InterviewPage'.
>
These are meant to be attributes that link an Interview with the people
who took part in it.
> 11. The attribute 'note' should be allowed for any type of object.
>
Ok
> 12. The attribute 'organization' should not be allowed for
> 'Document'. Its allowed type should instead be 'Event'.
>
Ok
> 13. The attribute 'presents' should have 'Array(Document)' as an
> allowed value in addition to 'Document'.
> 14. The attribute 'subject' should have 'Array(Object)' and
> 'Array(String)' as allowed values in addition to 'Object' and
> 'String'.
> 15. The attribute 'title' should have 'Array(String)' as an allowed
> value in addition to 'String'.
> 16. The attribute 'prefUrl' should not have the allowed value
> 'Array(String)'. Allowing multiple values for this attribute makes it
> redundant with the 'href' attribute.
> 17. The attribute 'uri' should not have the allowed value
> 'Array(String)'. The description for this attribute specifies that it
> is to be used for canonical URIs. Canonical URIs are, by definition,
> unique.
>

Well, many of your comments are related to cardinality. So, if we agree
on the way to handle cardinality in the schema, I could upgrade it
according to these comments and the new cardinality method.


Thanks!


Take care,


Fred