Metadata for attributes

Jack Alves

Mar 3, 2010, 12:39:29 PM

to bib...@googlegroups.com

Jim started a dataset that describes identifiers used in BKN datasets. Fred used the data to update the BibJSON ontology. A few issues came up that we need to decide on.

- What naming convention should we use for attributes that are identifiers? Jim suggested identifiers end with "_id". That seems simple and works for me.

- How should we represent metadata describing attributes?

- What is the mechanism to associate an identifier with the dataset root url?

A link to the MathSciNet (mrauthid) attribute is below, followed by the text of the entry. The objective here is to represent root urls so that identifiers for a person can be machine-processed into various urls.

It would be helpful to understand how others have done this. I know Freebase handles this with a URI Template type that has a property for the base url. The template value must include a marker ({key}) for the id location. For instance, http://www.ams.org/mathscinet/search/author.html?mrauthid={key}. Documentation for the Freebase approach is at http://wiki.freebase.com/wiki/Enumerated_property_with_URI_Template . A 12/31/09 message to this email list describes a specific case I implemented when importing Math Genealogy data into Freebase. I attached a file that describes the steps in detail to the Freebase page on the BKN Project site.
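
To make the template idea concrete, here is a minimal Python sketch of expanding a template that carries a {key} marker. The function name expand_uri_template is hypothetical, and this is not Freebase's implementation; it only illustrates the substitution step, with the id percent-encoded before it is placed in the url.

```python
from urllib.parse import quote


def expand_uri_template(template, key):
    """Replace the {key} marker in a URI template with a percent-encoded id."""
    if "{key}" not in template:
        raise ValueError("template must contain a {key} marker")
    return template.replace("{key}", quote(str(key), safe=""))


# Template taken from the example in the message above.
MR_AUTHOR_PROFILE = "http://www.ams.org/mathscinet/search/author.html?mrauthid={key}"

print(expand_uri_template(MR_AUTHOR_PROFILE, "140080"))
# prints http://www.ams.org/mathscinet/search/author.html?mrauthid=140080
```

Rejecting a template without the marker mirrors the requirement that the template value must include {key}.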

URI Template is one case of metadata for attributes and covers the most relevant bits for the case of identifier metadata below. In the case below I don't think "example" and "numberof" are necessary.

--------------------------------------------------------------------------------------------------
http://www.bibkn.org/conStruct/ontology/bkn#mrauthid

Attribute name: MathSciNet author ID

Attribute URI: http://purl.org/ontology/bkn#mrauthid

Description: Unique identifier assigned to authors in the MathSciNet (Mathematical Reviews) database. Allows construction of deep links into the MathSciNet database.

example: 140080

link: http://www.ams.org/mathscinet/collaborationDistance.html?AuthorTargetName=Erd%F6s%2C%20Paul&group_target=189017&group_source=140080
MR Erdos Number

link: http://www.ams.org/mathscinet/search/publications.html?pg1=IID&s1=140080
MR Publication List [Subscribers only]

link: http://www.ams.org/mathscinet/search/author.html?mrauthid=140080
MR Author Profile [Subscribers only]

link: http://www.ams.org/mathscinet/search.html?iid=140080
MR Search [Subscribers only]

link: http://www.ams.org/mathscinet/mrcit/individual.html?mrauthid=140080
MR Citations [Subscribers only]

Numberof MRAuthID assigned: 550,000
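
One possible way to make the entry above machine-processable is sketched below in Python. The record layout and the field names ("name", "uri", "links", "label", "template") are assumptions for illustration, not part of the BibJSON schema. The templates are the single-id links from the entry with 140080 replaced by a {key} marker; the Erdos number link is left out because it takes two ids.

```python
# Hypothetical metadata record for the mrauthid attribute.
# Field names are illustrative only, not an agreed BibJSON layout.
MRAUTHID = {
    "name": "MathSciNet author ID",
    "uri": "http://purl.org/ontology/bkn#mrauthid",
    "links": [
        {"label": "MR Publication List",
         "template": "http://www.ams.org/mathscinet/search/publications.html?pg1=IID&s1={key}"},
        {"label": "MR Author Profile",
         "template": "http://www.ams.org/mathscinet/search/author.html?mrauthid={key}"},
        {"label": "MR Search",
         "template": "http://www.ams.org/mathscinet/search.html?iid={key}"},
        {"label": "MR Citations",
         "template": "http://www.ams.org/mathscinet/mrcit/individual.html?mrauthid={key}"},
    ],
}


def links_for(attribute, identifier):
    """Build a label -> url mapping by filling each template with the id."""
    return {
        link["label"]: link["template"].replace("{key}", identifier)
        for link in attribute["links"]
    }


for label, url in links_for(MRAUTHID, "140080").items():
    print(f"{label}: {url}")
```

With a record like this, "example" and "numberof" are not needed to generate the links; only the templates are.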

Jack Alves

Mar 4, 2010, 2:08:42 PM

to bib...@googlegroups.com

Jim and I recently had a discussion that covered the issues in this thread. The larger scope of issues like these is BibJSON Governance. What issues do we need to address? How do we make decisions, and who makes the final call? How do things get implemented?

I created a page on the BKN Project site to begin collecting and organizing thoughts,

http://sites.google.com/site/bibknproject/projects/bibjson/bibjson-governance
(Let me know if you need access to the site)

For now the content of the page is the following hodgepodge of notes from my conversation with Jim:

What are best practices for naming types and attributes?

The current schema has a mix of underscores and camelCase.
What is # used for in the ontology urls? I believe this is just an anchor in a schema doc.
Other constraints, such as use of space characters.

Where should schema be stored? How will people and machines find it?

How are namespaces managed?

Identifiers

need a naming convention like _id or metadata that indicates the attribute is an id.
mechanism to associate base/root urls to identifiers
metadata to describe id
who decides names for ids
example:

name: IMS member ID
Aliases: IMSmember, MemIMS, memims
format: 5[n] - some way to describe validation for id, maybe a regEx validator, and maybe an example
base url: - could be multiple urls, might be an object with additional metadata
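
As a rough illustration of the identifier notes above, the Python sketch below shows an id recognized either by the "_id" suffix or by a metadata flag, plus a regex check on the value. The field names "is_identifier", "pattern", and "base_urls" are hypothetical, and reading "5[n]" as "exactly five digits" is an assumption.

```python
import re

# Hypothetical metadata record for the IMS member ID example.
IMS_MEMBER_ID = {
    "name": "IMS member ID",
    "aliases": ["IMSmember", "MemIMS", "memims"],  # assumes three separate aliases
    "is_identifier": True,   # metadata flag, an alternative to an "_id" suffix
    "pattern": r"\d{5}",     # assumed reading of "5[n]": exactly five digits
    "base_urls": [],         # left empty; the notes do not give a url
}


def is_identifier(attribute_name, metadata=None):
    """True if the name ends in "_id" or the metadata flags it as an id."""
    if attribute_name.endswith("_id"):
        return True
    return bool(metadata and metadata.get("is_identifier"))


def is_valid(value, metadata):
    """Check the whole value against the record's regex pattern."""
    return re.fullmatch(metadata["pattern"], value) is not None


print(is_identifier("memims", IMS_MEMBER_ID))  # True, via the metadata flag
print(is_valid("12345", IMS_MEMBER_ID))        # True
print(is_valid("1234", IMS_MEMBER_ID))         # False
```

Supporting both the suffix and the flag lets existing attribute names such as mrauthid stay unchanged while new ones follow the "_id" convention.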

Benjamin Kalish

Mar 4, 2010, 11:26:22 PM

to bib...@googlegroups.com

Hi Jack,

I was thinking it might be good to have the specification, schema, and accompanying documents all live in a repository such as Google Code. This would provide version control and an issue tracker, both of which I think would be very helpful.

One particularly nice thing about such a system would be the ease with which one could bring up not just a specific version of the schema, but a snapshot of what all the BibJSON docs looked like at a given time.

There is an existing Google Code repository for BKN, although it might be better to have a separate repository for BibJSON so that you could have a separate list of committers and project owners.

Benjamin Kalish

Jack Alves

Mar 5, 2010, 12:03:28 PM

to bib...@googlegroups.com

This makes sense to me for the schema and spec.

Nitin Borwankar

Mar 6, 2010, 2:00:56 PM

to bib...@googlegroups.com

IMHO, schema, spec and sample code form the trifecta for putting in version control.
Often sample code is out of sync with a spec, and keeping them all in sync in a VCS helps a new user test and use the standard with minimum confusion.
I, for one, have no idea which of our many sample datasets comply with the spec, or to what level, because this knowledge has not been externalized.

------------------------------------------------------------------
Nitin Borwankar
nborw...@gmail.com
