Description, access, and unflattening records

Benjamin

Nov 10, 2009, 10:46:45 PM

to bibjson

Jack recently posted an example record in which the original BibTeX
author string was "J. Pitman and David Aldous" and suggested the use
of a 'bibtex_label' attribute in order to preserve the original string
within BibJSON. Jack is absolutely right that this string should be
preserved. I'd like to talk about why.

Standard practice for bibliographies is to cite information such as
the publisher name, author name, title, etc. as it appears on the
item. The purpose of such data is to provide description. On the other
hand, when entering bibliographic data into a simple searchable
catalog the convention is generally to select such names from a
controlled vocabulary. The purpose of this data is to provide access.
Both conventions are necessary and metadata formats that are meant to
serve both needs, access and description, must allow both conventions
and must allow for a clear distinction between the two.
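
To make the distinction concrete, here is a minimal sketch in Python. The
attribute names (publisher_description, publisher_access), the identifier
scheme, and the publisher itself are all hypothetical and are not part of
BibJSON; the point is only that one record can carry both kinds of data
without confusing them.

```python
# Hypothetical controlled vocabulary: identifier -> preferred form of the name.
# The identifier and the names are invented for illustration.
PUBLISHER_AUTHORITY = {
    "pub:0001": "Example University Press",
}

record = {
    "title": "An Example Monograph",
    # Description: transcribed as it appears on the item.
    "publisher_description": "Example Univ. Pr.",
    # Access: a key into the controlled vocabulary.
    "publisher_access": "pub:0001",
}


def display_publisher(rec):
    """Return the descriptive form, as it would be printed in a citation."""
    return rec["publisher_description"]


def search_key(rec):
    """Return the controlled form, as it would be used for indexing."""
    return PUBLISHER_AUTHORITY[rec["publisher_access"]]
```

A citation formatter would call display_publisher, while a catalog index
would call search_key; neither value is derived from the other.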

Care must be taken when unflattening bibliographic data to make sure
that the relationships conveyed are the relationships which were
originally intended. Descriptive data can only be turned into access
data with care, and access data can never be turned into descriptive
data without consulting the original instance.

Contrary to what the specification says, you can't take the
affiliation of a paper and say that it is really the affiliation of
the paper's author. An author has current affiliations and historic
affiliations. A paper is associated with the affiliation under
which the paper was published. If Jim were to leave Berkeley and get a
job at MIT, the papers he wrote at Berkeley would still be associated
with Berkeley. Their affiliation would not change, though Jim's
affiliation would have.
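
One way to picture this is the sketch below. The class and field names
(Author, Paper, affiliation_at_publication, move_to) are assumptions made
for illustration, not anything defined by BibJSON. The paper stores its own
copy of the affiliation, so a later change to the author leaves it alone.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    name: str
    current_affiliation: str
    past_affiliations: list = field(default_factory=list)

    def move_to(self, new_affiliation):
        """Record a change of employer, keeping the history."""
        self.past_affiliations.append(self.current_affiliation)
        self.current_affiliation = new_affiliation


@dataclass
class Paper:
    title: str
    author: Author
    # Copied at publication time; deliberately not looked up from the author.
    affiliation_at_publication: str


jim = Author(name="Jim", current_affiliation="Berkeley")
paper = Paper(
    title="An example paper",
    author=jim,
    affiliation_at_publication=jim.current_affiliation,
)

jim.move_to("MIT")
# paper.affiliation_at_publication is still "Berkeley";
# jim.current_affiliation is now "MIT".
```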

Often, it will turn out that what has been traditionally viewed as a
single attribute must be viewed as several distinct attributes before
we can unflatten a dataset. A good example is the "journal" attribute.
A paper is traditionally associated with the name of the journal in
which it was published. It is tempting to remove the journal name from
the paper's record, replacing it by a reference to the journal's
record. However, the name of the journal may change over time, while the
journal name that is associated with the paper should be static. The
solution is to provide two attributes, one for the name of the journal
(at the time of the paper's publication), and another for the
identifier.
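
A rough sketch of the two-attribute approach follows. The attribute names
(journal_name, journal_id), the identifier format, and the journal titles
are placeholders chosen for the example and are not proposed spellings.

```python
# Hypothetical journal records, keyed by identifier.
journals = {
    "journal:42": {
        "current_name": "New Journal Name",
        "former_names": ["Old Journal Name"],
    },
}

paper = {
    "title": "An example paper",
    # Descriptive: the journal's name at the time of publication.
    "journal_name": "Old Journal Name",
    # Access: a reference to the journal's own record.
    "journal_id": "journal:42",
}


def citation_journal(p):
    """Name to print in a citation: the name as of publication."""
    return p["journal_name"]


def current_journal_name(p, registry):
    """Name the journal goes by today, found through the identifier."""
    return registry[p["journal_id"]]["current_name"]
```

Renaming the journal only touches the entry in journals; the string stored
on the paper is never rewritten.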

As for Jack's issue, how best to preserve the string "J. Pitman and
David Aldous"? While this may not be an exact transcription of the
authors' names as they appeared in the paper, it will be close and
should only differ according to the (rather eccentric) conventions of
BibTeX. Such strings are, therefore, descriptive. They need not be set
aside in a special 'bibtex_label' attribute, but BibJSON should
provide an attribute for descriptive strings of this kind, regardless
of their origin. We will have to have some discussion about how to
properly name such an attribute, I am sure. One possibility would be
to call it a "statement of responsibility", which is a well recognized
concept in librarianship. (The "statement of responsibility" consists
of the names of those given credit on the chief source of information
for an instance, together with the words associating them with the
title, for example "by Russell Hoban; pictures by Lillian Hoban" or
"Rodolfo Baggioa and Chris Cooper".)

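As a rough illustration of keeping the descriptive string while also
unflattening it, consider the Python sketch below. The attribute name
statement_of_responsibility is only the candidate floated above, and the
helper names are hypothetical. The split relies on the BibTeX convention of
separating names with the word "and", and it is deliberately naive: it
ignores braces and does not normalize name order.

```python
import re


def split_bibtex_authors(author_field):
    """Split a BibTeX author field on the separator word 'and'."""
    parts = re.split(r"\s+and\s+", author_field.strip())
    return [name.strip() for name in parts if name.strip()]


def unflatten_authors(bibtex_author):
    """Keep the original string verbatim and add one entry per name."""
    return {
        # Descriptive: preserved exactly as received.
        "statement_of_responsibility": bibtex_author,
        # Access: one entry per name, ready to be linked to identifiers later.
        "author": [{"name": n} for n in split_bibtex_authors(bibtex_author)],
    }


record = unflatten_authors("J. Pitman and David Aldous")
# record["statement_of_responsibility"] == "J. Pitman and David Aldous"
# record["author"] == [{"name": "J. Pitman"}, {"name": "David Aldous"}]
```
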
Frederick Giasson

Nov 11, 2009, 10:56:43 AM

to bib...@googlegroups.com

Hi Benjamin!

I totally agree with the use case you outlined above. The only thing I
would like to make clear here is the distinction between:

(1) vocabulary definition
(2) notation definition

What you described above is a need to properly extend the BibJSON
*vocabulary* (multiple affiliation attributes, etc.), which is what is
needed, and I am naturally fine with that.

However, what we have to take care about is everything related to the
textual outline of a text corpus (like the author string example above).
All proposed solutions to date are targeting the definition of the
notation. I personally think that it should target the definition of the
vocabulary for the reasons I outlined on the " structured string use
cases - as requested by Jack/Fred on tech conf call 2009-11-10 " thread
on this forum.

Thanks!

Take care,

Fred

Benjamin Kalish

Nov 11, 2009, 1:16:18 PM

to bib...@googlegroups.com

> What you described above is a need to properly extend the BibJSON *vocabulary*

Yes! Exactly! If we don't have a vocabulary that allows us to make the
conceptual distinctions we wish to draw we are bound to be frustrated.

I'm afraid I haven't yet had time to wade through all the material in
the thread you referenced, but the more I work with actual datasets
the more I think that: a) some sort of structured string is desirable
for specifying descriptive data; b) there is no need to use structured
strings to provide metadata which is intended only for access. (That
doesn't mean that I think structured strings *shouldn't* be used to
specify non-descriptive metadata. I remain unconvinced either way.)

Benjamin Kalish
