Hi all,
As I tried to explain in the past, and as I commented on past
versions of the spec, I think there is a separation-of-concerns problem
with this proposal at several levels. Let's enumerate them here, and
then let's talk about them.
(1) Storing information related to specific citation styles, and
trying to outline document structures and sentences (related to points
(b) and (c))
(2) Modifying the data model for programming language purposes (related
to point (a))
The problem I see with (b) and (c) is one of purpose and focus. I
personally don't see BibJSON as a document & text outline language,
since we have been talking about the publication of bibliographic
data (authors, institutions, publishers, document information, etc.),
and not about the publication of internal document and citation
structures. These are two totally different purposes that cannot be
reconciled in a single vocabulary/ontology (at least, I won't ever
suggest that).
What I consider important for BibJSON is to have the right level of
granularity to describe each entity involved in any bibliographic
process (authors, documents, editors, publishers, events (conferences,
etc.), etc.). Once this expressiveness is reached, consumers can display
the data they ingested from different sources the way they want, with
the style they want.
Let's take an example to demonstrate what I mean here. Let's take a
"citation generation application". This application would ingest
BibJSON and generate different bibliographies using different citation
styles (Chicago, etc.). What is important for this application is to
have all the information at hand to display the *same bibliographic
data* according to different styles. Such an application won't care
about conjunctions specified by the data publisher.
In fact, who is the data publisher to tell people how they should
display information?
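As a rough illustration of that point, here is a minimal Python sketch of such an application. The record layout, the title, and the two formatters are made up for this example, and the output only loosely resembles real citation styles. What it shows is that the conjunction is chosen by the consumer, not stored in the data.

```python
# Hypothetical record: field names and title are illustrative, not from the spec.
record = {
    "title": "An Example Title",
    "year": "2010",
    "author": [
        {"name": "Jim Pitman"},
        {"name": "David Aldous"},
    ],
}


def join_names(names, conjunction):
    """Join names with a conjunction chosen by the consuming application."""
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " " + conjunction + " " + names[-1]


def style_a(rec):
    # Made-up style: spelled-out conjunction, year at the end.
    names = [a["name"] for a in rec["author"]]
    return f'{join_names(names, "and")}. {rec["title"]}. {rec["year"]}.'


def style_b(rec):
    # Made-up style: ampersand, year in parentheses after the authors.
    names = [a["name"] for a in rec["author"]]
    return f'{join_names(names, "&")} ({rec["year"]}). {rec["title"]}.'


print(style_a(record))
print(style_b(record))
```

Both outputs come from the same record; neither needed a conjunction from the publisher.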
As for point (c), why should BibJSON try to describe the outline of a
text corpus? (at least, it is what I understand from the explanation)
So, this raises a scoping issue with BibJSON.
Now, for point (2) above: programming considerations impact the
development of the BibJSON spec. A distinction has to be made between
(1) a data representation format that aims to be easily used by data
publishers to describe their own information and to transmit that data
to external agents; and (2) the parsed structure created from (1).
(2) should never impact the good development of (1). The goal is
two-fold: (1) we want to create a format that is the best tradeoff
between simplicity and expressiveness, and (2) we want programming APIs
that parse this format into easily reusable data structures to enable
what you demonstrate in (a). And this should be true for *any*
programming language: Python, JavaScript, PHP, C, C#, etc.
We shouldn't ever say: I want to change the data format (BibJSON) in
that way because it is easier to implement in programming language X.
We have to say: I want to make that change to the data format because
it is less confusing, makes it simpler and easier to use (to publish
data), and keeps its expressive power. Oh, and by the way, it becomes
easier to parse too :)
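To illustrate the separation, here is a small Python sketch. The `Record` class and `parse_record` function are hypothetical names for this example. The published text is (1); the `Record` instance is (2), and it can take whatever shape is convenient for the language without touching the format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Record:
    """Parsed structure (2): convenient for code, never fed back into the format."""
    pref_label: str
    alt_labels: list = field(default_factory=list)


def parse_record(text):
    """Turn the published format (1) into a language-side structure (2)."""
    raw = json.loads(text)
    return Record(
        pref_label=raw["prefLabel"],
        alt_labels=list(raw.get("altLabel", [])),
    )


published = '{"prefLabel": "Jim Pitman", "altLabel": ["J. Pitman", "James Pitman"]}'
rec = parse_record(published)
print(rec.pref_label, rec.alt_labels)
```

A JavaScript or PHP library could parse the same text into a completely different shape, and the format would not need to know.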
To answer what Nitin said in his email:
(a) is already possible with the current spec. It is the description of
a record.
(b) I won't propose to do that
(c) If my understanding is right, I won't propose to do that either
(d) We have two choices: we create multiple entities for the same entity
and link them together with some attribute. Or we do what Jack
suggested below.
Now:
> In case a), I would expect for there to be a type "address" with
> attributes for text, city, state, ... I recommend against using "text"
> as an attribute name unless it really doesn't represent a specific
> attribute. For the address example, "text" is meant to be the display
> name of the object.
I agree. But at the end, this is really a vocabulary consideration, and
not a notational one.
> In case b), I don't understand the purpose of $$conjunction. Why does
> that need to be stored? Can the value represent distinct meanings? It
> doesn't seem necessary in the example because there is an implicit
> "and" when listing authors. The example mentions that $$author1 "is a
> structured string with potentially many more keys such as MR_ID,
> affiliation etc". In the example it seems like you are using $$author1
> as a variable.
>
> authors = [ $$author1, $$conjunction, $$author2 ]
>
> If the content of the structured strings are shown would this look
> like an ordered list of objects?
>
> authors =
> [
>   {
>     "name": "Jim Pitman",
>     "MRid": "6785565"
>   },
>   {
>     "conjunction": "and"
>   },
>   {
>     "name": "David Aldous",
>     "MRid": "87855"
>   }
> ]
>
> Or is the structured string meant to be the definitive object for the
> author? In the latter case you are saying every attribute of Jim
> Pitman should be displayed. That would mean listing all publications
> because they would be attributes of the Jim Pitman object.
I agree. And what I don't like is that the code above suggests that we
have three authors, and that the conjunction is one of them. Otherwise,
a structure, like the lists we talked about in another thread, could be
introduced. Like a list, it wouldn't have any semantic meaning other
than being a structure that orders things in some way. Maybe we could
end up with something like:
"authors": {
    "textOutline": {
        "0": {
            "text": "Jim Pitman",
            "ref": "@jpitman"
        },
        "conjunction": " and ",
        "1": {
            "text": "David Aldous",
            "ref": "@daldous"
        }
    }
}
Here, "textOutline" and "conjunction" would be reserved processing
keywords; and at least this would be consistent with the general ideas
of structures such as lists, etc.
In the example above it would make sense to use the attribute "text"
since we are talking about the outline structure of a text corpus
("text" + "conjunction" + "text").
The semantics of the above would be something like: there are two
authors, @jpitman and @daldous. Additionally, we have a textOutline for
these two authors that would generate the literal "Jim Pitman and David
Aldous".
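A consumer could read those semantics roughly as in the Python sketch below. The function name `read_text_outline` is hypothetical. It also assumes members are visited in document order: Python's `json` module preserves that order, but JSON itself does not guarantee it for objects, so a real spec would have to state the ordering rule.

```python
import json

doc = json.loads("""
{
  "authors": {
    "textOutline": {
      "0": {"text": "Jim Pitman", "ref": "@jpitman"},
      "conjunction": " and ",
      "1": {"text": "David Aldous", "ref": "@daldous"}
    }
  }
}
""")


def read_text_outline(outline):
    """Return (refs, literal) for a textOutline structure.

    Assumption: keys are visited in document order.
    """
    refs, parts = [], []
    for key, value in outline.items():
        if key == "conjunction":
            # Reserved keyword: contributes text only, it is not an author.
            parts.append(value)
        else:
            # Indexed member: a real author with display text and a reference.
            refs.append(value["ref"])
            parts.append(value["text"])
    return refs, "".join(parts)


refs, literal = read_text_outline(doc["authors"]["textOutline"])
print(refs)
print(literal)
```

The list of references has two entries, so the conjunction is never mistaken for a third author.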
The definition of the "author" attribute would remain the same:
"author": {
    "prefLabel": "author",
    "description": "The name(s) of the author(s) (in the case of more than one author, separated by and)",
    "allowedType": "Document",
    "allowedValue": ["String", "Person"],
    "minCardinality": "1"
},
Any structure can be used as the value of the attribute "author"
(sequence, bag, textOutline, etc.). Structures don't affect the
validation of the data against the allowedValue of the "author"
attribute. However, all objects belonging to these structures ("0" and
"1" above) have to comply with the allowedValue specified in the
structure schema for "author".
The logic behind this is that the allowed values of an attribute can be
structured in any way without affecting the semantics of the
relationships. So, for "author", we could have another object ("2")
which would be of type "Document" for example.
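Here is a rough Python sketch of validation that looks through the structure and checks only the members. Everything in it is an assumption made for illustration: the `type` key on objects, the set of structure keywords, and the helper names. It is not how any existing BibJSON tool works.

```python
# Hypothetical schema fragment; only allowedValue is used here.
SCHEMA = {"author": {"allowedValue": ["String", "Person"]}}
# Assumed reserved keywords that wrap members without adding meaning.
STRUCTURE_KEYWORDS = {"sequence", "bag", "textOutline", "unorderedList"}
# Assumed reserved keys inside a structure that are not members.
NON_MEMBER_KEYS = {"conjunction"}


def value_type(value):
    """Plain strings count as "String"; objects declare their own "type"."""
    return "String" if isinstance(value, str) else value.get("type")


def members(value):
    """Yield the members of a value, looking through any structure wrapper."""
    if isinstance(value, list):
        yield from value
    elif (isinstance(value, dict) and len(value) == 1
          and next(iter(value)) in STRUCTURE_KEYWORDS):
        inner = next(iter(value.values()))
        for key, member in inner.items():
            if key not in NON_MEMBER_KEYS:
                yield member
    else:
        yield value


def is_valid(attribute, value):
    """Check every member against the attribute's allowedValue."""
    allowed = SCHEMA[attribute]["allowedValue"]
    return all(value_type(m) in allowed for m in members(value))


person = {"type": "Person", "prefLabel": "Jim Pitman"}
print(is_valid("author", {"sequence": {"0": person, "1": "David Aldous"}}))
```

The same `is_valid` call works whether the value is a bare string, an array, or any of the wrapped structures, which is the point: the wrapper never takes part in the check.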
> For case d), BibJSON needs to support a list for altLabel. An object
> might look like,
>
> {
>   "prefLabel": "Jim Pitman",
>   "altLabel": {
>     "0": "J. Pitman",
>     "1": "James Pitman",
>     "2": "jpitman"
>   }
> }
>
In fact it would be:
{
    "prefLabel": "Jim Pitman",
    "altLabel": [
        "J. Pitman",
        "James Pitman",
        "jpitman"
    ]
}
(according to the current spec)
Or we could think about:
{
    "prefLabel": "Jim Pitman",
    "altLabel": {
        "unorderedList": {
            "0": "J. Pitman",
            "1": "James Pitman",
            "2": "jpitman"
        }
    }
}
I think I would enable both methods to be used and make them
semantically equivalent.
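One way a parser could treat the two notations as equivalent is sketched below in Python. The function name `alt_labels` is hypothetical, and returning a set is my own choice to reflect that neither form carries ordering.

```python
def alt_labels(record):
    """Return altLabel values as a set, whichever notation was used."""
    value = record.get("altLabel", [])
    if isinstance(value, dict) and "unorderedList" in value:
        # Structured form: the indices are only keys, so keep the values.
        value = value["unorderedList"].values()
    elif isinstance(value, str):
        # A single label given as a bare string.
        value = [value]
    return frozenset(value)


as_array = {
    "prefLabel": "Jim Pitman",
    "altLabel": ["J. Pitman", "James Pitman", "jpitman"],
}
as_structure = {
    "prefLabel": "Jim Pitman",
    "altLabel": {
        "unorderedList": {"0": "J. Pitman", "1": "James Pitman", "2": "jpitman"}
    },
}

print(alt_labels(as_array) == alt_labels(as_structure))
```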
Thanks!
Take care,
Fred