record example


jack

Nov 10, 2009, 2:07:34 PM
to bibjson
Let's take this record from the existing Schramm BibJSON as an
example.

How would we include a text label that represents the authors, along
with two objects with refs?

{
"id": "ap79",
"type": "Article",
"author": {
"ref": "@Aldous_David_and_Pitman_Jim"
},
"series": {
"ref": "@Annals_of_Probability"
},
"title": "On the zero-one law for exchangeable events",
"year": "1979",
"subject": "Markov chain, Coupling, Zero one law, Exchangeable events",
"volume": "7",
"pages": "704-723",
"mrClass": "60F20 (60J10)",
"mrid": "MR537216"
}

jack

Nov 10, 2009, 2:22:36 PM
to bibjson
One way this could be represented is:

{
"id": "ap79",
"type": "Article",
"author": {
"bibtex_label": "J. Pitman and David Aldous",
"ref": [
{
"name": "David Aldous",
"id": "@Aldous_David"
},
{
"name": "James Pitman",
"id": "@Pitman_jim"
}
]
},
"series": {
"ref": {
"name": "Annals of Probability",
"id": "@Annals_of_Probability"
}
},
"title": "On the zero-one law for exchangeable events",
"year": "1979",
"subject": "Markov chain, Coupling, Zero one law, Exchangeable events",
"volume": "7",
"pages": "704-723",
"mrClass": "60F20 (60J10)",
"mrid": "MR537216"
}

I believe BibJSON supports adding a new attribute like "bibtex_label".
I'm not clear if BibJSON supports the structure I used for "ref".

Frederick Giasson

Nov 10, 2009, 2:33:54 PM
to bib...@googlegroups.com
Hi Jack!

I will come back to this in another thread once I have finished some
investigation and thinking. However, according to the current BibJSON
spec, your example would look like this:

{
"id": "ap79",
"type": "Article",
"author": [
{
"bibtex_label": "J. Pitman and David Aldous",
"name": "David Aldous",
"ref": "@Aldous_David"
},
{
"bibtex_label": "J. Pitman and David Aldous",
"name": "James Pitman",
"ref": "@Pitman_jim"
}
],
"series": {
"name": "Annals of Probability",

"ref": "@Annals_of_Probability"
},

"title": "On the zero-one law for exchangeable events",
"year": "1979",
"subject": "Markov chain, Coupling, Zero one law, Exchangeable events",
"volume": "7",
"pages": "704-723",
"mrClass": "60F20 (60J10)",
"mrid": "MR537216"
}


Then we can think about replacing "name", "bibtex_label", or whatever
with "text", as we discussed on our last call.

My next email will explain what the above means, and how we could
change the serialization to make it easier to understand, with a couple
of proposals.
> I believe BibJSON supports adding a new attribute like "bibtex_label".
>
Yes, exactly. We only have to add them to the schema and we are done.
> I'm not clear if BibJSON supports the structure I used for "ref".
>
I made a couple of modifications that comply with the current spec.



Thanks,


Take care,

Fred

jack

Nov 10, 2009, 2:45:25 PM
to bibjson
A representation that I believe conforms to the BibJSON spec uses a
new "author_list" attribute and type, and uses prefLabel. I understand
prefLabel in this example is instance-specific. I think that is fine
with Jim. Anyone who outputs BibJSON could use the author's object
label as the default for prefLabel here.

{
"id": "ap79",
"type": "Article",
"author_list": {
"bibtex_label": "J. Pitman and David Aldous",
"author": [
{
"prefLabel": "David Aldous",
"ref": "@Aldous_David"
},
{
"prefLabel": "James Pitman",
"ref": "@Pitman_Jim"
}
]
},
"series": {
"prefLabel": "Annals of Probability",

Frederick Giasson

Nov 10, 2009, 3:25:36 PM
to bib...@googlegroups.com
Hi Jack
This does not conform to the current spec.

The problem with the above is that you end up with this:

record --> attribute --> attribute --> linked-record

But I know that what you are trying to achieve is to create new
structures such as lists. No new structures that are not in the JSON
spec have been created in BibJSON (yet). So, right now, we create lists
with JSON arrays, introduced with brackets ([...]).

I think we should have some discussion about lists and what they are.
There are many kinds of such structures: collections, ordered lists
(sequences), unordered lists (bags), etc. But the question is: should
we introduce them in BibJSON, and if so, what would be the best way to
do this?


In the past, we had several discussions about JSON "objects" and JSON
"arrays": which to use and when, etc.

So, here is a question for the group: what if everything were a JSON
object, and we didn't use any arrays (brackets)? If we wanted to
introduce a structure (a bag, a sequence, etc.), we would do it with
some core attributes added for that special purpose.

Let's try the example below, which switches the logic of Jack's example.

{
"id": "ap79",
"type": "Article",
"author": {
"bibtex_label": "J. Pitman and David Aldous",
"sequence": {
{
"prefLabel": "David Aldous",
"ref": "@Aldous_David"
},
{
"prefLabel": "James Pitman",
"ref": "@Pitman_Jim"
}
}
},
"series": {
"prefLabel": "Annals of Probability",
"ref": "@Annals_of_Probability"
},
"title": "On the zero-one law for exchangeable events",
"year": "1979",
"subject": "Markov chain, Coupling, Zero one law, Exchangeable events",
"volume": "7",
"pages": "704-723",
"mrClass": "60F20 (60J10)",
"mrid": "MR537216"
}



So, we could have keywords such as "sequence" when the order is
important and "bag" when it is not (or "listSequence" and "listBag",
or "orderedList" and "unorderedList", etc.).

When no structural keyword is used, as with "series", the reference is
to a single record.

The example above would mean something like:

"The article "On the zero-one law for exchangeable events" has two
authors, whose names appear in the ordered list "David Aldous", "James
Pitman". Additionally, we have some metadata that gives us more
information about the BibTeX author string."
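A minimal Python sketch of how a parser might interpret an attribute value under this proposal. The keyword names "sequence" and "bag" are taken from the list above and are not final; `interpret_value` is a hypothetical helper. The items are held here in a dict keyed by position, since objects without keys cannot be written in JSON.

```python
from typing import Any, Tuple

# Hypothetical keyword names; the thread lists several alternatives
# ("sequence"/"bag", "orderedList"/"unorderedList", ...).
STRUCTURAL_KEYWORDS = ("sequence", "bag")


def interpret_value(value: dict) -> Tuple[str, Any]:
    """Classify an attribute value under the structural-keyword proposal.

    Returns ("sequence", items) or ("bag", items) when a structural keyword
    is present, and ("single", value) otherwise. Other keys, such as
    "bibtex_label", are ignored when classifying.
    """
    for keyword in STRUCTURAL_KEYWORDS:
        if keyword in value:
            return keyword, value[keyword]
    # No structural keyword, as with "series": a reference to one record.
    return "single", value


series = {"prefLabel": "Annals of Probability", "ref": "@Annals_of_Probability"}
author = {
    "bibtex_label": "J. Pitman and David Aldous",
    "sequence": {"0": {"ref": "@Aldous_David"}, "1": {"ref": "@Pitman_Jim"}},
}
print(interpret_value(series)[0])  # single
print(interpret_value(author)[0])  # sequence
```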


Then, in the structure schema, the definition of the "author" attribute
would be something like:

"author": {
"prefLabel": "author",
"description": "The name(s) of the author(s) (in the case of more than one author, separated by and)",

"allowedType": "Document",
"allowedValue": ["String", "Person"],

"minCardinality": "1"
},

So, the "author" attribute described this way means:

Someone can use the "author" attribute to describe a "Document" record.
The value of this attribute can be either a "String" or a "Person"
record. The minimum cardinality is 1, but there can be multiple values.
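To make that reading concrete, here is a minimal validator sketch for the "author" definition above. `check_attribute`, the `SUPERTYPES` table (treating "Article" as a kind of "Document"), and the rule that any value carrying a "ref" is a Person record are all assumptions for illustration, not part of the spec.

```python
from typing import Dict, List

# Hypothetical type hierarchy: "Article" and "Book" count as "Document".
SUPERTYPES: Dict[str, List[str]] = {"Article": ["Document"], "Book": ["Document"]}

# The "author" definition from the schema above (cardinality kept as a string).
AUTHOR_SCHEMA = {
    "allowedType": "Document",
    "allowedValue": ["String", "Person"],
    "minCardinality": "1",
}


def value_type(value: object) -> str:
    """Map a value to a schema type name (assumption: a "ref" means Person)."""
    if isinstance(value, str):
        return "String"
    if isinstance(value, dict) and "ref" in value:
        return "Person"
    return "Unknown"


def check_attribute(record_type: str, values: list, schema: dict) -> List[str]:
    """Return a list of problems; an empty list means the values conform."""
    problems = []
    allowed = schema["allowedType"]
    if record_type != allowed and allowed not in SUPERTYPES.get(record_type, []):
        problems.append(f"type {record_type} may not use this attribute")
    for value in values:
        if value_type(value) not in schema["allowedValue"]:
            problems.append(f"value {value!r} has a disallowed type")
    if len(values) < int(schema.get("minCardinality", "0")):
        problems.append("too few values")
    # No maxCardinality means any number of values is allowed.
    if "maxCardinality" in schema and len(values) > int(schema["maxCardinality"]):
        problems.append("too many values")
    return problems


authors = [{"prefLabel": "David Aldous", "ref": "@Aldous_David"}, "James Pitman"]
print(check_attribute("Article", authors, AUTHOR_SCHEMA))  # []
```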

What is important to see is what "minCardinality" and
"maxCardinality" imply. If an attribute can have multiple values (for
example, a minCardinality of 1 with no maxCardinality, or a
maxCardinality >= 2), then a parser/validator can expect a "list"
structure. This means that the following three examples would be valid
BibJSON:

{
"id": "ap79",
"type": "Article",
"author": {
"sequence": {
{
"prefLabel": "David Aldous",
"ref": "@Aldous_David"
},
{
"prefLabel": "James Pitman",
"ref": "@Pitman_Jim"
}
}
}
}


OR

{
"id": "abc",
"type": "Article",
"author": {
"sequence": {
{
"prefLabel": "Bob",
"ref": "@Bob"
}
}
}
}


(a sequence of a single value)

OR

{
"id": "abc",
"type": "Article",
"author": {
"prefLabel": "Bob",
"ref": "@Bob"
}
}



After thinking about this, I think it would make sense to use only JSON
objects to describe everything, and to introduce these new "structural
keywords" instead. One advantage is that we can have several different
kinds of structures, and that we don't mix structure-related information
between JSON's syntax and BibJSON's.

Any thoughts?


Thanks!


Take care,


Fred

Jack Alves

Nov 10, 2009, 3:58:57 PM
to bib...@googlegroups.com
The general idea seems flexible and powerful. I don't think it is legal JSON to have objects in objects without keys, so you would need either arrays or numeric keys as a convention. For example, the sequence object without an array could be:

{
"sequence": {
"0": {
"prefLabel": "Bob",
"ref": "@Bob"
},
"1": {
"prefLabel": "James Pitman",
"ref": "@Pitman_Jim"
}
}
}

Frederick Giasson

Nov 10, 2009, 4:57:02 PM
to bib...@googlegroups.com
Hi Jack!

> The general idea seems flexible and powerful. I don't think it is
> legal JSON to have objects in objects without keys. So you would
> either need arrays or use numeric keys as a convention. For example,
> the sequence object without an array could be,
>
> {
> "sequence": {
> "0": {
> "prefLabel": "Bob",
> "ref": "@Bob"
> } ,
> "1":{
> "prefLabel": "James Pitman",
> "ref": "@Pitman_Jim"
> }
> }
> }

Well, you got me here. It is true that it was not valid, and this
solution works fine I think. So, we could have:

{
"sequence": {
"0": {
"prefLabel": "Bob",
"ref": "@Bob"
} ,
"1":{
"prefLabel": "James Pitman",
"ref": "@Pitman_Jim"
}
}
}


In the example above, the numbers have a meaning since it is a
sequence: the items are ordered by their numeric keys.
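A sketch of reading and writing the numeric-key convention in Python (the function names are hypothetical). Because JSON object keys are always strings, and JSON objects are defined as unordered, a consumer has to sort the keys by their integer value instead of relying on key order.

```python
import json

# The numeric-key "sequence" example above, as a JSON string.
doc = json.loads("""
{"sequence": {
    "0": {"prefLabel": "Bob", "ref": "@Bob"},
    "1": {"prefLabel": "James Pitman", "ref": "@Pitman_Jim"}}}
""")


def sequence_items(structure: dict) -> list:
    """Return the items of a numeric-key "sequence" object in order."""
    items = structure["sequence"]
    # Keys are strings, so sort them by integer value:
    # a plain string sort would put "10" before "2".
    return [items[key] for key in sorted(items, key=int)]


def to_sequence(values: list) -> dict:
    """Encode an ordered list with the numeric-key convention."""
    return {"sequence": {str(i): v for i, v in enumerate(values)}}


print([item["ref"] for item in sequence_items(doc)])  # ['@Bob', '@Pitman_Jim']
```

The integer sort is the one extra step this convention requires; a plain JSON array carries the same order with no convention at all.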

{
"bag": {
"0": {
"prefLabel": "Bob",
"ref": "@Bob"
} ,
"1":{
"prefLabel": "James Pitman",
"ref": "@Pitman_Jim"
}
}
}


In the example above, the numbers don't have any meaning; they only
list the items belonging to the bag.
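For a bag, two documents that assign different keys to the same items describe the same bag. One way a consumer might compare bags, sketched here in Python under the assumption that a bag is a multiset (duplicates count), is to ignore the keys and count canonical serializations of the items. `bag_items` is a hypothetical helper.

```python
import json
from collections import Counter


def bag_items(structure: dict) -> Counter:
    """Return the items of a "bag" object as a multiset, ignoring the keys."""
    # Dicts are unhashable, so count a canonical JSON serialization of each.
    return Counter(json.dumps(item, sort_keys=True) for item in structure["bag"].values())


a = {"bag": {"0": {"ref": "@Bob"}, "1": {"ref": "@Pitman_Jim"}}}
b = {"bag": {"0": {"ref": "@Pitman_Jim"}, "1": {"ref": "@Bob"}}}
print(bag_items(a) == bag_items(b))  # True
```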




Still make sense?


Thanks,


Fred

Jack Alves

Nov 10, 2009, 5:16:20 PM
to bib...@googlegroups.com
Yes, I think this will work.

Frederick Giasson

Nov 10, 2009, 5:18:02 PM
to bib...@googlegroups.com
Hi
> Yes, I think this will work.

Good. I will sleep on this, but I think I like this solution for
handling list structures.


Everybody else: any comments/suggestions vis-a-vis this thread?


Thanks,

Fred

Benjamin

Nov 10, 2009, 9:52:35 PM
to bibjson
I must have missed something here. Arrays provide a simple and concise
notation for ordered lists. What advantage is there to using an object
with "numeric keys" (they are still strings in your example) instead
of using an array?

Benjamin

Benjamin

Nov 10, 2009, 9:55:56 PM
to bibjson
The issue of how to preserve a string such as "J. Pitman and David
Aldous" is an important one, and one Jim and I have devoted some time
to. What we are dealing with is the distinction between description
and access. I will post a new thread on this issue in just a moment.

Benjamin

Frederick Giasson

Nov 11, 2009, 10:46:13 AM
to bib...@googlegroups.com
Hi Benjamin,
> I must have missed something here. Arrays provide a simple and concise
> notation for ordered lists. What advantage is there to using an object
> with "numeric keys" (they are still strings in your example) instead
> of using an array?
>

Yes, JSON brackets are a means for exactly this: ordered lists.

However, what about these other types of lists:

- Unordered lists
- Collections
- Etc.

Then there are the notions of finite and infinite sets of things,
which can be important.

And now we possibly have another structure, "textOutline". So, what has
been discussed here is a general concept for structures that could be
used to handle all these things without affecting the semantics of the
records being described.


Thanks,


Fred

Frederick Giasson

Nov 11, 2009, 10:49:12 AM
to bib...@googlegroups.com
Hi,

> The issue of how to preserve a string such as "J. Pitman and David
> Aldous" is an important one, and one Jim and I have devoted some time
> to. What we are dealing with is the distinction between description
> and access. I will post a new thread on this issue in just a moment.
>

Well, that is why I was wondering about trying to "cluster" the structure
in all these attributes and values... If the only goal is to preserve
the string "J. Pitman and David Aldous", why not simply do this:


"authorCitation": "J. Pitman and David Aldous"?


It is much simpler, and it won't make the specification more confusing
by introducing all kinds of new structures.

Such things are always a tradeoff between needs, simplicity, and
expressiveness.


Thanks!


Take care,


Fred

Benjamin Kalish

Nov 11, 2009, 12:14:37 PM
to bib...@googlegroups.com
I don't see why we need to differentiate between ordered and unordered lists. Whether or not the order of a list is significant will be determined by the attribute. There should never be a case where the same attribute can take either an ordered or an unordered list and a parser must discriminate between the two.

I feel the same way about collections and sets. It shouldn't be the parser's job to remove duplicates. That type of burden should lie with the dataset creator and whatever tools he or she uses.

If there are any uses for which a distinction between these types of list will be significant, please share them. I can't imagine what they would be.
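A minimal sketch of the alternative described here, assuming a hypothetical per-attribute "ordered" flag in the schema: values stay in plain JSON arrays, and the attribute definition alone decides whether order is significant. The attribute table is illustrative, and unknown attributes default to unordered.

```python
import json
from collections import Counter

# Hypothetical schema flag: the attribute definition, not the data,
# states whether order is significant.
ORDERED = {"author": True, "subject": False}


def normalize(attribute: str, values: list):
    """Read a plain JSON array according to the attribute's definition."""
    if ORDERED.get(attribute, False):
        return list(values)  # order is significant: keep it
    # Order is not significant: compare as a multiset instead.
    return Counter(json.dumps(v, sort_keys=True) for v in values)


print(normalize("author", ["Aldous", "Pitman"]) == normalize("author", ["Pitman", "Aldous"]))  # False
print(normalize("subject", ["Coupling", "Zero one law"]) == normalize("subject", ["Zero one law", "Coupling"]))  # True
```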

Benjamin Kalish
4 Lawn Ave, Apt 2L
Northampton, MA  01060-2221
Phone: 413-687-7738
Email: bka...@gmail.com

Frederick Giasson

Nov 11, 2009, 12:34:44 PM
to bib...@googlegroups.com
Hi Benjamin

> I don't see why we need to differentiate between ordered and unordered
> lists.
This decision can have a huge impact on the systems that ingest your
data. If the data publisher considers the order of an attribute's
values unimportant, those systems *won't* index the data the same way
they would if the order *is* important.

This has a huge impact on many parts of the systems that could use
your data.

Specifically, if you want to keep values ordered, you need a special
order key. This is implemented differently depending on your system:
relational database, triple store, flat file system, etc.

*This* has to be considered, and it is the reason why we *should* make a
distinction. If we don't care about order, then we should use an
unordered list; otherwise we add processing (time and additional
processing structures) to every system that manages this data.

This is the reason why.
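A sketch of what the "special order key" means for a relational store, using Python's built-in sqlite3 module. The table and column names are hypothetical: the ordered author list needs an extra position column and an ORDER BY to read it back, while the unordered subject list needs neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ordered values need an extra "position" column: the order key.
conn.execute("CREATE TABLE author_sequence (record_id TEXT, position INTEGER, ref TEXT)")
# Unordered values only need (record, value) pairs.
conn.execute("CREATE TABLE subject_bag (record_id TEXT, subject TEXT)")

conn.executemany(
    "INSERT INTO author_sequence VALUES (?, ?, ?)",
    [("ap79", 1, "@Pitman_Jim"), ("ap79", 0, "@Aldous_David")],
)
conn.executemany(
    "INSERT INTO subject_bag VALUES (?, ?)",
    [("ap79", "Coupling"), ("ap79", "Zero one law")],
)

# SQL guarantees no row order without ORDER BY, so the sequence is
# recovered from the order key.
authors = [
    row[0]
    for row in conn.execute(
        "SELECT ref FROM author_sequence WHERE record_id = ? ORDER BY position",
        ("ap79",),
    )
]
subjects = {row[0] for row in conn.execute("SELECT subject FROM subject_bag")}
print(authors)  # ['@Aldous_David', '@Pitman_Jim']
```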

> Whether or not the order of a list is significant will be determined
> by the attribute. There should never be a case where the same
> attribute can take either an ordered or an unordered list and a parser
> must discriminate between the two.

I think this depends on the use case and what the publisher of the data wants.

> I feel the same way about collections and sets. It shouldn't be the
> parser's job to remove duplicates. That type of burden should lie with
> the dataset creator and whatever tools he or she uses.
The difference between a collection (finite) and a sequence (infinite)
has nothing to do with duplicated items.

In any case, *all* the burden has to be on the shoulders of the
parser/validator. Who knows whether the data publisher knows what he is
doing? What if he has a bug in his script that duplicates items, or that
adds syntactic and semantic errors? All this has to be handled by the
parser/validator systems. Otherwise, all kinds of corrupted data can end
up in your system.
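A minimal sketch of the validator-side check argued for here: detect repeated items in a list structure and report them, rather than trusting the publisher. `find_duplicates` is hypothetical, and comparing items by their sorted-key JSON serialization is an assumption about what counts as "the same item".

```python
import json
from typing import List


def find_duplicates(items: List[dict]) -> List[dict]:
    """Return each item that occurs more than once, reported a single time."""
    seen, reported, repeats = set(), set(), []
    for item in items:
        key = json.dumps(item, sort_keys=True)  # canonical form of the item
        if key in seen and key not in reported:
            repeats.append(item)
            reported.add(key)
        seen.add(key)
    return repeats


authors = [{"ref": "@Aldous_David"}, {"ref": "@Pitman_Jim"}, {"ref": "@Aldous_David"}]
print(find_duplicates(authors))  # [{'ref': '@Aldous_David'}]
```

Reporting duplicates, instead of silently dropping them, leaves the decision about what to do with them to the consuming system.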

Also, on a general note, the idea behind BibJSON *is* exactly the
opposite of this statement: "That type of burden should lie with the
dataset creator and whatever tools he or she uses". The initial idea
behind BibJSON and related work is to make it as easy and natural as
possible for the data publisher to publish his data without having to
make all these considerations.


> If there are any uses for which a distinction between these types of
> list will be significant, please share them. I can't imagine what they
> would be.

I hope I answered this question above.


Thanks!



Take care,


Fred

Benjamin Kalish

Nov 11, 2009, 12:53:35 PM
to bib...@googlegroups.com
> Well, that is why I was wondering about trying to "cluster" the structure in all these attributes and values... If the only goal is to preserve the string "J. Pitman and David Aldous", why not simply do this:
>
>
> "authorCitation": "J. Pitman and David Aldous"?

That is the simplest and perhaps the best solution. There are some
reasonable objections to it, however. So long as the audience for
descriptive strings is humans, a string such as "J. Pitman and David
Aldous" is perfectly adequate, and this is exactly how librarians have
traditionally approached the issue. If we wish to support the
automatic formatting of bibliographic citations then we need to make
these strings machine readable as well.

If we decide this is important then I think there are three solutions:

1) The tool which formats the citation bears the burden of decomposing
the statement of responsibility into its component parts. This is how
BibTeX works, but BibTeX requires its users to jump through hoops in
order to do so and the result is that BibTeX's author strings are
rarely as accurate as one might like. A tool working with BibJSON data
might use the fact that it also has access to the names of the authors
in a controlled form to its advantage, but this is complicated and
unreliable. Imagine trying to decompose a string like "by Per Brinch
Hansen with an introduction by Donald Knuth". Would you want to write
the code that turns that into "Brinch Hansen, P."?

2) We can include an additional attribute for "transcribedNames" which
would be structured but would not include information about the
original statement of responsibility such as whether the word "with"
or "and" had been used. This attribute would take a list of name
objects where each name object has attributes for "givenName",
"surname", and so forth so that bibliographies can be formatted
properly. (Because different citation styles have different rules for
abbreviating first names and inverting foreign names.)

The main objection to this would be that the data is redundant which
can lead to inconsistencies.

3) The third option is to provide a JSON object from which we can
generate both the statement of responsibility and properly formatted
versions of the authors' names for bibliographies. This would be
something like the hypertext convention proposed in earlier
specifications, but it need not be so broadly defined. In particular,
there is no need for it to serve any purposes other than the two
listed above. We could still have an ordinary "author" attribute to
provide metadata for access if we thought that would be easier to
parse. (I.e. we could use the hypertext convention to add structure to
descriptive metadata but use a less free form approach like the one
proposed in the current spec for everything else.)
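A minimal Python sketch of what the third option could look like, assuming a hypothetical structure in which the statement of responsibility is a list of literal text pieces and structured names. The same data yields both the transcribed string and inverted, abbreviated names for a bibliography, so nothing is stored twice. The function names are hypothetical, and rebuilding a name as "givenName surname" is an assumption that does not hold for every naming convention.

```python
from typing import List, Union

# Hypothetical structure for option 3: literal text pieces (str)
# interleaved with structured names (dict).
Part = Union[str, dict]

statement: List[Part] = [
    {"givenName": "J.", "surname": "Pitman"},
    " and ",
    {"givenName": "David", "surname": "Aldous"},
]


def as_transcribed(parts: List[Part]) -> str:
    """Rebuild the statement of responsibility as it appeared on the item."""
    return "".join(
        p if isinstance(p, str) else f"{p['givenName']} {p['surname']}" for p in parts
    )


def initials(given: str) -> str:
    # One initial per word: "David" -> "D.", "J." -> "J."
    return " ".join(word[0] + "." for word in given.split())


def for_bibliography(parts: List[Part]) -> List[str]:
    """Inverted, abbreviated names ("Surname, I.") for a reference list."""
    return [f"{p['surname']}, {initials(p['givenName'])}" for p in parts if isinstance(p, dict)]


print(as_transcribed(statement))    # J. Pitman and David Aldous
print(for_bibliography(statement))  # ['Pitman, J.', 'Aldous, D.']
```

The sketch carries no role information, so in the Brinch Hansen example it would list Donald Knuth alongside the author; telling an introduction writer apart from an author would need an extra field.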

Of course, none of this is necessary if we are willing to accept badly
formatted bibliographies. Most citation managers don't preserve this
information. As a result they produce improperly formatted
bibliographies; most people don't seem to mind.

Benjamin Kalish

Benjamin Kalish

Nov 11, 2009, 1:03:07 PM
to bib...@googlegroups.com
Hi Fred,

I don't think requiring that the user not include duplicate items in
their data is an unacceptable burden. I agree that BibJSON should make
it easy and natural for the dataset publisher, but we have to draw the
line somewhere. We could require that every BibJSON parser include a
spell check, but we both know that would be silly. The question is
where to draw the line.

The problem as I see it is that in trying to make BibJSON easier to
use by adding more data structures we are also making it more
complicated by adding new types. Whether or not this is worth it has
to be decided on a case by case basis.

As for infinite sequences, no data structure can represent them and
they don't occur in bibliographic data anyhow, so why worry about
them?

Benjamin Kalish

Benjamin Kalish

Nov 11, 2009, 1:41:21 PM
to bib...@googlegroups.com
Oh, and as for publisher intention, I understand that databases and
the like need to know whether or not order is significant. What I'm
having trouble with is imagining a bibliographic attribute for which
one data publisher will want to supply an ordered list and another
data publisher will insist that the list be unordered. Even if one
publisher provided author names in the order in which they are given
on the original document and another lists them in an arbitrary order,
why would the second publisher object to their arbitrary order
being preserved?

Benjamin Kalish



Frederick Giasson

Nov 11, 2009, 4:12:29 PM
to bib...@googlegroups.com
Hi,

> That is the simplest and perhaps the best solution. There are some
> reasonable objections to it however. So long as the audience for
> descriptive strings is humans, a string such as "J. Pitman and David
> Aldous" is perfectly adequate, and this is exactly how librarians have
> traditionally approached the issue.

Good

> If we wish to support the automatic formatting of bibliographic
> citations then we need to make these strings machine readable as well.
>
And that is exactly why we have all the structured information as well.
What I was saying is that there is no need for a structure like
"conjunction", since different conjunctions are used depending on the
citation style you want. This is true for bibliographic citations as
well as anything else you can think of.
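As a small illustration of keeping the conjunction out of the data, this Python sketch (hypothetical `join_names`) stores only the ordered names and lets the output style supply the conjunction and the choice of a serial comma.

```python
from typing import List


def join_names(names: List[str], conjunction: str, serial_comma: bool = False) -> str:
    """Join names for display; the conjunction comes from the output style."""
    if not names:
        return ""
    if len(names) == 1:
        return names[0]
    head = ", ".join(names[:-1])
    comma = "," if serial_comma and len(names) > 2 else ""
    return f"{head}{comma} {conjunction} {names[-1]}"


names = ["J. Pitman", "David Aldous"]
print(join_names(names, "and"))  # J. Pitman and David Aldous
print(join_names(names, "&"))    # J. Pitman & David Aldous
```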

I don't think I ever said that we didn't have to structure such
unstructured data.

Remember, my company is called Structured Dynamics, and its suite of
products is called conStruct, structWSF, and so on....

So, sure, this structure is needed :)



Thanks,


Fred

Frederick Giasson

Nov 11, 2009, 4:14:35 PM
to bib...@googlegroups.com
Hi,

> I don't think requiring that the user not include duplicate items in
> their data is an unacceptable burden. I agree that BibJSON should make
> it easy and natural for the dataset publisher, but we have to draw the
> line somewhere. We could require that every BibJSON parser include a
> spell check, but we both know that would be silly. The question is
> where to draw the line.
>
It is not a *requirement*. Naturally, data publishers won't duplicate
items for fun. But it can happen from time to time for different
reasons. So, the parser/validator has to manage it. I have nothing else
to say about that.

> The problem as I see it is that in trying to make BibJSON easier to
> use by adding more data structures we are also making it more
> complicated by adding new types. Whether or not this is worth it has
> to be decided on a case by case basis.
>
Totally agree, see my thread "New list of proposals" for more
information about this.

> As for infinite sequences, no data structure can represent them and
> they don't occur in bibliographic data anyhow, so why worry about
> them?
>

Idem.


Thanks!


Take care,


Fred

Frederick Giasson

Nov 11, 2009, 4:16:35 PM
to bib...@googlegroups.com
Hi,

> Oh, and as for publisher intention, I understand that databases and
> the like need to know whether or not order is significant. What I'm
> having trouble with is imagining a bibliographic attribute for which
> one data publisher will want to supply an ordered list and another
> data publisher will insist that the list be unordered. Even if one
> publisher provided author names in the order in which they are given
> on the original document and another lists them in an arbitrary order,
> why would the second publisher object to their arbitrary order
> being preserved?
>

There are, but I think *I* went too far into this stuff. Refer to my
proposal in the "New list of proposals" thread I started, and tell me if
it is a good balance between simplicity, flexibility and usability.

Thanks,


Fred