Maybe a bit overblown ("far more advanced"?), but I'd just add that
I'm thinking about a similar workflow for pandoc (which recently added
full CSL 1.0 support), with similar needs.
Bruce
I'd sure like to see such support. One or two years ago, I checked the
state of Zotero and the state of the LaTeX tools and I decided that the
best way to integrate the two was to come up with my own homegrown
solution.
> It is not clear to
> me what the best way to do this is, for the following reasons:
>
> * LaTeX users need to cite by an identifying key which is not visible
> in the Zotero interface
Right. As I recall, that was one of the major issues I had. The keys are
not readily accessible. They are also not generated correctly. Things
may have changed but last I checked, some of the tools in the latex
toolchain won't work with accented characters in the keys. Unfortunately
Zotero does not strip them out. Here's an actual example:
dharmaśrī_pi_1978
The name of the author has diacritics. (And the title is one word
because the title is in Pinyin with each syllable demarcated by spaces.
I have not spent any time trying to figure a solution for that.) Here's
another fun one:
mdo-sṅags-bstan-paʼi-ñi-ma_dbu_1978
It does not handle articles (as in: the, a, an, un, une, des, les, le,
la, ein, die, etc.) well:
bareau_les_1955
I'd rather have the next word of the title than a single article
as the abbreviated title.
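A minimal sketch of the two fixes described above (stripping diacritics and skipping a leading article), in Python. The function names, the author_title key shape without a year, and the stop list are assumptions made for illustration; this is not Zotero's algorithm, and the titles in the usage lines are made up.

```python
import re
import unicodedata

# Hypothetical stop list, taken from the articles mentioned above.
ARTICLES = {"the", "a", "an", "un", "une", "des", "les", "le", "la", "ein", "die"}


def ascii_fold(text):
    """Decompose accented characters and drop anything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def first_title_word(title):
    """Return the first word of the title that is not an article."""
    words = re.findall(r"[a-z0-9]+", ascii_fold(title).lower())
    for word in words:
        if word not in ARTICLES:
            return word
    # Fall back to the first word if the title consists only of articles.
    return words[0] if words else ""


def base_key(family_name, title):
    """Build an author_title key (no year) that is plain ASCII."""
    author = re.sub(r"[^a-z0-9]+", "-", ascii_fold(family_name).lower()).strip("-")
    return f"{author}_{first_title_word(title)}"


print(base_key("Dharmaśrī", "Pi"))            # dharmasri_pi
print(base_key("Bareau", "Les exemples"))     # bareau_exemples
```

Folding to ASCII loses information, so two distinct names can end up with the same key; that has to be handled separately.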
Besides problems like those above, I just did not like the algorithm. For
instance, years are always put in the key. I'd rather have the year
added ONLY if it helps disambiguate an otherwise ambiguous key. This
means that adding a new entry can make an old key change. In practice, I
think it is a rare instance which can be readily fixed by the user. It
is just that if I know the author and title, I can figure out the key
fairly easily. It is rare that I know the year of publication.
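The "year only when needed" rule could be sketched as follows. The input shape (a list of (base_key, year) tuples) and the function name are assumptions for the example, not the attached rekeying code.

```python
from collections import defaultdict


def assign_keys(entries):
    """Return one key per entry, adding the year only on collisions.

    `entries` is a list of (base_key, year) tuples; this shape is an
    assumption made for the example.
    """
    groups = defaultdict(list)
    for index, (base, _year) in enumerate(entries):
        groups[base].append(index)

    keys = [None] * len(entries)
    for base, indexes in groups.items():
        for index in indexes:
            year = entries[index][1]
            # A unique base key stays short; a shared one gets the year.
            keys[index] = base if len(indexes) == 1 else f"{base}_{year}"
    return keys


print(assign_keys([("doe_title", 2000)]))
# ['doe_title']
print(assign_keys([("doe_title", 2000), ("doe_title", 2003)]))
# ['doe_title_2000', 'doe_title_2003']
```

The second call shows the trade-off mentioned above: adding the 2003 entry changes the key of the 2000 entry. Two entries with the same base key and the same year would still collide and need a further suffix.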
I was also concerned by changes to Zotero mucking up the keys. Work on
your dissertation, Zotero changes the key generation algorithm and then
all the keys cease to work. Yay!
So I decided the best course of action would be to work manually:
export, regenerate keys and cleanup, run the latex toolchain. The export
bit is manual. All the rest is launched by "make". By generating my own
keys, I isolate myself from whatever changes are made to Zotero.
> If anybody has any comments on these things, I'd be glad to hear them.
> BibLaTeX/Biber is developing rapidly now and already has features far
> more advanced than any other bibliography systems I can think of
> (particularly in the areas of Unicode, sorting and cross-entry
> inheritance) and it would be a shame not to be able to pull the data
> from Zotero.
Indeed it would and maybe the time is ripe for something like this. For
what it is worth, I'm attaching my rekeying code to this email. It is an
ugly beast but it may give some ideas. Besides regenerating the keys, it
does some ad hoc cleanup which is probably of interest to no one else.
This code is used on a daily basis but only by me, and only for one
project so far. So things are hardcoded including the name of the input
(biblio.bib) and output (normalized.bib) files. Both input and output
are bibtex files. You need pybliographer to run it.
Ciao,
Louis
> I don't mind so much the accents in keys as biber has full UTF-8
> support for keys too but the issue is more that when actually
> typesetting, you need to be able to uniquely refer to an entry in
> order to cite it. I really don't like the idea of calculating the key
> from fields on the fly - far too much room for breakage there. There
> is a unique key in the SQLite backend but it's not exposed in any
> way. Say a user cites "xyz" in a latex document and as a data source
> points to a Zotero db or Atom feed. I need to be able to look for that
> key in the data source, get the entry and then parse it into the
> internal biblatex data model format. Also, users need to be able to
> have easy access to the keys of the entries or they can't cite them by
> key in the first place ...
The difficulty with the citation key issue is just that it's by
definition a local (to a file, or a database, or in this case, a user)
identifier. This assures a processor can find the correct item (and
requires the processor have access to those particular records, which
is a big limitation in many contexts).
But also, the tradition in the BibTeX world is that this identifier is
human-readable, and that it can ideally be recalled from memory.
Finally, it should be stable.
So we have three requirements in the context of zotero:
1. unique to a library
2. human-readable
3. stable
... and I would add a fourth requirement:
4. that these user-based identifiers can be associated with global
identifiers (URIs, including DOIs as URI). I.e. we need to stop
privileging local identifiers.
So those are the requirements.
Right now, we have URIs for Zotero items that look like:
<http://www.zotero.org/bdarcus/items/2VDXDIMR>
So we have a key like "2VDXDIMR". This solves requirements #1 and #3,
but fails on #2.
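For illustration, here is a small Python sketch that splits such an item URI into its library part and its item key. The path shape (library, then "items", then key) is read off the single example URI above and is an assumption; the function name is hypothetical.

```python
from urllib.parse import urlparse


def split_item_uri(uri):
    """Split a Zotero item URI into (library, item_key)."""
    parts = urlparse(uri).path.strip("/").split("/")
    # Expected path shape, taken from the example above: <library>/items/<key>
    if len(parts) != 3 or parts[1] != "items":
        raise ValueError(f"not an item URI: {uri}")
    return parts[0], parts[2]


print(split_item_uri("http://www.zotero.org/bdarcus/items/2VDXDIMR"))
# ('bdarcus', '2VDXDIMR')
```

The key only identifies an item together with the library it belongs to, which is why it is unique and stable but says nothing to a human reader.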
We could add a field for the user data where we might have RDF that looks like:
<http://www.zotero.org/bdarcus/items/2VDXDIMR> a rl:Item ;
    bibtex:key "elden-2009-territory" ;
    rl:source <info:isbn13:9780816654833> .
I.e. people can add the key themselves in their user data, and can edit it.
But that fails on requirement #3.
Alternately, we could just say the id itself should be a
human-readable slug, such that the item URI becomes:
<http://www.zotero.org/bdarcus/items/elden-2009-territory>
While that introduces other issues, it does solve all of the
requirements I note.
Bruce
On Fri, Feb 11, 2011 at 12:17 PM, Richard Karnesky <karn...@gmail.com> wrote:
>> So we have three requirements in the context of zotero:
>> 4. that these user-based identifiers can be associated with global
>> identifiers (URIs, including DOIs as URI). E.g. we need to stop
>> privileging local identifiers.
>
> I'm not completely opposed to this, as:
> Local ID + context = global ID
> (so if you know the key and the library it came from, you don't
> need much more)
Right.
But there is a subtle thing I keep mentioning, which is that the
zotero item should not be the source data itself, and so they should
have different identifiers.
So there's user data about the item (notes, tags, creator, updated
date-time, maybe key) and then there's item metadata itself (title,
authors, etc.). They should be distinct in both the data layer and the
UI.
That allows you to get to the source data in different ways:
- you can get to the user data for the source either with the user +
key, or user + global URI
- you can get to some representation of the data with only the global URI
So I want to decouple processing from specific zotero item ids.
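A rough Python sketch of that separation, with user data and source metadata held in different records and the lookups listed above. All class and attribute names are hypothetical, and the title is a placeholder; only the key and the ISBN URI come from the example earlier in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Item metadata, identified by a global URI."""
    uri: str
    title: str
    authors: list


@dataclass
class UserItem:
    """One user's data about a source: local key, tags and so on."""
    user: str
    key: str
    source_uri: str
    tags: list = field(default_factory=list)


class Library:
    def __init__(self):
        self.sources = {}   # global URI -> Source
        self.by_key = {}    # (user, key) -> UserItem
        self.by_uri = {}    # (user, global URI) -> UserItem

    def add(self, source, user_item):
        self.sources[source.uri] = source
        self.by_key[(user_item.user, user_item.key)] = user_item
        self.by_uri[(user_item.user, user_item.source_uri)] = user_item

    def source_for_key(self, user, key):
        """Resolve a local document key to the shared source metadata."""
        return self.sources[self.by_key[(user, key)].source_uri]


lib = Library()
uri = "info:isbn13:9780816654833"
lib.add(
    Source(uri, "Placeholder Title", ["Elden"]),
    UserItem("bdarcus", "elden-2009-territory", uri, tags=["example"]),
)
print(lib.source_for_key("bdarcus", "elden-2009-territory").uri)
# info:isbn13:9780816654833
print(lib.sources[uri].title)
# Placeholder Title
```

A processor that only holds the global URI never needs to see the user's key, and a user can change the key without touching the source record.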
Bruce
I'm not limiting this discussion to BibTeX; I'm trying to fold it into
a broader discussion. And in that context, it is relevant.
> Also, to drive home my point regarding access dates: at present, a
> global URI described with BIBO metadata would not provide the
> information needed to cite a webpage, unless the webpage contains a
> cite-able issue date.
I think this is trivial compared to much bigger issues around lack of
document portability.
Bruce
> We're not really looking for a better format than bibtex as biber will
> always support this format but now biber has modular drivers to access
> other data sources so I looked around to see which ones are likely
> candidates and Zotero is certainly one of them. I'd prefer a direct
> connection than an "export first" route like Zotero RDF but perhaps
> this isn't realistic at the moment given the slightly different
> approach (the necessity of keys in the latex bibliography world).
> Biber uses a hacked btparse library to allow the bibtex C routines to
> deal with some UTF-8 cases it needs and the internal UTF-8
> capabilities of biber are very good so it's not so much a limitation
> that's driving this, it's more a desire to allow biber/biblatex users
> to draw from non-bibtex data sources. We have a beta RIS driver and
> are working on a dedicated biblatexml format which maps closely to the
> internal data structures biber uses for bibliography processing but
> there are clearly some major sources like Zotero which I'd like to be
> able to deal with ...
Have you looked closely at bibutils? This seems to me the most
comprehensive data and format library out there (supports mods, ris,
endnote/refer, OOXML, bibtex, biblatex, etc.), and the mappings are
all laid in the C source as a series of simple maps. If nothing else,
you might be able to borrow the essential mapping logic and model
(which is the hard part).
FWIW, I had originally recommended Chris use MODS for his core format,
since he had a custom XML format.
BIBO RDF is based on things I learned working with MODS, but is also
designed to really exploit RDF and linked data principles, while fully
supporting (I hope) the Zotero data. That world is linked together not
by local keys, but global URIs. So the scope is more ambitious, you
might say.
To support BIBO in biber, though, a couple of things you'd want:
1) a generic RDF parser
2) a way to map those RDF triples to your own internal model; here's
just one example I wrote in Python using rdf-object mapper (which
includes #1 above):
<https://github.com/bdarcus/bibo-py>
From comments in the models.py file:
"""
This provides basic object mapping for key classes and relations in the
Bibliographic Ontology (bibo). Examples:
>>> book = Book('<http://example.net/books/1>')
>>> book.date = "2001"
>>> publisher = Organization('<http://abcbooks.com>')
>>> publisher.name = "ABC Books"
>>> publisher.city = "New York"
>>> book.title = "Some Book Title"
>>> book.publisher = publisher
>>> print(book.publisher.name)
ABC Books
"""
3) a way to map these global objects to local document keys, and vice
versa (an easy step, but still a level of indirection*).
Bruce
* E.g., in the python example, you're talking about something like:
dockey == zotero_item.source
... where you find the zotero item by its label property.
> Is there a reason why the Wiki link on the BIBO main page is broken?
Which "wiki link"? And you mean this "main page"?
Bruce
I haven't explored name representations in RDF, but I'm sure there has
been some work done in this sphere already. citeproc-js also prefers
to consume more fine-grained name data than Zotero provides, and it
would be good to bring more flexible names to Zotero itself some day.
Working out a reasonable way to represent the names in BIBO/RDF could
be an important step in that direction.
Avram
Yeah, that's why we defer to FOAF for agent (including name) representation.
But it's worth keeping in mind that RDF is beautifully extensible. So
you could invent your own properties for these details (and as Avram
notes, we tackled some of this in the CSL/citeproc-js arena).
Perhaps a better approach would be to post a note on the bibo google
group (where foaf people hang out also) laying out your concerns, and
see if we could come up with a solution.
One wrinkle to consider is that RDF is, like a relational database, a
fundamentally unordered data model. So solutions are best that don't
depend on order where avoidable.
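One way to picture this in Python: hold the statements in a set (which has no meaningful order) and record author position as explicit data. The 4-tuple shape and the function name are assumptions for the example; this is not how BIBO or FOAF models names.

```python
def ordered_authors(statements, subject):
    """Recover author order from explicit positions, not storage order."""
    positioned = [
        (position, name)
        for (s, prop, name, position) in statements
        if s == subject and prop == "author"
    ]
    return [name for _position, name in sorted(positioned)]


# A set has no meaningful order, like the statements in an RDF graph.
statements = {
    ("book1", "author", "Roe, Richard", 2),
    ("book1", "author", "Doe, Jane", 1),
    ("book2", "author", "Poe, Pat", 1),
}
print(ordered_authors(statements, "book1"))
# ['Doe, Jane', 'Roe, Richard']
```

Because the position travels with each statement, the result is the same however the store happens to return them.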
Bruce
It's best not to get too focused on the syntax. If you were to run an
RDF parser on that file, you'd see that those nodes get expanded to a
full URI.
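As a sketch of what that expansion amounts to, assuming the nodes in question are relative identifiers of the "#item_nn" kind and that the parser resolves them against the document's base URI. The base URI here is a made-up placeholder, and the helper name is hypothetical; a real RDF parser does this internally.

```python
from urllib.parse import urljoin


def resolve_node(base_uri, node_id):
    """Resolve a relative node identifier against the document's base URI."""
    return urljoin(base_uri, node_id)


base = "http://example.org/export.rdf"  # placeholder base URI
print(resolve_node(base, "#item_12"))
# http://example.org/export.rdf#item_12
```

The short form in the file is only a serialization detail; what a consumer sees after parsing is the full URI.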
Keys in RDF are URIs, so as I said earlier, you'd need a way to map
the URI to a local key. Certainly Zotero will provide that, but it
doesn't yet.
Also, I submitted a bug report here that explained those values are
more appropriate as <http://zotero.org/user/doe/12353124>.
Bruce
My suggestion for you, then, is to do what we've done in the CSL
world: create a JSON representation which matches your internal model.
It'd be nice if we could settle on a common one, but I have a feeling
that may be hard.
For comparison, CSL JSON currently looks like (though it's not
specified, and so subject to change):
{
  "id": "doe:99",
  "authors": [
    {
      "family": "Doe",
      "given": "Jane"
    }
  ],
  "title": "The Title",
  "container-title": "Publication Title",
  "issued": [2000, 3, 14]
}
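For what it's worth, reading a record of this shape into an internal model takes very little code. The sketch below uses only the field names from the example above (which, as noted, may change); the Entry class and the function name are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    key: str
    authors: list          # "Family, Given" strings
    title: str
    container: str
    year: Optional[int]


def entry_from_json(text):
    """Map one JSON record (shape as in the example above) to an Entry."""
    data = json.loads(text)
    authors = [f"{a['family']}, {a['given']}" for a in data.get("authors", [])]
    issued = data.get("issued", [])
    return Entry(
        key=data["id"],
        authors=authors,
        title=data.get("title", ""),
        container=data.get("container-title", ""),
        year=issued[0] if issued else None,
    )


SAMPLE = """
{
  "id": "doe:99",
  "authors": [{"family": "Doe", "given": "Jane"}],
  "title": "The Title",
  "container-title": "Publication Title",
  "issued": [2000, 3, 14]
}
"""
entry = entry_from_json(SAMPLE)
print(entry.key, entry.authors, entry.year)
# doe:99 ['Doe, Jane'] 2000
```

The "id" field plays the role of the citation key here, so the lookup by key happens before any other field is touched.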
> 2. The lack of a user-defined citation key in entries is a major
> problem and there is no real way round it. The natural model for
> citation key based systems is that users cite by key, the backend takes
> the key, looks through a data source for an entry with that key and
> constructs some object from that entry to use in e.g. typesetting.
> This is fast since you don't have to look in entries without that key.
> Using auto-generated keys is slow and messy - you have to open every
> data source entry, construct a key from some information and then
> compare it to the key the user used - slow, messy and ugly. The Zotero
> model of using as "key" whatever it thinks is the best uniquely
> identifying information (URL, DOI, whatever) doesn't play well with
> this - the key needs to be stable. More importantly, it needs to be
> user defined so that people can avoid special chars which are not
> allowed in citation keys in some systems. URLs are horrible in this
> regard - LaTeX certainly can't use most URLs as keys as they contain
> all sorts of LaTeX unfriendly characters. Even the default Zotero key
> which is used if there is no "identifying" information breaks LaTeX
> (#item_nn). Also, when you are reading document sources with
> citations, keys like URLs are pretty useless if you want a quick idea
> of what the citation refers to. Traditional citation keys, user-
> defined, like "Smithetal:2010" are much better.
Yes, I've run into this myself today trying to use Zotero-sourced data
as a source for a pandoc/citeproc-based workflow. I ended up writing
code to add the right keys to the MODS output, and it was definitely a
PITA.
I'd be in favor of what I've previously suggested:
a) a field to hold this data; I'd call it something generic like label
b) a better function to create a default key/label and populate the field
c) but, it could be edited by the user
This label would then map to:
- bibtex key
- mods:mods/@ID
- a property on the RDF
- ID on RIS
- etc.
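A minimal Python sketch of points a) to c), plus the keyed lookup that makes a stored label cheap to use. The item dictionaries, the default "Family:year" pattern and the set of characters treated as safe are all assumptions for the example, not anything Zotero implements.

```python
import re

# Conservative pattern for labels usable as citation keys; an assumption,
# since real tools accept a wider range of characters.
SAFE_LABEL = re.compile(r"^[A-Za-z0-9][A-Za-z0-9:_.-]*$")


def default_label(item):
    """Generate a default label such as Smith:2010 (hypothetical scheme)."""
    family = re.sub(r"[^A-Za-z0-9]", "", item["family"])
    return f"{family}:{item['year']}"


def set_label(item, label=None):
    """Store a user-supplied label, or fill in a default if none exists."""
    if label is None:
        if item.get("label"):
            return item["label"]  # an existing label is left alone (stable)
        label = default_label(item)
    if not SAFE_LABEL.match(label):
        raise ValueError(f"label not safe as a citation key: {label!r}")
    item["label"] = label
    return label


def build_index(items):
    """Index items by label so a cited key is a single dictionary lookup."""
    return {set_label(item): item for item in items}


item = {"family": "Smith", "year": 2010}
print(set_label(item))                    # Smith:2010
print(set_label(item, "Smithetal:2010"))  # Smithetal:2010
index = build_index([item])
print(index["Smithetal:2010"] is item)    # True
```

Once the label is stored rather than computed on the fly, each export format only has to copy it into its own slot (bibtex key, mods:mods/@ID and so on).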
Bruce
This has been green-lighted for the next revision of the Zotero data
model (probably Zotero 2.2) (see
https://github.com/ajlyon/zotero-bits/issues#issue/24), and so the
Zotero folks are interested in getting this into CSL soon too.
Avram
Not following the last point. What is missing in CSL?
Bruce
Oh, that's just me not knowing much about CSL and making mistakes left
and right. This is of course already in CSL 1.0. This will hopefully
be in Zotero and mapped to the existing citation-label in CSL.
Avram