citeproc JSON from the web API

Erik Hetzner

Dec 6, 2011, 1:30:08 AM
to zotero-dev
Hi,

Is there any progress on getting citeproc compatible JSON from the
Zotero web API?

I believe this provides information about the format:

https://github.com/citation-style-language/schema

I am interested because, to begin with, I would like to extend zot4rst
to create a program that preprocesses a Markdown document, fetches all
referenced citations from Zotero, and stores them locally so that pandoc
can process them.

Thanks for any help!

best, Erik

Avram Lyon

Dec 6, 2011, 1:37:11 AM
to zoter...@googlegroups.com
On a marginally related note, the folks at CrossRef are now providing
citeproc-compatible JSON (and running citeproc-js, maybe using
citeproc-node):
http://www.crossref.org/CrossTech/2011/11/turning_dois_into_formatted_ci.html
(Thanks to Frank for the heads-up.)

A good step towards getting citeproc JSON in and out of the API would
be import and export translators for Zotero, since the announced plan
was to support all the export translators via the API eventually.

Avram

Dan Stillman

Dec 6, 2011, 2:22:53 AM
to zoter...@googlegroups.com
On 12/6/11 1:30 AM, Erik Hetzner wrote:
> Is there any progress on getting citeproc compatible JSON from the
> Zotero web API?

Sure. I've just added content=csljson as an API option. Let us know if
you notice any problems.
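
For readers following along, here is a minimal sketch of building a request URL that uses this option. The URL shape is taken from the rel="self" link in the sample entry quoted later in this thread; the helper name item_url is hypothetical, and nothing here contacts the server.

```python
from urllib.parse import urlencode

# Host as it appears in the URLs quoted in this thread.
API_BASE = "https://api.zotero.org"


def item_url(user_id, item_key, content="csljson"):
    """Build the Atom URL for one item (hypothetical helper, not an official client)."""
    query = urlencode({"content": content})
    return f"{API_BASE}/users/{user_id}/items/{item_key}?{query}"


print(item_url(1254, "ZBZQ4KMP"))
```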

format=csljson will be added once we decide how to handle item limits
for non-Atom formats. (format=bib, the only other one publicly
available, is currently limited to 150 items.)

We're using application/x-csl+json as the content type. CrossRef is
apparently using application/citeproc+json, but 1) this isn't a
registered type and should therefore be prefixed with "x-" and 2) the
CSL spec calls this format "JSON schema for CSL input data", with no
mention of "citeproc". But this should be discussed on the xbiblio list.

Bruce D'Arcus

Dec 6, 2011, 7:58:57 AM
to zoter...@googlegroups.com

One caveat to keep in mind: that data format may, and likely will, change.

Erik Hetzner

Dec 7, 2011, 2:18:36 AM
to zoter...@googlegroups.com, Dan Stillman
At Tue, 06 Dec 2011 02:22:53 -0500,

Thanks, Dan! I’ll let you know if there are any difficulties. This
should be a big help for users of pandoc and zot4rst.

best, Erik

Erik Hetzner

Dec 11, 2011, 3:33:07 PM
to zoter...@googlegroups.com, Dan Stillman
At Tue, 06 Dec 2011 02:22:53 -0500,
Dan Stillman wrote:
>

Hi Dan,

Thanks for this. I am trying to get my code to work.

I am having some trouble parsing one entry with feedparser
(Python), but it appears to be a feedparser bug. I have submitted this
bug to feedparser, and am posting it here in case others have similar
issues:

http://code.google.com/p/feedparser/issues/detail?id=316

While researching the problem, I did discover that the feed is
invalid. The <entry> requires a <summary>.

Additionally, it is being served as HTTP/1.0, which doesn’t seem
right.

Thanks again!

best, Erik

Dan Stillman

Dec 11, 2011, 4:54:02 PM
to zotero-dev
On 12/11/11 3:33 PM, Erik Hetzner wrote:
> While researching the problem, I did discover that the feed is
> invalid. The <entry> requires a <summary>.

No it doesn't. (It's best to include supporting evidence when making
spec claims.)

http://www.atomenabled.org/developers/syndication/atom-format-spec.php#element.entry

"atom:entry elements MUST contain an atom:summary element in either of
the following cases:
the atom:entry contains an atom:content that has a "src" attribute (and
is thus empty).
the atom:entry contains content that is encoded in Base64; i.e., the
"type" attribute of atom:content is a MIME media type [MIMEREG], but is
not an XML media type [RFC3023], does not begin with "text/", and does
not end with "/xml" or "+xml"."

http://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fapi.zotero.org%2Fgroups%2F1%2Fitems

> Additionally, it is being served as HTTP/1.0, which doesn’t seem
> right.

There's nothing inherently wrong with HTTP/1.0. Part of our
load-balancing stack serves only HTTP/1.0 to clients. It still accepts
HTTP/1.1 requests and converts implicit HTTP/1.1 keep-alives to
"Connection: keep-alive", which clients support for HTTP/1.0 responses.

Erik Hetzner

Dec 11, 2011, 5:23:33 PM
to zoter...@googlegroups.com, Dan Stillman
At Sun, 11 Dec 2011 16:54:02 -0500,

Dan Stillman wrote:
>
> On 12/11/11 3:33 PM, Erik Hetzner wrote:
> > While researching the problem, I did discover that the feed is
> > invalid. The <entry> requires a <summary>.
>
> No it doesn't. (It's best to include supporting evidence when making
> spec claims.)
>
> http://www.atomenabled.org/developers/syndication/atom-format-spec.php#element.entry
>
> "atom:entry elements MUST contain an atom:summary element in either of
> the following cases:
> the atom:entry contains an atom:content that has a "src" attribute (and
> is thus empty).
> the atom:entry contains content that is encoded in Base64; i.e., the
> "type" attribute of atom:content is a MIME media type [MIMEREG], but is
> not an XML media type [RFC3023], does not begin with "text/", and does
> not end with "/xml" or "+xml"."
>
> http://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fapi.zotero.org%2Fgroups%2F1%2Fitems

Hi Dan,

Sorry for not being clear & providing the data that was failing
validation. You will need to paste this into the feed validator
(http://validator.w3.org/feed/#validate_by_input):

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:zapi="http://zotero.org/ns/api">
<title>First Book</title>
<author>
<name>egh</name>
<uri>http://zotero.org/egh</uri>
</author>
<id>http://zotero.org/egh/items/ZBZQ4KMP</id>
<published>2011-09-01T01:13:56Z</published>
<updated>2011-09-01T01:13:56Z</updated>
<link rel="self" type="application/atom+xml" href="https://api.zotero.org/users/1254/items/ZBZQ4KMP?content=csljson"/>
<link rel="alternate" type="text/html" href="http://zotero.org/egh/items/ZBZQ4KMP"/>
<zapi:key>ZBZQ4KMP</zapi:key>
<zapi:itemType>book</zapi:itemType>
<zapi:creatorSummary>Doe</zapi:creatorSummary>
<zapi:year>2005</zapi:year>
<zapi:numChildren>0</zapi:numChildren>
<zapi:numTags>0</zapi:numTags>
<content type="application/x-csl+json">{"id":"ZBZQ4KMP","type":"book","title":"First Book","publisher":"Cambridge University Press","publisher-place":"Cambridge","event-place":"Cambridge","author":[{"family":"Doe","given":"John"}],"issued":{"raw":"2005"}}</content>
</entry>

Here is the response:

This feed does not validate.

line 20, column 0: Missing summary [help]

</entry>

In fact, reading section 4.1.3.3 of RFC 4287, it seems that perhaps
the content needs to be base64 encoded, but I haven’t read the rest of
the spec carefully, so I am not sure.
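
As an aside for anyone hitting the same feedparser problem, here is a minimal sketch of reading the entry as plain XML with Python's standard library instead of a feed parser. The embedded entry is an abridged copy of the one pasted above; the function name csl_from_entry is hypothetical.

```python
import json
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Abridged copy of the entry quoted above.
ENTRY_XML = """<entry xmlns="http://www.w3.org/2005/Atom" xmlns:zapi="http://zotero.org/ns/api">
  <title>First Book</title>
  <zapi:key>ZBZQ4KMP</zapi:key>
  <content type="application/x-csl+json">{"id":"ZBZQ4KMP","type":"book","title":"First Book","author":[{"family":"Doe","given":"John"}],"issued":{"raw":"2005"}}</content>
</entry>"""


def csl_from_entry(entry_xml):
    """Return the CSL JSON object carried in an entry's <content> element."""
    entry = ET.fromstring(entry_xml)
    content = entry.find(ATOM_NS + "content")
    if content is None or content.text is None:
        raise ValueError("entry has no textual <content>")
    # The JSON is stored as plain text, so no Base64 decoding is attempted.
    return json.loads(content.text)


item = csl_from_entry(ENTRY_XML)
print(item["title"], item["author"][0]["family"])
```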

> > Additionally, it is being served as HTTP/1.0, which doesn’t seem
> > right.
>
> There's nothing inherently wrong with HTTP/1.0. Part of our
> load-balancing stack serves only HTTP/1.0 to clients. It still accepts
> HTTP/1.1 requests and converts implicit HTTP/1.1 keep-alives to
> "Connection: keep-alive", which clients support for HTTP/1.0 responses.

OK, thanks for clearing this up. I thought perhaps somebody had made a
mistake somewhere.

Thanks again for your help!

best, Erik

Dan Stillman

Dec 12, 2011, 3:30:50 AM
to zotero-dev
On 12/11/11 5:23 PM, Erik Hetzner wrote:
> Sorry for not being clear & providing the data that was failing

OK, the issue here is that the value of the <content> block's 'type'
attribute, 'application/x-csl+json' (and 'application/json', for that
matter, which content=json uses), fails Atom's test for textual content,
in that "the 'type' attribute of atom:content is a MIME media type, but
is not an XML media type, does not begin with 'text/', and does not end
with '/xml' or '+xml'", and the content should therefore technically be
Base64-encoded. The summary field itself isn't really an issue (4.1.1.1
says "the absence of atom:summary is not an error, and Atom Processors
MUST NOT fail to function correctly as a consequence of such an
absence"), but the lack of Base64-encoding could break processors. (In
fact, it looks like you're not the first to hit this, since your
FeedParser ticket[1] was merged with one created by Stephan Hügel[2],
who wrote Pyzotero — I think he ended up parsing the feed as straight
XML, possibly partly because of this.)
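
The test quoted above can be sketched as a small predicate. This is an illustrative reading of the rule, not code from any feed processor: the function name is hypothetical, and the list of XML media types is the set named in RFC 3023 as I understand it.

```python
# XML media types defined by RFC 3023 (assumed complete for this sketch).
XML_MEDIA_TYPES = {
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
    "application/xml-dtd",
}


def atom_expects_base64(type_attr):
    """True if a processor following the quoted rule would expect Base64 content."""
    # A missing type defaults to "text"; text/html/xhtml are never Base64.
    if type_attr is None or type_attr in ("text", "html", "xhtml"):
        return False
    mime = type_attr.strip().lower()
    if mime in XML_MEDIA_TYPES or mime.startswith("text/"):
        return False
    if mime.endswith("/xml") or mime.endswith("+xml"):
        return False
    return True


for t in ("application/x-csl+json", "application/json", "text/x-csl+json", None):
    print(t, atom_expects_base64(t))
```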

This is a bit of a tricky issue. It seems pretty clear that Atom's
Base64-encoding requirement is to allow the transfer of binary or
otherwise human-unreadable data. There's really no point to
Base64-encoding JSON, which is designed to be easy to read. But with the
official JSON MIME type, we're going to break some processors.

So, we have a couple options here:

1) Switch to Base64-encoded JSON in a new API version. Have pprint=1
serve 'text/json' and 'text/x-csl+json' without B64-encoding for human
consumption. Encourage existing clients to use an explicit version
number in requests. After a month or two, switch to the new version as
the default, and eventually deprecate the old version. Downside: we'd
have to support versioning, which we've avoided up to now. Not the end
of the world, but a little annoying for everybody.

2) Switch to 'text/json' and 'text/x-csl+json', which wouldn't need to
be Base64-encoded, by default. While the MIME type matters for feed
processors, it doesn't really matter for client code, which can just
assume it's getting back the content type it requested. Downside: we'd
be including incorrect MIME types in our responses, which might
encourage the use of those in other places.
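
To make the cost of option 1 concrete, here is a minimal sketch of the extra step it would add on both sides. The function names are hypothetical and this is not the API's implementation, only an illustration of a Base64 round trip for a JSON payload.

```python
import base64
import json


def encode_content(item):
    """Serialize an item to JSON and Base64-encode it, as option 1 would require."""
    raw = json.dumps(item, ensure_ascii=False).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_content(text):
    """Reverse encode_content: Base64-decode, then parse the JSON."""
    return json.loads(base64.b64decode(text).decode("utf-8"))


original = {"id": "ZBZQ4KMP", "type": "book", "title": "First Book"}
wire = encode_content(original)
print(wire)                  # no longer human-readable
print(decode_content(wire))  # the same object comes back
```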

I guess I'm inclined to go with #2, since I can't think of anything
other than feed processors that needs to care about the MIME type. The
API docs could clarify the reason for the incorrect values. If we
started serving multiple content types in the same response as
subelements of <content>, we could identify the subelements by their
parameter values (e.g., "csljson") rather than MIME types (which would
necessitate either being inconsistent or having clients look for the
invalid values).

Thoughts?

[1] http://code.google.com/p/feedparser/issues/detail?id=316
[2] http://code.google.com/p/feedparser/issues/detail?id=284

Stephan Hügel

Dec 12, 2011, 6:49:49 AM
to zoter...@googlegroups.com
Number 2 seems to be the more sensible option.

Erik, you can simply monkeypatch feedparser like so:
http://goo.gl/ZFskX (lines 51 - 69 inclusive), and you can easily adapt it to do the same for x-csl+json, which I'll probably have to do myself.

Erik Hetzner

Dec 12, 2011, 1:19:12 PM
to zoter...@googlegroups.com, Dan Stillman
At Mon, 12 Dec 2011 03:24:11 -0500,

Hi Dan,

Speaking for myself, as a new API user, I can easily use the new
version, so my vote is for #1. But this of course does not take
existing API client users into account.

Since text/json is not a registered mimetype, why not use either
text/javascript (technically correct, I think, although obsolete) or
text/plain (also technically correct) for the pretty-printed
version?

best, Erik

Erik Hetzner

Dec 12, 2011, 1:19:54 PM
to zoter...@googlegroups.com, Stephan Hügel
At Mon, 12 Dec 2011 03:49:49 -0800 (PST),

Thanks for the tip, Stephan. I will use this to see if I can get my
code to work.

best, Erik

Avram Lyon

Dec 12, 2011, 1:31:24 PM
to zoter...@googlegroups.com
On Mon, Dec 12, 2011 at 12:30 AM, Dan Stillman <dsti...@zotero.org> wrote:
[..]

> 1) Switch to Base64-encoded JSON in a new API version. Have pprint=1 serve
> 'text/json' and 'text/x-csl+json' without B64-encoding for human
> consumption. Encourage existing clients to use an explicit version number in
> requests. After a month or two, switch to the new version as the default,
> and eventually deprecate the old version. Downside: we'd have to support
> versioning, which we've avoided up to now. Not the end of the world, but a
> little annoying for everybody.
>
> 2) Switch to 'text/json' and 'text/x-csl+json', which wouldn't need to be
> Base64-encoded, by default. While the MIME type matters for feed processors,
> it doesn't really matter for client code, which can just assume it's getting
> back the content type it requested. Downside: we'd be including incorrect
> MIME types in our responses, which might encourage the use of those in other
> places.

What if we replaced 'application/json' with 'text/x-zotero+json' or
something of the sort? The fact that it is JSON means very little to a
naive interpreter of the Atom feed, so we could just as well move away
from the general JSON MIME type altogether.

I'm perfectly happy with any solution we settle on-- it would require
just a tiny change in Zandy's code.

Avram

Dan Stillman

Dec 12, 2011, 1:47:39 PM
to zoter...@googlegroups.com

Yeah, I was just considering the same in response to Erik's question,
"why not use either text/javascript (technically correct, I think,
although obsolete) or text/plain (also technically correct)", the answer
being that Atom defaults to "text" anyway in the absence of a type, so
the only reason to include a MIME type at all is to improve the
readability of the response, and it might as well be explicit.

Erik Hetzner

Dec 12, 2011, 2:17:41 PM
to zoter...@googlegroups.com, Dan Stillman
At Mon, 12 Dec 2011 13:47:39 -0500,

Hi Dan,

That makes sense, I hadn’t known that.

What about using vendor media types? [1]:

text/vnd.zotero.csl+json
text/vnd.zotero.citation+json

best, Erik

1. http://codebetter.com/sebastienlambla/2011/02/01/minting-new-internet-media-type-identifiers/

Avram Lyon

Dec 12, 2011, 2:54:18 PM
to zoter...@googlegroups.com
On Mon, Dec 12, 2011 at 11:17 AM, Erik Hetzner <e...@e6h.org> wrote:
> What about using vendor media types? [1]:
>
>  text/vnd.zotero.csl+json
>  text/vnd.zotero.citation+json
>
> best, Erik
>
> 1. http://codebetter.com/sebastienlambla/2011/02/01/minting-new-internet-media-type-identifiers/

With the caveat that Zotero isn't the vendor for CSL, I think this is
a worthwhile approach.

Avram

Dan Stillman

Dec 12, 2011, 9:11:59 PM
to zotero-dev

CSL JSON isn't Zotero-specific. The real one (subject to discussion on
xbiblio) will likely be application/x-csl+json (or without the "x-" if
the CSL project gets around to registering it). And the Zotero one isn't
supposed to be a type that would get used anywhere else (and if it did
it would start with application/), so there's no point in making it look
like it is.

With that in mind, though, I think there's a better solution. As I noted
earlier, we're planning to offer multiple content types in responses at
some point, with the subelements identifiable by their 'content'
parameter values (e.g., "csljson"). For single responses with non-XML
content, we can just leave out atom:type, letting it default to "text",
and use zapi:type to indicate the type for someone viewing the response:

<content zapi:type="csljson">
{...}
</content>

or

<content type="xhtml" zapi:type="bib">
<div xmlns="http://www.w3.org/1999/xhtml">...</div>
</content>

Bruce D'Arcus

Dec 12, 2011, 9:17:40 PM
to zoter...@googlegroups.com

What's involved in registering?

> And the Zotero one isn't supposed to be a type that would get used anywhere else (and if it did it would start with application/), so there's no point in making it look like it is.
>
> With that in mind, though, I think there's a better solution. As I noted earlier, we're planning to offer multiple content types in responses at some point, with the subelements identifiable by their 'content' parameter values (e.g., "csljson"). For single responses with non-XML content, we can just leave out atom:type, letting it default to "text", and use zapi:type to indicate the type for someone viewing the response:
>
> <content zapi:type="csljson">
>    {...}
> </content>
>
> or
>
> <content type="xhtml" zapi:type="bib">
> <div xmlns="http://www.w3.org/1999/xhtml">...</div>
> </content>
>
>

Avram Lyon

Dec 12, 2011, 11:05:28 PM
to zoter...@googlegroups.com
On Mon, Dec 12, 2011 at 6:17 PM, Bruce D'Arcus <bda...@gmail.com> wrote:
>> CSL JSON isn't Zotero-specific. The real one (subject to discussion on
>> xbiblio) will likely be application/x-csl+json (or without the "x-" if the
>> CSL project gets around to registering it).
>
> What's involved in registering?

Per RFC 4288 / BCP 13 (http://www.ietf.org/rfc/rfc4288.txt):
While public exposure and review of media types to be registered in
the vendor tree is not required, using the ietf-...@iana.org
mailing list for review is strongly encouraged to improve the quality
of those specifications. Registrations in the vendor tree may be
submitted directly to the IANA.

Details in the RFC.

Avram

Erik Hetzner

Dec 13, 2011, 12:53:38 AM
to zoter...@googlegroups.com, Avram Lyon
At Mon, 12 Dec 2011 20:05:28 -0800,

Note, however, that registering the media type “application/csl+json”
(as opposed to “application/vnd.citationstyles.csl+json”) requires a
standards body to endorse some sort of recommendation or RFC.

best, Erik

Erik Hetzner

Dec 13, 2011, 1:14:19 AM
to zotero-dev
At Mon, 12 Dec 2011 15:14:54 -0500,

Dan Stillman wrote:
>
> CSL JSON isn't Zotero-specific. The real one (subject to discussion on
> xbiblio) will likely be application/x-csl+json (or without the "x-" if
> the CSL project gets around to registering it). And the Zotero one isn't
> supposed to be a type that would get used anywhere else (and if it did
> it would start with application/), so there's no point in making it look
> like it is.

Getting application/csl+json registered with IANA would necessarily
involve a formal standards body (see RFC 4288), something that I doubt
will happen. That is why I suggested the vnd tree. Subtypes using an
x- are generally frowned upon these days, in my opinion with good
reason.

You are right that JSON data should be an application type, not a text
type, especially since JSON itself is an application type.

With regard to your last point, I think that you wanted to make it
clear what the content was, in which case you may as well give it a
useful media type, so that clients can process it.

There is no requirement to register media types in the vnd tree.
Formats tend to, in my experience, show up in strange places (that is,
they get reused), so you may as well give it a useful media type.

So why not:

application/vnd.zotero.citation+json
application/vnd.citationstyles.citation+json

Obviously the 2nd media type needs correction from the CSL devs.
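
To illustrate how the candidates in this thread differ, here is a rough sketch that splits a media type into its top-level type, registration tree, and the part after "+". The tree names follow the prefixes described in RFC 4288 as I read it; treating both "x." and "x-" as unregistered is a simplification, and the function name is hypothetical.

```python
def describe_media_type(media_type):
    """Return (top-level type, registration tree, suffix after '+') for a media type."""
    top, _, subtype = media_type.strip().lower().partition("/")
    base, plus, suffix = subtype.rpartition("+")
    if not plus:
        # No "+" in the subtype, so there is no suffix.
        base, suffix = subtype, None
    if base.startswith("vnd."):
        tree = "vendor"
    elif base.startswith("prs."):
        tree = "personal"
    elif base.startswith("x.") or base.startswith("x-"):
        tree = "unregistered"
    else:
        tree = "standards"
    return top, tree, suffix


for mt in (
    "application/x-csl+json",
    "application/vnd.zotero.citation+json",
    "application/csl+json",
    "application/json",
):
    print(mt, describe_media_type(mt))
```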

> With that in mind, though, I think there's a better solution. As I noted
> earlier, we're planning to offer multiple content types in responses at
> some point, with the subelements identifiable by their 'content'
> parameter values (e.g., "csljson"). For single responses with non-XML
> content, we can just leave out atom:type, letting it default to "text",
> and use zapi:type to indicate the type for someone viewing the response:
>

> […]

This might be a better idea. Frankly I wouldn’t want to go down that
road, with two different “type” attributes, but I don’t really
understand the use case.

best, Erik

Dan Stillman

Dec 13, 2011, 4:58:55 PM
to zoter...@googlegroups.com
On 12/12/11 9:11 PM, Dan Stillman wrote:
> With that in mind, though, I think there's a better solution. As I
> noted earlier, we're planning to offer multiple content types in
> responses at some point, with the subelements identifiable by their
> 'content' parameter values (e.g., "csljson"). For single responses
> with non-XML content, we can just leave out atom:type, letting it
> default to "text", and use zapi:type to indicate the type for someone
> viewing the response:
>
> <content zapi:type="csljson">
> {...}
> </content>
>
> or
>
> <content type="xhtml" zapi:type="bib">
> <div xmlns="http://www.w3.org/1999/xhtml">...</div>
> </content>

This is now implemented as part of multi-format support, though it
hasn't yet been pushed to production.

https://github.com/zotero/dataserver/commit/cfc9015f11c09e0b8194a9c6e301cf07b8f731e3

This should fix parsing problems with feed processors when requesting
JSON data, which will no longer be expected to be Base64-encoded.
Existing clients shouldn't be affected as long as they're not using feed
processors with broken namespace support (i.e., such that the feed
processor tries to use 'zapi:type' instead of 'type' and then tries to
Base64-decode the content).
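
For client authors, here is a minimal sketch of reading the two attributes with a namespace-aware parser, so that 'zapi:type' and the plain Atom 'type' are never confused. It assumes the zapi namespace URI shown in the sample entry earlier in the thread; the sample element and the function name are illustrative only.

```python
import json
import xml.etree.ElementTree as ET

ZAPI_NS = "http://zotero.org/ns/api"

# Illustrative <content> element in the new style (no Atom type attribute).
CONTENT_XML = (
    '<content xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:zapi="http://zotero.org/ns/api" zapi:type="csljson">'
    '{"id":"ZBZQ4KMP","type":"book","title":"First Book"}'
    "</content>"
)


def read_content(content_xml):
    """Return (atom type, zapi type, payload) for a single <content> element."""
    el = ET.fromstring(content_xml)
    # The unqualified Atom attribute defaults to "text" when absent.
    atom_type = el.get("type", "text")
    # The namespaced attribute is looked up by its full {uri}name.
    zapi_type = el.get("{%s}type" % ZAPI_NS)
    payload = el.text
    if atom_type == "text" and zapi_type == "csljson":
        payload = json.loads(el.text)
    return atom_type, zapi_type, payload


print(read_content(CONTENT_XML))
```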

Stephan Hügel

Dec 13, 2011, 5:07:03 PM
to zoter...@googlegroups.com
Dan,
Will you post when it's live so I can grab some new responses for my unit tests?

Dan Stillman

Dec 13, 2011, 5:08:40 PM
to zoter...@googlegroups.com
On 12/13/11 5:07 PM, Stephan Hügel wrote:
> Will you post when it's live so I can grab some new responses for my
> unit tests?

Yes, absolutely.

Dan Stillman

Feb 14, 2012, 9:24:45 PM
to zoter...@googlegroups.com
On 12/6/11 2:22 AM, Dan Stillman wrote:
> On 12/6/11 1:30 AM, Erik Hetzner wrote:
>> Is there any progress on getting citeproc compatible JSON from the
>> Zotero web API?
>
> Sure. I've just added content=csljson as an API option. Let us know if
> you notice any problems.
>
> format=csljson will be added once we decide how to handle item limits
> for non-Atom formats. (format=bib, the only other one publicly
> available, is currently limited to 150 items.)

format=csljson is now available via the API. As with the export formats
I announced earlier today, it requires an explicit 'limit' parameter for
clarity.

We're using application/vnd.citationstyles.csl+json as the MIME type,
pursuant to a discussion on the CSL list. I'll post here if that changes.
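
A minimal sketch of what a client might do with this, assuming the items URL shape seen earlier in the thread (the groups/1/items address passed to the feed validator). Both helper names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

CSL_JSON_MIME = "application/vnd.citationstyles.csl+json"


def csljson_list_url(group_id, limit):
    """Build a format=csljson URL; the explicit limit is required, so insist on it."""
    if limit is None:
        raise ValueError("format=csljson requires an explicit 'limit' parameter")
    query = urlencode({"format": "csljson", "limit": limit})
    return f"https://api.zotero.org/groups/{group_id}/items?{query}"


def is_csl_json(content_type_header):
    """Check a Content-Type header value, ignoring parameters such as charset."""
    mime = content_type_header.split(";", 1)[0].strip().lower()
    return mime == CSL_JSON_MIME


print(csljson_list_url(1, 50))
print(is_csl_json("application/vnd.citationstyles.csl+json; charset=utf-8"))
```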
