Xerxes and Zotero


Scot Dalton

May 8, 2012, 10:34:06 AM
to xerxes...@googlegroups.com
Hi all,

We have a problem whereby many of our local discovery layers do
citation management separately w/ their own quirks and "push to"
formats/tools. We're interested in centralizing citation management
into one translation service that will translate metadata from a
variety of local discovery sources to a variety of formats/tools. I
imagine we're going to leverage both the Zotero DB schema[1] and
Zotero translators[2] quite heavily in doing this work. We'd like to
know if anyone has created/is creating a Xerxes translator that we
could use. Otherwise, we'll look to create one to merge back to
Zotero and leverage that for our own work.

Thanks,
Scot


[1] https://github.com/zotero/zotero/blob/master/resource/schema/system.sql
[2] https://github.com/zotero/translators

--
Scot Dalton
Phone: (212) 998-2674
Web Services
Division of Libraries
New York University

Jonathan Rochkind

May 8, 2012, 11:56:37 AM
to xerxes...@googlegroups.com, Scot Dalton
As far as I know, no.

But best might be to fix Xerxes to provide data in a _standard_ format
that Zotero already reads, and then make sure Zotero can find it, rather
than add a Xerxes-specific adapter to Zotero.

That's what I'd recommend.

Xerxes already has logic in it to create those human-style citations.
(It's _definitely_ imperfect logic, but hopefully good enough. Getting
_perfect_ citation data elements out of the actually existing data
Xerxes/Metalib has to work with is impossible; it could probably be
improved, but only with difficulty).

That's the right logic to get the individual citation elements out. You
just need to have Xerxes output them in some structured format Zotero
can read. Hopefully one of the formats Zotero can _already_ read is
suitable, Xerxes just needs to output in that format somewhere Zotero
can find it.

That's what I'd recommend.

Warning, from working on this sort of problem before, this stuff always
gets trickier than you expect, both in terms of getting the citation
data elements 'good enough', and in terms of getting Zotero to reliably
find them.

Walker, David

May 8, 2012, 12:00:57 PM
to xerxes...@googlegroups.com, Scot Dalton
Yeah, when looking at this before, it seemed like unapi (maybe with RIS as the format?) would provide the best way.

But it's been awhile since I've contemplated that. Are you thinking of something similar, Scot?

--Dave

-------------------------
David Walker
Interim Director, Systemwide Digital Library Services
California State University
562-355-4845
--
You received this message because you are subscribed to the Google Groups "xerxes-portal" group.
To post to this group, send email to xerxes...@googlegroups.com.
To unsubscribe from this group, send email to xerxes-porta...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/xerxes-portal?hl=en.

Jonathan Rochkind

May 8, 2012, 12:52:30 PM
to xerxes...@googlegroups.com, Walker, David, Scot Dalton
Last time I tried unapi, it was a mess.

unapi also doesn't validate under HTML5.

But I'm not sure what Zotero can consume that's better. It might _have_
to be a Xerxes-specific translator, if Zotero doesn't actually _have_ a
usable common format. But that's a sad thing for Zotero.

Scot Dalton

May 8, 2012, 12:52:41 PM
to Walker, David, xerxes...@googlegroups.com
If we were going to go with an existing Zotero format, I was thinking
COinS, since Xerxes already has a CO. I don't know much about unapi,
but will certainly look into it. My only concern with using an
existing format is data loss, but if it's going to be more trouble
than it's worth, then maybe it's not that big of a concern.

Jonathan, thanks for the warning. This stuff can definitely get
really hairy, really quickly, but I'm hoping that leveraging some of
the work the Zotero folks have already done will make it manageable.

-Scot

Walker, David

May 8, 2012, 12:55:30 PM
to Scot Dalton, xerxes...@googlegroups.com
I don't think you want to use COinS, since you'll lose quite a bit of data -- abstract, additional authors, subject terms (maybe not so useful), and so on.

Jonathan Rochkind

May 8, 2012, 2:19:34 PM
to xerxes...@googlegroups.com, Scot Dalton, Walker, David
Yeah, COinS gives you some data loss on author names. (It can only do
separate first/last names for ONE author, not multiple.)
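
To make the author-name limitation concrete, here is a minimal sketch in Python (not Xerxes code) of building a COinS span for an article. The function name and the sample authors are made up for illustration. The key names (rft.aufirst, rft.aulast, rft.au) are the OpenURL journal-format keys as I understand them: only the first author gets split first/last fields, and any further authors collapse into repeated rft.au strings.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(article_title, journal_title, authors):
    """Build a COinS <span> for a journal article (illustrative helper).

    `authors` is a list of (first, last) tuples. Only the first author
    gets separate rft.aufirst / rft.aulast keys; every later author is
    flattened into a single "Last, First" string under rft.au.
    """
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", article_title),
        ("rft.jtitle", journal_title),
    ]
    if authors:
        first, last = authors[0]
        pairs += [("rft.aufirst", first), ("rft.aulast", last)]
        for first, last in authors[1:]:
            pairs.append(("rft.au", f"{last}, {first}"))
    # URL-encode the key/value pairs, then HTML-escape the ampersands
    # so the string is safe inside the title attribute.
    return '<span class="Z3988" title="{}"></span>'.format(
        escape(urlencode(pairs))
    )


if __name__ == "__main__":
    print(coins_span(
        "An Example Article",
        "Journal of Examples",
        [("Ada", "Lovelace"), ("Grace", "Hopper")],
    ))
```

Note there is also no slot here for an abstract or subject terms, which is the rest of the data loss Dave mentions.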

Really, someone should talk to Zotero and be like, look guys, _give_ us
a preferred common format, even if it's one you invent. We WANT to use
your preferred format, not make you maintain a custom adapter for our
format.

I don't know if Zotero has such a thing though.

Jonathan Rochkind

May 8, 2012, 2:22:16 PM
to xerxes...@googlegroups.com, Scot Dalton, Walker, David
The problem with a custom adapter for Xerxes, if the custom adapter is
screen-scraping HTML, is that it'll break whenever we change Xerxes
HTML, either globally in stock Xerxes, or when someone locally does.

A standard format Zotero can give us that we can implement, and know
it'll keep working as long as we keep implementing it right, is really
the way to go.

If you want, you could try communicating with Zotero developers for such.

unapi ain't a good one though, if for no other reason than unapi
violates HTML5 and will cause HTML5 validation to fail.

I once proposed something somewhat similar to unapi but using
HTML5-compliant microdata conventions, but the Zotero developer who was
in the email thread about it (on the Blacklight list) never replied to my
suggestion.

Scot

May 8, 2012, 7:52:49 PM
to xerxes...@googlegroups.com
For a local Xerxes translator, I was thinking of using an AJAX call to get the
record in XML (appending format=xerxes, similar to the Primo
translator[1]) and then map the Xerxes XML to the Zotero schema. An
even easier way to do it would be to embed the MetaLib X-Service URL
in the page somewhere and then use the MarcXML translator[2] to parse
it, but I'd run into a same-origin scripting issue. Is there any way
for Xerxes to send back MarcXML? Would we need to define a separate
format in Xerxes?

Thanks,
Scot

[1] https://github.com/zotero/translators/blob/master/Primo.js
[2] https://github.com/zotero/translators/blob/master/MARCXML.js

Jonathan Rochkind

May 8, 2012, 9:53:48 PM
to xerxes...@googlegroups.com
If Zotero has a MarcXML translator, can't you just GIVE MarcXML to Zotero, and let it run its own translator? You shouldn't need to run its translator yourself somehow.

Xerxes deals in MarcXML already internally (at least for the Metalib part of Xerxes, that's what Metalib returns), so if there isn't some way to get MarcXML out now, we can easily add it. We could make &format=marcxml return the original marcxml the way &format=xerxes returns the Xerxes XML representation.

And then you can just let Zotero know about this somehow? We could even make Xerxes advertise it with a <link type="application/xml+marc"> tag (or whatever the 'best' content-type for marc-xml is.)

That would get you what you're thinking of doing with MarcXML and Zotero, right? No need to deal with "using an AJAX call" or manually using the Zotero MarcXML translator.

However, I suspect this (either way) won't get you as far as you think. Not all MarcXML is created equal. The Metalib-provided MarcXML is often idiosyncratic or odd. A lot of Xerxes work is already trying to deal with this and normalize it. But I don't think Xerxes normalization necessarily feeds back into new MarcXML. I also suspect that Zotero's MarcXML translator is optimized for _book_ records (like from a catalog), and not _individual article_ records -- like in Metalib/Xerxes. It's unusual to put article records in MARC. You need to look in different parts of the MARC record to get all the relevant citation details for an article that don't apply to a book (article title as well as journal title; volume, issue, page #, etc). I'd bet money Zotero isn't doing it. But you can investigate yourself.

I think it would be a better bet to _not_ use the raw MarcXML from Metalib, but piggyback on top of the normalization Xerxes is _already_ doing on top of the raw MarcXML to produce its own human readable citations.

You just need to figure out _some_ format Zotero can take this in. And it will not be hard to make Xerxes produce it, say at a URL that's the record page URL but with a "&format=something" on the end. It will still make data errors, but no more than it does currently for the human readable citations. That may or may not be good enough. You just need to get Zotero to know that when it's looking at a Xerxes item detail page, it can just add "&format=something" on the end to get the record detail in whatever 'something' format it wants. Something more complicated may be needed for search results pages. But it should _not_ have to include any "using an AJAX call" or anything. Look at the code for Zotero unapi -- it will be something similar, though based not on unapi but on something else (perhaps microdata; I have thought this through and can tell you exactly how I think it should be done, or find my old email on this topic).

The first step is just figuring out what the heck format to give the data to Zotero in. I suspect MarcXML is not actually going to work well.


Walker, David

May 9, 2012, 9:39:00 AM
to xerxes...@googlegroups.com
I, too, would stay away from MARC-XML.

I think RIS format would be the best bet, and I know Zotero already knows what to do with that. That's what we use when we *export* to Zotero.

It's just a matter of how to get the RIS to Zotero using this other way. If not unapi then however they are doing that these days.

--Dave


Scot Dalton

May 9, 2012, 9:47:50 AM
to xerxes...@googlegroups.com
Right. Regarding the AJAX call, Zotero (via the Xerxes translator)
would be doing that, as you described. That's what the Zotero Primo
translator does. Since Xerxes can have non-marc sources, it sounds
like it's best to map the Xerxes XML to Zotero (hopefully through a
standard format.) Alternatively, we can embed some standard format
Zotero understands in the Xerxes page and do away with the need for a
Xerxes translator (i.e. the callback to get the Xerxes record in
standard format.)

Just a quick look through the existing translators shows some
possibilities for standard formats:
- Embedded Metadata [1]
- unAPI [2]
- RDF [3]
- RIS [4]
- DC RDF [5]
- Zotero RDF [6]
- BibTeX [7]

Looks like there are options.

Thanks,
Scot

[1] https://github.com/zotero/translators/blob/master/Embedded%20Metadata.js
[2] https://github.com/zotero/translators/blob/master/unAPI.js
[3] https://github.com/zotero/translators/blob/master/RDF.js
[4] https://github.com/zotero/translators/blob/master/RIS.js
[5] https://github.com/zotero/translators/blob/master/Unqualified%20Dublin%20Core%20RDF.js
[6] https://github.com/zotero/translators/blob/master/Zotero%20RDF.js
[7] https://github.com/zotero/translators/blob/master/BibTeX.js

Scot Dalton

May 9, 2012, 12:52:36 PM
to xerxes...@googlegroups.com
So we're going to write a small unapi server in our local Xerxes
instance as a first pass at this. The unapi server will expose a
Xerxes record as RIS via the current ris.xsl.

Including <link rel="unapi-server" type="application/xml"
title="unAPI" href="/unapi"> and
<abbr title="group%3D<GROUP>%26resultSet%3D<RESULT_SET>%26startRecord%3D<RECORD>"></abbr>
should be enough for Zotero.
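
For anyone following along, here is a rough Python sketch (not our actual PHP view) of how those two tags could be generated for a record. The function name and the sample group/resultSet values are hypothetical. One assumption to flag loudly: my reading of the unAPI spec is that identifiers are marked with class="unapi-id", which the abbreviated snippet above leaves out, so check that against the spec before relying on it.

```python
from html import escape
from urllib.parse import quote


def unapi_markup(group, result_set, start_record, server_path="/unapi"):
    """Return the (link, abbr) tags that advertise a record via unAPI.

    The record identifier is the Xerxes query fragment
    group=...&resultSet=...&startRecord=..., percent-encoded so it can
    travel as a single opaque id value.
    """
    record_id = (
        f"group={group}&resultSet={result_set}&startRecord={start_record}"
    )
    # safe="" forces "=" and "&" to be encoded as %3D and %26.
    encoded_id = quote(record_id, safe="")
    link = (
        '<link rel="unapi-server" type="application/xml" '
        f'title="unAPI" href="{escape(server_path)}">'
    )
    # class="unapi-id" is an assumption taken from the unAPI spec,
    # not from the snippet quoted in this thread.
    abbr = f'<abbr class="unapi-id" title="{encoded_id}"></abbr>'
    return link, abbr


if __name__ == "__main__":
    for tag in unapi_markup("000123", "004567", "1"):
        print(tag)
```

The link tag goes in the page head once; one abbr goes next to each record on the page.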

Hope this is a reasonable approach to get started.

Thanks,
Scot

Jonathan Rochkind

May 9, 2012, 1:13:46 PM
to xerxes...@googlegroups.com, Scot Dalton
Again I'll warn you that unapi tags do not validate under HTML5.

But it's a place to start.

There are a couple possible ways to reproduce the functionality of unapi
using pretty simple HTML5 microdata conventions. I have investigated it
before. Zotero would have to be changed to recognize it though.

I remembered I actually wrote up my proposal/thoughts on this on my
blog, hooray, so here you go if you're interested:

http://bibwild.wordpress.com/2011/09/22/alternate-format-microdata/



I would recommend looking into changing Xerxes so _any_ record can be
obtained in RIS format via a URL with, say, &format=ris in it. (Sadly, I
don't think the Xerxes URL framework will currently support a ".ris"
format suffix, or actual HTTP content negotiation, but no big deal). You
should be able to use that endpoint in unapi, OR in a microdata-based
solution.
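
To illustrate what I mean by "the record URL plus a format parameter", here is a small Python sketch. The example URL and its base/action parameters are placeholders, not a claim about exactly what Xerxes URLs look like at your site; the point is only that a client can derive the RIS URL mechanically from the record page URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format(record_url, fmt="ris"):
    """Return record_url with its format query parameter set to fmt.

    Any existing format= parameter (e.g. format=xerxes) is dropped first,
    so the result never carries two conflicting values.
    """
    parts = urlsplit(record_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    query.append(("format", fmt))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical record URL, for illustration only.
    url = ("http://example.edu/xerxes/?base=metasearch&action=record"
           "&group=000123&resultSet=004567&startRecord=1")
    print(with_format(url))
```

The same derived URL can then be handed out by an unapi server or referenced from microdata, whichever discovery mechanism ends up being used.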

Walker, David

May 9, 2012, 1:30:49 PM
to xerxes...@googlegroups.com, Scot Dalton
Just wanted to point out that Xerxes 1.x is using HTML 4, and this is, in part, a function of us using XSLT as our view templates.

So the HTML 5 issue with unapi is not pressing for Xerxes.

I'm looking into using HTML 5 for Xerxes 2.x, but, again, XSLT presents some challenges in that regard.

--Dave


Scot Dalton

May 9, 2012, 2:08:51 PM
to Jonathan Rochkind, xerxes...@googlegroups.com
On Wed, May 9, 2012 at 1:13 PM, Jonathan Rochkind <roch...@jhu.edu> wrote:
> There are a couple possible ways to reproduce the functionality of unapi
> using pretty simple HTML5 microdata conventions. I have investigated it
> before. Zotero would have to be changed to recognize it though.
>
> I remembered I actually wrote up my proposal/thoughts on this on my blog,
> hooray, so here you go if you're interested:
>
> http://bibwild.wordpress.com/2011/09/22/alternate-format-microdata/
>
Will definitely look into this.


> I would recommend looking into changing Xerxes so _any_ record can be
> obtained in RIS format via a URL with, say, &format=ris in it. (Sadly, I
> don't think the Xerxes URL framework will currently support a ".ris" format
> suffix, or actual HTTP content negotiation, but no big deal). You should be
> able to use that endpoint in unapi, OR in a microdata-based solution.

Seems like the format=ris scheme is just a change in actions.xml for
metasearch/record. Adding
<view format="ris">xsl/ris.xsl</view> does the trick. Other formats
just require the corresponding xsl. Xerxes architecture FTW!

We should have this up on our dev server shortly.

Thanks,
Scot

P.S. Just want to acknowledge that Barnaby Alter is doing all the
coding on this. My PHP is rusty to say the least.

Scot Dalton

May 9, 2012, 2:37:17 PM
to Jonathan Rochkind, xerxes...@googlegroups.com
So we got this working on our dev box.

The HTML that gets embedded on the record page is minimal[1].
Changes to our local actions.xml were also straightforward[2].
We added an unapi php view[3] and an apache rewrite rule[4].

It seems to work. Zotero is grabbing the relevant metadata.

We'll probably roll this out in production in the next few days.

The only issue we see is with the unapi URL (/metasearch/unapi).
Since it's inheriting from MetasearchRecord, it needs a group,
resultSet and recordId. If those are missing, it throws an error. We
can change the inheritance to make it work more cleanly, but for now
it meets our needs.

We're happy to more formally share our code as well. Should we put
this in the Xerxes cookbook?

Thanks,
Scot

[1] http://pastie.org/3885520
[2] http://pastie.org/3885533
[3] http://pastie.org/3885537
[4] http://pastie.org/3885547

Walker, David

May 9, 2012, 2:41:41 PM
to xerxes...@googlegroups.com, Jonathan Rochkind
I'd be happy to add it to the main code, if you'd like to send a patch.

--Dave


Scot Dalton

May 9, 2012, 3:02:38 PM
to xerxes...@googlegroups.com
Sounds good. We don't use the folder functionality in Xerxes, but
we'll implement unapi for those records as well.

Thanks,
Scot

Walker, David

May 9, 2012, 3:04:07 PM
to xerxes...@googlegroups.com
The folder includes an export to Zotero already, so there's no particular need to do this there.

It basically just spits out the RIS with the appropriate header so that, if you are using Firefox, Zotero sees it and captures the records. EndNote would, too, if you used that.
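
As a rough illustration of what "RIS with the appropriate header" amounts to, here is a minimal Python sketch. It is not the Xerxes ris.xsl; the field names in the input dict are made up, and the content type shown is the one commonly used for RIS, which I am assuming rather than quoting from the Xerxes code.

```python
# Commonly used MIME type for RIS downloads (an assumption here, not
# taken from the Xerxes source).
RIS_CONTENT_TYPE = "application/x-research-info-systems"

# Mapping from RIS tag to a key in the (hypothetical) record dict.
RIS_FIELDS = (
    ("TI", "title"),
    ("JO", "journal"),
    ("VL", "volume"),
    ("IS", "issue"),
    ("SP", "start_page"),
    ("EP", "end_page"),
    ("PY", "year"),
    ("AB", "abstract"),
)


def to_ris(record):
    """Serialize one article record (a plain dict) as an RIS entry.

    Each line is a two-letter tag, two spaces, a hyphen, a space, and the
    value. Authors are repeated AU lines, so multiple authors survive.
    """
    lines = ["TY  - JOUR"]
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    for tag, key in RIS_FIELDS:
        value = record.get(key)
        if value:
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(f"Content-Type: {RIS_CONTENT_TYPE}")
    print(to_ris({
        "authors": ["Lovelace, Ada", "Hopper, Grace"],
        "title": "An Example Article",
        "journal": "Journal of Examples",
        "year": "2012",
    }))
```

Unlike COinS, this keeps every author and has room for the abstract, which is why RIS looked like the better fit earlier in the thread.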

Barnaby Alter

May 10, 2012, 2:06:06 PM
to xerxes...@googlegroups.com
Hi David,

Patch is attached. I created it off the latest trunk in Xerxes, so that's for Xerxes 1.x. I haven't delved into the 2.x code yet so can't currently create a patch for that.

unapi.patch

Walker, David

May 10, 2012, 2:07:07 PM
to xerxes...@googlegroups.com
Thanks!


Walker, David

May 17, 2012, 10:32:43 AM
to xerxes...@googlegroups.com
Barnaby, can you give me a brief explanation for the changes to .htaccess in this commit?

I'd like to stay away from making changes to that file.

--Dave


Barnaby Alter

May 23, 2012, 1:16:20 PM
to xerxes...@googlegroups.com
Hey David,

So, unapi expects an id= param appended to the script so it can generate a url with that id and a specified format (i.e. RIS). But for a given search in Xerxes there is a group, resultSet and startRecord which make up the id for a record in metasearch and a record= param in the folder record. I just used those htaccess lines to rewrite the given Xerxes schema for IDs to match what unapi expects. If you know something quick and dirty let me know, say something in actions.xml, otherwise I will look more into it, but the htaccess edits were my solution.
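
In case it helps to see the mapping outside of rewrite rules, here is a Python sketch of the same translation (not the htaccess lines from the patch). It takes the query string unapi sends, with the whole record id packed into id=, and unpacks it into the separate Xerxes parameters. The function name is hypothetical, it only covers the metasearch case (not the folder record= case), and the path to prepend is left to the caller.

```python
from urllib.parse import parse_qs, urlencode

REQUIRED_KEYS = ("group", "resultSet", "startRecord")


def unapi_to_xerxes_query(unapi_query, default_format="ris"):
    """Translate an unAPI query string into a Xerxes record query string.

    unAPI sends ?id=<opaque id>&format=<name>. Here the opaque id is the
    percent-encoded fragment group=...&resultSet=...&startRecord=...,
    which is decoded and split back into separate parameters.
    """
    params = parse_qs(unapi_query)
    if "id" not in params:
        raise ValueError("unAPI request has no id parameter")
    # parse_qs already percent-decoded the id, so it can be parsed again
    # as a small query string of its own.
    record = parse_qs(params["id"][0])
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError("record id is missing: " + ", ".join(missing))
    query = [(key, record[key][0]) for key in REQUIRED_KEYS]
    query.append(("format", params.get("format", [default_format])[0]))
    return urlencode(query)


if __name__ == "__main__":
    incoming = ("id=group%3D000123%26resultSet%3D004567%26startRecord%3D1"
                "&format=ris")
    print(unapi_to_xerxes_query(incoming))
```

Raising on a missing key mirrors the error we currently get from the unapi URL when group, resultSet or the record number is absent.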

Thanks,

Barnaby Alter
Web Development
Division of Libraries
NYU
