Urgent translator fix needed

Avram Lyon

Aug 25, 2010, 7:01:21 PM
to zotero-dev
Dear Zotero people,

As the academic season gets underway in much of the world, there are
some alarmed users who are concerned that they won't be able to
effectively demonstrate Zotero for their new students because of some
broken translators; it seems that databases tend to put off their
summer redesigns until late summer, putting us in a bind as we try to
fix things before universities reopen.

One particularly concerning translator is EBSCO (see
http://forums.zotero.org/discussion/3601/proquest-ebscohost-broken/);
it is a fairly complicated translator as it currently stands, and it
is broken. A fix is not going to be a one-liner; it may require
something closer to, but not quite, a rewrite. I can't do that for at
least a week, since I'm about to move countries.

So I'm asking you, Zoterans, if you can help out. It would be very
good for our Zotero evangelists in the trenches if we could have this
high-profile translator working ASAP. If you can help, please post in
the discussion I linked to or respond to this email.

Best wishes,

Avram

Bruce D'Arcus

Aug 25, 2010, 7:05:59 PM
to zoter...@googlegroups.com

Do we have any contacts at EBSCO such that we might talk to them about avoiding this in the future? Maybe convince them to adopt a standard like RDFa?

> --
> You received this message because you are subscribed to the Google Groups "zotero-dev" group.
> To post to this group, send email to zoter...@googlegroups.com.
> To unsubscribe from this group, send email to zotero-dev+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/zotero-dev?hl=en.
>

odie5533

Aug 25, 2010, 7:36:31 PM
to zotero-dev
Which translator specifically is broken, the ProQuest.js or
EBSCOhost.js? I tried writing unit tests for my translators, but using
xpcom is far from trivial (I gave up).

--
odie5533

Avram Lyon

Aug 25, 2010, 7:44:27 PM
to zotero-dev
It is EBSCOhost.js that is broken. The discussion on the forums is a
revival of an old thread that initially included ProQuest. As far as I
know, ProQuest is currently fine.

I have also worked on running unit tests, and I also quickly gave up.
It's a little off-topic, but I think that unit tests would be most
easily added as a feature of Scaffold, since it is already talking to
all the bits of Zotero that you would need for unit testing.

Avram

Dan Stillman

Aug 25, 2010, 8:09:51 PM
to zoter...@googlegroups.com
On 8/25/10 7:05 PM, Bruce D'Arcus wrote:
>
> Do we have any contacts at EBSCO such that we might talk to them about
> avoiding this in the future? Maybe convince them to adopt a standard
> like RDFa?
>

Trevor's already in talks with them (after they contacted us via private
e-mail after the recent changes) and there may be some movement on at
the very least embedding RIS links that won't break, but I suspect such
changes won't happen soon enough for the current breakage. He may be
able to provide more details, though.

Frank Bennett

Aug 26, 2010, 10:04:34 AM
to zotero-dev
I could take a look early next week, but until then I'm pretty well
occupied as well, unfortunately. Rintze and I did a little checking
earlier today, and it looks like once you call the Export link, the
link to the metadata that is returned from the server (or can be
derived) is unrestricted. You need to make that call apparently, to
unlock the page, but after that you can just toss the derived link at
the RIS/EndNote translator, for individual pages at least.



Frank

Bruce D'Arcus

Aug 26, 2010, 10:07:36 AM
to zoter...@googlegroups.com
On Wed, Aug 25, 2010 at 8:09 PM, Dan Stillman <dsti...@zotero.org> wrote:
>
>  On 8/25/10 7:05 PM, Bruce D'Arcus wrote:
>>
>> Do we have any contacts at EBSCO such that we might talk to them about avoiding this in the future? Maybe convince them to adopt a standard like RDFa?
>>
>
> Trevor's already in talks with them (after they contacted us via private e-mail after the recent changes) and there may be some movement on at the very least embedding RIS links that won't break ...

 That'd work too. But this screen-scraping business is just stone-age.

Bruce

Frank Bennett

Aug 26, 2010, 5:15:18 PM
to zotero-dev
On Aug 26, 11:07 pm, "Bruce D'Arcus" <bdar...@gmail.com> wrote:
Do disagreement from this corner.

I've taken a closer look and poked at their server a bit. I'm afraid
that this one is beyond me. While there's no question that once
unlocked, a link to the RIS data works anywhere in the clear, the
critical step involves a server call driven by a little Javascript
function and a form that seems to be unavailable in the sandboxed
document (?). I've tried mimicking it with a straight post call, but
a critical variable (__EVENTTARGET) isn't getting through. It
probably doesn't make sense for me to spend any more time on this; it
needs the hand of someone with a better knowledge of HTTP, and (one
would hope) an assist on the API from the folks who run the server.
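
For readers unfamiliar with this kind of postback, here is a minimal sketch of what a "straight post call" has to send. It assumes the export is an ordinary form post in which `__EVENTTARGET` names the control that was clicked, which is the usual pattern on ASP.NET Web Forms pages. The function name, the field values, and the control name are hypothetical and are not taken from EBSCOhost.

```javascript
// Hypothetical helper: build an application/x-www-form-urlencoded body
// from a page's hidden form fields, setting __EVENTTARGET explicitly.
function buildPostbackBody(eventTarget, fields) {
  var pairs = ["__EVENTTARGET=" + encodeURIComponent(eventTarget)];
  for (var i = 0; i < fields.length; i++) {
    var field = fields[i];
    // Skip unnamed inputs and any stale __EVENTTARGET copied from the page,
    // so the value chosen above is the only one sent.
    if (!field.name || field.name === "__EVENTTARGET") {
      continue;
    }
    pairs.push(encodeURIComponent(field.name) + "=" + encodeURIComponent(field.value));
  }
  return pairs.join("&");
}

// Example with made-up values (not real EBSCOhost field contents).
var body = buildPostbackBody("ctl00$export$lnkExport", [
  { name: "__EVENTTARGET", value: "" },
  { name: "__VIEWSTATE", value: "abc/123=" },
  { name: "", value: "ignored" }
]);
// body is "__EVENTTARGET=ctl00%24export%24lnkExport&__VIEWSTATE=abc%2F123%3D"
```

If the server still rejects a body built this way, the missing piece is more likely session state (cookies or a per-page token) than the encoding itself, though that is a guess without access to the server.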

Oh, well. If EBSCOhost doesn't want to play, there is always JSTOR,
and PubMed and ...

Frank




Frank Bennett

Aug 26, 2010, 5:33:36 PM
to zotero-dev
On Aug 27, 6:15 am, Frank Bennett <biercena...@gmail.com> wrote:
> On Aug 26, 11:07 pm, "Bruce D'Arcus" <bdar...@gmail.com> wrote:
>
> > On Wed, Aug 25, 2010 at 8:09 PM, Dan Stillman <dstill...@zotero.org> wrote:
>
> > >  On 8/25/10 7:05 PM, Bruce D'Arcus wrote:
>
> > >> Do we have any contacts at EBSCO such that we might talk to them about avoiding this in the future? Maybe convince them to adopt a standard like RDFa?
>
> > > Trevor's already in talks with them (after they contacted us via private e-mail after the recent changes) and there may be some movement on at the very least embedding RIS links that won't break ...
>
> >  That'd work too. But this screen-scraping business is just stone-age.
>
> Do disagreement from this corner.

(In my earlier message, "Do disagreement" was meant to be "No
disagreement", of course. Must learn to spell one day. Sorry for the
extra traffic.)

Josh (EP)

Aug 31, 2010, 4:56:26 PM
to zotero-dev
Just to introduce myself, my name is Josh Geller, and I am manager of
EBSCOhost Application Development, responsible for the releases of
EBSCOhost that have been problematic to Zotero in the past. It was not
our intent to break Zotero, and I would like to help ensure Zotero
does not break in the future. I am offering my time to give support on
matters concerning the interface questions, missing forms, looking
through our source code for answers, etc.

Bruce D'Arcus

Aug 31, 2010, 7:05:20 PM
to zoter...@googlegroups.com
Hi Josh,

On Tue, Aug 31, 2010 at 4:56 PM, Josh (EP) <jge...@ebscohost.com> wrote:

> Just to introduce myself, my name is Josh Geller, and I am manager of
> EBSCOhost Application Development, responsible for the releases of
> EBSCOhost that have been problematic to Zotero in the past. It was not
> our intent to break Zotero, and I would like to help ensure Zotero
> does not break in the future. I am offering my time to give support on
> matters concerning the interface questions, missing forms, looking
> through our source code for answers, etc.

Notwithstanding the immediate issues you're offering to help with, do
you have any thoughts about how to avoid future breakage?

Bruce

Josh (EP)

Aug 31, 2010, 8:48:20 PM
to zotero-dev
On Aug 31, 7:05 pm, "Bruce D'Arcus" <bdar...@gmail.com> wrote:
> Hi Josh,
>
Two ideas:

First would be for me to completely understand how Zotero is
interacting with the EBSCOhost application. That will allow me to know
if our changes will break your client. Help from this list in where to
start would be good. I have checked out a copy of the 2.0 branch on my
PC and am starting to investigate how to debug Firefox extensions.

Second is to open up our beta testing to some of the developers in
your group. We do not have an open beta at the moment, but when we do
you will be able to confirm that all features are working correctly
before the rest of the world does.

Hopefully, approaching the problem from both sides will ensure our
users can do their work uninterrupted.

odie5533

Aug 31, 2010, 9:17:12 PM
to zotero-dev
Hello Josh!
Zotero generally uses a combination of XPaths and regular expressions
on the DOM or HTML respectively. It also includes a generic COinS
parser. If you take a look in your Firefox profile folder (somewhere
around %AppData%/Local/Mozilla Firefox/.../) you will find a zotero
folder with a translators folder inside of it. The file in question is
EBSCOhost.js. I am told it no longer works, though my university's
proxy hasn't worked at all this semester so I can't access EBSCOhost
to check/fix the translator myself.

I see two possibilities here:
1. Rather than working on a quick-fix on the Zotero side, EBSCOhost
could offer metadata in the COinS format or a similar format which
someone else may know of. This would be picked up by the generic
parser in Zotero and likely make it easier for other scraping projects
to read from EBSCOhost. The problem here is a Zotero translator will
be needed anyways to download the PDFs or grab multiple citations (I
think). This requires EBSCOhost putting in hours to update, test, and
deploy an addition to the backend.
2. Just update the translator in Zotero. If both methods require an
updated translator to handle PDF downloads anyway it might not be very
useful to bother with an EBSCOhost COinS implementation.
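
As a rough illustration of option 1: a COinS record is an OpenURL context object stored as a query string in the `title` attribute of an empty `<span class="Z3988">`. The sketch below decodes such a string into key/value lists. It assumes the attribute value has already been HTML-unescaped (as it is when read from the DOM); the sample record and the `parseCoins` name are invented for illustration, and this is not Zotero's actual COinS parser.

```javascript
// Hypothetical helper: decode a COinS title attribute (an OpenURL
// key/encoded-value string) into an object mapping keys to arrays of values.
function parseCoins(title) {
  var result = {};
  var pairs = title.split("&");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    if (eq === -1) {
      continue; // ignore empty or malformed segments
    }
    var key = decodeURIComponent(pairs[i].slice(0, eq));
    // "+" stands for a space in form-encoded values.
    var value = decodeURIComponent(pairs[i].slice(eq + 1).replace(/\+/g, " "));
    // Keys such as rft.au may repeat, so every key maps to an array.
    if (!Object.prototype.hasOwnProperty.call(result, key)) {
      result[key] = [];
    }
    result[key].push(value);
  }
  return result;
}

// Invented sample record for a journal article.
var sample = "ctx_ver=Z39.88-2004" +
  "&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal" +
  "&rft.atitle=An+Example+Article" +
  "&rft.jtitle=Example+Journal" +
  "&rft.au=Doe%2C+Jane&rft.au=Roe%2C+Richard" +
  "&rft.date=2010";
var record = parseCoins(sample);
// record["rft.atitle"] is ["An Example Article"]
// record["rft.au"] is ["Doe, Jane", "Roe, Richard"]
```

Nothing in this key/value structure carries a link to a PDF, which is the limitation noted in the list above.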

Stopping future breakage is simple: send an HTML file from an EBSCOhost
development server to the Zotero team and allow time for the
translator update to be created before deploying the update to the
main servers. This may require multiple HTML files which could be
swapped for early access to the development site. In other industries
it's common to send out a prototype of, as an example, a new game
system so that developers can begin creating games before the system
hits the shelves. Even in the website industry APIs are sometimes
opened before deployment to allow other developers to interact with
the site.
--
Best regards,
odie5533

Bruce D'Arcus

Aug 31, 2010, 9:25:59 PM
to zoter...@googlegroups.com

Dan Stillman is really the person to give authoritative answers from
the Zotero end, but, echoing a bit of what odie5533 suggests, it'd be
really nice as a general proposition to move away from requiring
screen scraping (which is what I presume is breaking here?) and to
either embed or link to more consistently structured representations.

COinS is certainly one option that might be adequate for the narrow
range of content that EBSCOhost is dealing with (journal and news
articles?), but RDFa is another option for a more generic approach. I
believe someone in this thread also mentioned linking to RIS
representations as an option.

Bruce

Bruce D'Arcus

Aug 31, 2010, 9:30:48 PM
to zoter...@googlegroups.com
Oh, and ...

On Tue, Aug 31, 2010 at 9:25 PM, Bruce D'Arcus <bda...@gmail.com> wrote:

...

> COinS is certainly one option that might be adequate for the narrow
> range of content that EBSCOhost is dealing with (journal and news
> articles?), but RDFa is another option for a more generic approach.

If you're interested in RDFa, I could help if you have any questions.
I and some of the Zotero people have helped develop the BIBO RDF
ontology, which could be represented as RDFa. So could PRISM, which is
more common in the publishing industry. Both of them, in any case,
reuse a fair bit of Dublin Core.

Bruce

Dan Stillman

Aug 31, 2010, 10:26:10 PM
to zoter...@googlegroups.com

Yeah, using a standard auto-discovered format would clearly be best
going forward. Ideally EBSCO wouldn't have to know anything about Zotero
and we wouldn't have to test EBSCO every time a change was made, because
everything would just work.

I'm not sure which of the auto-discovered formats we currently support
actually support saving of files, though. COinS does not, I don't
believe, though, as I've noted before, we could probably just decide on
a mechanism and add support for it, and then try to get it added to the
spec (and/or just get other COinS implementers to use it).

In any case, it'd be really nice to avoid RIS if at all possible, both
because it's lossy and because it's not auto-discovered.

- Dan

Josh (EP)

Sep 1, 2010, 11:20:41 AM
to zotero-dev
I like the idea of having a short-term and a long-term solution. I am
reading up on some of the documentation here:
http://www.zotero.org/support/dev/making_coins
OpenURL is something we have familiarity with here and could be a
possibility. I have read up on RDFa here:
http://www.w3.org/TR/xhtml-rdfa-primer/
and it also looks promising. These both feel like long term solutions
to me.

I will take a look at EBSCOhost.js next and see what it is doing.
Fixing the translator will need to be the short-term solution.

Bruce D'Arcus

Sep 1, 2010, 11:27:45 AM
to zoter...@googlegroups.com

Yes, that's reasonable.

So this is a related side issue ....

Maybe we can get zotero.org to actually implement the RDFa on its site
so as to demonstrate.

Dan, is that on the roadmap?

Bruce

Richard Karnesky

Sep 1, 2010, 4:28:48 PM
to zotero-dev
On Aug 31, 7:26 pm, Dan Stillman <dstill...@zotero.org> wrote:
>
> I'm not sure which of the auto-discovered formats we currently support
> actually support saving of files, though.
> ....
> In any case, it'd be really nice to avoid RIS if at all possible, both
> because it's lossy and because it's not auto-discovered.

I'll give a plug for UnAPI.

This should work with any format that has attachment support in Zotero
(MODS XML, RDF, Refer/BibIX, BibTeX, RIS). unAPI already has
reference implementations, and implementing it shouldn't be too hard (though
I will say that if COinS is a "long term solution", unAPI might be one
too). It can be used with the RIS or BibTeX that EBSCO already
generates. If and when EBSCO migrates to a richer, standard XML-based
export format, Zotero will immediately give that format preference
over the "legacy" formats.
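
To make the unAPI suggestion concrete: a page advertises a server with a `<link rel="unapi-server">` element and marks each record with an `<abbr class="unapi-id">` whose `title` holds an identifier. A client asks the server which formats exist for that identifier and then requests one of them. The sketch below builds those two request URLs and picks a format from a list. The server URL, the identifier, the format names, and the preference order are all assumptions for illustration; they are not Zotero's actual ranking or any real EBSCO endpoint.

```javascript
// Build an unAPI request URL. Without a format, the server is expected to
// list the formats available for the identifier; with one, to return the record.
function unapiUrl(server, id, format) {
  var separator = server.indexOf("?") === -1 ? "?" : "&";
  var url = server + separator + "id=" + encodeURIComponent(id);
  if (format) {
    url += "&format=" + encodeURIComponent(format);
  }
  return url;
}

// Assumed preference order, richest format first (illustrative only;
// format names are chosen by each server).
var PREFERRED_FORMATS = ["mods", "rdf", "bibtex", "ris"];

// Return the most preferred format the server offers, or null if none match.
function pickFormat(available) {
  for (var i = 0; i < PREFERRED_FORMATS.length; i++) {
    if (available.indexOf(PREFERRED_FORMATS[i]) !== -1) {
      return PREFERRED_FORMATS[i];
    }
  }
  return null;
}

// Example with a made-up server and identifier.
var listUrl = unapiUrl("http://example.org/unapi", "rec:1");
var chosen = pickFormat(["ris", "bibtex"]);
var recordUrl = unapiUrl("http://example.org/unapi", "rec:1", chosen);
```

With a scheme like this, a server that later adds a richer format only has to list it; a client with a preference table would pick it up without a translator change.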


> COinS does not, I don't
> believe, though, as I've noted before, we could probably just decide on
> a mechanism and add support for it, and then try to get it added to the
> spec (and/or just get other COinS implementers to use it).

COinS would need more than just attachment support for it to store
rich metadata. Why bother?

The only advantages of OpenURL that I see are that it is "easy", it
can be used for static pages, and it is supported by other programs.
It seems like embedded RDF may gain the first two advantages given
better examples and documentation.

--Rick

Josh (EP)

Sep 11, 2010, 10:47:26 PM
to zotero-dev
Here's the patch. Let me know if there is a better way to submit it.

Index: EBSCOhost.js
===================================================================
--- EBSCOhost.js (revision 6677)
+++ EBSCOhost.js (working copy)
@@ -54,8 +54,8 @@



-    var xpath = '//input[@id="ctl00_ctl00_MainContentArea_MainContentArea_topDeliveryControl_deliveryButtonControl_lnkExportImage"]';
-    var persistentLink = doc.evaluate(xpath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext();
+    var xpath = '//input[@id="ctl00_ctl00_Column2_Column2_topDeliveryControl_deliveryButtonControl_lnkExport"]';
+    var persistentLink = doc.evaluate(xpath, doc, nsResolver, XPathResult.ANY_TYPE, null);
     if(persistentLink) {
         return "journalArticle";
     }
@@ -73,17 +73,16 @@
     var hiddenInput;
     var deliverString ="";
     while(hiddenInput = hiddenInputs.iterateNext()) {
-        deliverString = deliverString+hiddenInput.name.replace(/\$/g, "%24")+"="+encodeURIComponent(hiddenInput.value) + "&";
+        if (hiddenInput.name !== "__EVENTTARGET" && hiddenInput.name !== "") {
+            deliverString = deliverString+hiddenInput.name.replace(/\$/g, "%24")+"="+encodeURIComponent(hiddenInput.value) + "&";
+        }
     }
     var otherHiddenInputs = doc.evaluate('//input[@type="hidden" and contains(@name, "folderHas")]', doc, nsResolver, XPathResult.ANY_TYPE, null);
     while(hiddenInput = otherHiddenInputs.iterateNext()) {
         deliverString = deliverString+hiddenInput.name.replace(/\$/g, "%24")+"="+escape(hiddenInput.value).replace(/\//g, "%2F").replace(/%20/g, "+") + "&";
     }

-
-    deliverString = deliverString
-        +"&ctl00%24ctl00%24MainContentArea%24MainContentArea%24topDeliveryControl%24deliveryButtonControl%24lnkExportImage.x=5"
-        +"&ctl00%24ctl00%24MainContentArea%24MainContentArea%24topDeliveryControl%24deliveryButtonControl%24lnkExportImage.y=14";
+    deliverString = "__EVENTTARGET=ctl00%24ctl00%24Column2%24Column2%24topDeliveryControl%24deliveryButtonControl%24lnkExport&" + deliverString;

     return deliverString;
 }
@@ -145,7 +144,7 @@
         XPathResult.ANY_TYPE, null).iterateNext();

     if(searchResult) {
-        var titlex = '//a[@class = "title-link"]';
+        var titlex = '//a[@class = "title-link color-p4"]';
         var titles = doc.evaluate(titlex, doc, nsResolver, XPathResult.ANY_TYPE, null);
         var items = new Object();
         var title;
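
The last hunk compares the whole `class` attribute to an exact string, so the expression would stop matching if the site added another class name or changed their order. One more tolerant pattern, sketched below, tests for a single class token instead. This is an illustrative alternative and not part of the patch; both helper names are hypothetical.

```javascript
// Hypothetical helper: build an XPath that matches elements whose class
// attribute contains `token` as a whole, space-separated class name.
function classTokenXPath(tagName, token) {
  return "//" + tagName +
    "[contains(concat(' ', normalize-space(@class), ' '), ' " + token + " ')]";
}

// The same test in plain JavaScript, useful for checking the logic
// without a DOM: pad with spaces so only whole tokens match.
function hasClassToken(classAttr, token) {
  var normalized = " " + classAttr.replace(/\s+/g, " ").trim() + " ";
  return normalized.indexOf(" " + token + " ") !== -1;
}

var titlex = classTokenXPath("a", "title-link");
// titlex is:
// //a[contains(concat(' ', normalize-space(@class), ' '), ' title-link ')]
```

In a translator, the resulting string could be passed to `doc.evaluate` in place of the exact-match expression; it would match both the old `title-link` markup and the new `title-link color-p4` markup.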



Dan Stillman

Sep 12, 2010, 4:27:27 PM
to zoter...@googlegroups.com
On 9/11/10 10:47 PM, Josh (EP) wrote:
> Here's the patch. Let me know if there is a better way to submit it.
>
> Index: EBSCOhost.js

I've applied this on the trunk and the 2.0 Branch. I had to apply the
changes manually (uploading a new version to the Files section of this
group would be the better way to submit it, since e-mail adds newlines),
so it'd be great if a few people could test this, and then I'll push it
to users. It works for me.

You can test by either updating to the latest trunk or 2.0 Branch build
or by copying
https://www.zotero.org/svn/extension/trunk/translators/EBSCOhost.js to
the 'translators' directory in your Zotero data directory.

Thanks very much to Josh for the patch.

- Dan

Dan Stillman

Sep 12, 2010, 4:34:36 PM
to zoter...@googlegroups.com

Actually I decided just to push it, given the number of users waiting
for a fix. Testing still appreciated, but downloading from the above
link is no longer necessary. You can update manually by clicking Update
Now in the General pane of the Zotero prefs. (Otherwise your copy of
Zotero should auto-update within 24 hours.)

Josh (EP)

Sep 13, 2010, 7:48:11 AM
to zotero-dev
After looking at the translator code I am wondering why this hack is
in there:

//This is a hack, generateDeliveryString is acting up for single pages, but it works on the plink url

where a persistent link is created to start a new session. I put
together the code that uses the same session, and it seems to work
fine.

Looks like it was added here:
https://www.zotero.org/trac/changeset/4515

Any history as to why the plink route was added?

Dan Stillman

Sep 14, 2010, 4:13:10 AM
to zoter...@googlegroups.com
On 9/13/10 7:48 AM, Josh (EP) wrote:
> After looking at the translator code I am wondering why this hack is
> in there:
>
> //This is a hack, generateDeliveryString is acting up for single pages, but it works on the plink url
>
> where a persistent link is created to start a new session. I put
> together the code that uses the same session, and it seems to work
> fine.
>
> Looks like it was added here:
> https://www.zotero.org/trac/changeset/4515
>
> Any history as to why the plink route was added?

I couldn't tell you (and mcburton hasn't been around in a while), but I
imagine you have a better idea as to what's necessary to make it work,
so you should feel free to make any changes you see fit.

Also, some comments from a user:

http://forums.zotero.org/discussion/3601/proquest-ebscohost-broken/#Item_47

Most notably, PDF saving isn't working. If there's any way to fix that
it'd be great, since that's one of the main advantages of
translator-based saving.

- Dan
