New translator - Espacenet

GillesP

Jul 7, 2008, 3:17:32 AM

to zotero-dev

Hi,

You can find my first translator for Zotero here:
http://groups.google.com/group/zotero-dev/web/TradZotero_espacenet.txt

It is for the Espacenet site ( http://ep.espacenet.com/ ), where you
can find European patents and more.
Criticism and comments are welcome.

Hope it will be useful.

Gilles

acrymble

Jul 16, 2008, 2:20:41 PM

to zotero-dev

Nicely done, Gilles.

If you decide to do any more translators in the future, you can allow
users to grab data from a search results page as well.

All I had to do to add this to your translator was change your doWeb
function to:

[CODE]
// doWeb that handles both a single record and a search results page
function doWeb(doc, url) {
    var namespace = doc.documentElement.namespaceURI;
    var nsResolver = namespace ? function(prefix) {
        if (prefix == 'x') return namespace; else return null;
    } : null;

    var articles = new Array();

    if (detectWeb(doc, url) == "multiple") {
        var items = new Object();

        // Links to the individual records on the search results page
        var titles = doc.evaluate('//td[3]/strong/a', doc, nsResolver,
            XPathResult.ANY_TYPE, null);

        var next_title;
        while (next_title = titles.iterateNext()) {
            items[next_title.href] = next_title.textContent;
        }
        // Let the user choose which records to save
        items = Zotero.selectItems(items);
        for (var i in items) {
            articles.push(i);
        }
    } else {
        articles = [url];
    }
    // Load each chosen page and pass it to the translator's scrape function
    Zotero.Utilities.processDocuments(articles, scrape, function() {
        Zotero.done();
    });
    Zotero.wait();
}
[/CODE]

The XPath I'm using is the XPath for the links to articles on the
search page. This chunk of code will work for any search results page
that gives a list of links to individual articles so you can pretty
much cut and paste it into your own translators. All you will have to
change is the XPath so that it reflects the page you are working with.
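
To show that the XPath really is the only page-specific part, here is a minimal sketch that pulls the loop into a helper taking the XPath as a parameter. The name collectItems and the ANY_TYPE constant are my own for illustration and are not part of Zotero; the sketch also trims whitespace from the titles, which the code above does not do.

```javascript
// Hypothetical helper, not part of Zotero: builds the { href: title } map
// from whatever nodes the given XPath matches.
var ANY_TYPE = 0; // numeric value of XPathResult.ANY_TYPE

function collectItems(doc, xpath, nsResolver) {
    var items = {};
    var nodes = doc.evaluate(xpath, doc, nsResolver, ANY_TYPE, null);
    var node;
    while ((node = nodes.iterateNext())) {
        // Assumes every matched node is a link with href and textContent
        items[node.href] = node.textContent.trim();
    }
    return items;
}
```

Inside doWeb you would then call something like collectItems(doc, '//td[3]/strong/a', nsResolver) and pass the result to Zotero.selectItems as before.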

The only things you will have to watch out for with this code are:

1) If the search results page and the individual entry page are on
different domains (e.g., one is www.search.abcdefg.com and the other is
www.abcdefg.com), it will not work.
2) If the page lists the title and a separate link (one that is not
embedded in the title, such as "view citation" or "see record"), you
will need two XPaths instead of one: one for the title and one for the
link. In that case, use this piece of code in place of the titles
lookup and while loop above:

[CODE]
// XPath for the titles
var titles = doc.evaluate('//td[3]/strong/a', doc, nsResolver,
    XPathResult.ANY_TYPE, null);
// XPath for the links; change it to match the link elements on your page
var links = doc.evaluate('//td[3]/strong/a', doc, nsResolver,
    XPathResult.ANY_TYPE, null);

var next_title;
while (next_title = titles.iterateNext()) {
    // Titles and links are assumed to appear in the same order
    items[links.iterateNext().href] = next_title.textContent;
}

[/CODE]
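
If you want to guard against both caveats, something along these lines might help. This is only a rough sketch: sameHost and pairItems are names I made up, not Zotero functions, and the URL constructor is assumed to be available in your environment. sameHost lets you spot the different-domain case before calling processDocuments, and pairItems stops as soon as either XPath runs out of matches instead of throwing.

```javascript
// Hypothetical helpers; neither name comes from Zotero.

// True when both URLs share the same host (caveat 1).
function sameHost(urlA, urlB) {
    return new URL(urlA).host === new URL(urlB).host;
}

// Pairs titles with links in document order (caveat 2). Stops when either
// iterator is exhausted, so a missing link does not cause an error.
function pairItems(titles, links) {
    var items = {};
    var title, link;
    while ((title = titles.iterateNext()) && (link = links.iterateNext())) {
        items[link.href] = title.textContent;
    }
    return items;
}
```

For example, you could keep only the entries of the pairItems result whose key passes sameHost(url, key) before handing them to Zotero.selectItems.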
