The tutorial at http://dev.zotero.org/scaffoldtutorial is a bit better
developed now, but I've hit some stumbling blocks because I'm a
total JavaScript novice and don't know where to look for
documentation.
It's taken me half a day to get this far by myself; a bit of help would
greatly speed me up :S
The section I'm having trouble with is here:
http://dev.zotero.org/scaffoldtutorial#building_xpath_expressions_to_get_at_the_data
(alternative: http://xrl.us/7ggf)
Or pasted below in wiki format:
<code>
function doWeb(doc, url) {
    scrape(doc, url);
}
function scrape(doc, url) {
    // Keep the XPath on one line: a quoted string literal cannot span lines.
    var xpath = '//table[@class="whiteTable"]/tbody/tr[@class="tableBodyWhite"]/td';
    // Take the first node the XPath matches and log its cleaned-up text.
    var thing = Zotero.Utilities.cleanString(doc.evaluate(xpath, doc,
        null, XPathResult.ANY_TYPE, null).iterateNext().textContent);
    Zotero.debug(thing);
}
</code>
This is cargo-culted from the [[http://stuff.co.nz]] translator and
results in the following output in the Scaffold debug window:
<code>
===>You have requested to view the document below. Please select from
the following options:<===(string)
</code>
This is the text just before the download links. Adding a "/a" at the
end of the above XPath expression will get us the "View" HTML link,
but there's another <a href=""> tag containing the PDF that I don't
know how to get at.
I also don't know where to find the documentation for the doc.evaluate
calls in the scrape function.
Here's the correct xpath expression to get hold of the bibliographic
data in a single string:
<code>
var xpath = '//table/tbody/tr[5]/td[3]/div/table/tbody/tr/td/div/table[3]/tbody/tr[6]/td';
</code>
But again, I'm currently at a bit of a loss about what to do next.
If someone can point me to documentation or provide some concrete
advice I'd be most grateful.
The href is an attribute of the <a> tag, so you get it by adding
/@href to the end of your xpath string.
However, just doing that with the string you have now will give you
the href of about 13 links, not just the PDF href I assume you want.
Instead, use "//tr[4]//a[contains(.,'View PDF')]/@href" which means, get the href
from the <a> tag that brackets text containing the string View PDF,
which is in the fourth <tr> (I need to specify the fourth tr, because
there are two View PDF links on the page, and I just want one href).
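As a rough sketch of how an expression like that might be checked from a translator: the helper below runs an XPath through doc.evaluate and gathers the text of every match, so you can see whether you are getting one href or thirteen. The name collectValues is made up for this example and is not part of Zotero, and the constant 0 stands in for XPathResult.ANY_TYPE so the snippet doesn't depend on that global being defined.

```javascript
// Hypothetical helper, not a Zotero API: collect the text of every node
// (element or attribute) that an XPath expression matches in `doc`.
var ANY_TYPE = 0; // numeric value of XPathResult.ANY_TYPE in the DOM XPath spec

function collectValues(doc, xpath) {
    var result = doc.evaluate(xpath, doc, null, ANY_TYPE, null);
    var values = [];
    var node;
    // iterateNext() returns null once the matches are exhausted.
    while ((node = result.iterateNext())) {
        values.push(node.textContent);
    }
    return values;
}

// Inside scrape(doc, url) this might be called as:
//   var hrefs = collectValues(doc, "//tr[4]//a[contains(.,'View PDF')]/@href");
//   Zotero.debug(hrefs.length + " match(es): " + hrefs.join(", "));
```

If the list comes back with more than one entry, the expression still needs narrowing.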
Note that the href is a relative link, not an absolute one, so you may
need to deal with that.
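One possible way to deal with it, assuming the standard URL constructor is available where the code runs (that is an assumption about the environment, not something Zotero is known to provide here), is to resolve the href against the address of the page it came from. The function name resolveHref and the example.org addresses are invented for illustration.

```javascript
// Hypothetical helper: turn a relative href into an absolute URL by
// resolving it against the URL of the page it was scraped from.
function resolveHref(href, pageUrl) {
    // The two-argument URL constructor treats pageUrl as the base.
    return new URL(href, pageUrl).href;
}

// Made-up example addresses:
var absolute = resolveHref("/docs/view.pdf", "http://example.org/search/results?id=1");
// absolute is now "http://example.org/docs/view.pdf"
```

An href that is already absolute passes through unchanged, so the same call can be used for every link.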
A very valuable tool for building XPaths is XPath Checker[1].
[1] https://addons.mozilla.org/en-US/firefox/addon/1095
-Forest
On Oct 17, 1:56 pm, "Forest Gregg" <fgr...@googlemail.com> wrote:
> Let me ungarble:
>
> > However, just doing that with the string you have now will give you
> > the href of about 13 links, not just the PDF href I assume you want.
> > Instead, use "//tr[4]//a[contains(.,'View PDF')]/@href" which means, get the href
> > from the <a> tag that brackets text containing the string View PDF,
> > which is in the fourth <tr> (I need to specify the fourth tr, because
> > there are two View PDF links on the page, and I just want one href).
>
Aah, thanks for that. I understand now. Firebug has a "copy xpath"
option which I used for the second example in the document. I'll
replace the Web Developer recommendation with Firebug when I get back
to it.
I'd still be extremely grateful for pointers as to where I can find
documentation for methods that I'm likely to use when developing
Zotero translators.
Thanks for your work on this. Some notes:
- Try Solvent from the SIMILE project
(http://simile.mit.edu/wiki/Solvent) for grabbing XPaths. (We suggest
this on the Translator Overview page.)
- The tutorial should be created at scaffold_tutorial rather than
scaffoldtutorial. (Just cut and paste the wiki markup, and save the old
page as empty to delete it.)
- Disabling extensions.zotero.cacheTranslatorData in about:config will
let you avoid restarting. We can probably change Scaffold to have it
refresh on insert regardless of the setting.
- Viewing debug output: http://www.zotero.org/documentation/debug_output
- Firebug is a great tool for web development, but it isn't really useful
for getting under the hood of Zotero. You need Venkman
(http://www.mozilla.org/projects/venkman/) for that.