Developing a DOI resolver plugin

John Didion

Dec 30, 2015, 12:36:08 PM

to zotero-dev

I would like to develop a plugin to the Zotero Firefox client that, given an item that lacks a PDF, resolves the item's DOI, fetches the associated PDF, and attaches it to the item. From my limited understanding of the Zotero codebase, it looks like translators automatically create new items. Is there any way to use the existing translator framework to return the located PDF, or to attach it to an existing item rather than creating a new one?

Emiliano Heyns

Dec 31, 2015, 10:47:40 AM

to zotero-dev

You don't want to use the translator infrastructure for that. Translators cannot change existing references. If your aim is to use doi.org to resolve to the place where the article actually lives and then kick off an import, this is possible in principle but fragile: you'd have to recognize the new entries (which I suppose should be possible by marking them after import using a sub-translator), and then either auto-merge the new and old references, or move the new attachments (if any) to the old references and delete the new ones.

Not impossible, but tricky and fragile. A more stable approach would be to create your own sandbox to run the translators in, but that would be nontrivial.

John Didion

Jan 1, 2016, 1:26:03 PM

to zotero-dev

Thanks Emiliano! I agree that creating duplicate items is not ideal, but I am fine with it if it's the only solution. My initial approach was going to be as follows:

1. Use the CrossRef translator to obtain a URL for the item's DOI
2. Use the translator infrastructure to match the article URL and kick off the automatic import, which will create a new item
3. Use Zotero.Items.merge to merge the new item with the existing one.

However, when I call Zotero.loadTranslator("search") I get an error saying "loadTranslator" is not a function. Does this mean it's not possible to access translators from non-translator plugins? Or am I just going about it incorrectly?

John Didion

Jan 1, 2016, 3:56:15 PM

to zotero-dev

I've figured out how to do the DOI lookup and that's working well. Now the challenge is to open the target page and scrape it. I discovered Zotero.Browser.createHiddenBrowser(). It would be ideal if I could load the page in the hidden browser, run the scraping there, and then merge the new item with the existing one when it's done. The code I have so far is below, but it's failing because Zotero_Browser._getTabObject is not exposed. Is there another way to do what I want?

        // Partial code: `item`, `doi`, and `url` come from the enclosing scope (not shown).
        var browser = Zotero.Browser.createHiddenBrowser();

        var notifierCallback = {
            notify: function(event, type, ids, extraData) {
                var resolved = false;

                if (event == 'add') {
                    var items = Zotero.Items.get(ids);
                    for (var i = 0; i < items.length; i++) {
                        // Match the newly added item by DOI or URL.
                        if ((doi && doi === items[i].getField("DOI")) ||
                                (url === items[i].getField("url"))) {
                            resolved = true;
                            Zotero.Items.merge(item, [items[i]]);
                            break;
                        }
                    }
                }

                if (resolved) {
                    // Clean up, then move on to the next item in the queue.
                    Zotero.Notifier.unregisterObserver(notifierID);
                    Zotero.Browser.deleteHiddenBrowser(browser);
                    Zotero.doi2pdf.resolveNextItem();
                }
            }
        };

        var notifierID = Zotero.Notifier.registerObserver(notifierCallback, ['item']);

        var onpageshow = function() {
            // Fails here: _getTabObject is private to Zotero_Browser.
            var tab = Zotero_Browser._getTabObject(browser);
            var page = tab.getPageObject();
            if (page.translators && page.translators.length) {
                page.translate.setTranslator(page.translators[0]);
                Zotero_Browser.performTranslation(page.translate);
            }
        };

        browser.addEventListener("pageshow", onpageshow, false);

        browser.loadURI(url);

Emiliano Heyns

Jan 1, 2016, 6:27:30 PM

to zotero-dev

On Friday, January 1, 2016 at 9:56:15 PM UTC+1, John Didion wrote:

I've figured out how to do the DOI lookup and that's working well. Now the challenge is to open the target page and scrape it. I discovered Zotero.Browser.createHiddenBrowser(). It would be ideal if I could load the page in the hidden browser, run the scraping there, and then merge the new item with the existing one when it's done. The code I have so far is below, but it's failing because Zotero_Browser._getTabObject is not exposed. Is there another way to do what I want?

You indeed can't get at _getTabObject, but since it's just 9 lines of code, I'd just replicate it. The WeakMap it uses must be some kind of caching mechanism, which implies that the tab objects are expensive to create. For your specific extension you could perhaps get away with creating one during startup and just holding on to it for re-use.
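
To illustrate the caching pattern, here is a minimal sketch of a WeakMap-backed lookup. It is not the actual Zotero code: makeTabObject is a hypothetical stand-in for whatever Zotero_Browser constructs per browser, and getTabObject is just an illustrative name.

```javascript
// Hypothetical stand-in for the per-browser object Zotero builds;
// the real constructor lives inside Zotero_Browser and is not shown here.
function makeTabObject(browser) {
    return { browser: browser, createdAt: Date.now() };
}

// Cache keyed by the browser element. A WeakMap lets an entry be
// garbage-collected once the browser itself is no longer referenced.
var tabObjects = new WeakMap();

function getTabObject(browser) {
    if (!browser) {
        return null;
    }
    var tab = tabObjects.get(browser);
    if (!tab) {
        // First request for this browser: build the object and remember it.
        tab = makeTabObject(browser);
        tabObjects.set(browser, tab);
    }
    return tab;
}
```

Calling getTabObject twice with the same browser returns the same object, so the cost of building it is paid once per browser.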

As far as page loading goes, you should just be able to set browser.contentDocument.location.

Mind that I am not at all familiar with this part of Zotero.

The rest of the code I can't really assess. I see a comparison against an undeclared variable "doi" in the add handler, so I'm guessing this is only partial code; same goes for "item". And beware that the merge will add things beyond just the attachments: it will also copy fields you just imported, possibly obliterating data the user manually changed. You might be better off just moving the attachments of the new reference to the existing reference by calling setSource on them, and then moving the new reference to the trash.
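
A rough sketch of that alternative follows. The function name adoptAttachments and the `items` parameter are hypothetical; `items` is expected to look like Zotero.Items. The calls getAttachments(), setSource(), save(), and trash() are assumed from the Zotero 4 API and should be checked against the version you target.

```javascript
// Sketch only: `items` is expected to look like Zotero.Items (get, trash),
// and the item methods used here (getAttachments, setSource, save) are
// assumed from the Zotero 4 API; check them against your Zotero version.
function adoptAttachments(existingItem, newItem, items) {
    // getAttachments() is assumed to return attachment IDs, or a falsy
    // value when the item has none.
    var attachmentIDs = newItem.getAttachments() || [];
    attachmentIDs.forEach(function (id) {
        var attachment = items.get(id);
        attachment.setSource(existingItem.id); // re-parent to the old item
        attachment.save();
    });
    items.trash(newItem.id); // the scraped duplicate is no longer needed
    return attachmentIDs.length;
}
```

In the notifier callback this would replace the Zotero.Items.merge call, for example adoptAttachments(item, items[i], Zotero.Items), so that none of the existing item's fields are touched.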

I think what you're trying to do is set up a temporary notifier which assumes everything added while it is active is scraped because your plugin kicked it off. That's actually a pretty neat idea, I hadn't thought of that, but beware of race conditions.
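
One way to narrow the window for such races, sketched under the assumption that the plugin only ever has one lookup in flight: track the pending DOI/URL explicitly and only claim an added item that matches it. The helper createPendingTracker and its method names are hypothetical, not part of Zotero.

```javascript
// Hypothetical helper: tracks the single lookup in flight and decides
// whether a newly added item belongs to it.
function createPendingTracker() {
    var pending = null; // { doi, url } of the lookup in flight, or null

    return {
        start: function (doi, url) {
            if (pending) {
                throw new Error("a lookup is already in flight");
            }
            pending = { doi: doi || null, url: url || null };
        },
        // Returns true only for an item matching the pending lookup,
        // and clears the pending state so later adds are ignored.
        claim: function (itemDOI, itemURL) {
            if (!pending) {
                return false;
            }
            var match = (pending.doi && pending.doi === itemDOI) ||
                        (pending.url && pending.url === itemURL);
            if (match) {
                pending = null;
            }
            return Boolean(match);
        },
        isBusy: function () {
            return pending !== null;
        }
    };
}
```

The notifier's add handler would call claim() with each new item's DOI and URL and ignore anything that returns false, so items the user adds by hand in the meantime are left alone unless they happen to share the same DOI or URL.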

Dan Stillman

Jan 4, 2016, 5:46:58 AM

to zoter...@googlegroups.com

I can't help with the specifics here, but there shouldn't be any need to
save an extra item and merge it, and it'd be better for sync purposes to
avoid that. You should be able to just remove the existing handler that
saves the item and replace it with your own handler that does something
else with the extracted data. For an example, see the Scaffold code [1],
which just displays the data in a window. Other Scaffold code may also
be useful.

[1]
https://github.com/zotero/scaffold/blob/bf8fa8cdacd554cd291ff3b0f9ffc032d66a0337/src/chrome/content/scaffold/scaffold.js#L413
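
A minimal sketch of that handler swap, assuming the translate object exposes clearHandlers() and setHandler() for the "itemDone" event and that the handler receives the translate object followed by the item. The function name captureItems and the onItem callback are hypothetical.

```javascript
// Sketch only: `translate` is expected to behave like a Zotero.Translate
// instance. clearHandlers() and setHandler() are assumed to exist on it,
// and the (translate, item) handler signature is likewise an assumption.
function captureItems(translate, onItem) {
    translate.clearHandlers("itemDone"); // drop the handler that saves
    translate.setHandler("itemDone", function (obj, item) {
        onItem(item); // hand the extracted data to the plugin instead
    });
    return translate;
}
```

The onItem callback could then pull the attachment information out of the extracted data and attach it to the existing item, without a second item ever being saved.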
