requestDocument


Marc Lajoie

Nov 7, 2023, 3:44:20 PM
to zotero-dev
While creating a new translator, I ran into the following problem:

In the doWeb function, in the multiple case, I pass a URL to the requestDocument function.

A typical URL would be, for instance: "https://www.legisquebec.gouv.qc.ca/fr/document/lc/A-1"

The URL of the multiple page would be, for instance:
"https://www.legisquebec.gouv.qc.ca/en/chapters?corpus=statutes"

After I select an item, the function scrape(doc, url) does not use "https://www.legisquebec.gouv.qc.ca/fr/document/lc/A-1" but "https://www.legisquebec.gouv.qc.ca/en/chapters?corpus=statutes". I want it to use "https://www.legisquebec.gouv.qc.ca/fr/document/lc/A-1".

Any suggestions?
Here is my code:
function detectWeb(doc, url) {
    // TODO: adjust the logic here
    if (url.includes('document')) {
        return 'statute';
    }
    else if (getSearchResults(doc, true)) {
        return 'multiple';
    }
    return false;
}

function getSearchResults(doc, checkOnly) {
    var items = {};
    var found = false;
    // TODO: adjust the CSS selector
    var rows = doc.querySelectorAll('tr.clickable td a[href]');
    for (let row of rows) {
        // TODO: check and maybe adjust
        let href = row.href;
        // TODO: check and maybe adjust
        let title = ZU.trimInternal(row.textContent);
        if (!href || !title) continue;
        if (checkOnly) return true;
        found = true;
        items[href] = title;
    }
    return found ? items : false;
}

async function doWeb(doc, url) {
    if (detectWeb(doc, url) == 'multiple') {
        let items = await Zotero.selectItems(getSearchResults(doc, false));
        Zotero.debug('items:' + JSON.stringify(items));
        if (!items) return;

        for (let url of Object.keys(items)) {
            Zotero.debug('url:' + url);
            await scrape(await requestDocument(url));
        }
    }
    else {
        await scrape(doc, url);
    }
}

async function scrape(doc, url = doc.location.href) {
    var newItem = new Zotero.Item("statute");
    var statuteTitle = text(doc, 'title');
    Zotero.debug('statuteTitle' + statuteTitle);
    newItem.title = statuteTitle;
    newItem.complete();

    // TODO: add other zotero newItems
}

Abe Jellinek

Nov 7, 2023, 3:49:59 PM
to zoter...@googlegroups.com
I tried to reproduce this by doing the following steps:


The output I’m getting is this:

15:48:38 Running doWeb
15:48:40 statuteTitleLégis Québec

It looks like the url parameter has the right value. If you’re getting something different, you’ll need to be more specific about the environment you’re running in - Scaffold? - and where you’re seeing the incorrect value.


Abe Jellinek

Nov 7, 2023, 3:52:54 PM
to zoter...@googlegroups.com
(Sorry, that should have been “Add Z.debug('url in scrape: ' + url) to the top of the scrape() function”.)

Marc Lajoie

Nov 7, 2023, 4:02:48 PM
to zotero-dev
The title should be "Bees Act", not "Légis Québec", which is the title of https://www.legisquebec.gouv.qc.ca/en/chapters?corpus=statutes

Abe Jellinek

Nov 7, 2023, 4:09:11 PM
to zoter...@googlegroups.com
The title of the document at https://www.legisquebec.gouv.qc.ca/en/document/cs/A-1 on load is actually “Légis Québec”. It’s updated via JavaScript at some point after the page is loaded in the browser. requestDocument() doesn’t run JavaScript on the page.

It looks like “Bees Act” is stored in <meta name="dc.title" content=" - Bees Act">, as well as in various other locations on the page that you could access with stable-seeming CSS selectors. Make sure you inspect the page by using View Source (usually Cmd/Ctrl-U) in your browser and then reloading the source view - any changes made to the page after load will be visible in Inspect Element, and possibly in the pre-reload source view, but won’t be accessible to your translator.
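
A minimal sketch of that approach, assuming the dc.title meta tag keeps the " - Bees Act" shape quoted above. The helper names cleanDcTitle and getStatuteTitle are hypothetical, not part of the Zotero API; only standard DOM calls (querySelector, getAttribute) are used.

```javascript
// Hypothetical helper: strips the leading " - " separator seen in the
// dc.title value (assumption: the value always looks like " - Bees Act").
function cleanDcTitle(content) {
    if (!content) return '';
    return content.replace(/^\s*-\s*/, '').trim();
}

// Hypothetical helper: reads the title from the static HTML, which is what
// requestDocument() returns, instead of the <title> that is rewritten by
// JavaScript after the page loads.
function getStatuteTitle(doc) {
    let meta = doc.querySelector('meta[name="dc.title"]');
    return cleanDcTitle(meta ? meta.getAttribute('content') : '');
}
```

In scrape(), newItem.title = getStatuteTitle(doc) could then take the place of text(doc, 'title'). If the meta tag turns out to be missing on some pages, the helper returns an empty string, so a fallback to another selector may be worth adding.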

Marc Lajoie

Nov 7, 2023, 4:17:56 PM
to zotero-dev
Got it! Thank you. I did not account for onload.