CDP - how to determine that a page is fully loaded before requesting Chrome to render PDF?

ziggy

Mar 2, 2021, 10:43:55 PM
to Chromium-dev

This post is related to the earlier posts below.

https://groups.google.com/a/chromium.org/g/chromium-dev/c/LXZQz6UpVZI

https://groups.google.com/a/chromium.org/g/headless-dev/c/KW0UwYcrEb0

https://groups.google.com/a/chromium.org/g/headless-dev/c/rxoUQ4S9jpA

My goal is to be able to customize the header and footer when rendering to PDF. Right now I am experimenting with a C++ solution that connects to the remote debugging port and implements rendering to PDF.

I am looking to port the function below to C++. It comes from the https://github.com/Szpadel/chrome-headless-render-pdf tool, which uses the remote debugging interface.

async renderPdf(url, options) {
    // Connect to Chrome's remote debugging port.
    const client = await CDP({host: this.host, port: this.port});
    this.log(`Opening ${url}`);
    const {Page, Emulation, LayerTree} = client;
    await Page.enable();
    await LayerTree.enable();

    // Create the event promises before navigating so neither event is missed.
    const loaded = this.cbToPromise(Page.loadEventFired);
    const jsDone = this.cbToPromise(Emulation.virtualTimeBudgetExpired);

    await Page.navigate({url});
    // Budget of 5000 ms of virtual time; virtualTimeBudgetExpired fires when it runs out.
    await Emulation.setVirtualTimePolicy({policy: 'pauseIfNetworkFetchesPending', budget: 5000});

    await this.profileScope('Wait for load', async () => {
        await loaded;
    });

    await this.profileScope('Wait for js execution', async () => {
        await jsDone;
    });

    // Resolve once no layer has been painted for 100 ms, or after 5000 ms at most.
    await this.profileScope('Wait for animations', async () => {
        await new Promise((resolve) => {
            setTimeout(resolve, 5000); // max waiting time
            let timeout = setTimeout(resolve, 100);
            LayerTree.layerPainted(() => {
                clearTimeout(timeout);
                timeout = setTimeout(resolve, 100);
            });
        });
    });

    // printToPDF returns the document as a base64 string.
    const pdf = await Page.printToPDF(options);
    const buff = Buffer.from(pdf.data, 'base64');
    client.close();
    return buff;
}

To render a page to PDF, the client must navigate to the page, determine that the page content has finished loading, and then ask Chrome to render the page to PDF.
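
At the wire level, this sequence comes down to JSON messages over the debugging WebSocket: each command carries an id, a method, and params; a response echoes the id; an event arrives with a method and no id. Below is a minimal sketch of that framing, in JavaScript to match the function above. The helper names createCommandBuilder, classifyMessage, and renderPdfCommands are made up for illustration, and the WebSocket transport itself is left out.

```javascript
// Hypothetical helpers; only the JSON message shapes come from the protocol.

// Returns a function that serializes commands with increasing ids.
function createCommandBuilder() {
  let nextId = 1;
  return function build(method, params = {}) {
    return JSON.stringify({ id: nextId++, method, params });
  };
}

// Sorts an incoming frame into response, error, or event.
function classifyMessage(text) {
  const msg = JSON.parse(text);
  if (typeof msg.id === 'number') {
    return msg.error
      ? { kind: 'error', id: msg.id, payload: msg.error }
      : { kind: 'response', id: msg.id, payload: msg.result };
  }
  return { kind: 'event', method: msg.method, payload: msg.params };
}

// Commands in the order renderPdf issues them.
// Waiting for the "loaded" events happens between the last two.
function renderPdfCommands(build, url, pdfOptions = {}) {
  return [
    build('Page.enable'),
    build('LayerTree.enable'),
    build('Page.navigate', { url }),
    build('Emulation.setVirtualTimePolicy', {
      policy: 'pauseIfNetworkFetchesPending',
      budget: 5000,
    }),
    build('Page.printToPDF', pdfOptions),
  ];
}
```

A C++ client would send the same strings over its own WebSocket connection and run every received frame through something like classifyMessage to match responses to commands and to pick out events.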

I am looking for official state machine or interaction diagrams that I can follow to reliably detect a "page fully loaded" condition. Can anybody point me to where I can find such information?

In the meantime I ran the https://github.com/Szpadel/chrome-headless-render-pdf tool and captured the traffic between the client and Chrome using Wireshark and https://github.com/wendigo/chrome-protocol-proxy. See the attached files for the message exchanges. Cat the connection* files in a Git Bash or PowerShell window to see their content in color, as shown in Capture.PNG.

The function above seems to monitor the following three events to decide that the page has finished loading.

        const loaded = this.cbToPromise(Page.loadEventFired);
        const jsDone = this.cbToPromise(Emulation.virtualTimeBudgetExpired);
        LayerTree.layerPainted(() => { ....

The Page.loadEventFired event is fairly obvious.

Emulation.virtualTimeBudgetExpired - I don't fully understand the context.
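
My reading of the protocol documentation is that setVirtualTimePolicy puts the page on a virtual clock with a budget in milliseconds, that with pauseIfNetworkFetchesPending the clock may not advance while resource fetches are pending, and that virtualTimeBudgetExpired is sent once the budget is used up. Here is a toy model of that idea. The VirtualClock class is invented for this post and is not how Chrome implements it.

```javascript
// Toy model only: a budgeted clock that stalls while fetches are pending.
class VirtualClock {
  constructor(budgetMs) {
    this.remaining = budgetMs;
    this.pendingFetches = 0;
    this.expired = false;
  }

  fetchStarted() {
    this.pendingFetches += 1;
  }

  fetchFinished() {
    this.pendingFetches = Math.max(0, this.pendingFetches - 1);
  }

  // Tries to advance virtual time by ms and returns how much actually elapsed.
  advance(ms) {
    if (this.expired || this.pendingFetches > 0) {
      return 0; // paused: budget already spent, or a fetch is still pending
    }
    const used = Math.min(ms, this.remaining);
    this.remaining -= used;
    if (this.remaining === 0) {
      // In this model, this is the point where the expired event would be sent.
      this.expired = true;
    }
    return used;
  }
}
```

If that reading is right, the event means "the page has had its budget of timer-driven script time with the network idle", which would explain why the tool treats it as "JS done".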

LayerTree.layerPainted - this seems to be driven by the LayerTree.layerTreeDidChange and LayerTree.layerPainted events that Chrome sends.

I am looking for comments on the above and/or directions on how to reliably detect a "page fully loaded" condition. Chrome doesn't seem to generate such an event, so it appears to be the client's responsibility.
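
The "Wait for animations" step in the function is a debounce: it finishes once no layerPainted event has arrived for 100 ms, or after 5000 ms at the latest. In an event loop without setTimeout, the same logic can be written by passing timestamps in. This is a sketch of that approach; QuietPeriodTracker is a made-up name, and the 100 and 5000 defaults are taken from the function above.

```javascript
// Debounce with a hard deadline, driven by caller-supplied timestamps (ms).
class QuietPeriodTracker {
  constructor(startMs, quietMs = 100, maxWaitMs = 5000) {
    this.quietMs = quietMs;
    this.quietUntil = startMs + quietMs; // moves forward on every paint
    this.deadline = startMs + maxWaitMs; // never moves
  }

  // Call for every LayerTree.layerPainted event.
  onLayerPainted(nowMs) {
    this.quietUntil = nowMs + this.quietMs;
  }

  // True once painting has been quiet long enough or the deadline has passed.
  isDone(nowMs) {
    return nowMs >= this.quietUntil || nowMs >= this.deadline;
  }

  // How long the event loop may sleep before checking again.
  msUntilDone(nowMs) {
    return Math.max(0, Math.min(this.quietUntil, this.deadline) - nowMs);
  }
}
```

The value from msUntilDone can serve as the timeout for whatever blocking read the client uses on the WebSocket, so no separate timer thread is needed.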
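
Since the "fully loaded" decision seems to live in the client, one way to make it explicit is a small state machine over the three signals. This is only a sketch of how I read the function above, not an official model. The phase names and the paintQuiet and deadline inputs are made up; the last two would come from client-side timers rather than from Chrome.

```javascript
// Hypothetical client-side state machine: loading -> settling -> ready.
function createLoadState() {
  return { phase: 'loading', loadFired: false, budgetExpired: false };
}

// Pure transition function: returns the next state for one input event.
function step(state, event) {
  const next = { ...state };
  if (event === 'Page.loadEventFired') next.loadFired = true;
  if (event === 'Emulation.virtualTimeBudgetExpired') next.budgetExpired = true;

  if (next.phase === 'loading') {
    // Both events are required; they may arrive in either order.
    if (next.loadFired && next.budgetExpired) next.phase = 'settling';
  } else if (next.phase === 'settling') {
    // paintQuiet: no layerPainted for a while; deadline: maximum wait reached.
    if (event === 'paintQuiet' || event === 'deadline') next.phase = 'ready';
  }
  return next;
}

// Convenience helper: fold a list of events into a final phase.
function runEvents(events) {
  return events.reduce(step, createLoadState()).phase;
}
```

The client would call Page.printToPDF only once the phase is ready.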

Capture.PNG
connection-file.log
connection-google.log
wireshark.txt
connection-bbcnews.log