Enhance print-to-pdf in headless mode to support all Page.printToPDF options supported by the devtools


ziggy

Jan 16, 2021, 1:33:19 AM
to headless-dev, headle...@chromium.org, esec...@chromium.org, skyo...@chromium.org
I am a developer on the Mbox Mail Viewer project, a free Windows application available on GitHub and SourceForge for viewing mbox archive files such as Google Takeout archives. Users of Mbox Mail Viewer need to print multiple mails to PDF without user interaction. Currently, Mbox Viewer relies on the standard Google Chrome browser to print mails to PDF files via the command line option --print-to-pdf. However, many users complain that they cannot customize the PDF output (landscape orientation, header and footer, etc.). Browsing the Internet, I see that many other users have raised the same issue in the past. The standard reply is to use the devtools. For my application and many other simple applications, using the devtools is overkill and introduces unnecessary dependencies and risk. I understand that using the devtools has many advantages for some deployments, but for simple use cases it is an unnecessary burden. It is much simpler to rely on the standard browser, which is kept up to date automatically.

I am not sure why the print-to-pdf option has not yet been enhanced and synchronized with Page.printToPDF. In my humble opinion, print-to-pdf should be either enhanced or dropped, possibly including from devtools. If it is not dropped, I don't see a good reason not to enhance it, since 98+% of the code already exists. I did some prototyping (see below) to see what code changes might be needed.

I added a new option, --print-to-pdf-page-config="PrintPageConfigFile", to support customization of the PDF output. See the example JSON file below.

I am hoping the OWNERS will seriously consider enhancing the Chromium browser, and ultimately Google Chrome, to support customization of PDF output via a command line option. The implementation effort is fairly small, so I hope that after many years in limbo the feature will be prioritized, implemented and released in Google Chrome.

Below I describe the work I have done and the issues I faced.

Thank You,
Zbigniew


+++++++++++++++++++++++++++++++++++
Building Chromium on Windows 10
+++++++++++++++++++++++++++++++++++

The "fetch --no-history chromium" command was failing consistently at libdav1d and reporting:

[0:09:44] Cloning into 'F:\Chrom\chromium\src\third_party\dav1d\_gclient_libdav1d_gue6lli1'...
[0:09:44] error: RPC failed; HTTP 400 curl 22 The requested URL returned error: 400

I tried 6 times without success. Each time, before retrying the fetch from scratch, I had to delete 400,000+ files!
I didn't see any information on the Chromium site that would help to recover from similar failures.

After the last fetch failure, in desperation, I tried "gclient sync -D" and to my surprise it worked. I am not sure this is a proper workaround, but it seems to work.

Running "git status" showed two leftover directories:

F:\Chrom\chromium\src\third_party\dav1d
_gclient_gittmp_libdav1dugn6rfaj
_gclient_libdav1d_fwaisxxv

which I deleted.

Next, to reduce file system overhead, I excluded the build directories from Windows Defender antivirus scanning and ran

gn gen out/Default
autoninja -C out\Default chrome
 
to build a browser. It took 6 hours to complete the build on my:

HP ZBook 15 G5, 4 physical cores/8 logical cores, 16 GB RAM, 2.3 GHz, all SSD drives.

+++++++++++++++++++++++++++++++++++++++++++++++
Editing and Debugging under Visual Studio 2019 IDE
+++++++++++++++++++++++++++++++++++++++++++++++++

I was going to use VS 2019 to prototype and debug enhancements to print-to-pdf, so I generated VS project files as follows:

gn gen --ide=vs out\Default

That resulted in over 9,000 project files. VS could not handle such a large number of projects reliably with 16 GB of RAM. I upgraded my machine to 48 GB and reduced the number of generated projects to around 4,000 by running:

gn gen --ide=vs --filters=//chrome;//headless out\Default

Having more RAM and fewer projects really helped to make VS fairly (but not completely) stable.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adding command line option to support all Page.printToPDF options.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I added a new option, --print-to-pdf-page-config="PrintPageConfigFile", to support customization of the PDF output.

As I suspected, implementing such an option is fairly straightforward since 99+% of the code already exists. I made the following changes to 3 files:

+++++++++ headless_shell_switches.h, added

HEADLESS_EXPORT extern const char kPrintToPDFPageConfig[];

+++++++++++ headless_shell_switches.cc, added

const char kPrintToPDFPageConfig[] = "print-to-pdf-page-config";

++++++++++++++++++ headless_shell.cc

std::unique_ptr<headless::page::PrintToPDFParams>  
ReadPageConfigParams(base::FilePath& pdf_page_config_file_name)
{
   .........
   // uses base::JSONReader::Read(json_text); for parsing json configuration
   
}

void HeadlessShell::PrintToPDF()
{
  DCHECK_CURRENTLY_ON(content::BrowserThread::UI);

// Begin of added code
  if (base::CommandLine::ForCurrentProcess()->HasSwitch( switches::kPrintToPDFPageConfig))
  {
    base::FilePath pdf_page_config_file_name =
        base::CommandLine::ForCurrentProcess()->GetSwitchValuePath(
            switches::kPrintToPDFPageConfig);

     // ++++++++++++++++
     // ReadPageConfigParams(pdf_page_config_file_name) function reads parameters from json file,
     // sets up and returns PrintToPDFParams
     //+++++++++++++
      devtools_client_->GetPage()->GetExperimental()->PrintToPDF(
        ReadPageConfigParams(pdf_page_config_file_name),
        base::BindOnce(&HeadlessShell::OnPDFCreated, weak_factory_.GetWeakPtr()));
  }
  else
 // End
  {
    bool display_header_footer =
        !base::CommandLine::ForCurrentProcess()->HasSwitch(
            switches::kPrintToPDFNoHeader);

    devtools_client_->GetPage()->GetExperimental()->PrintToPDF(
        page::PrintToPDFParams::Builder()
            .SetDisplayHeaderFooter(display_header_footer)
            .SetPrintBackground(true)
            .SetPreferCSSPageSize(true)
            .Build(),
        base::BindOnce(&HeadlessShell::OnPDFCreated,
                       weak_factory_.GetWeakPtr()));
  }
}
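
As a rough illustration of what a ReadPageConfigParams-style helper has to carry between the JSON file and the print call, here is a minimal standalone sketch. PdfPageConfig and ValidatePdfPageConfig are hypothetical names, not Chromium types; the defaults are copied from the example configuration file in this post, and the sanity checks are assumptions of the sketch, not the checks Chromium itself performs.

```cpp
#include <string>
#include <vector>

// Hypothetical plain struct for illustration only. It is NOT Chromium's
// headless::page::PrintToPDFParams; defaults mirror the example JSON file.
struct PdfPageConfig {
  bool landscape = false;
  bool display_header_footer = true;
  bool print_background = true;
  double scale = 1.0;
  double paper_width_in = 8.5;    // inches
  double paper_height_in = 11.0;  // inches
  double margin_top_in = 0.4;
  double margin_bottom_in = 0.4;
  double margin_left_in = 0.4;
  double margin_right_in = 0.4;
  std::string page_ranges;        // empty string prints all pages
  bool prefer_css_page_size = true;
};

// Returns human-readable problems; an empty vector means the config passed
// these (assumed) checks. Orientation is not taken into account here.
std::vector<std::string> ValidatePdfPageConfig(const PdfPageConfig& c) {
  std::vector<std::string> errors;
  if (c.scale <= 0.0) {
    errors.push_back("scale must be positive");
  }
  if (c.paper_width_in <= 0.0 || c.paper_height_in <= 0.0) {
    errors.push_back("paper size must be positive");
  }
  if (c.margin_top_in < 0.0 || c.margin_bottom_in < 0.0 ||
      c.margin_left_in < 0.0 || c.margin_right_in < 0.0) {
    errors.push_back("margins must not be negative");
  }
  if (c.margin_left_in + c.margin_right_in >= c.paper_width_in) {
    errors.push_back("horizontal margins leave no printable width");
  }
  if (c.margin_top_in + c.margin_bottom_in >= c.paper_height_in) {
    errors.push_back("vertical margins leave no printable height");
  }
  return errors;
}
```

Validating the file before calling PrintToPDF would let the command line tool report a bad configuration up front instead of failing later in the print pipeline.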

+++++++++  json print page configuration file


It took me several iterations to figure out how to configure a footerTemplate and headerTemplate that work and that handle overflow of the user-provided text shown in the middle of the line. Devtools users may have better examples; I would appreciate it if you could post and share them.

NOTE: the footer and header seem to be missing when printing some web pages, except for the footer on the last printed page. Try printing https://sourceforge.net/.

{
    "landscape": false,                /* default = false */
    "displayHeaderFooter": true,    /* default = true */
    "printBackground": true,        /* default = true */
    "scale": 1.0,                    /* default = 1.0 */
    "paperWidth": 8.5,                /* default = 8.5 inches */
    "paperHeight": 11.0,            /* default = 11.0 inches */
    "marginTop": 0.4,                /* default = 0.4 inches */
    "marginBottom": 0.4,            /* default = 0.4 inches */
    "marginLeft": 0.4,                /* default = 0.4 inches */
    "marginRight": 0.4,                /* default = 0.4 inches */
    "pageRanges": "",                /* default = "" empty string to print all pages */
    "ignoreInvalidPageRanges": true,/* default = true */
    "preferCSSPageSize": true,        /* default = true */
    
    "footerTemplate": "<div style='width:15%;margin-left:0.5cm;text-align:left;font-size:7px;'>
                        <span class='date'></span></div>
                        <div style='width:70%;direction:rtl;white-space:nowrap;overflow:hidden;text-overflow:clip;text-align:center;font-size:7px;'>
                        <span>CHROMIUM HEADLESS BROWSER FOOTER 1 CHROMIUM HEADLESS BROWSER FOOTER 2 CHROMIUM HEADLESS BROWSER FOOTER 3 CHROMIUM HEADLESS BROWSER FOOTER 4 CHROMIUM HEADLESS BROWSER FOOTER 5 </span></div>
                        <div style='width:15%;margin-right:0.5cm;text-align:right;font-size:7px;'>
                        <span class='pageNumber'></span> of <span class='totalPages'></span></div>",
                        
    "headerTemplate": "<div style='width:15%;margin-left:0.5cm;text-align:left;font-size:7px;'>
                        <span class='date'></span></div>
                        <div style='width:70%;direction:rtl;white-space:nowrap;overflow:hidden;text-overflow:clip;text-align:center;font-size:7px;'>
                        <span>CHROMIUM HEADLESS BROWSER HEADER 1 CHROMIUM HEADLESS BROWSER HEADER 2 CHROMIUM HEADLESS BROWSER HEADER 3 CHROMIUM HEADLESS BROWSER HEADER 4 CHROMIUM HEADLESS BROWSER HEADER 5 </span></div>
                        <div style='width:15%;margin-right:0.5cm;text-align:right;font-size:7px;'>
                        <span class='pageNumber'></span> of <span class='totalPages'></span></div>"
}
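
One caveat about the file above: strict JSON allows neither /* ... */ comments nor raw line breaks inside string values, so whether it parses depends on how lenient the parser is. If the file ever has to go through a strict parser, a small pre-pass can remove the comments first. The following is a minimal sketch of that idea; StripBlockComments is a hypothetical helper, not a Chromium function, and it does not address the multi-line template strings.

```cpp
#include <cstddef>
#include <string>

// Removes /* ... */ comments that appear outside JSON string literals.
// Text inside double-quoted strings is copied through unchanged.
std::string StripBlockComments(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  bool in_string = false;
  for (std::size_t i = 0; i < text.size(); ++i) {
    const char c = text[i];
    if (in_string) {
      out.push_back(c);
      if (c == '\\' && i + 1 < text.size()) {
        out.push_back(text[++i]);  // copy the escaped character verbatim
      } else if (c == '"') {
        in_string = false;
      }
    } else if (c == '"') {
      in_string = true;
      out.push_back(c);
    } else if (c == '/' && i + 1 < text.size() && text[i + 1] == '*') {
      const std::size_t end = text.find("*/", i + 2);
      if (end == std::string::npos) {
        break;  // unterminated comment: drop the rest of the input
      }
      i = end + 1;  // the loop increment then steps past the closing '/'
    } else {
      out.push_back(c);
    }
  }
  return out;
}
```

Tracking whether the scanner is inside a string matters here because the header and footer templates are free-form HTML and could legitimately contain the characters "/*".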





Eric Seckler

Jan 18, 2021, 7:06:40 AM
to ziggy, headle...@chromium.org, skyo...@chromium.org, ca...@chromium.org
+ca...@chromium.org

I'd suggest filing a bug on crbug.com and/or sending a patch :)

ziggy

Jan 18, 2021, 6:00:38 PM
to Eric Seckler, headle...@chromium.org, skyo...@chromium.org, ca...@chromium.org
Thank you for replying. In my old life as a developer, before retirement, coding would not start until a requirement had been reviewed and accepted for development and eventual release; otherwise the effort might go nowhere :). How does it work in Chromium?

I guess I could file a feature request on crbug.com and/or potentially send a patch. If I file on crbug.com, an experienced developer might be assigned, or not :). I suppose that if I submit a patch, it may have a better chance of being accepted into Chromium. Getting it into Google Chrome is probably an even bigger challenge.

For a new person like myself, creating an official patch is likely to require much more effort than creating a local patch (based on the Contributing to Chromium page). Obviously it would be helpful to have a commitment that the feature will be integrated into Chromium.


ziggy

Jan 20, 2021, 7:29:11 PM
to headless-dev, ziggy, headle...@chromium.org, skyo...@chromium.org, ca...@chromium.org, Eric Seckler, chromi...@chromium.org
Hmm, I must be doing something wrong. I replied to Andrey once from Gmail and again from the Chromium-dev group, and I don't see my response anywhere. Re-sending, for the last time I hope.

Hi Andrey,

I appreciate the quick and informative response. The command line option to print to PDF is widely used, so I suspect it will be challenging to drop the feature. If I had to guess, the feature will likely stay. My proposal was to add just one command line option, to limit the number of options and make it stable.

The command line option is very easy to use, and that is why many regular users use it instead of opening a web page and selecting the "print" option, or instead of using Puppeteer. I could be wrong, but I suspect that using the command line option minimizes potential state issues that might be difficult for regular users to resolve.

Eric Seckler suggested that I submit a patch, but I would hesitate without a commitment that the feature will be integrated into Chromium and eventually into Google Chrome. I hope you will reconsider and enhance --print-to-pdf to synchronize it with Page.printToPDF, to benefit many users until you decide to deprecate the feature at some point, which I believe will be challenging.

In the meantime I could investigate how to integrate a subset of CDP into Mbox Viewer, which is a C++ application. Do you have suggestions or examples of where I can find C++ bindings for CDP and/or an API?

Best Regards,
Zbigniew


On Tuesday, January 19, 2021 at 1:51:40 PM UTC-6 ca...@chromium.org wrote:
Hi Zbigniew,

Apologies for the late reply. We're really looking to deprecate and remove much of the command-line functionality that controls the rendering of the page in headless. From my point of view, there's a fundamental limit on the flexibility that the command line can give you, and ultimately one should use a proper API to talk to Chrome, which in this case is the Chrome DevTools protocol. This is also what headless does internally, as you've already seen, so you should be able to borrow the code for configuring printing, along with the generated CDP bindings, and then use "chrome --remote-debugging-pipe" to talk the DevTools protocol to Chrome. I realize this is more work for you, but moving this complexity to the client side is justified, in my view, by not shipping this logic to millions of desktop Chrome users. On our side, we're looking to eventually extract the client CDP library that headless uses internally and make it easier to re-use. Your other option is using Puppeteer to launch Chrome and talk CDP.

Best regards,
Andrey.
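
For readers wondering what the client side of the --remote-debugging-pipe route might look like from C++, here is a minimal sketch that only builds one Page.printToPDF request. It assumes that messages on the pipe are JSON objects terminated by a NUL byte, and that the client writes to file descriptor 3 and reads from file descriptor 4; check both against the Chrome version in use. JsonEscape, BuildPrintToPdfMessage and FrameForPipe are hypothetical helper names, not part of any Chromium library. A real client would also need to launch Chrome, attach to a page target, and include the resulting session id in the message, none of which is shown here.

```cpp
#include <cstdio>
#include <string>

// Escapes a string so it can be embedded in a JSON string literal.
std::string JsonEscape(const std::string& s) {
  std::string out;
  for (unsigned char c : s) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c < 0x20) {
          char buf[7];  // "\uXXXX" plus terminating NUL
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out.push_back(static_cast<char>(c));
        }
    }
  }
  return out;
}

// Builds a Page.printToPDF request with a few of the protocol's parameters.
std::string BuildPrintToPdfMessage(int id, bool landscape,
                                   const std::string& header_template,
                                   const std::string& footer_template) {
  std::string msg = "{\"id\":" + std::to_string(id) +
                    ",\"method\":\"Page.printToPDF\",\"params\":{";
  msg += "\"landscape\":";
  msg += landscape ? "true" : "false";
  msg += ",\"displayHeaderFooter\":true";
  msg += ",\"headerTemplate\":\"" + JsonEscape(header_template) + "\"";
  msg += ",\"footerTemplate\":\"" + JsonEscape(footer_template) + "\"";
  msg += "}}";
  return msg;
}

// Appends the NUL terminator assumed by this sketch for pipe framing.
std::string FrameForPipe(const std::string& json) {
  std::string framed = json;
  framed.push_back('\0');
  return framed;
}
```

Escaping the templates matters because they are HTML fragments that usually contain quotes and line breaks. The reply to a successful request carries the PDF as base64-encoded data, which the client then has to decode and write to disk.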