Since my local patch to enhance --print-to-pdf to support a custom page header and footer is unlikely to be incorporated into Chromium, I moved on to explore other solutions. I started work on two potential solutions:
1. Use of DevTools and JavaScript
I found the chrome-headless-render-pdf tool, which uses CDP to render pages to PDF but lacked options to customize the page header and footer.
See
https://github.com/Szpadel/chrome-headless-render-pdf
I enhanced this tool to support custom headers and footers, and it works well. The one potential issue I have observed is zombie node.exe processes left behind with no chrome-headless-render-pdf connected and no Google Chrome running.
Performance of this solution is likely lower than that of the --print-to-pdf option, but acceptable, since the Chrome browser needs to encode the PDF and send it to chrome-headless-render-pdf, which decodes it before writing the file.
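For reference, the CDP side of approach 1 boils down to one Page.printToPDF command whose params carry displayHeaderFooter, headerTemplate and footerTemplate (the templates can use the classes date, title, url, pageNumber and totalPages). The result returns the PDF base64-encoded in its data field, which is the encode/decode step mentioned above. The C++ sketch below only builds the command string; JsonEscape and BuildPrintToPdfCommand are hypothetical helper names, and it assumes the command goes to a page session (the sessionId field used after Target.attachToTarget with flatten: true). The templates must be JSON-escaped because their quotes and line breaks cannot appear raw inside a JSON string.

```cpp
#include <cstdio>
#include <string>

// Escapes text for use inside a JSON string literal (RFC 8259): quotes,
// backslashes and control characters such as newlines must not appear raw.
std::string JsonEscape(const std::string& in) {
  std::string out;
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {
          char buf[7];  // "\u" + 4 hex digits + terminating NUL
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Hypothetical helper: builds a Page.printToPDF command with custom
// header/footer templates. An empty session_id omits the sessionId field.
std::string BuildPrintToPdfCommand(int id, const std::string& session_id,
                                   const std::string& header_html,
                                   const std::string& footer_html) {
  std::string cmd = "{\"id\":" + std::to_string(id);
  if (!session_id.empty()) {
    cmd += ",\"sessionId\":\"" + JsonEscape(session_id) + "\"";
  }
  cmd += ",\"method\":\"Page.printToPDF\",\"params\":{"
         "\"displayHeaderFooter\":true,\"printBackground\":true,"
         "\"headerTemplate\":\"" + JsonEscape(header_html) +
         "\",\"footerTemplate\":\"" + JsonEscape(footer_html) + "\"}}";
  return cmd;
}
```

Other Page.printToPDF parameters (landscape, margins, paper size) would be added to the params object the same way.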
2. Use of --remote-debugging-pipe, as recommended by Andrey
I started to experiment with that solution, but I am facing issues and need help since I am new to the vast Chromium framework. I am working on Windows 10.
I found an example of how to create a child process with redirected input and output:
https://docs.microsoft.com/en-us/windows/win32/ProcThread/creating-a-child-process-with-redirected-input-and-output
I created simple console app projects for the parent and child processes, and it seems to work: the parent sends text messages to the child, which echoes them back to the parent.
I replaced child.exe with Google Chrome's chrome.exe, but it doesn't quite work yet. Chrome terminates even without the parent process sending any JSON message, and the log file it creates doesn't tell me much. I am creating the chrome process with the following options:
chrome.exe --headless --user-data-dir=c:\\Temp\\user --data-path-dir=c:\\Temp\\data --disable-gpu --remote-debugging-pipe --enable-logging --v=99
I also experimented with other options such as:
"--disable-translate --disable-extensions --disable-background-networking --safebrowsing-disable-auto-update --log-file=C:/Temp/log.txt"
"--disable-default-apps --hide-scrollbars --metrics-recording-only --mute-audio --no-first-run --webrtc-event-logging --single-process"
" --js-flags=\"--trace-opt --trace-deopt --trace-bailout\""
Depending on the options used, the parent process prints messages from the chrome process to the console, so the pipe from Chrome to the parent works. I am not sure whether messages from the parent are received by the chrome process.
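A possible explanation, which I have not verified on Windows: as far as I can tell from Chromium's content/browser/devtools/devtools_pipe_handler.cc, --remote-debugging-pipe does not use stdin/stdout at all. Chrome reads CDP messages from C file descriptor 3 and writes replies to descriptor 4, and on Windows it turns those descriptors into handles with _get_osfhandle. The Microsoft redirection example only sets hStdInput/hStdOutput/hStdError (descriptors 0 to 2), so what appears on the parent's console is most likely Chrome's log output rather than CDP replies. As I understand it, Node.js (and therefore Puppeteer) passes extra descriptors through an undocumented MSVC CRT convention: STARTUPINFO.cbReserved2/lpReserved2 point at a block holding an int count, one flag byte per descriptor, then one HANDLE per descriptor, packed without padding. The sketch below only builds that block, following the layout and flag values in libuv's process-stdio.c; BuildCrtFdBlock and the flag constants are my own names, intptr_t stands in for HANDLE so it compiles anywhere, and the layout is an assumption, not a documented API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Flag bits the MSVC CRT stores per inherited descriptor (values as used in
// libuv's process-stdio.c; treat them as an assumption).
constexpr unsigned char kCrtFOpen = 0x01;  // descriptor is open
constexpr unsigned char kCrtFPipe = 0x08;  // descriptor refers to a pipe

// Hypothetical helper: builds the block for STARTUPINFO::lpReserved2.
// Assumed layout: int count; unsigned char flags[count]; HANDLE handles[count];
// packed without padding. intptr_t stands in for HANDLE.
std::vector<unsigned char> BuildCrtFdBlock(
    const std::vector<std::intptr_t>& handles,
    const std::vector<unsigned char>& flags) {
  assert(handles.size() == flags.size());
  const int count = static_cast<int>(handles.size());
  std::vector<unsigned char> block(sizeof(int) + count +
                                   count * sizeof(std::intptr_t));
  std::memcpy(block.data(), &count, sizeof(int));
  for (int i = 0; i < count; ++i) {
    block[sizeof(int) + i] = flags[i];
    // Handles start right after the flag bytes, so they are usually
    // unaligned; memcpy avoids an unaligned pointer store.
    std::memcpy(block.data() + sizeof(int) + count + i * sizeof(std::intptr_t),
                &handles[i], sizeof(std::intptr_t));
  }
  return block;
}
```

On Windows one would pass five entries (0 to 2 for the standard handles, 3 for the inheritable read end of the parent-to-Chrome pipe, 4 for the inheritable write end of the Chrome-to-parent pipe, both marked kCrtFOpen | kCrtFPipe), set si.cbReserved2 to the block size and si.lpReserved2 to its data, and call CreateProcess with bInheritHandles set to TRUE.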
I am sending the following zero-terminated test JSON message from the parent to Chrome: {"id": 0, "method": "Target.getTargets"}, but I am not getting any response to this request. This is based on the post:
https://github.com/cyrus-and/chrome-remote-interface/issues/381
I have to admit that I have not yet looked at the JSON protocol and messaging over the pipe. It seems that both binary and ASCII protocols are supported on the pipe, but I am not sure how Chrome and the client agree on the protocol.
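On how the two sides agree on the protocol: as far as I know, the pipe carries JSON by default, with each message (in both directions) terminated by a single NUL byte, and binary CBOR is selected explicitly by launching Chrome with --remote-debugging-pipe=cbor. Since ReadFile can return part of a message or several messages at once, the reader has to buffer bytes and split on '\0'. A minimal portable sketch follows; FrameCdpMessage and CdpFrameSplitter are hypothetical names, and the actual ReadFile/WriteFile calls on the pipe handles are left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends the NUL byte that terminates one message in the default (JSON)
// mode of --remote-debugging-pipe.
std::string FrameCdpMessage(const std::string& json) {
  std::string framed = json;
  framed.push_back('\0');
  return framed;
}

// Collects raw bytes read from Chrome's output pipe and returns every
// complete NUL-terminated message; an incomplete tail stays buffered
// until the next Feed() call.
class CdpFrameSplitter {
 public:
  std::vector<std::string> Feed(const char* data, std::size_t size) {
    buffer_.append(data, size);
    std::vector<std::string> messages;
    std::size_t start = 0;
    std::size_t nul;
    while ((nul = buffer_.find('\0', start)) != std::string::npos) {
      messages.push_back(buffer_.substr(start, nul - start));
      start = nul + 1;
    }
    buffer_.erase(0, start);
    return messages;
  }

 private:
  std::string buffer_;
};
```

Keeping the framing separate from the Win32 I/O makes it easy to test without a running Chrome.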
I looked for a working example that uses the pipe to communicate with the Chrome browser but didn't find any, and didn't find much general information either.
https://groups.google.com/a/chromium.org/g/headless-dev/c/n1cLc4qSfrM
I have a Chromium build on my local machine, and I get more log messages when the parent creates the Chromium-based chrome.exe process, but the messages are not easy to understand. I would like to run Chrome under the VS debugger, but there seems to be no way to start VS from the program. To attach the debugger before Chrome terminates, I need to know where the messages are received in the code so I can add a Sleep there to give me time to attach the VS debugger.
Appreciate any help.
Best Regards,
Zbigniew
On Thursday, January 28, 2021 at 11:27:21 AM UTC-6 ziggy wrote:
Hi Andrey,
I appreciate your quick and informative response. The command line option to print to PDF is widely used, so I suspect it will be challenging to drop the feature. If I had to guess, the feature will likely stay. My proposal was to add just one command line option to limit the number of options and keep it stable.
The command line option is very easy to use, which is why many regular users use it instead of opening a web page and selecting the "print" option, or instead of using Puppeteer. I could be wrong, but I suspect that using the command line option minimizes potential state issues that might be difficult for regular users to resolve.
Eric Seckler suggested that I submit a patch, but I would hesitate without a commitment that the feature will be integrated into Chromium and eventually into Google Chrome. I hope you reconsider and enhance --print-to-pdf to synchronize it with Page.printToPDF, to benefit many users until you decide to deprecate the feature at some point, which I believe will be challenging.
In the meantime, I could investigate how to integrate a subset of CDP into Mbox Viewer, which is a C++ application. Do you have suggestions or examples of where I can find C++ bindings for CDP and/or its API?
Best Regards,
Zbigniew
Hi Zbigniew,
Apologies for the late reply. We're really looking to deprecate and remove much of the command-line functionality to control the rendering of the page available in headless. From my point of view, there's a fundamental limit on the flexibility that the command line can give you, and ultimately one should use a proper API to talk to Chrome -- which, in this case, is the Chrome DevTools protocol. This is also what headless does internally, as you've already seen, so you should be able to borrow the code for configuring printing, along with the generated CDP bindings, and then use "chrome --remote-debugging-pipe" to talk DevTools protocol to chrome. I realize this is more work for you, but moving this complexity to the client side is justified, in my view, by not shipping this logic to millions of desktop chrome users. On our side, we're looking to eventually extract the client CDP library that headless uses internally and make it easier to re-use. Your other option is using Puppeteer to launch chrome and talk CDP.
Best regards,
Andrey.
I am a developer on the Mbox Mail Viewer project, a free Windows application available on GitHub and SourceForge to view mbox archive files such as Google Takeout archives. Users of Mbox Mail Viewer need the ability to print multiple mails to PDF without user interaction. Currently, Mbox Viewer relies on the standard Google Chrome browser to print mails to PDF files via the command line option --print-to-pdf. However, many users complain about the lack of ability to customize the PDF output, such as landscape orientation, header and footer, etc. Browsing the Internet, I see that many other users have raised the same issue in the past. The standard reply to this issue is to use DevTools. For my application and many other simple applications, using DevTools is overkill, introducing unnecessary dependencies and risk. I understand that using DevTools has many advantages for some deployments, but for simple use cases it is an unnecessary burden. It is much simpler to rely on the standard browser, which is kept up to date automatically.
I am not exactly sure why the --print-to-pdf option has not been enhanced by now and synchronized with Page.printToPDF. In my humble opinion, --print-to-pdf should be either enhanced or dropped, possibly including from DevTools. If it is not dropped, I don't see a good reason why it should not be enhanced, since 98+% of the code already exists. I did some prototyping (see below) to see what code changes might be needed.
I added the new option --print-to-pdf-page-config="PrintPageConfigFile" to support customization of the PDF output. See the example JSON file below.
I am hoping the OWNERS will seriously consider enhancing the Chromium browser, and ultimately Google Chrome, to support customization of PDF output via a command line option. The implementation effort is fairly small, so I hope that after many years in limbo the feature will be prioritized, implemented, and released in Google Chrome.
Below I describe the work I have done and the issues I faced.
Thank You,
Zbigniew
+++++++++++++++++++++++++++++++++++
Building Chromium on Windows 10
+++++++++++++++++++++++++++++++++++
The "fetch --no-history chromium" command was failing consistently at libdav1d, reporting:
[0:09:44] Cloning into 'F:\Chrom\chromium\src\third_party\dav1d\_gclient_libdav1d_gue6lli1'...
[0:09:44] error: RPC failed; HTTP 400 curl 22 The requested URL returned error: 400
I tried 6 times without success. Each time I retried fetch from scratch, I had to delete 400,000+ files!
I didn't see any information on the Chromium site that would help to recover from similar failures.
After the last fetch failure, in desperation, I tried "gclient sync -D", and to my surprise it worked. I am not sure this is a proper workaround, but it seems to work.
Running "git status" showed two leftover directories:
F:\Chrom\chromium\src\third_party\dav1d
_gclient_gittmp_libdav1dugn6rfaj
_gclient_libdav1d_fwaisxxv
which I deleted.
Next, to reduce file system overhead, I excluded the build directories from Windows Defender antivirus scanning and ran
gn gen out/Default
autoninja -C out\Default chrome
to build the browser. It took 6 hours to complete the build on my laptop:
HP ZBook 15 G5, 4 physical cores/8 logical cores, 16 GB RAM, 2.3 GHz, all SSD drives.
+++++++++++++++++++++++++++++++++++++++++++++++
Editing and Debugging under Visual Studio 2019 IDE
+++++++++++++++++++++++++++++++++++++++++++++++++
I was going to use VS 2019 to prototype and debug enhancements to --print-to-pdf, so I generated VS project files as follows:
gn gen --ide=vs out\Default
That resulted in over 9000 project files. VS could not handle such a large number of projects reliably with 16 GB of RAM. I upgraded my machine to 48 GB and reduced the number of generated projects to around 4000 by running:
gn gen --ide=vs --filters=//chrome;//headless out\Default
Having more RAM and fewer projects really helped to make VS fairly stable (but not completely).
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adding command line option to support all Page.printToPDF options.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I added the new option --print-to-pdf-page-config="PrintPageConfigFile" to support customization of PDF output.
As I suspected, adding such an option is fairly straightforward, since 99+% of the code already exists. I made the following changes to 3 files:
+++++++++ headless_shell_switches.h, added
HEADLESS_EXPORT extern const char kPrintToPDFPageConfig[];
+++++++++++ headless_shell_switches.cc, added
const char kPrintToPDFPageConfig[] = "print-to-pdf-page-config";
++++++++++++++++++ headless_shell.cc
std::unique_ptr<headless::page::PrintToPDFParams>
ReadPageConfigParams(const base::FilePath& pdf_page_config_file_name) {
  .........
  // uses base::JSONReader::Read(json_text); for parsing json configuration
}

void HeadlessShell::PrintToPDF() {
  DCHECK_CURRENTLY_ON(content::BrowserThread::UI);
  // Begin of added code
  if (base::CommandLine::ForCurrentProcess()->HasSwitch(
          switches::kPrintToPDFPageConfig)) {
    base::FilePath pdf_page_config_file_name =
        base::CommandLine::ForCurrentProcess()->GetSwitchValuePath(
            switches::kPrintToPDFPageConfig);
    // ReadPageConfigParams(pdf_page_config_file_name) reads the parameters
    // from the JSON file, then sets up and returns PrintToPDFParams.
    devtools_client_->GetPage()->GetExperimental()->PrintToPDF(
        ReadPageConfigParams(pdf_page_config_file_name),
        base::BindOnce(&HeadlessShell::OnPDFCreated,
                       weak_factory_.GetWeakPtr()));
  } else {
    // End of added code: original behavior when no page config file is given.
    bool display_header_footer =
        !base::CommandLine::ForCurrentProcess()->HasSwitch(
            switches::kPrintToPDFNoHeader);
    devtools_client_->GetPage()->GetExperimental()->PrintToPDF(
        page::PrintToPDFParams::Builder()
            .SetDisplayHeaderFooter(display_header_footer)
            .SetPrintBackground(true)
            .SetPreferCSSPageSize(true)
            .Build(),
        base::BindOnce(&HeadlessShell::OnPDFCreated,
                       weak_factory_.GetWeakPtr()));
  }
}
+++++++++ json print page configuration file
It took me several iterations to figure out how to configure a footerTemplate and headerTemplate that work and that handle overflow of the user-provided text shown in the middle of the line. DevTools users may have better examples; I would appreciate it if you could post and share them.
NOTE: the footer and header seem to be missing when printing some web pages, except for the footer on the last printed page. Try printing https://sourceforge.net/ .
{
"landscape": false, /* default = false */
"displayHeaderFooter": true, /* default = true */
"printBackground": true, /* default = true */
"scale": 1.0, /* default = 1.0 */
"paperWidth": 8.5, /* default = 8.5 inches */
"paperHeight": 11.0, /* default = 11.0 inches */
"marginTop": 0.4, /* default = 0.4 inches */
"marginBottom": 0.4, /* default = 0.4 inches */
"marginLeft": 0.4, /* default = 0.4 inches */
"marginRight": 0.4, /* default = 0.4 inches */
"pageRanges": "", /* default = "" empty string to print all pages */
"ignoreInvalidPageRanges": true,/* default = true */
"preferCSSPageSize": true, /* default = true */
"footerTemplate": "<div style='width:15%;margin-left:0.5cm;text-align:left;font-size:7px;'>
<span class='date'></span></div>
<div style='width:70%;direction:rtl;white-space:nowrap;overflow:hidden;text-overflow:clip;text-align:center;font-size:7px;'>
<span>CHROMIUM HEADLESS BROWSER FOOTER 1 CHROMIUM HEADLESS BROWSER FOOTER 2 CHROMIUM HEADLESS BROWSER FOOTER 3 CHROMIUM HEADLESS BROWSER FOOTER 4 CHROMIUM HEADLESS BROWSER FOOTER 5 </span></div>
<div style='width:15%;margin-right:0.5cm;text-align:right;font-size:7px;'>
<span class='pageNumber'></span> of <span class='totalPages'></span></div>",
"headerTemplate": "<div style='width:15%;margin-left:0.5cm;text-align:left;font-size:7px;'>
<span class='date'></span></div>
<div style='width:70%;direction:rtl;white-space:nowrap;overflow:hidden;text-overflow:clip;text-align:center;font-size:7px;'>
<span>CHROMIUM HEADLESS BROWSER HEADER 1 CHROMIUM HEADLESS BROWSER HEADER 2 CHROMIUM HEADLESS BROWSER HEADER 3 CHROMIUM HEADLESS BROWSER HEADER 4 CHROMIUM HEADLESS BROWSER HEADER 5 </span></div>
<div style='width:15%;margin-right:0.5cm;text-align:right;font-size:7px;'>
<span class='pageNumber'></span> of <span class='totalPages'></span></div>"
}
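Note that the configuration file above is not strict JSON (RFC 8259): it contains /* ... */ comments and string values that span several lines. I have not checked which of these base::JSONReader accepts with its default options, so a defensive approach is to strip the comments before parsing and to keep each template on one line (or write line breaks as \n). The sketch below shows only the comment-stripping step; StripJsonComments is a hypothetical helper, and it deliberately leaves everything inside string literals untouched so that URLs and template text survive.

```cpp
#include <cstddef>
#include <string>

// Removes /* ... */ and // ... comments from JSON-like text. Text inside
// string literals (including escaped quotes) is copied through unchanged.
std::string StripJsonComments(const std::string& in) {
  std::string out;
  bool in_string = false;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const char c = in[i];
    if (in_string) {
      out += c;
      if (c == '\\' && i + 1 < in.size()) {
        out += in[++i];  // keep the escaped character verbatim
      } else if (c == '"') {
        in_string = false;
      }
    } else if (c == '"') {
      in_string = true;
      out += c;
    } else if (c == '/' && i + 1 < in.size() && in[i + 1] == '*') {
      const std::size_t end = in.find("*/", i + 2);
      // Jump to the closing '/'; the loop increment then moves past it.
      i = (end == std::string::npos) ? in.size() : end + 1;
    } else if (c == '/' && i + 1 < in.size() && in[i + 1] == '/') {
      const std::size_t end = in.find('\n', i + 2);
      // Stop just before the newline so the newline itself is kept.
      i = (end == std::string::npos) ? in.size() : end - 1;
    } else {
      out += c;
    }
  }
  return out;
}
```

The stripped text can then be handed to whichever JSON parser the application already uses.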
--
Chromium Developers mailing list: chromi...@chromium.org
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/7229ae69-6263-4842-98b3-6459736304b1n%40chromium.org.