I'm trying to port my PhantomJS script to Headless Chrome to test for improvements in CPU, RAM, network usage and speed.
My script loads a page and blocks every request (HTML, JS, CSS, GIF, MP4, etc.) except the allowed ones. At the moment these are regex URL checks like "var match4 = ((/tags.tiqcdn.com\/utag\/electriccompany/g).test(requestUrl));" that are defined in the script and depend on the domain I want to load (example.com has different rules than houseworkstuff.es).
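To make the filtering concrete, here is a minimal sketch of that kind of per-domain allow-list in plain JavaScript. It is not my real rule set: the names allowRules and isAllowed are made up for illustration, and the pattern for houseworkstuff.es is a placeholder. It only shows the decision logic and does not depend on PhantomJS or Chrome.

```javascript
// Hypothetical per-domain allow-list. Only the first pattern comes from the
// post; the second one is a placeholder for illustration.
const allowRules = new Map([
  ["example.com", [/tags\.tiqcdn\.com\/utag\/electriccompany/]],
  ["houseworkstuff.es", [/tags\.tiqcdn\.com\/utag\/placeholder-account/]],
]);

// Returns true if requestUrl matches any allowed pattern for the domain.
// Everything else (including unknown domains) is treated as blocked.
// The patterns are stored without the "g" flag, because a reused global
// regex keeps state in lastIndex between .test() calls.
function isAllowed(domain, requestUrl) {
  const rules = allowRules.get(domain) || [];
  return rules.some((pattern) => pattern.test(requestUrl));
}
```

Whatever hook Headless Chrome offers for intercepting requests would then only need to call isAllowed(domain, requestUrl) and abort the request when it returns false.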
If the final request URL is found (it is always the same URL), I take the parameters of that URL, convert them to JSON and log them together with the status code and URL name. Then the script moves on to the next page in the list (up to 1.5 million).
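The logging step described above can be sketched roughly as follows, assuming a JavaScript runtime with the standard URL class and Object.fromEntries (for example Node.js 12 or newer). The function name paramsToJson and the shape of the logged object are illustrative choices, not what my script literally does.

```javascript
// Turns the query parameters of the final request URL into a JSON string,
// together with the status code and the URL without its query string.
// Note: if a parameter name occurs more than once, the last value wins.
function paramsToJson(finalUrl, statusCode) {
  const url = new URL(finalUrl);
  const params = Object.fromEntries(url.searchParams.entries());
  return JSON.stringify({
    url: url.origin + url.pathname,
    status: statusCode,
    params: params,
  });
}

// Example call with a made-up URL:
console.log(paramsToJson("https://collect.example.com/final?a=1&b=two", 200));
```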
I haven't found a way to do that filtering/blocking with Headless Chrome. Are the APIs not there yet, or did I just overlook them?
And is there a parameter to disable the loading of images/graphics? There was a thread from 10.03.2017 discussing that, and it seemed the APIs weren't ready then. I'm sick of the huge memory leak bug in PhantomJS that crashes my script when I set page.settings.loadImages = false;
And finally, how do I get the request URLs for JavaScript that is loaded by other JavaScript? I can list the request URLs as in the example here (https://github.com/cyrus-and/chrome-remote-interface), but that only shows me the requests from the page's source code, not the requests that the external JavaScript on that site makes.
Best regards
M