any possible way to block all ads in headless mode?


headless-man

Dec 16, 2017, 10:31:53 PM
to headless-dev
Since extensions are not supported in headless mode, is there any way to block all ads, the way the uBlock extension does?

Isaac Dawson

Dec 17, 2017, 6:22:05 AM
to headless-man, headless-dev
Maybe look at the source of ublock https://github.com/gorhill/uBlock and use the devtools commands for blocking requests?
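
One way this suggestion might be sketched, assuming the DevTools protocol command Network.setBlockedURLs (which takes a list of URL patterns with `*` wildcards) and a `client` object exposing send(method, params), such as a Puppeteer CDP session. The helper names toBlockedUrlPatterns and blockHosts are made up for illustration, and the pattern shape `*://host/*` is an assumption about what works well, not something taken from uBlock:

```javascript
// Turn plain hostnames into wildcard URL patterns.
// The '*://host/*' shape is an illustrative choice, not an official format.
function toBlockedUrlPatterns(hostnames) {
    return hostnames
        .map(h => h.trim())
        .filter(h => h.length > 0)
        .map(h => `*://${h}/*`);
}

// `client` is assumed to be any object with send(method, params) that
// returns a Promise, for example a CDP session obtained from Puppeteer.
async function blockHosts(client, hostnames) {
    const urls = toBlockedUrlPatterns(hostnames);
    await client.send('Network.enable');
    await client.send('Network.setBlockedURLs', { urls });
    return urls;
}
```

With Puppeteer, the session would typically come from something like `await page.target().createCDPSession()`, depending on the Puppeteer version. Note that this blocks by URL pattern only; it does not reproduce uBlock's cosmetic filtering.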


markus...@gmail.com

Jan 24, 2018, 9:12:42 PM
to headless-dev
I had the same problem when using Puppeteer in headless mode. I ended up using the hosts file available from:


For Puppeteer, I read in this hosts file:

const fs = require('fs');

// Read the hosts file and collect every domain mapped to 0.0.0.0.
const hostFile = fs.readFileSync('hosts.txt', 'utf8').split('\n');
const hosts = {};
for (const line of hostFile) {
    // Split on any whitespace so tabs and a trailing '\r' are handled.
    const frags = line.trim().split(/\s+/);
    if (frags.length > 1 && frags[0] === '0.0.0.0') {
        hosts[frags[1]] = true;
    }
}

When loading a page I then filter out requests for these domains (and optionally images):

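    // Interception must be enabled (if it is not already enabled elsewhere),
    // otherwise request.abort() and request.continue() throw.
    await page.setRequestInterception(true);
    page.on('request', request => {
        // task.input holds this scraper's own per-job options.
        let domain = null;
        if (task.input.blockads) {
            // 'https://host/path' splits into ['https:', '', 'host', 'path'].
            const frags = request.url().split('/');
            if (frags.length > 2) {
                domain = frags[2];
            }
        }
        if ((task.input.blockads && hosts[domain] === true) || (!task.input.includephotos && request.resourceType() === 'image')) {
            request.abort();
        }
        else {
            request.continue();
        }
    });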

This solution hugely improved the speed of our scraper.
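
For readers who want to try the idea without a browser, here is a minimal sketch of the same two steps as pure functions. It assumes the hosts file uses lines of the form `0.0.0.0 some.domain`, as the parsing code above implies. The names parseHosts and shouldBlock and the `options` object are hypothetical stand-ins for the poster's task.input, and using `new URL(...).hostname` instead of splitting on `/` is a swapped-in choice so that a port in the URL does not defeat the lookup:

```javascript
// Parse hosts-file text into a Set of blocked hostnames.
// Only lines of the form "0.0.0.0 <hostname>" are used; comments,
// blank lines, and other IP mappings are ignored.
function parseHosts(text) {
    const blocked = new Set();
    for (const line of text.split('\n')) {
        const frags = line.trim().split(/\s+/);
        if (frags.length > 1 && frags[0] === '0.0.0.0') {
            blocked.add(frags[1].toLowerCase());
        }
    }
    return blocked;
}

// Decide whether a request should be aborted.
// `options` is a hypothetical stand-in for task.input in the post above.
function shouldBlock(url, resourceType, blocked, options) {
    if (!options.includephotos && resourceType === 'image') {
        return true;
    }
    if (!options.blockads) {
        return false;
    }
    let hostname;
    try {
        // hostname excludes the port, unlike url.split('/')[2].
        hostname = new URL(url).hostname;
    } catch (err) {
        return false; // Not an absolute URL; let it through.
    }
    return blocked.has(hostname);
}
```

Inside the request handler this would be used roughly as `shouldBlock(request.url(), request.resourceType(), blocked, options) ? request.abort() : request.continue()`. Like the original, it matches exact hostnames only, so a subdomain is blocked only if the hosts file lists it.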

nkic...@gmail.com

Dec 16, 2018, 7:05:50 PM
to headless-dev, markus...@gmail.com
That's quite a brilliant way to block ads in Puppeteer.
By the way, what is task.input.blockads?
Where do task, input, and blockads come from?

Thank you in advance.

oluwafe...@gmail.com

Sep 22, 2019, 12:26:29 PM
to headless-dev


On Sunday, 17 December 2017 04:31:53 UTC+1, headless-man wrote:
since extension not supported in headless mode, any possible way to block all ads? like ublock extension which block all ads

I have the same question.

remi....@gmail.com

Feb 19, 2020, 2:30:10 PM
to headless-dev

Hey there,


If anyone is still looking for a well-maintained and robust ad-blocking solution for Puppeteer-based projects, I maintain one at https://www.npmjs.com/package/@cliqz/adblocker-puppeteer (source: https://github.com/cliqz-oss/adblocker/tree/master/packages/adblocker-puppeteer#usage ). It only takes a few lines to set up, it’s written in pure JavaScript (so no compilation step is required), and it’s really efficient.


Best,

Rémi

jellybelly.j...@gmail.com

Jul 24, 2020, 7:17:21 PM
to headless-dev, remi....@gmail.com
Hi Rémi, the adblocker you mentioned is very helpful. Does it block ads on YouTube?

remi

Jul 26, 2020, 5:15:21 AM
to jellybelly.j...@gmail.com, headless-dev

Hi there,

Yes, the adblocker is able to block YouTube ads. It might be the only one for Puppeteer that can do so (unless you load an adblocker extension, but I don't think that is possible in headless mode).

Piotr Nowak

Oct 2, 2020, 3:40:08 PM
to headless-dev, nkic...@gmail.com, markus...@gmail.com
A slightly late response, but I just stumbled upon this thread.

I believe task.input.blockads and task.input.includephotos are just flags, set to true and false respectively.

I shortened it a bit, but the version below works for me.

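    // Plain flags replace the task.input options from the earlier post.
    const blockads = true;
    const includephotos = false;

    // Interception must be enabled (if it is not already enabled elsewhere),
    // otherwise request.abort() and request.continue() throw.
    await page.setRequestInterception(true);
    page.on('request', request => {
        let domain = null;
        if (blockads) {
            // 'https://host/path' splits into ['https:', '', 'host', 'path'].
            const frags = request.url().split('/');
            if (frags.length > 2) {
                domain = frags[2];
            }
        }
        // `hosts` is the lookup object built from the hosts file earlier in the thread.
        if ((blockads && hosts[domain] === true) || (!includephotos && request.resourceType() === 'image')) {
            request.abort();
        }
        else {
            request.continue();
        }
    });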

mertcan mıgırdag

Feb 25, 2021, 11:25:36 PM
to headless-dev, Piotr Nowak, nkic...@gmail.com, markus...@gmail.com
?

On Friday, October 2, 2020 at 22:40:08 UTC+3, Piotr Nowak wrote: