Load pages without viewing

Kevin Quinn

Jul 13, 2022, 8:23:57 AM
to Chromium Extensions
Hello, I'm new to Chrome extensions, and I've searched a lot of documentation but can't find an answer to this. I would like to get a page without opening it in a tab or viewing it at all. For example, if a visitor goes to a specific page, I want to view the contents of that page but, at the same time, also visit three more pages and scrape some data from them, without viewing them and without opening a new tab or iframe.

One of the pages I want to get makes several fetch requests while it loads. I just want to get all the contents without viewing the page. More specifically, I would like to get the JSON returned by one of those fetch requests. I can see it when I load the full page, but I can't get this to work using fetch. Is there another way?

Deco

Jul 13, 2022, 8:27:37 AM
to Kevin Quinn, Chromium Extensions
I very much doubt this would be allowed, as it would be considered background harvesting (technical implementation aside): the user isn't on the website you are fetching from. You should look through the Chrome Web Store (CWS) policies on this before attempting it.

Thanks,
Decklin 

Kevin Quinn

Jul 13, 2022, 8:46:13 AM
to Chromium Extensions, decklin...@gmail.com, Chromium Extensions, Kevin Quinn
Thanks, Decklin. I have a client who wants similar functionality. I thought I could get the data through an API, but the sites this client wants me to pull from don't have public APIs, so I'm trying to find a workaround. I considered building a backend API, calling an endpoint from the extension, and scraping the sites on the backend, but that would take too long for users to get the data, and I don't want to scrape it that way.

Deco

Jul 13, 2022, 9:02:56 AM
to Kevin Quinn, Chromium Extensions
If the extension is for a private client, then the CWS policies can be set aside for this (as I presume it won't be published publicly on the Chrome Web Store). One approach is to use JavaScript listeners to detect when a given hostname is present in the omnibox, then open several new tabs, which lets the fetch script declared in the manifest run and scrape the relevant data. You can use getCurrentTab() to identify the host origin and pass in the relevant opener scripts. This shouldn't require any additional changes to the main declared functions, as the manifest should execute them anyway.

This is admittedly a workaround rather than anything the Chrome documentation describes. I don't recall any API that allows for retrieving the data without violating CORS; for details, you can read this, which explains why your current approach is not fetching the relevant data from the other hosts.
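A minimal sketch of the tab-based workaround Decklin describes, under several assumptions: a Manifest V3 service worker, the `tabs` permission (so `tab.url` is readable in the listener), and a content script declared in the manifest for the extra pages that does the actual scraping. `WATCHED_HOST`, `EXTRA_URLS`, `openedByExtension`, and `isWatchedUrl` are hypothetical names. The listener receives the tab directly, so a separate `getCurrentTab()` call isn't needed here. Tabs opened with `active: false` load without being switched to, but they still appear in the tab strip.

```javascript
// background.js (Manifest V3 service worker), a sketch.
// Hypothetical trigger host and extra pages to open in the background.
const WATCHED_HOST = "www.example.com";
const EXTRA_URLS = [
  "https://other.example.net/page-1",
  "https://other.example.net/page-2",
  "https://other.example.net/page-3",
];

// IDs of tabs this extension opened, so they never re-trigger the listener.
// Note: this Set is lost whenever the service worker is shut down.
const openedByExtension = new Set();

// True if `url` parses as a URL whose hostname is the watched host.
function isWatchedUrl(url) {
  try {
    return new URL(url).hostname === WATCHED_HOST;
  } catch {
    return false; // undefined, empty, or otherwise unparsable URLs
  }
}

if (typeof chrome !== "undefined" && chrome.tabs?.onUpdated) {
  chrome.tabs.onUpdated.addListener(async (tabId, changeInfo, tab) => {
    // React only when a user tab finishes loading a watched page.
    if (changeInfo.status !== "complete" || openedByExtension.has(tabId)) return;
    if (!isWatchedUrl(tab.url)) return;
    for (const url of EXTRA_URLS) {
      // active: false opens the tab without focusing it; the manifest's
      // content script for these URLs is assumed to do the scraping.
      const extra = await chrome.tabs.create({ url, active: false });
      openedByExtension.add(extra.id);
    }
  });
}
```

In practice the content script would likely report its results with `chrome.runtime.sendMessage`, after which the background script could close the extra tab with `chrome.tabs.remove`.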

Cheers,
Decklin

Kevin Quinn

Jul 13, 2022, 9:15:27 AM
to Deco, Chromium Extensions
I think this might work. Is this actually against the CWS policies, though? I'm building this for a client, but I don't know whether they plan to put it on the CWS or how they plan to distribute it. I just want to be able to let them know if there might be an issue.

Thanks for all your help. You cleared up a bunch of issues I had.

Deco

Jul 13, 2022, 9:22:04 AM
to Kevin Quinn, Chromium Extensions
Good question. CWS policies exist mostly for end users (e.g. public-facing extensions), whereas privately distributed extensions are not subject to the same policies. I assume this client is a corporation of some kind; if so, you should be fine. Just ensure that they distribute it through private means only (this is normally how enterprise clients deploy things anyway) and not publicly. CWS documentation on this form of deployment can be found here.

No worries, best of luck with building it.

Cheers,
Decklin

Kevin Quinn

Jul 13, 2022, 9:24:28 AM
to Deco, Chromium Extensions
Thanks again, Decklin. You helped a lot.

Deco

Jul 13, 2022, 9:27:20 AM
to Kevin Quinn, Chromium Extensions
No worries, best of luck with the extension, Kevin.

wOxxOm

Jul 13, 2022, 10:15:22 AM
to Chromium Extensions, decklin...@gmail.com, Chromium Extensions, kevinqu...@gmail.com
> I very much doubt this would be allowed

There is no such restriction, either technically or in the CWS policies. This use case is very popular, and lots of extensions do it. To scrape pages that don't build their content via JavaScript, you can download their URL directly using `fetch`. To scrape pages that use JavaScript to build their content, you'll have to use an iframe in your background page (ManifestV2 extension) and strip the X-Frame-Options header, or use the upcoming chrome.offscreen API in ManifestV3.
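A minimal sketch of the `fetch` route applied to the JSON from the first post: copy the request URL from the DevTools Network panel and request it directly from the extension's background script. One plausible reason the original `fetch` attempt failed is that it ran in a content script; since Chrome 85, content-script requests are treated as coming from the page and are subject to CORS, whereas the extension's background context is not for hosts listed in `host_permissions`. The sketch assumes a Manifest V3 service worker and such a `host_permissions` entry; `DATA_URL`, `fetchJson`, and the `"getData"` message type are hypothetical, and the `fetchImpl` parameter exists only so the helper can be exercised without a network.

```javascript
// background.js (Manifest V3 service worker), a sketch.
// Assumes manifest.json contains something like:
//   "host_permissions": ["https://example.com/*"]
// so this cross-origin request is allowed. The URL is hypothetical.
const DATA_URL = "https://example.com/api/data";

// Fetch `url` and parse the response body as JSON.
// `fetchImpl` defaults to the global fetch; tests can pass a stub.
async function fetchJson(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Let a popup or content script ask the background for the data.
if (typeof chrome !== "undefined" && chrome.runtime?.onMessage) {
  chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
    if (message?.type !== "getData") return false;
    fetchJson(DATA_URL)
      .then((data) => sendResponse({ ok: true, data }))
      .catch((err) => sendResponse({ ok: false, error: String(err) }));
    return true; // keep the channel open for the async sendResponse
  });
}
```

A popup or content script could then call `chrome.runtime.sendMessage({ type: "getData" })` and read `data` from the response. Pages whose content only exists after their own scripts run still need the iframe or offscreen-document route described above.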

Simeon Vincent

Jul 13, 2022, 11:16:37 AM
to wOxxOm, Chromium Extensions, decklin...@gmail.com, kevinqu...@gmail.com
+1 to wOxxOm's comments.

To speak more directly to the concern Decklin raised, the User data privacy section of the Developer Program Policies and accompanying FAQ describe how extensions published to the Chrome Web Store can use the data they can access. What I'm about to say is an oversimplification, but in essence extensions can make whatever network requests are necessary so long as the data they access is in support of the extension's single purpose, is disclosed to the user, and the data isn't being harvested or transferred/sold to third parties. 

Simeon - @dotproto
Chrome Extensions DevRel


Kevin Quinn

Jul 13, 2022, 5:08:27 PM
to Chromium Extensions, Simeon Vincent, Chromium Extensions, decklin...@gmail.com, Kevin Quinn, wOxxOm
Thanks for your responses. I'm going to check out some of these ideas before I meet with the client on Thursday night. I've ruled out setting up a server and creating an endpoint that scrapes the site.