How to Query if A URL is Indexed by Google?


Alan

Jun 1, 2021, 11:09:31 PM
to Google Apps Script Community
Hi,

I want to create a Google Apps Script function to check whether a given URL is indexed by Google, so I wrote the following function:

function CheckURLForGoogleIndex(url, activesheet)
{
    // Delete the https:// and http:// prefix
    var cururl = url.replace("https://", "");
    cururl = cururl.replace("http://", "");

    var googlesearchurl = "https://www.google.com/search?q=site:" + encodeURIComponent(cururl);

    var page = UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText();

    // Wait for 1 second before starting another fetch
    Utilities.sleep(1000);

    var number = page.match("did not match any documents");

    if (number)
    {
        activesheet.getSheetByName("Not Google Index").appendRow([url]);
    } else
    {
        activesheet.getSheetByName("Google Index").appendRow([url]);
    }
}

However, when debugging the code, after invoking UrlFetchApp.fetch, I can only see the header in the variable page.

I tested the function with a URL that Google has indexed and one that it has not, but page.match returns null for both, so both are put in the "Google Index" sheet.

What is the problem with my function?

Thanks

Alan

Jun 2, 2021, 5:05:25 PM
to Google Apps Script Community
Any updates?

Martin Hawksey

Jun 2, 2021, 5:29:34 PM
to Google Apps Script Community
One thing you might want to check: your function declares var googlesearchurl, but your UrlFetchApp.fetch call uses url.

E.g. instead of

var page = UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText();

test it with

var page = UrlFetchApp.fetch(googlesearchurl, {muteHttpExceptions: true}).getContentText();
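
To make the fix above concrete, here is a minimal sketch of how the search URL could be built in one place and then passed to the fetch. The helper names buildSiteSearchUrl and fetchSearchPage are made up for illustration and are not from the original post; UrlFetchApp is only available inside the Apps Script runtime, and the query shape is the one from the original function.

```javascript
// Hypothetical helper: builds the Google "site:" search URL for a page,
// using the same query shape as the function in the original post.
function buildSiteSearchUrl(url) {
  // Strip a leading http:// or https:// prefix, case-insensitively.
  var bareUrl = url.replace(/^https?:\/\//i, "");
  return "https://www.google.com/search?q=site:" + encodeURIComponent(bareUrl);
}

// Hypothetical helper: fetches the search results page rather than the
// page being checked. Only runs inside Apps Script, where UrlFetchApp exists.
function fetchSearchPage(url) {
  var response = UrlFetchApp.fetch(buildSiteSearchUrl(url), {muteHttpExceptions: true});
  return response.getContentText();
}
```

Keeping the URL construction in its own function makes it harder to pass the wrong variable to UrlFetchApp.fetch by accident.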

Alan

Jun 3, 2021, 6:06:52 PM
to Google Apps Script Community

m.ha...@gmail.com, thank you very much.

I have replaced url with googlesearchurl. Now the issue is that the code works when I debug it, but when I run it, all URLs are added to the "Google Index" sheet. It seems that "did not match any documents" does not appear in the page at all.


Swan Fournel

Jan 6, 2023, 8:49:37 AM
to Google Apps Script Community
Hello Alan,

I had the same issue as you. Did you figure it out, or find something equivalent as an index checker for Sheets?

Have a good day!

Stuart Smith

Jan 6, 2023, 4:05:23 PM
to Google Apps Script Community
I'm not sure Google is going to give you a straightforward HTML page you can search like this, especially in response to automated queries, given that this is against their terms of service.

I dumped the page to the log:

Logger.log(page);

And what I see is:

<b>About this page</b><br><br> Our systems have detected unusual traffic from your computer network. This page checks to see if it&#39;s really you sending the requests, and not a robot. <a href="#" onclick="document.getElementById('infoDiv').style.display='block';">Why did this happen?</a><br><br> <div id="infoDiv" style="display:none; background-color:#eee; padding:10px; margin:0 0 15px 0; line-height:1.4em;"> This page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the <a href="//www.google.com/policies/terms/">Terms of Service</a>. The block will expire shortly after those requests stop. In the meantime, solving the above CAPTCHA will let you continue to use our services.<br><br>This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests.

It's possible queries like this might work from time to time, but I doubt they will be reliable.  
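
If you still want to experiment, a rough sketch like the one below at least avoids filing a blocked request under "Google Index". The function names classifySearchPage and checkSearchUrl and the three result labels are made up for illustration. The "unusual traffic" text comes from the page I logged above, and "did not match any documents" from the original function. Treating every non-200 status as blocked is an assumption on my part, and Google may change the wording of either page at any time.

```javascript
// Hypothetical helper: decides what a fetched search page tells us.
// Returns "blocked", "not_indexed" or "possibly_indexed" (labels invented here).
function classifySearchPage(statusCode, body) {
  // Assumption: any non-200 status, or the "unusual traffic" notice,
  // means the request was blocked and the result cannot be trusted.
  if (statusCode !== 200 || body.indexOf("unusual traffic") !== -1) {
    return "blocked";
  }
  // Phrase the original function looked for on an empty results page.
  if (body.indexOf("did not match any documents") !== -1) {
    return "not_indexed";
  }
  // Absence of both markers is only weak evidence that the URL is indexed.
  return "possibly_indexed";
}

// Hypothetical wrapper: only runs inside Apps Script, where UrlFetchApp exists.
function checkSearchUrl(googlesearchurl) {
  var response = UrlFetchApp.fetch(googlesearchurl, {muteHttpExceptions: true});
  return classifySearchPage(response.getResponseCode(), response.getContentText());
}
```

A "blocked" result is probably best written to a separate sheet or retried later, not counted as indexed.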

