Is it possible to scrape "Allintitle Google Search" from Apps Script?


maarofi

Jan 8, 2022, 8:25:32 AM
to Google Apps Script Community
Greetings People

I am working on an Apps Script project that needs to run keyword searches on Google and capture the number of results for each query. I am using the UrlFetchApp class to fetch the results of a Google Search query, but I have run into a problem.

Google blocks my script because it sees too many requests from the same IP address. Is there an alternative way to perform this task? For reference, please see the attached screenshots. I want to take a keyword from the first column of each row, run a Google allintitle search for it, capture the number of results, and write that number to the second column of the sheet.


2.jpeg
1.jpeg
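
A minimal sketch of the column-to-column workflow described above, written as plain functions so the sheet access and the search itself stay separate. The names `buildAllintitleUrl`, `fillCounts`, and `lookupCount` are hypothetical and not from the original post; the sketch also assumes keywords sit in column A below a header row.

```javascript
// Build the search URL for an allintitle: query, using the same URL shape
// that appears elsewhere in this thread.
function buildAllintitleUrl(keyword) {
  return "https://www.google.com/search?q=" +
    encodeURIComponent("allintitle: " + String(keyword).trim());
}

// Turn the column-A rows (a 2D array, one keyword per row) into column-B rows.
// `lookupCount` is a hypothetical callback: it takes a keyword and returns
// a number, or null when no count could be obtained.
function fillCounts(keywordRows, lookupCount) {
  return keywordRows.map(function (row) {
    var keyword = String(row[0] || "").trim();
    if (!keyword) return [""];            // leave blank rows blank
    var count = lookupCount(keyword);
    return [count === null ? "" : count];
  });
}

// Possible use inside Apps Script (assumes a header in row 1):
//   var sheet = SpreadsheetApp.getActiveSheet();
//   var rows = sheet.getRange(2, 1, sheet.getLastRow() - 1, 1).getValues();
//   sheet.getRange(2, 2, rows.length, 1).setValues(fillCounts(rows, myLookup));
```

Reading and writing the whole column in one `getValues` / `setValues` pair keeps the number of spreadsheet calls low; the expensive part is then only the search requests.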

maarofi

Jan 8, 2022, 8:33:10 AM
to Google Apps Script Community
Here is the script I am using to get the search query response. 

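function myFunction() {
  // Fetch the results page for an allintitle: query.
  // muteHttpExceptions returns the response instead of throwing on HTTP errors.
  var searchResults = UrlFetchApp.fetch(
    "https://www.google.com/search?q=" + encodeURIComponent("allintitle: Iphone vs android"),
    {muteHttpExceptions: true}
  );

  var titleResults = searchResults.getContentText();

  Logger.log(titleResults);
}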

And here is the response I am getting. 

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head><meta http-equiv="content-type" content="text/html; charset=utf-8"><meta name="viewport" content="initial-scale=1"><title>https://www.google.com/search?q=allintitle%3A%20Iphone%20vs%20android</title></head> <body style="font-family: arial, sans-serif; background-color: #fff; color: #000; padding:20px; font-size:18px;" onload="e=document.getElementById('captcha');if(e){e.focus();}"> <div style="max-width:400px;"> <hr noshade size="1" style="color:#ccc; background-color:#ccc;"><br> <form id="captcha-form" action="index" method="post"> <script src="https://www.google.com/recaptcha/api.js" async defer></script> <script>var submitCallback = function(response) {document.getElementById('captcha-form').submit();};</script> <div id="recaptcha" class="g-recaptcha" data-sitekey="6LfwuyUTAAAAAOAmoS0fdqijC2PbbdH4kjq62Y1b" data-callback="submitCallback" data-s="lGcSLl8Rie9dc0M8smGznJbTlrLM4N_jzEI-qRiajILqO9VKIS8dp_WHJLXv5OteqO5Fh2OC3pgbNnItDjKUNQETKzh9qBvLQBkUOZY-6Zk0p49ulmGlrtaJn0GZ74fUl1wJ7FegKLqazBbXPSV80YSKcFKF-odPmqfgCSjVn5H-xoUVCaSW4ZDySIfp6uH0yQlF7AWTrvaIgELmZOuUW3vz1HFBRZzt-qrMY48"></div> <input type='hidden' name='q' value='EgQidBZhGOui5o4GIhAFJwrT9_TcZ_DQuuzFXioMMgFy'><input type="hidden" name="continue" value="https://www.google.com/search?q=allintitle%3A%20Iphone%20vs%20android"> </form> <hr noshade size="1" style="color:#ccc; background-color:#ccc;"> <div style="font-size:13px;"> <b>About this page</b><br><br> Our systems have detected unusual traffic from your computer network. This page checks to see if it&#39;s really you sending the requests, and not a robot. 
<a href="#" onclick="document.getElementById('infoDiv').style.display='block';">Why did this happen?</a><br><br> <div id="infoDiv" style="display:none; background-color:#eee; padding:10px; margin:0 0 15px 0; line-height:1.4em;"> This page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the <a href="//www.google.com/policies/terms/">Terms of Service</a>. The block will expire shortly after those requests stop. In the meantime, solving the above CAPTCHA will let you continue to use our services.<br><br>This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests. If you share your network connection, ask your administrator for help &mdash; a different computer using the same IP address may be responsible. <a href="//support.google.com/websearch/answer/86640">Learn more</a><br><br>Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly. </div> IP address: AA.AAA.AA.AA<br>Time: 2022-01-08T13:28:11Z<br>URL: https://www.google.com/search?q=allintitle%3A%20Iphone%20vs%20android<br> </div> </div> </body> </html>
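
Since the blocked response above is ordinary HTML, a script can check for it before trying to read a result count. The following is a rough sketch: the two marker strings are taken from the response shown above, while the function name `looksLikeCaptchaPage` and the HTTP 429 check are assumptions rather than anything confirmed in this thread.

```javascript
// Heuristic check for the "unusual traffic" interstitial.
// statusCode: the HTTP status of the response.
// body: the response text.
function looksLikeCaptchaPage(statusCode, body) {
  var text = String(body || "");
  return statusCode === 429 ||                        // assumption: rate-limit status
    text.indexOf('id="captcha-form"') !== -1 ||       // form id seen in the response above
    text.indexOf("Our systems have detected unusual traffic") !== -1;
}

// Possible use inside Apps Script, with muteHttpExceptions: true so that an
// error status does not throw:
//   var response = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
//   if (looksLikeCaptchaPage(response.getResponseCode(), response.getContentText())) {
//     // stop here instead of sending more requests
//   }
```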

Clark Lind

Jan 9, 2022, 1:18:48 PM
to Google Apps Script Community
If you add "async / await" to the fetch and remove muteHttpExceptions, it works. You may not even need the async/await. It returns the HTML text of the page, as if the page were being served to a browser.
Now you just need to parse the page for what you need.

async function myFunction() {
  // UrlFetchApp.fetch returns the response directly rather than a Promise,
  // so the await is optional here.
  var searchResults = await UrlFetchApp.fetch("https://www.google.com/search?q=" + encodeURIComponent("allintitle: Iphone vs android"));
  var titleResults = searchResults.getContentText();

  console.log(titleResults);
}
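
For the parsing step, one possible approach is a regular expression over the returned HTML. This sketch assumes an English-language results page that contains text such as "About 1,230 results"; the markup Google serves to UrlFetchApp may differ, so the pattern and the name `parseResultCount` are illustrative assumptions only.

```javascript
// Extract the result count from the page text.
// Returns a number, or null when no "N results" text is found.
function parseResultCount(html) {
  var match = /(?:About\s+)?(\d[\d,]*)\s+results?\b/i.exec(String(html || ""));
  if (!match) return null;
  // Drop thousands separators before converting, e.g. "1,230" becomes 1230.
  return parseInt(match[1].replace(/,/g, ""), 10);
}
```

Returning `null` rather than 0 when nothing matches keeps "no count found" (for example, a CAPTCHA page) distinct from a real count of zero.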

maarofi

Jan 10, 2022, 10:13:53 AM
to Google Apps Script Community

Hi, thank you for your response. I tried it your way, adding the "async" and "await" keywords and removing {muteHttpExceptions: true}, but I still get the same error. Please see the attached screenshot.
Screenshot_3.png

maarofi

Jan 10, 2022, 10:16:33 AM
to Google Apps Script Community
For the first 10 or so requests it does return the exact response I need, but after a while Google sends a CAPTCHA confirmation request, because I am dealing with about 3300 rows in my Excel file. So I make about 3300 requests from one IP address.

Clark Lind

Jan 10, 2022, 10:25:50 AM
to google-apps-sc...@googlegroups.com
If you know Python, I think it is better suited to this kind of operation. Otherwise, you depend on Google's servers to make the UrlFetchApp calls, and you run into the limitations they place on you.

If not, you can try using setInterval or setTimeout to slow the script down so Google doesn't think you are spamming it or launching a DDoS attack. I would go with Python if you know it.
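
As a rough illustration of slowing the script down: setTimeout and setInterval are browser and Node.js timers, whereas the documented way to pause in Apps Script is `Utilities.sleep(milliseconds)`. The sketch below takes the fetch and sleep functions as parameters so the loop itself stays plain JavaScript. The names `fetchAllThrottled`, `fetchOne`, `baseDelayMs`, and `jitterMs`, the default delays, and the shape of the `outcome` object are all assumptions, and nothing here guarantees that Google will not still show a CAPTCHA.

```javascript
// Process keywords one at a time with a randomized pause between requests.
// fetchOne(keyword) is a hypothetical callback returning
//   { blocked: boolean, count: number|null }.
// sleep(ms) pauses execution; in Apps Script this would be Utilities.sleep.
function fetchAllThrottled(keywords, fetchOne, sleep, options) {
  var baseDelayMs = (options && options.baseDelayMs) || 5000;
  var jitterMs = (options && options.jitterMs) || 3000;
  var results = [];
  for (var i = 0; i < keywords.length; i++) {
    var outcome = fetchOne(keywords[i]);
    if (outcome.blocked) break;           // stop as soon as a CAPTCHA page appears
    results.push(outcome.count);
    if (i < keywords.length - 1) {
      // Pause between requests, but not after the final one.
      sleep(baseDelayMs + Math.floor(Math.random() * jitterMs));
    }
  }
  return results;                         // counts for the keywords completed so far
}

// Possible use inside Apps Script:
//   var counts = fetchAllThrottled(keywords, myFetchOne,
//     function (ms) { Utilities.sleep(ms); });
```

With several seconds between requests, about 3300 keywords would take hours, and Apps Script limits how long a single execution may run, so the list would likely need to be split across several runs.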


maarofi

Jan 10, 2022, 10:53:54 AM
to Google Apps Script Community
Mr. Clark, 

I know Python and am comfortable with it, but it would not run as Apps Script inside Google Sheets. Could you please suggest how I can do this in Python?

Clark Lind

Jan 10, 2022, 11:32:56 AM
to google-apps-sc...@googlegroups.com
Use Python on your local machine, and just make a call to the spreadsheet to get the search terms. Then iterate over the search terms in Python, making calls to Google Search. Basically, the sheet is just a list, so you don't really even need Sheets unless you are collaborating with others. But if you want, once you have your results, you can put them back into a sheet. Otherwise, maybe just download the sheet as a CSV file and don't use Sheets at all. Does that make sense?

Clark Lind

Jan 10, 2022, 11:36:01 AM
to google-apps-sc...@googlegroups.com
p.s. if you look at the lower left, you can see the Python reference:


On Mon, Jan 10, 2022 at 10:53 AM maarofi <kaami...@gmail.com> wrote:

maarofi

Jan 10, 2022, 2:04:51 PM
to Google Apps Script Community
Thank you, Mr. Clark, for pointing that out. Yes, I can access Sheets from Python, and I am currently figuring out how to run Google searches from Python so that Google does not treat them as bot traffic. Once again, I really appreciate your help with this task.

Chiến Thắng Vi

Apr 15, 2022, 2:10:50 AM
to Google Apps Script Community
Dear maarofi,
Could you share your script with me?
Thanks
On Tuesday, January 11, 2022 at 02:04:51 UTC+7, maarofi wrote: