Python Beautiful Soup Google Link Scrape


Ryan Cass

Jul 10, 2016, 3:17:38 PM
to beautifulsoup

How would you go about grabbing the first link from Google using Beautiful Soup and requests with Python 2.7? And if the first link isn't the one I want, grab the second link, and if that isn't it either, the third, and so on.

Or if there is a way to use mechanize too, that would be awesome.

So far I am just running a certain search on Google and taking the first link from the results, but the first link isn't always the one I want. If it's not the link I want, how do I get it to grab the next one? Here's my code:


import sys

import requests
from bs4 import BeautifulSoup

# orders, h, and g are not defined in this snippet
yelps = []

for i in range(len(orders)):
    print(orders[i][0])

    # business name + business address + business city
    query = orders[i][1] + orders[i][2] + orders[i][3]
    goog_search = "https://www.google.com/search?q=yelp+" + query
    #print(goog_search)

    try:
        r = requests.get(goog_search)
        soup = BeautifulSoup(r.text, "html.parser")

        # first <cite> element on the results page, or None if there is none
        cite = soup.find('cite')
        #print(cite)
        if cite is not None:
            cite = cite.text
            if cite.startswith(h):
                print(cite)
                yelps.append(cite)
            elif cite.startswith(g):
                print(cite)
                yelps.append(cite)
                #write to file saying link needs to be fixed
            else:
                yelps.append("")
                print("Passed on %s and wrote it to a file and list as an empty string" % cite)
                #write to file that this link was not a yelp link

    except requests.exceptions.RequestException as e:
        print(e)
        sys.exit(1)


I'm looking for specific Yelp links. I search Google with a query of "Yelp + business name + business address" and take the first link, hoping it's the Yelp page for that restaurant. I'm looking for www.yelp.com/biz/... links. Most of the time I get one, but I need it to work 100% of the time. Sometimes the result is a Yelp.com/search?.. link, and sometimes it's not the right link at all.
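The "try the first link, then the second, then the third" idea could be sketched roughly like this. It is only a sketch: it assumes the results page still exposes each result's URL in a <cite> element (as the code above relies on), and the helper names cite_texts and pick_first_match, the "yelp.com/biz/" marker, and the sample URLs are made up for illustration. The key change is soup.find_all('cite'), which returns every match in page order, instead of soup.find('cite'), which returns only the first.

```python
def cite_texts(html):
    """Return the text of every <cite> element in the page, in document order."""
    # Imported here so pick_first_match() below can be tried without bs4 installed.
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    return [cite.get_text() for cite in soup.find_all("cite")]


def pick_first_match(candidates, marker="yelp.com/biz/"):
    """Return the first candidate containing marker, or "" if none does.

    marker must be lowercase; candidates are lowercased only for the
    comparison, so the returned text keeps its original casing.
    """
    for text in candidates:
        if marker in text.lower():
            return text
    return ""


if __name__ == "__main__":
    # Hypothetical <cite> texts: a search link first, then a business page.
    sample = [
        "www.yelp.com/search?find_desc=tacos",
        "www.yelp.com/biz/example-tacos",
    ]
    print(pick_first_match(sample))
```

Inside the loop, something like yelps.append(pick_first_match(cite_texts(r.text))) would then skip over search links and unrelated results, and fall back to an empty string when no /biz/ link appears on the page.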