How to scrape data from a Google Map?

Royce

Sep 26, 2013, 9:56:48 PM
to scrapy...@googlegroups.com
Hi guys. I'm having a real headache figuring out how to scrape data from a Google Map. Sorry if this is confusing; I'll try to make it clearer.

This is the page I'm trying to scrape: https://www.mcdonalds.com.sg/locate-us/

When the page loads, you see lots of McDonald's icons all over the map. If you click on one of them, it shows you the address, contact number and operating hours of that store.
So my question is: how do I scrape this info (address, contact, hours) for all store locations? These values are not in the page's HTML, and I'm really lost.

To the experienced Scrapy users out there: please help this greenhorn out. If possible, please give me some example code; I'm really bad at understanding theory.

P.S. I'm using the Scrapy framework and run my script from the command line. I save the data into a JSON file using "scrapy crawl smth.. -o smth.json -t json".

Thanks in advance

Rolando Espinoza La Fuente

Sep 27, 2013, 12:53:18 AM
to scrapy...@googlegroups.com
Generally, websites that use a third-party service to render a data visualization (map, table, etc.) have to send the data to the browser somehow, and in most cases that data is accessible from the browser.

In this case, inspecting the requests made by the browser shows that the data is loaded from a POST request to https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php
So, basically, you have all the data you want there, in a nice JSON format ready for consumption.

Scrapy provides the "shell" command, which is very convenient for tinkering with the website before writing the spider:

2013-09-27 00:44:14-0400 [scrapy] INFO: Scrapy 0.16.5 started (bot: scrapybot)
...

In [1]: from scrapy.http import FormRequest

In [2]: req = FormRequest('https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php', formdata={'action': 'ws_search_store_location', 'store_name':'0', 'store_area':'0', 'store_type':'0'})

In [3]: fetch(req)
2013-09-27 00:45:13-0400 [default] DEBUG: Crawled (200) <POST https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php> (referer: None)
...

In [4]: import json

In [5]: data = json.loads(response.body)

In [6]: len(data['stores']['listing'])
Out[6]: 127

In [7]: data['stores']['listing'][0]
Out[7]: 
{u'address': u'678A Woodlands Avenue 6<br/>#01-05<br/>Singapore 731678',
 u'city': u'Singapore',
 u'id': 78,
 u'lat': u'1.440409',
 u'lon': u'103.801489',
 u'name': u"McDonald's Admiralty",
 u'op_hours': u'24 hours<br>\r\nDessert Kiosk: 0900-0100',
 u'phone': u'68940513',
 u'region': u'north',
 u'type': [u'24hrs', u'dessert_kiosk'],
 u'zip': u'731678'}
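
The 'address' and 'op_hours' values in the output above still contain HTML line breaks ("<br/>", "<br>") and a stray "\r\n". A minimal sketch of one way to tidy them before storing an item, using only the standard library; the helper name clean_field is hypothetical and not part of Scrapy:

```python
import re


def clean_field(value):
    """Turn HTML line breaks into ', ' and collapse leftover whitespace.

    Hypothetical helper for illustration; it only handles <br>, <br/>
    and <br /> tags, which is all the sample output above contains.
    """
    # Replace each <br> variant (plus surrounding whitespace) with ", ".
    text = re.sub(r"\s*<br\s*/?>\s*", ", ", value, flags=re.IGNORECASE)
    # Collapse any remaining runs of whitespace (e.g. "\r\n") to one space.
    return re.sub(r"\s+", " ", text).strip()
```

Applied to the first store above, clean_field(store['address']) would give a single-line address with the three parts separated by commas.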


In short: in your spider you have to return the FormRequest(...) above, then in the callback load the JSON object from the response's body, and finally, for each store in the listing, create an item with the values.

Hope that helps.

Regards,
Rolando




Paul Tremberth

Sep 27, 2013, 3:08:30 AM
to scrapy...@googlegroups.com

Xiaorong CHEN

May 9, 2016, 5:27:05 AM
to scrapy-users
Hello,
I have a similar question about this website: http://familyquick.fr/store-locator
The idea is also to retrieve all the restaurants (Quick) in France.
Could you please help?
Thank you