Mechanize function in Scrapy


Sayth Renshaw

Mar 11, 2014, 5:53:48 AM
to scrapy...@googlegroups.com
Hi

Having completed and toyed with the tutorial, there is something I don't understand. What happens when my base URL features links and content that change daily?
I don't want all the data, only specific documents when they are added to the page.

From the base URL, the path to the link for the page I want to scrape is body/div/div/div/div/table/tbody/tr/td/p/a.
So I want to navigate down that path for the state and location details whenever they update. Will Scrapy allow me to do that, or do I need to employ something like Mechanize (https://pypi.python.org/pypi/mechanize/)?

Sayth

Pablo Hoffman

Apr 18, 2014, 11:40:17 AM
to scrapy-users
One way to do that is to keep track (in a disk file, for example) of already-seen URLs and content (along with their hashes), check every scraped item against those in an item pipeline [1], and drop [2] the ones that were already seen.
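
As a rough illustration of that approach, here is a minimal sketch of such a pipeline. It is not taken from the Scrapy docs: the class name SeenItemsPipeline, the file name seen_hashes.json, and the item fields "url" and "content" are all assumptions made for the example. It hashes each item's URL plus content, drops anything whose hash was already recorded, and saves the hashes to disk when the spider closes so the next run can compare against them.

```python
import hashlib
import json
import os

try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so the sketch can be tried without Scrapy installed.
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class SeenItemsPipeline:
    """Drop items whose url + content hash was recorded earlier."""

    # Hypothetical file name; any persistent store would do.
    seen_path = "seen_hashes.json"

    def open_spider(self, spider):
        # Load hashes saved by a previous run, if there are any.
        if os.path.exists(self.seen_path):
            with open(self.seen_path) as f:
                self.seen = set(json.load(f))
        else:
            self.seen = set()

    def close_spider(self, spider):
        # Persist the hashes so the next run can compare against them.
        with open(self.seen_path, "w") as f:
            json.dump(sorted(self.seen), f)

    def process_item(self, item, spider):
        # Assumes every item carries "url" and "content" string fields.
        key = item["url"] + "\n" + item["content"]
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        if digest in self.seen:
            raise DropItem("already seen: %s" % item["url"])
        self.seen.add(digest)
        return item
```

To try it, the class would be listed in the project's ITEM_PIPELINES setting. Because the hash covers both the URL and the content, a page that keeps its URL but changes its content still passes through as new.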


