404 pages and masking IP address

Paul Bradshaw

Jun 27, 2015, 10:21:25 AM
to scrap...@googlegroups.com
I'm trying to get data from a site, but the site serves a 404 page to ScraperWiki even though the same URL loads fine in a browser and with cURL.

I'm guessing it has ScraperWiki blocked in some way, and that the solution is to mask the IP address or use a proxy. Is there a way to do this in the code on ScraperWiki?


--

Paul Bradshaw

Out now - Finding Stories in Spreadsheets; https://leanpub.com/spreadsheetstories 
Scraping for Journalists: http://leanpub.com/scrapingforjournalists 
8,000 Holes: How the 2012 Olympic Torch Relay Lost its Way: https://leanpub.com/8000holes (all proceeds to the Brittle Bone Society)
The Online Journalism Handbook: http://amzn.to/jEND3p 

Please use secure email if you can: my public key is at https://pgp.mit.edu/pks/lookup?op=get&search=0x540D6E3F

Online Journalism Blog http://onlinejournalismblog.com 
Help Me Investigate http://helpmeinvestigate.com - Shortlisted for 
Multimedia Publisher of the Year, 2010; winner of Talk About Local investigation of the year 2010

Organiser, Hacks and Hackers Birmingham http://meetupbirmingham.hackshackers.com/
Behind The Numbers - the Birmingham Mail datablog

Visiting Professor, City University, London http://www.city.ac.uk/journalism/
Associate Professor and course leader, MA Online Journalism, Birmingham City University http://bit.ly/maonlinejournalism

http://twitter.com/paulbradshaw
LinkedIn profile and recommendations at http://bit.ly/paulbrecommendations


Thomas Levine

Jun 27, 2015, 12:30:50 PM
to scrap...@googlegroups.com
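Set up your own proxy or rent one, for example from
http://proxymesh.com/

Then point your program at the proxy by setting the http_proxy environment variable before you run it (replace blahblahblah with your proxy's URL):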

export http_proxy=blahblahblah
./run-my-program
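
Since the question was about doing this in the code itself, here is a minimal sketch in Python using only the standard library. It is an illustration, not something tested on ScraperWiki: the function names and the proxy address http://proxy.example.com:8080 are made up, and it assumes the box is allowed to make outbound connections to your proxy. If you don't pass a proxy URL, it falls back to whatever http_proxy / https_proxy are set to in the environment, as in the shell example above.

```python
import os
import urllib.request


def proxy_settings(proxy_url=None):
    """Return a scheme -> proxy URL mapping.

    With an explicit proxy_url (hypothetical example:
    "http://proxy.example.com:8080"), use it for both http and https.
    Otherwise read http_proxy / https_proxy from the environment.
    """
    if proxy_url is not None:
        return {"http": proxy_url, "https": proxy_url}
    return urllib.request.getproxies()


def make_opener(proxies):
    """Build an opener that sends requests through the given proxies."""
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)


# Usage (not run here because it needs a real proxy and network access):
#
#   opener = make_opener(proxy_settings("http://proxy.example.com:8080"))
#   html = opener.open("http://example.com/page").read()
```

The target site then sees the proxy's IP address rather than the scraper's. If it still returns a 404, the block may be keyed on something other than the IP address, such as request headers.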
