Access Denied on Tripadvisor

3,729 views

Punit Singh

Sep 1, 2021, 2:44:56 PM
to Web Scraping
[screenshot attached]

Hi,

I'm trying to time how long it takes to extract reviews from Tripadvisor.com. Everything runs fine during the test runs, but every live run comes back with "Access Denied".

I have also tried www.tripadvisor.in, but I get the same result.

Thanks,

Andrew11

Sep 1, 2021, 6:10:40 PM
to Web Scraping
I think you just need to replace http:// with https:// in the URL. Let me know if you need help with that.

Punit Singh

Sep 2, 2021, 6:00:07 AM
to Web Scraping
Hi Andrew,

All the URLs I'm using start with "https://".

Andrew11

Sep 2, 2021, 8:45:03 AM
to Web Scraping
But the access denied page says "http://"

Punit Singh

Sep 2, 2021, 9:27:17 AM
to Web Scraping
Hi Andrew,

[screenshots attached]

I've checked again, and the URL I've inserted starts with HTTPS, not HTTP.

I've checked the project settings too, but there's no option to force it to use HTTPS.

Please suggest what can be done in this situation.

Andrew11

Sep 2, 2021, 9:49:24 AM
to Web Scraping
That's strange. When I make a project with the start URL https://www.tripadvisor.com/Hotel_Review-g42216-d257243-Reviews-Bavarian_Inn_Lodge-Frankenmuth_Michigan.html, it extracts with no problem in a live run... not sure what's going on in yours.

Punit Singh

Sep 2, 2021, 12:12:26 PM
to Web Scraping
The URL you mentioned is working fine, somehow.

Andrew11

Sep 2, 2021, 12:45:51 PM
to Web Scraping

Punit Singh

Sep 8, 2021, 11:43:30 AM
to Web Scraping
No, that's still not working somehow

Andrew11

Sep 8, 2021, 11:50:58 AM
to Web Scraping
Sorry I didn't believe you! It's just as you said. If you have a paid plan, turning on IP rotation in the project settings seems to fix it though.

Punit Singh

Sep 8, 2021, 12:57:48 PM
to Web Scraping
I wanted to check the tool's speed and reliability before moving to a paid plan. The tool I was using earlier (Octoparse) no longer seems to work efficiently with Tripadvisor: it produces only about 30% of the results, with the remaining 70% of the data unreachable due to page blocks. That affected our campaign pace entirely.

So, just wanted to be sure before moving to the paid plan.

Moreover, I'd also like to know: even with IP rotation, could this issue come back in the future?

Andrew11

Sep 8, 2021, 1:18:18 PM
to Web Scraping
I'm not sure where ParseHub gets its proxy servers (the pool of IP addresses that Tripadvisor can block one by one), or how many there are. You should also know that IP rotation is very slow; there's no comparison with a normal run's speed.

sh...@parsehub.com

Sep 9, 2021, 12:34:43 PM
to Web Scraping
Hi,

Tripadvisor can be scraped with ParseHub. In most cases it needs IP Rotation to avoid being blocked with the 'access denied' message. I ran a test project on the 9 hotel results found at this URL: https://www.tripadvisor.com/Hotels-g42216-Frankenmuth_Michigan-Hotels.html. I was able to scrape all the data I targeted from the hotel pages, and it took about 10 minutes. This is admittedly quite slow for such a small amount of data. Our IP Rotation engine takes a while to initiate the run and then scrapes each page at a reduced speed, because the IP needs to be rotated for every page loaded.
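[Editor's note: the per-page rotation described above can be sketched roughly as follows. This is a minimal illustration, not ParseHub's actual engine; the pool addresses are placeholder values, and a real rotating-proxy service would supply them.]

```python
import itertools

# Hypothetical proxy pool (placeholder addresses, not real proxies).
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

# Cycle endlessly through the pool so each page load gets the next IP.
_rotation = itertools.cycle(PROXY_POOL)

def next_proxies():
    """Return a requests-style proxies dict, advancing to a new IP each call."""
    proxy = next(_rotation)
    return {"http": proxy, "https": proxy}
```

A scraper would then pass `proxies=next_proxies()` to each `requests.get(...)` call, which is exactly why rotating per page is slower than reusing one connection: every page pays the cost of a fresh proxy hop.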

If you have any further questions please feel free to contact us through our support platform using he...@parsehub.com or the live chat.
Cheers,
Shan

Andrew11

Sep 9, 2021, 12:49:53 PM
to Web Scraping
Would it be any faster if he used his own custom proxies? There used to be something called Crawlera, which offered a la carte IP rotation, though I'm not sure if it's ParseHub-compatible. Unfortunately, outside of them, the industry seems a little shady to a newcomer like me.

sh...@parsehub.com

Sep 9, 2021, 2:56:27 PM
to Web Scraping
Yes. Custom proxies are much faster when used in ParseHub. You just need to make sure that the proxy provider can supply you with proxies that support username and password authentication. They would need to be in this format as well: IPAddress:Port:Username:Password:Realm

We have a guide for how to format and implement custom proxies here.

Proxies can be tricky to work with because there is no guarantee that shared proxies will work when you buy them. I tell our users to always make sure the provider has a generous return period, or at least is willing to offer some alternate proxies if the ones you buy don't work at all.
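[Editor's note: as a rough illustration of the five-field format Shan mentions, here is a small parser that splits a proxy entry and builds the `user:pass@host:port` URL form most HTTP clients accept. The sample values are made up.]

```python
from typing import NamedTuple

class Proxy(NamedTuple):
    ip: str
    port: int
    username: str
    password: str
    realm: str

def parse_proxy_line(line):
    """Split a colon-delimited IPAddress:Port:Username:Password:Realm entry."""
    ip, port, username, password, realm = line.strip().split(":")
    return Proxy(ip, int(port), username, password, realm)

def as_url(p):
    """Build the username:password@host:port URL many HTTP clients accept."""
    return f"http://{p.username}:{p.password}@{p.ip}:{p.port}"
```

For example, `parse_proxy_line("203.0.113.5:3128:alice:s3cret:default")` yields a proxy whose URL form is `http://alice:s3cret@203.0.113.5:3128`.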

Andrew11

Sep 9, 2021, 3:24:12 PM
to Web Scraping
Not that I think Tripadvisor uses this particular bot-blocking software, but the first part of the article will tell you all about why people need IP rotation:

Punit Singh

Sep 16, 2021, 11:00:24 AM
to Web Scraping
Hi Shan,

Thanks for the update.

Can you confirm the approximate crawling speed I would get on the Professional plan with IP rotation enabled? I think that would be enough for me to decide what suits me better. Also, please share some information on the Enterprise plan.

Punit Singh

Sep 16, 2021, 11:03:46 AM
to Web Scraping
Could it be that Tripadvisor has moved beyond that step? I've tried crawling as a Googlebot too, but Tripadvisor still behaves the same, which baffles me because I thought bots are generally never blocked by any website.

sh...@parsehub.com

Sep 16, 2021, 11:13:48 AM
to Web Scraping

Hi,

A lot of these sites are using services to monitor web traffic and behaviour to try and stop bots from scraping their data. I'll test the site with a larger data set and contact you via email with my results.

Cheers,
Shan

Punit Singh

Sep 16, 2021, 11:17:20 AM
to Web Scraping
Thanks a lot. That'd be great

Andrew11

Sep 16, 2021, 11:19:48 AM
to Web Scraping
You must mean you changed the user agent string, because people outside Google can't make a real Googlebot, I think. It would be an awesome web crawler if they could.
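[Editor's note: "crawling as Googlebot" in this sense usually just means sending Googlebot's published user agent string in the request headers, roughly like this. It changes only a header; the traffic still comes from your own IP, which is why sites that check the source IP are not fooled.]

```python
# Googlebot's published desktop user agent string.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

def googlebot_headers(extra=None):
    """Headers dict for e.g. requests.get(url, headers=googlebot_headers())."""
    headers = {"User-Agent": GOOGLEBOT_UA}
    headers.update(extra or {})
    return headers
```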

Verifying Googlebot

Before you decide to block Googlebot, be aware that the user agent string used by Googlebot is often spoofed by other crawlers. It's important to verify that a problematic request actually comes from Google. The best way to verify that a request actually comes from Googlebot is to use a reverse DNS lookup on the source IP of the request.

https://developers.google.com/search/docs/advanced/crawling/googlebot
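[Editor's note: the forward-confirmed reverse DNS check that Google's documentation describes can be sketched as below. The resolver functions are injectable here only so the logic can be exercised without network access; by default it uses the standard library's real DNS lookups.]

```python
import socket

def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname):
    """Forward-confirmed reverse DNS check for Googlebot.

    The IP's PTR hostname must end in googlebot.com or google.com, and
    that hostname must resolve back to the same IP.
    """
    try:
        hostname = reverse(ip)[0]
    except OSError:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return forward(hostname) == ip
    except OSError:
        return False
```

A site operator would run this on the source IP of a suspicious request claiming to be Googlebot; a spoofed user agent fails either the PTR-suffix test or the forward confirmation.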
