Different results in Test run and real run


Thomas Kaufmann

Apr 19, 2022, 9:06:41 AM
to Web Scraping
Hi everyone,

I'm new to ParseHub and have created my first two projects by following blog posts on the ParseHub website (e.g. https://www.parsehub.com/blog/scrape-amazon-product-data/).
Unfortunately, I don't get the same result even though I follow every step exactly. Even more surprisingly, the output is correct (including the results from all templates) when I go through the test run step by step, but as soon as I start the real run the data output is completely different: I only get the results from the main template and none from the sub templates.
How can a "test run" and a "run" give different results?
I have attached screenshots showing the issue: the commands I added in the main template and in the sub template, plus the results of the test run and of the run so you can compare them. The run stops at "number of ratings" and doesn't go to the sub template to grab the product details. The test run does.

I hope you can help me.

Kind regards,
Thomas
Test run.png
Details template.png
Run.png
Main template.png

sh...@parsehub.com

Apr 20, 2022, 3:43:08 PM
to Web Scraping
Hi,

I see two problems with the project. The first is that the product selection on your first template does not have a 'begin new entry' command attached to it. Without it, every product writes its data into the same fields and overwrites the previously extracted data. You will need to add a 'begin new entry' command to the product selection and then nest the remaining extraction and click commands inside it, so that ParseHub scrapes the data for each selected item into a new row/list.
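To illustrate the overwrite behaviour described above, here is a minimal Python sketch. It is only an analogy, not how ParseHub is implemented: the product data, field names, and both function names are made up for the example.

```python
# Illustrative analogy only; this is not ParseHub's actual implementation.
# The product list and field names below are hypothetical sample data.
products = [
    {"name": "Product A", "rating_count": 120},
    {"name": "Product B", "rating_count": 45},
    {"name": "Product C", "rating_count": 300},
]


def scrape_without_new_entry(items):
    """Every item writes into the same fields, so only the last one survives."""
    result = {}
    for item in items:
        result["name"] = item["name"]
        result["rating_count"] = item["rating_count"]
    return result


def scrape_with_new_entry(items):
    """Each item starts a fresh entry, so every item ends up in its own row."""
    rows = []
    for item in items:
        entry = {}  # plays the role of a 'begin new entry' command
        entry["name"] = item["name"]
        entry["rating_count"] = item["rating_count"]
        rows.append(entry)
    return rows


print(scrape_without_new_entry(products))  # a single set of fields
print(scrape_with_new_entry(products))     # one row per product
```

In the first function the shared fields are reused for every product, which mirrors a selection without a new entry; in the second, each product gets its own row because a fresh entry is created before anything is extracted.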

The second problem is that Amazon is blocking your scrape. If you check the server snapshots from your recent run (the little camera icons on selections or on the run results page), you can see that Amazon blocked the run when it tried to click through to the product page. See attached image.

Usually the IP Rotation feature in our paid plans can bypass blocks like this. Lately, however, Amazon has begun blocking us repeatedly even when this feature is used. The only remaining option for scraping Amazon at scale is to import custom proxy IP addresses from third-party providers into your ParseHub project.

If you have any further questions or issues, please feel free to reach out to ParseHub support using the contact form on our site or the chat widget.

Cheers,
Shan


2022-04-20_15-40-50.png