How to bypass sites that block scraping


BCC in China

Apr 21, 2025, 8:00:12 AM
to seleniu...@googlegroups.com

I'm trying to scrape a site to get product details. I have two functions: 1) getURL() and 2) parseData(). getURL() works fine: I can collect the product URLs from 30 pages with 8 products per page. My trouble is with parseData(): in this function I get a blank page after clicking through the cookie consent page. I suspect the site detects that it is a robot and not a human, so it blocks or stops sending data. How can I bypass this hurdle?

- getURL(): this function walks through a product summary page containing links to the product details, gathers the URL of each product, and saves them to a list.

- getData(): this function visits each link gathered by getURL(), gets the details of each product, and saves them to a dictionary.
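
The two-step flow described above could be sketched roughly as follows. This is a minimal sketch and not the poster's actual code: the names `make_driver`, `get_urls`, and `get_data`, the CSS selectors, and the field names are hypothetical placeholders. The one design choice it illustrates is passing the same WebDriver instance to both steps, so that any cookies set when the consent form is accepted in the first step are still present in the second.

```python
# Locator strategy string; this is the value of selenium's By.CSS_SELECTOR.
CSS = "css selector"


def make_driver():
    """Create one browser session to share across both steps."""
    # Imported lazily so the helper functions below can be used without a browser.
    from selenium import webdriver
    return webdriver.Chrome()


def get_urls(driver, summary_urls, link_selector):
    """Visit each summary page and collect the product detail links in order."""
    urls = []
    for page in summary_urls:
        driver.get(page)
        for link in driver.find_elements(CSS, link_selector):
            href = link.get_attribute("href")
            if href and href not in urls:  # skip empty links and duplicates
                urls.append(href)
    return urls


def get_data(driver, product_urls, field_selectors):
    """Visit each product page and read the requested fields into a dict."""
    products = {}
    for url in product_urls:
        driver.get(url)
        details = {}
        for name, selector in field_selectors.items():
            found = driver.find_elements(CSS, selector)
            # None marks a field that was missing, e.g. on a blank page.
            details[name] = found[0].text if found else None
        products[url] = details
    return products
```

With a real browser this might be used as `driver = make_driver()`, then `get_data(driver, get_urls(driver, pages, "a.product"), {"name": "h1"})`, and finally `driver.quit()`, where `pages`, `"a.product"`, and `"h1"` are stand-ins for the real site's values.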

PS: While troubleshooting, I can see that both functions show the cookie consent form when they start. The difference is that after clicking "Accept" in getURL(), the product summary page shows up with data, but in getData() it just shows a blank page. When I visit the site manually, the cookie consent form appears only once, on the product summary page, not on the product details page.
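
One way to narrow down the behaviour described in this postscript is to click the consent button only when it is actually present, and to record what the browser received whenever a detail page comes back blank. The following is a rough sketch under assumptions: `accept_consent_if_present` and `describe_page` are illustrative names rather than Selenium APIs, and the consent-button selector is a hypothetical parameter. It cannot prove that the site is blocking automation; it only collects evidence.

```python
# Locator strategy string; this is the value of selenium's By.CSS_SELECTOR.
CSS = "css selector"


def accept_consent_if_present(driver, button_selector):
    """Click the consent button if the page shows one; return True if clicked."""
    buttons = driver.find_elements(CSS, button_selector)
    if not buttons:
        return False
    buttons[0].click()
    return True


def describe_page(driver):
    """Collect basic facts about the currently loaded page for troubleshooting."""
    source = driver.page_source or ""
    return {
        # A URL different from the one requested points to a redirect.
        "url": driver.current_url,
        "title": driver.title,
        # A very small value suggests the server sent an almost empty document.
        "html_length": len(source),
        # Normally true in a WebDriver-controlled browser; some sites check it.
        "navigator_webdriver": driver.execute_script("return navigator.webdriver"),
    }
```

Calling `describe_page(driver)` right after a blank detail page loads, and again on a working summary page, gives two snapshots to compare.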

蕭市淂

Apr 28, 2025, 12:57:15 PM
to seleniu...@googlegroups.com
Hello,

Would it be possible for you to share the complete source code for your project? I'm interested in learning and exchanging ideas about web scraping techniques.

While I can't guarantee I'll solve your specific issue, seeing your implementation would help me understand your approach better and possibly suggest some workarounds. I'm also keen to learn from your experience with these anti-scraping mechanisms.

Thank you
