Using PySpider to scrape wiki


Michael Cohen

Oct 23, 2020, 11:12:34 AM
to pyspider-users
Hey folks,
I'm looking into using PySpider to scrape a wiki (not Wikipedia). I tried wget, but even with a reject list set up (I only want the current contents, not a recursive crawl of every previous revision), it takes forever. It seems to download the files on the reject list and only then delete them, so after 5 days it still hadn't finished a modest wiki.

I usually use HTTrack for regular websites, but it can't handle a wiki.
Has anyone had experience using PySpider on wikis?

Thanks,
Michael

Shubham Biswas

Sep 23, 2022, 1:20:42 PM
to pyspider-users
It may work if done correctly, using MySQL and DATA.
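
For what it's worth, here is a rough sketch of one way to avoid the download-then-delete problem: check each link against the reject rules before it is ever requested. Everything in it is an assumption for illustration, not something from this thread: the function name, the example host, and the reject rules, which guess at MediaWiki-style URLs (`action=history`, `oldid=`, `Special:` pages). A different wiki engine would need different rules.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical reject rules, assuming MediaWiki-style URLs.
REJECTED_PARAMS = {"oldid", "diff", "direction"}        # old revisions and diffs
REJECTED_ACTIONS = {"history", "edit", "raw", "info"}   # non-content views
REJECTED_PREFIXES = ("Special:",)                       # generated pages


def is_current_page(url):
    """Return True if the URL looks like the current version of a page."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)

    # Any revision or diff parameter means this is not the current content.
    if REJECTED_PARAMS.intersection(query):
        return False

    # Skip history, edit, and similar views of a page.
    if any(action in REJECTED_ACTIONS for action in query.get("action", [])):
        return False

    # The title is either ?title=... or the last segment of the path.
    default_title = parsed.path.rsplit("/", 1)[-1]
    title = query.get("title", [default_title])[0]
    return not title.startswith(REJECTED_PREFIXES)


if __name__ == "__main__":
    for link in (
        "https://wiki.example.org/wiki/Main_Page",
        "https://wiki.example.org/index.php?title=Main_Page&action=history",
        "https://wiki.example.org/index.php?title=Main_Page&oldid=123",
    ):
        print(link, is_current_page(link))
```

In a PySpider handler, the idea would be to call `self.crawl(...)` on a link only when a check like this passes, so rejected pages are never fetched in the first place.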