Using PySpider to scrape a wiki


Michael Cohen

Oct 23, 2020, 11:12:34 AM

to pyspider-users

Hey folks,

I'm looking into using PySpider to scrape a wiki (not Wikipedia). I tried wget, but even though I set up a reject list (I only want the current contents, not every previous revision of each page), it still takes forever. It seems to download the files on the reject list before deleting them, so after 5 days it still hadn't finished on a modest wiki.
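One way to avoid fetching unwanted pages is to test every link before it is requested, rather than downloading it and discarding it afterwards. The function below is a minimal sketch of such a pre-fetch filter using only the Python standard library. It assumes the wiki uses MediaWiki-style URLs, where old revisions, diffs, and history views are marked by query parameters such as `oldid`, `diff`, and `action`, and where page titles appear either as `/wiki/Title` or as `?title=Title`. The name `is_current_page`, the parameter and namespace lists, and the host `wiki.example.org` are all hypothetical and would need adjusting for a particular wiki.

```python
from urllib.parse import urlparse, parse_qs, unquote

# Hypothetical rules, assuming MediaWiki-style URLs; adjust for your wiki.
REVISION_PARAMS = ("oldid", "diff", "curid")        # old revisions and diffs
ALLOWED_ACTIONS = {"view"}                           # action=history/edit/... are skipped
SKIPPED_NAMESPACES = ("Special:", "Talk:", "User_talk:")


def is_current_page(url, host):
    """Return True if url looks like the current version of a content page on host."""
    parts = urlparse(url)
    # Stay on the wiki's own host and only follow http(s) links.
    if parts.scheme not in ("http", "https") or parts.netloc != host:
        return False

    # keep_blank_values=True so that e.g. "?diff=" is still detected.
    query = parse_qs(parts.query, keep_blank_values=True)

    # Any revision or diff parameter means this is not the current page.
    if any(param in query for param in REVISION_PARAMS):
        return False

    # Reject non-view actions such as action=history or action=edit.
    if any(action not in ALLOWED_ACTIONS for action in query.get("action", [])):
        return False

    # Assumption: the title is in ?title=..., or after a single path prefix
    # such as /wiki/ (e.g. /wiki/Talk:Foo/Bar -> "Talk:Foo/Bar").
    path_title = unquote(parts.path.split("/", 2)[-1])
    title = query.get("title", [path_title])[0]
    return not title.startswith(SKIPPED_NAMESPACES)


if __name__ == "__main__":
    host = "wiki.example.org"  # placeholder host
    for url in [
        "https://wiki.example.org/wiki/Main_Page",
        "https://wiki.example.org/index.php?title=Main_Page&oldid=42",
        "https://wiki.example.org/index.php?title=Main_Page&action=history",
        "https://wiki.example.org/wiki/Special:RecentChanges",
    ]:
        print(is_current_page(url, host), url)
```

In a PySpider handler, a check like this would sit in the callback that loops over a page's links, so that `self.crawl(...)` is only called for URLs that pass it and rejected pages are never requested at all.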

I usually use HTTrack for regular websites, but it can't handle a wiki.

Has anyone had experience using PySpider on wikis?

Thanks,

Michael

Shubham Biswas

Sep 23, 2022, 1:20:42 PM

to pyspider-users

It may work if done correctly, using MySQL and data.
