Hey folks,
I'm looking into using PySpider to scrape a wiki (not Wikipedia). I tried wget first, but even with a reject list set up (I only want the current contents, not every previous revision of every page), it takes forever. It seems to download the files on the reject list anyway and only delete them afterwards, so after 5 days it still hadn't finished a modest-sized wiki.
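To be concrete about what I'm trying to filter out, here is a rough sketch in Python. It assumes the wiki uses MediaWiki-style URLs, where old revisions and history views are marked by query parameters such as `oldid`, `diff`, and `action`; that is a guess on my part, and other wiki engines will differ. The function name `is_current_page` and the example.org URLs are made up for illustration. The point is to decide whether a link is worth following before fetching it, rather than downloading it and deleting it afterwards.

```python
from urllib.parse import urlparse, parse_qs


def is_current_page(url):
    """Return True if the URL looks like the current version of a page.

    Assumption: MediaWiki-style query parameters. A URL is rejected if it
    points at a specific old revision (oldid), a diff between revisions
    (diff), or any action other than plain viewing (history, edit, raw...).
    """
    query = parse_qs(urlparse(url).query)

    # Old revisions and revision comparisons are not "current contents".
    if "oldid" in query or "diff" in query:
        return False

    # With no explicit action, treat the URL as a normal page view.
    action = query.get("action", ["view"])[0]
    return action == "view"


if __name__ == "__main__":
    base = "https://wiki.example.org/index.php?title=Main_Page"
    for candidate in (base, base + "&action=history", base + "&oldid=123"):
        print(candidate, is_current_page(candidate))
```

Presumably a check like this would be run on each link before it is queued for crawling, so the history pages never get requested at all.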
I usually use HTTrack for regular websites, but it can't handle a wiki.
Has anyone had experience using PySpider on wikis?
Thanks,
Michael