TL;DR: Please help run and patch the current version of dumpgenerator on
Wikia wikis or other wikis which fail.
https://github.com/WikiTeam/wikiteam/blob/7c545d05b7effc240c8f20885dbcd7bad5632c94/dumpgenerator.py
----
This month I've updated the dumps of some 7k MediaWiki wikis on IA.
Out of the 23073 wikis archived on IA, 9651 were found to be alive
according to checkalive.py. I've tried to archive the non-farm wikis.
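For anyone wanting to reproduce the aliveness check, the idea is simply to see whether a wiki's api.php still answers a siteinfo query. This is an illustrative sketch, not the actual checkalive.py code; the function name and timeout are my own choices here:

```python
# Hypothetical minimal liveness probe in the spirit of checkalive.py:
# a wiki counts as alive if its api.php answers action=query&meta=siteinfo
# with parseable JSON.
import json
import urllib.request

def is_alive(api_url, timeout=30):
    """Return True if the MediaWiki API at api_url answers siteinfo."""
    query = api_url + "?action=query&meta=siteinfo&format=json"
    try:
        with urllib.request.urlopen(query, timeout=timeout) as resp:
            data = json.loads(resp.read().decode("utf-8", errors="replace"))
        # A valid siteinfo response always carries a "query" key.
        return "query" in data
    except Exception:
        # DNS failure, timeout, HTTP error, or non-JSON reply: not alive.
        return False
```

A dead hostname or an HTML error page both end up in the except branch, so the probe degrades gracefully on the many half-broken wikis out there.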
Many of those wikis nevertheless fail to export, usually because of the
never-ending loops or other errors. To improve the success rate, I've
added an --xmlrevisions option, which uses only the API, and I've
committed my usual skipping hacks as a --failfast option.
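The gist of the API-only approach is to page through the MediaWiki API's list=allrevisions instead of relying on Special:Export. The parameter names below follow the standard API; the functions themselves are a hedged sketch, not the code actually committed to dumpgenerator.py:

```python
# Illustrative sketch of an API-only revision fetch, paging through
# list=allrevisions with the standard API continuation protocol.
import json
import urllib.parse
import urllib.request

def revisions_from_response(data):
    """Flatten one allrevisions API response into a list of revision dicts."""
    revs = []
    for page in data.get("query", {}).get("allrevisions", []):
        revs.extend(page.get("revisions", []))
    return revs

def iter_revisions(api_url, batch=50):
    """Yield revision dicts from a MediaWiki API, oldest first."""
    params = {
        "action": "query",
        "list": "allrevisions",
        "arvlimit": str(batch),
        "arvprop": "ids|timestamp|user|comment|content",
        "arvdir": "newer",
        "format": "json",
    }
    while True:
        url = api_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url, timeout=60) as resp:
            data = json.loads(resp.read())
        for rev in revisions_from_response(data):
            yield rev
        if "continue" not in data:
            break
        # The API hands back e.g. {"arvcontinue": "...", "continue": "-||"};
        # merging it into the params resumes exactly where we left off.
        params.update(data["continue"])
```

Older MediaWiki versions lack list=allrevisions entirely, which is part of why --xmlrevisions still fails on some farms.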
I've called this version 0.4 because the changes can be rather radical,
although everything should behave the same if you don't use the new options.
https://github.com/WikiTeam/wikiteam/issues/311
https://github.com/WikiTeam/wikiteam/commits/7c545d05b7effc240c8f20885dbcd7bad5632c94/dumpgenerator.py
A list from not-archived.py now returns 4113 wikis. I'd also like to
dump the ~240k Wikia wikis, but --xmlrevisions still fails on them and
the list could be improved:
https://github.com/WikiTeam/wikiteam/pull/310
On OVH, which I've used this time, bandwidth and CPU are cheap, so my
bottleneck was mostly the disk (lesson learnt: better to spend a few
dozen euros on a bigger disk than hours fighting with disk limits). I
used a little patch to launcher.py and uploader.py so that the 7z files
would be written to a separate partition. Can I commit such an option to
the main repository or would it become too messy?
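For discussion, the patch could look something like the following: a hypothetical --7zdir option so the archives land on a different partition than the working dumps. The option name, dest, and filename pattern here are assumptions for illustration, not the diff I actually applied:

```python
# Hypothetical sketch of a separate-partition option for launcher.py /
# uploader.py; names are illustrative, not the real patch.
import argparse
import os

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--7zdir", dest="sevenzdir", default=".",
        help="directory (e.g. a separate partition) where 7z files are written")
    return parser.parse_args(argv)

def archive_path(wikiname, sevenzdir):
    """Build the 7z output path on the chosen partition."""
    os.makedirs(sevenzdir, exist_ok=True)
    # Filename pattern is illustrative; the real scripts use their own.
    return os.path.join(sevenzdir, wikiname + "-history.7z")
```

Keeping it a plain option with "." as the default would leave current behaviour untouched, which is probably the least messy way to merge it.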
Federico