Contributing wiki dumps on an ongoing basis?


Scarred Sun

Aug 16, 2019, 5:10:21 PM
to wikiteam-discuss
Hi there, 

I run a few wikis that Wikiteam has dumped in the past, and I'd like to help provide more up-to-date backups on an ongoing basis (monthly? quarterly? whenever?). Since I have shell access, extraction and delivery should be a lot easier. That said:

1. How would I go about contributing an updated file to an existing wiki dump? The instructions in the GitHub tutorial aren't clear on this case.
2. My properties have grown enormously since they were last dumped. https://archive.org/details/wiki-segaretroorg was 30 GB when it was backed up; today it's at 470 GB, with 2 GB in XML alone. Realistically, how would I even deliver that server-to-server?

Federico Leva (Nemo)

Apr 18, 2020, 8:50:41 AM
to wikiteam-discuss
Hello, thanks for asking and for pinging on IRC. I didn't have your message in my mailbox for some reason.

Indeed making a backup with shell access is much easier and is recommended. Unless data from your wikis is frequently consumed and needs to be very fresh, a quarterly update is probably sufficient. You can be more generous with XML updates and less frequent with image updates.
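For the XML side of a shell-access backup, MediaWiki ships a dumpBackup.php maintenance script. A minimal sketch of invoking it from a scheduled job, assuming a standard MediaWiki layout; the helper name `build_dump_cmd` and the paths are illustrative, not from this thread:

```python
import os
import subprocess

def build_dump_cmd(wiki_root, out_path):
    """Build the argv for a full-history XML dump via MediaWiki's
    dumpBackup.php maintenance script (standard install layout assumed)."""
    return [
        "php",
        os.path.join(wiki_root, "maintenance", "dumpBackup.php"),
        "--full",                     # all revisions, not just current ones
        "--quiet",                    # suppress per-page progress chatter
        f"--output=gzip:{out_path}",  # compress the XML as it is written
    ]

# Example (hypothetical paths) -- run quarterly from cron on the wiki host:
# subprocess.run(
#     build_dump_cmd("/var/www/wiki", "/backups/wiki-20200418.xml.gz"),
#     check=True,
# )
```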

For a regular upload of many GB, I would suggest that you create your own item in the "opensource" collection (which doesn't need special permissions), for instance by adapting uploader.py to add a timestamp like "_20200418" to the identifier. We try not to create items over 400 GiB if possible, but it doesn't hurt to try once or twice for your wiki.
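The timestamped-identifier idea might look like the sketch below, using the `internetarchive` Python package's `upload()` call. The function names are hypothetical and the metadata values are illustrative assumptions, not prescribed by this thread:

```python
import datetime

def make_item_id(base, date=None):
    """Append a _YYYYMMDD suffix to a base identifier, e.g.
    'wiki-segaretroorg' -> 'wiki-segaretroorg_20200418'."""
    date = date or datetime.date.today()
    return f"{base}_{date:%Y%m%d}"

def upload_dump(base, files, date=None):
    """Upload dump files to a new, date-stamped item in the
    'opensource' collection (metadata shown is an assumption)."""
    # Lazy import so make_item_id works without the package installed.
    from internetarchive import upload  # pip install internetarchive
    item_id = make_item_id(base, date)
    upload(
        item_id,
        files=files,
        metadata={
            "collection": "opensource",  # no special permissions needed
            "title": f"Wiki dump {item_id}",
        },
    )
    return item_id
```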

As for the upload methods, uploading files of this size with the normal Python client of the IA API isn't necessarily very difficult if your server has low latency to archive.org, but such a big file can be difficult for users to consume. It's probably easier to split it into smaller files, for instance one per root directory in your MediaWiki upload dir. To reduce the risk of upload errors, you can use the torrent upload method.
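The per-directory split could be sketched like this: MediaWiki's default upload dir hashes files into top-level subdirectories (0/ through f/), so one tar per subdirectory gives naturally sized chunks. The function name is hypothetical:

```python
import tarfile
from pathlib import Path

def tar_per_subdir(upload_dir, out_dir):
    """Create one tar archive per top-level subdirectory of a MediaWiki
    upload (images/) dir, so no single upload file gets unmanageably big."""
    upload_dir, out_dir = Path(upload_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for sub in sorted(p for p in upload_dir.iterdir() if p.is_dir()):
        dest = out_dir / f"{sub.name}.tar"
        with tarfile.open(dest, "w") as tar:
            tar.add(sub, arcname=sub.name)  # keep the hashed layout intact
        archives.append(dest)
    return archives
```

Each resulting tar can then be uploaded (and later downloaded) independently.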

If that's too complicated, don't worry too much and just try to upload the 400 GiB file to a new item, see how it goes. I've done it before.

Federico