Playing with Wikimedia dumps at IA


Luiz Augusto

Nov 12, 2012, 10:09:26 AM
to wikiteam...@googlegroups.com
I'm playing around on a VPS with an Internet Archive user account, as a personal project to create external copies of Wikisource content together with mirroring and usage instructions. My intention is to build a periodically updated resource kit with data downloads and usage instructions.

I know that Wikimedia and IA have started conversations and tests to store dumps directly as soon as they are generated, but since that still doesn't seem to be usable by anyone who might be interested, extra copies for digital preservation aren't a waste of resources but an alternative means of preservation, and I need to get some technical experience while I have fun... =)

I've started downloading and uploading a copy of arwikisource-20120902-local-media-1.tar and arwikisource-20120902-remote-media-1.tar, downloaded from the your.org "mirror" (since the tarballs are currently available only on that server, it isn't exactly a mirror...), checked their integrity and generated md5sums (the files total 69+GB and their source doesn't provide checksums...). Now I need to submit the data to make it publicly available, and I just found that the only collection I can submit those files to is "opensource". So, my questions are:

* Which person or team should I contact to be able to add this (and future objects) to the "wikiteam" collection?

* Is it possible to create a new subcollection in the "wikiteam" collection? If yes, whom should I ask to create one named "Wikisource mirroring project" (or something similar)?

My account on IA for this purpose is arch...@lugusto.org and the future public URL for the material already uploaded will be <http://archive.org/details/arwikisource-media-20120902>.
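
For anyone repeating the integrity check on tarballs of this size, here is a minimal sketch of computing an MD5 checksum without loading the whole file into memory. The function names and the 1 MiB chunk size are illustrative assumptions, not something prescribed by IA or your.org.

```python
import hashlib


def md5sum(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Compare a file's MD5 with a published value, ignoring case and whitespace."""
    return md5sum(path) == expected_hex.strip().lower()
```

Running `md5sum()` once after the download and again on the copy fetched back from IA would show whether the file survived both transfers unchanged.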

Best,
[[:m:User:555]]

Federico Leva (Nemo)

Nov 12, 2012, 11:35:33 AM
to wikiteam...@googlegroups.com, Luiz Augusto
Luiz Augusto, 12/11/2012 16:09:
> * Which person or team should I contact to be able to add this (and
> future objects) to the "wikiteam" collection?

You need to become an admin of the collection to add directly to it.
That's not really needed unless you have to create thousands of items:
you can add the wikiteam keyword and Jason (SketchCow on IRC) or others
will move them.

>
> * Is it possible to create a new subcollection in the "wikiteam"
> collection? If yes, whom should I ask to create one named "Wikisource
> mirroring project" (or something similar)?

It's possible. You can ask info *AT* archive.org, they're usually very
responsive.
In the meantime, however, just upload to "opensource", it's not a problem.

Nemo

Luiz Augusto

Nov 14, 2012, 10:34:21 PM
to wikiteam...@googlegroups.com
Many thanks to all public and private replies!

I've paused the tarball uploads until I learn how to properly upload very big files without using FTP. The s3 API looks like a good choice (thanks for pointing me to it!), but the Torrent section of the IA FAQ page [1] says "Starting in 2011, the Internet Archive began automatically retrieving BitTorrent files uploaded into most Community collections".

So, new questions:

* How do I upload those torrents to IA? Should I create a new item and upload the .torrent file to it? (I'm a n00b at IA, not at torrenting, so please reply with that in mind xD )

* My very big, already finished upload (70GB+) [2] doesn't have a torrent option. Is that due to the total file size, to some mistake of mine, or is it just stuck in the queue? (My interest in this is to download from the IA servers to my personal machine, since HTTP downloads from IA are unfortunately very slow and I have very limited bandwidth on my VPS)

In the meantime, I've finished dumping two wikis and uploaded the dumps to IA [3], [4]. Those two are waiting to be moved into the WikiTeam collection =)

[1] - http://archive.org/about/faqs.php#312
[2] - http://archive.org/details/arwikisource-media-20120902
[3] - http://archive.org/details/wiki-biblioteca_wikimedia_it
[4] - http://archive.org/details/wiki-wikilivres_ca
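
As a rough sketch of what an upload through IA's S3-like API could look like for these items, the snippet below only builds the URL and headers for a PUT request and sends nothing. The endpoint and the `x-archive-*` header names are written from memory of IA's S3 documentation and should be checked against it; the function name, its parameters and the placeholder keys are hypothetical.

```python
from urllib.parse import quote


def build_upload_request(identifier, filename, access_key, secret_key,
                         collection="opensource", keyword="wikiteam",
                         size_hint=None):
    """Return (url, headers) for an S3-style PUT of one file into an IA item."""
    # Assumed endpoint layout: https://s3.us.archive.org/<item identifier>/<file name>
    url = "https://s3.us.archive.org/{}/{}".format(identifier, quote(filename))
    headers = {
        # IA's S3-like API authenticates with "LOW accesskey:secret".
        "authorization": "LOW {}:{}".format(access_key, secret_key),
        # Create the item on the first upload if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
        # Upload to "opensource" and tag with the keyword so admins can move it.
        "x-archive-meta-collection": collection,
        "x-archive-meta-subject": keyword,
    }
    if size_hint is not None:
        # Expected total item size in bytes, useful for very large items.
        headers["x-archive-size-hint"] = str(size_hint)
    return url, headers
```

The returned pair can be handed to any HTTP client that streams the file body, so a 30GB+ tarball never has to fit in memory.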

Hydriz Wikipedia

Nov 14, 2012, 10:53:02 PM
to wikiteam...@googlegroups.com
On Thu, Nov 15, 2012 at 11:34 AM, Luiz Augusto <lug...@gmail.com> wrote:
> Many thanks to all public and private replies!
>
> I've paused the tarball uploads until I learn how to properly upload very big files without using FTP. The s3 API looks like a good choice (thanks for pointing me to it!), but the Torrent section of the IA FAQ page [1] says "Starting in 2011, the Internet Archive began automatically retrieving BitTorrent files uploaded into most Community collections".
>
> So, new questions:
>
> * How do I upload those torrents to IA? Should I create a new item and upload the .torrent file to it? (I'm a n00b at IA, not at torrenting, so please reply with that in mind xD )

You can create a torrent using burnbit.com, but I can't guarantee that this feature works, as it failed the last time I tried (although I must admit the torrent was >100GB, so it might work for smaller sizes). You can create a new item and upload the .torrent file to it, or use the "Edit item" link to upload the torrent to an existing item.
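
As an alternative to a web service like burnbit.com, a .torrent file can also be built locally. The following is a minimal sketch of a single-file BitTorrent (v1) metainfo generator using only the Python standard library; it is not what burnbit.com does internally, the tracker URL passed in is hypothetical, and whether IA picks up a torrent made this way is not confirmed in this thread.

```python
import hashlib
import os


def bencode(value):
    """Encode ints, bytes, str, lists and dicts in BitTorrent's bencode format."""
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode {!r}".format(type(value)))


def make_torrent(path, announce_url, piece_length=4 * 1024 * 1024):
    """Return the bytes of a single-file .torrent describing the file at path."""
    piece_hashes = []
    with open(path, "rb") as handle:
        # Hash the file piece by piece so large tarballs are never fully in memory.
        for piece in iter(lambda: handle.read(piece_length), b""):
            piece_hashes.append(hashlib.sha1(piece).digest())
    info = {
        "name": os.path.basename(path),
        "length": os.path.getsize(path),
        "piece length": piece_length,
        "pieces": b"".join(piece_hashes),  # concatenated 20-byte SHA-1 digests
    }
    return bencode({"announce": announce_url, "info": info})
```

Writing the returned bytes to a file ending in `.torrent` gives something a regular BitTorrent client can open and seed, which makes it easy to test before uploading it to an item.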
 

> * My very big, already finished upload (70GB+) [2] doesn't have a torrent option. Is that due to the total file size, to some mistake of mine, or is it just stuck in the queue? (My interest in this is to download from the IA servers to my personal machine, since HTTP downloads from IA are unfortunately very slow and I have very limited bandwidth on my VPS)

There is a limit on item size beyond which torrents are not created. The current limit is 25GB, so 70+GB is too much. :)
 

> In the meantime, I've finished dumping two wikis and uploaded the dumps to IA [3], [4]. Those two are waiting to be moved into the WikiTeam collection =)

Thanks! Someone moves items in every once in a while, so yours will most probably be next!
 

--
Regards,
Hydriz

We've created the greatest collection of shared knowledge in history. Help protect Wikipedia. Donate now: http://donate.wikimedia.org