WikiTeam at Internet Archive

emijrp

Aug 3, 2011, 5:49:15 PM
to wikiteam...@googlegroups.com
Dear all;

Today I discovered that we have a section at the Internet Archive.[1] I'm not sure who created it, but thanks : ). How can we add new items?

We could create one item containing a batch of WikiTeam's first 100 wikis and, from the 101st onwards, upload them one by one to the Internet Archive (see the other thread about how Google Code declined to increase our quota).

Congratulations,
emijrp

[1] http://www.archive.org/details/wikiteam

Alex Buie

Aug 4, 2011, 3:04:44 PM
to wikiteam-discuss
Hey guys,

I went ahead and created the collection for us. (I'm interning at the
Archive right now, so I get access to the magic collection creating
tools)

In order to add more items to the collection, you need to contact me
at either ab...@archive.org or on IRC as underscor, undersco2 (or
3-9). Because of the way collections work (I need to give you special
headers to use for the S3 interface), please talk to me before
creating a new item or uploading content.
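
For anyone who has not used the S3-style interface before: an upload is
an HTTP PUT that carries a handful of x-archive-* headers. The sketch
below only builds such a header set and sends nothing. It is not the
exact set Alex hands out; the helper name build_upload_headers, the
placeholder keys and the example title are assumptions, and the header
names are the ones from the Archive's publicly documented S3-like API.

```python
def build_upload_headers(access_key, secret_key, collection, title):
    """Build headers for a PUT to the Archive's S3-like endpoint.

    Assumption: these are only the generally documented headers. Any
    collection-specific headers still have to come from Alex.
    """
    return {
        # The S3-like API authenticates with "LOW <access>:<secret>".
        "authorization": f"LOW {access_key}:{secret_key}",
        # Ask the Archive to create the item if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
        # Metadata fields are passed as x-archive-meta-<field> headers.
        "x-archive-meta-collection": collection,
        "x-archive-meta-title": title,
    }


headers = build_upload_headers("ACCESS", "SECRET", "wikiteam", "Example wiki dump")
for name, value in headers.items():
    # Never print real credentials; mask the authorization value.
    shown = "LOW ***" if name == "authorization" else value
    print(f"{name}: {shown}")
```

Keeping the header construction in one function means the
collection-specific values can be swapped in once they are known,
without touching the upload code itself.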

There's some other news about the archive, but I'll put that in a
separate post for clarity's sake.


Alex/underscor

emijrp

Aug 4, 2011, 4:06:26 PM
to wikiteam...@googlegroups.com
Hi Alex;

Thanks for what you did. I can't wait for the details in the other thread ; )

Regards,
emijrp

2011/8/4 Alex Buie <ab...@kwdservices.com>

Platonides

Aug 5, 2011, 6:33:44 PM
to wikiteam...@googlegroups.com
Alex Buie wrote:
> Hey guys,
>
> I went ahead and created the collection for us. (I'm interning at the
> Archive right now, so I get access to the magic collection creating
> tools)
>
> In order to add more items to the collection, you need to contact me
> at either ab...@archive.org or on IRC as underscor, undersco2 (or
> 3-9). Because of the way collections work (I need to give you special
> headers to use for the S3 interface), please talk to me before
> creating a new item or uploading content.
>
> There's some other news about the archive, but I'll put that in a
> separate post for clarity's sake.
>
>
> Alex/underscor

Thanks Alex.
I'm glad to have you there. :)

Alex Buie

Aug 6, 2011, 4:56:34 PM
to wikiteam-discuss
Well, I guess I should get off my lazy butt and actually write this.
Apologies for the delay, I'm on vacation and as such haven't had as
much computer time as I normally do.

I've been working with both Alexis (Collections Manager) and Jason
(Scott) at the Internet Archive, and I now have a machine I can use
for wiki backups. Pretty nice disk specs and network connection:

Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/vda1    95G   24G    66G   27%  /
/dev/vdb1   3.4T  2.0T   1.5T   58%  /1
/dev/vdc     16T  2.4T    14T   16%  /2
/dev/vdd     11T  9.5T   1.2T   90%  /3

2011-08-06 13:50:38 (60.5 MB/s) - `/dev/null' saved [104857600/104857600]
(100 MB test file)

The plan is that the Archive would like XML dumps every six months of
as many wikis as we can find, plus yearly image dumps. Each wiki will
get an item, with an identifier like "wiki-domainname.tld" and a title
like "Wiki - DomainName.tld", and backups will go into each wiki's
respective item. (For an example, see http://www.archive.org/details/wiki-archiveteam.org)
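
As a rough illustration of that naming scheme, here is a minimal
sketch. The function names item_identifier and item_title are made up
for this example, and it assumes the identifier is simply the
lowercased domain with "wiki-" in front, which matches the
wiki-archiveteam.org example.

```python
def item_identifier(domain):
    """Return an identifier of the form "wiki-domainname.tld"."""
    return "wiki-" + domain.strip().lower()


def item_title(display_domain):
    """Return a title of the form "Wiki - DomainName.tld".

    The capitalisation cannot be derived from the domain itself, so the
    caller passes the domain as it should be displayed.
    """
    return "Wiki - " + display_domain.strip()


print(item_identifier("ArchiveTeam.org"))  # wiki-archiveteam.org
print(item_title("ArchiveTeam.org"))       # Wiki - ArchiveTeam.org
```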

These items will become part of the WikiTeam collection, which allows
us to have a central repository of our work. When you do a wiki dump
or find more wikis, let me know and we can get it all squared away in
the IA.

If you have any questions, please feel free to ask!

Alex

PS:

abuie@teamarchive-0:/1/UNDERTHESTAIRS 737 π du -sh wikiteam
67G wikiteam


abuie@teamarchive-0:/1/UNDERTHESTAIRS/wikiteam 209 π ls
acawikiorg-20110804-wikidump
adcivorg-20110804-wikidump
almeriapediawikandaes_wiki-20110803-wikidump
archiveteamorg-20110803-wikidump
archiveteamorg-20110803-wikidump.XMLONLY.7z
ateneodecordobaes-20110804-wikidump
atwikiassistivetechnet-20110804-wikidump
botwikisnocc_w-20110804-wikidump
broken
bulbapediabulbagardennet_w-20110804-wikidump
cadizpediawikandaes_wiki-20110803-wikidump
cordobapediawikandaes_wiki-20110803-wikidump
csgentoo_wikicom-20110803-wikidump
csopensuseorg-20110804-wikidump
ctpediaes_w-20110804-wikidump
degentoo_wikicom-20110803-wikidump
deopensuseorg-20110804-wikidump
dumpgenerator.py
elopensuseorg-20110805-wikidump
enbitcoinit_w-20110804-wikidump
enciclopediadelinarescom_mx-20110804-wikidump
enciclopediadelinarescom_wiki-20110804-wikidump
engentoo_wikicom-20110803-wikidump
enopensuseorg-20110804-wikidump
enwadaes-20110804-wikidump
esgentoo_wikicom-20110803-wikidump
esopensuseorg-20110805-wikidump
esotericvoxelperfectnet_w-20110804-wikidump
estanatopedianet-20110804-wikidump
figentoo_wikicom-20110803-wikidump
fiopensuseorg-20110805-wikidump
forensicswikiorg_w-20110804-wikidump
fossilwikiorg-20110804-wikidump
frgentoo_wikicom-20110803-wikidump
fropensuseorg-20110805-wikidump
gpwikiorg-20110804-wikidump
granadapediawikandaes_wiki-20110803-wikidump
huelvapediawikandaes_wiki-20110803-wikidump
huopensuseorg-20110805-wikidump
isopensuseorg-20110805-wikidump
itopensuseorg-20110805-wikidump
jaenpediawikandaes_wiki-20110803-wikidump
jaopensuseorg-20110805-wikidump
knowinoorg_w-20110804-wikidump
loprometidoesdeudacom-20110804-wikidump
lunarpediaorg-20110803-wikidump
madripediaes-20110804-wikidump
malagapediawikandaes_wiki-20110803-wikidump
marspediaorg-20110803-wikidump
mastershaperorg-20110804-wikidump
memoryarchiveorg_en-20110804-wikidump
nlgentoo_wikicom-20110803-wikidump
nlopensuseorg-20110805-wikidump
plgentoo_wikicom-20110803-wikidump
plopensuseorg-20110804-wikidump
plopensuseorg-20110805-wikidump
portlandwikiorg-20110804-wikidump
projectbrahmaorg-20110804-wikidump
ptopensuseorg-20110805-wikidump
publicdomainideasorg-20110804-wikidump
recentchangescamporg-20110804-wikidump
red_sosteniblenet-20110804-wikidump
rezeptewikiorg-20110804-wikidump
rosettacodeorg_wiki-20110803-wikidump
rugentoo_wikicom-20110803-wikidump
ruopensuseorg-20110805-wikidump
scientifictionorg-20110803-wikidump
setiquestorg_wiki-20110804-wikidump
sevillapediawikandaes_wiki-20110803-wikidump
shoutwiki
shoutwikicom_w-20110804-wikidump
solarpediaes-20110804-wikidump
svopensuseorg-20110805-wikidump
trentowikiit_w-20110804-wikidump
trgentoo_wikicom-20110803-wikidump
tropensuseorg-20110805-wikidump
ubuntuwikinet-20110804-wikidump
universaleditbuttonorg-20110803-wikidump
viopensuseorg-20110805-wikidump
wikandaes_wiki-20110803-wikidump
wikiangelacom_w-20110804-wikidump
wikiarchlinuxde-20110804-wikidump
wikiarchlinuxfr-20110804-wikidump
wikiarchlinuxorg-20110804-wikidump
wikifreecultureorg-20110804-wikidump
wikifremaecssotonacuk-20110804-wikidump
wikigreasespotnet-20110804-wikidump
wikiindexorg-20110803-wikidump
wikilessigorg-20110804-wikidump
wikinolesvotesorg_w-20110804-wikidump
wikionlinegamesnetnet_wiki-20110803-wikidump
wikiopenstreetmaporg_w-20110804-wikidump
wikiosgeoorg-20110803-wikidump
wikisalamancaorg-20110804-wikidump
wikisucaes_iberogre-20110804-wikidump
wikisucaes_wikice-20110804-wikidump
wikisucaes_wikiformacion-20110804-wikidump
wikisucaes_wikihaskell-20110804-wikidump
wikisucaes_wikiii-20110804-wikidump
wikisucaes_wikijuegos_w-20110804-wikidump
wikisucaes_wikira-20110804-wikidump
wikisucaes_wikiunix-20110804-wikidump
wikitravelorg_ca_p__gina_principal-20110803-wikidump
wikitravelorg_de_hauptseite-20110803-wikidump
wikitravelorg_en_amtrak-20110803-wikidump
wikitravelorg_eo___efpa__o-20110803-wikidump
wikitravelorg_es_portada-20110803-wikidump
wikitravelorg_fi_etusivu-20110803-wikidump
wikitravelorg_fr_accueil-20110803-wikidump
wikitravelorg_he__________________-20110803-wikidump
wikitravelorg_hi_________________________-20110803-wikidump
wikitravelorg_hu_kezd__lap-20110803-wikidump
wikitravelorg_it_pagina_principale-20110803-wikidump
wikitravelorg_ko_______-20110803-wikidump
wikitravelorg_nl_hoofdpagina-20110803-wikidump
wikitravelorg_pl_strona_g____wna-20110803-wikidump
wikitravelorg_pt_p__gina_principal-20110803-wikidump
wikitravelorg_ro_portal-20110803-wikidump
wikitravelorg_sv_huvudsida-20110803-wikidump
wikitravelorg_zh_______-20110803-wikidump
zhopensuseorg-20110805-wikidump
zh_twopensuseorg-20110805-wikidump
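
The directory names in the listing follow a visible
<prefix>-<YYYYMMDD>-wikidump pattern, so they can be split back into a
wiki prefix and a dump date. A minimal sketch, assuming that pattern
holds for every dump; the regular expression and the name
parse_dump_name are made up here and are not part of dumpgenerator.py.
Entries that do not match (broken, shoutwiki, dumpgenerator.py) are
skipped.

```python
import re
from datetime import date, datetime

# <prefix>-<YYYYMMDD>-wikidump, optionally followed by a suffix
# such as ".XMLONLY.7z".
DUMP_RE = re.compile(r"^(?P<prefix>.+)-(?P<stamp>\d{8})-wikidump(?P<suffix>\..+)?$")


def parse_dump_name(name):
    """Return (prefix, dump_date, suffix), or None if name is not a dump."""
    match = DUMP_RE.match(name)
    if match is None:
        return None
    dump_date = datetime.strptime(match.group("stamp"), "%Y%m%d").date()
    return match.group("prefix"), dump_date, match.group("suffix") or ""


names = [
    "archiveteamorg-20110803-wikidump",
    "archiveteamorg-20110803-wikidump.XMLONLY.7z",
    "broken",
    "dumpgenerator.py",
]
for name in names:
    print(name, "->", parse_dump_name(name))
```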

Not bad so far, eh?

emijrp

Aug 6, 2011, 5:12:28 PM
to wikiteam...@googlegroups.com
Great news Alex! Very glad to hear that.

A small request: could you please pack the files inside the .7z without a subdirectory?

2011/8/6 Alex Buie <ab...@kwdservices.com>

Alex Buie

Aug 6, 2011, 7:29:06 PM
to wikiteam...@googlegroups.com
The Archive requested that they be packed that way. You can flatten it
when extracting by using "7z e" instead of "7z x".
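
To make the difference concrete, here is a small sketch that only
builds the command line. "7z x" extracts with full paths and "7z e"
extracts everything into one directory. The helper name
extract_command and the output directory are made up for this example;
the -o switch (written with no space before the directory) is 7z's
output-directory option.

```python
def extract_command(archive, out_dir, flatten=False):
    """Build a 7z command line for extracting an archive.

    "x" keeps the directory structure stored in the archive;
    "e" drops it and writes every file straight into out_dir.
    """
    mode = "e" if flatten else "x"
    return ["7z", mode, archive, f"-o{out_dir}"]


cmd = extract_command("archiveteamorg-20110803-wikidump.XMLONLY.7z", "dump", flatten=True)
print(" ".join(cmd))

# To actually run it (needs the 7z binary on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

One caveat with "e": files that share a name in different
subdirectories end up in the same place, so 7z will ask what to do
about the collision.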

Alex
--
Alex Buie
Network Coordinator / Server Engineer
KWD Services, Inc
Media and Hosting Solutions
+1(703)445-3391
+1(480)253-9640
+1(703)919-8090
ab...@kwdservices.com

Federico Leva (Nemo)

Aug 21, 2011, 6:15:58 AM
to wikiteam...@googlegroups.com
Alex Buie, 06/08/2011 22:56:

> The plan is that the archive would like to get 6 month XML dumps of as
> many wikis we can find, and yearly image dumps. Each wiki will get an
> item, with an identifier like "wiki-domainname.tld" and a title like
> "Wiki - DomainName.tld", and backups will go into each wiki's
> respective item. (For an example, see http://www.archive.org/details/wiki-archiveteam.org)
>
> These items will become part of the WikiTeam collection, which allows
> us to have a central repository of our work. When you do a wiki dump/
> find more wikis, let me know and we can get it all squared away in the
> IA.

I suppose we should update the tutorial on Google Code to reflect this.
I have a question: why the prefix wiki- in the identifier? Seems mostly
unnecessary, especially if the wiki is the only thing on the domain
and/or the word "wiki" is already included in the domain name itself.

Minor things aside, I was wondering: how are you planning to make this
task as automated as possible in the future? There are tens of
thousands of wikis around (even excluding the big farms)... Perhaps we
should at least include the api.php and index.php URLs in the metadata
so that in the future they can be easily grabbed from there.
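
To make the suggestion concrete, a minimal sketch of what such a
per-item record could look like. The field names api_url and index_url
and the example URLs are placeholders made up here, not fields the
Archive defines.

```python
import json


def wiki_metadata(domain, api_url=None, index_url=None):
    """Collect the entry points a later, automated re-dump would need.

    Hypothetical field names; the URLs are stored as given because they
    cannot be reliably guessed from the domain alone.
    """
    meta = {"identifier": "wiki-" + domain.lower()}
    if api_url:
        meta["api_url"] = api_url
    if index_url:
        meta["index_url"] = index_url
    return meta


record = wiki_metadata(
    "wiki.example.org",
    api_url="http://wiki.example.org/w/api.php",
    index_url="http://wiki.example.org/w/index.php",
)
print(json.dumps(record, indent=2))
```

A script could then read these fields back from each item and re-run
the dump without anyone having to look the URLs up again.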

Nemo
