List of wikibackups and help request

emijrp

Dec 16, 2011, 1:11:46 PM
to wikiteam...@googlegroups.com
Dear all;

I'm working on a well-known wiki of wikis, WikiIndex.[1] Using its semantic property capabilities, I have generated a dynamic list of all the wikis included in WikiIndex and the backup status of all of them.[2]

I request your help to download all those wikis and complete the table. We have to exclude Wikimedia wikis (Wikipedias, Wiktionaries, etc.) and Wikia wikis, which offer public backups, and of course all non-MediaWiki wikis (until we develop new backup tools).

So, for example, you choose "Hortipedia" from that list, generate the backup with dumpgenerator.py, upload it to the Internet Archive and/or Google Code (send me an e-mail so I can add it to the Download section), and as the last step add "| backupurl = http://..." to the infobox template in the article on WikiIndex (in this case http://wikiindex.org/Hortipedia). The table will be refreshed automagically.
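
If it helps, here is a rough sketch of the dump step; the --api, --xml and
--images options shown are assumptions (check the dumpgenerator.py help and
the tutorial for the exact flags), and the API URL is just a placeholder:

    import subprocess

    # Placeholder API endpoint for the wiki you picked from the list.
    api_url = "http://wiki.example.org/api.php"

    # Full dump: complete page histories (--xml) plus media files (--images).
    subprocess.check_call([
        "python", "dumpgenerator.py",
        "--api=" + api_url,
        "--xml",
        "--images",
    ])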

Do you want to help?

Regards,
emijrp

[1] http://wikiindex.org
[2] http://wikiindex.org/index.php?title=List_of_wikibackups&action=purge

emijrp

Dec 16, 2011, 1:16:03 PM
to wikiteam...@googlegroups.com
For example, I'm going to download La Quadrature du Net, a wiki about activism: http://www.laquadrature.net/wiki/Main_Page

2011/12/16 emijrp <emi...@gmail.com>

emijrp

Dec 16, 2011, 1:28:38 PM
to wikiteam-discuss
Newcomers, two useful links for you:

* dumpgenerator.py tool: https://code.google.com/p/wikiteam/source/browse/trunk/dumpgenerator.py
* a new tutorial: https://code.google.com/p/wikiteam/wiki/NewTutorial

On 16 Dec, 19:11, emijrp <emi...@gmail.com> wrote:
> Dear all;
>
> I'm working on a well-known wiki of wikis, WikiIndex.[1] Using its semantic
> property capabilities, I have generated a dynamic list of all the wikis
> included in WikiIndex and the backup status of all of them.[2]
>
> I request your help to download all those wikis and complete the
> table. We have to exclude Wikimedia wikis (Wikipedias, Wiktionaries, etc.)
> and Wikia wikis, which offer public backups, and of course all non-MediaWiki
> wikis (until we develop new backup tools).
>
> So, for example, you choose "Hortipedia" from that list, generate the backup
> with dumpgenerator.py, upload it to the Internet Archive and/or Google Code
> (send me an e-mail so I can add it to the Download section), and as the last
> step add "| backupurl = http://..." to the infobox template in the article
> on WikiIndex (in this case http://wikiindex.org/Hortipedia). The table will be
> refreshed automagically.

Federico Leva (Nemo)

Dec 16, 2011, 1:31:09 PM
to wikiteam...@googlegroups.com
emijrp, 16/12/2011 19:11:

> I'm working on a well-known wiki of wikis, WikiIndex.[1] Using its
> semantic property capabilities, I have generated a dynamic list of
> all the wikis included in WikiIndex and the backup status of all of them.[2]
>
> I request your help to download all those wikis and complete the
> table. We have to exclude Wikimedia wikis (Wikipedias, Wiktionaries, etc.)
> and Wikia wikis, which offer public backups, and of course all
> non-MediaWiki wikis (until we develop new backup tools).

How many are there? Why are you using this list and not
<http://s23.org/wikistats/mediawikis_html.php?sort=good_desc,total_desc&th=0&lines=1348>
which has the API URLs etc.?

Nemo

emijrp

Dec 16, 2011, 1:37:15 PM
to wikiteam-discuss
Hi Nemo;

That list only includes about 1,300 wikis (truncated query?).

If you prefer to work from that list, OK, but it can neither be
edited to mark wikis as backed up nor easily extended with new wikis.
WikiIndex is a wiki, and you can create an article for any wiki you want
(and it will be listed).

Again, use whatever you prefer, but we need a method to avoid dupe
work.

Regards,
emijrp

> <http://s23.org/wikistats/mediawikis_html.php?sort=good_desc,total_des...>

Federico Leva (Nemo)

Dec 16, 2011, 1:42:44 PM
to wikiteam...@googlegroups.com
emijrp, 16/12/2011 19:37:

> That list only includes about 1,300 wikis (truncated query?).

How many wikis to be archived does wikiindex include?
s23.org has over 60.000 wikis in total, way more than wikiindex.

>
> If you prefer to work from that list, OK, but it can neither be
> edited to mark wikis as backed up nor easily extended with new wikis.
> WikiIndex is a wiki, and you can create an article for any wiki you want
> (and it will be listed).
>
> Again, use whatever you prefer, but we need a method to avoid dupe
> work.

On wikiindex I don't see any method to flag a wiki as being archived,
either. And it's very easy to see whether a wiki has already been
archived: you only have to look for it on archive.org (given that this
is what we're using).
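
For instance, something along these lines could do the lookup (the
advancedsearch endpoint and the "wikiteam" subject tag are assumptions on my
part, so treat it as a sketch only):

    import json
    import urllib.parse
    import urllib.request

    wiki = "hortipedia"  # hypothetical wiki name to look for

    # Search the Archive for items tagged "wikiteam" that mention the wiki.
    params = urllib.parse.urlencode({
        "q": 'subject:"wikiteam" AND %s' % wiki,
        "fl[]": "identifier",
        "rows": "50",
        "output": "json",
    })
    url = "https://archive.org/advancedsearch.php?" + params
    with urllib.request.urlopen(url) as response:
        results = json.load(response)

    for doc in results["response"]["docs"]:
        print(doc["identifier"])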

Nemo

emijrp

Dec 16, 2011, 1:51:43 PM
to wikiteam...@googlegroups.com


2011/12/16 Federico Leva (Nemo) <nemo...@gmail.com>

> emijrp, 16/12/2011 19:37:
>
>> That list only includes about 1,300 wikis (truncated query?).
>
> How many wikis to be archived does wikiindex include?
> s23.org has over 60.000 wikis in total, way more than wikiindex.


If you can extract all 60,000 wiki URLs, cool, but as I said, the list only shows 1,300 wikis. It is truncated. WikiIndex has about 5,000 wikis; it is a good start.
 


>> If you prefer to work from that list, OK, but it can neither be
>> edited to mark wikis as backed up nor easily extended with new wikis.
>> WikiIndex is a wiki, and you can create an article for any wiki you want
>> (and it will be listed).
>>
>> Again, use whatever you prefer, but we need a method to avoid dupe
>> work.
>
> On wikiindex I don't see any method to flag a wiki as being archived, either.

The backupurl parameter in the infoboxes; look at the "Backup url" column in the list: http://wikiindex.org/index.php?title=List_of_wikibackups&action=purge
 
> And it's very easy to see whether a wiki has already been archived: you only have to look for it on archive.org (given that this is what we're using).

That is not an easy way. Just looking at this list lets you know which wiki backups are missing: http://wikiindex.org/index.php?title=List_of_wikibackups&action=purge
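
For what it's worth, that check could even be scripted; this sketch assumes
WikiIndex exposes Semantic MediaWiki's "ask" API module at /api.php and that
the property behind that column is called "Backup URL" (both are guesses):

    import json
    import urllib.parse
    import urllib.request

    # Guessed semantic query: MediaWiki wikis plus their (possibly empty) backup URL.
    params = urllib.parse.urlencode({
        "action": "ask",
        "query": "[[Category:MediaWiki]]|?Backup URL|limit=100",
        "format": "json",
    })
    url = "http://wikiindex.org/api.php?" + params
    with urllib.request.urlopen(url) as response:
        data = json.load(response)

    for name, page in data.get("query", {}).get("results", {}).items():
        if not page["printouts"].get("Backup URL"):
            print("missing backup:", name)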
 

> Nemo

Federico Leva (Nemo)

Dec 16, 2011, 1:58:16 PM
to wikiteam...@googlegroups.com
emijrp, 16/12/2011 19:51:

> How many wikis to be archived does wikiindex include?
> s23.org <http://s23.org> has over 60.000 wikis in total, way more
> than wikiindex.
>
>
> If you can extract all 60,000 wiki URLs, cool, but as I said, the list
> only shows 1,300 wikis. It is truncated. WikiIndex has about 5,000 wikis;
> it is a good start.

It's not truncated; that's only the list of private mediawikis.
They're not filtered on that list. How many are there on wikiindex? I see
4000 on http://wikiindex.org/Category:MediaWiki, and the subcategories
for wiki farms contain about 1700, so this leaves 2300; are those
correctly tagged, or could they belong to some wiki farm?

Nemo

Hydriz Wikipedia

Dec 20, 2011, 9:46:32 AM
to wikiteam...@googlegroups.com
Just to amend the information that emijrp provided: please don't upload dumps to the Internet Archive directly; it is a great deal of trouble for us (or actually me) to pull the identifiers into the wikiteam collection. We are currently having some trouble with the earlier identifiers that were created.

Instead, either email emijrp directly or, if possible, email me so that I can add it to the wikiteam collection. My email is ad...@alphacorp.tk, and I can accept dumps up to at least 10 GB :P

Emailing emijrp would allow him to upload it to Google Code, which will also be mirrored frequently on our network of mirrors.

emijrp

Dec 20, 2011, 11:39:46 AM
to wikiteam...@googlegroups.com
2011/12/20 Hydriz Wikipedia <ad...@alphacorp.tk>

> Just to amend the information that emijrp provided: please don't upload dumps to the Internet Archive directly; it is a great deal of trouble for us (or actually me) to pull the identifiers into the wikiteam collection. We are currently having some trouble with the earlier identifiers that were created.

Which problems? People use the "wikiteam" tag or the "wikidump.7z" suffix, so searching is easy.
 
 
> Instead, either email emijrp directly or, if possible, email me so that I can add it to the wikiteam collection. My email is ad...@alphacorp.tk, and I can accept dumps up to at least 10 GB :P

I only want people to contact me (sending me a link) after they have uploaded the dump wherever they want. Sending me an e-mail is desirable but not mandatory. I don't want to be a bottleneck; I can't manage all the dumps.
 

Hydriz Wikipedia

Dec 20, 2011, 10:22:49 PM
to wikiteam...@googlegroups.com
I understand that tagging the identifiers with "wikiteam" makes searching much easier, but we need to place the respective identifiers under collection:wikiteam so that we have a common appearance across all the identifiers that we have (currently using the interface called "web"). It lists the dump files directly on the details page itself and saves the hassle of clicking the HTTP link to see all the files.

The Internet Archive is just stubborn. Identifiers are world-readable and uploader-editable, but connecting identifiers to our collection is just trouble (unless I can actually apply to be an interning admin of the Archive; then it would be very possible).
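
For checking items one at a time, the Archive's metadata endpoint at least
shows which collections an identifier currently sits in; the identifier below
is made up, and the endpoint layout is an assumption:

    import json
    import urllib.request

    identifier = "wiki-examplewikidump"  # made-up identifier
    url = "https://archive.org/metadata/" + identifier
    with urllib.request.urlopen(url) as response:
        item = json.load(response)

    # "collection" lists the collections this item currently belongs to.
    print(item.get("metadata", {}).get("collection"))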

Alex Buie

Dec 20, 2011, 10:25:19 PM
to wikiteam...@googlegroups.com
You can always shoot me an email with a list of identifiers that need
to be moved as well.
--
Alex Buie
Network Coordinator / Server Engineer
KWD Services, Inc
Media and Hosting Solutions
+1(703)445-3391
+1(480)253-9640
+1(703)919-8090
ab...@kwdservices.com
ज़रा

Hydriz Wikipedia

Dec 20, 2011, 10:30:19 PM
to wikiteam...@googlegroups.com
Eh, yes. Federico and I tried emailing you, but to no avail...

Alex Buie

Dec 20, 2011, 10:51:06 PM
to wikiteam...@googlegroups.com
Hmm, let me try and find them then. Sorry!

--
Alex Buie
Network Coordinator / Server Engineer
KWD Services, Inc
Media and Hosting Solutions
+1(703)445-3391
+1(480)253-9640
+1(703)919-8090
ab...@kwdservices.com
ज़रा

Hydriz Wikipedia

Dec 20, 2011, 11:19:42 PM
to wikiteam...@googlegroups.com
In case you can't find the email (I sent it on 29 November), here are the identifiers that I need you to help add to the wikiteam collection (30 in total):


Thanks :)