wikkii.com and editthis.info


Federico Leva (Nemo)

Nov 1, 2013, 7:47:17 PM
to wikiteam...@googlegroups.com
Hello, I've started the download of these two farms. As emiirp said, one
has to go very slowly with them, so I've started now even though my
machine(s) are still busy with other jobs.
* editthis.info has API enabled but requires a captcha if you go over 60
requests per hour;
* wikkii.com disables API and serves a text error if you go over 2
requests per second.
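
To put the two limits above on the same scale, here is a minimal sketch that converts a "N requests per period" limit into the smallest gap between evenly spaced requests. The function name is hypothetical and is not part of the WikiTeam scripts; it is only arithmetic on the figures quoted above.

```python
def min_delay_seconds(max_requests, per_seconds):
    """Smallest gap between evenly spaced requests that stays within the limit."""
    if max_requests <= 0:
        raise ValueError("max_requests must be positive")
    return per_seconds / max_requests


# editthis.info: 60 requests per hour before the captcha appears
editthis_gap = min_delay_seconds(60, 3600)  # 60.0 seconds

# wikkii.com: 2 requests per second before the text error appears
wikkii_gap = min_delay_seconds(2, 1)  # 0.5 seconds

print(editthis_gap, wikkii_gap)
```

These are lower bounds only; a real crawl would presumably want some margin on top of them.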
I remember that ac.wikkii.net took an extremely long time in 2011. These
two downloads will probably take a few months to complete, so I'll run
only one worker per farm on a server and forget about it.
I'm using the old lists; checkalive.py was throttled, so its results are
not reliable anyway:
https://code.google.com/p/wikiteam/source/browse/trunk/#trunk%2Flistsofwikis

Nemo

Federico Leva (Nemo)

Nov 6, 2013, 8:29:34 AM
to wikiteam...@googlegroups.com
Federico Leva (Nemo), 02/11/2013 00:47:
> Hello, I've started the download of these two farms. As emiirp said, one
> has to go very slow with them, so I've started even if my machine(s) are
> still doing something else.
> * editthis.info has API enabled but requires a captcha if you go over 60
> requests per hour;
> * wikkii.com disables API and serves a text error if you go over 2
> requests per second.

After some bugfixing and experimenting these past few days (see also
https://code.google.com/p/wikiteam/issues/list ), I've more or less
settled on the following crawl delays, in seconds, between wikis and
between requests respectively: 10 and 3 for wikkii, 60 for editthis,
720 and 360 (!!) for wiki-site.
Now I hope they'll just run without requiring me to babysit them.
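
To give a feel for what such delays cost, here is a rough sketch that adds up the time spent sleeping alone. It assumes one pause after every request and one pause between consecutive wikis, which is a simplification and not necessarily how the download script applies its delay. The class, the function, and the request counts are hypothetical; only the delay values come from the message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlDelays:
    between_wikis: float     # seconds to wait before moving on to the next wiki
    between_requests: float  # seconds to wait after each request


def total_sleep_seconds(delays, requests_per_wiki):
    """Time spent sleeping: one pause per request plus one pause between wikis."""
    request_pauses = sum(requests_per_wiki) * delays.between_requests
    wiki_pauses = max(len(requests_per_wiki) - 1, 0) * delays.between_wikis
    return request_pauses + wiki_pauses


WIKKII = CrawlDelays(between_wikis=10, between_requests=3)
WIKI_SITE = CrawlDelays(between_wikis=720, between_requests=360)

# Made-up workload: two wikis needing 100 and 50 requests.
workload = [100, 50]
print(total_sleep_seconds(WIKKII, workload))     # 460
print(total_sleep_seconds(WIKI_SITE, workload))  # 54720
```

Under this model the same 150 requests cost under 8 minutes of waiting on wikkii but over 15 hours on wiki-site.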

Nemo

Hydriz Scholz

Nov 6, 2013, 8:31:39 AM
to wikiteam...@googlegroups.com
Excellent job, and I am officially back in business. I just finished my last major paper today (though I have two minor ones next week). Archive.org is down right now, but once it's back up I will have a look at Issue 43.


--
Regards,
Hydriz


Federico Leva (Nemo)

Jan 24, 2014, 4:29:18 AM
to wikiteam...@googlegroups.com
Over two months later, I'm at about 130 EditThis and 120 wiki-site wikis
downloaded... However, I stopped downloading the useless local MediaWiki
messages, and yesterday I made the delay option more thorough:
https://code.google.com/p/wikiteam/source/detail?r=902
The download rate is now improving, while being nicer to the server.

Nemo

Emilio J. Rodríguez-Posada

Jan 24, 2014, 4:40:26 AM
to wikiteam...@googlegroups.com
Yes... EditThis wikis have 2000+ MediaWiki: message pages per site... A waste of time and resources. Excluding them by namespace (--exnamespaces) is recommended.
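
As an illustration of what excluding by namespace amounts to, here is a minimal sketch that drops pages by namespace ID before they are fetched. It is not the dump script's own code: the function name and the sample titles are hypothetical. The only fact relied on is that 8 is the standard MediaWiki ID for the "MediaWiki:" namespace.

```python
MEDIAWIKI_NAMESPACE = 8  # standard MediaWiki ID for the "MediaWiki:" namespace


def exclude_namespaces(pages, excluded):
    """Keep only (namespace_id, title) pairs whose namespace is not excluded."""
    excluded = set(excluded)
    return [(ns, title) for ns, title in pages if ns not in excluded]


# Hypothetical page list, as (namespace_id, title) pairs.
sample_pages = [
    (0, "Main Page"),
    (8, "MediaWiki:Sidebar"),
    (8, "MediaWiki:Mainpage"),
    (1, "Talk:Main Page"),
]

kept = exclude_namespaces(sample_pages, [MEDIAWIKI_NAMESPACE])
print(kept)  # [(0, 'Main Page'), (1, 'Talk:Main Page')]
```

With 2000+ message pages per wiki and a long delay per request, skipping that one namespace removes most of the waiting.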


