New issue 141 by mike.lifeguard: Slow response times
http://code.google.com/p/perlwikipedia/issues/detail?id=141
On Wed Oct 06 17:49:21 2010, vonankh wrote:
> Not sure what the problem is, but doing the exact same request using
> "Perlwikipedia" takes only 1/3 of the time it takes with
> "MediaWiki::Bot". I have tried many different things, but cannot find the
> problem...
> This is especially evident in the:
> my @genosets = $bot->get_pages_in_category($cats);
> calls.
Comment #1 on issue 141 by mike.lifeguard: Slow response times
http://code.google.com/p/perlwikipedia/issues/detail?id=141
Sorry, are you saying the request is *faster* when using
the "Perlwikipedia" alias? I don't see how that's possible, since it loads
the exact same code, by a different name.
Please post your test code.
That's correct! It makes no sense whatsoever. But are you sure the settings
in the code are the same? Could it be that the site I'm trying to get the
pages from is choking/throttling "agents" from "MediaWiki::Bot" but not
from "Perlwikipedia", as that toolset is getting outdated?
Addendum: I'm using these:
1) Perlwikipedia-1.5.2.tar.gz
2) MediaWiki-Bot-3.2.4.tar.gz
So you *aren't* using the Perlwikipedia alias from MediaWiki::Bot. You're
using a (very) old distribution. Please upload the tarball, and I'll try to
find where the inefficiency lies in the newer code.
I believe I may be having the same problem. I recently had to upgrade from
MediaWiki-Bot 2.3.0 to 3.2.4 to solve a login issue after the snpedia.com
wiki did a server upgrade. After the upgrade, the get_pages_in_category
calls have become extremely slow, about 25 times slower in the worst case.
It seems to get exponentially worse as the number of pages returned
increases.
Here is my minimum sample code which demonstrates the problem:
    use strict;
    use warnings;
    use MediaWiki::Bot;

    my $bot = MediaWiki::Bot->new();
    $bot->set_wiki('www.snpedia.com', '/');
    # max => 0 asks for every page in the category
    my @rsnums = $bot->get_pages_in_category("Category:Is_a_snp", { max => 0 });
This returns about 13,500 pages currently.
In 2.3.0, this code took about 54 seconds to complete. In 3.2.4 it takes
about 23 minutes to complete.
I went back and, starting with 2.3.0, upgraded one version at a time. 2.3.0,
2.3.1, and 3.0.0 all take about 54 seconds. Starting with 3.1.5 it jumps up
to around 23 minutes.
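For anyone who wants to repeat the per-version comparison, here is a minimal timing sketch. The helper name timed() is made up for illustration, and the workload shown is a stand-in rather than a real wiki request.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code reference once and return
# (elapsed seconds, list of results).
sub timed {
    my ($code) = @_;
    my $start  = [gettimeofday];
    my @result = $code->();
    return (tv_interval($start), @result);
}

# Stand-in workload; replace the sub body with the
# get_pages_in_category call under test.
my ($seconds, @pages) = timed(sub { return map { "Page $_" } 1 .. 3 });
printf "%d pages in %.3f seconds\n", scalar @pages, $seconds;
```

Running the same script against each installed version should show where the jump happens.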
I would like to suggest the priority be increased from medium to high, as a
25x reduction in performance is a pretty serious problem.
I found the "problem".
Up to 3.0.0, cmlimit => 500 was used for the call to MediaWiki::API's list
function. In 3.1.5 this was removed.
cmlimit specifies the number of items to request in each query. If it is not
specified, the API returns 10 per request. In other words, 13,500 pages
require 1,350 requests with the default setting, or 27 requests with
cmlimit set to 500. I suggest either restoring the old code or making this
an option.
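The request counts above can be checked with a few lines of Perl. This is only a sketch of the arithmetic; requests_needed() is a made-up name and nothing here talks to the wiki.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: number of API round trips needed to fetch
# $total items when each request returns at most $limit of them.
sub requests_needed {
    my ($total, $limit) = @_;
    return ceil($total / $limit);
}

# Compare the default limit (10) with the old cmlimit of 500
# for the roughly 13,500 pages in the category.
for my $limit (10, 500) {
    printf "cmlimit %-4d -> %d requests\n", $limit, requests_needed(13_500, $limit);
}
```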
In the meantime, for those who want to restore the old behavior, in the
MediaWiki-Bot source, edit lib/MediaWiki/Bot.pm then find the
get_pages_in_category function and add the line "cmlimit => 500," to $hash,
e.g.:
    my $hash = {
        action  => 'query',
        list    => 'categorymembers',
        cmtitle => $category,
        cmlimit => 500,
    };
After making this change to 3.2.4, my sample code took 57 seconds, on par
with the previous behavior.
Comment #8 on issue 141 by mike.lifeguard: Slow response times
http://code.google.com/p/perlwikipedia/issues/detail?id=141
Could you please test by using automatic login and configuration? If your
account is a bot, highlimits should be set, so you should get 5000 results
per query.
But, yes, this should specifically set cmlimit for normal users.
Actually, 'max' can be used:
> All list queries return a limited number of results. This limit is 10 by
> default, and can be set as high as 500 for regular users, or 5000 for users
> with the apihighlimits right (typically bots and sysops). Some modules
> impose stricter limits under certain conditions. If you're not sure which
> limit applies to you and just want as many results as possible, set the
> limit to max. In that case, a <limits> element will be returned, specifying
> the limits used.
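A rough sketch of what using 'max' could look like when calling MediaWiki::API's list function directly. The helper names category_query() and fetch_members() are made up for illustration, the API URL is left to the caller, and this is not necessarily how the fix will look inside MediaWiki::Bot.

```perl
use strict;
use warnings;

# Hypothetical helper: build the categorymembers query. cmlimit defaults
# to 'max', so the server applies the highest limit the account allows.
sub category_query {
    my ($category, $limit) = @_;
    return {
        action  => 'query',
        list    => 'categorymembers',
        cmtitle => $category,
        cmlimit => defined $limit ? $limit : 'max',
    };
}

# Hypothetical helper: fetch all member titles through MediaWiki::API.
# $api_url is supplied by the caller; it is not taken from this thread.
sub fetch_members {
    my ($api_url, $category) = @_;
    require MediaWiki::API;    # loaded lazily; category_query needs no modules
    my $mw      = MediaWiki::API->new({ api_url => $api_url });
    my $members = $mw->list(category_query($category))
        or die $mw->{error}->{code} . ': ' . $mw->{error}->{details};
    return map { $_->{title} } @$members;
}
```

Passing an explicit second argument to category_query(), such as 500, would reproduce the pre-3.1.5 behavior instead.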