Trouble in using get_history method

Marjorie Seizou

Apr 14, 2011, 4:34:38 PM
to perlw...@googlegroups.com
Hello,

I'm a postgraduate student in Computational Linguistics at the University of Paris Ouest - La Défense, and I have a project whose goal is to analyze "edit wars" on French Wikipedia articles.
For this purpose, I have to develop an application (in Perl or PHP) that retrieves page histories, article content and information about users (username or IP, memberships, etc.), and processes this information in order to distinguish between minor and major contributions and produce some statistical representations.

I made a few simple attempts with the MediaWiki::Bot library and its get_history method, but it didn't work well: the array only contains one element (a number, 111). That's why I need some help :)
I checked the documentation and some scripts, but I didn't find any answer.

Here is my code:

use utf8;
use MediaWiki::Bot;

my $host = 'fr.wikipedia.org';
my $path = 'w/api.php';
my $pagename = 'Svadilfari';

my $bot = MediaWiki::Bot->new({
    assert      => 'bot',
    protocol    => 'http',
    host        => $host,
    path        => $path,
    login_data  => { username => $username, password => $password  },
}) || die $bot->{'error'}->{'code'};

$bot->set_wiki($host, $path) || die $bot->{'error'}->{'code'};

my @hist = $bot->get_history($pagename) || die $bot->{'error'}->{'code'};

foreach my $hist (@hist)
{
   print "$hist\n";
}


I'm new to APIs and object-oriented programming, so I guess I made mistakes, but I can't figure out where. What should I do? Is it because I only have a "basic" user account?

By the way, why can't we access the entire history (it is limited to 500 revisions)? How can I get around this limitation?

Thank you very much in advance for your help.

Best regards,
Marjorie Seizou





Mike.lifeguard

Apr 14, 2011, 5:22:34 PM
to perlw...@googlegroups.com
On 11-04-14 05:34 PM, Marjorie Seizou wrote:
> I made some simple tries with MediaWiki::Bot library and get_history
> method, but it didn't work well : the array only contains one element (a
> number, 111).

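I think that the result of get_history has been evaluated in scalar
context: the "||" in "... || die ..." imposes scalar context on its
left-hand side, and in scalar context an array evaluates to its size. So,
111 means that the history has 111 elements in it.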
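A minimal, self-contained reproduction of the effect. fake_history is a hypothetical stand-in for $bot->get_history, assumed (as the 111 suggests) to end with "return @array", so that scalar context yields the element count. The fix shown is the low-precedence "or", which applies to the whole list assignment rather than to the method call.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $bot->get_history(): returns an array of
# hashrefs, one per revision (three here instead of 111).
sub fake_history {
    my @revs = map { +{ revid => $_ } } 1 .. 3;
    return @revs;    # in scalar context this yields the element count
}

# Buggy: '||' binds tighter than '=' and calls fake_history() in scalar
# context, so @wrong receives a single element: the count.
my @wrong = fake_history() || die "no history\n";

# Fixed: low-precedence 'or' applies to the whole list assignment,
# so fake_history() is called in list context.
my @right = fake_history() or die "no history\n";

print scalar(@wrong), " element(s): $wrong[0]\n";    # 1 element(s): 3
print scalar(@right), " element(s)\n";               # 3 element(s)
```

Parenthesizing also works, e.g. "my @hist = $bot->get_history($pagename);" followed by a separate "die ... unless @hist;".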

I've fixed up the code sample you provided - see
https://gist.github.com/920575.

You should always use strict and use warnings; these help you catch
basic errors. For example, you can't use $bot on the right side of the
'or' condition on line 13 when trying to create the bot object: if
creating it fails, there will be no $bot to refer to there. You can find
more of the usual recommendations at http://hashbang.ca/perl/the-usual
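To see the error that use strict reports for this pattern, the sketch below compiles a reduced version of the original constructor line inside a string eval. make_bot is a hypothetical placeholder that is never actually called. A "my" variable only becomes visible after the statement that declares it, so the $bot after die is an undeclared global, and strict refuses to compile the code.

```perl
use strict;
use warnings;

# Reduced version of the constructor line from the original post.
# make_bot() is a hypothetical placeholder; it is never called, because
# compilation fails first.
my $buggy = q{
    use strict;
    my $bot = make_bot() || die $bot->{error}{code};
};

# A string eval compiles the code at run time; compile errors land in $@.
my $compiled = eval "$buggy; 1";
my $error    = $@;

if ($compiled) {
    print "compiled (unexpected)\n";
}
else {
    print "use strict rejected it: $error";
}
```

Without use strict the same line compiles silently, and a failed constructor would then die with an unhelpful message about an undefined value.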

Also, you need to know what kind of data structure you're accessing.
Each element of @hist is a reference to a hash. You need to unpack it to
get your data - I've shown you one way to do that on line 19. The
documentation for get_history is not very good, I will try to improve it
before the next release, which will be in about 1 week's time.
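A small sketch of unpacking those hash references. The sample data is made up, and the key names (revid, user, comment, minor, timestamp_date, timestamp_time) are assumed to match those documented for get_history in later MediaWiki::Bot releases; check them against your installed version, for example by dumping one element with Data::Dumper.

```perl
use strict;
use warnings;

# Made-up revisions shaped like the hashrefs get_history() returns.
# Key names are an assumption: verify them against your MediaWiki::Bot version.
my @hist = (
    { revid => 101, user => 'Alice', comment => 'typo', minor => 1,
      timestamp_date => '2011-04-01', timestamp_time => '10:00:00' },
    { revid => 102, user => '192.0.2.7', comment => 'new section', minor => 0,
      timestamp_date => '2011-04-02', timestamp_time => '11:30:00' },
    { revid => 103, user => 'Alice', comment => 'revert', minor => 0,
      timestamp_date => '2011-04-02', timestamp_time => '11:45:00' },
);

my %edits_by_user;
foreach my $rev (@hist) {
    # Each element is a hash reference; -> dereferences it to reach a field.
    # 'minor' is assumed to be a true/false flag.
    my $flag = $rev->{minor} ? 'm' : ' ';
    printf "%d %s %s %s %-12s %s\n",
        $rev->{revid}, $flag, $rev->{timestamp_date}, $rev->{timestamp_time},
        $rev->{user}, $rev->{comment};
    $edits_by_user{ $rev->{user} }++;
}

# Per-user edit counts, a first step towards the statistics.
foreach my $user (sort keys %edits_by_user) {
    print "$user: $edits_by_user{$user} edit(s)\n";
}
```

With real data, @hist would instead hold the result of $bot->get_history($pagename), called in list context.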

Finally, I removed some calls you don't need. For example, you don't
need to call set_wiki() if you already gave the host and path when
creating the bot object. For WMF (Wikimedia Foundation) wikis, it is
sufficient to provide the host. That will come as you use MediaWiki::Bot
more. It has a somewhat 'organic' API :)

> By the way, why don't we have access to the entire history (limited to
> 500) ? How can I avoid this limitation ?

Do you really need to use the live site? You might find that downloading
a database dump works better for your methodology. If that won't do, you
can request a bot account, which will allow you to retrieve up to 5000
revisions per request. And even in batches of 500, you can get the whole
page history by making several requests.
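The batching idea can be sketched generically. fetch_batch below is a hypothetical stand-in for one API request; it is assumed to return up to $limit revisions plus a continuation token, or an undefined token once nothing is left. The real continuation mechanism of the MediaWiki API (or of MediaWiki::Bot) is not modelled here; the loop only shows the "keep asking until the token runs out" pattern.

```perl
use strict;
use warnings;

# Hypothetical history of 1234 revisions, served in pages like the API does.
my @all_revids = (1 .. 1234);
my $requests   = 0;

# Hypothetical stand-in for one API request: returns (\@batch, $next_token),
# where $next_token is undef once the last batch has been served.
sub fetch_batch {
    my ($token, $limit) = @_;
    $requests++;
    my $start = $token // 0;
    my $end   = $start + $limit - 1;
    $end = $#all_revids if $end > $#all_revids;
    my @batch = @all_revids[ $start .. $end ];
    my $next  = $end < $#all_revids ? $end + 1 : undef;
    return (\@batch, $next);
}

# Keep requesting until there is no continuation token left.
sub fetch_all {
    my ($limit) = @_;
    my (@history, $token);
    do {
        my ($batch, $next) = fetch_batch($token, $limit);
        push @history, @$batch;
        $token = $next;
    } while (defined $token);
    return @history;
}

my @history = fetch_all(500);    # 3 requests: 500 + 500 + 234
print scalar(@history), " revisions in $requests requests\n";
```

Calling fetch_all(5000) instead models the bot-account limit: the same 1234 revisions would arrive in a single request.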

-Mike
