Fwd: XML dumps repackaged in TSV


Federico Leva (Nemo)

Feb 10, 2020, 1:01:26 PM
to wikiteam...@googlegroups.com



-------- Forwarded Message --------
Subject: [Analytics] Announcement - Mediawiki History Dumps
Date: Mon, 10 Feb 2020 17:27:51 +0100
From: Joseph Allemandou

Hi Analytics People,

The Wikimedia Analytics Team is pleased to announce the release of the
most complete dataset we have to date to analyze content and
contributors metadata: Mediawiki History [1] [2].

Data is in TSV format, released monthly (usually around the 3rd of the
month), and every release contains the full history of metadata.
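Since the dumps are plain TSV, they are easy to process with standard tooling. Below is a minimal sketch of parsing such a file in Python; the column names and sample row here are illustrative assumptions, not the actual Mediawiki History schema (see the readme [1] for the real field list):

```python
import csv
import io

# Illustrative TSV snippet; real dumps have many more columns and are
# typically distributed as compressed files, one per wiki and time range.
sample = (
    "event_entity\tevent_type\tevent_timestamp\n"
    "revision\tcreate\t2020-01-01 00:00:00\n"
)

# Parse with the stdlib csv module, using tab as the delimiter.
reader = csv.reader(io.StringIO(sample), delimiter="\t")
header = next(reader)
rows = [dict(zip(header, row)) for row in reader]

print(rows[0]["event_type"])  # -> create
```

For the full-size dumps you would stream the (decompressed) file rather than load it into memory, but the parsing logic stays the same.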

The dataset contains an enhanced [3] and historified [4] version of
user, page and revision metadata and serves as the base for the
Wikistats API on edits, users and pages [5] [6].

We hope you will have as much fun playing with the data as we have
building it, and we're eager to hear from you [7], whether for issues,
ideas or usage of the data.

Analytically yours,

--
Joseph Allemandou (joal) (he / him)
Sr Data Engineer
Wikimedia Foundation

[1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
[2]
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
[3] Many pre-computed fields are present in the dataset, from
edit-counts by user and page to reverts and reverted information, as
well as time between events.
[4] As-accurate-as-possible historical usernames and page titles (as
well as user groups and blocks) are available in addition to current
values, and are provided in a denormalized way with every event of the
dataset.
[5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
[6] https://wikimedia.org/api/rest_v1/
[7]
https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps&projectPHIDs=Analytics-Wikistats,Analytics