Wikimedia datasets collection on the Internet Archive has surpassed 1 million items


Hydriz Scholz

Nov 14, 2016, 6:03:02 AM
to Wikipedia Xmldatadumps-l, wikiteam...@googlegroups.com, Research into Wikimedia content and communities, A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Dear all,

The Wikimedia Foundation datasets collection on the Internet Archive
[1] has now surpassed 1 million items (and about 50,000 full database
dumps)! This marks a major milestone in our efforts to archive
Wikimedia's vast amount of data and ensures that the vital content
submitted by volunteers across the movement is preserved. None of this
would have been possible without the help of many people,
including Nemo, Ariel and Emijrp (thanks!).

We started archiving towards the end of 2011 and reached a milestone
of half a million items back in June 2015. [2] We have since moved on
from archiving just the main database dumps to saving research-worthy
data such as the pageviews data and even attempting to keep a copy of
Wikimedia Commons. Today, we are making the items on the
Internet Archive more accessible for researchers by building an
interface for searching old dumps.

Despite this feat, we are in constant need of more help. If you are a
researcher, a programmer or someone with a computer, we need your help
with many tasks! Have a look at WikiTeam's project [3] or Emijrp's
Wikipedia Archive page [4] for more information. If you regularly work
with the Wikimedia database dumps, please provide your input on the
Dumps-Rewrite project [5] and the API interface [6].

As before, here's to the next million!

[1]: https://archive.org/details/wikimediadownloads
[2]: https://groups.google.com/forum/#!msg/wikiteam-discuss/Vj3oonpYphg/h9HE6r3v2QAJ
[3]: https://github.com/WikiTeam/wikiteam
[4]: https://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive
[5]: https://phabricator.wikimedia.org/tag/dumps-rewrite/
[6]: https://phabricator.wikimedia.org/T147177

--
Hydriz Scholz

Federico Leva (Nemo)

Nov 14, 2016, 6:09:33 PM
to wikiteam...@googlegroups.com
Quite impressive. :-) (Especially the part where the Internet Archive
lets us throw all this stuff at them! :-D )

Nemo