Replacement for mw-buildcdb?


UltraNurd

Jul 26, 2012, 1:34:17 PM
to mw...@googlegroups.com
I had been using mwlib to work with Wikipedia dumps for a research project. My first step was to run:

mw-buildcdb --input enwiki-latest-pages-articles.2012.03.26.xml.bz2 --output 2012.03.26.no-redirects --ignore-redirects

My wikiconf.txt in that output directory sets the type to nucdb. In my Python code that operates on the dump, I call mwlib.wiki.makewiki on that conf file.
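
If all that is needed is a sequential pass over the articles, without the title lookup that the cdb index provided, the dump can also be read directly with the Python standard library. The following is a minimal sketch that does not use mwlib at all. The names iter_articles and open_dump are hypothetical helpers, the sample XML is made up for illustration, and the code assumes the dump marks redirect pages with a <redirect> element, as recent MediaWiki export schema versions do.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def open_dump(path):
    """Open a plain or bz2-compressed dump for binary reading."""
    return bz2.open(path, "rb") if path.endswith(".bz2") else open(path, "rb")


def iter_articles(stream, ignore_redirects=True):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text, is_redirect = None, "", False
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "redirect":
                is_redirect = True
            elif name == "text":
                text = child.text or ""
        if not (ignore_redirects and is_redirect):
            yield title, text
        elem.clear()  # drop the page's children to reduce memory use


# Tiny made-up sample in the shape of a pages-articles dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Foo</title>
    <revision><text>Foo is a thing.</text></revision>
  </page>
  <page>
    <title>Bar</title>
    <redirect title="Foo" />
    <revision><text>#REDIRECT [[Foo]]</text></revision>
  </page>
</mediawiki>"""

articles = list(iter_articles(io.BytesIO(SAMPLE)))
print(articles)
```

For a real dump, pass open_dump("enwiki-latest-pages-articles.2012.03.26.xml.bz2") instead of the in-memory sample. This only streams pages in file order; it does not parse wikitext or expand templates, so it is not a substitute for what mwlib.wiki.makewiki returned.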

I saw in the commit log on GitHub that CDB support was going away. What is the current correct method for loading a Wikipedia dump file?

Ralf Schmitt

Jul 27, 2012, 8:36:25 AM
to mw...@googlegroups.com
UltraNurd <ultr...@gmail.com> writes:

> I saw in the commit log on GitHub that CDB support was going away. What is
> the current correct method for loading a Wikipedia dump file?

Sorry, it's not supported anymore by PediaPress. Feel free to copy the
old code into your project or become a maintainer for a mwlib.cdb
project.

--
Cheers
Ralf

Jeff

Sep 23, 2012, 4:35:15 PM
to mw...@googlegroups.com
I've copied the old cdb code into mwlib.cdb on GitHub. I also resurrected the old xhtmlwriter as mwlib.xhtml. I can submit both packages to PyPI if all of the credits and attributions look okay (none of this is actually my code, it's just the old code from mwlib).

-- Jeff

Ralf Schmitt

Sep 24, 2012, 9:41:07 AM
to mw...@googlegroups.com
Jeff <jdo...@gmail.com> writes:

> I've copied the old cdb code into mwlib.cdb
> <https://github.com/doozan/mwlib.cdb> on GitHub. I also resurrected
> the old xhtmlwriter as mwlib.xhtml
> <https://github.com/doozan/mwlib.xhtml>. I can submit both packages
> to PyPI if all of the credits and attributions look okay (none of this
> is actually my code, it's just the old code from mwlib)

sure, just go ahead!