Wiki markup Parser HELP.....


hitesh thakkar

Mar 24, 2008, 10:40:58 AM
to mw...@googlegroups.com


Hello,

We are doing a knowledge-harvesting project in which we parse
Wikipedia XML pages, and those pages contain wiki markup.

So we are supposed to parse the wiki markup too, but we got stuck here:
how do we parse wiki markup?

Do I need to write a wiki parser, or is one already available?

Please reply if anyone knows about this.

We also tried to install mwlib, but we are facing the following error:

error: can't copy 'mwlib/_expander.cc': doesn't exist or not a regular file

Thanking you,

Hitesh Thakkar.

Ralf Schmitt

Mar 24, 2008, 12:56:23 PM
to mw...@googlegroups.com, hitesht...@gmail.com
On Mon, Mar 24, 2008 at 3:40 PM, hitesh thakkar <hitesht...@gmail.com> wrote:


> Hello,
>
> We are doing a knowledge-harvesting project in which we parse
> Wikipedia XML pages, and those pages contain wiki markup.
>
> So we are supposed to parse the wiki markup too, but we got stuck here:
> how do we parse wiki markup?
 
>>> from mwlib import uparser
>>> uparser.simpleparse("* hello\n* world")
parser.info >> Parsing "'unknown'"
 Article 'unknown': 1 children
     Paragraph '': 1 children
         Node '': 1 children
             ItemList '': 2 children
                 Item '': 1 children
                     ' hello'
                 Item '': 1 children
                     ' world'
Article 'unknown': 1 children

Have a look at the source for advanced usage...
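
As a rough illustration of working with a tree shaped like the output above, here is a minimal sketch that walks the nodes and collects the leaf text. It does not import mwlib: the Node class below is a hypothetical stand-in, and the attribute names `children` and `caption` are assumptions about how the real nodes are laid out, so check them against the mwlib source before relying on this.

```python
class Node(object):
    """Hypothetical stand-in for a parse-tree node (not mwlib's own class)."""

    def __init__(self, caption="", children=None):
        self.caption = caption
        self.children = list(children or [])


def iter_nodes(node, depth=0):
    """Yield (depth, node) pairs in document order (pre-order walk)."""
    yield depth, node
    for child in getattr(node, "children", []):
        for pair in iter_nodes(child, depth + 1):
            yield pair


def leaf_text(root):
    """Collect the captions of leaf nodes, stripped of surrounding spaces."""
    out = []
    for _depth, node in iter_nodes(root):
        is_leaf = not getattr(node, "children", [])
        if is_leaf and getattr(node, "caption", ""):
            out.append(node.caption.strip())
    return out


# Hand-built tree with the same shape as the parser output above.
tree = Node("unknown", [
    Node("", [                                # Paragraph
        Node("", [                            # Node
            Node("", [                        # ItemList
                Node("", [Node(" hello")]),   # Item
                Node("", [Node(" world")]),   # Item
            ]),
        ]),
    ]),
])

print(leaf_text(tree))
```

If the real nodes do expose those two attributes, passing the result of uparser.simpleparse to leaf_text in place of the hand-built tree should work the same way; that part is untested here.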



> Do I need to write a wiki parser, or is one already available?
>
> Please reply if anyone knows about this.
>
> We also tried to install mwlib, but we are facing the following error:
>
> error: can't copy 'mwlib/_expander.cc': doesn't exist or not a regular file

You are using the development version. You need to install re2c and run make.
Alternatively, you can install the source tarball from PyPI by running easy_install mwlib.

- Ralf
