building a custom parser from Bliki

Olivier Ricordeau

Apr 25, 2008, 6:57:53 AM
to Bliki - Java/Eclipse Wikipedia API
Hi group,

I discovered Bliki yesterday and it looks great.
I need some help getting it to do what I need. I'm currently
importing the latest Wikipedia dump into MySQL (thanks to xml2sql). So
what I need now is to build a Java parser that does the following:
* For each internal link, store the link (source and destination) in
MySQL
* For each external link, same thing (but in a different table)
* Output the article as raw text (no HTML markup, etc.)

For the third point, maybe the simplest way to do this is to first
convert to HTML using Bliki, and then remove the tags (using htmlcleaner
for instance).
But can someone tell me where to begin for the first two points?
Please tell me which class I need to override.

Cheers,
Olivier

Axel Kramer

Apr 25, 2008, 12:09:24 PM
to Bliki - Java/Eclipse Wikipedia API


On Apr 25, 12:57 pm, Olivier Ricordeau <olivier.ricord...@gmail.com>
wrote:
> Hi group,
>
> I discovered Bliki yesterday and it looks great.
> I need some help getting it to do what I need. I'm currently
> importing the latest Wikipedia dump into MySQL (thanks to xml2sql). So
> what I need now is to build a Java parser that does the following:
> * For each internal link, store the link (source and destination) in
> MySQL
> * For each external link, same thing (but in a different table)
> * Output the article as raw text (no HTML markup, etc.)
At the moment the only way is to derive a class from
info.bliki.wiki.model.WikiModel or
info.bliki.wiki.model.AbstractWikiModel
and override the append*() methods, e.g.
appendExternalLink(), appendInternalLink(),
appendInterWikiLink(), ...
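
The override pattern could look roughly like the sketch below. It is
only an illustration: StubWikiModel is a hypothetical stand-in so the
example compiles without the Bliki jar, and its two-argument append*()
methods are assumptions, not the real Bliki signatures (check the
WikiModel source for those). The collected pairs are kept in memory
here; writing them to MySQL is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for info.bliki.wiki.model.WikiModel.
// The real append*() methods have different parameter lists.
class StubWikiModel {
    // Assumed hook: called once per internal link found in the wiki text.
    void appendInternalLink(String topic, String description) {
        // The real model would emit an <a> element here.
    }

    // Assumed hook: called once per external link found in the wiki text.
    void appendExternalLink(String url, String label) {
        // The real model would emit an <a> element here.
    }
}

// Derived model that records (source, destination) pairs for each link.
class LinkCollectingModel extends StubWikiModel {
    final String sourceTitle;
    final List<String[]> internalLinks = new ArrayList<>();
    final List<String[]> externalLinks = new ArrayList<>();

    LinkCollectingModel(String sourceTitle) {
        this.sourceTitle = sourceTitle;
    }

    @Override
    void appendInternalLink(String topic, String description) {
        internalLinks.add(new String[] { sourceTitle, topic });
        super.appendInternalLink(topic, description); // keep normal rendering
    }

    @Override
    void appendExternalLink(String url, String label) {
        externalLinks.add(new String[] { sourceTitle, url });
        super.appendExternalLink(url, label); // keep normal rendering
    }
}

class LinkCollectorDemo {
    public static void main(String[] args) {
        LinkCollectingModel model = new LinkCollectingModel("Paris");
        // In real use the parser would call these while rendering the article.
        model.appendInternalLink("France", "France");
        model.appendExternalLink("http://example.org", "Example");
        for (String[] link : model.internalLinks) {
            System.out.println("internal: " + link[0] + " -> " + link[1]);
        }
        for (String[] link : model.externalLinks) {
            System.out.println("external: " + link[0] + " -> " + link[1]);
        }
    }
}
```

With two separate lists, the internal and external links can later be
inserted into their own tables, for instance with one JDBC batch per
article.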

Note that if you don't need the HTML output, you can speed things up
by passing a null converter argument to the
AbstractWikiModel#render() method,
provided you make these changes:
http://plog4u.svn.sourceforge.net/viewvc/plog4u?view=rev&revision=358

> For the third point, maybe the simplest way to do this is to first
> convert to HTML using Bliki, and then remove the tags (using htmlcleaner
> for instance).
Yes, that's possible. You can implement the
info.bliki.html.IHTMLToWiki
interface. For example, copy the
info.bliki.html.wikipedia.ToWikipedia
class and adapt it to your needs.
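
For the simpler tag-stripping route from the quoted question, a naive
sketch might look like this. It uses plain regular expressions rather
than htmlcleaner or the IHTMLToWiki interface, the class name
HtmlToText is made up for the example, and it only handles a handful
of entities. Text from adjacent block elements runs together, so a
real HTML parser is likely the better choice for full articles.

```java
import java.util.regex.Pattern;

// Naive HTML-to-text helper; an illustration, not part of Bliki.
class HtmlToText {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String strip(String html) {
        // Drop everything that looks like a tag.
        String text = TAG.matcher(html).replaceAll("");
        // Unescape a few common entities; &amp; goes last so that
        // "&amp;lt;" becomes "&lt;" and not "<".
        text = text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace left behind by the markup.
        return SPACES.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<p>Paris is the capital of "
                + "<a href=\"/wiki/France\">France</a>.</p>";
        System.out.println(strip(html));
    }
}
```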

Olivier Ricordeau

Apr 25, 2008, 3:12:20 PM
to Bliki - Java/Eclipse Wikipedia API
Thanks a lot for your answer!
One more question BTW: how am I supposed to build the .jar from the
svn? I found neither a build.xml nor a Makefile that builds bliki-xxx.jar...

PS: I checked out the sources as follows:
svn co https://plog4u.svn.sourceforge.net/svnroot/plog4u plog4u

Maybe you should add this command here: http://matheclipse.org/en/Java_Wikipedia_API#Development

Thx,
Olivier

Axel Kramer

Apr 25, 2008, 4:01:55 PM
to Bliki - Java/Eclipse Wikipedia API
On Apr 25, 9:12 pm, Olivier Ricordeau <olivier.ricord...@gmail.com>
wrote:
> Thanks a lot for your answer!
> One more question BTW: how am I supposed to build the .jar from the
> svn? I found neither a build.xml nor a Makefile that builds bliki-xxx.jar...
In Eclipse, right-click the file
<your-workspace>\info.bliki.wiki.svn\bliki.jardesc
in the Package Explorer and select "Create JAR".

You may have to modify the bliki.jardesc text file for your own
paths and settings.
For example, the addon-src files are probably not necessary for your
purposes.