TagTreeScanner help

Patrick Gundlach

Oct 24, 2005, 12:08:43 PM

Hello out there,

I am experimenting with TagTreeScanner from Gavin Kistner. Has anybody
created a Wiki-markup parser for the mediawiki software?

I have succeeded with a few tags, but I am currently stuck on
enumerations:

* one
*two
* three
** threeone
** threetwo
*four

should be represented by something like

<ol>
<li>one</li>
<li>two</li>
<li>three
<ol>
<li>threeone</li>
<li>threetwo</li>
</ol>
</li>
<li>four</li>
</ol>

This is what I have tried, but I have the feeling that this is not the
way to proceed. I only get <ol> </ol> around the whole string.

--------------------------------------------------
#!/opt/ruby/1.8.2/bin/ruby

require 'TagTreeScanner'

class SimpleMarkup < TagTreeScanner
  @root_factory.allows_text = false

  @tag_genres[ :root ] = [ ]
  @tag_genres[ :root ] <<
    TagFactory.new( :ol,
      :open_match => /(\*.+?)\n+(?=[^*])/m,
      :open_requires_bol => true,
      :setup => lambda{ |tag, scanner, tagtree|
        # Throw the contents I found into the tag
        # but remove leading whitespace
        tag << scanner[1] # [1].gsub( /\*/, '<li>' )
      },
      :allows_text => true,
      :autoclose => true,
      # no effect: (??)
      :allowed_genre => :list
    )

  @tag_genres[ :list ] = [ ]
  @tag_genres[ :list ] <<
    TagFactory.new( :li,
      :open_match => /\*/,
      :close_match => /\n/,
      :open_requires_bol => true,
      :allows_text => true
    )

end

sample = <<EOS
* one
*two
* three
** threeone
** threetwo
*four

EOS

markup = SimpleMarkup.new(sample)
puts markup.to_xml
--------------------------------------------------
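
For comparison, here is a minimal sketch of the nesting logic in plain Ruby, without TagTreeScanner. It counts the leading stars on each line and opens or closes lists as the depth changes. The method name star_list_to_html and its list_tag parameter are made up for this illustration. MediaWiki itself renders `*` lists as <ul>, so that is the default here; pass 'ol' to get the tags shown in the post above. The sketch ignores every line that is not a list item and does not escape HTML.

```ruby
# Hypothetical helper; plain Ruby, no TagTreeScanner involved.
def star_list_to_html(text, list_tag = 'ul')
  out   = []
  depth = 0                       # number of lists currently open
  text.each_line do |line|
    m = line.match(/\A(\*+)\s*(.*?)\s*\z/)
    next unless m                 # this sketch ignores non-list lines
    level = m[1].length
    if level > depth
      # Going deeper: open the new list(s) inside the still-open <li>.
      (level - depth).times do |i|
        out << '<li>' if i > 0    # filler item when a level is skipped
        out << "<#{list_tag}>"
      end
    else
      # Same level or shallower: close deeper lists, then the previous item.
      (depth - level).times { out << "</li></#{list_tag}>" }
      out << '</li>'
    end
    out << "<li>#{m[2]}"          # leave the item open; a sublist may follow
    depth = level
  end
  depth.times { out << "</li></#{list_tag}>" }
  out.join
end

sample = "* one\n*two\n* three\n** threeone\n** threetwo\n*four\n"
puts star_list_to_html(sample)
```

The key point is that each <li> stays open until the next line shows whether a sublist follows, which is hard to express with a single open_match regex over the whole list.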

Damphyr

Oct 27, 2005, 10:42:11 AM

Patrick Gundlach wrote:
> Hello out there,
>
>
> I am experimenting with TagTreeScanner from Gavin Kistner. Has anybody
> created a Wiki-markup parser for the mediawiki software?

Have you had any luck with this?
I find myself in the position of having to migrate a sizeable MediaWiki
installation to a TracWiki one.
I must say first that I haven't yet looked into the differences between
the two (which seem trivial at first glance).
I would love to help out with any problems you might have (and I have a
large data set to test the code with :) ), as it would give me a
considerable head start in my own work.
Cheers,
V.-
--
http://www.braveworld.net/riva


Patrick Gundlach

Oct 27, 2005, 11:33:21 AM

>> I am experimenting with TagTreeScanner from Gavin Kistner. Has anybody
>> created a Wiki-markup parser for the mediawiki software?
>
> Have you had any luck with this?

Actually no. I have tried different approaches: one with
TagTreeScanner, one with a Racc grammar, and one with regular
expressions and the help of StringScanner. But I did not get any
satisfying results (= nice-looking code). For example, I have no idea
yet how to parse (a variant of the example from my first post)

* item 1
* item 2
** subitem 2/1
** subitem 2/2
# subitem 2/3, but numbered

etc. I might give a Racc grammar another try. I would be very,
very happy about a mediawiki.to_html method.
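
One possible way to handle mixed `*` and `#` markers, sketched here under some assumptions: keep a stack of the marker characters for the lists currently open, compare it with the marker prefix of each new line, close whatever no longer matches and open whatever is new. The names LIST_TAGS and wiki_lists_to_html are invented for this sketch, and the sample uses the MediaWiki convention that a numbered subitem of a bullet item is written `*#`. It only handles list lines and does not escape HTML.

```ruby
# Hypothetical names; a sketch, not a full MediaWiki parser.
LIST_TAGS = { '*' => 'ul', '#' => 'ol' }.freeze

def wiki_lists_to_html(text)
  out   = []
  stack = []                      # marker characters of the open lists
  text.each_line do |line|
    m = line.match(/\A([*#]+)\s*(.*?)\s*\z/)
    next unless m                 # non-list lines are ignored here
    markers = m[1].chars

    # How many leading markers agree with the lists already open?
    keep = 0
    keep += 1 while keep < stack.size && keep < markers.size &&
                    stack[keep] == markers[keep]

    # Close every list (and its last item) that no longer matches.
    out << "</li></#{LIST_TAGS[stack.pop]}>" while stack.size > keep

    if markers.size == keep
      out << '</li>'              # sibling item in the same list
    else
      markers[keep..-1].each_with_index do |marker, i|
        out << '<li>' if i > 0    # filler item when a level is skipped
        out << "<#{LIST_TAGS[marker]}>"
        stack << marker
      end
    end
    out << "<li>#{m[2]}"
  end
  out << "</li></#{LIST_TAGS[stack.pop]}>" until stack.empty?
  out.join
end

puts wiki_lists_to_html("* item 1\n* item 2\n** sub a\n*# sub b\n# top\n")
```

Comparing whole prefixes rather than just their lengths is what lets a `*#` line close the `**` sublist and open a numbered one at the same depth.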

Patrick

Alan Chen

Oct 27, 2005, 2:48:26 PM

You could look at the ruwiki parsing code to see if it helps. The
scanning and token replacement is factored into separate components -
see token.rb and the token directory in the ruwiki dist.

http://rubyforge.org/projects/ruwiki
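
To illustrate the general idea of keeping scanning and token replacement apart, here is a small sketch using StringScanner from the Ruby standard library. It is not taken from ruwiki and does not claim to mirror its design; tokenize, render and RENDERERS are hypothetical names. It handles only MediaWiki's '''bold''' and ''italic'' inline markup on a single line and does not escape HTML.

```ruby
require 'strscan'

# Step 1: scanning. Turns a line into [type, text] tokens.
def tokenize(line)
  scanner = StringScanner.new(line)
  tokens  = []
  until scanner.eos?
    if scanner.scan(/'''(.+?)'''/)
      tokens << [:bold, scanner[1]]
    elsif scanner.scan(/''(.+?)''/)
      tokens << [:italic, scanner[1]]
    elsif scanner.scan(/[^']+/)
      tokens << [:text, scanner.matched]
    else
      tokens << [:text, scanner.getch]   # a stray apostrophe
    end
  end
  tokens
end

# Step 2: replacement. One renderer per token type, kept apart from scanning.
RENDERERS = {
  :bold   => lambda { |text| "<b>#{text}</b>" },
  :italic => lambda { |text| "<i>#{text}</i>" },
  :text   => lambda { |text| text }
}.freeze

def render(tokens)
  tokens.map { |type, text| RENDERERS[type].call(text) }.join
end

puts render(tokenize("a '''b''' and ''c'' it's"))
```

With this split, targeting another wiki syntax or another output format would in principle mean swapping one of the two steps rather than rewriting both.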

HTH
- alan

Patrick Gundlach

Oct 29, 2005, 5:25:07 AM

Thanks. I fear that if I use that source code I will get only 95% of the
way there, because of subtle differences between the two markup languages.
ruwiki has a lot of parsing code that isn't easy for a human to parse. I
am making progress on my MediaWiki class, and I also 'fear' that my code
will grow to that size and complexity. I'll release my code as a library
once I can parse more complex MediaWiki pages.

Patrick