I am experimenting with TagTreeScanner from Gavin Kistner. Has anybody
created a Wiki-markup parser for the mediawiki software?
I have succeeded for a few tags, but I am currently stuck with
enumerations:
* one
*two
* three
** threeone
** threetwo
*four
should be represented by something like
<ol>
<li>one</li>
<li>two</li>
<li>three
<ol>
<li>threeone</li>
<li>threetwo</li>
</ol>
</li>
<li>four</li>
</ol>
This is what I have tried, but I have the feeling that this is not the
right way to proceed. I only get <ol> </ol> around the whole string.
--------------------------------------------------
#!/opt/ruby/1.8.2/bin/ruby
require 'TagTreeScanner'

class SimpleMarkup < TagTreeScanner
  @root_factory.allows_text = false

  @tag_genres[ :root ] = [ ]
  @tag_genres[ :root ] <<
    TagFactory.new( :ol,
      :open_match => /(\*.+?)\n+(?=[^*])/m,
      :open_requires_bol => true,
      :setup => lambda{ |tag, scanner, tagtree|
        # Throw the contents I found into the tag
        # but remove leading whitespace
        tag << scanner[1] # [1].gsub( /\*/, '<li>' )
      },
      :allows_text => true,
      :autoclose => :true,
      # no effect: (??)
      :allowed_genre => :list
    )

  @tag_genres[ :list ] = [ ]
  @tag_genres[ :list ] <<
    TagFactory.new( :li,
      :open_match => /\*/,
      :close_match => /\n/,
      :open_requires_bol => true,
      :allows_text => true
    )
end

sample = <<EOS
* one
*two
* three
** threeone
** threetwo
*four
EOS

markup = SimpleMarkup.new(sample)
puts markup.to_xml
--------------------------------------------------
Have you had any luck with this?
I find myself in the position of having to migrate a sizeable MediaWiki
installation to a TracWiki one.
First I must say I haven't started on the differences between the two
yet (which seem trivial at first glance).
I would love to help out with any problems you might have (and I have a
sizeable data set to test the code with :) ), as it would give me a
sizeable head start in my own work.
Cheers,
V.-
--
http://www.braveworld.net/riva
Actually no. I have tried different approaches: one with
TagTreeScanner, one with a RACC grammar, and one with regular
expressions and the help of StringScanner. But I did not get any
satisfying results (= nice looking code). For example, I have no idea
yet how to parse (the example from my first post)
* item 1
* item 2
** subitem 2/1
** subitem 2/2
# subitem 2/3, but numbered
etc. I might give RACC grammar another try perhaps. I would be very
very happy about a mediawiki.to_html method.
Patrick
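For mixed nesting like the example above, one approach that avoids a
parser generator is to keep a stack of the list markers currently open
and compare it with each line's marker prefix. The following is only a
minimal sketch in plain, modern Ruby (not TagTreeScanner or RACC, and
not Ruby 1.8); the names wiki_lists_to_html, escape_html and LIST_TAGS
are made up for illustration. It assumes MediaWiki's convention that
"*" opens a <ul> and "#" opens an <ol>, and it handles list lines only,
with no other markup.

```ruby
# Hypothetical helper names; a sketch of stack-based list nesting only.
LIST_TAGS = { '*' => 'ul', '#' => 'ol' }.freeze

# Escape the characters that would otherwise break the generated HTML.
def escape_html(text)
  text.gsub(/[&<>]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;')
end

def wiki_lists_to_html(text)
  html  = []
  stack = []  # markers of the lists currently open, outermost first

  text.each_line do |raw|
    line    = raw.chomp
    match   = line.match(/\A([*#]+)\s*(.*)\z/)
    markers = match ? match[1].chars : []

    # Count how many leading markers this line shares with the open lists.
    common = 0
    common += 1 while common < stack.size && common < markers.size &&
                      stack[common] == markers[common]

    # Close every list (and its open item) deeper than the shared prefix.
    html << "</li></#{LIST_TAGS[stack.pop]}>" while stack.size > common

    if match.nil?
      # Not a list line: pass it through escaped, skipping blank lines.
      html << escape_html(line) unless line.strip.empty?
      next
    end

    if markers.size == common
      # Same list continues, so finish the previous item first.
      html << '</li>'
    else
      # Open one list per new marker, nested inside the current item.
      markers[common..-1].each do |marker|
        html << "<#{LIST_TAGS[marker]}>"
        stack << marker
        # A skipped level (e.g. "*" followed by "***") needs a wrapper item.
        html << '<li>' unless stack.size == markers.size
      end
    end

    html << "<li>#{escape_html(match[2].strip)}"
  end

  # Close whatever is still open at the end of the input.
  html << "</li></#{LIST_TAGS[stack.pop]}>" while stack.size > 0
  html.join
end

puts wiki_lists_to_html("* item 1\n* item 2\n** subitem 2/1\n*# subitem 2/3\n")
```

Because every open list always has exactly one open <li>, closing a
level is always "</li>" plus the list's end tag, which keeps the nested
list inside its parent item as in the hand-written HTML of the first
post.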
http://rubyforge.org/projects/ruwiki
HTH
- alan
Thanks. I fear that if I use that source code I will get only 95% of
the way because of subtle differences between these two markup
languages. ruwiki has a lot of parsing code that isn't easy for a human
to parse. I am advancing on my MediaWiki class, and I also 'fear' that
my code will grow to that size and complexity. I'll release my code as
a lib once I get more complex MediaWiki pages parsed.
Patrick