In the past I've been a big fan of Pandoc[1] for converting text from
one format to another, including HTML->MediaWiki.
The University of the Highlands and Islands (just had to spell it out,
because that seems like such a *great* name for a university) had the
start of a small course in HTML format. Each page was short and
reasonably regular (with the exception of video). I decided to use a
quick io.js ES6 script to scrape the pages and create a 1-for-1
wikipage mapping (minus the non-free images).
This was NOT intended as a general solution, but I think the code is
simple enough that it might serve as a model for others wanting to do
something similar. I used cheerio[2] to mangle the HTML into usable
wikitext.
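
To give a rough idea of the kind of mapping involved, here is a minimal
sketch. It is NOT the htmlin code: the function name htmlToWikitext and
the set of tags handled are hypothetical, and it uses plain regular
expressions instead of cheerio so that it runs with no dependencies. It
assumes short, regular pages like the ones described above. A real
scraper would more likely walk the DOM.

```javascript
// Hypothetical sketch: convert a small, regular subset of HTML to wikitext.
// Regex-based to stay dependency-free; this is an assumption made for the
// example, not how the htmlin script works.
function htmlToWikitext(html) {
  return html
    // <h1>..</h1> through <h6>..</h6> become = .. = through ====== .. ======
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi, (m, level, text) => {
      const marks = '='.repeat(Number(level));
      return `\n${marks} ${text.trim()} ${marks}\n`;
    })
    // bold and italic
    .replace(/<(b|strong)\b[^>]*>([\s\S]*?)<\/\1>/gi, (m, tag, text) => `'''${text}'''`)
    .replace(/<(i|em)\b[^>]*>([\s\S]*?)<\/\1>/gi, (m, tag, text) => `''${text}''`)
    // external links: <a href="url">label</a> becomes [url label]
    .replace(/<a\b[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, (m, url, label) => `[${url} ${label}]`)
    // list items become bullet lines
    .replace(/<li\b[^>]*>([\s\S]*?)<\/li>/gi, (m, text) => `* ${text.trim()}\n`)
    // paragraph ends become blank lines
    .replace(/<\/p>/gi, '\n\n')
    // drop every remaining tag (img, div, ul, ...), which also skips images
    .replace(/<[^>]+>/g, '')
    // collapse runs of blank lines and trim the ends
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}

const sample = '<h2>Intro</h2><p>See <a href="http://example.org/">the <b>site</b></a>.</p>' +
  '<ul><li>one</li><li>two</li></ul>';
console.log(htmlToWikitext(sample));
```

Regular expressions only cope with markup this simple. Anything nested
or irregular is where a parser such as cheerio earns its keep.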
The code is in the wikieducator project on gitorious[3].
https://gitorious.org/wikieducator/htmlin
Jim
[1] Pandoc is the Swiss Army knife of text conversion tools
http://johnmacfarlane.net/pandoc/
[2] cheerio is a subset of jQuery for HTML manipulation on the server
https://github.com/cheeriojs/cheerio
[3] gitorious announced yesterday that it is shutting down at the end
of May, so expect our project to move
--
Jim Tittsler Tokyo
http://OERfoundation.org/
http://OERu.org/