importing HTML pages to the wiki


Jim Tittsler

Mar 3, 2015, 11:16:43 PM
to wikieducator-tech
In the past I've been a big fan of Pandoc[1] for converting text from
one format to another, including HTML->MediaWiki.

The University of the Highlands and Islands (just had to spell it out,
because that seems like such a *great* name for a university) had the
start of a small course in HTML format. Each page was short and
reasonably regular (with the exception of video). I decided to use a
quick io.js ES6 script to scrape the pages and create a 1-for-1
wikipage mapping (minus the non-free images).

This was NOT intended as a general solution, but I think the code is
simple enough it might serve as a model for others wanting to do
something similar. I used cheerio[2] to mangle the HTML into usable
wikitext.
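
For readers who want a feel for the HTML-to-wikitext step, here is a minimal sketch. It is not the htmlin code: it uses plain regular-expression replacement as a dependency-free stand-in for cheerio, and the function name `htmlToWikitext` and the sample markup are hypothetical. It assumes short, regular, well-formed pages like the ones described above and would not cope with arbitrary HTML.

```javascript
'use strict';

// Hypothetical helper (not from the htmlin project): turn a small, regular
// HTML fragment into MediaWiki wikitext using string replacement only.
function htmlToWikitext(html) {
  return html
    // Drop images entirely (the import above skipped the non-free images).
    .replace(/<img\b[^>]*>/gi, '')
    // <h2>Title</h2> becomes == Title ==
    .replace(/<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi, (match, level, text) => {
      const marks = '='.repeat(Number(level));
      return `\n${marks} ${text.trim()} ${marks}\n`;
    })
    // Bold and italic become wikitext quote markup.
    .replace(/<(b|strong)>([\s\S]*?)<\/\1>/gi, (match, tag, text) => "'''" + text + "'''")
    .replace(/<(i|em)>([\s\S]*?)<\/\1>/gi, (match, tag, text) => "''" + text + "''")
    // <a href="url">label</a> becomes [url label]
    .replace(/<a\b[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, '[$1 $2]')
    // List items become bullet lines.
    .replace(/<li\b[^>]*>([\s\S]*?)<\/li>/gi, '* $1\n')
    // Paragraphs become text followed by a blank line.
    .replace(/<p\b[^>]*>([\s\S]*?)<\/p>/gi, '$1\n\n')
    // Strip whatever tags are left (e.g. <ul>, <div>).
    .replace(/<\/?[^>]+>/g, '')
    // Collapse runs of blank lines and trim the ends.
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}

// Usage with a made-up page fragment.
const sample =
  '<h2>Intro</h2><p>Use <b>Pandoc</b> or ' +
  '<a href="https://github.com/cheeriojs/cheerio">cheerio</a>.</p>' +
  '<img src="x.png"><ul><li>one</li><li>two</li></ul>';
console.log(htmlToWikitext(sample));
```

A real DOM library such as cheerio is the sturdier choice once the pages stop being this regular, since regular expressions cannot handle nested or malformed markup reliably.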

The code is in the wikieducator project on Gitorious[3].
https://gitorious.org/wikieducator/htmlin

Jim

[1] Pandoc is the Swiss Army knife of text conversion tools
http://johnmacfarlane.net/pandoc/
[2] cheerio is a subset of jQuery for HTML manipulation on the server
https://github.com/cheeriojs/cheerio
[3] Gitorious announced yesterday that they are shutting down at the end of
May, so expect our project to move


--
Jim Tittsler Tokyo
http://OERfoundation.org/
http://OERu.org/

Wayne Mackintosh

Mar 3, 2015, 11:30:26 PM
to wikieduc...@googlegroups.com
Hi Jim,

Thanks for your "number 8 wire" solution for converting the UHI course materials into wiki format. I know that Andy Brown is rather chuffed with this solution to kick-start her OERu course remix in WikiEducator. She is an experienced learning designer and I suspect that she will add a few valuable features (e.g. integrating IDevices) into the materials.

Hopefully the pedagogical value add will be an inspiration for the UHI Tech folk to help grow the number of themes for our snapshots. 

The UHI development is going to be a valuable experience for the OERu team. 

W



--
Wayne Mackintosh
Director OER Foundation
UNESCO/COL/ICDE Chair in OER
Skype: WGMNZ1
Twitter: Mackiwg

Luis Miguel Morillas

Mar 4, 2015, 6:51:57 AM
to wikieducator-tech
Very interesting, Jim. I was thinking of a similar converter, but using
Parsoid. I'll test your script.



Saludos,

-- luismiguel (@lmorillas)