some thoughts about wikitext pages

Laurent Savaete

Jul 16, 2012, 12:18:25 PM
to ductus-d...@googlegroups.com
While writing the docbook -> wikicreole converter, I hit a series of
problems due to limitations in creole. Jim and I talked a bit about
this yesterday; below is a summary of what I've gathered so far.

Some issues I faced with wikicreole:
- no way to create a numbered list with a table as one of the items.
Adding anything bigger than a single line breaks the numbering
- I can't seem to add a caption to a table (or to an image)
- I can't seem to put anything but text in a table cell
- no way to auto-generate a table of contents or numbered headings for
a long page
- transclusion is a known problem with many faces (inserting another
lesson format into text pages, templates...)
- formatting is severely limited (cannot force a table/column width).
wikicreole.org confirms that table support is only very simple.
- etc...

Here's an example of what I'm talking about: FSI Russian chapter 1, in
the PDF version by Eric vs. the wikitext version I produced (both from
the same docbook source):
http://www.yojik.eu/Downloads/FSI-chap1-stressed.pdf
http://laurent.dev.wikiotics.net/ru/fsi1
(images are missing from the wikitext version because I haven't added
support for them yet, so ignore that for now)

We are about to get substantial amounts of text pages from Eric's
FSI/DLI conversions, with valuable content (not to mention all other
contributions). Unfortunately, with the current creole parser, we
cannot support this content. It's not merely a matter of stylesheets
(although they could do with some love!): the markup language itself
does not support what we need.

There are a few ways forward:

- expand the creole markup language to handle what we need. Pros: we
do what we want, as we need it. Cons: we have to code everything,
ideally going through the creole discussion groups to make those
things "standard" (as that is the aim of creole, right?). All this
will likely be (very) slow. Also, I suppose that extending creole
would eventually turn it into yet another tag soup, like mediawiki
markup. But maybe I'm just biased here :)

- move to a different wiki markup language. The obvious candidate
would be mediawiki markup, as it's by far the most widely used one.
There are several projects to assemble a proper mediawiki markup
parser (some in python) so we could just reuse that. Pros: we get
virtually instant access to all the power of the syntax, plus a wide
user base. Cons: the markup language is somewhat bloated, and we
import all the crap. For what it's worth, I just looked at the source
behind https://en.wikibooks.org/wiki/French/Lessons/Greetings and I
think I'd rather read perl than this :)

- take a daring step forward, and do what most blog editors do:
WYSIWYG editing (actual backend language to be defined). The big
advantage I can see there is that there is virtually no learning curve
for editing pages. There are millions of people editing blogs out
there, using pretty much what we would rely on then. I'm clearly
biased when writing this, no need to hide it. I think visual editing
would rock.
A few points where I think it could help tremendously:
> linking inside the wiki. Currently you need to find the URL for the
page, replace a / with a : and put [] around it. WordPress has a neat
way of dealing with internal links: you click the link button, choose
"internal link" and then pick the page you want to link to from a list.
We could create something similar without too much trouble, I think.
> inserting images (and audio). Bugs aside, it's hard to imagine a worse
workflow than what we have now. Blogging software handles this pretty
well too, and we could reuse our media widgets there.
I'm looking into CKEditor and wysihtml5 (the one everyone has posted
about on the mailing list :) to see how easy they are to integrate
into the wiki. I'll share my findings soon.
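
To make the internal-link point concrete, here is a minimal sketch of the conversion a link picker could do behind the scenes, so the user never does it by hand. It is only an illustration under assumptions: the function name is made up, I assume the first "/" of the page path is the one that becomes ":", and I assume creole's double-bracket link form with an optional "|label".

```python
def internal_link(url_path, label=""):
    """Turn a wiki page path such as '/ru/fsi1' into a wiki link.

    Hypothetical helper: the exact link syntax is an assumption, not
    something taken from the ductus code base.
    """
    path = url_path.strip("/")
    # Assumption: only the first "/" (prefix vs. page name) becomes ":".
    prefix, sep, name = path.partition("/")
    target = f"{prefix}:{name}" if sep else prefix
    if label:
        return f"[[{target}|{label}]]"
    return f"[[{target}]]"


print(internal_link("/ru/fsi1"))
print(internal_link("/ru/fsi1", "FSI Russian, chapter 1"))
```

A visual editor's "internal link" dialog would call something like this with the page the user picked from the list.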

Ian

Jul 16, 2012, 12:48:24 PM
to ductus-d...@googlegroups.com
On 07/16/2012 12:18 PM, Laurent Savaete wrote:
> - move to a different wiki markup language. The obvious candidate
> would be mediawiki markup, as it's by far the most widely used one.
> There are several projects to assemble a proper mediawiki markup
> parser (some in python) so we could just reuse that. Pros: we get
> virtually instant access to all the power of the syntax, plus a wide
> user base. Cons: the markup language is somewhat bloated, and we
> import all the crap. For what it's worth, I just looked at the source
> behind https://en.wikibooks.org/wiki/French/Lessons/Greetings and I
> think I'd rather read perl than this :)
>
> - take a daring step forward, and do what most blog editors do:
> WYSIWYG editing (actual backend language to be defined). The big
> advantage I can see there is that there is virtually no learning curve
> for editing pages. There are millions of people editing blogs out
> there, using pretty much what we would rely on then. I'm clearly
> biased when writing this, no need to hide it. I think visual editing
> would rock.

The biggest technical project currently being pushed by the Wikimedia
Foundation is the visual editor initiative. The editor is currently
available for a subset of their markup and is due to be deployed to one
of their language wikis within two months. More info is available here:
https://www.mediawiki.org/wiki/VisualEditor:Welcome

Everything I heard at Wikimania agrees with your sentiment that moving
to visual editing is a very important step. I do not know the actual
state of the MediaWiki visual editor code, and I am no fan of the
MediaWiki markup syntax. Still, they have significant engineering effort
behind the project, and there is a substantial amount of language
content on Wikibooks that we could import wholesale if we supported the
syntax. Unless the code is a particularly poor fit, I think the
Wikibooks content should put this visual editor at the top of the list
of options.

The Wikimedia visual editor team might also be interested in
having a non-MediaWiki deployment test case, especially over the next
few months.

-Ian

Jim Garrison

Jul 17, 2012, 11:37:53 PM
to ductus-d...@googlegroups.com
On 07/16/12 09:18, Laurent Savaete wrote:
> While writing the docbook -> wikicreole converter, I hit a series of
> problems due to limitations in creole. Jim and I talked a bit about
> this yesterday; below is a summary of what I've gathered so far.
>
> Some issues I faced with wikicreole:
> - no way to create a numbered list with a table as one of the items.
> Adding anything bigger than a single line breaks the numbering
> - I can't seem to add a caption to a table (or to an image)
> - I can't seem to put anything but text in a table cell
> - no way to auto-generate a table of contents or numbered headings for
> a long page
> - transclusion is a known problem with many faces (inserting another
> lesson format into text pages, templates...)
> - formatting is severely limited (cannot force a table/column width).
> wikicreole.org confirms that table support is only very simple.
> - etc...

A few of these are issues with creoleparser (the parser we use), not the
creole language itself. It would be useful to determine which category
each item falls under.

Even if (when) we have a visual editor, we need a markup language of
some sort (be it creole, creole + our own custom extensions, mediawiki,
or html minus lots of unsafe stuff).

The wikicreole language allows one to give a "title" with an image.
Would it make sense just to make this be the caption as well, if it is
given? That doesn't solve the issue with tables though...

Laurent Savaëte

Jul 26, 2012, 9:55:15 AM
to ductus-d...@googlegroups.com
So, visual editing...

- creole: I haven't found anything for wikicreole. I just assume its
user base is too small to gather momentum for developing that sort of
editor.

- mediawiki markup: from what I've read here and there, it doesn't
actually have a properly formalised grammar, which makes it really hard
to create a visual editor for it. If we want to go down that route, our
best bet is to follow what wikimedia does...

- html: two WYSIWYG editors come up everywhere on the internet:
= tinyMCE, used by wordpress
= CKEditor (formerly FCKEditor)
They both seem to handle pretty advanced stuff (tables, media
insertion, text formatting, and more joy...) and the difference between
them seems to boil down to personal preference. I've read a bunch of
comparisons written at various points in time. Basically, it would
seem that tinyMCE was ahead in the FCKEditor days, but the revamp
into CKEditor seems to have closed the gap, and the latter is possibly
better now. Their code is decently commented and they both feature a
plugin mechanism which we could use to tie in internal page linking,
wiki media upload/linking, etc. I think we could shoehorn our audio
recorder in there to allow for on-the-spot audio insertion in text
pages and that sort of stuff, which would rock!

= there's also wysiHTML5, which doesn't really belong in the same
category. It doesn't directly let you work on tables or anything
advanced (it offers an insertHTML() method which can be used for that
purpose, but all the logic + UI has to be built for it). It certainly
is lighter, but requires us to construct an entire UI around it. I'm
not sure this is worth the effort when great stuff is available
elsewhere.

= aloha-editor.org, which is pretty impressive (try their online demos,
though I find the ones in the source more helpful). A big advantage I
see is that the editor works on the DOM, as opposed to editing HTML
code, which means there is no HTML code view either, seriously limiting
the risk of crappy/dangerous input. At a quick look, the markup it
produces is a lot cleaner than what tinyMCE/CKEditor generate. It seems
to be the upcoming thing, with a pretty active community around it.
A couple of issues I can see with it:
- it's 3 to 5 times heavier than the other guys. That's a guesstimate
based on their demos vs. other demos, but since they didn't optimise
anything and it's very modular, there's probably room for improvement:
the PageSpeed addon for chromium says we can cut code size by ~half
just with basic compression.
- it's a bit less browser compatible than others (IE6-7 not supported),
but we don't support these for flashcard deck based editing either, so
I wouldn't call it a problem. Mobile editing isn't supported yet either.
- it's much younger than other editors (the first release dates back
only 2 years), so it may not be as stable. However, the concept behind
it is a generation ahead.

There are other editors, but they are either far behind, or support has
been dropped (from what I've seen).

Having spent a few hours working on a separate wordpress project for a
friend, I'm convinced that visual editing is the way to go. Creating a
few pages on wordpress takes minutes, and you don't need to go looking
around for documentation when you forget how to achieve this or that
result.
A few advantages I see to HTML:
- I can't think of a more widely supported markup. It has a grammar,
support for more stuff than we can think of, a standardising org behind
it, etc.
- all sorts of tools are available for it: parsers, sanitisers,
converters... Importing all the FSI content to HTML would boil down
to ... uploading files, or almost. With a bit of luck, importing
content from other freely licensed sites could be as simple as
copy/pasting.
- security issues: wysiHTML5 recommends using
https://github.com/jsocol/bleach to sanitize HTML on the server side.
It's a whitelisting system, which would be needed regardless of the
editor used.

So this was a completely biased review. Feel free to re-establish some
neutrality!

Ian Sullivan

Jul 26, 2012, 10:10:45 AM
to ductus-d...@googlegroups.com
On 07/26/2012 09:55 AM, Laurent Savaëte wrote:
> So, visual editing...
>
> - creole: I haven't found anything for wikicreole. I just assume its
> user base is too small to gather momentum for developing that sort of
> editor.
>
> - mediawiki markup: from what I've read here and there, it doesn't
> actually have a properly formalised grammar, which makes it really hard
> to create a visual editor for it. If we want to go down that route, our
> best bet is to follow what wikimedia does...

The visual editor initiative from MediaWiki is piggybacking on
"Parsoid", their formal MediaWiki syntax parser initiative
(https://www.mediawiki.org/wiki/Parsoid). Theoretically we could re-use
that to create our own editor, but going with the editor they are
developing along with the parser seems the more sensible route should we
choose mediawiki syntax.

-Ian

Jim Garrison

Jul 29, 2012, 11:20:26 PM
to ductus-d...@googlegroups.com
On 07/26/12 07:10, Ian Sullivan wrote:
Going by history [1], it seems highly likely that somebody will port the
new mediawiki parser to python, at which point we could find a way to
use that parser (if it's written in a sufficiently generic way) and
simply plug in the visual editor. I haven't looked into the details yet,
but this could very well be the best way forward. Wikicreole doesn't
really seem to have the mindshare behind it that we would like, and we
could really use a visual editor and more features. Even if we support
only a subset of mediawiki's syntax (i.e. everything except the crap),
this would be very beneficial to us.

I haven't looked into Parsoid much, but it appears to be written in
Javascript (which leads me to believe that mediawiki itself is still
using the old parser to render the HTML, at least for now). I'm sure we
could clarify some of this on the wikitext-l mailing list, which I was
subscribed to for some time last year (but unsubscribed due to lack of
time).

[1] http://www.mediawiki.org/wiki/Alternative_parsers

Jim Garrison

Jul 29, 2012, 11:26:27 PM
to ductus-d...@googlegroups.com
I was not aware of this editor until now. It looks very nice (on the
surface).
I hadn't seen bleach before. In the past (for another project) I've
also used Genshi's HTMLSanitizer (along with TinyMCE) and I was happy
with the results.

Laurent Savaëte

Aug 15, 2012, 8:38:04 PM
to ductus-d...@googlegroups.com

Further thoughts on the matter. I'll try to put together a series of
requirements we have for the editing system, so that we can objectively
decide what fits best.

- basic formatting: I think we don't want pure formatting (as in
colours, font size and the like) beyond maybe bold/italics. We should
restrict use to h1-h6, lists, tables and so on (or their markup
equivalents). CSS will do the job of formatting all of those in a
unified way.
- transclusion / templates. Ability to insert the content of some other
wikipage (be it text or flashcard deck based, or anything else that can
be rendered as html)
- images. Both reuse of existing pictures in the resource db and
uploading (or transfer from flickr or commons or...) should be
supported. By the time we save the page, images would be on our server
in any case, so the syntax there would probably be the same throughout.
- audio (video?). Same as for pictures.
- macros (searches...). Users should be able to include a macro from an
admin-defined set of allowed stuff.
- internal / external links. Just like now, we want to support both.
For internal links, the UI should provide some easy way to pick a link
from the database.
- redirects. This is most likely implemented below the parser level, so
not relevant here, except maybe for rendering a redirected page.
- Security: we should use a whitelist system, so that anything not
explicitly allowed is ignored (or leads to edit rejection). This would
take care of javascript, and all sorts of potentially malicious html
code.
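
To illustrate the whitelist idea from the security item, here is a minimal sketch using only the Python standard library. The tag and attribute lists are hypothetical, loosely following the "basic formatting" item above, and all names are made up for this example. It does not repair unbalanced markup, so in practice a maintained sanitizer such as bleach would be the safer choice; this only shows the principle that anything not explicitly allowed gets dropped.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical whitelist: headings, lists, tables, bold/italics, links, images.
ALLOWED_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "br", "b", "i",
                "strong", "em", "ul", "ol", "li", "table", "tr", "th", "td",
                "a", "img"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt", "title"}}
VOID_TAGS = {"br", "img"}           # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # drop these tags *and* their contents


def is_safe_url(value):
    """Accept only relative, http and https URLs."""
    compact = "".join(ch for ch in value if ch > " ")  # strip spaces/controls
    try:
        return urlsplit(compact).scheme.lower() in ("", "http", "https")
    except ValueError:
        return False


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # the tag is ignored; its text content is still kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, ()):
                continue
            if name in ("href", "src") and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <script>alert(1)</script><b>there</b></p>'))
```

This version silently ignores disallowed markup; rejecting the edit instead would just mean comparing the sanitized output with the input.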

Anything else we want supported?

Jim Garrison

Aug 16, 2012, 12:52:40 AM
to ductus-d...@googlegroups.com
I think you got everything.

I have always dreamed of transclusions of any type of content -- not
just audio, images and video, but other textwiki pages and choice
lessons as well. (Wagn is a wiki engine that relies heavily on
transclusions, fyi.) These can be implemented gradually over time, of
course, as long as we have a transclusion syntax+UI plan.
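
As one possible starting point for the syntax side of such a plan, here is a minimal sketch of transclusion expansion. Everything in it is an assumption for illustration: the {{page}} placeholder syntax, the function names, and the load_page callback, which stands in for whatever would render a text page, choice lesson or media item to a string.

```python
import re

# Hypothetical placeholder syntax: {{prefix:page_name}}
TRANSCLUSION = re.compile(r"\{\{([^{}|]+)\}\}")


def expand(text, load_page, _stack=()):
    """Replace each {{page}} placeholder with that page's expanded content.

    load_page(name) must return the page content as a string, or raise
    KeyError if the page does not exist.
    """
    def replace(match):
        name = match.group(1).strip()
        if name in _stack:
            # A page that (indirectly) includes itself would recurse forever.
            return f"[transclusion loop: {name}]"
        try:
            body = load_page(name)
        except KeyError:
            return f"[missing page: {name}]"
        return expand(body, load_page, _stack + (name,))

    return TRANSCLUSION.sub(replace, text)


pages = {
    "en:greeting": "Hello {{en:name}}!",
    "en:name": "world",
    "en:loop": "again {{en:loop}}",
}
print(expand("Say: {{en:greeting}}", pages.__getitem__))
```

Keeping the loader behind a callback means new content types can be added gradually without touching the expansion logic.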

Redirects will not be a special syntax on a textwiki page, but instead
are their own type of wiki page (see ticket #39, which has a partial patch).