aloha - macros

Laurent Savaëte

Sep 8, 2012, 1:48:37 PM
to ductus-d...@googlegroups.com
One more thing I have to do before deploying the new aloha code to
production is handle the macros.

Currently, we have 3 that I know of:
- page-search, which I can reimplement with a simple UI in aloha that
would ask for the parameters
- 2 other wikiotics-specific ones that are defined in puppet.

I'd suggest we implement macros in "ductus-html5" like this:

<ductusmacro data-name="page-search" data-tags="tag1,tag2" />

This should work well with html parsers, and allow easy manipulation
from aloha and with jquery.

Input is pretty straightforward, click a button in aloha toolbar, set
some params and aloha adds the code in the html blob.
I suppose we can define some custom rendering html in editing mode, so
that it shows something like "page-search macro (tags=xxx)" or so.

Then we have to run a parser that will turn <ductusmacro> tags into some
form of html understandable by web browsers. This could be done on the
client side using javascript, but I think it makes a lot more sense to
do it server-side as it is now with creole macros. This way, results can
be cached, crawled by search engines, etc...

Any comments?

Laurent Savaëte

Sep 10, 2012, 1:16:04 PM
to ductus-d...@googlegroups.com

> I'd suggest we implement macros in "ductus-html5" like this:
>
> <ductusmacro data-name="page-search" data-tags="tag1,tag2" />
>
> This should work well with html parsers, and allow easy manipulation
> from aloha and with jquery.

Fiddling with aloha, I realised it's a lot easier to just store macro
calls using something like:
<div class="ductus-macro" data-macro-name="pagelist"
data-tags="tag1,tag2"></div>

This saves us from converting weird non-standard content upon
loading/saving in aloha (read: writing a lot more code). While editing,
a simple CSS rule (e.g. using pseudo-elements) can act as a
placeholder. And we only deal with html5-compliant code, which is
probably a good idea.

> Input is pretty straightforward, click a button in aloha toolbar, set
> some params and aloha adds the code in the html blob.
> I suppose we can define some custom rendering html in editing mode, so
> that it shows something like "page-search macro (tags=xxx)" or so.

I took a shot at it based on another aloha plugin. It seems to work,
although I still need a simple UI to input the tags the user wants to
search for :)

> Then we have to run a parser that will turn <ductusmacro> tags into some
> form of html understandable by web browsers. This could be done on the
> client side using javascript, but I think it makes a lot more sense to
> do it server-side as it is now with creole macros. This way, results can
> be cached, crawled by search engines, etc...

It looks like lxml works well for this. It parses the html and replaces
the placeholder above with the actual content we want to see (in this
case, the list of pages).
As it stands, I have a bit of redundant code between the creole macro
and the html5 macro.
Considering that we only have a handful of actual pages that contain
macros, I'd be tempted to ignore the upgrade path. Currently, the
visual editor gets old pages via the creole parser, so that it receives
the output of macros. I'd say no big deal, we can just manually fix the
3 pages that need it. It will be easier than coding a converter! (if we
open an old page, the old macro still runs, so we don't break anything)

The takeaway: it works overall (I'll try to push a demo to devbox
tonight), except:
- removing the macro in the editor is weird/broken, I need to fix that
- your opinions on dealing with old creole macros, please :)

Any other thoughts?

Jim Garrison

Sep 13, 2012, 1:41:40 AM
to ductus-d...@googlegroups.com
I think this sounds good. Ignoring the old creole macros is fine.

The HTML5 spec defines a parsing algorithm that results in a parse tree
that is parser-independent even if the HTML input is not valid. How
does lxml handle invalid HTML? I've always been a bit skeptical of its
HTML handling since it's an XML parser, but I haven't looked into it
recently (and this attitude may not be justified!).

Laurent Savaëte

Sep 13, 2012, 7:31:39 AM
to ductus-d...@googlegroups.com, Jim Garrison

> I think this sounds good. Ignoring the old creole macros is fine.
>
> The HTML5 spec defines a parsing algorithm that results in a parse tree
> that is parser-independent even if the HTML input is not valid. How
> does lxml handle invalid HTML? I've always been a bit skeptical of its
> HTML handling since it's an XML parser, but I haven't looked into it
> recently (and this attitude may not be justified!).

I started with lxml's XML() constructor, which failed after about 1
attempt. But then reading the docs, HTML() showed up, which is a lot
more tolerant and relies on libxml2's "recovery" mode. So far, I
haven't had any problem, although it's only a limited variety of
content I've been able to test.
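
The difference between the two constructors can be seen with a small
sketch on a made-up input (not content from the wiki). etree.XML and
etree.HTML are the real lxml functions; BROKEN, parses_as_xml and
recover_as_html are hypothetical names used only for illustration.

```python
from lxml import etree

# Made-up malformed input: <br> and <b> are never closed.
BROKEN = "<p>Unclosed paragraph<br><b>bold text</p>"


def parses_as_xml(markup):
    """Return True if the strict XML parser accepts the markup."""
    try:
        etree.XML(markup)
        return True
    except etree.XMLSyntaxError:
        return False


def recover_as_html(markup):
    """Parse with the tolerant HTML parser and serialize the result."""
    root = etree.HTML(markup)  # returns the <html> root element
    return etree.tostring(root, encoding="unicode", method="html")


print(parses_as_xml(BROKEN))    # the strict parser rejects this input
print(recover_as_html(BROKEN))  # the HTML parser repairs it
```

Note that libxml2's recovery follows its own rules rather than the HTML5
parsing algorithm, so the repaired tree is not guaranteed to match what a
browser would build from the same input.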
The normal workflow is:
aloha --> genshi validation --> lxml macro expansion
where aloha produces valid html5.
Anything (including stuff not produced by aloha, like bot uploads) that
reaches lxml (or whatever macro parser we want to use) has gone through
genshi validation first, so it can only be as broken as genshi allows
it to be.
I suppose it would make sense to homogenise html parsers and use either
genshi or lxml everywhere. (grep'ing through the sources, I only find
them used for the above purposes, but they were both already in
requirements.txt, any ideas where else they are used?)


Jim Garrison

Sep 13, 2012, 12:03:48 PM
to ductus-d...@googlegroups.com
On 09/13/12 04:31, Laurent Savaëte wrote:
>
>> I think this sounds good. Ignoring the old creole macros is fine.
>>
>> The HTML5 spec defines a parsing algorithm that results in a parse tree
>> that is parser-independent even if the HTML input is not valid. How
>> does lxml handle invalid HTML? I've always been a bit skeptical of its
>> HTML handling since it's an XML parser, but I haven't looked into it
>> recently (and this attitude may not be justified!).
>
> I started with lxml's XML() constructor, which failed after about 1
> attempt. But then reading the docs, HTML() showed up, which is a lot
> more tolerant and relies on libxml2's "recovery" mode. So far, I
> haven't had any problem, although it's only a limited variety of
> content I've been able to test.
> The normal workflow is:
> aloha --> genshi validation --> lxml macro expansion
> where aloha produces valid html5.
> Anything (including stuff not produced by aloha, like bot uploads) that
> reaches lxml (or whatever macro parser we want to use) has gone through
> genshi validation first, so it can only be as broken as genshi allows
> it to be.

Ok, this makes sense.

> I suppose it would make sense to homogenise html parsers and use either
> genshi or lxml everywhere. (grep'ing through the sources, I only find
> them used for the above purposes, but they were both already in
> requirements.txt, any ideas where else they are used?)

lxml is primarily used to deal with XML in the ResourceDatabase. genshi
is used purely for HTML cleaning.