Outlines and html/xml


Karsten Wolf

Jan 7, 2013, 7:06:47 AM
to cactus-outliner-dev
While struggling with html writing I discovered a fundamental flaw in
my thinking: html and xml are not easily mappable onto an outline!
They map cleanly only when the document follows certain rules.

For example:

<a>Some text
<b>B's text</b>
Some more of A's text
</a>

When Cactus reads such an xml/html file it produces:
-a(Some text)
--b(B's text)

So "Some more of A's text" is lost.
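
A minimal sketch of how this kind of loss can happen, assuming a reader built on Python's standard xml.etree.ElementTree that looks only at each element's leading `text`. This is not Cactus's actual reader; `naive_outline` is a hypothetical helper used for illustration.

```python
import xml.etree.ElementTree as ET

SOURCE = "<a>Some text\n<b>B's text</b>\nSome more of A's text\n</a>"

def naive_outline(elem, depth=1):
    """Build outline lines from the tag and leading text of each element."""
    text = (elem.text or "").strip()
    lines = ["-" * depth + f"{elem.tag}({text})"]
    for child in elem:
        # Only the child itself is visited; text that follows the
        # child's closing tag is never looked at.
        lines.extend(naive_outline(child, depth + 1))
    return lines

outline = naive_outline(ET.fromstring(SOURCE))
print("\n".join(outline))
# -a(Some text)
# --b(B's text)
```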

What are the possibilities to handle this?

Should the result be:

-a(Some text)
--b(B's text)
--(Some more of A's text) # at the level of <b> but belonging to <a>

or
-a(Some text (child1)Some more of A's text)
--b(B's text)

The second case would make outline editing code very complicated.
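
The first variant could be sketched roughly as follows, assuming Python's xml.etree.ElementTree, where the text following a child's closing tag is available as that child's `tail`. The helper name `outline_with_tails` and the choice to emit an untagged node are assumptions for illustration, not the Cactus implementation.

```python
import xml.etree.ElementTree as ET

SOURCE = "<a>Some text\n<b>B's text</b>\nSome more of A's text\n</a>"

def outline_with_tails(elem, depth=1):
    """Outline lines where trailing text becomes an untagged sibling node."""
    text = (elem.text or "").strip()
    lines = ["-" * depth + f"{elem.tag}({text})"]
    for child in elem:
        lines.extend(outline_with_tails(child, depth + 1))
        # Text after the child's closing tag: emit it at the child's
        # level, without a tag, so it is not dropped.
        tail = (child.tail or "").strip()
        if tail:
            lines.append("-" * (depth + 1) + f"({tail})")
    return lines

outline = outline_with_tails(ET.fromstring(SOURCE))
print("\n".join(outline))
# -a(Some text)
# --b(B's text)
# --(Some more of A's text)
```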

In search for an elegant solution.

A lot more thinking is needed.

-karsten

P.S.: The opml file format got a lot of criticism for putting the text
into an attribute but it avoids this problem very nicely.

Scott Lawton

Jan 7, 2013, 1:18:01 PM
to cactus-ou...@googlegroups.com
> Should the result be:
>
> -a(Some text)
> --b(B's text)
> --(Some more of A's text) # at the level of <b> but belonging to <a>
>
> or
> -a(Some text (child1)Some more of A's text)
> --b(B's text)
>
> The second case would make outline editing code very complicated.

A third:
-a
--b
-...a

i.e. at the same level as *a*, with added '...' prefix (or some other
text or symbol)


> In search for an elegant solution.

One useful option for html: show inline tags (such as bold) as if they
were merely part of the text rather than tagged. Something similar
could be done for xml with a custom settings file to mark block vs.
inline OR merely by observing whether it had the
'text<tag>text</tag>text' format.


> A lot more thinking is needed.

Agreed! I'm pretty sure there's no one best/right answer; just
different settings for different files and/or personal preference.

There are of course plenty of models in the XML world (of editors with
an outline view), though I never found any compelling enough to
overcome whatever other limitations they had vs a plain text editor.

Scott

Karsten Wolf

Jan 7, 2013, 5:15:33 PM
to cactus-outliner-dev

Hi Scott,


On Jan 7, 7:18 pm, Scott Lawton <scott.s.law...@gmail.com> wrote:
...
> A third:
> -a
> --b
> -...a
>
> i.e. at the same level as *a*, with added '...' prefix (or some other
> text or symbol)

For html, a node without a tag could be created. For xml this would
only work if the ...a node is at the level of b and its text is
appended to the parent.


>
> > In search for an elegant solution.
>
> One useful option for html: show inline tags (such as bold) as if they
> were merely part of the text rather than tagged. Something similar
> could be done for xml with a custom settings file to mark block vs.
> inline OR merely by observing whether it had the
> 'text<tag>text</tag>text' format.

That could be a possible solution: making better decisions about which
xml/html nodes become outline nodes (currently all of them do). That
way, the easy nodes (without trailing text) become outline nodes and
the tagsoup parts get collected into one node.

It looks like I need to extend the current parser.
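
A rough sketch of that decision rule, assuming Python's xml.etree.ElementTree and treating "has trailing text" as "some direct child is followed by non-whitespace text". The names `has_trailing_text` and `selective_outline` are hypothetical, and a real parser extension would likely need more cases.

```python
import xml.etree.ElementTree as ET

def has_trailing_text(elem):
    """True if any direct child is followed by non-whitespace text."""
    return any((child.tail or "").strip() for child in elem)

def selective_outline(elem, depth=1):
    prefix = "-" * depth
    if has_trailing_text(elem):
        # Tag soup: keep the inner markup as the text of a single node.
        # ET.tostring() on a child includes that child's trailing text.
        inner = (elem.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in elem
        )
        return [prefix + f"{elem.tag}({' '.join(inner.split())})"]
    # Easy case: the element becomes an outline node, children recurse.
    lines = [prefix + f"{elem.tag}({(elem.text or '').strip()})"]
    for child in elem:
        lines.extend(selective_outline(child, depth + 1))
    return lines

doc = ET.fromstring(
    "<body><p>Plain paragraph</p><p>Some <b>bold</b> words</p></body>"
)
outline = selective_outline(doc)
print("\n".join(outline))
# -body()
# --p(Plain paragraph)
# --p(Some <b>bold</b> words)
```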

> Agreed! I'm pretty sure there's no one best/right answer; just
> different settings for different files and/or personal preference.

At least I have a direction: I want to outline-edit xml and html
documents. Period.

> There are of course plenty of models in the XML world (of editors with
> an outline view), though I never found any compelling enough to
> overcome whatever other limitations they had vs a plain text editor.

I just looked at an older version of <oXygenXML>. It creates a mapping
from the outline to a selection of the file. The outline looks the
same as mine, with trailing text not in a node, but it references the
trailing text correctly in the file. That's not what I want.


-karsten

Karsten Wolf

Jan 22, 2013, 9:28:28 AM
to cactus-outliner-dev
This has been solved for html in the upcoming v0.4.2e.

It turns out, the libraries I use, ElementTree and lxml, have an
attribute for this: tail.

To rephrase the first example:
<a>Some text
<b>B's text</b>
Some more of A's text
</a>

"Some more of A's text" doesn't belong to <a>, it's the tail of <b>

The tests I conducted seemed to be OK, and the check with
http://validator.w3.org/ showed only errors my source material
already had.
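
To illustrate the `tail` attribute, here is a small sketch using the standard library's xml.etree.ElementTree (lxml exposes the same attribute). The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

SOURCE = "<a>Some text\n<b>B's text</b>\nSome more of A's text\n</a>"

root = ET.fromstring(SOURCE)
b = root[0]

a_text = root.text.strip()   # text before the first child of <a>
b_text = b.text              # text inside <b>
b_tail = b.tail.strip()      # text after </b>, stored on <b>, not on <a>

print(a_text)  # Some text
print(b_text)  # B's text
print(b_tail)  # Some more of A's text

# Because the tail is kept on the child, serializing the tree writes
# the trailing text back out after </b>.
roundtrip = ET.tostring(root, encoding="unicode")
```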

So writing html files is back. Currently with <!DOCTYPE html> and
utf-8 encoding only.
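
A hedged sketch of what such a writer might look like with the standard library alone. `write_html` is a hypothetical helper, not the Cactus code, and it assumes the doctype is simply prepended to the serialized bytes.

```python
import xml.etree.ElementTree as ET

def write_html(root):
    """Serialize a tree as UTF-8 HTML bytes with an HTML5 doctype."""
    body = ET.tostring(root, encoding="utf-8", method="html")
    return b"<!DOCTYPE html>\n" + body

# Build a tiny document with mixed content: text, inline tag, tail.
html = ET.Element("html")
body = ET.SubElement(html, "body")
p = ET.SubElement(body, "p")
p.text = "Some "
b = ET.SubElement(p, "b")
b.text = "bold"
b.tail = " text"

data = write_html(html)
print(data.decode("utf-8"))
```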