parse trees and sections

Devin Bayer

Jun 3, 2011, 6:25:37 AM
to zeta-discuss
Hi. I am interested in eazytext because of the way it generates parse trees and can dump them back to source text. My plan was to rearrange sections, then write the wiki text back out. However, when I tried to do this, I couldn't make sense of the parse trees. With this source:

<eazytext>
= title

== section1

1

== section2

2
</eazytext>

I get this parse tree:

wikipage:
paragraphs:
paragraphs:
paragraphs:
paragraphs:
paragraphs:
paragraph:
heading: `['=', <eazytext.ast.TextContents object at 0x104257350>]`
newline:
paragraph_separator:
newline:
paragraph:
heading: `['==', <eazytext.ast.TextContents object at 0x1042576d0>]`
newline:
paragraph_separator:
newline:
paragraph:
textlines:
(line 1)
textcontent: basictext :
newline:
paragraph_separator:
newline:
paragraph:
heading: `['==', <eazytext.ast.TextContents object at 0x104257bd0>]`
newline:
paragraph_separator:
newline:
paragraph:
textlines:
(line 1)
textcontent: basictext :
newline:
paragraph_separator:
newline:
paragraph_separator:
newline:

Identifying sections isn't that easy, since headings are very deep in the hierarchy, but text is near the top. I was hoping for something like this hierarchy:

wikipage:
  section:
    heading: "title"
    paragraphs:
      section:
        heading: "section1"
        paragraphs:
          text: "1"
      section:
        heading: "section2"
        paragraphs:
          text: "2"

Then we could also generate a table of contents, etc...
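
A minimal sketch of that grouping, in plain Python rather than eazytext's own API. The `Section` class, the `group_sections` and `toc` helpers, and the flat `("heading", level, text)` / `("text", text)` tuples are all hypothetical stand-ins for whatever the real AST nodes provide:

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical container: one heading plus everything beneath it."""
    level: int
    title: str
    paragraphs: list = field(default_factory=list)
    subsections: list = field(default_factory=list)


def group_sections(items):
    """Turn a flat list of headings and text into nested sections.

    items holds ("heading", level, text) or ("text", text) tuples,
    an assumed shape, not eazytext's actual node types.
    """
    root = Section(level=0, title="")
    stack = [root]
    for item in items:
        if item[0] == "heading":
            _, level, title = item
            # Close every open section at the same or a deeper level.
            while stack[-1].level >= level:
                stack.pop()
            section = Section(level, title)
            stack[-1].subsections.append(section)
            stack.append(section)
        else:
            stack[-1].paragraphs.append(item[1])
    return root


def toc(section, depth=0):
    """Return indented heading titles, i.e. a table of contents."""
    lines = []
    for sub in section.subsections:
        lines.append("  " * depth + sub.title)
        lines.extend(toc(sub, depth + 1))
    return lines


flat = [("heading", 1, "title"),
        ("heading", 2, "section1"), ("text", "1"),
        ("heading", 2, "section2"), ("text", "2")]
page = group_sections(flat)
print("\n".join(toc(page)))
```

With sections held in lists like this, rearranging them would amount to reordering `subsections` before writing the text back out.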

Cheers,
Devin

prat...@gmail.com

Jun 4, 2011, 9:35:38 AM
to zeta-discuss
Hi Devin,

The show() method on AST nodes is used primarily for debugging and for
understanding the parse grammar. For some time now it has not been kept
up to date with changes to the program. Thanks for reminding me. I have
updated it now.

Coming to your pretty-print requirement (I am copying it below
for reference):
>
> wikipage:
>   section:
>     heading: "title"
>     paragraphs:
>       section:
>         heading: "section1"
>         paragraphs:
>           text: "1"
>       section:
>         heading: "section2"
>         paragraphs:
>           text: "2"

Grouping sections :

You want sections to be organized hierarchically, i.e. the section
heading and section text parsed as child nodes of a given section
node. Note that HTML does not organize sections like that: headings
and paragraphs are not grouped into sections, they are stored as
siblings under the same parent node. So it does not sound prudent to
change the grammar.

Too many paragraphs :

The crux of the grammar is that eazytext documents are parsed as a
sequence of paragraphs, and that sequence is defined using left
recursion. So the list of paragraphs is reduced and stored as a
nested AST of paragraphs nodes:
    wikipage   : paragraphs
               | paragraphs ENDMARKER
    paragraphs : paragraph paragraph_separator
               | paragraphs paragraph paragraph_separator
               | paragraph_separator
You can visualize this using Erlang's representation of the list
datatype, [ H | T ], where H is the first `term` in the list and
T is a list of the remaining `terms`; T can in turn be
represented as [ H1 | T1 ], and so on.
Nevertheless, I have removed this reduced-tree representation of
paragraphs from the show() method (only from show(); the grammar
is still the same), thereby avoiding the clutter in
pretty-printing.
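
To illustrate the nesting, here is a rough sketch using plain Python tuples. The `("paragraphs", rest, last)` shape and the `reduce_left` / `flatten` helpers are made up for this example and are not the actual eazytext node classes. With a left-recursive rule, each reduction wraps the earlier paragraphs as the left child, so the last paragraph ends up outermost:

```python
def reduce_left(paragraphs):
    """Build the nested shape a left-recursive rule produces.

    Each step mimics `paragraphs : paragraphs paragraph ...` by wrapping
    the tree built so far as the left child of a new node.
    """
    tree = None
    for p in paragraphs:
        tree = ("paragraphs", tree, p)
    return tree


def flatten(tree):
    """Walk the left spine and return paragraphs in document order."""
    out = []
    while tree is not None:
        _, rest, last = tree
        out.append(last)
        tree = rest
    out.reverse()  # the spine is visited last-paragraph-first
    return out


nested = reduce_left(["title", "section1", "1", "section2", "2"])
print(nested[2])        # outermost node carries the last paragraph
print(flatten(nested))  # flat list, back in document order
```

Five paragraphs give five nested `paragraphs` levels, which is the repetition visible at the top of the original dump.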

I hope you will get a nice looking output now.

I have pushed the latest changes to the lp:eazytext branch (on
Launchpad). They will be part of the next release.

Cheers,
Pratap

Harshad RJ

Jun 4, 2011, 11:51:41 AM
to zeta-d...@googlegroups.com
Pratap,

On Sat, Jun 4, 2011 at 7:05 PM, prat...@gmail.com <prat...@gmail.com> wrote:
> I hope you will get a nice looking output now.

My hunch is that the OP wanted to directly manipulate the AST, to rearrange the sections, and then pretty print the tree.

--
Harshad RJ
http://twitter.com/h__r__j

Devin Bayer

Jun 6, 2011, 5:06:28 AM
to zeta-d...@googlegroups.com

On Jun 4, 2011, at 15:35, prat...@gmail.com wrote:

> Grouping sections :
>
> You want sections to be organized hierarchically, i.e. the section
> heading and section text parsed as child nodes of a given section
> node. Note that HTML does not organize sections like that: headings
> and paragraphs are not grouped into sections, they are stored as
> siblings under the same parent node. So it does not sound prudent to
> change the grammar.

I think the HTML designers realized this mistake. HTML5 and XHTML2 both organize content into sections:

http://www.w3.org/TR/html5/sections.html#the-section-element

Without a grouping element, styling and manipulation become much harder.

~ Devin

prat...@gmail.com

Jun 6, 2011, 6:27:40 AM
to zeta-discuss
Thanks for pointing this out. I haven't yet scoped HTML5 for eazytext.
I will have to go through the spec before I can give a road-map for
generating <section> elements with eazytext. Meanwhile, if you
have any suggestions, please put them forward.

Cheers,

On Jun 6, 2:06 pm, Devin Bayer <l...@t-0.be> wrote:

prat...@gmail.com

Jun 7, 2011, 12:26:41 PM
to zeta-discuss
I was hoping to get more input on that. Meanwhile, I have improved
the AST nodes to reflect the grammar as closely as possible. It is now
possible to reconstruct the reduced rules by walking through the nodes
using their children() method. I have also added two attributes,
`_nonterms` and `_terms`, to each node object.
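
A walk of that kind might look like the sketch below. The `Node` class here is only a stand-in written for this example. It assumes children() returns an iterable of child nodes, which may not match the real eazytext return value, and it does not model `_nonterms` or `_terms`:

```python
class Node:
    """Stand-in for an AST node; only the children() name comes from eazytext."""

    def __init__(self, name, *kids):
        self.name = name
        self._kids = list(kids)

    def children(self):
        # Assumption: child nodes are returned in source order.
        return self._kids


def walk(node, depth=0):
    """Yield (depth, node) pairs depth-first, in document order."""
    yield depth, node
    for child in node.children():
        yield from walk(child, depth + 1)


tree = Node("wikipage",
            Node("paragraphs",
                 Node("paragraph", Node("heading")),
                 Node("paragraph", Node("textlines"))))

for depth, node in walk(tree):
    print("  " * depth + node.name)
```

The same generator could drive a section-grouping pass: collect heading and text nodes in order, then regroup them outside the parser.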

These changes are available in lp:eazytext (Bazaar repository) on
Launchpad. The revision should be r165.

Cheers,

On Jun 4, 8:51 pm, Harshad RJ <harshad...@gmail.com> wrote:
> Pratap,
>
> On Sat, Jun 4, 2011 at 7:05 PM, prata...@gmail.com <prata...@gmail.com> wrote:
>
> > I hope you will get a nice looking output now.
>
> My hunch is that the OP wanted to directly manipulate the AST, to rearrange
> the sections, and *then* pretty print the tree.
>
> --
> Harshad RJ
> http://twitter.com/h__r__j