A beautiful pattern for composing large strings

121 views
Skip to first unread message

Edward K. Ream

unread,
Jan 1, 2013, 8:03:05 PM1/1/13
to leo-e...@googlegroups.com
I continue to work feverishly on various aspects of static type checking.

That work lead to a discovery that may benefit you.  It's one of the most beautiful patterns I've ever created: it allows a program to simply and naturally build up huge strings without using any string operations.  Generating an html file with minimal stress on the gc is an obvious application.  You could also say that the pattern creates a Pythonic way of using lisp-like algorithms, but more safely than in lisp.

Full details at the stc documentation page::
http://webpages.charter.net/edreamleo/stc/stc.html#a-beautiful-pattern-for-building-large-strings

Edward

Terry Brown

unread,
Jan 1, 2013, 10:48:49 PM1/1/13
to leo-e...@googlegroups.com
On Tue, 1 Jan 2013 17:03:05 -0800 (PST)
"Edward K. Ream" <edre...@gmail.com> wrote:

> Generating an html file with minimal stress on the gc is an obvious
> application.

Nice. An HTML specific approach which avoids things like the
r.div_end() construct is the LXML element factory:
http://lxml.de/tutorial.html#the-e-factory

Cheers -Terry

Edward K. Ream

unread,
Jan 2, 2013, 10:56:30 AM1/2/13
to leo-e...@googlegroups.com

Beautiful.  With the list framework the revised r.div method would be something like::

    def div(self,aList):
        compute old and new indents
        return [
            <div>, with old indent,
            aList, with new indent,
            </div>, with old indent,
        ]

An example of the pattern in use::

    return [
        ...
        r.div([
            contents of the div,
        ]),
        ...
    ]

Similarly for span, etc. Thanks for pointing this out.  This is too good to ignore.  I'll do it soon.

Edward

Ville M. Vainio

unread,
Jan 2, 2013, 3:01:17 PM1/2/13
to leo-editor
This seems like a spiritual relative of "rope" data structure:






Edward

--
You received this message because you are subscribed to the Google Groups "leo-editor" group.
To view this discussion on the web visit https://groups.google.com/d/msg/leo-editor/-/a2fmKxEY0s0J.
To post to this group, send email to leo-e...@googlegroups.com.
To unsubscribe from this group, send email to leo-editor+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/leo-editor?hl=en.

Edward K. Ream

unread,
Jan 2, 2013, 5:41:18 PM1/2/13
to leo-e...@googlegroups.com
On Wednesday, January 2, 2013 10:02:40 AM UTC-6, Edward K. Ream wrote:

ReportTraverser uses this pattern at rev 288.  It is a remarkable simplification.
 
The revised r.div method will be something like::


    def div(self,aList):
        compute old and new indents
        return [
            <div>, with old indent,
            aList, with new indent,
            </div>, with old indent,
        ]

This doesn't work!  aList won't have the proper indentation.  In particular, the following won't work::

    [['  ',z] for z in aList],

flatten_list will add the two spaces before the line, that is, before a newline. Instead, a hack is needed:

    return [
        div,
        join_list(aList,indent='  '),
        '\n</div>'
    ]

The new 'indent' keyword tells flatten_list to add the given indentation (two spaces, here), to strings that start with a newline.  The new code in flatten_list is::

        for i,item in enumerate(aList):
            if leading:                  yield leading
            for s in flatten_list(item):
                if indent:
                    if s.startswith('\n'):
                        yield '\n'+indent+s[1:]
                    else:
                        yield s
                else:
                    yield s
            if sep and i < len(aList)-1: yield sep
            if trailing:                 yield trailing

The point is that the indentation must be "moved behind the newline".

Edward

Edward K. Ream

unread,
Jan 2, 2013, 5:45:44 PM1/2/13
to leo-e...@googlegroups.com


On Wednesday, January 2, 2013 2:01:17 PM UTC-6, Ville M. Vainio wrote:
This seems like a spiritual relative of "rope" data structure:



Thanks for these links.  Yes, there is a similarity.  However, the real beauty of new pattern is the flatten_list method.  It's the combination of the lists and flatten_list that gives the pattern real power.  We've just seen how the 'indent' keyword hack allows sophisticated processing of the lists.  I wouldn't expect more hacks to be needed, but I've been surprised before ;-)

Edward

Ville M. Vainio

unread,
Jan 3, 2013, 4:28:52 AM1/3/13
to leo-editor
Yes, it's indeed an interesting pattern. 

It would seem more useful in faster languages than python though; in python, string operations (and gc) are faster in comparison to executing other code, whereas in fast, more static languages (C++, Java, Go) avoiding GC gives you great benefits (I saw 5x perf increase reported for some Go app when eliminating GC).

Edward K. Ream

unread,
Jan 6, 2013, 12:09:40 PM1/6/13
to leo-e...@googlegroups.com
On Thursday, January 3, 2013 3:28:52 AM UTC-6, Ville M. Vainio wrote:
Yes, it's indeed an interesting pattern. 

It would seem more useful in faster languages than python though; in python, string operations (and gc) are faster in comparison to executing other code, whereas in fast, more static languages (C++, Java, Go) avoiding GC gives you great benefits (I saw 5x perf increase reported for some Go app when eliminating GC).

Thanks for these remarks.  I've enjoyed thinking about them.  A few responses:

1. This is a smallish pattern--it can't change the world, except insofar as something beautiful changes the world.

2. Otoh, the pattern changes the way I think about lisp and lisp-like patterns.  That's not nothing. For the first time, it makes list-oriented  programming pattern completely safe.  It does this because it doesn't matter what each component list contains, nor does it matter *at all* what the shape of any part of the tree is.  This makes the pattern completely flexible.

3. The pattern can be generalized.  The pattern I described uses a tree of component strings to describe a (large) resulting string.  But one can easily imagine using lists to hold anything at all (of whatever tree shape) and then use another version of flatten_list to compose results of other types.  Alternatively, rather than composing a result, the analog of flatten_list could process the tree of lists in other ways.  So the most general version of the pattern is:

A) The tree of lists can contain any data whatever, especially including None,
B) The "producers" (visitors) can create subtrees of whatever shape,
C) The analog of flatten_list is free to do anything whatever with the resulting tree.

I suspect that these features are what appeal to lisp programmers ;-)

4. I'm not sure whether the pattern is more useful in "faster" languages or not.  True, anything that helps a feeble language like C++ will seem useful :-)  But points 1-3 above have nothing to do with speed: they just make programming simpler, more flexible, more powerful and more fun.

Imo, gc issues are important both in Python and in C++.  For stc, the only way to get reproducible timing statistics for tests was to do the following before running the test::

    for z in (0,1,2): gc.collect(z)

The ReportTraverser class no longer contains *any* calls to string.join, so one could imagine that all strings used in the code would be interned.  The generated tree actually contains nothing but *references* to strings, and if all strings are interned the references will not themselves cause any new strings to be allocated.

Naturally, gc issues are even more important in language like C++ without a gc.  Lol.  The preceding paragraph is more important for C++ than in Python.  So yes, in this sense I agree with you completely that the pattern is more useful for "fast" languages than for Python.

Thanks, Ville, for provoking all these pleasant thoughts :-)

Edward
Reply all
Reply to author
Forward
0 new messages