leoPersistence.py: persistent gnx's and uA's for @auto etc.


Edward K. Ream

Jul 13, 2014, 6:55:12 AM
to leo-e...@googlegroups.com
Yesterday I started work on leoPersistence.py.  It is a re-imagining of leoViews.py.  This post discusses this new module in detail.

As of recent revs, the code in leoPersistence.py is connected to Leo via the c.leoPersistence ivar.  The c.leoViews ivar no longer exists, though leoViews.py is still part of the repo.  All but one new unit test covering leo.core.leoPersistence now pass.  The unit tests for leoViews.py have been disabled.

Terminology. A **foreign** file is a file created by @auto, @org-mode or @vim-outline.  The latter two do not exist yet, but they will asap. @org-mode and @vim-outline fully describe outline structure but do not (of course) directly support gnx's or uA's.  The leoPersistence module promises to allow Leo to reliably associate gnx's (and thus uA's) with all such files.

Now to the actual description of leoPersistence.py.  As mentioned earlier, it is based on leoViews.py, but it attempts *no* reordering or renaming of incoming nodes.  As a result, none of the horrendously complex code in leoViews.py exists in leoPersistence.py.  This is a *big* step forward.

Leo will store the needed data in a tree (in the Leo outline) as follows, from the docstring of the PersistenceDataController class:

QQQ
    - @persistence
      - @data <headline of foreign node>
        - @gnxs
           body text: pairs of lines: gnx:<gnx><newline>unl:<unl>
        - @uas
            @ua <gnx>
                body text: the pickled uA
QQQ
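To make the @gnxs body format concrete, here is a minimal sketch of reading those gnx:/unl: line pairs back into Python. The function name, the sample gnx values and the "-->" separator inside the sample unl's are assumptions for illustration; this is not the code in leoPersistence.py.

```python
def parse_gnxs_body(body):
    """Return a list of (gnx, unl) pairs from the body text of an @gnxs node."""
    # Ignore blank lines so trailing newlines do not matter.
    lines = [line for line in body.splitlines() if line.strip()]
    pairs = []
    # Walk the lines two at a time: a gnx: line followed by a unl: line.
    for gnx_line, unl_line in zip(lines[0::2], lines[1::2]):
        if gnx_line.startswith('gnx:') and unl_line.startswith('unl:'):
            pairs.append((gnx_line[4:].strip(), unl_line[4:].strip()))
    return pairs

# Hypothetical sample data: the gnx's and unl's below are made up.
body = (
    "gnx:ekr.20140713065512.1\n"
    "unl:class Spam-->__init__\n"
    "gnx:ekr.20140713065512.2\n"
    "unl:class Spam-->eggs\n"
)
pairs = parse_gnxs_body(body)
```

Malformed pairs are silently skipped in this sketch; real code would probably want to report them.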

The only real difficulty is associating the unl's saved in @gnxs nodes with the "incoming" nodes, that is, the nodes read by @auto, @org-mode or @vim-outline.  This will be done using pd.find_absolute_unl_node or pd.find_position_for_relative_unl.  Work is needed on these: more in another thread.

In any case, the idea is straightforward: search each unl for a corresponding node in the incoming outline.  If found, set the node's gnx, then use the gnx to set the uA and clone links.  I expect only a few days' work will be needed to complete the code.

Edward

Edward K. Ream

Jul 13, 2014, 7:06:25 AM
to leo-e...@googlegroups.com


On Sunday, July 13, 2014 5:55:12 AM UTC-5, Edward K. Ream wrote:
Yesterday I started work on leoPersistence.py.

P.S.  The PersistenceDataController class uses an unusual coding style.  The abbreviation for this class is pd.  Rather than using "self" as the first argument of each member, the first argument of all pd members is, you guessed it, "pd".  This saves the usual assignment: pd = self in each member. 

A single comment at the start of the class disables all pylint warnings about "self" not being the first argument of a member: # pylint: disable=no-self-argument
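For readers who have not seen this style, a minimal sketch follows. Only the class name, the "pd" convention and the pylint comment come from the description above; the methods and the dummy commander are made up for illustration.

```python
class PersistenceDataController:
    # pylint: disable=no-self-argument
    # Every method names its first argument pd rather than self.

    def __init__(pd, c):
        # c stands in for a Leo commander; any object works in this sketch.
        pd.c = c

    def describe(pd):
        # A hypothetical method, present only to show the convention in use.
        return 'controller for %r' % (pd.c,)

pd = PersistenceDataController('dummy-commander')
description = pd.describe()
```

Python treats the first parameter of a method as the instance whatever it is called, so this runs exactly as it would with "self".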

I rather like this new coding style.  YMMV, but don't bother trying to talk me out of it :-)

EKR

Edward K. Ream

Jul 14, 2014, 8:23:46 AM
to leo-editor
On Sun, Jul 13, 2014 at 5:55 AM, Edward K. Ream <edre...@gmail.com> wrote:

> Yesterday I started work on leoPersistence.py. It is a re-imagining of leoViews.py.

Work is going well: leoPersistence.py is a big collapse in complexity
compared with leoViews.py.

gnx's can now be restored in the easy case. The **hard case** arises
when the structure of the imported file has been changed outside of
Leo.

Restoring uA's should be straightforward: the write logic saves
pickled uA's in @ua nodes in the @uas tree. I'll add the code to
restore uA's later today. This code uses restored gnx's and will work
better when the restoration of gnx's becomes more reliable, as
discussed below.
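As a rough illustration of the round trip, the sketch below pickles a uA dict into text that could live in the body of an @ua node and then restores it. The function names are hypothetical, and hex-encoding the pickle is an assumption made so that the result is plain text; the thread does not say how the pickled bytes are stored.

```python
import binascii
import pickle

def ua_to_body(ua):
    """Return text suitable for a node body: the hex-encoded pickle of ua."""
    # Assumption: hex encoding, so the body contains only ASCII characters.
    return binascii.hexlify(pickle.dumps(ua, protocol=2)).decode('ascii')

def body_to_ua(body):
    """Inverse of ua_to_body. Only unpickle data from outlines you trust."""
    return pickle.loads(binascii.unhexlify(body.strip().encode('ascii')))

# Hypothetical uA contents.
ua = {'icons': ['star.png'], 'priority': 3}
body = ua_to_body(ua)
restored = body_to_ua(body)
```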

I expect to enable the new code for general consumption later today by
setting new_auto to True at the start of leoAtFile.py. I don't expect
major problems, but if there are, you can just set this switch to
False.

===== Associating unl's with gnx's

> The only real difficulty is associating the unl's saved in @gnxs [nodes] with the [imported] nodes. This will be done using...pd.find_position_for_relative_unl.

As mentioned above, this works, provided that the relative unl's in
the imported nodes haven't changed.

> More in another thread.

Rather than start another thread, I'll discuss
pd.find_position_for_relative_unl here. Terry, Fidel, Kent, I
especially look forward to your comments.

It seems to me, that in *this* context, we don't particularly want
unls that specify sibling position using the <headline>:M:N notation
within parts of unls, as in g.recursiveUNLFind. Indeed, we don't care
about sibling position when trying to associate gnx's with nodes.
More importantly, perhaps, we do not expect two sibling nodes to have
the same (imported) name, because that would mean we have a duplicate
definition of the imported symbol!

I *do* think that we need a way to expand the search for nodes if
the relative (to the root node) unl fails to match any imported node
exactly. However, I don't expect to use the
<headline>:M:N notation.

The general strategy will be to have a dictionary, say d, that
associates headlines with lists of (copies of) positions. This will
avoid scanning the entire outline repeatedly. If no exact match is
found for a unl, the algorithm will use d to find the list of all the
nodes corresponding to the *last* headline in the unl. For each
position in the list, we can compute its parents, and thus compute
the longest unl that matches the to-be-matched unl. We pick the
position in the list with the longest matching unl. This algorithm
should be straightforward and very fast.
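The strategy above can be sketched as follows. This is one reading of it, not the eventual pd.find_position_for_relative_unl: the Node class stands in for Leo positions, unl's are represented as lists of headlines, and "longest matching unl" is interpreted as the longest run of matching headlines counted from the end of the unl. All names are hypothetical.

```python
from collections import defaultdict

class Node:
    """A stand-in for a Leo position: a headline plus a parent link."""
    def __init__(self, headline, parent=None):
        self.headline = headline
        self.parent = parent

    def path(self):
        """Return the headlines from the top-level ancestor down to this node."""
        result, node = [], self
        while node is not None:
            result.append(node.headline)
            node = node.parent
        return result[::-1]

def build_index(nodes):
    """Build d: headline -> list of nodes, with one scan of the outline."""
    d = defaultdict(list)
    for node in nodes:
        d[node.headline].append(node)
    return d

def common_suffix_len(a, b):
    """Count how many trailing items of lists a and b agree."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n

def find_best_match(d, unl):
    """Return the node whose path shares the longest tail with unl, or None."""
    if not unl:
        return None
    best, best_len = None, 0
    # Only nodes whose headline equals the *last* headline of the unl qualify.
    for node in d.get(unl[-1], []):
        n = common_suffix_len(node.path(), unl)
        if n > best_len:
            best, best_len = node, n
    return best

# The saved unl was "class B-->run", but outside Leo "class B" was
# moved inside "class Outer", so there is no longer an exact match.
cls_a = Node('class A')
a_run = Node('run', cls_a)
outer = Node('class Outer')
cls_b = Node('class B', outer)
b_run = Node('run', cls_b)
d = build_index([cls_a, a_run, outer, cls_b, b_run])
match = find_best_match(d, ['class B', 'run'])
```

Here both "run" nodes are candidates, and the one under "class B" wins because two trailing headlines match instead of one. Ties go to the first candidate in this sketch; a real implementation would have to decide what to do with them.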

===== Summary

I'll enable the new code later today by setting new_auto = True in
leoAtFile.py. If there are problems, you can set this switch back to
False. I'll enable the new code when restoring uA's works for the easy
cases.

The **hard case** arises when the structure of the imported file has
been changed outside of Leo. In that case, the relative unls stored in
the @persistence tree may not match any incoming node. I'll rewrite
pd.find_position_for_relative_unl to find as many *reasonable* matches
as possible.

Your comments please, Amigos.

Edward

Kent Tenney

Jul 14, 2014, 9:14:57 AM
to leo-editor
I'm afraid this level of detail is beyond me.

I don't understand Leo looking for nodes, I would expect
the @auto parsing component to be asking if a node had
been seen previously, something like

    prior = found_in_auto_node_UA(<parsed component of external file>)
    if prior is not None:
        restore_prior_attrs(prior)
    else:
        <generate a standard auto node>

Reviewing the first post:
> Leo will store the needed [data] in a tree (in the Leo outline) as follows

So Leo is persisting the tree of the @auto file? Doesn't that set the
stage for conflicts between the @auto parsing machinery and the
persisted tree?

I've been thinking along simpler lines, only persisting key/value
pairs like
{hash_contents:[uA, gnx], hash_contents:[uA, gnx] ...}
or
{hash_uri:[uA, gnx], hash_uri:[uA, gnx] ... }

if the parser comes across a node which matches one seen before, fine,
if it's new create a new gnx and empty uA, leaving all the tree management
work to the @auto parser.
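A minimal sketch of the hash_contents variant of this key/value idea, assuming SHA-1 over the node's text as the key (the thread does not name a hash) and hypothetical function names:

```python
import hashlib

def content_key(text):
    """Hash the text of a parsed node. SHA-1 is an assumption of this sketch."""
    return hashlib.sha1(text.encode('utf-8')).hexdigest()

def restore_or_create(store, text, new_gnx):
    """Return the [uA, gnx] entry for text, creating one if it is unseen."""
    key = content_key(text)
    if key not in store:
        # A new node: empty uA and a freshly allocated gnx.
        store[key] = [{}, new_gnx]
    return store[key]

store = {}
first = restore_or_create(store, 'def spam(): pass', 'gnx.1')
again = restore_or_create(store, 'def spam(): pass', 'gnx.2')
edited = restore_or_create(store, 'def spam(): return 1', 'gnx.3')
```

Note that any edit to the node's text changes the key, so in this sketch an edited node is treated as brand new and gets a new gnx.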

Evidently I don't appreciate the complexities involved.

Edward K. Ream

Jul 14, 2014, 10:05:58 AM
to leo-editor
On Mon, Jul 14, 2014 at 8:14 AM, Kent Tenney <kte...@gmail.com> wrote:
> I'm afraid this level of detail is beyond me.

It's hard to understand without digging into the code. But read on...

> I don't understand Leo looking for nodes, I would expect the @auto parsing component to be asking if a node had been seen previously

The parser is not involved in any way. The new @auto code uses what
the parser gives it.

The crucial comments are in the docstring for the
PersistenceDataController class:

QQQ
All required data are held in nodes having the following structure::

    - @persistence
      - @data <headline of foreign node>
        - @gnxs
           body text: pairs of lines: gnx:<gnx><newline>unl:<unl>
        - @uas
            @ua <gnx>
                body text: the pickled uA
QQQ

Looking closely, you will see that Leo attempts to associate gnx's
with unl's, very much like bookmarks.

> So Leo is persisting the tree of the @auto file?

Kinda. It persists the unl's, which depend on the outline structure.
As discussed previously, we don't need to require an *exact* match
between unl's and nodes, but that code isn't in place yet.

> Doesn't that set the stage for conflicts between the @auto parsing machinery and the persisted tree?

Yes. This scheme *must* fail if imported headlines change, that is, if
you change the *name* of the thing being imported **outside of Leo**.
Absent true AI, there is **no way** around such breakages.

But how often do you, in fact, change outline structure outside of
Leo? And how often do you change the *names* of imported things
outside of Leo? Not all that often, I'm guessing.

> I've been thinking along simpler lines, only persisting key/value pairs like
> {hash_contents:[uA, gnx], hash_contents:[uA, gnx] ...}
> or
> {hash_uri:[uA, gnx], hash_uri:[uA, gnx] ... }

This is really the same scheme that the new code uses! Look again at
this part of the docstring:

QQQ
    - @uas
      - @ua <gnx>
          body text: the pickled uA
QQQ

> if the parser comes across a node which matches one seen before, fine,
> if it's new create a new gnx and empty uA, leaving all the tree management
> work to the @auto parser.

To repeat, the parser has *nothing* to do with this scheme. The new
@auto code takes what the parser gives it, and goes from there.

> Evidently I don't appreciate the complexities involved.

The new code is *much* simpler than the old, so you can study it in
detail if you like. However, I suggest looking at the data in the
@persistence tree. It is straightforward.

As long as outline structure doesn't change (and it rarely does) that
data suffices to recreate clone links and uA's. If the outline
structure *does* change, we will need more clever code to associate
unl's with gnx's. If a node's headline changes, we will lose uA's and
clone links to that node. That's a given, imo.

Thanks for your questions. Feel free to ask more.

Edward

Terry Brown

Jul 16, 2014, 11:58:56 AM
to leo-e...@googlegroups.com
On Sun, 13 Jul 2014 04:06:24 -0700 (PDT)
"Edward K. Ream" <edre...@gmail.com> wrote:

> P.S. The PersistenceDataController class uses an unusual coding
> style. The abbreviation for this class is pd. Rather than using
> "self" as the first argument of each member, the first argument of
> all pd members is, you guessed it, "pd". This saves the usual
> assignment: pd = self in each member.
>
> A single comment at the start of the class disables all pylint
> warnings about "self" not being the first argument of a member: #
> pylint: disable=no-self-argument
>
> I rather like this new coding style. YMMV, but don't bother trying
> to talk me out of it :-)

Ok, I won't :-), but I have to say it: Eww. :-]

I've never really felt the need for c = self type assignments, in fact
I think they make code harder to read, since you have to keep track of
whether c is self or an argument or a local var.

A good programming environment should make it easy to keep track of
which class contains the method you're editing ;-) Kind of makes me
wonder if clones are the problem, seeing you do lose that context when
you collect a group of methods together in a view, whereas bookmarks...
I think bookmarks need an alternative view in the MVC sense which uses
QTreeView - this would basically create two trees in the Leo UI, but I
think it might make them more intuitive than the current more compact
view.

Of course sometimes it would be nice if self wasn't such a long word,
e.g. _ or something. I guess that's an advantage of C++, if you use a
proportional font, 'this' will take up less space :-)

Cheers -Terry

Edward K. Ream

Jul 16, 2014, 12:51:34 PM
to leo-editor
On Wed, Jul 16, 2014 at 10:58 AM, 'Terry Brown' via leo-editor
<leo-e...@googlegroups.com> wrote:

> Ok, I won't :-), but I have to say it: Eww. :-]

The advantage of this style is that there is never any doubt about
what 'self' is. This advantage is a textual one, having nothing to do
with clones, or Leo, for that matter.

EKR