Will just one kind of @file node suffice?!

Edward K. Ream

Mar 25, 2009, 5:59:01 PM

to leo-editor

The recent discussions got me thinking about object identities, clones
and sentinels.

While doing PT exercises in the pool I think I may have solved the
"master problem". (Usually I wait until I am in the shower for my
Ahas...)

Naturally, having been wrong (or frustrated) so many times before, I'm
not completely confident that I have thought of everything, but here
goes.

This is a fairly long post, so let me tell you at the start where I
want to go. It may be possible to combine all the best aspects of
@thin, @auto and @shadow into a new @<file> type. It may eventually
be called just @file, but here let us call it @file! (trailing
exclamation point) to emphasize that it is a new kind of @<file>
directive.

The essential idea is this: when writing @file! trees, Leo will write
a public file (without sentinels) and (usually) a private file
containing sentinels, much as @shadow does now. The trick will be
figuring out how to make this scheme work in various kinds of
cooperative environments.
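
A minimal sketch of the dual write, in Python (Leo's own language). It assumes a flat list of nodes and a simplified, hypothetical sentinel format in which each body is preceded by a `#@+node:<gnx>: <headline>` line. Leo's real sentinels carry more information and handle body lines that themselves start with `#@`, which this sketch does not. The `Node` class and the function names are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical, simplified sentinel prefix (Python comment + "@").
SENTINEL = "#@"


@dataclass
class Node:
    gnx: str       # stable node identity (Leo's global node index)
    headline: str
    body: str


def write_private(nodes):
    """Private file: each body preceded by a sentinel recording gnx and headline."""
    lines = []
    for node in nodes:
        lines.append(f"{SENTINEL}+node:{node.gnx}: {node.headline}")
        lines.extend(node.body.splitlines())
    lines.append(f"{SENTINEL}-leo")  # end-of-file sentinel
    return "".join(line + "\n" for line in lines)


def write_public(nodes):
    """Public file: the same text with no sentinels at all."""
    lines = []
    for node in nodes:
        lines.extend(node.body.splitlines())
    return "".join(line + "\n" for line in lines)


def strip_sentinels(private_text):
    """Recover the public file by dropping every sentinel line."""
    return "".join(
        line
        for line in private_text.splitlines(keepends=True)
        if not line.startswith(SENTINEL)
    )
```

In this sketch the public file is exactly the private file minus its sentinel lines, so the public file can always be regenerated from the private one.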

Let us just assume that some (fairly straightforward) trick will make
this work. We have entered the promised land:

A. When reading an @file! node, Leo will use @auto import code if no
private file exists.

B. When saving an @file! node, Leo will write both the public and
private files, unless we want @file! to do the work of @edit, in which
case Leo will just write the public file.

C. When reading an @file! node, Leo must compare the modification dates
of the public and private files. If the private file is newer, Leo can
simply read it; if the public file is newer, Leo must use the general
@shadow update algorithm, which compares the public and private files.
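
Rules A and C amount to a small decision procedure. A sketch, using file modification times exactly as rule C describes; the names `ReadStrategy` and `choose_read_strategy` are hypothetical, and the handling of a missing public file and of equal timestamps is an assumption, since the rules above do not cover those cases.

```python
import os
from enum import Enum


class ReadStrategy(Enum):
    AUTO_IMPORT = "import the public file with the @auto importer"
    READ_PRIVATE = "read the private file as if it were @thin"
    SHADOW_UPDATE = "run the @shadow update algorithm"


def choose_read_strategy(public_path, private_path):
    """Pick how to read an @file! node from which files exist and their mtimes."""
    if not os.path.exists(private_path):
        return ReadStrategy.AUTO_IMPORT                 # rule A
    if not os.path.exists(public_path):
        return ReadStrategy.READ_PRIVATE                # assumption: nothing to compare
    if os.path.getmtime(public_path) > os.path.getmtime(private_path):
        return ReadStrategy.SHADOW_UPDATE               # rule C: public file is newer
    return ReadStrategy.READ_PRIVATE                    # rule C: private newer (ties too)
```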

The above allows @file! to do the work of @auto, @thin and @shadow.
Furthermore, there would be little or no need for @asis, @nosent,
etc. It would also probably be straightforward to have @file! do the
work of @edit. In short, @file! likely could do *everything*.

BTW, note that @file! would eliminate most of the problems with rST
embedded in @thin trees. We would pass the public file to sphinx or
docutils. The public file would contain no sentinels, so they would
cause no problems. (Yes, we might have to ensure a blank line at the
end of each node.)
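
In reStructuredText a paragraph ends only at a blank line, so without one the last line of one node could run into the first line of the next. A sketch of enforcing this while assembling the public file, assuming the public file is just the node bodies concatenated in outline order; `join_rst_bodies` is a hypothetical helper.

```python
def join_rst_bodies(bodies):
    """Concatenate node bodies so that each one ends with exactly one blank line."""
    parts = []
    for body in bodies:
        # Drop any trailing newlines, then end the node with a blank line.
        parts.append(body.rstrip("\n") + "\n\n")
    return "".join(parts)
```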

To repeat, the above is simply a road map to the desired result. We
still have the hard work to do.

In the interest of completeness, I'll now present my thinking as
it was in the pool. Some details of the thought process have already
been lost: this is the best I can do.

I first considered breakable clone links. While not a preposterous
idea, they would, in fact, create significant complications throughout
Leo. Imo, they would likely lead to unavoidable special cases: nodes
in @<file> nodes would likely have to be "master" nodes, and clones
outside @<file> nodes would likely be subsidiary nodes. Furthermore,
because clones can contain cloned ancestors, the likelihood of data
loss, or worse, data corruption or confusion, becomes very real in
the presence of broken clone links. One antidote would be to make
clones in @<file> nodes first among equals, but such a distinction
would ripple throughout Leo.

So, once again, the conclusion was that object identity is important.
We simply can not fake identity by using mutable data such as headline
text. Something like gnx's is required.

Then my attention returned, more in desperation than hope, to the
question of sentinels. I believe I quickly realized that no amount of
"simplification" of sentinels is likely ever going to work. Many
people just will not tolerate sentinels.

This leaves only @auto and @shadow as possible salvations.

I don't recall how this came about, but the phrase "reverse @shadow"
appeared.

Let us recall the essential problem with @shadow in a collaborative
environment, namely that public files do not contain sentinels, and so
can not specify the structure used by one's collaborators.

Here is the Aha: what if we commit the so-called *private* files when
using @shadow? This is, in fact, equivalent to committing @thin
files.

This is indeed a reversal of @shadow work flow. We would typically
update the *public* files based on changes to the private files.
Actually, this is trivial: the @shadow update logic is not used, and
we just read the @shadow tree as if it were an @thin tree.

You can probably see some complications here, but let me skip ahead to
what might be called the "real" Aha, namely that we can imagine three
(or is it four?) ways of using bzr.

0. Don't use bzr at all. There is no cooperation.
1. A fully Leonine environment, such as leo-editor. All committers
to the repository use Leo.
2. A fully non-Leonine environment. At most one committer uses Leo.
3. A mixed Leonine environment. More than one, but not all,
committers use Leo.

Let's examine each case separately.

Case 0. No cooperation

In this case, the user can use @thin or @shadow, so it doesn't much
matter how @file! would work.

Case 1: Everyone uses Leo.

In this case, we do the *reverse* of how @shadow was originally
intended to operate: we commit the *private* files containing the
sentinels. Since everyone is using Leo, there is never any need to
commit the so-called public files.

We may want to put the private files in the same directories as the
public files, for convenience in folder organization, but in principle
it doesn't matter where the private files go. All that matters is
that we commit the private files, not the public files.

Case 2: At most one person uses Leo.

This case is also straightforward. We commit the *public* files, not
the private files. Since only one Leo user is committing, only that
user needs to know the structure data in the private files. Of
course, collaborators can (in effect) change outline structure without
having the Leo outline, but that will be rare, and it will likely not
be a serious issue.

Case 3: Two or more people use Leo cooperatively, but some people do
*not* use Leo.

This is the hard case. Ideally, we would commit both the public and
private files. Leo users want the private files; the non-Leo users
want the public files. To make this scheme work, we would want a way
to force users to commit public/private pairs of files in synch. This
is possible in principle, but I'm not sure how easy it would be. In
the worst case, Leo users could fall back on @auto, but we really
would rather avoid that sad eventuality.

BTW, it might be good for this case to put all files containing
sentinels in a single place, so as not to overly pollute the directory
structure in the repository.
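
The four cases reduce to a simple commit policy. A sketch with hypothetical names, where "public" means the file without sentinels and "private" the file with them.

```python
from enum import Enum


class Collaboration(Enum):
    NO_COOPERATION = 0   # Case 0: no bzr at all
    ALL_LEO = 1          # Case 1: every committer uses Leo
    AT_MOST_ONE_LEO = 2  # Case 2: at most one committer uses Leo
    MIXED = 3            # Case 3: several, but not all, committers use Leo


def files_to_commit(case):
    """Which of an @file! node's two files belong in the repository."""
    if case is Collaboration.NO_COOPERATION:
        return frozenset()                       # nothing is committed
    if case is Collaboration.ALL_LEO:
        return frozenset({"private"})            # sentinels carry the structure
    if case is Collaboration.AT_MOST_ONE_LEO:
        return frozenset({"public"})             # structure stays with the lone Leo user
    return frozenset({"public", "private"})      # both, kept in sync
```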

Conclusion

It looks to me as if cases 0, 1 and 2 would be easy to do. Case 3 is
harder, but feasible.

Unless I am horribly mistaken, @file! promises to relegate the
(absolutely essential) sentinels to the background for almost all use
cases. In effect, we get all the advantages of @thin nodes in all use
cases.

Leo's processing for @file! nodes would be fairly straightforward in
concept. Interestingly, the behind-the-scenes atFile read/write logic
uses just about every trick in Leo's book: @auto import code
initially, dual writes of public and private files, and the standard
@shadow update algorithm as needed.
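
For reference, a rough sketch of the kind of update the @shadow algorithm performs: diff the old public text (the private file minus sentinels) against the new public file, and carry the sentinels across unchanged lines. This is a simplified illustration built on Python's `difflib.SequenceMatcher`, not Leo's actual implementation; it reuses a hypothetical `#@` sentinel prefix, and its rule for changed regions (keep their sentinels, placed before the new lines) is just one plausible heuristic.

```python
import difflib

# Hypothetical, simplified sentinel prefix (not Leo's full sentinel syntax).
SENTINEL = "#@"


def shadow_update(old_private, new_public):
    """Carry the sentinels of old_private over to the text of new_public."""
    old_lines = old_private.splitlines(keepends=True)
    new_lines = new_public.splitlines(keepends=True)

    # Split the old private file into its public lines plus, for each public
    # line i, the sentinels that immediately precede it. The extra final
    # entry holds sentinels that follow the last public line.
    old_public = []
    sentinels_before = [[]]
    for line in old_lines:
        if line.startswith(SENTINEL):
            sentinels_before[-1].append(line)
        else:
            old_public.append(line)
            sentinels_before.append([])

    out = []
    matcher = difflib.SequenceMatcher(None, old_public, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines keep their sentinels exactly where they were.
            for k in range(i2 - i1):
                out.extend(sentinels_before[i1 + k])
                out.append(new_lines[j1 + k])
        else:
            # replace/delete/insert: keep the sentinels attached to the old
            # lines (so no node is lost), then emit the new public lines.
            for i in range(i1, i2):
                out.extend(sentinels_before[i])
            out.extend(new_lines[j1:j2])
    out.extend(sentinels_before[-1])
    return "".join(out)
```

By construction, stripping the sentinels from the result yields the new public file exactly, while node boundaries in unchanged regions stay where they were.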

All comments welcome. It really would be nice if this general scheme
could be made to work.

Edward

Ville M. Vainio

Mar 25, 2009, 6:19:31 PM

to leo-e...@googlegroups.com

On Wed, Mar 25, 2009 at 11:59 PM, Edward K. Ream <edre...@gmail.com> wrote:

> want the public files. To make this scheme work, we would want a way
> to force users to commit public/private pairs of files in synch. This
> is possible in principle, but I'm not sure how easy it would be. In
> the worst case, Leo users could fall back on @auto, but we really
> would rather avoid that sad eventuality.

Hashing to rescue! For file foo.py ver 3, we have a private version
2368237aaee.txt.py. The name is the hash digest of contents of foo.py.
So, when you open foo.py, Leo will look up the right private file in
a priv file repo. People can commit their private files a bit
off-sync, as long as the right files end up in the repo at some point.
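
Ville's naming scheme can be sketched as follows. The hash function (SHA-1 here), the helper names and the exact layout of the `.leo_structure` directory he mentions next are all assumptions; the point is only that the private file's name is derived from the public file's contents, so the right private file can be found no matter when it was committed.

```python
import hashlib
import os

# Directory name taken from the thread; everything else here is hypothetical.
STRUCTURE_DIR = ".leo_structure"


def private_path_for(project_root, public_path):
    """Name the private file after a digest of the public file's current contents."""
    with open(public_path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    ext = os.path.splitext(public_path)[1]          # ".py" for foo.py
    return os.path.join(project_root, STRUCTURE_DIR, f"{digest}.txt{ext}")


def find_private(project_root, public_path):
    """Return the matching private file, or None if this version has no structure yet."""
    path = private_path_for(project_root, public_path)
    return path if os.path.exists(path) else None


def store_private(project_root, public_path, private_text):
    """Save the private (sentinel-bearing) file for the public file's current version."""
    path = private_path_for(project_root, public_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(private_text)
    return path
```

If `find_private` returns None, the public file has changed without a matching private file, and Leo would need a fallback such as @auto import.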

> BTW, it might be good for this case to put all files containing
> sentinels in a single place, so as not to overly pollute the directory
> structure in the repository.

Yeah, that's the repo :-). Like we have .bzr, we can have
.leo_structure in project root. If you send out a leo tree somewhere
(release), you will also ship that dir. GC can be done every now and
then.
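
One possible garbage-collection policy, sketched under the assumption that private files are named by the digest of the public file's contents followed by a suffix, as Ville describes: keep only the entries whose digest matches a current public file. The policy and the function name are hypothetical; a real scheme might keep entries for older committed versions so that history stays readable.

```python
import hashlib
import os


def collect_garbage(structure_dir, public_paths):
    """Delete private files whose digest matches none of the current public files.

    Assumes each private file is named "<hex digest>.<suffix>", e.g. "<digest>.txt.py".
    """
    live = set()
    for path in public_paths:
        with open(path, "rb") as f:
            live.add(hashlib.sha1(f.read()).hexdigest())
    removed = []
    for name in os.listdir(structure_dir):
        digest = name.split(".", 1)[0]
        if digest not in live:
            os.remove(os.path.join(structure_dir, name))
            removed.append(name)
    return sorted(removed)
```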

> Unless I am horribly mistaken, @file! promises to relegate the
> (absolutely essential) sentinels to the background for almost all use
> cases. In effect, we get all the advantages of @thin nodes in all use
> cases.

Why not keep calling it @shadow, because that's what it still remains?
The difference is that shadow files are also public files now.

--
Ville M. Vainio
http://tinyurl.com/vainio

Edward K. Ream

Mar 25, 2009, 9:45:18 PM

to leo-e...@googlegroups.com

On Wed, Mar 25, 2009 at 5:19 PM, Ville M. Vainio <viva...@gmail.com> wrote:

> On Wed, Mar 25, 2009 at 11:59 PM, Edward K. Ream <edre...@gmail.com> wrote:

>> want the public files. To make this scheme work, we would want a way
>> to force users to commit public/private pairs of files in synch. This
>> is possible in principle, but I'm not sure how easy it would be. In
>> the worst case, Leo users could fall back on @auto, but we really
>> would rather avoid that sad eventuality.

> Hashing to rescue! For file foo.py ver 3, we have a private version
> 2368237aaee.txt.py. The name is the hash digest of contents of foo.py.
> So, when you open foo.py, Leo will look up the right private file in
> a priv file repo. People can commit their private files a bit
> off-sync, as long as the right files end up in the repo at some point.

Thanks for this: it looks like an immediate proof that we don't have to worry much about syncing public and private files.

In contrast, checksums don't help at all when trying to create unbreakable links. If a checksum fails, it tells us nothing about where we might find the changed node.

>> BTW, it might be good for this case to put all files containing
>> sentinels in a single place, so as not to overly pollute the directory
>> structure in the repository.

> Yeah, that's the repo :-). Like we have .bzr, we can have
> .leo_structure in project root. If you send out a leo tree somewhere
> (release), you will also ship that dir. GC can be done every now and
> then.

>> Unless I am horribly mistaken, @file! promises to relegate the
>> (absolutely essential) sentinels to the background for almost all use
>> cases. In effect, we get all the advantages of @thin nodes in all use
>> cases.

> Why not keep calling it @shadow, because that's what it still remains?
> The difference is that shadow files are also public files now.

I'm a bit shocked that you haven't found a glaring hole in the idea. I'm starting to confront how good this scheme might be. For example, it looks like @file! does not need to care whether the public or private files are committed to bzr. As another example, the docs will no longer have to describe the 9 (!) ways to create external files.

Obviously, we can call @file! anything we want. My thinking is that the new @file might actually be all there is, so @file would seem to be the clearest choice. I'm not wild about @shadow, because @file! is an amalgam of @thin, @auto, @shadow and maybe even @edit. We probably should leave the final name for later, after all the details become clearer.

Edward

Seth Johnson

Mar 25, 2009, 10:28:41 PM

to leo-e...@googlegroups.com

On Wed, Mar 25, 2009 at 9:45 PM, Edward K. Ream <edre...@gmail.com> wrote:

> I'm a bit shocked that you haven't found a glaring hole in the idea. I'm starting to confront how good this scheme might be. For example, it looks like @file! does not need to care whether the public or private files are committed to bzr. As another example, the docs will no longer have to describe the 9 (!) ways to create external files.

This is a very serious boon. I try to track the discussions here, but I can't because everything everybody says is "this would be good for @thin, but @shadow wouldn't work," "Yeah, you're right," etc. They become a shorthand that stops me from comprehension. And certainly anybody approaching Leo has got to stop short when they confront all those @ thingies.

I'd pipe up more if not for this. My approach to the Leo stable of concerns is very different, but I can't relate it to your problem du jour, so I just watch and muse . . .

Seth

> Obviously, we can call @file! anything we want. My thinking is that the new @file might actually be all there is, so @file would seem to be the clearest choice. I'm not wild about @shadow, because @file! is an amalgam of @thin, @auto, @shadow and maybe even @edit. We probably should leave the final name for later, after all the details become clearer.

Whatever you call the @ directive, it seems your public vs. private files are really external vs. "Leo-specific".

Seth

Ville M. Vainio

Mar 26, 2009, 2:20:43 AM

to leo-e...@googlegroups.com

On Thu, Mar 26, 2009 at 3:45 AM, Edward K. Ream <edre...@gmail.com> wrote:

>> Hashing to rescue! For file foo.py ver 3, we have a private version
>> 2368237aaee.txt.py. The name is the hash digest of contents of foo.py.

Grab a chair.

*this is the speedup cache I've been talking about*

It will not contain @thin-like nodes. It will contain the whole data
structure in a pickle [(gnx1, h1, b1, (gnx1.1, h1.1, b1.1)),
(gnx2...)]

This will allow us to avoid the slow sentinel scanning.

So, besides providing a structure for @auto nodes, it can be used to
speed up @thin nodes. Also, if we have that file available, no @auto
scanning needs to be done either: just grab the structure from the quickly
read pickle.
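
A sketch of the pickled structure cache, loosely following Ville's `[(gnx1, h1, b1, (gnx1.1, ...)), ...]` notation: each node is assumed to be a `(gnx, headline, body, children)` tuple, the cache is keyed by a SHA-1 digest of the public file, and a plain dict stands in for the on-disk cache. All of these are assumptions, as are the function names. Note that `pickle.loads` can execute arbitrary code, so loading cache files that other people commit carries real risk; a published cache might prefer a data-only format such as JSON.

```python
import hashlib
import pickle

# Hypothetical node layout: (gnx, headline, body, children), children being a list of nodes.


def cache_key(public_text):
    """Key the cache by a digest of the public file's contents (SHA-1 is an assumption)."""
    return hashlib.sha1(public_text.encode("utf-8")).hexdigest()


def flatten_bodies(tree):
    """Rebuild the public text by walking the outline in order (assumes bodies concatenate)."""
    return "".join(body + flatten_bodies(children) for _gnx, _h, body, children in tree)


def write_cache(cache, public_text, tree):
    """Store the whole outline structure for this exact version of the public file."""
    cache[cache_key(public_text)] = pickle.dumps(tree, protocol=pickle.HIGHEST_PROTOCOL)


def read_with_cache(cache, public_text):
    """Return the outline without sentinel scanning, or None on a cache miss."""
    data = cache.get(cache_key(public_text))
    if data is None:
        return None                       # miss: fall back to sentinel scanning or @auto
    tree = pickle.loads(data)             # only safe for trusted data
    return tree if flatten_bodies(tree) == public_text else None
```

On a hit, the whole outline comes back without scanning sentinels or running the @auto importer; on a miss, the slower paths remain available.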

So basically, this is a rehashed version of the hash speedup scheme,
with this twist:

- "cache" files actually contain valuable data, i.e. it's not just a
function of file contents.
- cache files are published.

There *are* a few holes in the scheme (garbage collection needs to be
thought of, @shadow preserves structure on external file change, etc.).
Just wanted to throw this out quickly.

Edward K. Ream

Mar 26, 2009, 6:28:10 AM

to leo-e...@googlegroups.com

On Wed, Mar 25, 2009 at 9:28 PM, Seth Johnson <seth.p....@gmail.com> wrote:

> On Wed, Mar 25, 2009 at 9:45 PM, Edward K. Ream <edre...@gmail.com> wrote:

>> I'm a bit shocked that you haven't found a glaring hole in the idea. I'm starting to confront how good this scheme might be. For example, it looks like @file! does not need to care whether the public or private files are committed to bzr. As another example, the docs will no longer have to describe the 9 (!) ways to create external files.

> This is a very serious boon. I try to track the discussions here, but I can't because everything everybody says is "this would be good for @thin, but @shadow wouldn't work," "Yeah, you're right," etc.

:-) To have finally solved this problem will free up energy and creativity for other matters. You can not possibly imagine how much thought I have given it. I no longer have to explain why not solving this problem doesn't matter ;-) This is *the* problem that prevented many people from using Leo.

> Whatever you call the @ directive, it seems your public vs. private files are really external vs. "Leo-specific".

Good point. The terminology must change for users. Internally (in the code) the private/public terminology was an important organizer, much clearer than the previous clumsy terminology (with and without sentinels). It's not so important to change the code-level terminology.

Edward

Edward K. Ream

Mar 26, 2009, 6:58:04 AM

to leo-e...@googlegroups.com

On Thu, Mar 26, 2009 at 1:20 AM, Ville M. Vainio <viva...@gmail.com> wrote:

> On Thu, Mar 26, 2009 at 3:45 AM, Edward K. Ream <edre...@gmail.com> wrote:

>>> Hashing to rescue! For file foo.py ver 3, we have a private version
>>> 2368237aaee.txt.py. The name is the hash digest of contents of foo.py.

> Grab a chair.

> *this is the speedup cache I've been talking about*

Very cool. I don't understand the details, but clearly it opens many possibilities.

> It will not contain @thin-like nodes. It will contain the whole data
> structure in a pickle [(gnx1, h1, b1, (gnx1.1, h1.1, b1.1)),
> (gnx2...)]

> This will allow us to avoid the slow sentinel scanning.

What you are saying, as I understand it, is that the hidden (shadow) files could have any format we want. They don't have to contain "traditional" sentinels. They wouldn't even actually have to be text files!

BTW, this has a bearing on the (misnamed) question of "what do we call @file! ?". The proper question is, instead, "will there still be a role for @thin?". My first thought was, "surely yes": @thin is the most elegant and safest way when collaboration is not a factor. All data is in a single file, so syncing issues never arise.

But now I am not so sure. If caching speeds up sentinel scanning, a strong case can be made for using @shadow for *all* files, even in situations where collaboration is not a factor at all.

Again, this is very cool. Using @file! for *all* situations looks like a big performance win.

> Just wanted to throw this out quickly.

I'm glad you did. I want to clarify some other questions on this thread before delving into your idea further, but this is a great idea that I want to pursue immediately.

Edward
