Verifying that @file! will work

Edward K. Ream

Mar 26, 2009, 12:27:17 PM
to leo-editor
QQQ
let me skip ahead to what might be called the "real" Aha, namely that
we can imagine three
(or is it four?) ways of using bzr.

0. Don't use bzr at all. There is no cooperation.
1. A fully Leonine environment, such as leo-editor. All committers
to the repository use Leo.
2. A fully non-Leonine environment. At most one committer uses Leo.
3. A mixed Leonine environment. More than one, but not all,
committers use Leo.
QQQ

As I see it, the primary task at present is to verify that @file! will
work in all four environments. Until this is completely verified, all
other issues, while exciting in their own right, could be called
"premature optimizations".

Clearly, cases 0 and 2 pose no problems. The bzr repository will use
so-called "public" files, that is, files without Leo sentinels.
Indeed, it would not be possible, looking at the bzr repository, to
determine that anyone is using Leo.

OTOH, for cases 1 and 3 it *will* be possible to infer that various
people are using Leo. Some or all of the source files will contain
Leo sentinels (or their equivalent, if Leo's sentinel caching scheme
is used).

The real purpose of this thread is to ask the following picky
questions:

Question 1: Exactly what will the bzr repository contain? and

Question 2: Exactly how will public and private files be put in sync?

At present, I don't have a clear answer for either question. Yes, I
know we have discussed these questions a bit, but I don't think we
have really thought things through yet.

BTW, I had a "little" Aha re cases 1 and 3, the two collaborative
cases. It seems to me that we should focus on the *hard* case, case
3, rather than the easy case, case 1. Indeed, leo-editor may remain
the only project for which case 1 applies.

Rather than "cheating" by using case 1 for leo-editor, I think
leo-editor should "eat its own dog food" and use the case 3 approach.
That is, both the public and private files should be committed to the
bzr repository for leo-editor. If leo-editor can't live with this
approach, it would be foolish to expect anyone else to do so.

The Aha is both a challenge and a simplification. It's a challenge
because it forces us to confront the hard issues. It's a
simplification because we no longer have to worry much about case 1.

This is just the start of the verification process. But I thought I
would emphasize the significant work that remains.

Edward

Edward K. Ream

Mar 27, 2009, 7:46:31 AM
to leo-editor
On Mar 26, 11:27 am, "Edward K. Ream" <edream...@gmail.com> wrote:

> The real purpose of this thread is to ask the following picky questions:

I am steadfastly refusing to get excited about @file! until all
questions are fully resolved. Hand waving will not do.

> Question 1: Exactly what will the bzr repository contain?

This is the easier question. Indeed, if we focus on the case 3 (some,
but not all people use Leo), the answer is clear: the repository will
contain the public files in their usual places, and the private files
in some inconspicuous place, where most people won't notice.

This is a significant change from the leo-editor repository. At
present, the files in leo/core and leo/plugins are @thin files
containing sentinels. In the new scheme, the files in leo/core and
leo/plugins will be public files without sentinels.

In this regard, the harder question to answer involves case 1
(everyone uses Leo), but we can ignore that case now that we treat it
as equivalent to case 3.

> Question 2: Exactly how will public and private files be put in sync?

This is a much harder question. I've been daydreaming about what the
real issues are. Some preliminary thoughts:

1. Each private file will contain timestamp and checksum information
about itself and the corresponding public file. By definition, private
files can contain anything without offending non-Leo users, so the sky
is the limit. It may also turn out to be useful to
have a single timestamp/checksum line at the start of *public* files.
However, this would be a measure of last resort, as non-Leo users
would likely complain loudly. OTOH, all source code control systems
have mechanisms for putting time stamps in files automatically, so
maybe this would not be too odious to non-Leo users.

We can use this information to detect out-of-sync conditions.
Probably, Leo will use the actual file modification dates (not the
cached dates) to determine whether to use the full-blown @shadow
update algorithm (the public file is newer) or just use the normal
read algorithm (the private file is newer or contemporaneous with the
public file).

2. Inspired by Ville's idea of a multi-pass algorithm to implement
sentinel padding, I see that there is no need to revise the
fundamental @shadow update algorithm to handle changed private-file
format. No matter what the private file's format morphs into, a
prepass could recreate the old (@thin) format from the new format.
Leo would then apply the @shadow update algorithm to the reconstituted
@thin-format text. In practice, it is rare to use the @shadow update
algorithm, so this should not be a performance concern.

Presumably this would be the simpler option. I'm not entirely ruling
out changes to the @shadow update algorithm, but it's not something I
would do lightly. In contrast, converting new-style private files to
old-style private files seems like a straightforward approach.
Furthermore, the unit tests involved would seem to be useful in
developing the new-style format in the first place.

3. It is still not clear how to simplify the task of committing files
in synch. At present, I think that ensuring synchronization when
files are committed would be very handy. In contrast, detecting and
dealing with out-of-synch files when pulling or merging appears to be
asking for trouble. From my experience, it is hopeless to ask any
user (including, especially, me) to sort out what to do when a problem
arises while reading a .leo file. The user is surprised, confused and
not in possession of enough data to make a reasonable choice.
Out-of-synch warning dialogs can *never* be a design option. If they are
required, the whole scheme fails. Period.

4. Happily, bzr is written in Python, so it is conceivable to hack bzr
to aid Leo updates. OTOH, this probably would not be tolerated in
non-Leo environments, and may not work so well in cvs and other non-Python
source code control systems. In any case, I am willing, at least for
now, to consider allowing bzr to help, if only as a thought
experiment.
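
The modification-time test from point 1 can be sketched roughly as follows. This is only an illustration, not Leo's actual code: the function name and the two strategy labels are made up for the example, and the only real call used is os.path.getmtime.

```python
import os


def choose_read_strategy(public_path, private_path):
    """Pick a read strategy from the files' actual modification times.

    Hypothetical helper: the names and return values are assumptions
    made for this sketch, not part of Leo.
    """
    public_mtime = os.path.getmtime(public_path)
    private_mtime = os.path.getmtime(private_path)
    if public_mtime > private_mtime:
        # The public file changed after the private file was written,
        # so the full @shadow update algorithm would be needed.
        return "shadow-update"
    # The private file is newer or contemporaneous: a normal read suffices.
    return "normal-read"
```

Note that this relies on file system timestamps, which a checkout or merge may reset, so it is at best a first approximation.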

As you can see, significant issues remain. This hardly surprises me.
The difficulty of synchronizing files reliably is *the* reason Leo
has always used sentinels.

Perhaps this will turn out to be an intractable problem, once again.
The difference this time, however, is that the problem is being
examined in a new context, namely that of bzr in particular and source
code control systems in general. And we must not forget the new tools
available to Leo: @shadow and @auto. Taken together, the new context
and new tools may make possible a reliable solution that was not ever
available before. We shall see. This is most definitely an open
question.

I am hoping for your comments and suggestions. This is an important
problem, and it is by no means totally solved at present.

Edward

Ville M. Vainio

Mar 27, 2009, 8:46:03 AM
to leo-e...@googlegroups.com
On Fri, Mar 27, 2009 at 1:46 PM, Edward K. Ream <edre...@gmail.com> wrote:

> This is a much harder question.  I've been daydreaming about what the
> real issues are. Some preliminary thoughts:
>
> 1. Each private file will contain timestamp and checksum information
> about itself and the corresponding public file.  By definition, the

Timestamp is very fragile, checksum is the safe & correct way to go.

> users, so the sky is the limit.  It may also turn out to be useful to
> have a single timestamp/checksum line at the start of *public* files.

The contents of the public file themselves give the checksum.

> We can use this information to detect out-of sync conditions.

Again, checksum is different => we are out of sync. It's that simple.

> not in possession of enough data to make a reasonable choice.  Out-of-
> synch warning dialogs can *never* be a design option.  If they are
> required, the whole scheme fails.  Period.

Being out of sync will just revert to @auto import.
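
A minimal sketch of that rule, assuming the private file records a SHA-1 of the public file's bytes. The function names, the return labels and the choice of SHA-1 are assumptions for illustration, not anything Leo does today:

```python
import hashlib


def file_checksum(path):
    """Return the SHA-1 hex digest of the file's raw bytes."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def choose_import(public_path, recorded_checksum):
    """Decide how to read a file pair.

    recorded_checksum is the (hypothetical) checksum of the public file
    that was stored in the private file when the two were last in sync.
    """
    if file_checksum(public_path) == recorded_checksum:
        # Public file unchanged since the private file was written.
        return "use-private"
    # Checksums differ: out of sync, so fall back to a plain import.
    return "auto-import"
```

No dialog is needed in either branch; the fallback is automatic.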

> 4. Happily, bzr is written in Python, so it is conceivable to hack bzr
> to aid Leo updates.  OTOH, this probably would not be tolerated in non-

No need to hack bzr, it's possible to create pre-commit hooks for
these things. Conceivably, a pre-commit hook would filter out
"garbage" files (unused private files).

> As you can see, significant issues remain.  This hardly surprises me.
> The difficulty of synchronizing files reliably is *the* reason* Leo
> has always used sentinels.

I think @thin should still be used for situations where everybody uses
Leo most of the time.

This is also a reason to proceed with @sentinelpadding - it's the
simplest thing that could possibly work.

--
Ville M. Vainio
http://tinyurl.com/vainio

Edward K. Ream

Mar 27, 2009, 10:06:14 AM
to leo-editor
On Mar 27, 6:46 am, "Edward K. Ream" <edream...@gmail.com> wrote:

> I am steadfastly refusing to get excited about @file! until all
> questions are fully resolved.  Hand waving will not do.

Standing in the shower a few minutes ago I saw the essence of the
situation.

I am now officially starting to allow myself to get excited :-)

The big Aha is that there is *no need* to keep public and private
files in synch. Indeed, in an environment in which not everyone uses
Leo, it is *not possible* to keep files in synch. But that doesn't
matter!

There are only two real cases to consider:

Case 1: A public file is committed, but not the corresponding private
file.

Case 2: A private file is committed, but not the corresponding public
file.

Let's consider what will happen if somebody does a bzr merge or pull
in each situation.

Case 1: The user will get the updated public file. If the user
doesn't use Leo, the user will still get the "useful" version of the
file. If the user *does* use Leo, the next time the user reloads the
relevant .leo file, Leo will use the full @shadow algorithm to update
the corresponding private file. Of course, the "real" private file
won't be available, but so what? At most some structure info will be
lost.

Case 2: The user will get the updated private file. If the user
doesn't use Leo, it will be as if nothing got committed. If the user
does use Leo, the @shadow logic will see that the private file is more
recent than the public file, so the @shadow logic will *ignore* the
old public file and use only the private file.

I believe that in each case, Leo will continue to work when the
"laggard" file is eventually committed. I suppose there are some
"round trip" proofs involved, so not all the i's have been dotted and
the t's crossed, but I don't anticipate problems.

In short, there is *no need* to keep public and private files in
synch. It's not possible anyway, because non-Leo users will *never*
commit private files.

I believe that this may indeed solve the problem that I have been
working on since approximately 1996.

Edward

Edward K. Ream

Mar 27, 2009, 10:19:57 AM
to leo-editor
On Mar 27, 9:06 am, "Edward K. Ream" <edream...@gmail.com> wrote:

> I believe that in each case, Leo will continue to work when the
> "laggard" file is eventually committed.  I suppose there are some
> "round trip" proofs involved, so not all the i's have been dotted and
> the t's crossed, but I don't anticipate problems.

Just to be clear, the question is whether a Leo user will get the same
result regardless of the order in which they pull the public or
private version of a particular file from bzr. This is similar to the
square "commutative diagrams" one sees throughout advanced abstract
algebra texts.

My intuition is that the results will always be the same. Not proven,
but we can stand the situation on its head by requiring that the
@shadow logic preserve this order-independence requirement. Again, I do not
foresee any major problems in this area.
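
The question of whether pull order matters can be made concrete with a toy model. Everything here is an assumption made for illustration: the Workspace class, the two pull functions and the idea of reducing a private file to "text plus outline" are stand-ins, and say nothing about how the real @shadow algorithm behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    public_text: str      # contents of the public file
    private_text: str     # text the private file would generate
    private_outline: str  # stand-in for structure held only in the private file


def pull_public(ws, new_text):
    """Only the public file arrives: a shadow-style update adopts the
    new text but keeps whatever outline the private file already had."""
    return Workspace(new_text, new_text, ws.private_outline)


def pull_private(ws, new_text, new_outline):
    """Only the private file arrives: it is newer, so it wins and the
    public file is regenerated from it."""
    return Workspace(new_text, new_text, new_outline)


def orders_agree(start, new_text, new_outline):
    """Check that both pull orders reach the same final workspace."""
    public_first = pull_private(
        pull_public(start, new_text), new_text, new_outline)
    private_first = pull_public(
        pull_private(start, new_text, new_outline), new_text)
    return public_first == private_first
```

In this model the two orders agree only because both files come from the same commit, so they carry the same text. It does not cover interleaved commits in which the public and private files disagree.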

Edward

P.S. This issue never arises for non-Leo users because they never use
private files.

P.P.S. I suppose it is just barely possible that some kind of
interweaving of commits from Leo and non-Leo users could cause
problems, but for now this does not concern me.

EKR