A new db-oriented vision for Leo?


Edward K. Ream

Dec 19, 2011, 11:59:12 AM
to leo-editor
This post is pure speculation: the vision may turn out to be a
mirage. It's also long. Feel free to ignore.

Otoh, the ideas here are completely new, and exist in a new design
space.

Last night I realized that a db-oriented Leo would run into
immediate problems: what to do about @file nodes and external files?

When I awoke this morning I saw a new direction: simpler in some
ways, but wide-ranging and potentially impossibly complex.

Influences
==========

Rereading the sqlite documentation primed the subconscious pump:

There was a note (somewhere) about simplicity being the foundation of
(I think) sqlite.

As you will see, fossil's sha1 keys are very important.

The various behind-the-scenes complexities of fossil also contributed
somehow.

Summary
=======

Let me try to give a big picture overview of my thoughts, before the
myriad complexities arise. The challenge will be to create simplicity
everywhere. The simplicity *must* be real: it must be on the order of
"webs are outlines in disguise".

Suppose *all* (or almost all) information is contained in a **Leonine
db**. The kind of db doesn't matter, except insofar as it supports
what is needed. I'll assume an sqlite db, as required by fossil,
possibly extended.

Suppose external files start with one or more sha1 keys, which are the
*only* sentinels in a file. For example, for .py files::

#@sha1: <sha1 key>

We can think of this line as a "universal link" into a database
created by a particular program.

Multiple programs or projects might want to add such links, so it
might be good to include a "statement of responsibility"::

#@sha1: <contributor A's url> <sha1 key>
#@sha1: <contributor B's url> <another sha1 key>

For Leo, the url would be https://launchpad.net/leo-editor, and the
link would be a link into the Leonine db.
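As a sketch of what reading and writing such a universal link might look like (only the `#@sha1:` form and the URL come from the examples above; the helper names and exact line grammar are my own invention):

```python
import hashlib
import re

LEO_URL = "https://launchpad.net/leo-editor"

def make_sentinel(payload: bytes, url: str = LEO_URL) -> str:
    """Build a '#@sha1:' sentinel line for the given payload."""
    key = hashlib.sha1(payload).hexdigest()
    return f"#@sha1: {url} {key}"

def parse_sentinel(line: str):
    """Return (url, key) if the line is a sha1 sentinel, else None."""
    m = re.match(r"#@sha1:\s+(\S+)\s+([0-9a-f]{40})\s*$", line)
    return (m.group(1), m.group(2)) if m else None

line = make_sentinel(b"amalgamated data for this file")
url, key = parse_sentinel(line)
```

A commit hook could then rewrite the first line of each external file with a fresh key, as suggested above.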

As explained below, we can use a single Leonine key even though
multiple Leonistas have edited the file.

As a kind of commit hook, we will probably want to update this key
when committing a change to the external file.

Using sha1 keys effectively
===========================

The world has not begun to appreciate how cool sha1 keys are. Any
program or project may generate them by the millions, with no fear of
conflict. Thus, we can consider sha1 keys to **be** any kind of data
structure we like!

Thus, the one and only (Leonine) key in each external file can
represent *all* data associated with the file, not just *any* data
associated with the file. That is, we can imagine "amalgamating" data
structures (dicts) that contain, for instance:

- The complete outline structure of the file.
- The bzr/fossil revision info,
- The (link to) the .leo file that contains the external file,
- The file paths of the .leo file and all external files,
- The list of all people who have contributed to the file, and the
revisions that each individually has made. (Think bzr blame).
- Whatever else could possibly be useful.

So the sha1 key in the external file refers to this amalgamating data
structure. Furthermore, the format of the amalgamating data
structures can change at will, with no fear of sha1 conflicts. Thus,
data structure formats are completely dynamic: there is no such thing
as being incompatible with old data!

If we use fossil, fossil will associate *other* sha1 keys with this
amalgamated info (and the constituents), but we don't care. We can
safely assume that these other sha1 keys will never conflict with our
own sha1 keys.

All this (and more) is part of the under-appreciated "magic" of sha1
keys.
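To make the idea concrete, here is one illustrative way an amalgamating structure could live in an sqlite table, retrieved by its sha1 key. The schema, helper names, and sample contents are all invented for illustration:

```python
import hashlib
import json
import sqlite3

# A single table mapping sha1 keys to JSON blobs (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amalgam (key TEXT PRIMARY KEY, data TEXT)")

def put(structure: dict) -> str:
    """Store an amalgamating structure; its sha1 becomes the universal key."""
    blob = json.dumps(structure, sort_keys=True)
    key = hashlib.sha1(blob.encode("utf-8")).hexdigest()
    conn.execute("INSERT OR REPLACE INTO amalgam VALUES (?, ?)", (key, blob))
    return key

def get(key: str) -> dict:
    """Look the structure up again by its sha1 key."""
    row = conn.execute(
        "SELECT data FROM amalgam WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0])

key = put({
    "outline": ["root", ["child 1", "child 2"]],  # outline structure
    "revision": "fossil:abc123",                  # VCS revision info
    "contributors": ["EKR"],                      # blame-style info
})
```

Because the blob is opaque to the key, its internal format can change at will, as the summary above claims.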

Everything disappears (temporarily)
===================================

If all (or almost all) data appears in a Leonine sqlite db we can say
the following:

- There is no need for a Leo cache.
- There is no need for private @shadow files.
- We can use the @shadow algorithm for *all* files, including @file,
@auto, etc.

That is, there is a single, universal, @file node.

Files seem to have disappeared, but they must be recreated somehow.
We can't avoid @path complications because we eventually have to
recreate external files *in the proper places*.

The challenges
==============

It's all very well to put all (or almost all) info in the Leonine db.
The challenges are:

1. To create amalgamating data structures that can be updated simply.
Complexity here will likely doom the project.

2. To coordinate *distributed* Leonine db's without rewriting fossil
or bzr or sqlite ;-)

3. To continue to use most, if not all, of Leo's core code without
change, except where the existing code is no longer needed.

Point 2 seems to be the biggest challenge. Let us consider a .leo
file as a form of *personal* view into the Leonine db. EKR's view
might be analogous to leoPyRef.leo, but my view should not be
"privileged" in any way: any user should be able to access the
**published view** of any other user. A published view will be (or
comprise) an amalgamating data structure.

I don't know fossil well enough to know how my ideas map onto fossil
constructs. I suspect, though, that some correspondence will be
possible. Otherwise, the project fails challenge 2.

Of course, fossil is not the only possible way, but as a practical
matter challenge 2 is inviolate: I don't plan to create "yet another
git" unless the payoffs (and the human help) are huge.

Your thoughts please, Amigos.

Edward

Seth Johnson

Dec 19, 2011, 1:14:40 PM
to leo-e...@googlegroups.com
On Mon, Dec 19, 2011 at 11:59 AM, Edward K. Ream <edre...@gmail.com> wrote:

<< SNIP >>

> Summary
> =======
>
> Let me try to give a big picture overview of my thoughts, before the
> myriad complexities arise.  The challenge will be to create simplicity
> everywhere.  The simplicity *must* be real: it must be on the order of
> "webs are outlines in disguise".


All formats are (outline) contexts with state, and with data elements
with scopes of relevance.


> Suppose *all* (or almost all) information is contained in a **Leonine
> db**.  The kind of the db doesn't matter, except insofar as it
> supports what is needed.  I'll assume sqlite db, as required by
> fossil, possibly extended.
>
> Suppose external files start with one or more sha1 keys, which are the
> *only* sentinels in a file.  For example, for .py files::
>
>    #@sha1: <sha1 key>
>
> We can think of this line as a "universal link" into a database
> created by a particular program.

<< SNIP >>

> The world has not begun to appreciate how cool sha1 keys are.  Any
> program or project may generate them by the millions, with no fear of
> conflict.  Thus, we can consider sha1 keys to **be** any kind of data
> structure we like!
>
> Thus, the one and only (Leonine) key in each external file can
> represent *all* data associated with the file, not just *any* data
> associated with the file.  That is, we can imagine "amalgamating" data
> structures (dicts) that contain, for instance:
>
> - The complete outline structure of the file.
> - The bzr/fossil revision info,
> - The (link to) the .leo file that contains the external file,
> - The file paths of the .leo file and all external files,
> - The list of all people who have contributed to the file, and the
> revisions that each individually has made. (Think bzr blame).
> - Whatever else could possibly be useful.
>
> So the sha1 key in the external file refers to this amalgamating data
> structure.  Furthermore, the format of the amalgamating data
> structures can change at will, with no fear of sha1 conflicts.  Thus,
> data structure formats are completely dynamic: there is no such thing
> as being incompatible with old data!


Just to put in your pipe -- but note my last comment at the bottom,
that this is mostly for thinking things through:


You can have one "Leonine" key that serves to point to your
amalgamating structure, or you could have a fixed number of keys in
the external file that designate the Leonine-amalgamating structure as
a "context" defined in terms of a uniform/universal data structure.

The above items are all types of uses of types of nodes (which I call
links). Each type of use combined with a type of link can have
whatever nodes (or attributes of nodes) are relevant to that context,
with "context" defined as a particular instance within a relationship
of a use type with a link type. So the external file can have a fixed
set of sha1 keys that correlate with that definition: a key for the
use type (such as a Leo file), another for link type (such as a Leo
node), another for particular use (LeoPyRef.leo), plus some more
similar keys to designate state (a few comments on that below).


<< SNIP >>


>  The challenges
> ============
>
> It's all very well to put all (or almost all) info in the Leonine db.
> The challenges are:
>
> 1. To create amalgamating data structures that can be updated simply.
> Complexity here will likely doom the project.
>
> 2. To coordinate *distributed* Leonine db's without rewriting fossil
> or bzr or sqlite ;-)
>
> 3. To continue to use most, if not all, of Leo's core code without
> change, except where the existing code is no longer needed.
>
> Point 2 seems to be the biggest challenge.  Let us consider a .leo
> file as a form of *personal* view into the Leonine db.  EKR's view
> might be analogous to leoPyRef.leo, but my view should not be
> "privileged" in any way: any user should be able to access the
> **published view** of any other user.  A published view will be (or
> comprise) an amalgamating data structure.


A uniform data structure would include a general representation of
state, and that can include individual "standpoints" -- yet that
standpoint can contain elements that others freely include in their
own contexts. The point of a standpoint is about generality (or
rather, making a state particular so you can work things out); it's
not about published or shared or not, which can be their own toggles
independently of state. State is about generality, not publishing or
shareability or collaborative modifiability. So I could make my
personal standpoint accessible or editable at large, but what I'm
trying to do is to muck around with it before I make its state more
general -- that is, intended to be pertinent to everybody in some
broader "space" or among some one or more "locations." So I just
designate it as my own standpoint until I'm ready to designate what
state it's designed to exist within. Naturally, any element
(data/text/node) within my contexts, regardless of the generality of
its state, can be placed into other contexts with different states.
What you do with those elements in my own contexts just happens
within them, their states, and their scopes of relevance.


> I don't know fossil well enough to know how my ideas map onto fossil
> constructs.  I suspect, though, that some correspondence will be
> possible.  Otherwise, the project fails challenge 2.
>
> Of course, fossil is not the only possible way, but as a practical
> matter challenge 2 is inviolate: I don't plan to create "yet another
> git" unless the payoffs (and the human help) are huge.


I haven't thought much at all about integration with revision control
systems, except to state that I wish those systems would be built
around a uniform/universal data structure. They would then be
automatically interoperable.

Given the complexities of contemplating development in that area
(every store has its own architecture, so whatever existing code you
might find would be tailored to those), this is probably where my
ideas will be hard to work with. But maybe my ideas can help clarify
or organize conceptions.


Seth

HansBKK

Dec 19, 2011, 9:58:00 PM
to leo-e...@googlegroups.com
On Monday, December 19, 2011 11:59:12 PM UTC+7, Edward K. Ream wrote:

>    This post is pure speculation: the vision may turn out to be a mirage.  It's also long.  Feel free to ignore.

>    Otoh, the ideas here are completely new, and exist in a new design space.

Reading this fired off so many synapses and generated so much dopamine that it was as enjoyable to me as sex or quality chocolate. And that without even understanding most of the implementation side, just imagining that you consider such things possible. . .

And you know already I hardly have a clue what I'm talking about here, completely unable to respond at the same level of abstraction and likely just flat out wrong, so of course feel free to ignore, but sometimes with blue-sky thinking ignorance can be an advantage 8-)


>    2. To coordinate *distributed* Leonine db's without rewriting fossil or bzr or sqlite ;-)

>    Point 2 seems to be the biggest challenge.

By "db" do you mean the underlying database engine (e.g. sqlite)? IOW, what needs to be distributed? Synchronizing multiple database servers would indeed be, IMO, overly complex.

To me, the fundamental unit that provides the necessary simplicity is the node. In an always-connected model the server acts as the centralized coordinating storage mechanism to avoid conflicts. That doesn't seem to be what we're after here; we want to allow data mods by disconnected users to be synchronized back to the "center", which in these modern distributed systems is just a concept, with implementation details determined by the SOP/workflow defined by a given work group.

To effectively leverage Fossil (or whatever) VCS as a data store without having to write your own, it seems to me that both the nodes and the "views" need to be stored out in the filesystem in a standardized way, most likely each node being a separate file. I could see "view files" as representing each tree defined by the "top-level nodes" - that would allow me to pull in or drop a given tree from my current "collection" as needed.

Obviously this is completely separate from the @ <file> mechanism, which is IMO just the export mechanism at a much higher level, after "assembly" by the "view" -- or the import mechanism when dealing with the non-Leo world.

So to my mind (admittedly limited), if we're really focused on the distributed side, why not just stick with the filesystem as the storage model -- isn't it the database that's complicating the picture vis-à-vis the VCS?

And if the goal is to be able to work with a given distributed VCS without customizing it, then why not set a goal of being able to work with any of them? Then the group can choose to work with whatever distributed VCS they like, and you don't have to rewrite big chunks of code to accommodate the shifting sands of that domain.

It also forces the simplicity mandate - if your new model of data storage is simply a collection of plain-text diff-able files in folders, at that level they all **are** automatically interoperable.



Edward K. Ream

Dec 20, 2011, 9:26:45 AM
to leo-e...@googlegroups.com
On Mon, Dec 19, 2011 at 9:58 PM, HansBKK <han...@gmail.com> wrote:

> Reading this fired off so many synapses, generated so much dopamine etc.,
> was as enjoyable to me as sex or quality chocolate. And that without even
> understanding most of the implementation side, just imagining that you
> consider such things possible. . .

:-) BTW, I recently had a big personal aha. I'm not a chocoholic:
I'm a sugar, salt and fat-aholic. It expands my range of desserts,
while letting me avoid "chocolates" that are, in fact, as salty as
chips. But I digress...

> And you know already I hardly have a clue what I'm talking about here,
> completely unable to respond at the same level of abstraction and likely
> just flat out wrong, so of course feel free to ignore, but sometimes with
> blue-sky thinking ignorance can be an advantage 8-)

There is no need to apologize. I'm always in the same boat when
significant invention is happening. The primary requirement, as I
have often said, is to be able to live with, tolerate, and even enjoy
massive confusion.

Edward

Edward K. Ream

Dec 20, 2011, 9:36:36 AM
to leo-editor
On Mon, Dec 19, 2011 at 11:59 AM, Edward K. Ream <edre...@gmail.com> wrote:
> If all (or almost all) data appears in a Leonine sqlite db we can say
> the following:
>
> - There is no need for a Leo cache.
> - There is no need for private @shadow files.
> - We can use the @shadow algorithm for *all* files, including @file,
> @auto, etc.

Wrong, wrong, wrong.

@shadow will *never* be the model for most files because it demotes
structure info to second-class status, namely, personal preference
grafted on to the "real" data. But the *essence* of Leo is that
structure is first-class data.

@shadow is fine for non-cooperative (private) environments. In that
case, the "preference" structure is, in fact, the only structure there
is. But in shared environments outline structure must be part of each
external file. Thus, sentinels are, in general, essential as well.

This must be the fourth or fifth time I have rediscovered this basic
principle. In the past, the emphasis has been mostly on sentinels,
but here we see that the underlying principle is that outline
structure must be first-class data in shared (cooperative,
distributed) environments. So this is progress of a sort.

As a direct consequence, any approach that abandons sentinels must be
rejected. I don't know, in detail, how this affects the current
discussion, but I think Seth and Hans have ideas that are compatible
with this principle.

Edward

Edward K. Ream

Dec 20, 2011, 9:41:03 AM
to leo-editor
On Mon, Dec 19, 2011 at 11:59 AM, Edward K. Ream <edre...@gmail.com> wrote:

> All this (and more) is part of the under-appreciated "magic" of sha1
> keys.

I suspect this whole line of thought is a dead end: there is no way,
in a cooperative environment, to separate outline structure from
external files, which is the actual intent of these keys. Thus, the
keys are an attempt to do the wrong thing.

You could say this is "comforting" because the idea of some kind of
gigantic amalgamation of data structures would likely have been a
truly bad design: way too complex to be feasible in practice.

Edward

P.S. While in confusion/invention mode, it is vital to be as clear as
possible about what one does, in fact, know clearly. So this is
progress, even if it seems negative in character.

EKR

Seth Johnson

Dec 20, 2011, 10:00:51 AM
to leo-e...@googlegroups.com
On Tue, Dec 20, 2011 at 9:41 AM, Edward K. Ream <edre...@gmail.com> wrote:
> On Mon, Dec 19, 2011 at 11:59 AM, Edward K. Ream <edre...@gmail.com> wrote:
>
>> All this (and more) is part of the under-appreciated "magic" of sha1
>> keys.
>
> I suspect this whole line of thought is a dead end: there is no way,
> in a cooperative environment, to separate outline structure from
> external files, which is the actual intent of these keys.  Thus, the
> keys are an attempt to do the wrong thing.


As long as nobody's changing the external files. If they aren't, the
cooperation can be done in the database(s), with external files just
written as the "current situation" of the database representation.

Now, if people recognize the database representation as universal for
any app, then they may readily accept that they shouldn't mess with
the flat file. To me, no separate flat file should be needed, as all
"files" should be universally interoperable, distributed and
outlineable contexts.


Seth

> You could say this is "comforting" because the idea of some kind of
> gigantic amalgamation of data structures would likely have been a
> truly bad design: way too complex to be feasible in practice.
>
> Edward
>
> P.S.  While in confusion/invention mode, it is vital to be as clear as
> possible about what one does, in fact, know clearly.  So this is
> progress, even if it seem negative in character.
>
> EKR
>

HansBKK

Dec 20, 2011, 10:19:54 AM
to leo-e...@googlegroups.com
On Tuesday, December 20, 2011 9:36:36 PM UTC+7, Edward K. Ream wrote:

@shadow will *never* be the model for most files because it demotes
structure info to second-class status, namely, personal preference
grafted on to the "real" data.  But the *essence* of Leo is that
structure is first-class data.

@shadow is fine for non-cooperative (private) environments.  In that
case, the "preference" structure is, in fact, the only structure there
is.  But in shared environments outline structure must be part of each
external file.  Thus, sentinels are, in general, essential as well.


On Tuesday, December 20, 2011 9:41:03 PM UTC+7, Edward K. Ream wrote:
there is no way, in a cooperative environment, to separate outline
structure from external files, which is the actual intent of these
keys. Thus, the keys are an attempt to do the wrong thing.

I'm most likely completely wrong, but it seems to me that the above corresponds to the current reality of Leo's storage implementation.

My understanding is as follows - please do let me know if/how I am wrong: given a single user who never looks at the external files generated, @shadow and @thin are functionally equivalent. The difference is how the "view" or outline information (and whatever other "meta" stuff is stored in sentinels) is stored: the former keeps it separate from the content, the latter embeds it.

This difference is the cause of @shadow's **current** problem of not being able to share that meta information between users, because right now the only way to share such data is via embedded sentinels.

But I thought one of the main points of your OP (switching over to the "imagined possible" reality) was the ability to store (almost?) everything related to the content, outside of the individual user's Leo file, in particular the "views" created by that user. Each node has a unique identifier (sha1 strings replacing gnx?) and these data are actually stored outside the Leo files. The input/output routines with the @ <file> files are no longer the "primary canonical" locations, but something like reports from a database, output from compiles - or imports being matched against Leo's nodes by a particular view.

The "view" (outline+) data is now an externally stored and therefore shareable data object, kept in the distributed VCS, right? In this model I don't understand the necessity for the data currently carried by sentinels to continue to be stored inside the @ files.


Kent Tenney

Dec 20, 2011, 10:55:33 AM
to leo-e...@googlegroups.com
FWIW, I have little interest in collaborative editing of Leo files; I
consider Leo files to be "views" on data. In my world the structure
is therefore second-class, a personalization of the first-class data.
This puts me 180 degrees from Edward, but that's the beauty of Leo.

I am extremely interested in a db backend because it offers the potential for
'time travel' to aid my wandering in the information wilderness. I want arrows
which take me back and forth in versions of a node.

My last push in this direction stalled when hooking node select and deselect
events seemed unreliable, but I am sure I'll return to it at some point.

Thanks,
Kent

Edward K. Ream

Dec 20, 2011, 11:48:35 AM
to leo-e...@googlegroups.com
On Tue, Dec 20, 2011 at 10:00 AM, Seth Johnson <seth.p....@gmail.com> wrote:

>> Thus, the keys are an attempt to do the wrong thing.

> As long as nobody's changing the external files.  If they aren't, the
> cooperation can be done in the database(s), with external files just
> written as the "current situation" of the database representation.

I don't think there is anything interesting in this direction. The
fundamental problem is keeping information in sync. By far the
simplest thing that could possibly work is to put sentinels in files.

> Now, if people recognize the database representation as universal for
> any app, then they may readily accept that they shouldn't mess with
> the flat file.  To me, no separate flat file should be needed, as all
> "files" should be universally interoperable, distributed and
> outlineable contexts.

The "everything in one file" approach seems to me to be a mirage. It
doesn't really add anything except a lot of complexity.

Having said that, we can imagine a "clone server", as has been
suggested, being implemented using a single sqlite file.

It seems to me that the real problem is to make sense out of BH's
(LeoUser's) old old suggestion to create a "sea of nodes". This
raises many other issues. For example, does a node include its child
links? (Imo, it probably should not include parent links, as those
links would depend on context).

At present, I don't see a good way forward. That's ok: I've got bugs to fix ;-)

Edward

Edward K. Ream

Dec 20, 2011, 11:54:24 AM
to leo-e...@googlegroups.com
On Tue, Dec 20, 2011 at 10:19 AM, HansBKK <han...@gmail.com> wrote:

> given a single user, and never looking at the external files generated
> @shadow and @thin are functionally equivalent.

@thin "carries" structural information in the actual external file.
@shadow carries structure in the private file. The difference is
subtle, but important.

When you do a bzr pull on a file containing sentinels you get the
*new* structure from the push. When you do a bzr pull on a public
@shadow file you get the new data but *not* any new structure: the
@shadow update algorithm "invents" a way to incorporate the new data
using the *old* structure. In general, this will be a lot less
convenient than using files with sentinels.

> But I thought one of the main points of your OP (switching over to the
> "imagined possible" reality) was the ability to store (almost?) everything
> related to the content, outside of the individual user's Leo file, in
> particular the "views" created by that user.

I am no longer sure what the original intention was ;-)

Edward

Seth Johnson

Dec 20, 2011, 12:14:42 PM
to leo-e...@googlegroups.com
On Tue, Dec 20, 2011 at 11:48 AM, Edward K. Ream <edre...@gmail.com> wrote:
> On Tue, Dec 20, 2011 at 10:00 AM, Seth Johnson <seth.p....@gmail.com> wrote:
>
>>> Thus, the keys are an attempt to do the wrong thing.
>
>> As long as nobody's changing the external files.  If they aren't, the
>> cooperation can be done in the database(s), with external files just
>> written as the "current situation" of the database representation.
>
> I don't thing there is anything interesting in this direction.  The
> fundamental problem is keeping information in synch.  By far the
> simplest thing that could possibly work is to put sentinels in files.
>
>> Now, if people recognize the database representation as universal for
>> any app, then they may readily accept that they shouldn't mess with
>> the flat file.  To me, no separate flat file should be needed, as all
>> "files" should be universally interoperable, distributed and
>> outlineable contexts.
>
> The "everything in one file" approach seems to me to be a mirage.  It
> doesn't really add anything except a lot of complexity.


For me, there don't need to be any files, just a lot of "nodes" and
their attributes -- but they are all brought together for particular
purposes using one uniform formal schema, so all that really needs to
be managed and transferred between contexts and physical servers,
while you're manipulating your "views," is metadata. When you break
it down this way, the attribute values are hosted at multiple
particular servers, from which they can be accessed as you navigate
through and read a full "file" (a "context") -- but when you
manipulate it, it's only the structure that's distributed, in a
standard uniform metadata format.

Seth

> Having said that, we can imagine a "clone server", as has been
> suggested, being implemented using a single sqlite file.


Which might be said to be similar to what I'm talking about, except I
provide for specifying contexts, states, and scopes of relevance for
nodes and their attribute values.


> It seems to me that the real problem is to make sense out of BH's
> (LeoUser's) old old suggestion to create a "sea of nodes".  This
> raises many other issues.  For example, does a node include its child
> links?  (Imo, it probably should not include parent links, as those
> links would depend on context).


In my architecture, nodes point up to their parents. You access all
the nodes for a context, and they come with pointers to their parents,
and maybe the server can traverse that and give them to you in outline
order, or you can do that locally once the server just provides the
relevant nodes.


> At present, I don't see a good way forward.  That's ok: I've got bugs to fix ;-)


:-)


Seth


> Edward

Kent Tenney

Dec 20, 2011, 12:58:24 PM
to leo-e...@googlegroups.com
>
> In my architecture, nodes point up to their parents.

How do you maintain sibling order?

I've been settling on node "addresses" in metadata: a list of child
indexes. It provides each node with a self-contained description of
where it lives, which I like. I haven't come across a more succinct
way to do that.
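Kent's address scheme can be sketched in a few lines; the nested-dict node shape ('h' for headline, 'children' for the child list) is invented for illustration:

```python
def resolve(root: dict, address: list) -> dict:
    """Follow a list of child indexes from the root down to a node."""
    node = root
    for i in address:
        node = node["children"][i]
    return node

# A tiny invented outline.
root = {"h": "root", "children": [
    {"h": "a", "children": []},
    {"h": "b", "children": [{"h": "b1", "children": []}]},
]}

assert resolve(root, []) == root           # the empty address is the root itself
assert resolve(root, [1, 0])["h"] == "b1"  # second child, then its first child
```

One design consequence: moving a subtree changes the addresses of every node under it, which is the usual trade-off of positional addressing.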

Seth Johnson

Dec 20, 2011, 1:24:47 PM
to leo-e...@googlegroups.com
On Tue, Dec 20, 2011 at 12:58 PM, Kent Tenney <kte...@gmail.com> wrote:
>>
>> In my architecture, nodes point up to their parents.
>
> How do you maintain sibling order?
>
> I've been settling on node "addresses" in metadata,
> it's a list of child indexes. It provides each node with a
> self contained description of where it lives, which I like.
> I haven't come across a more succinct way to do that.


A counter field that you add to an index on the parent key field.

So you search for the record that has no parent key -- that's the top.

Then you search for that record's key in the parent key index -- that
will have its children in sibling order.

And so forth.

I don't do SQL searches, which would probably make this slow, just a
local navigation algorithm on the index.
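A sketch of the (parent key, counter) scheme in sqlite. Seth walks the index with a local algorithm rather than issuing SQL queries; SQL is used here only to show the same ordering idea, and the schema and sample data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nodes (
        key     INTEGER PRIMARY KEY,
        parent  INTEGER,   -- NULL marks the top node
        counter INTEGER,   -- sibling position under that parent
        head    TEXT)
""")
# The (parent, counter) index is the point: scanning it for one parent
# key yields that node's children already in sibling order.
conn.execute("CREATE INDEX by_parent ON nodes (parent, counter)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", [
    (1, None, 0, "top"),
    (2, 1, 0, "first child"),
    (3, 1, 1, "second child"),
    (4, 2, 0, "grandchild"),
])

def children(parent_key):
    """Headlines of a node's children, in sibling order."""
    cur = conn.execute(
        "SELECT head FROM nodes WHERE parent IS ? ORDER BY counter",
        (parent_key,))
    return [row[0] for row in cur]

assert children(None) == ["top"]
assert children(1) == ["first child", "second child"]
```

`IS ?` rather than `= ?` lets the same query find the top node when None (NULL) is bound.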

You seem to have hit on something similar to my scheme with your self
contained descriptions of where nodes live. The records I'm speaking
of are just fields holding keys in that uniform structure I speak of,
so each record holds the context -- plus the particular "node" key,
which I call a link.

"Nodes" are actually also metadata -- you don't really get "data data"
until you seek attribute values.


Seth

Kent Tenney

Dec 20, 2011, 1:39:08 PM
to leo-e...@googlegroups.com
On Tue, Dec 20, 2011 at 12:24 PM, Seth Johnson <seth.p....@gmail.com> wrote:
> On Tue, Dec 20, 2011 at 12:58 PM, Kent Tenney <kte...@gmail.com> wrote:
>>>
>>> In my architecture, nodes point up to their parents.
>>
>> How do you maintain sibling order?
>>
>> I've been settling on node "addresses" in metadata,
>> it's a list of child indexes. It provides each node with a
>> self contained description of where it lives, which I like.
>> I haven't come across a more succinct way to do that.
>
>
> A counter field that you add to an index on the parent key field.
>
> So you search for the record that has no parent key -- that's the top.
>
> Then you search for that record's key in the parent key index -- that
> will have its children in sibling order.
>
> And so forth.
>
> I don't do SQL searches, which would probably make this slow, just a
> local navigation algorithm on the index.
>
> You seem to have hit on something similar to my scheme with your self
> contained descriptions of where nodes live.  The records I'm speaking
> of are just fields holding keys in that uniform structure I speak of,
> so each record holds the context -- plus the particular "node" key,
> which I call a link.
>
> "Nodes" are actually also metadata -- you don't really get "data data"
> until you seek attribute values.

It sounds like the moral equivalent of a linked list.

The "address" approach I've been leaning towards feels more
robust / useful, but I haven't implemented it, so I really
don't know. It seems like what you describe is closer to the
"node and edge" world view that graph software uses.

In the web framework world there is a similar dichotomy:
mapping urls to content vs. traversing url paths.

mdb

unread,
Dec 20, 2011, 1:48:12 PM12/20/11
to leo-editor
>> we can imagine a "clone server", as has been suggested, being implemented using a single sqlite file.

I look forward to seeing where this db vision leads. The idea in my
mind is to still use outlines to rearrange and organize data (program
code, documents, task lists) for views -- these could be internal
windows in Leo, external files, or data passed on to other programs.
The db serves as an efficient way to store, search for and import
data (often snippets of code and text).

So ... two points

1. Sentinels and flat files are not incompatible with this vision, and
their relationship with outlines does not need to change

The leo outline would serve as a flexible way to use the db data, and
reading a file with sentinels will re-create an outline, but
changing the db entries due to a change in an external file should
come from an explicit choice of the user (i.e., not automagical -- too
dangerous). Nonetheless, the db should store extra node info (name,
key, date, owner) that is written to sentinels in a JSON-like form,
and this can be manipulated, re-read and refreshed.
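As a rough illustration of carrying such node info in sentinels "JSON like": the @node-meta directive and the field names below are invented for this example, not real Leo sentinels.

```python
import json

def write_sentinel(name, key, date, owner, delim="#"):
    """Format a hypothetical sentinel line carrying node metadata as JSON."""
    meta = {"name": name, "key": key, "date": date, "owner": owner}
    return "%s@node-meta: %s" % (delim, json.dumps(meta, sort_keys=True))

def read_sentinel(line, delim="#"):
    """Parse the metadata back out of a sentinel line, or return None."""
    prefix = delim + "@node-meta: "
    if not line.startswith(prefix):
        return None
    return json.loads(line[len(prefix):])

line = write_sentinel("myNode", "abc123", "2011-12-20", "mdb")
meta = read_sentinel(line)
```

Round-tripping through a plain-text line like this is what would let the sentinel info "be manipulated and re-read & refreshed" outside Leo.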

2. Working collaboration will surely have many unforeseen problems

Thus a restricted one-user-only version of db storage seems a better
first step

My guess is that few leo users, including myself, currently make full
use of attributes by using or extending unknown attributes (uA's) and
this would change.

Seth Johnson

unread,
Dec 20, 2011, 2:31:10 PM12/20/11
to leo-e...@googlegroups.com


A linked list where every "node" has the context fully specified. And
the fact there's only one "tree structure value" per record (the
parent key) keeps things simple, with everything in a simple, flat
denormalized data structure that's as fast to work with as
conceivable, as well as massively scalable -- perfectly amenable for
NoSQL backends, for instance.

That flat structure lets you extend the basic concept of db relations
to what I call a context (so it has all the functions you need
regardless of specific context, rather than being simply joins of two
tables representing particular entities in the real world).


Seth

Seth Johnson

unread,
Dec 21, 2011, 7:36:39 PM12/21/11
to leo-e...@googlegroups.com
On Tue, Dec 20, 2011 at 2:31 PM, Seth Johnson <seth.p....@gmail.com> wrote:
> On Tue, Dec 20, 2011 at 1:39 PM, Kent Tenney <kte...@gmail.com> wrote:
>>
>> It sounds like the moral equivalent of a linked list.
>>
>> The "address" approach I've been leaning towards feels more
>> robust / useful, but I haven't implemented with it, so I really
>> don't know. It seems like what you describe is closer to the
>> "node and edge" world view that graph software uses.
>>
>> In the web framework world there is a similiar dichotomy,
>> mapping urls to content vs. traversing url paths.


Reading more carefully, I should say that if I understand the
distinction I think you're drawing, I think I'm doing both at the same
time: every "node" has the full address for its context with it. But
the outline structure is like a linked list, sort of.


Seth

Kent Tenney

unread,
Dec 21, 2011, 10:28:43 PM12/21/11
to leo-e...@googlegroups.com
On Wed, Dec 21, 2011 at 6:36 PM, Seth Johnson <seth.p....@gmail.com> wrote:
> On Tue, Dec 20, 2011 at 2:31 PM, Seth Johnson <seth.p....@gmail.com> wrote:
>> On Tue, Dec 20, 2011 at 1:39 PM, Kent Tenney <kte...@gmail.com> wrote:
>>>
>>> It sounds like the moral equivalent of a linked list.
>>>
>>> The "address" approach I've been leaning towards feels more
>>> robust / useful, but I haven't implemented with it, so I really
>>> don't know. It seems like what you describe is closer to the
>>> "node and edge" world view that graph software uses.
>>>
>>> In the web framework world there is a similiar dichotomy,
>>> mapping urls to content vs. traversing url paths.
>
>
> Reading more carefully, I should say that if I understand the
> distinction I think you're drawing, I think I'm doing both at the same
> time: every "node" has the full address for its context with it.

I'm reading your description as having 2 components,
- a graph of pointers to nodes
- the nodes referenced by the pointers

the graph defines addresses, the nodes are the data

Is this correct?

> But
> the outline structure is like a linked list, sort of.
>
>
> Seth
>
>
>> A linked list where every "node" has the context fully specified.  And
>> the fact there's only one "tree structure value" per record (the
>> parent key) keeps things simple, with everything in a simple, flat
>> denormalized data structure that's as fast to work with as
>> conceivable, as well as massively scalable -- perfectly amenable for
>> NoSQL backends, for instance.
>>
>> That flat structure lets you extend the basic concept of db relations
>> to what I call a context (so it has all the functions you need
>> regardless of specific context, rather than being simply joins of two
>> tables representing particular entities in the real world).
>>
>>
>> Seth
>

Seth Johnson

unread,
Dec 21, 2011, 11:29:59 PM12/21/11
to leo-e...@googlegroups.com
On Wed, Dec 21, 2011 at 10:28 PM, Kent Tenney <kte...@gmail.com> wrote:
> On Wed, Dec 21, 2011 at 6:36 PM, Seth Johnson <seth.p....@gmail.com> wrote:
>>
>> Reading more carefully, I should say that if I understand the
>> distinction I think you're drawing, I think I'm doing both at the same
>> time: every "node" has the full address for its context with it.
>
> I'm reading your description as having 2 components,
> - a graph of pointers to nodes
> - the nodes referenced by the pointers
>
> the graph defines addresses, the nodes are the data
>
> Is this correct?


The pointers are the address -- or rather, a set of pointers for
standard components makes up an "address" (one pointer for each
component), and together they designate how each node and its relevant
data are being used. The "nodes" (conceived as points in an outline)
are also pointers. But you can at least see an outline at the level of
"nodes."

Let's set aside data for now -- they are stored granularly, per
attribute, and they have a fuller "address" specification.

For now, just looking at "nodes": You have a table, with columns for:

State: that's three columns:
Space
Location
Standpoint

Context: that's three columns:
Use Type
Link Type
Use

The above are all "pointers" -- they are unique key values, which here
also include URLs.

And then, a column for what we can call a "node":
Link

That's also a "pointer" with a key value like the rest, but for now
let's just pretend it's a useful text field like a Leo headline.

You can call the Space and Context keys the "address" of the "node"
which is in the Link column.

There can be many Links, and each of those Links is another record in
this table, with all the same columns. What brings those Links
together into one "outline" is their having common key values in the
State and Context columns.


After that, you have the tree structure fields:
Parent Link
Counter/Sibling Order

By your terminology you could call the Parent Link field a "pointer"
to "nodes." Maybe the totality of "nodes" and "pointers" in that
sense is your "graph."

But the "node" -- the Link -- has a fully specified address in the
State and Context columns. It *also* has tree structure pointers in
the Parent Link column.
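A minimal sketch of that flat, denormalized table, with one column per State and Context component as described above; the sample values and the sqlite3 backend are just for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE links (
    space TEXT, location TEXT, standpoint TEXT,   -- State
    use_type TEXT, link_type TEXT, use TEXT,      -- Context
    link TEXT,                                    -- the "node"
    parent_link TEXT,                             -- tree structure
    sibling_order INTEGER)""")
state = ("s1", "loc1", "sp1")
context = ("outline", "child", "demo")
rows = [
    state + context + ("root", None, 0),
    state + context + ("child A", "root", 0),
    state + context + ("child B", "root", 1),
]
con.executemany("INSERT INTO links VALUES (?,?,?,?,?,?,?,?,?)", rows)

# What brings Links together into one "outline" is their common
# key values in the State and Context columns.
outline = con.execute(
    """SELECT link, parent_link FROM links
       WHERE space=? AND location=? AND standpoint=?
         AND use_type=? AND link_type=? AND use=?
       ORDER BY sibling_order""", state + context).fetchall()
```

Every record is self-contained: its full address travels with it, and tree structure is just the one parent_link column plus the counter.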


Seth

Kent Tenney

unread,
Dec 22, 2011, 1:54:47 PM12/22/11
to leo-e...@googlegroups.com
OK, we're talking apples and oranges here.

My approach is pretty simple-minded, aimed at versioning nodes;
your schema is much richer -- over my head.

I just want a snapshot which saves node state to a dict / json object.
It's also intended to be somewhat generic; the Leo-specific stuff
is in the ``other`` dict.

================================================
def node2dict(self, node=None, string_address=True):
    """Return a dictionary of some node attributes.

    Communicate outside the Leo world.
    """
    if node is None:
        node = self.c.currentPosition()

    address = self.get_address(node)
    if string_address is True:
        # convert ints to strings
        address = self.address2string(address)

    # some items are To Be Determined
    return {
        'timestamp': self.gnx2timestamp(node.gnx),
        'type': "TBD",
        # probably a dict of hashes, e.g.
        # {'name': headhash, 'content': bodyhash, 'location': UNLhash, ...}
        'hash': "TBD",
        'uri': self.fname,
        'other': {
            'headline': node.h,
            'body': node.b,
            'address': address,
            'gnx': node.gnx,
            'key': node.key(),
            'level': node.level(),
            'uA': node.v.u,
        }
    }
================================================

An example of what I call address, this UNL:
UNL: /home/ktenney/work/leotools.leo#leotools.leo-->Sandbox-->using @data nodes-->@settings-->@string VENV_HOME = /home/ktenney/venv

has address:
address: 0-5-0-0-1

BTW, this "address" syntax can be seen in p.key()
key: 150749356:0.151500012:5.151500300:0.151500492:0.151500844:1

Seth Johnson

unread,
Dec 22, 2011, 2:16:51 PM12/22/11
to leo-e...@googlegroups.com
Okay, I think I have a sense of where you're situated. What do those
5 key components of the address represent? Or is it just arbitrarily
a 5-part key value, one large key that happens to be made up of 5
"meaningless key" parts?

I get that you are grabbing the state, and the json objects that
result will then be stored in something larger that keeps versions. I
feel I can't comment intelligently unless I get the nature of that
address you're using.


Seth

Kent Tenney

unread,
Dec 22, 2011, 3:30:15 PM12/22/11
to leo-e...@googlegroups.com
On Thu, Dec 22, 2011 at 1:16 PM, Seth Johnson <seth.p....@gmail.com> wrote:
> Okay, I think I have a sense of where you're situated.  What do those
> 5 key components of the address represent?

They are a set of child indexes
0-5-0-0-1

root
 \
  0
  1
  2
  3
  4
  5
   \
    0    my address is 0-5-0
     \
      0
       \
        0
        1    my address is 0-5-0-0-1

make sense?

The Leo file can be reconstructed from these addresses:
address 0 is the first root node
address 1 is the second root node
address 0-4 is the fifth child of the first root node
etc.
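A small sketch of how these child-index addresses behave; since a parent's address is a prefix of its descendants' addresses, simply sorting the parsed chains recovers outline (preorder) order:

```python
def parse_address(s):
    """'0-5-0-0-1' -> [0, 5, 0, 0, 1], a chain of child indexes."""
    return [int(part) for part in s.split("-")]

def address_string(chain):
    return "-".join(str(i) for i in chain)

# Addresses in arbitrary order; sorting the int chains gives preorder,
# because Python compares lists element-wise with shorter prefixes first.
addresses = ["1", "0-5-0-0-1", "0", "0-5", "0-5-0", "0-5-0-0"]
outline = sorted(parse_address(a) for a in addresses)
# Outline level falls out of the chain length: 1 component = root level.
levels = [len(a) - 1 for a in outline]
```

This is the sense in which "the Leo file can be reconstructed from these addresses": the addresses alone determine both order and nesting.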

Seth Johnson

unread,
Dec 22, 2011, 5:43:32 PM12/22/11
to leo-e...@googlegroups.com
On Thu, Dec 22, 2011 at 3:30 PM, Kent Tenney <kte...@gmail.com> wrote:
> On Thu, Dec 22, 2011 at 1:16 PM, Seth Johnson <seth.p....@gmail.com> wrote:
>> Okay, I think I have a sense of where you're situated.  What do those
>> 5 key components of the address represent?
>
> They are a set of child indexes
> 0-5-0-0-1
>
> root
> \
>  0
>  1
>  2
>  3
>  4
>  5
>  \
>  0 my address is 0-5-0
>   \
>    0
>     \
>      0
>      1  my address is 0-5-0-0-1
>
> make sense?
>
> The Leo file can be reconstructed from these addresses:
> address 0 is the first root node
> address 1 is the second root node
> address 0-4 is the fifth child of the first root node
> etc.


Got it. And I gather that with p.key(), this is a Leo feature, where
these chains of child indexes are stored with each node.

It wouldn't be particularly hard to add a Parent attribute, holding
the gnx of the Parent -- but why you would bother, just to solve your
problem, is what you would wonder; and I don't understand everything
that's handled with these chain addresses.

This probably relates to the earlier question of how external files
relate to the db backend. And I imagine that's caught up in how Leo
does that, perhaps using these address chains. I would guess they
would help track what changes have occurred between versions -- in
particular, among external files in a collaborative context. Just
guessing there.

Whereas my concept is about stuff only going on in the database, and
if need be updating external files with the current status of the
database representation.

My approach is about generalizing how to map data/content to URLs (or,
a standard *set* of URL types defining a standardized, generalized
notion of a context) for data that's cloned, distributed and
manipulated in shared outlines. I store tree structure in a minimal,
flat way rather than store arbitrarily complex structures, and then
let tree traversal be done programmatically.

However, assuming that backend, one can certainly add attributes like
"address" holding chain addresses like you use, if those kinds of
addresses are the optimal way to do whatever you're using them for.
Not that that would necessarily be of interest to you in solving your
problem.

When I say address, I'm not saying "where in the tree," but something
more like "exactly how a node or data is being used," so the backend
server can manage the data, keeping it "in scope". That's the kind of
"clone server" it is.


Seth

Eoin

unread,
Dec 23, 2011, 5:27:06 PM12/23/11
to leo-editor

I've been following this discussion with some interest. A couple of
things spring to mind now:
1. This seems to be the first statement of a problem:
"outline structure must be first-class data in shared
(cooperative, distributed) environments." Maybe it's been said earlier
and I've missed it though.
2. Rich Hickey's talks/videos on state, value and identity (with a nod
to AN Whitehead) chime. Not sure if the specifics apply but the
approach is interesting.
I'd like to have a slightly more informed input to this. Edward, what
can you recommend by way of reading/coding for the current Leo data
structures?

Best regards,
Eoin

On Dec 20, 2:36 pm, "Edward K. Ream" <edream...@gmail.com> wrote:

Seth Johnson

unread,
Dec 23, 2011, 7:16:31 PM12/23/11
to leo-e...@googlegroups.com
On Fri, Dec 23, 2011 at 5:27 PM, Eoin <eoinmc...@fastmail.fm> wrote:
>
> I've been following this discussion with some interest. A couple of
> things spring to mind now:
> 1. This seems to be the first statement of a problem:
> "outline structure must be first-class data in shared
> (cooperative, distributed) environments." Maybe it's been said earlier
> and I've missed it though.


I created a procedural language that was essentially XBASE-plus, and
part of it was the ability to assign context and state (indeed, any
component of these notions) to variables. This was how you could
navigate. You could log onto a Uniform Context Transfer Protocol
server and type commands like:

GET CONTEXT

(and it would return the current use type, link type, use, etc.)

or:

x = STATE
y = CONTEXT

GET x
GET y

Or to navigate in a context, you could:

SET CONTEXT TO y
GO TOP LINK
do while not eoc()   (end of context)
    PRINT LINK
    SKIP LINK
enddo

or you can create the context at runtime:
SET USE TYPE TO someusetypeilike
SET LINK TYPE TO somelinktypeilike
GO TOP USE
z = CONTEXT

None of Rich Hackey's persistent objects, separation of value from
identity, and transaction or state transition management. But it is a
model of distributed state, so by formalizing that, it is at least one
way to start thinking about the things he does with Clojure within a
set definition of distributed state.


Seth

Seth Johnson

unread,
Dec 23, 2011, 7:18:18 PM12/23/11
to leo-e...@googlegroups.com
That's Rich *Hickey*, not *Hackey* :-)

HansBKK

unread,
Dec 23, 2011, 10:29:20 PM12/23/11
to leo-e...@googlegroups.com
I realize my thinking is on a completely lower-level channel from the thread's current direction, and much of the below is a rehash of my previous postings, so do skip it if you're busy or truly at a dead-end with this line of enquiry.

On Saturday, December 24, 2011 5:27:06 AM UTC+7, Eoin wrote:

>    "outline structure must be first-class data in shared (cooperative, distributed) environments."


On Tuesday, December 20, 2011 11:54:24 PM UTC+7, Edward K. Ream wrote:

I am no longer sure what the original intention was ;-)


Obviously outline structure is already first-class data in any context. My interest in this thread could be phrased as (perhaps similarly to what the above was intended to convey?):

    Outline structure should in future be as easily shared as external @ <file> content is today among the members of a distributed work group.


AFAIK @shadow files are the only current mechanism that isn't internal to the .leo file nor implemented as in-content sentinels, which is why from my user perspective I see them as **an analog** to a way forward - don't let your greater knowledge of their implementation get in the way. Their key limitation in this context is that they are "private", which (again AFAIK) means tied to both the local .leo and the local working copy it's interacting with. I say it that way, rather than "per user", because obviously one **can** share these outlines with another local user, but only serially, not concurrently.

I'm sure the solution is much more complex than I know, but I envision //something like// @shadow files storing outline information on a "per tree" basis - from the top-level node down, which I could somehow "publish", designate as shareable. Rather than storing such "tree objects" (?) in a relatively opaque, always-needs-to-be-connected database, or having Leo code deal with the complexities of n-way replication/sync'ing these data, I would think storing them in simple diff-able text files would allow the workgroup to (continue to) use bazaar/mercurial/git or (as originally imagined here) Fossil, Leo's implementation of the functionality remaining independent of that choice.

I believe such a path would also require the nodes themselves (and perhaps other meta-data?) to be stored externally from the .leo file, independently of the @ <file>s it's interacting with, and of course my simplistic POV comes up again with diff-able text files to be managed externally by whatever distributed VCS. The question comes up as to what remains in the .leo file itself, an answer to which of course I have no clue other than the opinion "only that which shouldn't be shared".

My own limited understanding of the issues likely makes this a completely off-target approach to a solution, especially since the goal I'm positing may not even be what is under discussion here.

And since I'm probably celebrating Christmas here earlier than most of the others, let me take the opportunity to wish everyone here a Happy and Merry one around the world, may next year be a better one than this one was for all. . .

Seth Johnson

unread,
Dec 24, 2011, 10:30:02 AM12/24/11
to leo-e...@googlegroups.com
On Thu, Dec 22, 2011 at 3:30 PM, Kent Tenney <kte...@gmail.com> wrote:
> On Thu, Dec 22, 2011 at 1:16 PM, Seth Johnson <seth.p....@gmail.com> wrote:
>> Okay, I think I have a sense of where you're situated. What do those
>> 5 key components of the address represent?
>
> They are a set of child indexes
> 0-5-0-0-1


Are you actually doing state transition management with persistent
objects like what Eoin pointed to re Rich Hickey? Hickey stores the
"diffs" between instances of unique "values" under one "identity" (his
special way of thinking of variables/data structures) over time, as
tree chains like this, holding just the part of the structure that has
changed. This lets him treat "values" as "the whole structure at a
moment of time," which is a useful concept in a concurrent execution
environment, rather than using traditional data structures whose
individual pieces of data could be changed independently by different
processes. Rather than copying the whole structure, he virtualizes
distinct value instances by pointing at "diff" chains like yours for
the part that has changed, plus a pointer to the rest of the original
structure that hasn't changed.
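A toy sketch of that structural sharing, assuming nested tuples as the "values": an "update" copies only the path down to the change, and the untouched subtree of the new value is literally the same object as in the old one.

```python
def assoc(tree, path, value):
    """Return a new nested-tuple tree with the leaf at `path` replaced.
    Subtrees off the path are shared with the original, not copied."""
    if not path:
        return value
    i = path[0]
    return tree[:i] + (assoc(tree[i], path[1:], value),) + tree[i + 1:]

v1 = (("a", "b"), ("c", "d"))
v2 = assoc(v1, (1, 0), "C")   # a new "value"; v1 is untouched
shared = v2[0] is v1[0]       # the unchanged subtree is the same object
```

Each "value" remains a whole structure at a moment in time, yet only the changed path costs any memory -- the idea behind Hickey's persistent data structures, reduced to a few lines.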

In any case, key chains like you use could be stored like any other
tree in my architecture. That could become an implementation of
unique values under identity a la Hickey, I guess.

I made my system open to diverse blocking approaches -- I'm trying to
remember, but mostly all I recall clearly is that you request
"occasions" from the authoritative host servers of the state in which
you're working -- and I didn't design it as a way to hold outlines as
snapshots in time, as part of an approach to concurrent execution in a
particular way like Hickey does. (My focus tends to be more on
generality than "containment.")

Seth

Kent Tenney

unread,
Dec 24, 2011, 4:30:19 PM12/24/11
to leo-e...@googlegroups.com
Wow, I feel like the Rabbi in the following joke:

You are talking way above my head, I'll study this thread and
try to come up with something to contribute. In the mean time, enjoy.

http://www.awordinyoureye.com/jokes83rdset.html

(#1705) The Pope and the Rabbi [Author unknown] Several centuries ago, the Pope
decreed that all the Jews had to convert or leave Italy. There was a huge outcry
from the Jewish community, so the Pope offered a deal. He would have a religious
debate with the leader of the Jewish community. If the Jews won, they could stay
in Italy, if the Pope won, they would have to leave. The Jewish people met and
picked an aged but wise Rabbi Moshe, to represent them in the debate. However,
as Moshe spoke no Italian and the Pope spoke no Yiddish, they all agreed that it
would be a "silent" debate. On the chosen day, the Pope and Rabbi Moshe sat
opposite each other for a full minute before the Pope raised his hand and showed
three fingers. Rabbi Moshe looked back and raised one finger. Next the Pope
waved his finger around his head. Rabbi Moshe pointed to the ground where he
sat. The Pope then brought out a communion wafer and a chalice of wine. Rabbi
Moshe pulled out an apple. With that, the Pope stood up and declared that he was
beaten, that Rabbi Moshe was too clever and that the Jews could stay. Later,
the Cardinals met with the Pope, asking what had happened. The Pope said, "First
I held up three fingers to represent the Trinity. He responded by holding up one
finger to remind me that there is still only one God common to both our beliefs.
Then, I waved my finger to show him that God was all around us. He responded by
pointing to the ground to show that God was also right here with us. I pulled
out the wine and wafer to show that God absolves us of all our sins. He pulled
out an apple to remind me of the original sin. He had me beaten and I could not
continue." Meanwhile the Jewish community were gathered around Rabbi Moshe. "How
did you win the debate?" they asked. "I haven't a clue," said Moshe. "First he
said to me that we had three days to get out of Italy, so I said to him, ‘up
yours!’ Then he tells me that the whole country would be cleared of Jews and I
said to him, we're staying right here." "And then what," asked a woman. "Who
knows?" said Moshe, "He took out his lunch so I took out mine."

Seth Johnson

unread,
Dec 24, 2011, 4:44:09 PM12/24/11
to leo-e...@googlegroups.com
I like that joke! :-)

Edward K. Ream

unread,
Dec 26, 2011, 7:44:52 AM12/26/11
to leo-e...@googlegroups.com
On Sat, Dec 24, 2011 at 4:30 PM, Kent Tenney <kte...@gmail.com> wrote:
> Wow, I feel like the Rabbi in the following joke:

Yes. Good joke. BTW, I often feel clueless in these discussions
myself. Furthermore, I often forget that we've *had* these
discussions.

I think that's ok. Leo is not going to change in any "big" way unless
the way forward is so simple and compelling that it will be impossible
to forget: like "webs are outlines in disguise." So far, nothing
remotely that simple has appeared.

Edward

Alia K

unread,
Dec 26, 2011, 8:19:45 AM12/26/11
to leo-editor
Edward wrote:
> This post is pure speculation: the vision may turn out to be a
> mirage.  It's also long.  Feel free to ignore.
>
> Otoh, the ideas here are completely new, and exist in a new design
> space.
>
> Last night I realized that the a db-oriented Leo would run into
> immediate problems: what to do about @file nodes and external files?

Edward, thank you for opening up this subject.

If I may contribute to this new design space, I've been following this
discussion with interest: the idea of storing the whole leo outline in
an sqlite (or any other SQL) database is very intriguing. Just being
allowed to sql query one's leo nodes would be amazing (-:

So, just speculating out loud, let's assume the following:

- the db maintains leo tree state: node content, structure and
versions thereof which are saved by default (until an explicit 'flush'
or 'shrink' command does away with prior or unneeded versions)

- all external files are just rendered views of a particular
(versioned) state of the leo db. i.e. filesystem objects are generated
on demand.

- the leo ui gives the user the option to save, view and edit versions
of leo nodes (and their head + body data)

- the db fulfils the part of @shadow sentinel files

If we are using sqlite, we would therefore get one file (a leo project
file) which carries within it (if unshrunk) a versioned history of all
the changes made to all nodes in terms of content and structure.

If we are using a networked RDBMS (e.g mysql, postgres, sqlserver,
oracle, etc.), we get multi-user leo projects for free (because each
change to the leo node structure is saved on the server and references
a particular user.)

This means that the sql data model for leo projects should be able to
capture all changes by multiple users to all aspects of the project
data and structure.

Assuming this is possible, from a solution perspective, I suggest that
the sqlalchemy option is looked at closely. It offers an excellent
level of abstraction from databases and has superb documentation.

Thinking about the datamodel, I realized that I had considered a very
simplified (for my sake) notion of a leoNode in my experimental
leofunc concept back in 2009.

For the purposes of this discussion, I have converted the old version
into a sqlalchemy based one, and it seems to work ok with the
following caveats:
- attributes are not persisted
- the versioning is very basically implemented (leo nodes have a
version, head elements have a version, body elements have a version --
that is all...)
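For readers following along, the storage model behind that basic versioning can be sketched with stdlib sqlite3 (Alia's actual code uses SQLAlchemy classes; the table and function names here are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# One row per saved version of a node, keyed by (gnx, version).
con.execute("""CREATE TABLE node_versions (
    gnx TEXT, version INTEGER, head TEXT, body TEXT,
    PRIMARY KEY (gnx, version))""")

def save(gnx, head, body):
    """Append a new version of the node; return its version number."""
    cur = con.execute(
        "SELECT COALESCE(MAX(version), 0) FROM node_versions WHERE gnx=?",
        (gnx,))
    next_version = cur.fetchone()[0] + 1
    con.execute("INSERT INTO node_versions VALUES (?,?,?,?)",
                (gnx, next_version, head, body))
    return next_version

def latest(gnx):
    """Return (head, body) of the most recent version."""
    return con.execute(
        """SELECT head, body FROM node_versions
           WHERE gnx=? ORDER BY version DESC LIMIT 1""", (gnx,)).fetchone()

save("node.1", "headline", "first draft")
save("node.1", "headline", "second draft")
```

A 'flush' or 'shrink' command, as suggested earlier in the thread, would just delete all but the newest row per gnx.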

You can ignore the leofunc part which is probably dated as it
attempted to conceptualize leo directives as python decorators
operating on nodes and their constituent elements. I have not been
following leo development lately, but I believe the idea is perhaps to
some extent implemented through features such as @cl definitions
(http://webpages.charter.net/edreamleo/IPythonBridge.html#cl-definitions)
and the @g.command decorator
(http://webpages.charter.net/edreamleo/scripting.html#the-g-command-decorator).

The relevant part for this discussion is in a file called leonode.py
which shows how the Head, Body and Node classes are implemented in
terms of sqlalchemy, which in turn facilitates node data and structure
storage in sqlite.

Feel free to download the code from github: https://github.com/aliakhouri/leofunc
or git clone g...@github.com:aliakhouri/leofunc.git

Hope this helps the discussion,

Cheers,

Alia

Seth Johnson

unread,
Dec 26, 2011, 11:19:39 AM12/26/11
to leo-e...@googlegroups.com


One very simple thing that can be done very easily would be to just
store the Leo data as it is, with no thought of distribution or
collaboration "within the database implementation" -- then you just
store .leo files in the database, produce the external files as you
currently do, and collaborate with the external files the way you do
now. That would create a database backend that could be extended
gradually. As long as it is done in a way that's basically "the same
as" a .leo file, any more fundamental reengineering for distribution
and collaboration would be no more complex than converting from the
model of the Leo file would be in the first place. And in the
meantime, people might bang on the backend in interesting ways while
keeping its compatibility with the Leo app and its file format. If
people show ways of doing distribution and collaboration that way, you
can ponder those without worrying about impact on standard Leo.

Seth

Seth Johnson

unread,
Dec 26, 2011, 11:26:48 AM12/26/11
to leo-e...@googlegroups.com


Distribution and collaboration *and versioning* -- I forget!

It might be good to start without versioning and all the state stuff,
just do without any added features, and then look at different ways to
do the versioning based on having a "classic Leo" database
implementation in place. Apparently versioning is a key rationale for
going to the database, but maybe you can move forward by just setting
a gold standard for classic Leo first.

> Seth

Kent Tenney

unread,
Dec 26, 2011, 12:15:21 PM12/26/11
to leo-e...@googlegroups.com
The entry point I envision is this:
The gui shows 3 buttons:
<=, ||, =>

If the node which currently has focus is unchanged, only the double
bars are active; clicking that button puts the current node in the
repository.

If the node is edited, the double bars and the left arrow become active:
clicking <= reverts the node and makes the right arrow active.

|| puts current version into the repository

rinse and repeat.

The backend would be some sort of db, a versioning system would make
it pretty simple. There could be any number of ways to define the node state.
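A sketch of the button logic, assuming each node keeps a list of committed snapshots; the class and method names are invented for the example:

```python
class NodeHistory:
    """Back/commit/forward over saved versions of one node's state."""

    def __init__(self):
        self.versions = []   # committed snapshots, oldest first
        self.pos = -1        # index of the version currently shown

    def commit(self, state):          # the "||" button
        del self.versions[self.pos + 1:]   # committing discards redo tail
        self.versions.append(state)
        self.pos = len(self.versions) - 1

    def back(self):                   # the "<=" button
        if self.pos > 0:
            self.pos -= 1
        return self.versions[self.pos]

    def forward(self):                # the "=>" button
        if self.pos < len(self.versions) - 1:
            self.pos += 1
        return self.versions[self.pos]

h = NodeHistory()
h.commit("v1")
h.commit("v2")
reverted = h.back()      # back to "v1"; the right arrow is now active
restored = h.forward()   # forward again to "v2"
```

What goes into each snapshot (headline, body, uA's, a hash) is exactly the "node state" question left open above.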

I don't see @auto files as exceptional, other than the gnx changing.

In my workflow, an @auto node is either a
- class declaration
- method definition
- var declaration
- chunk of doc

In each case, I'd like to have access to versions of the node.

If this were implemented, I think experience would dictate what metadata
was important/useful to store for a node.

And, my interest is not in Leo files which multiple people edit at
once; that's for source code, and a solved problem. I consider my Leo
files a reflection of my personal proclivities in viewing data.

Thanks,
Kent
-

Seth Johnson

unread,
Dec 26, 2011, 1:24:06 PM12/26/11
to leo-e...@googlegroups.com
In this approach you could continue to do collaboration using the
external files, while just storing in the db while you go about
editing the file with node-level granularity, including going through
versions.

But for those who are concerned with collaborating, the minimalist
first implementation would mean either that everyone uses @shadow
files to reconcile their work outside the database, or that everyone
uses a separate vcs at the same time.

That is, let multiple users store their own instances of the same
complete Leo file in the database, regardless of their lack of
consistency -- and then have a separate commit process using external
files by someone with the appropriate level of rights, who would store
an additional *authoritative* instance in the database. Those working
with the database would then have to access that version.

As far as I understand external files, I think this means the
authoritative version would be created either using @shadow files, or
a separate vcs. Either all collaborators would have to provide
@shadow files and *not* use another vcs; or all users of the database
would be required to provide their external files to a separate vcs.

This would be clunky, but 1) only clunky for people who want to do
this node-level versioning; and 2) it would be one person doing the
authoritative "reconciliation."


Seth

Seth Johnson

unread,
Dec 26, 2011, 1:31:21 PM12/26/11
to leo-e...@googlegroups.com
On Mon, Dec 26, 2011 at 1:24 PM, Seth Johnson <seth.p....@gmail.com> wrote:
> In this approach you could continue to do collaboration using the
> external files, while just storing in the db while you go about
> editing the file with node-level granularity, including going through
> versions.
>
> But for those who are concerned with collaborating, the minimalist
> first implementation would either mean everyone uses @shadow files to
> reconcile their work outside the database, or use a separate vcs at
> the same time.
>
> That is, let multiple users store their own instances of the same
> complete Leo file in the database, regardless of their lack of
> consistency -- and then have a separate commit process using external
> files by someone with the appropriate level of rights, who would store
> an additional *authoritative* instance in the database.  Those working
> with the database would then have to access that version.


Oh: and I guess this means you would lose all the versions. Version
history isn't shared in this approach.

Hmm, probably: Never mind . . . :-)


Seth

Seth Johnson

unread,
Dec 27, 2011, 12:53:06 AM12/27/11
to leo-e...@googlegroups.com
I read a bit about SQLAlchemy tonight, and it does seem promising. It
makes working with data as objects easy, and seems to promise to do
the versioning. While I'd start by just implementing a basic save of
Leo files in database form, without regard for collaboration or
versioning, this seems to be a good package to try out as preparation
for having those things handled by the database.

Seth

Ville M. Vainio

unread,
Dec 27, 2011, 1:12:24 AM12/27/11
to leo-e...@googlegroups.com

Take a look at my earlier code that does this, in collab branch. Also mentioned in this thread.

The SQLAlchemy dependency is unnecessary, since the database is simple enough.

Seth Johnson

unread,
Dec 27, 2011, 4:31:44 AM12/27/11
to leo-e...@googlegroups.com
On Tue, Dec 27, 2011 at 1:12 AM, Ville M. Vainio <viva...@gmail.com> wrote:
> Take a look at my earlier code that does this, in collab branch. Also
> mentioned in this thread.

Okay, that'll be my next thing. I scanned the thread, and I surmise
you're talking about a "clone server."

Going back to that point in the thread, my next comment would be that
the way to do a "sea of nodes" is exactly with the denormalized flat
table with context (and state) fully specified per "node."

That would also make it possible to store tree structure very simply,
exactly as I have described it, with pointers going up to parents. It
would help, to understand that scheme, to point out that the "nodes"
aren't encapsulated object thingies, but actually what I call them:
links. What one might tend to think of as "contained" in "nodes" --
attributes -- are actually stored separately, in a somewhat similar
flat file with key columns providing a full specification of state,
context, and now link. The "sea of nodes" is stored in the first flat
file. The values we often think of as "contained" in those nodes, and
which may be cloned across multiple instances of the same node in the
sea, are stored as distinct attribute values in the second flat file
I've just mentioned. Every piece of the architecture has a full
specification of its context, state, and for link attributes, the link
("node") and scope of relevance of each attribute, stored with it.

That's the next peek into what I'm talking about. I think if you're
talking about the clone server, it will be interesting to contemplate
how that might be adapted to my highly generalized schema.


Seth

Terry Brown

unread,
Dec 27, 2011, 10:15:14 AM12/27/11
to leo-e...@googlegroups.com
On Tue, 27 Dec 2011 00:53:06 -0500
Seth Johnson <seth.p....@gmail.com> wrote:

> I read a bit of SQLAlchemy tonight, and it does seem promising. Makes
> working with data as objects easy, and seems to promise to do the
> versioning.

Ville's schema is worth looking at. Also the Django ORM is neat,
although SQLAlchemy might make a regular python classes based framework
easier, not sure.

I think there are many places where Leo does something similar to

for n in c.unique_nodes():
    blah blah

which might be an issue for DB based outlines if the goal was to have
massive outlines - Leo doesn't do much lazy evaluation.
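(A lazy, generator-based version of that kind of loop over a db-backed outline might look something like this -- just a sketch, with a made-up one-table schema:)

```python
import itertools
import sqlite3

def unique_nodes(con):
    """Yield node ids one at a time from the db.

    The sqlite cursor is itself a lazy iterator, so nothing is
    materialized up front; only consumed rows are fetched."""
    cur = con.cursor()
    for (node_id,) in cur.execute("SELECT id FROM nodes ORDER BY id"):
        yield node_id

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, headline TEXT)")
con.executemany("INSERT INTO nodes VALUES (?,?)",
                [(i, "node %d" % i) for i in range(1000)])

# A massive-outline traversal can stop early without paying for the rest.
first_ten = list(itertools.islice(unique_nodes(con), 10))
print(first_ten)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```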

Even for outlines only as large as the Leo code base, for example, and
even with a DB on the local machine, I'd expect some performance hit,
although RAM disk buffering can help a lot, of course.

Cheers -Terry

Seth Johnson

unread,
Dec 27, 2011, 10:36:29 AM12/27/11
to leo-e...@googlegroups.com
On Tue, Dec 27, 2011 at 10:15 AM, Terry Brown <terry_...@yahoo.com> wrote:
> On Tue, 27 Dec 2011 00:53:06 -0500
> Seth Johnson <seth.p....@gmail.com> wrote:
>
>> I read a bit of SQLAlchemy tonight, and it does seem promising.  Makes
>> working with data as objects easy, and seems to promise to do the
>> versioning.
>
> Ville's schema is worth looking at.  Also the Django ORM is neat,
> although SQLAlchemy might make a regular python classes based framework
> easier, not sure.


I will, somewhat soon . . .


> I think there are many places where Leo does something similar to
>
> for n in c.unique_nodes():
>    blah blah
>
> which might be an issue for DB based outlines if the goal was to have
> massive outlines - Leo doesn't do much lazy evaluation.
>
> Even for outlines only as large as the Leo code base for example I
> and even with a DB on the local machine I'd expect some performance
> hit, although RAM disk buffering can help a lot of course.


My approach would scale globally very easily, but yes, traversal might
get intensive in some contexts. I think optimizing that could be done
locally and/or in the interface rather than the backend -- no reason
why you couldn't just do the massive query, then build a list of child
nodes pointer structure locally on that "cursor set."

In the ancient world, the good designs supported bottomless
structures by just moving a window through them, swapping pieces into
memory and onto the display. Plus, shockingly enough, my system
actually allows the return of record-oriented db approaches -- it
solves the problem of how to work with arbitrarily complex relational
structure across networks (the basic technical issue that really
killed dBASE and settled us on SQL) -- you can navigate through
"tables" and "relations" record by record, with all the facility with
which one used to do it using dBASE on the old 8-bit office desktop.
The horror!: database development for the masses again, with a
BASIC-like language, now all over the net! :-)


Seth

> Cheers -Terry

Seth Johnson

unread,
Dec 27, 2011, 10:46:50 AM12/27/11
to leo-e...@googlegroups.com
On Tue, Dec 27, 2011 at 10:36 AM, Seth Johnson <seth.p....@gmail.com> wrote:
> On Tue, Dec 27, 2011 at 10:15 AM, Terry Brown <terry_...@yahoo.com> wrote:
>> On Tue, 27 Dec 2011 00:53:06 -0500
>> Seth Johnson <seth.p....@gmail.com> wrote:
>>
>>> I read a bit of SQLAlchemy tonight, and it does seem promising.  Makes
>>> working with data as objects easy, and seems to promise to do the
>>> versioning.
>>
>> Ville's schema is worth looking at.  Also the Django ORM is neat,
>> although SQLAlchemy might make a regular python classes based framework
>> easier, not sure.
>
>
> I will, somewhat soon . . .
>
>
>> I think there are many places where Leo does something similar to
>>
>> for n in c.unique_nodes():
>>    blah blah
>>
>> which might be an issue for DB based outlines if the goal was to have
>> massive outlines - Leo doesn't do much lazy evaluation.
>>
>> Even for outlines only as large as the Leo code base for example I
>> and even with a DB on the local machine I'd expect some performance
>> hit, although RAM disk buffering can help a lot of course.
>
>
> My approach would scale globally very easily, but yes, traversal might
> get intensive in some contexts.  I think optimizing that could be done
> locally and/or in the interface rather than the backend -- no reason
> why you couldn't just do the massive query, then build a list of child
> nodes pointer structure locally on that "cursor set."


And note: using an index on the parent key seems to be a very
effective way to do the traversal -- but I haven't tried it on massive
outlines.
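For example (a toy sketch, with made-up table and column names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, parent INTEGER)")

# The index on the parent column turns "children of N" into an
# index seek instead of a full table scan.
con.execute("CREATE INDEX idx_parent ON links(parent)")
con.executemany("INSERT INTO links VALUES (?,?)",
                [(1, None), (2, 1), (3, 1), (4, 2)])

children = [r[0] for r in con.execute(
    "SELECT id FROM links WHERE parent = ? ORDER BY id", (1,))]
print(children)  # [2, 3]
```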


Seth

Seth Johnson

unread,
Dec 27, 2011, 11:11:24 AM12/27/11
to leo-e...@googlegroups.com
On Tue, Dec 27, 2011 at 10:46 AM, Seth Johnson <seth.p....@gmail.com> wrote:
>>
>> My approach would scale globally very easily, but yes, traversal might
>> get intensive in some contexts.  I think optimizing that could be done
>> locally and/or in the interface rather than the backend -- no reason
>> why you couldn't just do the massive query, then build a list of child
>> nodes pointer structure locally on that "cursor set."
>
>
> And note: using an index on the parent key seems to be a very
> effective way to do the traversal -- but I haven't tried it on massive
> outlines.


And using parent pointers and an index, there's no reason why you
can't "window through" a massive tree, so the massive query isn't
even necessary. Even starting at an arbitrary node in the tree, you
can quickly trace up to the top, getting the path from there to the
branch you're on, check whether your node is the first child of its
immediate parent, get any other coordinate child nodes if necessary
-- and then just traverse down x rows from there, just enough to fill
a local memory buffer. Rinse and repeat when you page down. Easy.
Paging up/going backwards would be trickier, but the algorithm for
limiting how far backward you have to go -- just the minimum
necessary to get the previous page's worth -- is a pretty simple one
in my mind.
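A sketch of the "trace up to the top" part (untested, with a made-up schema; one indexed lookup per level, no massive query):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, parent INTEGER)")
con.executemany("INSERT INTO links VALUES (?,?)",
                [(1, None), (2, 1), (3, 2), (4, 3)])

def path_to_top(con, node_id):
    """Follow parent pointers up from an arbitrary node to the top.

    Each step is a single indexed lookup, so this stays cheap even
    deep inside a massive tree."""
    path = [node_id]
    while True:
        row = con.execute("SELECT parent FROM links WHERE id = ?",
                          (path[-1],)).fetchone()
        if row is None or row[0] is None:
            return list(reversed(path))
        path.append(row[0])

print(path_to_top(con, 4))  # [1, 2, 3, 4]
```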

Seth Johnson

unread,
Dec 28, 2011, 12:21:16 AM12/28/11
to leo-e...@googlegroups.com, viva...@mail.com
On Tue, Dec 27, 2011 at 1:12 AM, Ville M. Vainio <viva...@gmail.com> wrote:
> Take a look at my earlier code that does this, in collab branch. Also
> mentioned in this thread.


I don't know how and where to get this collab branch. All I found was
this: https://code.launchpad.net/~leo-editor-team/leo-editor/at-shadow-collaboration


Seth

Ville M. Vainio

unread,
Dec 28, 2011, 2:51:42 AM12/28/11
to leo-e...@googlegroups.com, viva...@mail.com

Sorry, contrib branch.

https://code.launchpad.net/~leo-editor-team/leo-editor/contrib

Especially:

http://bazaar.launchpad.net/~leo-editor-team/leo-editor/contrib/files/head:/Projects/leoq/

http://bazaar.launchpad.net/~leo-editor-team/leo-editor/contrib/files/head:/Projects/leoqviewer/

(latter is "reimplementation" of Leo for mobile phones, in C++, QML
and sqlite, so the first link is probably easier to swallow ;-).

The code to dump outline to sql database (and create the database) is here:

http://bazaar.launchpad.net/~leo-editor-team/leo-editor/contrib/view/head:/Projects/leoq/create_leoq.py

You can put it in a node and press Ctrl+B.

Seth Johnson

unread,
Dec 28, 2011, 12:17:44 PM12/28/11
to leo-e...@googlegroups.com
I guess you're not limiting it to trees? Leo doesn't have cycles,
right? If you're going beyond outlining as such, is that really
necessary for Leo?

Are you saying this code stores everything that's needed in a
production Leo file?

And if anybody has time, I'd like to hear why edges are stored
separately (as entities of their own, stored in a separate physical
memory structure) -- either in Leo or in general. Answering for Leo
specifically would be easier. Is it, in the general sense of what
graph databases are for, basically to cover two-way-ness and
many-to-many situations? If so, I don't see outlines as many-to-many.
One could say that's the point of trees.

If you've got topmost nodes, and that makes sense in the sense that
there are no cycles, you're not modeling a general graph "sea of
nodes" idea. If you're trying to extend the possibilities to
something like that, that's not really Leo (as I understand it).
Maybe you just introduce limits or constraints in various ways.
Two-way-ness as a technique for details of implementation (optimizing
algorithms) might make sense, but many-to-many-ness isn't what you're
modeling when you model outlines.

I wish someone could tell me the functional role that edges play in
Leo code, how Leo is implemented using them. Since as far as I can
tell, they're just pointers, I suspect edges are really only important
with reference to details of a visual interface.

If edges are important, and if I'm right that that has to do with
two-way-ness (or many-to-many-ness, though that seems like it would be
apparent that it's not relevant), then I suspect those details can and
will be sloughed away in the code as well as the architecture. I
don't see a need to have edges except in the sense of pointer/key
values stored in the data architecture.

All of that said, I suppose I could just be wrong and clueless about
what will be gained by the node-and-edges graph model.

I see a node as just a way of having data brought together into a
position in a tree. It will have a set of values associated with it
that you want to organize in groups of other similar nodes under
parent nodes that make some sense out of the groupings, acting as a
heading.


Seth

Ville M. Vainio

unread,
Dec 28, 2011, 5:56:31 PM12/28/11
to leo-e...@googlegroups.com
My scheme indeed represents a generic graph. While Leo is a limited
graph (a DAG), there is no reason to reflect that limitation in the
data structure -- unless there is proof that an alternative is
faster.

One advantage of this is that the same sql file could be used by a
program that expects to deal with arbitrary graphs (e.g. something
that emits graphviz source notation for plotting).

In Leo in-memory representation, "children" is just a list of pointers
to children, but this is not possible in SQL.

My schema is not a full representation yet; it lacks uA's, because my
C++/QML/Javascript implementation doesn't do anything with them.
Adding uA's is trivial: I'd probably add a new table UA that links a
nodeid to a string key and a string value (or a BLOBS entry for large
binary uA data). If we decide to add this feature to Leo proper,
we'll add the uA dumping too.
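Something like this, roughly (just a sketch; the table and column names are not final):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, headline TEXT)")

# A UA table: arbitrary string key/value pairs hung off a node id.
con.execute("""CREATE TABLE ua (
    node_id INTEGER,
    key     TEXT,
    value   TEXT
)""")
con.execute("INSERT INTO nodes VALUES (1, 'a node')")
con.execute("INSERT INTO ua VALUES (1, 'icon', 'star.png')")

# Reading a node's uA's back is one keyed query.
uas = dict(con.execute("SELECT key, value FROM ua WHERE node_id = ?", (1,)))
print(uas)  # {'icon': 'star.png'}
```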

Seth Johnson

unread,
Dec 28, 2011, 7:33:25 PM12/28/11
to leo-e...@googlegroups.com
On Wed, Dec 28, 2011 at 5:56 PM, Ville M. Vainio <viva...@gmail.com> wrote:
> My scheme indeed represents a generic graph. While Leo is a limited
> graph (DAG), there is no reason to reflect that limitiation in the
> data structure - unless there is a proof that an alternative is
> faster.


As long as you keep it a generic graph, what's going to be done with
it by users or developers may become more complex than may be optimal
according to some criterion or other. As a developer, you may find
that you will keep it generic (as in a generic graph) until you decide
what you're going to do with it. Generic in the sense of representing
any arbitrary diagram is different from generic in the sense of, say
"all file formats are outlines." ( :-) ) So you may either optimize
in light of function, or keep it general in service of the prospect of
people doing arbitrary kinds of graph analysis. If you manage by
force of discipline to keep it a generic graph despite the functions
you're developing, then that may in itself be the source of a tradeoff
against various criteria, including speed.

I would stress the simplicity and interoperability that comes from a
generalized formal specification -- it creates a structure that both
supports any kind of outline context (or file format) and provides
terms that let you work with all "files" with common understanding and
interoperability that comes from having such a generalized formal
schema.

That it is all in flat denormalized files should tell you a lot about
both speed and scalability, but I am not situated to offer any kind of
quantification for that.

> One advantage of this is that the same sql file could be used by a
> program that expect to deal with arbitrary graphs (e.g. something that
> emits graphviz source notation for plotting).
>
> In Leo in-memory representation, "children" is just a list of pointers
> to children, but this is not possible in SQL.


. . . well, not possible except in the sense that you can get a set of
records all pointing to the same parent.


> My schema is not full representation yet, it lacks uA's - because my
> C++/QML/Javascript implementation doesn't do anything with them.
> Adding uA's is trivial, I'd probably add a new table UA that links
> nodeid to a string key and a string value (or BLOBS entry for large
> binary uA data). If we decide to add this feature to leo proper, we'll
> add the uA dumping too.


Anybody can add an attribute at any time to my schema, and the schema
doesn't then get any more complex in physical form by the supposed
need to collect sets of relevant attributes into a separate unit (a
table, or a "kind" of node such as a Leo node or some Leo user's
custom type of node) representing some entity to which they are
relevant. All attributes and their values for all contexts and
"nodes" go together in the same flat file, regardless of what context
they're being used in. They have scopes of relevance -- and those are
broader than just particular entities they relate to -- but they are
physically all the same. You can store as complex a logical structure
as you like, but the physical structure remains the same. One
standard set of files with one generic structure, that holds all
formats and all relations between entities. No compounding complexity
in the physical structure ever, though users can go as crazy as they
please logically.

Universal interoperability for free. All file formats are outline
contexts! :-)

Ville M. Vainio

unread,
Dec 29, 2011, 2:20:22 AM12/29/11
to leo-e...@googlegroups.com
On Thu, Dec 29, 2011 at 2:33 AM, Seth Johnson <seth.p....@gmail.com> wrote:

> On Wed, Dec 28, 2011 at 5:56 PM, Ville M. Vainio <viva...@gmail.com> wrote:
>> My scheme indeed represents a generic graph. While Leo is a limited
>> graph (DAG), there is no reason to reflect that limitiation in the
>> data structure - unless there is a proof that an alternative is
>> faster.
>
>
> As long as you keep it a generic graph, what's going to be done with
> it by users or developers may become more complex than may be optimal
> according to some criterion or other.  As a developer, you may find

Just like the .leo format, normal developers will never see the sql
schema. You can read in this sql in one swoop, yielding the exact same
memory structure we have now.

> That it is all in flat denormalized files should tell you a lot about
> both speed and scalability, but I am not situated to offer any kind of
> quantification for that.

Flat files are sort of off-topic to SQL discussion, I guess...

>> In Leo in-memory representation, "children" is just a list of pointers
>> to children, but this is not possible in SQL.
>
>
> . . . well, not possible except in the sense that you can get a set of
> records all pointing to the same parent.

Pointing at parent doesn't work in Leo because of clones. This is why
Leo outline is more of a DAG than a simple tree, a node can have
multiple parents if it's cloned somewhere in the outline.

> Anybody can add an attribute at any time to my schema, and the schema

Yup, adding uA's to my schema would allow arbitrary key-value pairs
as well. So far, I chose to make the hard "physical" structure a
hard-coded part of the schema; I have a hunch it's faster than hiding
the info behind key-value data. Especially the "edges" table is very
fast to scan even without an index, because it's all integer data
that can be slurped into memory (possibly fitting in the cpu cache)
and scanned in linear fashion.
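Roughly like this (a toy sketch of the idea, not the actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (a INTEGER, b INTEGER, childindex INTEGER)")
con.executemany("INSERT INTO edges VALUES (?,?,?)",
                [(1, 2, 0), (1, 3, 1), (3, 2, 0)])  # node 2 cloned under 1 and 3

# Slurp the whole (all-integer) table into memory once, then answer
# structure queries by scanning the list linearly -- cheap even
# without an index.
edges = con.execute("SELECT a, b, childindex FROM edges").fetchall()

def children(n):
    """Children of n, in sibling order, from one linear scan."""
    return [b for (a, b, i) in sorted(edges, key=lambda e: e[2]) if a == n]

print(children(1))  # [2, 3]
```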

Seth Johnson

unread,
Dec 29, 2011, 4:43:06 AM12/29/11
to leo-e...@googlegroups.com
On Thu, Dec 29, 2011 at 2:20 AM, Ville M. Vainio <viva...@gmail.com> wrote:
> On Thu, Dec 29, 2011 at 2:33 AM, Seth Johnson <seth.p....@gmail.com> wrote:
>
>> On Wed, Dec 28, 2011 at 5:56 PM, Ville M. Vainio <viva...@gmail.com> wrote:
>>> My scheme indeed represents a generic graph. While Leo is a limited
>>> graph (DAG), there is no reason to reflect that limitiation in the
>>> data structure - unless there is a proof that an alternative is
>>> faster.
>>
>>
>> As long as you keep it a generic graph, what's going to be done with
>> it by users or developers may become more complex than may be optimal
>> according to some criterion or other.  As a developer, you may find
>
> Just like the .leo format, normal developers will never see the sql
> schema. You can read in this sql in one swoop, yielding the exact same
> memory structure we have now.


Okay, it sounds like regardless of the functions you're supporting,
you're planning to keep it a generic graph. My point is that keeping
it a generic graph that way may also bring tradeoffs.


>> That it is all in flat denormalized files should tell you a lot about
>> both speed and scalability, but I am not situated to offer any kind of
>> quantification for that.
>
> Flat files are sort of off-topic to SQL discussion, I guess...


No reason you can't query a flat file; there are just no joins
involved, is all. I should probably have said flat table. It's
basically a fact table, as in star schemas: when you do joins, you
just don't go more than one level out from the center, and it's very
fast and flexible for queries. The flat fact table just holds the
central keys, and the state and context for each "node" or attribute
value.


>>> In Leo in-memory representation, "children" is just a list of pointers
>>> to children, but this is not possible in SQL.
>>
>>
>> . . . well, not possible except in the sense that you can get a set of
>> records all pointing to the same parent.
>
> Pointing at parent doesn't work in Leo because of clones. This is why
> Leo outline is more of a DAG than a simple tree, a node can have
> multiple parents if it's cloned somewhere in the outline.


That it's directed doesn't mean you can't do clones with parent
pointers -- except that within the model of a graph scheme one tends
to think of nodes like objects, rather than modeling "nodes" as
metadata for the values associated with them. (And the convention is
to point toward children, I assume.) There's no reason you can't have
the same key value designating the same "node" in more than one place
in the outline, pointing up to a different parent in each case; the
values associated with that key value aren't "contained" in it like a
node -- they're just *relevant* everywhere that "node" is used.


>> Anybody can add an attribute at any time to my schema, and the schema
>
> Yup, adding uA's to my schema would allow arbitrary key-value pairs as
> well. So far, I chose to make the hard "physical" structure a hard
> coded part of the schema, I have a hunch it's faster than hiding the
> info behind key-value data. Esp. the "edges" table is very fast to
> scan even without index, because it's all integer data that can be
> slurped in to memory (possibly fitting in cpu cache), and scanned in
> linear fashion.


Moving from the particular physical structure you're using is
something you might do in order to virtualize it in service of
implementing useful generic functions that would be available to any
kind of outline. That would involve pointers. You may never want to
do that sort of thing, just do different kinds of operations on a
graph scheme. But watch for when you start adding levels of
indirection (in the form of pointers/key values) in the service of
generality. That's the impetus I've generalized into a universal data
architecture.

Graph schemes have their own kind of simplicity. In databases you
denormalize in order to eliminate the complexity of relational
structures, and that means create a wide, flat fact table with key
value columns for every entity you want to work with.


Seth

Ville M. Vainio

unread,
Dec 29, 2011, 5:13:07 AM12/29/11
to leo-e...@googlegroups.com
On Thu, Dec 29, 2011 at 11:43 AM, Seth Johnson <seth.p....@gmail.com> wrote:

> No reason you can't query a flat file; just no joins involved, is all.

Yeah, and you would have to implement the sql engine yourself ;-).


>> Pointing at parent doesn't work in Leo because of clones. This is why
>> Leo outline is more of a DAG than a simple tree, a node can have
>> multiple parents if it's cloned somewhere in the outline.
>
>
> That it's directed doesn't mean you can't do clones with parent
> pointers -- except that within the model of a graph scheme one tends
> to think of nodes like objects, rather than modeling "nodes" as
> metadata for the values associated with them.  (And the convention is
> to point toward children, I assume.)  There's no reason you can't have
> the same key value designating the same "node" in more than one place
> in the outline, pointing up to a different parent in each case; the
> values associated with that key value aren't "contained" in it like a
> node -- they're just *relevant* everywhere that "node" is used.

Not sure whether I'm following you, but if you want to have a "parent
pointer + child index" representation, you would re-introduce the
already-killed concept of vnodes and tnodes to support clones.

Can you paste your schema here, since I may be misunderstanding
something? What I have is just a bog standard graph representation I
found on the internet, as I'm averse to inventing solutions to
problems for which simple solution already exists.

Separate edge section was suggested for next gen .leo xml file format
as well - it would allow representing soft links (that can be cyclic!)
directly in the graph section as first class citizens, instead of the
uA solution that is used now.

Seth Johnson

unread,
Dec 29, 2011, 7:57:07 AM12/29/11
to leo-e...@googlegroups.com
On Thu, Dec 29, 2011 at 5:13 AM, Ville M. Vainio <viva...@gmail.com> wrote:
> On Thu, Dec 29, 2011 at 11:43 AM, Seth Johnson <seth.p....@gmail.com> wrote:
>
>> No reason you can't query a flat file; just no joins involved, is all.
>
> Yeah, and you would have to implement sql engine yourself ;-).


Huh? Why? If you're talking about tree traversal, I've addressed
that elsewhere in this thread. But querying a flat table in SQL . . .
what are you going on about?


>>> Pointing at parent doesn't work in Leo because of clones. This is why
>>> Leo outline is more of a DAG than a simple tree, a node can have
>>> multiple parents if it's cloned somewhere in the outline.
>>
>>
>> That it's directed doesn't mean you can't do clones with parent
>> pointers -- except that within the model of a graph scheme one tends
>> to think of nodes like objects, rather than modeling "nodes" as
>> metadata for the values associated with them.  (And the convention is
>> to point toward children, I assume.)  There's no reason you can't have
>> the same key value designating the same "node" in more than one place
>> in the outline, pointing up to a different parent in each case; the
>> values associated with that key value aren't "contained" in it like a
>> node -- they're just *relevant* everywhere that "node" is used.
>
> Not sure whether I'm following you, but if you want to have "parent
> pointer + child index" representation, you would re-introduce the
> already killed concept of vnodes and tnodes to support clones.
>
> Can you paste your schema here, since I may be misunderstanding
> something? What I have is just a bog standard graph representation I
> found on the internet, as I'm averse to inventing solutions to
> problems for which simple solution already exists.

Your aversion would likely stand in the way of appreciating it then.

I would have to get it off an old hard drive. I wonder if I zipped it
up and put it online somewhere. I will dig out an SQL tool and create
the main link table, link attribute, and link attribute value tables,
from memory. The indexes are done in dBASE, and are rather unique so
I won't reconstruct them.

This is better anyway, to describe the system gradually.

However, I've drawn the structure of the main "node" outline table
above in this thread. Attributes and their values are stored in
separate tables with the same key values, now including the "node's"
key (and the attribute and the attribute value).


Seth

> Separate edge section was suggested for next gen .leo xml file format
> as well - it would allow representing soft links (that can be cyclic!)
> directly in the graph section as first class citizens, instead of the
> uA solution that is used now.
>

Ville M. Vainio

unread,
Dec 29, 2011, 9:21:24 AM12/29/11
to leo-e...@googlegroups.com
On Thu, Dec 29, 2011 at 2:57 PM, Seth Johnson <seth.p....@gmail.com> wrote:

>>> No reason you can't query a flat file; just no joins involved, is all.
>>
>> Yeah, and you would have to implement sql engine yourself ;-).
>
>
> Huh?  Why?  If you're talking about tree traversal, I've addressed
> that elsewhere in this thread.  But querying a flat table in SQL . . .
> what are you going on about?

The naive assumption is that database == sql database, unless
explicitly stated otherwise (e.g. an "object database", a nosql
database, etc.). If we venture outside standard databases, we are
talking about custom binary formats more than anything else -- and
that's a whole different conversation that is not of interest to me
at this time (it seems like premature optimization when both xml and
sqlite perform acceptably).

Overall, threads like this are too long and abstract for me to read. I
trust that relevant bits get summarized under new subject line, as is
usually the case :).

Seth Johnson

unread,
Dec 29, 2011, 10:00:14 AM12/29/11
to leo-e...@googlegroups.com
On Thu, Dec 29, 2011 at 9:21 AM, Ville M. Vainio <viva...@gmail.com> wrote:
> On Thu, Dec 29, 2011 at 2:57 PM, Seth Johnson <seth.p....@gmail.com> wrote:
>
>>>> No reason you can't query a flat file; just no joins involved, is all.
>>>
>>> Yeah, and you would have to implement sql engine yourself ;-).
>>
>>
>> Huh?  Why?  If you're talking about tree traversal, I've addressed
>> that elsewhere in this thread.  But querying a flat table in SQL . . .
>> what are you going on about?
>
> Naive assumption is that database == sql database, unless explicitly
> stated otherwise (e.g. "object database", nosql database, etc...). If
> we venture outside standard databases, we are talking about custom
> binary formats more than anything else - and that's a whole different
> conversation that is not of interest to me at this time (seems like
> premature optimization when both xml and sqlite seem to perform
> acceptably).


Many people do star schemas in SQL; it's just a design for a
particular kind of use -- data mining. I'm an old-school dBASE coder.
Everything I'm talking about here is traditional databases, whether
dBASE or SQL. When I say it's compatible with big data, I mean that
its being flat tables makes it optimal for putting into those
environments. The tree traversal I do is the most esoteric thing
we're talking about, and it's done using dBASE indexes, aside from
the general idea of universal outline contexts.


Seth

mdb

unread,
Dec 29, 2011, 11:05:56 AM12/29/11
to leo-editor
Ville,

I ran the code in

> The code to dump outline to sql database (and create the database) is here:
> http://bazaar.launchpad.net/~leo-editor-team/leo-editor/contrib/view/...

after changing the db location in

def test(c):
    tf = TreeFrag("/home/ville/treefrag.db")

of course

Do you have code to recreate a tree from the db?
Or better .. to select and include in an outline a single node

Ville M. Vainio

unread,
Dec 29, 2011, 2:50:37 PM12/29/11
to leo-e...@googlegroups.com

I recreate the structure in leoqviewer, but this is C++ code. A dumb algorithm to create the tree is easy to do for Python Leo as well, but if I write full file format support I'd like to do it the fast way: first create the nodes, then put them in their place in the graph, in just two passes.

mdb

unread,
Dec 29, 2011, 6:19:46 PM12/29/11
to leo-editor
On Dec 29, 2:50 pm, "Ville M. Vainio" <vivai...@gmail.com> wrote:
> I recreate the structure in leoqviewer, but this is c++ code. Dumb
> algorithm to create the tree is easy to do for python Leo as well, but if I
> write full file format support I'd like to do it the fast way - first
> create nodes, then put them to their place in the graph, in just 2 passes.

I am able to recreate nodes from the sqlite db, but I do not fully
understand how your edges scheme can be used to recreate the tree
structure. It seems easier to simply give the db a field that tells
whether a node is a child of another node and, if so, which one. Am I
missing something?

Ville M. Vainio

unread,
Dec 29, 2011, 6:37:04 PM12/29/11
to leo-e...@googlegroups.com
On Fri, Dec 30, 2011 at 1:19 AM, mdb <mdbo...@gmail.com> wrote:

> I am able to recreate nodes from the sqlite db but I do not fully
> understand how your edges scheme can be used to recreate a tree
> design.  It seems easier to simply give the db a field that tells
> whether a node is a child of another node and if so, which one.  Am I
> missing something?

- Because of clones, a node can have several parents. Therefore, a
single slot for parent id won't work

- Even if you were able to list all parents (which SQL does not
allow), you would have to specify the child index for every parent (to
retain sibling order)

Given the list of edges (a,b) :

If you want all children of node N, list edges where N is the 'a'
node, and store the 'b' node from every edge. To find parent(s) of N,
list edges where N is the 'b' node.
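A minimal sqlite sketch of those two lookups, assuming a hypothetical edges(a, b, pos) table where each row means "a is a parent of b" (the names are illustrative, not Leo's schema):

```python
import sqlite3

def children(con, n):
    # Children of n: rows where n is the 'a' node, in sibling order.
    return [b for (b,) in con.execute(
        "SELECT b FROM edges WHERE a = ? ORDER BY pos", (n,))]

def parents(con, n):
    # Parents of n: rows where n is the 'b' node (several rows if n is cloned).
    return [a for (a,) in con.execute(
        "SELECT a FROM edges WHERE b = ?", (n,))]
```

Note that the per-parent pos column is what retains sibling order, which a single parent-id field on the node could not do for clones.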

Ville M. Vainio

unread,
Dec 29, 2011, 6:40:34 PM12/29/11
to leo-e...@googlegroups.com
On Fri, Dec 30, 2011 at 1:37 AM, Ville M. Vainio <viva...@gmail.com> wrote:

> Given the list of edges (a,b) :
>
> If you want all children of node N, list edges where N is the 'a'
> node, and store the 'b' node from every edge. To find parent(s) of N,
> list edges where N is the 'b' node.

In C++ & SQL:

https://gist.github.com/1536719

Seth Johnson

unread,
Dec 29, 2011, 7:53:51 PM12/29/11
to leo-e...@googlegroups.com
On Thu, Dec 29, 2011 at 2:50 PM, Ville M. Vainio <viva...@gmail.com> wrote:
> I recreate the structure in leoqviewer, but this is c++ code. Dumb algorithm
> to create the tree is easy to do for python Leo as well, but if I write full
> file format support I'd like to do it the fast way - first create nodes,
> then put them to their place in the graph, in just 2 passes.


I hope you do implement both phases: the full format export, and
putting them into the outline. Leo's functionality seems so intricate
and obscure that I feel I can only deal with it when I know how the
full thing goes in and out.

I hope to plug in my old hard drive tonight and get the app I built on
the schema I'm describing off of it. I promise not to carp over
abstract theory -- having the schema (and dBASE code!) will help a
lot.


Seth

Seth Johnson

unread,
Dec 30, 2011, 8:11:09 AM12/30/11
to leo-e...@googlegroups.com
On Thu, Dec 29, 2011 at 7:53 PM, Seth Johnson <seth.p....@gmail.com> wrote:
>
> I hope to plug in my old hard drive tonight and get the app I built on
> the schema I'm describing off of it.  I promise not to carp over
> abstract theory -- having the schema (and dBASE code!) will help a
> lot.


Two hard drives, both seem dead using my handy USB cable - IDE thingy.
Maybe later I'll try swapping them in on my mom's tower, which is a
bigger hassle. Or create it in SQL . . .

Seth Johnson

unread,
Dec 30, 2011, 8:25:26 AM12/30/11
to leo-e...@googlegroups.com
On Thu, Dec 29, 2011 at 6:37 PM, Ville M. Vainio <viva...@gmail.com> wrote:
>
> Given the list of edges (a,b) :
>
> If you want all children of node N, list edges where N is the 'a'
> node, and store the 'b' node from every edge. To find parent(s) of N,
> list edges where N is the 'b' node.


One can use two indexes instead of having an edges entity:

To find all children of node N, seek N in an index on the parent key
field. Skip through until it doesn't match.

To find all parents of node N, seek N in an index on the node key
field. Skip through (reading the parent key field) until it doesn't
match.


Seth

Ville M. Vainio

unread,
Dec 30, 2011, 9:13:21 AM12/30/11
to leo-e...@googlegroups.com
On Fri, Dec 30, 2011 at 3:25 PM, Seth Johnson <seth.p....@gmail.com> wrote:

> On Thu, Dec 29, 2011 at 6:37 PM, Ville M. Vainio <viva...@gmail.com> wrote:
>>
>> Given the list of edges (a,b) :
>>
>> If you want all children of node N, list edges where N is the 'a'
>> node, and store the 'b' node from every edge. To find parent(s) of N,
>> list edges where N is the 'b' node.
>
>
> One can use two indexes instead of having an edges entity:
>
> To find all children of node N, seek N in an index on the parent key
> field.  Skip through until it doesn't match.

This doesn't work if N is cloned somewhere, i.e. N has several parents.

Seth Johnson

unread,
Dec 30, 2011, 9:41:53 AM12/30/11
to leo-e...@googlegroups.com

Node key - Parent key
A - N
B - N
C - N
D - X
E - X
N - D
N - E

X  - D  - N  - A
               \ B
               \ C
  \ E  - N  - A
               \ B
               \ C

To find all children of node N, seek N in an index on the parent key
field. Skip through until it doesn't match.

Node key - Parent key
A - N
B - N
C - N


To find all parents of node N, seek N in an index on the node key

field. Skip through (reading the parent key field) until it doesn't
match.

Node key - Parent key
N - D
N - E


Seems to work . . .
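The worked example above can be checked directly in sqlite: one two-column table plus two indexes, with the same two seeks (the table and index names here are mine, chosen to match the sketch; this is an illustration, not Leo's schema):

```python
import sqlite3

rows = [("A", "N"), ("B", "N"), ("C", "N"),
        ("D", "X"), ("E", "X"), ("N", "D"), ("N", "E")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE links (nodekey TEXT, parentkey TEXT)")
con.execute("CREATE INDEX parentindex ON links (parentkey)")  # for child lookups
con.execute("CREATE INDEX nodeindex ON links (nodekey)")      # for parent lookups
con.executemany("INSERT INTO links VALUES (?, ?)", rows)

# Children of N: seek N in the parent-key index.
kids = [k for (k,) in con.execute(
    "SELECT nodekey FROM links WHERE parentkey = 'N' ORDER BY nodekey")]

# Parents of N: seek N in the node-key index.
folks = [p for (p,) in con.execute(
    "SELECT parentkey FROM links WHERE nodekey = 'N' ORDER BY parentkey")]
```

With the example data, `kids` comes back as A, B, C and `folks` as D, E, matching the hand-walked index seeks above.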


Seth

Ville M. Vainio

unread,
Dec 30, 2011, 10:37:28 AM12/30/11
to leo-e...@googlegroups.com

I don't really understand this. What do you mean by 'index'? In an rdbms, an index can only contain data that can be trivially derived from tables. Everything needs to work without indexes as well.

Seth Johnson

unread,
Dec 30, 2011, 11:03:23 AM12/30/11
to leo-e...@googlegroups.com
On Fri, Dec 30, 2011 at 10:37 AM, Ville M. Vainio <viva...@gmail.com> wrote:
> I don't really understand this. What do you mean by 'index'? In rdbms, index
> can only contain data that can be trivially derived from tables. Everything
> needs to work without index as well.


I do some clever index stuff that I think only the old XBASE
environments allow -- but I'm surprised that this example doesn't seem
legit -- it's only using two simple one-field indexes.

I suspect this is set orientation at work: SQL makes you work with
query response sets, rather than navigating around record-by-record.

Here's simple XBASE code that does the above. The indexes are not
complex for this example:


USE nodes

INDEX ON nodes->parent TO parentindex
INDEX ON nodes->nodekey TO nodeindex

x = "N"

* Children of N: walk the matching entries in the parent-key index
SET INDEX TO parentindex
SEEK x
DO WHILE .NOT. EOF() .AND. nodes->parent = x
   ? nodes->nodekey, nodes->parent
   SKIP
ENDDO

* Parents of N: walk the matching entries in the node-key index
SET INDEX TO nodeindex
SEEK x
DO WHILE .NOT. EOF() .AND. nodes->nodekey = x
   ? nodes->nodekey, nodes->parent
   SKIP
ENDDO

Seth Johnson

unread,
Dec 30, 2011, 11:05:53 AM12/30/11
to leo-e...@googlegroups.com
I think there are two key conceptual differences: 1) set orientation
vs record orientation; 2) nodes as encapsulated objects vs. nodes as
records

Seth

Seth Johnson

unread,
Dec 30, 2011, 11:29:28 AM12/30/11
to leo-e...@googlegroups.com
On Fri, Dec 30, 2011 at 11:05 AM, Seth Johnson <seth.p....@gmail.com> wrote:
> I think there are two key conceptual differences: 1) set orientation
> vs record orientation; 2) nodes as encapsulated objects vs. nodes as
> records

or more to the point: nodes as encapsulated objects vs. nodes
represented in more than one record

I think that's the difference, anyway.

Seth

Seth Johnson

unread,
Dec 30, 2011, 11:53:09 AM12/30/11
to leo-e...@googlegroups.com
On Fri, Dec 30, 2011 at 9:41 AM, Seth Johnson <seth.p....@gmail.com> wrote:

This is a nodes table, two columns/fields:

> Node key - Parent key
> A  -  N
> B  -  N
> C  -  N
> D  -  X
> E  -  X
> N  -  D
> N  -  E

This is the tree represented by the above records:

>  X  - D  - N  - A
>                 \ B
>                 \ C
>    \ E  - N  - A
>                 \ B
>                 \ C

(eom)

Ville M. Vainio

unread,
Dec 30, 2011, 12:20:35 PM12/30/11
to leo-e...@googlegroups.com

So your nodes table is essentially my EDGES table :)

Seth Johnson

unread,
Dec 30, 2011, 12:28:18 PM12/30/11
to leo-e...@googlegroups.com
On Fri, Dec 30, 2011 at 12:20 PM, Ville M. Vainio <viva...@gmail.com> wrote:
> So your nodes table is essentially my EDGES table :)

Ah! Will check that, but I think you're probably right. And your
nodes table would be more appropriately identified with my attributes
table(s).

The "bottom level" for me is the individual attribute values. You
might call my entire structure "edges gone wild." I don't have
encapsulation at the node level.

Maybe this will let me match it up with Leo format. Would be good to
see some sort of "certified" db representation of Leo, with code that
gets it in and out.


Seth

Seth Johnson

unread,
Dec 30, 2011, 12:30:07 PM12/30/11
to leo-e...@googlegroups.com
No, not edges gone wild -- pointers: they don't really correlate with
arrows between nodes . . . but sure, edges probably goes right into my
Links level table. :-)

Offray Vladimir Luna Cárdenas

unread,
Jan 2, 2012, 4:04:53 PM1/2/12
to leo-e...@googlegroups.com
Hi,

I read the whole thread on a db-oriented version of Leo. I got lost
in the implementation details (which is a shame... :-/), so I just
want to say that the things Alia points out here about a single db
file containing all the data (external or not) of a project, and
collaboration for free, are already possible working with Leo +
Fossil, without the need to reinvent fossil as Edward fears, but just
by talking to it more explicitly from Leo. That's what my team and I
are doing. It is a newbie non-programmer's approach, but it is
working for us. This is what we do to collaborate using Leo + Fossil
on a project called "The Project":

We create a Folder called "TheProject" with the following structure:

TheProject/
|_ theProject.fossil
|_ collaborator1TheProject.leo
|_ collaborator2TheProject.leo
:
|_ collaboratorNTheProject.leo
|_ Folder1/
   |_ file1-1.abc
   |_ file1-2.xyz
   :
   |_ file1-n.xyz
|_ Folder2/
   |_ ...
:
|_ FolderN/
   |_ ...

There is still a lot of shared convention needed to make this work. The
idea is that TheProject contains .leo files which are _personal views_
of each collaborator on the data of the project, usually scattered
across external files inside the same folder. theProject.fossil is
each collaborator's personal repo, containing the history of the
collaboration and the files. We don't need each collaborator's
versions of the external files in the fossil repo, because putting
this data in conversation is the work of fossil, using a central
repository or a p2p fashion (if the Internet is not working but we
still have an Intranet, or in the first stages of a project when there
was no central repository). Each of us works on the external data or
in the Leo personal views, as fits everyone's workflow, and uses
fossil for coordination. We abandon, for the moment, the idea of
shared Leo trees as the shared understanding of a project, but we can
see other people's trees when we want and use the Nav button to locate
particular info and see its context in somebody else's personal view.
We use some ideas from GoboLinux[1] about having a personal convention
for organizing files, symlinked to canonical GNU/Linux trees.

[1] http://en.wikipedia.org/wiki/GoboLinux

This convention and hierarchy let us carry the project and its
history in a single folder, so it can be seen as plain data or as a
fossil repository. There is replication of information, having it both
in the fossil repo and in the TheProject folder, but being able to
approach the data either from the repo or from the plain files
justifies this redundancy. We still get a lot of portability, carrying
just one folder and its contents, or just the repo.

Choosing convention over configuration worked for us, but we imagine
a more automagical world: at first we wanted to create buttons and
commands inside Leo for all the external operations of fossil, but
now, seeing this discussion, I imagine further levels of conversation
between fossil and Leo. The idea of a sea of nodes in a Leo database
seems a lot like the idea of a sea of objects in Smalltalk images. And
this idea of emergence in the sea of data seems better suited to a
NoSQL database; fortunately, Fossil is already a NoSQL database[2],
one which supports external files, collaboration and portability.

[2] http://www.sqlite.org/debug1/doc/trunk/www/theory1.wiki

So the question is: how do we increase the conversation between Fossil
and Leo to solve some of the problems addressed in this thread? My
first idea was to use the DAG[3] support in Fossil to map DAGs in Leo,
so Fossil would have a kind of special tree that tracks not the
project timeline[4] but Leo trees; the second was to use a NoSQL sea
of nodes implemented in the NoSQL fossil database. As I said, I don't
have proper knowledge of the implementation details, so I will hold
these ideas until a more knowledgeable person tells me more about the
implementation, or I have a proper context to play with them.

[3] http://www.sqlite.org/debug1/doc/trunk/www/branching.wiki
[4] http://www.sqlite.org/debug1/timeline

Cheers,

Offray

On 12/26/11 08:19, Alia K wrote:
>
> So, just speculating out loud, let's assume the following:
>
> - the db maintains leo tree state: node content, structure and
> versions thereof which are saved by default (until an explicit 'flush'
> or 'shrink' command does away with prior or unneeded versions)
>
> - all external files are just rendered views of a particular
> (versioned) state of the leo db. i.e. filesystem objects are generated
> on demand.
>
> - the leo ui gives the user the option to save, view and edit versions
> of leo nodes (and their head + body data)
>
> - the db fulfils the part of @shadow sentinel files
>

> If we are using sqlite, we would therefore get one file (a leo project
> file) which carries within it (if unshrunk) a versioned history of all
> the changes made to all nodes in terms of content and structure.
>
> If we are using a networked RDBMS (e.g mysql, postgres, sqlserver,
> oracle, etc.), we get multi-user leo projects for free (because each
> change to the leo node structure is saved on the server and references
> a particular user.)

> This means that the sql data model for leo projects should be able to
> capture all changes by multiple users to all aspects of the project
> data and structure.

Seth Johnson

unread,
Jan 3, 2012, 12:51:35 AM1/3/12
to leo-e...@googlegroups.com
On Mon, Jan 2, 2012 at 4:04 PM, Offray Vladimir Luna Cárdenas
<off...@riseup.net> wrote:
> Hi,
>
> I read all the thread on db-oriented version of Leo. I got lost on
> implementation details (what is a shame... :-/),

I think the single thing that will best get db-oriented Leo going will
be someone just writing code that stores complete Leo files in a
database of any kind at all. Code to put that database into the
outline interface app would complete the picture for lots of people to
start providing running code that does collaboration and versioning.

That would, for instance, make it easy for you or Alia to demonstrate
what Fossil can do.

Maybe the thing to do is to think in terms of a "reference
implementation" of the basic, non-collaborative and non-versioning
classic Leo, and embrace the notion that many backends could be
developed for it. That "reference implementation" could add functions
for collab or versioning once someone has shown how it could be done.
Different approaches would likely have different characteristics --
something that is designed for concurrent execution a la Rich Hickey
concepts will behave differently from other approaches, but we'd see
the implications of each approach while each solution would have to at
least meet the reference implementation requirements before we'd see
it as "mature."

I think your post was easily the most constructive one in this entire thread.


Seth

Offray Vladimir Luna Cárdenas

unread,
Jan 3, 2012, 5:36:37 AM1/3/12
to leo-e...@googlegroups.com
Hi,

On 01/03/12 00:51, Seth Johnson wrote:
>
> I think the single thing that will best get db-oriented Leo going will
> be someone just writing code that stores complete Leo files in a
> database of any kind at all. Code to put that database into the
> outline interface app would complete the picture for lots of people to
> start providing running code that does collaboration and versioning.
>
> That would, for instance, make it easy for you or Alia to demonstrate
> what Fossil can do.
>
> Maybe the thing to do is to think in terms of a "reference
> implementation" of the basic, non-collaborative and non-versioning
> classic Leo, and embrace the notion that many backends could be
> developed for it. That "reference implementation" could add functions
> for collab or versioning once someone has shown how it could be done.
> Different approaches would likely have different characteristics --
> something that is designed for concurrent execution a la Rich Hickey
> concepts will behave differently from other approaches, but we'd see
> the implications of each approach while each solution would have to at
> least meet the reference implementation requirements before we'd see
> it as "mature."
>
> I think your post was easily the most constructive one in this entire thread.
>

Thanks. About implementation, my idea is to start with a different
approach: to "teach fossil to Leo", gaining that way default support
for versioning and collab in Leo, so that the convention for working
together in a collaborative p2p fashion that I showed would be
automatically supported by Leo + Fossil. Adding an external file to
Leo would add it to the Fossil repo, all the commands for working with
Fossil would be supported inside Leo, and so on. Then I will try to
deconstruct the Leo data structure, if needed, so it can be supported
by the NoSQL database or the DAG of Fossil. So the idea is to have a
particular implementation of collaboration + versioning that may be
abstracted later to work with more approaches.

Cheers,

Offray

Seth Johnson

unread,
Jan 3, 2012, 11:28:59 AM1/3/12
to leo-e...@googlegroups.com
On Tue, Jan 3, 2012 at 5:36 AM, Offray Vladimir Luna Cárdenas
<off...@riseup.net> wrote:
>
> Thanks. About implementation my idea is to start with a different approach.
> The idea is to "teach fossil to Leo" and having in that way a default
> support for versioning and collab in Leo, so the convention for working
> together in a collaborative p2p fashion that I showed would be automatically
> supported by Leo + Fossil. Adding an external file to Leo would add it to
> the Fossil repo and all the commands for working with Fossil would be
> supported inside Leo and so on. Then I will try to deconstruct the Leo data
> structure, if this is needed, so it can be supported by the NoSQL database
> or the DAG of Fossil. So the idea is to have a particular implementation of
> collaboration + versioning that may be abstracted later to work with more
> approaches.


But if there was a database representation of the Leo document
already, with code for saving that plus putting it back into Leo,
wouldn't that give you the information and/or understanding you need
to demonstrate your approach? Including seeing where in the code to
teach Leo? My impression is the only thing that keeps people from
going ahead with what you propose, or demo'ing any approach at all, is
uncertainty about having an adequate model that works with Leo's
intricacies.


Seth

Terry Brown

unread,
Jan 3, 2012, 11:41:42 AM1/3/12
to leo-e...@googlegroups.com
On Tue, 3 Jan 2012 11:28:59 -0500
Seth Johnson <seth.p....@gmail.com> wrote:

> My impression is the only thing that keeps people from
> going ahead with what you propose, or demo'ing any approach at all, is
> uncertainty about having an adequate model that works with Leo's
> intricacies.

I think that's the case. @<file> handling is complex, I'm not sure how
familiar with it Ville is, I've poked around the edges a bit, but don't
know how it would interact with loading to / from a DB.

And even if that wasn't an issue, the next step beyond using a DB to
replace the file system would also be complex, whether it was sharing
or whatever. Well, perhaps versioning wouldn't be so hard, but
sharing is certainly challenging.

Cheers -Terry

Ville M. Vainio

unread,
Jan 3, 2012, 3:20:07 PM1/3/12
to leo-e...@googlegroups.com
On Tue, Jan 3, 2012 at 6:41 PM, Terry Brown <terry_...@yahoo.com> wrote:

> I think that's the case.  @<file> handling is complex, I'm not sure how
> familiar with it Ville is, I've poked around the edges a bit, but don't
> know how it would interact with loading to / from a DB.

My code just dumps everything, including under @<file> stuff, to db.

This is by design, as I want to ship .leoq as a standalone file that
you can send to your phone in email, or whatever.

Not writing @file nodes is easy, just stop traversal when you see one.
Likewise, reading back in is easy, just expand the whole tree and
after that do the @file node handling for the whole tree.
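The "stop traversal when you see one" rule on the write side can be sketched with a bare dict standing in for Leo's real vnode API (the `h`/`children` shape is an assumption for illustration):

```python
def dump_outline(node, out):
    """Depth-first dump that stops descending when it meets an @file node."""
    out.append(node["h"])
    if node["h"].startswith("@file"):
        return  # contents live in the external file; don't write them to the db
    for child in node["children"]:
        dump_outline(child, out)
```

The @file node itself is still recorded (so the link to the external file survives a round trip); only its subtree is skipped.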

> And even if that wasn't an issue, the next step beyond using a DB to
> replace the file system would also be complex, whether it was sharing
> or whatever.  Well, perhaps versioning wouldn't be so hard, but
> sharing is certainly challenging.

I think it's best to leave advanced use cases like that unhandled. Even
if coding it wasn't too hard (it probably is ;-), people want simple
and reliable workflows for their creative work; users are paranoid
about losing data if they can't completely understand what is
happening under the hood (e.g. the clone wars must have spooked many of
us).

Seth Johnson

unread,
Jan 3, 2012, 3:48:59 PM1/3/12
to leo-e...@googlegroups.com


I think we should leave everything new unhandled. Just regard the db
as a file system you're saving your own Leo files into. The only new
thing should be that, instead of saving it as one file as a whole, you
store the elements as separate units (records, nodes, whatever); just
make sure you have save and load procedures for that db
representation, which work because all they do is save your file in
that form and load it. Then everything current in Leo will just work.
Plus, people would know what they need to store in the db
representation, and what they need to feed to the load procedure.
They can use that and hack at different ideas, knowing that at
bottom the classic, basic Leo is still going to work so long as they
represent it equivalently to that reference representation and feed
the load procedure what it needs.

Seth

Seth Johnson

unread,
Jan 3, 2012, 3:58:04 PM1/3/12
to leo-e...@googlegroups.com

And why not, for the @file business, keep those saving to the local
standard file system: just put the .leo file in the db for now. In
the routine for saving, have a flag that, once set, will save the
.leo file in the db as well as in the traditional file system
-- or one or the other. I don't see why we have to start by solving
distributed, collaborative work on the @file stuff, with versioning.
Just save the .leo file. Let people suggest different approaches
starting from there.

(Getting redundant, sorry. I'm done. :-) )


Seth

Ville M. Vainio

unread,
Jan 3, 2012, 4:33:02 PM1/3/12
to leo-e...@googlegroups.com
I'd sooner use a .zip file with the .leo file + external files.

Seth Johnson

unread,
Jan 3, 2012, 5:36:11 PM1/3/12
to leo-e...@googlegroups.com
On Tue, Jan 3, 2012 at 4:33 PM, Ville M. Vainio <viva...@gmail.com> wrote:
> I'd sooner use a ,zip file with the .leo file + external files.


Right. But that doesn't get us a development context that facilitates
people's participation in developing db solutions.

All that's needed is a db representation and code that gets that in
and out of Leo. Then db developers will be empowered to proceed in
all sorts of ways, and Edward can ponder how and whether to
incorporate the various approaches. Coupled with that process, we get
a separation of the db representation from the interface functions
that might make Leo something that could be put on any back end that
provides for its current reference implementation. That reference
implementation does not need to have distribution, collaboration, or
versioning. Or concurrent executability, to address Eoin's
suggestions. All of those things can be carefully considered
after they've been demonstrated with running code that at least makes
the current Leo reference implementation work.

(Oops, I did it again.)


Seth

If I want to zip up the file with its external files, I can always do
that. But if my .leo file is

Offray Vladimir Luna Cárdenas

unread,
Jan 4, 2012, 3:50:22 AM1/4/12
to leo-e...@googlegroups.com
Hi,

Well, my approach is that deconstructing textual computer interaction
requires thinking in two axes: one of structure in space (Leo outlines
are this) and one of structure in time (DVCS/SCM, especially Fossil,
are this). So using my strategy will demonstrate my approach, will
have the advantage of solving a day-to-day problem in the way my team
and I work, and could at the same time evolve into a more abstract
solution, as you propose, where the Leo DOM could be mapped to a
database via a DAG or a sea of nodes in a NoSQL database. Living in
the Global South ("Developing Countries", as some call it) and working
with digital technology is about developing these kinds of strategies,
which deal with the day-to-day problems first while envisioning some
abstract structure. It is about "acting contextually but thinking
systemically".

Cheers,

Offray

Seth Johnson

unread,
Jan 6, 2012, 10:49:45 PM1/6/12
to leo-e...@googlegroups.com


Okay, but in the absence of a db representation and code to get it in
and out of Leo as it is, you will have to be the one to provide it --
only in your case, you're planning additional functionality. You can
certainly do that, but I would say understanding of Leo intricacies is
the main hurdle. That's what I see as lacking, that somebody can
provide either with nothing new or with your feature set. Whichever
way it gets done, it's the fact that the db representation and code
for loading/saving it are done that will provide the necessary
understanding people need to start contributing on the db front.

Seth

Seth Johnson

unread,
Jan 7, 2012, 1:43:41 PM1/7/12
to leo-e...@googlegroups.com
I guess it's like anything else: one could just look at the load and
save code, make whatever db representation one wants that hooks into
that. I keep wanting to adapt a database representation, when I guess
reading Python wouldn't be hard. But all the things Leo must be
parsing . . . guess I ought to just look at it.

Seth
