Otoh, the ideas here are completely new, and exist in a new design
space.
Last night I realized that a db-oriented Leo would run into
immediate problems: what to do about @file nodes and external files?
When I awoke this morning I saw a new direction: simpler in some
ways, but wide-ranging and potentially impossibly complex.
Influences
==========
Rereading the sqlite documentation primed the subconscious pump:
There was a note (somewhere) about simplicity being the foundation of
(I think) sqlite.
As you will see, fossil's sha1 keys are very important.
The various behind-the-scenes complexities of fossil also contributed
somehow.
Summary
=======
Let me try to give a big picture overview of my thoughts, before the
myriad complexities arise. The challenge will be to create simplicity
everywhere. The simplicity *must* be real: it must be on the order of
"webs are outlines in disguise".
Suppose *all* (or almost all) information is contained in a **Leonine
db**. The kind of db doesn't matter, except insofar as it
supports what is needed. I'll assume sqlite db, as required by
fossil, possibly extended.
Suppose external files start with one or more sha1 keys, which are the
*only* sentinels in a file. For example, for .py files::
#@sha1: <sha1 key>
We can think of this line as a "universal link" into a database
created by a particular program.
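The sentinel line above can be generated mechanically. A minimal sketch, assuming the key is the sha1 of whatever payload the program wants to link to (the helper name is hypothetical, not Leo's API):

```python
import hashlib

def make_sentinel(payload: bytes) -> str:
    """Return a '#@sha1:' sentinel line keyed to the given payload."""
    key = hashlib.sha1(payload).hexdigest()
    return "#@sha1: %s" % key

# The resulting line is the "universal link" into the db.
line = make_sentinel(b"any serialized data structure")
```

Since sha1 keys are 40 hex digits with no practical chance of collision, each program can mint them freely for its own database.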
Multiple programs or projects might want to add such links, so it
might be good to include a "statement of responsibility"
#@sha1: <contributor A's url> <sha1 key>
#@sha1: <contributor B's url> <another sha1 key>
For Leo, the url would be https://launchpad.net/leo-editor, and the
link would be a link into the Leonine db.
As explained below, we can use a single Leonine key even though
multiple Leonistas have edited the file.
As a kind of commit hook, we will probably want to update this key
when committing a change to the external file.
Using sha1 keys effectively
===========================
The world has not begun to appreciate how cool sha1 keys are. Any
program or project may generate them by the millions, with no fear of
conflict. Thus, we can consider sha1 keys to **be** any kind of data
structure we like!
Thus, the one and only (Leonine) key in each external file can
represent *all* data associated with the file, not just *any* data
associated with the file. That is, we can imagine "amalgamating" data
structures (dicts) that contain, for instance:
- The complete outline structure of the file.
- The bzr/fossil revision info,
- The (link to) the .leo file that contains the external file,
- The file paths of the .leo file and all external files,
- The list of all people who have contributed to the file, and the
revisions that each individually has made. (Think bzr blame).
- Whatever else could possibly be useful.
So the sha1 key in the external file refers to this amalgamating data
structure. Furthermore, the format of the amalgamating data
structures can change at will, with no fear of sha1 conflicts. Thus,
data structure formats are completely dynamic: there is no such thing
as being incompatible with old data!
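One way to picture the amalgamating structure: a dict serialized and stored in sqlite under the sha1 of its own serialization, so a change of format simply yields a fresh, conflict-free key. A hedged sketch, with made-up table and function names:

```python
import hashlib
import json
import sqlite3

def put_amalgam(conn, data: dict) -> str:
    """Store an amalgamating dict; return its sha1 key."""
    blob = json.dumps(data, sort_keys=True)
    key = hashlib.sha1(blob.encode("utf-8")).hexdigest()
    conn.execute("INSERT OR REPLACE INTO amalgams VALUES (?, ?)", (key, blob))
    return key

def get_amalgam(conn, key: str) -> dict:
    """Fetch the dict a sha1 key refers to."""
    row = conn.execute("SELECT blob FROM amalgams WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amalgams (key TEXT PRIMARY KEY, blob TEXT)")
key = put_amalgam(conn, {"outline": ["node1", "node2"], "paths": ["a.py"]})
```

The dict's shape is completely dynamic: old and new formats coexist under distinct keys, so nothing is ever "incompatible with old data".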
If we use fossil, fossil will associate *other* sha1 keys with this
amalgamated info (and the constituents), but we don't care. We can
safely assume that these other sha1 keys will never conflict with our
own sha1 keys.
All this (and more) is part of the under-appreciated "magic" of sha1
keys.
Everything disappears (temporarily)
===================================
If all (or almost all) data appears in a Leonine sqlite db we can say
the following:
- There is no need for a Leo cache.
- There is no need for private @shadow files.
- We can use the @shadow algorithm for *all* files, including @file,
@auto, etc.
That is, there is a single, universal, @file node.
Files seem to have disappeared, but they must be recreated somehow.
We can't avoid @path complications because we eventually have to
recreate external files *in the proper places*.
The challenges
==============
It's all very well to put all (or almost all) info in the Leonine db.
The challenges are:
1. To create amalgamating data structures that can be updated simply.
Complexity here will likely doom the project.
2. To coordinate *distributed* Leonine db's without rewriting fossil
or bzr or sqlite ;-)
3. To continue to use most, if not all, of Leo's core code without
change, except where the existing code is no longer needed.
Point 2 seems to be the biggest challenge. Let us consider a .leo
file as a form of *personal* view into the Leonine db. EKR's view
might be analogous to leoPyRef.leo, but my view should not be
"privileged" in any way: any user should be able to access the
**published view** of any other user. A published view will be (or
comprise) an amalgamating data structure.
I don't know fossil well enough to know how my ideas map onto fossil
constructs. I suspect, though, that some correspondence will be
possible. Otherwise, the project fails challenge 2.
Of course, fossil is not the only possible way, but as a practical
matter challenge 2 is inviolate: I don't plan to create "yet another
git" unless the payoffs (and the human help) are huge.
Your thoughts please, Amigos.
Edward
<< SNIP >>
> Summary
> =======
>
> Let me try to give a big picture overview of my thoughts, before the
> myriad complexities arise. The challenge will be to create simplicity
> everywhere. The simplicity *must* be real: it must be on the order of
> "webs are outlines in disguise".
All formats are (outline) contexts with state, and with data elements
with scopes of relevance.
> Suppose *all* (or almost all) information is contained in a **Leonine
> db**. The kind of the db doesn't matter, except insofar as it
> supports what is needed. I'll assume sqlite db, as required by
> fossil, possibly extended.
>
> Suppose external files start with one or more sha1 keys, which are the
> *only* sentinels in a file. For example, for .py files::
>
> #@sha1: <sha1 key>
>
> We can think of this line as a "universal link" into a database
> created by a particular program.
<< SNIP >>
> The world has not begun to appreciate how cool sha1 keys are. Any
> program or project may generate them by the millions, with no fear of
> conflict. Thus, we can consider sha1 keys to **be** any kind of data
> structure we like!
>
> Thus, the one and only (Leonine) key in each external file can
> represent *all* data associated with the file, not just *any* data
> associated with the file. That is, we can imagine "amalgamating" data
> structures (dicts) that contain, for instance:
>
> - The complete outline structure of the file.
> - The bzr/fossil revision info,
> - The (link to) the .leo file that contains the external file,
> - The file paths of the .leo file and all external files,
> - The list of all people who have contributed to the file, and the
> revisions that each individually has made. (Think bzr blame).
> - Whatever else could possibly be useful.
>
> So the sha1 key in the external file refers to this amalgamating data
> structure. Furthermore, the format of the amalgamating data
> structures can change at will, with no fear of sha1 conflicts. Thus,
> data structure formats are completely dynamic: there is no such thing
> as being incompatible with old data!
Just to put in your pipe -- but note my last comment at the bottom,
that this is mostly for thinking things through:
You can have one "Leonine" key that serves to point to your
amalgamating structure, or you could have a fixed number of keys in
the external file that designate the Leonine-amalgamating structure as
a "context" defined in terms of a uniform/universal data structure.
The above items are all types of uses of types of nodes (which I call
links). Each type of use combined with a type of link can have
whatever nodes (or attributes of nodes) are relevant to that context,
with "context" defined as a particular instance within a relationship
of a use type with a link type. So the external file can have a fixed
set of sha1 keys that correlate with that definition: a key for the
use type (such as a Leo file), another for link type (such as a Leo
node), another for particular use (LeoPyRef.leo), plus some more
similar keys to designate state (a few comments on that below).
<< SNIP >>
> The challenges
> ============
>
> It's all very well to put all (or almost all) info in the Leonine db.
> The challenges are:
>
> 1. To create amalgamating data structures that can be updated simply.
> Complexity here will likely doom the project.
>
> 2. To coordinate *distributed* Leonine db's without rewriting fossil
> or bzr or sqlite ;-)
>
> 3. To continue to use most, if not all, of Leo's core code without
> change, except where the existing code is no longer needed.
>
> Point 2 seems to be the biggest challenge. Let us consider a .leo
> file as a form of *personal* view into the Leonine db. EKR's view
> might be analogous to leoPyRef.leo, but my view should not be
> "privileged" in any way: any user should be able to access the
> **published view** of any other user. A published view will be (or
> comprise) an amalgamating data structure.
A uniform data structure would include a general representation of
state, and that can include individual "standpoints" -- yet that
standpoint can contain elements that others freely include in their
own contexts. The point of a standpoint is about generality (or
rather, making a state particular so you can work things out); it's
not about published or shared or not, which can be their own toggles
independently of state. State is about generality, not publishing or
shareability or collaborative modifiability. So my personal
standpoint I could make accessible or editable at large, but what I'm
trying to do is to muck around with it before I make its state more
general -- that is, intended to be pertinent to everybody in some
broader "space" or among some one or more "locations." So I just
designate it as my own standpoint until I'm ready to designate what
state it's designed to exist within. Naturally, any element
(data/text/node) within my contexts, regardless of the generality of
their state, can be placed into other contexts with different states.
What you do with those elements in my own contexts, just happens
within them, their states and their scopes of relevance.
> I don't know fossil well enough to know how my ideas map onto fossil
> constructs. I suspect, though, that some correspondence will be
> possible. Otherwise, the project fails challenge 2.
>
> Of course, fossil is not the only possible way, but as a practical
> matter challenge 2 is inviolate: I don't plan to create "yet another
> git" unless the payoffs (and the human help) are huge.
I haven't thought much at all about integration with revision control
systems, except to state that I wish those systems would be built
around a uniform/universal data structure. They would then be
automatically interoperable.
Given the complexities of contemplating development in that area
(every store has its own architecture, so whatever existing code you
might find would be tailored to those), this is probably where my
ideas will be hard to work with. But maybe my ideas can help clarify
or organize conceptions.
Seth
> Reading this fired off so many synapses, generated so much dopamine etc.,
> was as enjoyable to me as sex or quality chocolate. And that without even
> understanding most of the implementation side, just imagining that you
> consider such things possible. . .
:-) BTW, I recently had a big personal aha. I'm not a chocoholic:
I'm a sugar, salt and fat-aholic. It expands my range of desserts,
while letting me avoid "chocolates" that are, in fact, as salty as
chips. But I digress...
> And you know already I hardly have a clue what I'm talking about here,
> completely unable to respond at the same level of abstraction and likely
> just flat out wrong, so of course feel free to ignore, but sometimes with
> blue-sky thinking ignorance can be an advantage 8-)
There is no need to apologize. I'm always in the same boat when
significant invention is happening. The primary requirement, as I
have often said, is to be able to live with, tolerate, and even enjoy
massive confusion.
Edward
Wrong, wrong, wrong.
@shadow will *never* be the model for most files because it demotes
structure info to second-class status, namely, personal preference
grafted on to the "real" data. But the *essence* of Leo is that
structure is first-class data.
@shadow is fine for non-cooperative (private) environments. In that
case, the "preference" structure is, in fact, the only structure there
is. But in shared environments outline structure must be part of each
external file. Thus, sentinels are, in general, essential as well.
This must be the fourth or fifth time I have rediscovered this basic
principle. In the past, the emphasis has been mostly on sentinels,
but here we see that the underlying principle is that outline
structure must be first-class data in shared (cooperative,
distributed) environments. So this is progress of a sort.
As a direct consequence, any approach that abandons sentinels must be
rejected. I don't know, in detail, how this affects the current
discussion, but I think Seth and Hans have ideas that are compatible
with this principle.
Edward
> All this (and more) is part of the under-appreciated "magic" of sha1
> keys.
I suspect this whole line of thought is a dead end: there is no way,
in a cooperative environment, to separate outline structure from
external files, which is the actual intent of these keys. Thus, the
keys are an attempt to do the wrong thing.
You could say this is "comforting" because the idea of some kind of
gigantic amalgamation of data structures would likely have been a
truly bad design: way too complex to be feasible in practice.
Edward
P.S. While in confusion/invention mode, it is vital to be as clear as
possible about what one does, in fact, know clearly. So this is
progress, even if it seems negative in character.
EKR
As long as nobody's changing the external files. If they aren't, the
cooperation can be done in the database(s), with external files just
written as the "current situation" of the database representation.
Now, if people recognize the database representation as universal for
any app, then they may readily accept that they shouldn't mess with
the flat file. To me, no separate flat file should be needed, as all
"files" should be universally interoperable, distributed and
outlineable contexts.
Seth
> You could say this is "comforting" because the idea of some kind of
> gigantic amalgamation of data structures would likely have been a
> truly bad design: way too complex to be feasible in practice.
>
> Edward
>
> P.S. While in confusion/invention mode, it is vital to be as clear as
> possible about what one does, in fact, know clearly. So this is
> progress, even if it seem negative in character.
>
> EKR
>
> --
> You received this message because you are subscribed to the Google Groups "leo-editor" group.
> To post to this group, send email to leo-e...@googlegroups.com.
> To unsubscribe from this group, send email to leo-editor+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/leo-editor?hl=en.
>
> @shadow will *never* be the model for most files because it demotes
> structure info to second-class status, namely, personal preference
> grafted on to the "real" data. But the *essence* of Leo is that
> structure is first-class data.
>
> @shadow is fine for non-cooperative (private) environments. In that
> case, the "preference" structure is, in fact, the only structure there
> is. But in shared environments outline structure must be part of each
> external file. Thus, sentinels are, in general, essential as well.
>
> there is no way, in a cooperative environment, to separate outline
> structure from external files, which is the actual intent of these
> keys. Thus, the keys are an attempt to do the wrong thing.
I am extremely interested in a db backend because it offers the potential for
'time travel' to aid my wandering in the information wilderness. I want arrows
which take me back and forth in versions of a node.
My last push in this direction stalled when hooking node select and deselect
events seemed unreliable, but I am sure I'll return to it at some point.
Thanks,
Kent
>> Thus, the keys are an attempt to do the wrong thing.
> As long as nobody's changing the external files. If they aren't, the
> cooperation can be done in the database(s), with external files just
> written as the "current situation" of the database representation.
I don't think there is anything interesting in this direction. The
fundamental problem is keeping information in synch. By far the
simplest thing that could possibly work is to put sentinels in files.
> Now, if people recognize the database representation as universal for
> any app, then they may readily accept that they shouldn't mess with
> the flat file. To me, no separate flat file should be needed, as all
> "files" should be universally interoperable, distributed and
> outlineable contexts.
The "everything in one file" approach seems to me to be a mirage. It
doesn't really add anything except a lot of complexity.
Having said that, we can imagine a "clone server", as has been
suggested, being implemented using a single sqlite file.
It seems to me that the real problem is to make sense out of BH's
(LeoUser's) old old suggestion to create a "sea of nodes". This
raises many other issues. For example, does a node include its child
links? (Imo, it probably should not include parent links, as those
links would depend on context).
At present, I don't see a good way forward. That's ok: I've got bugs to fix ;-)
Edward
> given a single user, and never looking at the external files generated
> @shadow and @thin are functionally equivalent.
@thin "carries" structural information in the actual external file.
@shadow carries structure in the private file. The difference is
subtle, but important.
When you do a bzr pull on a node containing sentinels you get the
*new* structure from the push. When you do a bzr pull on a public
@shadow file you get the new data but *not* any new structure: the
@shadow update algorithm "invents" a way to incorporate the new data
using the *old* structure. In general, this will be a lot less
convenient than using files with sentinels.
> But I thought one of the main points of your OP (switching over to the
> "imagined possible" reality) was the ability to store (almost?) everything
> related to the content, outside of the individual user's Leo file, in
> particular the "views" created by that user.
I am no longer sure what the original intention was ;-)
Edward
For me, there don't need to be any files, just a lot of "nodes" and
their attributes -- but they are all brought together for particular
purposes using one uniform formal schema, so all that really needs to
be managed and transferred between contexts and physical servers while
you're manipulating your "views," is metadata. When you break it down
this way, the attribute values are hosted at multiple particular
servers, from which they can be accessed as you navigate through and
read a full "file" (a "context") -- but when you manipulate it, it's
only the structure that's distributed, in a standard uniform metadata
format.
Seth
> Having said that, we can imagine a "clone server", as has been
> suggested, being implemented using a single sqlite file.
Which might be said to be similar to what I'm talking about, except I
provide for specifying contexts, states, and scopes of relevance for
nodes and their attribute values.
> It seems to me that the real problem is to make sense out of BH's
> (LeoUser's) old old suggestion to create a "sea of nodes". This
> raises many other issues. For example, does a node include its child
> links? (Imo, it probably should not include parent links, as those
> links would depend on context).
In my architecture, nodes point up to their parents. You access all
the nodes for a context, and they come with pointers to their parents,
and maybe the server can traverse that and give them to you in outline
order, or you can do that locally once the server just provides the
relevant nodes.
> At present, I don't see a good way forward. That's ok: I've got bugs to fix ;-)
:-)
Seth
> Edward
How do you maintain sibling order?
I've been settling on node "addresses" in metadata,
it's a list of child indexes. It provides each node with a
self contained description of where it lives, which I like.
I haven't come across a more succinct way to do that.
A counter field that you add to an index on the parent key field.
So you search for the record that has no parent key -- that's the top.
Then you search for that record's key in the parent key index -- that
will have its children in sibling order.
And so forth.
I don't do SQL searches, which would probably make this slow, just a
local navigation algorithm on the index.
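The parent-key + counter scheme above can be sketched without any SQL. This is one reading of the description, with assumed field names (key, parent, counter); outline order falls out of finding the record with no parent and walking children sorted by counter:

```python
def outline_order(records):
    """Return (depth, key) pairs in outline order.

    Each record is (key, parent_key, counter); the root has parent_key None.
    """
    children = {}
    root = None
    for key, parent, counter in records:
        if parent is None:
            root = key
        else:
            # Group children under their parent, tagged with sibling order.
            children.setdefault(parent, []).append((counter, key))

    def walk(key, depth):
        yield depth, key
        for _, child in sorted(children.get(key, [])):
            yield from walk(child, depth + 1)

    return list(walk(root, 0))

records = [
    ("root", None, 0),
    ("b", "root", 1),
    ("a", "root", 0),
    ("a1", "a", 0),
]
order = outline_order(records)
```

Only one tree-structure value per record (the parent key) plus a counter, yet the whole outline is recoverable.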
You seem to have hit on something similar to my scheme with your self
contained descriptions of where nodes live. The records I'm speaking
of are just fields holding keys in that uniform structure I speak of,
so each record holds the context -- plus the particular "node" key,
which I call a link.
"Nodes" are actually also metadata -- you don't really get "data data"
until you seek attribute values.
Seth
It sounds like the moral equivalent of a linked list.
The "address" approach I've been leaning towards feels more
robust / useful, but I haven't implemented it, so I really
don't know. It seems like what you describe is closer to the
"node and edge" world view that graph software uses.
In the web framework world there is a similar dichotomy,
mapping urls to content vs. traversing url paths.
I look forward to seeing where this db vision leads. The idea in my
mind is to still use outlines to re-arrange and organize data (program
code, documents, task lists) for views -- could be for internal windows
in Leo, external files, or passed on to other programs. The db serves
as an efficient way to store, search for and import data (often will
be snippets of code and text).
So ... two points
1. Sentinels and flat files are not incompatible with this vision, and
their relationship with outlines does not need to change
The Leo outline would serve as a flexible way to use the db data, and
reading a file with sentinels will re-create an outline, but
changing the db entries due to a change in an external file should
come from an explicit choice of the user (i.e., not automagical, too
dangerous). Nonetheless, the db should store extra node info (name,
key, date, owner) that is written to sentinels; this can be stored
JSON-like and then manipulated and re-read & refreshed.
2. Working collaboration will surely have many unforeseen problems
Thus a restricted one-user-only version of db storage seems a better
first step
My guess is that few leo users, including myself, currently make full
use of attributes by using or extending unknown attributes (uA's) and
this would change.
A linked list where every "node" has the context fully specified. And
the fact there's only one "tree structure value" per record (the
parent key) keeps things simple, with everything in a simple, flat
denormalized data structure that's as fast to work with as
conceivable, as well as massively scalable -- perfectly amenable for
NoSQL backends, for instance.
That flat structure lets you extend the basic concept of db relations
to what I call a context (so it has all the functions you need
regardless of specific context, rather than being simply joins of two
tables representing particular entities in the real world).
Seth
Reading more carefully, I should say that if I understand the
distinction I think you're drawing, I think I'm doing both at the same
time: every "node" has the full address for its context with it. But
the outline structure is like a linked list, sort of.
Seth
I'm reading your description as having 2 components,
- a graph of pointers to nodes
- the nodes referenced by the pointers
the graph defines addresses, the nodes are the data
Is this correct?
But
> the outline structure is like a linked list, sort of.
>
>
> Seth
>
>
>> A linked list where every "node" has the context fully specified. And
>> the fact there's only one "tree structure value" per record (the
>> parent key) keeps things simple, with everything in a simple, flat
>> denormalized data structure that's as fast to work with as
>> conceivable, as well as massively scalable -- perfectly amenable for
>> NoSQL backends, for instance.
>>
>> That flat structure lets you extend the basic concept of db relations
>> to what I call a context (so it has all the functions you need
>> regardless of specific context, rather than being simply joins of two
>> tables representing particular entities in the real world).
>>
>>
>> Seth
>
The pointers are the address. More precisely, it is a set of pointers
for standard components that make up an "address" (a set of addresses,
one for each component) that together designate how each node and its
relevant data are being used. The "nodes" (conceived as points in an
outline) are also pointers. But you can at least see an outline at the
level of "nodes."
Let's set aside data for now -- they are stored granularly, per
attribute, and they have a fuller "address" specification.
For now, just looking at "nodes": you have a table, with columns for:

State -- that's three columns:
    Space
    Location
    Standpoint

Context -- that's three columns:
    Use Type
    Link Type
    Use

The above are all "pointers" -- they are unique key values, which here
also include URLs.

And then, a column for what we can call a "node":

    Link
That's also a "pointer" with a key value like the rest, but for now
let's just pretend it's a useful text field like a Leo headline.
You can call the Space and Context keys the "address" of the "node"
which is in the Link column.
There can be many Links, and each of those Links is another record in
this table, with all the same columns. What brings those Links
together into one "outline" is their having common key values in the
State and Context columns.
After that, you have the tree structure fields:

    Parent Link
    Counter/Sibling Order
By your terminology you could call the Parent Link field a "pointer"
to "nodes." Maybe the totality of "nodes" and "pointers" in that
sense is your "graph."
But the "node" -- the Link -- has a fully specified address in the
State and Context columns. It *also* has tree structure pointers in
the Parent Link column.
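Read as a concrete sqlite table, the columns described above might look like this. The column names are a transcription of the prose, not Seth's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE links (
    space       TEXT,    -- State
    location    TEXT,
    standpoint  TEXT,
    use_type    TEXT,    -- Context
    link_type   TEXT,
    "use"       TEXT,
    link        TEXT,    -- the "node" itself (another key/URL)
    parent_link TEXT,    -- tree structure
    counter     INTEGER  -- sibling order
)""")

# Two Links sharing the same State and Context keys form one outline.
rows = [
    ("s1", "l1", "ekr", "leo-file", "leo-node", "LeoPyRef.leo",
     "root-link", None, 0),
    ("s1", "l1", "ekr", "leo-file", "leo-node", "LeoPyRef.leo",
     "child-link", "root-link", 0),
]
conn.executemany("INSERT INTO links VALUES (?,?,?,?,?,?,?,?,?)", rows)
n = conn.execute(
    'SELECT COUNT(*) FROM links WHERE use_type = ? AND "use" = ?',
    ("leo-file", "LeoPyRef.leo")).fetchone()[0]
```

Everything lives in one flat, denormalized table; "what outline am I in" is just a match on the State and Context key columns.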
Seth
My approach is pretty simple-minded, aimed at versioning nodes,
your schema is much richer, over my head.
I just want a snapshot which saves node state to a dict / json object.
It's also intended to be somewhat generic, the Leo specific stuff
is in the ``other`` dict.
================================================
def node2dict(self, node=None, string_address=True):
    """Return a dictionary of some node attributes.

    Communicate outside the Leo world.
    """
    if node is None:
        node = self.c.currentPosition()
    address = self.get_address(node)
    if string_address is True:
        # Convert ints to strings.
        address = self.address2string(address)
    # Some items are To Be Determined.
    return {
        'timestamp': self.gnx2timestamp(node.gnx),
        'type': "TBD",
        # Probably a dict of hashes, e.g.
        # {'name': headhash, 'content': bodyhash, 'location': UNLhash, ...}
        'hash': "TBD",
        'uri': self.fname,
        'other': {
            'headline': node.h,
            'body': node.b,
            'address': address,
            'gnx': node.gnx,
            'key': node.key(),
            'level': node.level(),
            'uA': node.v.u,
        },
    }
================================================
An example of what I call address, this UNL:
UNL: /home/ktenney/work/leotools.leo#leotools.leo-->Sandbox-->using
@data nodes-->@settings-->@string VENV_HOME = /home/ktenney/venv
has address:
address: 0-5-0-0-1
BTW, this "address" syntax can be seen in p.key()
key: 150749356:0.151500012:5.151500300:0.151500492:0.151500844:1
I get that you are grabbing the state, and the json objects that
result will then be stored in something larger that keeps versions. I
feel I can't comment intelligently unless I get the nature of that
address you're using.
Seth
They are a set of child indexes::

    0-5-0-0-1

    root
     \
      0
      1
      2
      3
      4
      5
       \
        0          my address is 0-5-0
         \
          0
           \
            0
            1      my address is 0-5-0-0-1

make sense?
The Leo file can be reconstructed from these addresses:
address 0 is the first root node
address 1 is the second root node
address 0-4 is the fifth child of the first root node
etc.
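Resolving such a chain of child indexes is a short walk down a nested tree. A sketch against a made-up tree shaped like the example above (node names here are invented):

```python
def resolve(tree, address):
    """Follow a dash-separated chain of child indexes to a node name.

    `tree` is a list of root nodes; each node is (name, [children]).
    """
    node = ("root", tree)
    for i in map(int, address.split("-")):
        node = node[1][i]
    return node[0]

tree = [
    ("n0", [  # address 0: the first root node
        ("n0-0", []), ("n0-1", []), ("n0-2", []),
        ("n0-3", []), ("n0-4", []),
        ("n0-5", [
            ("n0-5-0", [
                ("n0-5-0-0", [
                    ("first", []),
                    ("second", []),  # address 0-5-0-0-1
                ]),
            ]),
        ]),
    ]),
]
```

Each address is a self-contained description of where a node lives, so the whole file can be reconstructed from the address list alone.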
Got it. And I gather that with p.key(), this is a Leo feature, where
these chains of child indexes are stored with each node.
It wouldn't be particularly hard to add a Parent attribute, holding
the gnx of the Parent -- but why you would bother, just to solve your
problem, is what you would wonder; and I don't understand everything
that's handled with these chain addresses.
This probably relates to the earlier question of how external files
relate to the db backend. And I imagine that's caught up in how Leo
does that, perhaps using these address chains. I would guess they
would help track what changes have occurred between versions -- in
particular, among external files in a collaborative context. Just
guessing there.
Whereas my concept is about stuff only going on in the database, and
if need be updating external files with the current status of the
database representation.
My approach is about generalizing how to map data/content to URLs (or,
a standard *set* of URL types defining a standardized, generalized
notion of a context) for data that's cloned, distributed and
manipulated in shared outlines. I store tree structure in a minimal,
flat way rather than store arbitrarily complex structures, and then
let tree traversal be done programmatically.
However, assuming that backend, one can certainly add attributes like
"address" holding chain addresses like you use, if those kinds of
addresses are the optimal way to do whatever you're using them for.
Not that that would necessarily be of interest to you in solving your
problem.
When I say address, I'm not saying "where in the tree," but something
more like "exactly how a node or data is being used," so the backend
server can manage the data, keeping it "in scope". That's the kind of
"clone server" it is.
Seth
I created a procedural language that was essentially XBASE-plus, and
part of it was the ability to assign context and state (indeed, any
component of these notions) to variables. This was how you could
navigate. You could log onto a Uniform Context Transfer Protocol
server and type commands like:
GET CONTEXT
(and it would return the current use type, link type, use, etc.)
or:
x = STATE
y = CONTEXT
GET x
GET y
Or to navigate in a context, you could:
SET CONTEXT TO y
GO TOP LINK
do while not eoc()   (endofcontext)
    PRINT LINK
    SKIP LINK
enddo
or you can create the context at runtime:
SET USE TYPE TO someusetypeilike
SET LINK TYPE TO somelinktypeilike
GO TOP USE
z = CONTEXT
None of Rich Hickey's persistent objects, separation of value from
identity, and transaction or state transition management. But it is a
model of distributed state, so by formalizing that, it is at least one
way to start thinking about the things he does with Clojure within a
set definition of distributed state.
Seth
> I am no longer sure what the original intention was ;-)
Outline structure should in future be as easily shared as external @ <file> content is today among the members of a distributed work group.
Are you actually doing state transition management with persistent
objects like what Eoin pointed to re Rich Hickey? Hickey stores the
"diffs" between instances of unique "values" under one "identity" (his
special way of thinking of variables/data structures) over time, as
tree chains like this, holding just the part of the structure that has
changed. This lets him treat "values" as "the whole structure at a
moment of time," which is a useful concept in a concurrent execution
environment, rather than using traditional data structures whose
individual pieces of data could be changed independently by different
processes. Rather than copying the whole structure, he virtualizes
distinct value instances by pointing at "diff" chains like yours for
the part that has changed, plus a pointer to the rest of the original
structure that hasn't changed.
In any case, key chains like you use could be stored like any other
tree in my architecture. That could become an implementation of
unique values under identity a la Hickey, I guess.
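To make the structural-sharing idea concrete, here is a minimal Python sketch (not Hickey's actual implementation): an update returns a whole new "value" tree while pointing at the unchanged parts of the original, so only the changed path is copied:

```python
# Persistent update with structural sharing: `assoc` copies only the
# nodes along `path`; every untouched subtree is shared by reference.

def assoc(node, path, value):
    """Return a new nested-dict tree with `value` set at `path`."""
    if not path:
        return value
    key, rest = path[0], path[1:]
    new = dict(node)            # shallow copy: this level only
    new[key] = assoc(node.get(key, {}), rest, value)
    return new

v1 = {"a": {"x": 1}, "b": {"y": 2}}
v2 = assoc(v1, ["a", "x"], 99)  # a new "value"; v1 is untouched
```

Here `v1` and `v2` are both complete trees "at a moment in time," yet they share the unchanged `"b"` subtree as the very same object.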
I made my system open to diverse blocking approaches -- I'm trying to
remember, but mostly all I recall clearly is that you request
"occasions" from the authoritative host servers of the state in which
you're working -- and I didn't design it as a way to hold outlines as
snapshots in time, as part of an approach to concurrent execution in a
particular way like Hickey does. (My focus tends to be more on
generality than "containment.")
Seth
You are talking way above my head, I'll study this thread and
try to come up with something to contribute. In the mean time, enjoy.
http://www.awordinyoureye.com/jokes83rdset.html
(#1705) The Pope and the Rabbi [Author unknown] Several centuries ago, the Pope
decreed that all the Jews had to convert or leave Italy. There was a huge outcry
from the Jewish community, so the Pope offered a deal. He would have a religious
debate with the leader of the Jewish community. If the Jews won, they could stay
in Italy, if the Pope won, they would have to leave. The Jewish people met and
picked an aged but wise Rabbi Moshe, to represent them in the debate. However,
as Moshe spoke no Italian and the Pope spoke no Yiddish, they all agreed that it
would be a "silent" debate. On the chosen day, the Pope and Rabbi Moshe sat
opposite each other for a full minute before the Pope raised his hand and showed
three fingers. Rabbi Moshe looked back and raised one finger. Next the Pope
waved his finger around his head. Rabbi Moshe pointed to the ground where he
sat. The Pope then brought out a communion wafer and a chalice of wine. Rabbi
Moshe pulled out an apple. With that, the Pope stood up and declared that he was
beaten, that Rabbi Moshe was too clever and that the Jews could stay. Later,
the Cardinals met with the Pope, asking what had happened. The Pope said, "First
I held up three fingers to represent the Trinity. He responded by holding up one
finger to remind me that there is still only one God common to both our beliefs.
Then, I waved my finger to show him that God was all around us. He responded by
pointing to the ground to show that God was also right here with us. I pulled
out the wine and wafer to show that God absolves us of all our sins. He pulled
out an apple to remind me of the original sin. He had me beaten and I could not
continue." Meanwhile the Jewish community were gathered around Rabbi Moshe. "How
did you win the debate?" they asked. "I haven't a clue," said Moshe. "First he
said to me that we had three days to get out of Italy, so I said to him, ‘up
yours!’ Then he tells me that the whole country would be cleared of Jews and I
said to him, we're staying right here." "And then what," asked a woman. "Who
knows?" said Moshe, "He took out his lunch so I took out mine."
Yes. Good joke. BTW, I often feel clueless in these discussions
myself. Furthermore, I often forget that we've *had* these
discussions.
I think that's ok. Leo is not going to change in any "big" way unless
the way forward is so simple and compelling that it will be impossible
to forget: like "webs are outlines in disguise." So far, nothing
remotely that simple has appeared.
Edward
One very simple thing that can be done very easily would be to just
store the Leo data as it is, with no thought of distribution or
collaboration "within the database implementation" -- then you just
store .leo files in the database, produce the external files as you
currently do, and collaborate with the external files the way you do
now. That would create a database backend that could be extended
gradually. As long as it is done in a way that's basically "the same
as" a .leo file, any more fundamental reengineering for distribution
and collaboration would be no more complex than converting from the
model of the Leo file would be in the first place. And in the
meantime, people might bang on the backend in interesting ways while
keeping its compatibility with the Leo app and its file format. If
people show ways of doing distribution and collaboration that way, you
can ponder those without worrying about impact on standard Leo.
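The minimalist backend suggested here could be sketched in a few lines with Python's built-in sqlite3: store the .leo XML unchanged as a row and read it back. The table and column names are assumptions, not an existing Leo schema:

```python
# "Classic Leo" database backend sketch: the .leo file is stored
# as-is, with no thought of distribution or collaboration yet.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE leo_files (name TEXT PRIMARY KEY, xml TEXT)")

leo_xml = "<leo_file><vnodes/></leo_file>"   # stand-in for a real file
con.execute("INSERT INTO leo_files VALUES (?, ?)",
            ("workbook.leo", leo_xml))
con.commit()

(restored,) = con.execute(
    "SELECT xml FROM leo_files WHERE name = ?", ("workbook.leo",)
).fetchone()
```

As long as the stored form round-trips byte for byte, any later reengineering for distribution starts from exactly the same place as converting a .leo file would.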
Seth
Distribution and collaboration *and versioning* -- I forget!
It might be good to start without versioning and all the state stuff,
just do without any added features, and then look at different ways to
do the versioning based on having a "classic Leo" database
implementation in place. Apparently versioning is a key rationale for
going to the database, but maybe you can move forward by just setting
a gold standard for classic Leo first.
> Seth
If the node which currently has focus is unchanged, only the double bars (||) are active;
clicking that button puts the current node in the repository.
If the node is edited, the double bars and the left arrow become active:
clicking <= reverts the node and makes the right arrow active;
|| puts the current version into the repository.
Rinse and repeat.
The backend would be some sort of db, a versioning system would make
it pretty simple. There could be any number of ways to define the node state.
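One possible shape for that node repository, sketched with sqlite: append-only version rows keyed by gnx. The schema and function names are guesses, purely for illustration:

```python
# Per-node versioning sketch: "||" appends a revision, "<=" reads
# back the latest stored revision. Schema is hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE versions (
    gnx TEXT, rev INTEGER, headline TEXT, body TEXT,
    PRIMARY KEY (gnx, rev))""")

def commit_node(gnx, headline, body):        # the "||" button
    (last,) = con.execute(
        "SELECT COALESCE(MAX(rev), 0) FROM versions WHERE gnx=?",
        (gnx,)).fetchone()
    con.execute("INSERT INTO versions VALUES (?,?,?,?)",
                (gnx, last + 1, headline, body))

def revert_node(gnx):                        # the "<=" button
    return con.execute(
        "SELECT headline, body FROM versions WHERE gnx=? "
        "ORDER BY rev DESC LIMIT 1", (gnx,)).fetchone()

commit_node("ekr.123", "my node", "v1 text")
commit_node("ekr.123", "my node", "v2 text")
```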
I don't see @auto files as exceptional, other than the gnx changing.
In my workflow, an @auto node is either a
- class declaration
- method definition
- var declaration
- chunk of doc
In each case, I'd like to have access to versions of the node.
If this were implemented, I think experience would dictate what metadata
was important/useful to store for a node.
And, my interest is not in Leo files which multiple people edit at once; that's
for source code, and a solved problem. I consider my Leo files a reflection of
my personal proclivities in viewing data.
Thanks,
Kent
But for those who are concerned with collaborating, the minimalist
first implementation would either mean everyone uses @shadow files to
reconcile their work outside the database, or use a separate vcs at
the same time.
That is, let multiple users store their own instances of the same
complete Leo file in the database, regardless of their lack of
consistency -- and then have a separate commit process using external
files by someone with the appropriate level of rights, who would store
an additional *authoritative* instance in the database. Those working
with the database would then have to access that version.
As far as I understand external files, I think this means the
authoritative version would be created either using @shadow files, or
a separate vcs. Either all collaborators would have to provide
@shadow files and *not* use another vcs; or all users of the database
would be required to provide their external files to a separate vcs.
This would be clunky, but 1) only clunky for people who want to do
this node-level versioning; and 2) it would be one person doing the
authoritative "reconciliation."
Seth
Oh: and I guess this means you would lose all the versions. Version
history isn't shared in this approach.
Hmm, probably: Never mind . . . :-)
Seth
Seth
Take a look at my earlier code that does this, in collab branch. Also mentioned in this thread.
The SQLAlchemy dependency is unnecessary since the database is simple enough.
Okay, that'll be my next thing. I scanned the thread, and I surmise
you're talking about a "clone server."
Going back to that point in the thread, my next comment would be that
the way to do a "sea of nodes" is exactly with the denormalized flat
table with context (and state) fully specified per "node."
That would also make it possible to store tree structure very simply,
exactly as I have described it, with pointers going up to parents. It
would help, to understand that scheme, to point out that the "nodes"
aren't encapsulated object thingies, but actually what I call them:
links. What one might tend to think of as "contained" in "nodes" --
attributes -- are actually stored separately, in a somewhat similar
flat file with key columns providing a full specification of state,
context, and now link. The "sea of nodes" is stored in the first flat
file. The values we often think of as "contained" in those nodes, and
which may be cloned across multiple instances of the same node in the
sea, are stored as distinct attribute values in the second flat file
I've just mentioned. Every piece of the architecture has a full
specification of its context, state, and for link attributes, the link
("node") and scope of relevance of each attribute, stored with it.
That's the next peek into what I'm talking about. I think if you're
talking about the clone server, it will be interesting to contemplate
how that might be adapted to my highly generalized schema.
Seth
> I read a bit of SQLAlchemy tonight, and it does seem promising. Makes
> working with data as objects easy, and seems to promise to do the
> versioning.
Ville's schema is worth looking at. Also the Django ORM is neat,
although SQLAlchemy might make a regular python classes based framework
easier, not sure.
I think there are many places where Leo does something similar to
for n in c.unique_nodes():
blah blah
which might be an issue for DB based outlines if the goal was to have
massive outlines - Leo doesn't do much lazy evaluation.
Even for outlines only as large as the Leo code base, for example, and
even with a DB on the local machine, I'd expect some performance hit,
although RAM disk buffering can help a lot of course.
Cheers -Terry
I will, somewhat soon . . .
> I think there are many places where Leo does something similar to
>
> for n in c.unique_nodes():
> blah blah
>
> which might be an issue for DB based outlines if the goal was to have
> massive outlines - Leo doesn't do much lazy evaluation.
>
> Even for outlines only as large as the Leo code base, for example, and
> even with a DB on the local machine, I'd expect some performance hit,
> although RAM disk buffering can help a lot of course.
My approach would scale globally very easily, but yes, traversal might
get intensive in some contexts. I think optimizing that could be done
locally and/or in the interface rather than the backend -- no reason
why you couldn't just do the massive query, then build a list of child
nodes pointer structure locally on that "cursor set."
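Building that local child-node pointer structure from the flat query result is a one-pass grouping; a sketch with illustrative sample rows:

```python
# Group flat (node_key, parent_key) rows from one "massive query"
# into a local children map. Sample data is made up for illustration.

rows = [
    ("D", "X"), ("E", "X"),
    ("N", "D"), ("N", "E"),      # N cloned under both D and E
    ("A", "N"), ("B", "N"), ("C", "N"),
]

children = {}
for node, parent in rows:
    children.setdefault(parent, []).append(node)
```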
In the ancient world, the good designs supported bottomless sized
structures, by just moving a window through it, swapping pieces into
memory and on the display. Plus, shockingly enough: my system
actually allows the return of record-oriented db approaches -- it
solves the problem of how to work with arbitrarily complex relational
structure across networks (the basic technical issue that really
killed dBASE and settled us on SQL) -- you can navigate through
"tables" and "relations" record-by-record with all the facility with
which one used to do it using dBASE on the old 8 bit office desktop.
The horror!: database development for the masses again, with a
BASIC-like language, now all over the net! :-)
Seth
> Cheers -Terry
And note: using an index on the parent key seems to be a very
effective way to do the traversal -- but I haven't tried it on massive
outlines.
Seth
And using parent pointers and an index, there's no reason why you
can't "window through" a massive tree, so getting the massive query
isn't even necessary. Even starting at an arbitrary node in the tree,
you can quickly trace up to the top, getting the path from there to
the branch you're on, check whether your node is the first child of
your immediate parent, get any other coordinate child nodes if
necessary -- and then just traverse down x rows from there, just
enough to fill a local memory buffer. Rinse and repeat when you page
down. Easy. Paging up/going backwards would be more tricky, but the
algorithm for limiting how far backward you have to go to make sure
you go only a minimal amount necessary to get the previous page's
worth is a pretty simple one in my mind.
I don't know how and where to get this collab branch. All I found was
this: https://code.launchpad.net/~leo-editor-team/leo-editor/at-shadow-collaboration
Seth
Sorry, contrib branch.
https://code.launchpad.net/~leo-editor-team/leo-editor/contrib
Especially:
http://bazaar.launchpad.net/~leo-editor-team/leo-editor/contrib/files/head:/Projects/leoq/
http://bazaar.launchpad.net/~leo-editor-team/leo-editor/contrib/files/head:/Projects/leoqviewer/
(latter is "reimplementation" of Leo for mobile phones, in C++, QML
and sqlite, so the first link is probably easier to swallow ;-).
The code to dump the outline to a sql database (and create the database) is here:
You can put it in a node and press ctrl+b.
Are you saying this code stores everything that's needed in a
production Leo file?
And if anybody has time, I'd like to hear why edges are stored
separately (as an entity of its own stored in a separate physical
memory structure) -- either in Leo or in general. In Leo would be an
easier way to answer that question. Is it, in the general sense of
what graph databases are for, basically to cover two-way-ness and
many-to-many situations? If so, I don't see outlines as many-to-many.
One could say that's the point of trees.
If you've got topmost nodes, and that makes sense in the sense that
there are no cycles, you're not modeling a general graph "sea of
nodes" idea. If you're trying to extend the possibilities to
something like that, that's not really Leo (as I understand it).
Maybe you just introduce limits or constraints in various ways.
Two-way-ness as a technique for details of implementation (optimizing
algorithms) might make sense, but many-to-many-ness isn't what you're
modeling when you model outlines.
I wish someone could tell me the functional role that edges play in
Leo code, how Leo is implemented using them. Since as far as I can
tell, they're just pointers, I suspect edges are really only important
with reference to details of a visual interface.
If edges are important, and if I'm right that that has to do with
two-way-ness (or many-to-many-ness, though that seems like it would be
apparent that it's not relevant), then I suspect those details can and
will be sloughed away in the code as well as the architecture. I
don't see a need to have edges except in the sense of pointer/key
values stored in the data architecture.
All of that said, I suppose I could just be wrong and clueless about
what will be gained by the node-and-edges graph model.
I see a node as just a way of having data brought together into a
position in a tree. It will have a set of values associated with it
that you want to organize in groups of other similar nodes under
parent nodes that make some sense out of the groupings, acting as a
heading.
Seth
One advantage of this is that the same sql file could be used by a
program that expects to deal with arbitrary graphs (e.g. something that
emits graphviz source notation for plotting).
In Leo in-memory representation, "children" is just a list of pointers
to children, but this is not possible in SQL.
My schema is not full representation yet, it lacks uA's - because my
C++/QML/Javascript implementation doesn't do anything with them.
Adding uA's is trivial, I'd probably add a new table UA that links
nodeid to a string key and a string value (or BLOBS entry for large
binary uA data). If we decide to add this feature to leo proper, we'll
add the uA dumping too.
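The UA table described here might look like the following sketch; the table and column names are my guesses at what "links nodeid to a string key and a string value" would mean concretely:

```python
# Hypothetical UA table: one (nodeid, key, value) row per user
# attribute. A BLOB column could be added later for large binary uAs.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE ua (
    nodeid INTEGER, key TEXT, value TEXT,
    PRIMARY KEY (nodeid, key))""")

con.execute("INSERT INTO ua VALUES (?,?,?)", (42, "icon", "star.png"))
con.execute("INSERT INTO ua VALUES (?,?,?)", (42, "color", "red"))

# Reassemble one node's uAs as a plain dict.
uas = dict(con.execute("SELECT key, value FROM ua WHERE nodeid=?", (42,)))
```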
As long as you keep it a generic graph, what's going to be done with
it by users or developers may become more complex than may be optimal
according to some criterion or other. As a developer, you may find
that you will keep it generic (as in a generic graph) until you decide
what you're going to do with it. Generic in the sense of representing
any arbitrary diagram is different from generic in the sense of, say
"all file formats are outlines." ( :-) ) So you may either optimize
in light of function, or keep it general in service of the prospect of
people doing arbitrary kinds of graph analysis. If you manage by
force of discipline to keep it a generic graph despite the functions
you're developing, then that may in itself be the source of a tradeoff
against various criteria, including speed.
I would stress the simplicity and interoperability that comes from a
generalized formal specification -- it creates a structure that both
supports any kind of outline context (or file format) and provides
terms that let you work with all "files" with common understanding and
interoperability that comes from having such a generalized formal
schema.
That it is all in flat denormalized files should tell you a lot about
both speed and scalability, but I am not situated to offer any kind of
quantification for that.
> One advantage of this is that the same sql file could be used by a
> program that expect to deal with arbitrary graphs (e.g. something that
> emits graphviz source notation for plotting).
>
> In Leo in-memory representation, "children" is just a list of pointers
> to children, but this is not possible in SQL.
. . . well, not possible except in the sense that you can get a set of
records all pointing to the same parent.
> My schema is not full representation yet, it lacks uA's - because my
> C++/QML/Javascript implementation doesn't do anything with them.
> Adding uA's is trivial, I'd probably add a new table UA that links
> nodeid to a string key and a string value (or BLOBS entry for large
> binary uA data). If we decide to add this feature to leo proper, we'll
> add the uA dumping too.
Anybody can add an attribute at any time to my schema, and the schema
doesn't then get any more complex in physical form by the supposed
need to collect sets of relevant attributes into a separate unit (a
table, or a "kind" of node such as a Leo node or some Leo user's
custom type of node) representing some entity to which they are
relevant. All attributes and their values for all contexts and
"nodes" go together in the same flat file, regardless of what context
they're being used in. They have scopes of relevance -- and those are
broader than just particular entities they relate to -- but they are
physically all the same. You can store as complex a logical structure
as you like, but the physical structure remains the same. One
standard set of files with one generic structure, that holds all
formats and all relations between entities. No compounding complexity
in the physical structure ever, though users can go as crazy as they
please logically.
Universal interoperability for free. All file formats are outline
contexts! :-)
> On Wed, Dec 28, 2011 at 5:56 PM, Ville M. Vainio <viva...@gmail.com> wrote:
>> My scheme indeed represents a generic graph. While Leo is a limited
>> graph (DAG), there is no reason to reflect that limitation in the
>> data structure - unless there is a proof that an alternative is
>> faster.
>
>
> As long as you keep it a generic graph, what's going to be done with
> it by users or developers may become more complex than may be optimal
> according to some criterion or other. As a developer, you may find
Just like the .leo format, normal developers will never see the sql
schema. You can read in this sql in one swoop, yielding the exact same
memory structure we have now.
> That it is all in flat denormalized files should tell you a lot about
> both speed and scalability, but I am not situated to offer any kind of
> quantification for that.
Flat files are sort of off-topic to SQL discussion, I guess...
>> In Leo in-memory representation, "children" is just a list of pointers
>> to children, but this is not possible in SQL.
>
>
> . . . well, not possible except in the sense that you can get a set of
> records all pointing to the same parent.
Pointing at parent doesn't work in Leo because of clones. This is why
Leo outline is more of a DAG than a simple tree, a node can have
multiple parents if it's cloned somewhere in the outline.
> Anybody can add an attribute at any time to my schema, and the schema
Yup, adding uA's to my schema would allow arbitrary key-value pairs as
well. So far, I chose to make the hard "physical" structure a hard
coded part of the schema, I have a hunch it's faster than hiding the
info behind key-value data. Esp. the "edges" table is very fast to
scan even without index, because it's all integer data that can be
slurped in to memory (possibly fitting in cpu cache), and scanned in
linear fashion.
Okay, it sounds like regardless of the functions you're supporting,
you're planning to keep it a generic graph. My point is that keeping
it a generic graph that way may also bring tradeoffs.
>> That it is all in flat denormalized files should tell you a lot about
>> both speed and scalability, but I am not situated to offer any kind of
>> quantification for that.
>
> Flat files are sort of off-topic to SQL discussion, I guess...
No reason you can't query a flat file; just no joins involved, is all.
I should probably have said flat table. It's basically a fact table,
like for star schemas, where when you do joins, you just don't go more
than one level out from the center, and it's very fast and flexible
for queries. The flat fact table just holds the central keys, the
state and context for each "node" or attribute value.
>>> In Leo in-memory representation, "children" is just a list of pointers
>>> to children, but this is not possible in SQL.
>>
>>
>> . . . well, not possible except in the sense that you can get a set of
>> records all pointing to the same parent.
>
> Pointing at parent doesn't work in Leo because of clones. This is why
> Leo outline is more of a DAG than a simple tree, a node can have
> multiple parents if it's cloned somewhere in the outline.
That it's directed doesn't mean you can't do clones with parent
pointers -- except that within the model of a graph scheme one tends
to think of nodes like objects, rather than modeling "nodes" as
metadata for the values associated with them. (And the convention is
to point toward children, I assume.) There's no reason you can't have
the same key value designating the same "node" in more than one place
in the outline, pointing up to a different parent in each case; the
values associated with that key value aren't "contained" in it like a
node -- they're just *relevant* everywhere that "node" is used.
>> Anybody can add an attribute at any time to my schema, and the schema
>
> Yup, adding uA's to my schema would allow arbitrary key-value pairs as
> well. So far, I chose to make the hard "physical" structure a hard
> coded part of the schema, I have a hunch it's faster than hiding the
> info behind key-value data. Esp. the "edges" table is very fast to
> scan even without index, because it's all integer data that can be
> slurped in to memory (possibly fitting in cpu cache), and scanned in
> linear fashion.
Moving from the particular physical structure you're using is
something you might do in order to virtualize it in service of
implementing useful generic functions that would be available to any
kind of outline. That would involve pointers. You may never want to
do that sort of thing, just do different kinds of operations on a
graph scheme. But watch for when you start adding levels of
indirection (in the form of pointers/key values) in the service of
generality. That's the impetus I've generalized into a universal data
architecture.
Graph schemes have their own kind of simplicity. In databases you
denormalize in order to eliminate the complexity of relational
structures, and that means create a wide, flat fact table with key
value columns for every entity you want to work with.
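A wide, flat fact table of the kind described might be sketched like this; the column names (state, context, link, attribute, value) follow the terminology used in this thread and are illustrative only:

```python
# Denormalized fact table sketch: one row per attribute occurrence,
# fully specified by its key columns. No joins needed for queries.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE facts (
    state TEXT, context TEXT, link TEXT,
    attribute TEXT, value TEXT)""")

con.executemany("INSERT INTO facts VALUES (?,?,?,?,?)", [
    ("draft", "outline1", "N", "headline", "My node"),
    ("draft", "outline1", "N", "body", "some text"),
])

# Every query filters the one flat table by its key columns.
row = con.execute(
    "SELECT value FROM facts WHERE link='N' AND attribute='headline'"
).fetchone()
```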
Seth
> No reason you can't query a flat file; just no joins involved, is all.
Yeah, and you would have to implement an SQL engine yourself ;-).
>> Pointing at parent doesn't work in Leo because of clones. This is why
>> Leo outline is more of a DAG than a simple tree, a node can have
>> multiple parents if it's cloned somewhere in the outline.
>
>
> That it's directed doesn't mean you can't do clones with parent
> pointers -- except that within the model of a graph scheme one tends
> to think of nodes like objects, rather than modeling "nodes" as
> metadata for the values associated with them. (And the convention is
> to point toward children, I assume.) There's no reason you can't have
> the same key value designating the same "node" in more than one place
> in the outline, pointing up to a different parent in each case; the
> values associated with that key value aren't "contained" in it like a
> node -- they're just *relevant* everywhere that "node" is used.
Not sure whether I'm following you, but if you want to have "parent
pointer + child index" representation, you would re-introduce the
already killed concept of vnodes and tnodes to support clones.
Can you paste your schema here, since I may be misunderstanding
something? What I have is just a bog standard graph representation I
found on the internet, as I'm averse to inventing solutions to
problems for which a simple solution already exists.
Separate edge section was suggested for next gen .leo xml file format
as well - it would allow representing soft links (that can be cyclic!)
directly in the graph section as first class citizens, instead of the
uA solution that is used now.
Huh? Why? If you're talking about tree traversal, I've addressed
that elsewhere in this thread. But querying a flat table in SQL . . .
what are you going on about?
>>> Pointing at parent doesn't work in Leo because of clones. This is why
>>> Leo outline is more of a DAG than a simple tree, a node can have
>>> multiple parents if it's cloned somewhere in the outline.
>>
>>
>> That it's directed doesn't mean you can't do clones with parent
>> pointers -- except that within the model of a graph scheme one tends
>> to think of nodes like objects, rather than modeling "nodes" as
>> metadata for the values associated with them. (And the convention is
>> to point toward children, I assume.) There's no reason you can't have
>> the same key value designating the same "node" in more than one place
>> in the outline, pointing up to a different parent in each case; the
>> values associated with that key value aren't "contained" in it like a
>> node -- they're just *relevant* everywhere that "node" is used.
>
> Not sure whether I'm following you, but if you want to have "parent
> pointer + child index" representation, you would re-introduce the
> already killed concept of vnodes and tnodes to support clones.
>
> Can you paste your schema here, since I may be misunderstanding
> something? What I have is just a bog standard graph representation I
> found on the internet, as I'm averse to inventing solutions to
> problems for which simple solution already exists.
Your aversion would likely stand in the way of appreciating it then.
I would have to get it off an old hard drive. I wonder if I zipped it
up and put it online somewhere. I will dig out an SQL tool and create
the main link table, link attribute, and link attribute value tables,
from memory. The indexes are done in dBASE, and are rather unique so
I won't reconstruct them.
This is better anyway, to describe the system gradually.
However, I've drawn the structure of the main "node" outline table
above in this thread. Attributes and their values are stored in
separate tables with the same key values, now including the "node's"
key (and the attribute and the attribute value).
Seth
> Separate edge section was suggested for next gen .leo xml file format
> as well - it would allow representing soft links (that can be cyclic!)
> directly in the graph section as first class citizens, instead of the
> uA solution that is used now.
>
>>> No reason you can't query a flat file; just no joins involved, is all.
>>
>> Yeah, and you would have to implement sql engine yourself ;-).
>
>
> Huh? Why? If you're talking about tree traversal, I've addressed
> that elsewhere in this thread. But querying a flat table in SQL . . .
> what are you going on about?
Naive assumption is that database == sql database, unless explicitly
stated otherwise (e.g. "object database", nosql database, etc...). If
we venture outside standard databases, we are talking about custom
binary formats more than anything else - and that's a whole different
conversation that is not of interest to me at this time (seems like
premature optimization when both xml and sqlite seem to perform
acceptably).
Overall, threads like this are too long and abstract for me to read. I
trust that relevant bits get summarized under new subject line, as is
usually the case :).
Many people do star schemas in SQL; it's just a design for a
particular kind of use -- data mining. I'm an old school dBASE coder.
Everything I'm talking about here is traditional databases, whether
dBASE or SQL. When I say it's compatible with big data, I mean it
being flat tables makes it optimal for putting in those environments.
The tree traversal I do is the most esoteric thing we're talking
about, and it's done using dBASE indexes, aside from the general idea
of universal outline contexts.
Seth
I recreate the structure in leoqviewer, but this is C++ code. A dumb algorithm to create the tree is easy to do for python Leo as well, but if I write full file format support I'd like to do it the fast way: first create nodes, then put them in their place in the graph, in just 2 passes.
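The two-pass load is easy to sketch in Python; the row shapes below are made up for illustration:

```python
# Two-pass tree construction: pass 1 creates all nodes, pass 2 wires
# parent/child links, so edge order never matters.

nodes_rows = [(1, "root"), (2, "child a"), (3, "child b")]
edges_rows = [(1, 2), (1, 3)]            # (parent_id, child_id)

class Node:
    def __init__(self, h):
        self.h = h
        self.children = []

# Pass 1: create every node, keyed by id.
by_id = {nid: Node(h) for nid, h in nodes_rows}

# Pass 2: put each node in its place in the graph.
for parent_id, child_id in edges_rows:
    by_id[parent_id].children.append(by_id[child_id])
```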
> I am able to recreate nodes from the sqlite db but I do not fully
> understand how your edges scheme can be used to recreate a tree
> design. It seems easier to simply give the db a field that tells
> whether a node is a child of another node and if so, which one. Am I
> missing something?
- Because of clones, a node can have several parents. Therefore, a
single slot for parent id won't work
- Even if you were able to list all parents (which SQL does not
allow), you would have to specify the child index for every parent (to
retain sibling order)
Given the list of edges (a,b) :
If you want all children of node N, list edges where N is the 'a'
node, and store the 'b' node from every edge. To find parent(s) of N,
list edges where N is the 'b' node.
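Those two lookups translate directly into SQL; a sketch with sqlite, where the `idx` column (an assumption on my part) preserves sibling order:

```python
# Edges (a, b) lookups: children of N are rows where a = N,
# parents of N are rows where b = N. Clones fall out naturally.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (a INTEGER, b INTEGER, idx INTEGER)")
con.executemany("INSERT INTO edges VALUES (?,?,?)", [
    (1, 2, 0), (1, 3, 1),    # node 1 has children 2, 3
    (4, 3, 0),               # node 3 is cloned under node 4 too
])

children = [b for (b,) in con.execute(
    "SELECT b FROM edges WHERE a=? ORDER BY idx", (1,))]
parents = [a for (a,) in con.execute(
    "SELECT a FROM edges WHERE b=?", (3,))]
```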
> Given the list of edges (a,b) :
>
> If you want all children of node N, list edges where N is the 'a'
> node, and store the 'b' node from every edge. To find parent(s) of N,
> list edges where N is the 'b' node.
In C++ & SQL:
I hope you do implement both phases: the full format export, and
putting them into the outline. Leo's functionality seems so intricate
and obscure that I feel I can only deal with it when I know how the
full thing goes in and out.
I hope to plug in my old hard drive tonight and get the app I built on
the schema I'm describing off of it. I promise not to carp over
abstract theory -- having the schema (and dBASE code!) will help a
lot.
Seth
Two hard drives, both seem dead using my handy USB cable - IDE thingy.
Maybe later I'll try swapping them in on my mom's tower, which is a
bigger hassle. Or create it in SQL . . .
One can use two indexes instead of having an edges entity:
To find all children of node N, seek N in an index on the parent key
field. Skip through until it doesn't match.
To find all parents of node N, seek N in an index on the node key
field. Skip through (reading the parent key field) until it doesn't
match.
Seth
> On Thu, Dec 29, 2011 at 6:37 PM, Ville M. Vainio <viva...@gmail.com> wrote:
>>
>> Given the list of edges (a,b) :
>>
>> If you want all children of node N, list edges where N is the 'a'
>> node, and store the 'b' node from every edge. To find parent(s) of N,
>> list edges where N is the 'b' node.
>
>
> One can use two indexes instead of having an edges entity:
>
> To find all children of node N, seek N in an index on the parent key
> field. Skip through until it doesn't match.
This doesn't work if N is cloned somewhere, i.e. N has several parents.
Node key - Parent key
A - N
B - N
C - N
D - X
E - X
N - D
N - E
X - D - N - A
          \ B
          \ C
  \ E - N - A
          \ B
          \ C
To find all children of node N, seek N in an index on the parent key
field. Skip through until it doesn't match.
Node key - Parent key
A - N
B - N
C - N
To find all parents of node N, seek N in an index on the node key
field. Skip through (reading the parent key field) until it doesn't
match.
Node key - Parent key
N - D
N - E
Seems to work . . .
Seth
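Seth's seek-and-skip walkthrough above can be sketched in sqlite, which builds the same kind of access path from a B-tree index. This is a minimal illustration only; the table, index and column names here are made up for the example, not taken from any real Leo schema:

```python
import sqlite3

# In-memory table holding one (nodekey, parent) row per link, as in the
# example data above; N is cloned under both D and E.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE links (nodekey TEXT, parent TEXT)")
con.execute("CREATE INDEX parentindex ON links (parent)")  # for finding children
con.execute("CREATE INDEX nodeindex ON links (nodekey)")   # for finding parents
con.executemany("INSERT INTO links VALUES (?, ?)",
                [("A", "N"), ("B", "N"), ("C", "N"),
                 ("D", "X"), ("E", "X"), ("N", "D"), ("N", "E")])

# "Seek N in the index on the parent key field, skip until it doesn't match":
kids = [k for (k,) in con.execute(
    "SELECT nodekey FROM links WHERE parent = ? ORDER BY rowid", ("N",))]

# "Seek N in the index on the node key field, reading the parent key field":
pars = [p for (p,) in con.execute(
    "SELECT parent FROM links WHERE nodekey = ? ORDER BY rowid", ("N",))]

print(kids)  # ['A', 'B', 'C']  -- the clone N has one child list
print(pars)  # ['D', 'E']       -- and both of its parents are found
```

Cloning is no problem here: each parent of N is just another row, which is exactly why this "nodes" table is really an edges table, as noted further down the thread.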
I don't really understand this. What do you mean by 'index'? In an rdbms, an index can only contain data that can be trivially derived from the tables; everything needs to work without an index as well.
I do some clever index stuff that I think only the old XBASE
environments allow -- but I'm surprised that this example doesn't seem
legit -- it's only using two simple one-field indexes.
I suspect this is set orientation at work: SQL makes you work with
query response sets, rather than navigating around record-by-record.
Here's simple XBASE code that does the above. The indexes are not
complex for this example:
USE nodes
INDEX ON nodes->parent TO parentindex
INDEX ON nodes->nodekey TO nodeindex
x = "N"
SET INDEX TO parentindex
SEEK x
DO WHILE nodes->parent = x        && compare the field, not the index name
   ? nodes->nodekey, nodes->parent
   SKIP
ENDDO
SET INDEX TO nodeindex
SEEK x
DO WHILE nodes->nodekey = x
   ? nodes->nodekey, nodes->parent
   SKIP
ENDDO
Seth
or more to the point: nodes as encapsulated objects vs. nodes
represented in more than one record
I think that's the difference, anyway.
Seth
This is a nodes table, two columns/fields:
> Node key - Parent key
> A - N
> B - N
> C - N
> D - X
> E - X
> N - D
> N - E
This is the tree represented by the above records:
> X - D - N - A
>           \ B
>           \ C
>   \ E - N - A
>           \ B
>           \ C
(eom)
So your nodes table is essentially my EDGES table :)
Ah! Will check that, but I think you're probably right. And your
nodes table would be more appropriately identified with my attributes
table(s).
The "bottom level" for me is the individual attribute values. You
might call my entire structure "edges gone wild." I don't have
encapsulation at the node level.
Maybe this will let me match it up with Leo format. Would be good to
see some sort of "certified" db representation of Leo, with code that
gets it in and out.
Seth
I read the whole thread on a db-oriented version of Leo. I got lost in the implementation details (which is a shame... :-/), so I just want to say that what Alia points out here, a single db file containing all the data (external or not) of a project, plus collaboration for free, is already possible working with Leo + Fossil, without needing to reinvent fossil, as Edward fears, but just by talking to it more explicitly from Leo. That's what my team and I are doing. It is a newbie non-programmer's approach, but it is working for us. This is how we collaborate using Leo + Fossil on a project called "The Project":
We create a folder called "TheProject" with the following structure:
TheProject/
|_ theProject.fossil
|_ collaborator1TheProject.leo
|_ collaborator2TheProject.leo
   :
|_ collaboratorNTheProject.leo
|_ Folder1/
   |_ file1-1.abc
   |_ file1-2.xyz
      :
   |_ file1-n.xyz
|_ Folder2/
   |_ ...
   :
|_ FolderN/
   |_ ...
There is still a lot of shared convention needed to make this work. The idea is that TheProject contains .leo files which are _personal views_ of each collaborator on the data of the project, usually scattered across external files inside the same folder. theProject.fossil is each collaborator's personal repo, containing the history of the collaboration and of the files. We don't need to keep each collaborator's versions of the external files in the repo, because putting this data into conversation is fossil's job, using a central repository or a p2p fashion (if the Internet is not working but we still have an Intranet, or in the first stages of a project when there is no central repository yet). Each of us works on the external data or in a personal Leo view, as fits each one's workflow, and uses fossil for coordination. We have abandoned, for the moment, the idea of shared Leo trees as the shared understanding of a project, but we can see other people's trees when we want, and use the Nav button to locate particular info and see its context in somebody else's personal view. We use some ideas from GoboLinux[1] about having a personal convention for organizing files, symlinked to canonical Gnu/Linux trees.
[1] http://en.wikipedia.org/wiki/GoboLinux
This convention and hierarchy let us carry the project and its history in a single folder, so it can be seen as plain data or as a fossil repository. There is some replication of information, since it lives both in the fossil repo and in the TheProject folder, but having both views, from the point of view of the repo or of the plain files, justifies the redundancy. We still get a lot of portability by carrying just the one folder and its contents, or just the repo.
Convention over configuration was our choice for now, but we imagine a more automagical world. At first we wanted to create buttons and commands inside Leo for all the external fossil operations, but seeing this discussion I now imagine further levels of conversation between fossil and Leo. The idea of a sea of nodes in a Leo database seems a lot like the idea of a sea of objects in Smalltalk images. And this idea of emergence in the sea of data seems better suited to a NoSQL database; fortunately, Fossil is already a NoSQL database[2], one which supports external files, collaboration and portability.
[2] http://www.sqlite.org/debug1/doc/trunk/www/theory1.wiki
So the question is: how can the conversation between Fossil and Leo be increased to solve some of the problems addressed in this thread? My first idea was to use the DAG[3] support in Fossil to map DAGs in Leo, so Fossil would have a kind of special tree which is not for tracking the project timeline[4] but for Leo trees; the second was to use a NoSQL sea of nodes implemented in the NoSQL fossil database. As I said, I don't have proper knowledge of the implementation details, so I will hold these ideas until a more knowledgeable person tells me more about the implementation, or until I have a proper context to play with them.
[3] http://www.sqlite.org/debug1/doc/trunk/www/branching.wiki
[4] http://www.sqlite.org/debug1/timeline
Cheers,
Offray
On 12/26/11 08:19, Alia K wrote:
>
> So, just speculating out loud, let's assume the following:
>
> - the db maintains leo tree state: node content, structure and
> versions thereof which are saved by default (until an explicit 'flush'
> or 'shrink' command does away with prior or unneeded versions)
>
> - all external files are just rendered views of a particular
> (versioned) state of the leo db. i.e. filesystem objects are generated
> on demand.
>
> - the leo ui gives the user the option to save, view and edit versions
> of leo nodes (and their head + body data)
>
> - the db fulfils the part of @shadow sentinel files
>
> If we are using sqlite, we would therefore get one file (a leo project
> file) which carries within it (if unshrunk) a versioned history of all
> the changes made to all nodes in terms of content and structure.
>
> If we are using a networked RDBMS (e.g mysql, postgres, sqlserver,
> oracle, etc.), we get multi-user leo projects for free (because each
> change to the leo node structure is saved on the server and references
> a particular user.)
> This means that the sql data model for leo projects should be able to
> capture all changes by multiple users to all aspects of the project
> data and structure.
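Alia's quoted design (versioned node content, structure, per-user changes, a 'flush'/'shrink' command) could be captured in a minimal sqlite schema along these lines. Every table and column name here is hypothetical, a sketch of the idea rather than any existing Leo or Fossil schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- One row per saved version of a node's content (head + body).
CREATE TABLE node_versions (
    gnx     TEXT,      -- stable node identity
    version INTEGER,   -- increases with each change to this node
    head    TEXT,
    body    TEXT,
    author  TEXT,      -- references a particular user, for multi-user backends
    PRIMARY KEY (gnx, version)
);
-- One row per parent/child link; childindex preserves sibling order.
CREATE TABLE structure (
    parent_gnx TEXT,
    child_gnx  TEXT,
    childindex INTEGER
);
""")

def shrink(con):
    # Alia's 'shrink': do away with all but the latest version of each node.
    con.execute("""
        DELETE FROM node_versions
        WHERE EXISTS (SELECT 1 FROM node_versions nv
                      WHERE nv.gnx = node_versions.gnx
                        AND nv.version > node_versions.version)""")

con.execute("INSERT INTO node_versions VALUES ('g1', 1, 'h', 'old body', 'alia')")
con.execute("INSERT INTO node_versions VALUES ('g1', 2, 'h', 'new body', 'alia')")
shrink(con)
print(list(con.execute("SELECT version, body FROM node_versions")))
```

Rendering an external file on demand would then be a query over the latest versions plus the structure table, and the unshrunk history is exactly the versioned record Alia describes.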
I think the single thing that will best get db-oriented Leo going will
be someone just writing code that stores complete Leo files in a
database of any kind at all. Code to put that database into the
outline interface app would complete the picture for lots of people to
start providing running code that does collaboration and versioning.
That would, for instance, make it easy for you or Alia to demonstrate
what Fossil can do.
Maybe the thing to do is to think in terms of a "reference
implementation" of the basic, non-collaborative and non-versioning
classic Leo, and embrace the notion that many backends could be
developed for it. That "reference implementation" could add functions
for collab or versioning once someone has shown how it could be done.
Different approaches would likely have different characteristics --
something that is designed for concurrent execution a la Rich Hickey
concepts will behave differently from other approaches, but we'd see
the implications of each approach while each solution would have to at
least meet the reference implementation requirements before we'd see
it as "mature."
I think your post was easily the most constructive one in this entire thread.
Seth
On 01/03/12 00:51, Seth Johnson wrote:
>
> I think the single thing that will best get db-oriented Leo going will
> be someone just writing code that stores complete Leo files in a
> database of any kind at all. Code to put that database into the
> outline interface app would complete the picture for lots of people to
> start providing running code that does collaboration and versioning.
>
> That would, for instance, make it easy for you or Alia to demonstrate
> what Fossil can do.
>
> Maybe the thing to do is to think in terms of a "reference
> implementation" of the basic, non-collaborative and non-versioning
> classic Leo, and embrace the notion that many backends could be
> developed for it. That "reference implementation" could add functions
> for collab or versioning once someone has shown how it could be done.
> Different approaches would likely have different characteristics --
> something that is designed for concurrent execution a la Rich Hickey
> concepts will behave differently from other approaches, but we'd see
> the implications of each approach while each solution would have to at
> least meet the reference implementation requirements before we'd see
> it as "mature."
>
> I think your post was easily the most constructive one in this entire thread.
>
Thanks. About implementation, my idea is to start with a different approach. The idea is to "teach fossil to Leo", giving Leo default support for versioning and collaboration, so the convention I showed for working together in a collaborative p2p fashion would be supported automatically by Leo + Fossil. Adding an external file to Leo would add it to the Fossil repo, all the commands for working with Fossil would be supported inside Leo, and so on. Then I will try to deconstruct the Leo data structure, if needed, so it can be supported by the NoSQL database or by the DAG of Fossil. So the idea is to have a particular implementation of collaboration + versioning that may later be abstracted to work with more approaches.
Cheers,
Offray
But if there was a database representation of the Leo document
already, with code for saving that plus putting it back into Leo,
wouldn't that give you the information and/or understanding you need
to demonstrate your approach? Including seeing where in the code to
teach Leo? My impression is the only thing that keeps people from
going ahead with what you propose, or demo'ing any approach at all, is
uncertainty about having an adequate model that works with Leo's
intricacies.
Seth
> My impression is the only thing that keeps people from
> going ahead with what you propose, or demo'ing any approach at all, is
> uncertainty about having an adequate model that works with Leo's
> intricacies.
I think that's the case. @<file> handling is complex, I'm not sure how
familiar with it Ville is, I've poked around the edges a bit, but don't
know how it would interact with loading to / from a DB.
And even if that wasn't an issue, the next step beyond using a DB to
replace the file system would also be complex, whether it was sharing
or whatever. Well, perhaps versioning wouldn't be so hard, but
sharing is certainly challenging.
Cheers -Terry
> I think that's the case. @<file> handling is complex, I'm not sure how
> familiar with it Ville is, I've poked around the edges a bit, but don't
> know how it would interact with loading to / from a DB.
My code just dumps everything, including the stuff under @<file> nodes, to the db. This is by design: I want to ship .leoq as a standalone file that you can send to your phone in email, or whatever.
Not writing @file nodes is easy: just stop traversal when you see one. Likewise, reading back in is easy: just expand the whole tree and afterwards do the @file node handling for the whole tree.
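Ville's "stop traversal when you see one" is a small pruning rule. Here is a sketch using a toy tree rather than Leo's real position API; the `Node` class and its `headline`/`children` attributes are stand-ins for illustration only:

```python
class Node:
    # Toy stand-in for a Leo node; real code would walk Leo's positions.
    def __init__(self, headline, body="", children=None):
        self.headline = headline
        self.body = body
        self.children = children or []

def dump(node, rows, skip_at_file=True):
    """Flatten a tree into rows, optionally pruning @file subtrees."""
    rows.append((node.headline, node.body))
    if skip_at_file and node.headline.startswith("@file"):
        return  # the external file already owns this subtree
    for child in node.children:
        dump(child, rows, skip_at_file)

tree = Node("root", children=[
    Node("@file spam.py", children=[Node("class Spam")]),
    Node("notes"),
])
rows = []
dump(tree, rows)
print([h for h, _ in rows])  # ['root', '@file spam.py', 'notes']
```

With `skip_at_file=False` the same walk dumps everything, which is the standalone-.leoq behavior Ville describes; reading back in is the inverse walk plus @file handling at the end.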
> And even if that wasn't an issue, the next step beyond using a DB to
> replace the file system would also be complex, whether it was sharing
> or whatever. Well, perhaps versioning wouldn't be so hard, but
> sharing is certainly challenging.
I think it's best to leave advanced use cases like that unhandled. Even if coding them wasn't too hard (it probably is ;-), people want simple and reliable workflows for their creative work; users are paranoid about losing data if they can't completely understand what is happening under the hood (e.g. the clone wars must have spooked many of us).
I think we should leave everything new unhandled. Just regard the db as a file system you're saving your own Leo files into. The only new thing should be that, instead of saving the outline as one file as a whole, you store its elements as separate units (records, nodes, whatever); just make sure you have a save and load procedure for that db representation that works, because all it does is save your file in that form and load it back. Then everything current in Leo will just work.
Plus, people would know what they need to store in the db representation and what they need to feed to the load procedure. They can use that and hack at different ideas, knowing that at bottom the classic, basic Leo will still work so long as they represent it equivalently to that reference representation and feed the load procedure what it needs.
Seth
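The "save and load procedure" Seth keeps coming back to is just a roundtrip: one record per node in, the same records out. A minimal sketch with sqlite and a toy outline; the table layout and tuple shape are invented for the example, not a certified Leo representation:

```python
import sqlite3

def save(con, outline):
    # outline: list of (id, parent_id, childindex, head, body) tuples --
    # the elements stored as separate records instead of one monolithic file.
    con.execute("""CREATE TABLE IF NOT EXISTS nodes
                   (id TEXT, parent TEXT, childindex INTEGER,
                    head TEXT, body TEXT)""")
    con.execute("DELETE FROM nodes")
    con.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)", outline)

def load(con):
    # childindex restores sibling order under each parent.
    return list(con.execute(
        "SELECT id, parent, childindex, head, body FROM nodes "
        "ORDER BY parent, childindex"))

outline = [
    ("n1", None, 0, "Chapter 1", "text"),
    ("n2", "n1", 0, "Section 1.1", "more text"),
]
con = sqlite3.connect(":memory:")
save(con, outline)
assert load(con) == outline  # same data in, same data out
```

Anyone hacking at collaboration or versioning backends could then be measured against this kind of roundtrip: however fancy the backend, `load(save(outline))` must still reproduce the classic Leo data.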
And why not, for the @file business, keep those saving to the local standard file system: just put the .leo file in the db for now. In the routine for saving, have a flag that, once set, will save the .leo file in the db as well as in the traditional file system, or in one or the other. I don't see why we have to start by solving distributed, collaborative work on the @file stuff, with versioning. Just save the .leo file, and let people suggest different approaches starting from there.
(Getting redundant, sorry. I'm done. :-) )
Seth
Right. But that doesn't get us a development context that facilitates
people's participation in developing db solutions.
All that's needed is a db representation and code that gets that in
and out of Leo. Then db developers will be empowered to proceed in
all sorts of ways, and Edward can ponder how and whether to
incorporate the various approaches. Coupled with that process, we get
a separation of the db representation from the interface functions
that might make Leo something that could be put on any back end that
provides for its current reference implementation. That reference
implementation does not need to have distribution, collaboration, or
versioning, or concurrent executability, to address Eoin's suggestions. All of those things can be carefully considered after they've been demonstrated with running code that at least makes the current Leo reference implementation work.
(Oops, I did it again.)
Seth
If I want to zip up the file with its external files, I can always do
that. But if my .leo file is
Well, my approach is that deconstructing textual computer interaction requires thinking along two axes: one of structure in space (Leo outlines are this) and one of structure in time (DVCS/SCM, especially Fossil, are this). So using my strategy will demonstrate my approach, will have the advantage of solving a day-to-day problem in the way my team and I work, and could at the same time evolve into a more abstract solution, as you propose, where the Leo DOM could be mapped to a database via a DAG or a sea of nodes in a NoSQL database. Living in the Global South ("Developing Countries", as some call them) and working with digital technology is about developing these kinds of strategies, dealing with the day-to-day problems first while envisioning some abstract structure. It is about "acting contextually but thinking systemically".
Cheers,
Offray
Okay, but in the absence of a db representation and code to get it in and out of Leo as it is, you will have to be the one to provide it -- only in your case, you're planning additional functionality. You can certainly do that, but I would say understanding of Leo's intricacies is the main hurdle. That's what I see as lacking, and it's what somebody can provide, either with nothing new or with your feature set. Whichever way it gets done, it's the fact that the db representation and the code for loading/saving it are done that will provide the understanding people need to start contributing on the db front.
Seth