ENB: Are huge Leo outlines possible?

Edward K. Ream

Oct 24, 2020, 9:42:49 AM
to leo-editor
Issue #1123 is arguably the most interesting issue with the "Sabbatical" label.

Could Leo be of any use as a huge database, say one representing the human genome project? Here are some new thoughts.

Background

At present, Leo represents vnodes in .leo.db files using sqlite tables. On startup, Leo loads (creates) vnodes for all the represented vnodes.

Loading millions of vnodes seems out of the question. There is a practical limit to how many vnodes Leo can create.

Similarly, drawing all vnodes with Leo's existing screen drawing code has a practical upper limit.

New strategy

Rather than loading (and drawing) all vnodes for .leo.db files on startup, Leo might load only a subset of vnodes, called the loaded vnodes. Leo's .leo.db format could easily be extended to remember loaded vnodes. A new .leo.db file would have only one loaded node.

Leo's all_positions and all_unique_positions generators would apply only to loaded vnodes. Ditto for Leo's legacy search commands.

Leo would add to the set of loaded vnodes using new commands, say load-from-sql, which would issue an sql query and populate new loaded vnodes. You can think of load-from-sql as an extension of Leo's clone-find commands. Deleting nodes would affect only the loaded vnodes. Such deletions would not affect the underlying db. Instead, other new commands, say delete-from-sql and permanently-delete-nodes, would do that.
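
The strategy above might be sketched as follows. This is a minimal illustration, not Leo's code: the simplified three-column `vnodes` table, the `LoadedView` class, and its method names are assumptions made for the example. Only the command names load-from-sql and delete-from-sql come from the post.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory db with a simplified, hypothetical vnodes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vnodes (gnx TEXT PRIMARY KEY, head TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO vnodes VALUES (?, ?, ?)",
        [
            ("g1", "gene BRCA1", "chromosome 17"),
            ("g2", "gene TP53", "chromosome 17"),
            ("g3", "notes", "unrelated text"),
        ],
    )
    return conn


class LoadedView:
    """Hypothetical 'Leonine view': the subset of db rows held in memory."""

    def __init__(self, conn):
        self.conn = conn
        self.loaded = {}  # gnx -> (head, body)

    def load_from_sql(self, query, params=()):
        """Run a query returning (gnx, head, body) rows; add them to the view."""
        count = 0
        for gnx, head, body in self.conn.execute(query, params):
            self.loaded[gnx] = (head, body)
            count += 1
        return count

    def unload(self, gnx):
        """Drop a node from the view only; the db row is untouched."""
        self.loaded.pop(gnx, None)

    def delete_from_sql(self, gnx):
        """Permanently delete the row from the db and from the view."""
        self.conn.execute("DELETE FROM vnodes WHERE gnx = ?", (gnx,))
        self.unload(gnx)

    def all_loaded(self):
        """Generator over loaded nodes only, analogous to all_positions."""
        yield from sorted(self.loaded.items())
```

In this sketch, searching and iterating touch only `loaded`, so their cost depends on the size of the view rather than the size of the db. Whether a real implementation could keep Leo's clone and position semantics intact is not addressed here.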

Summary

The ideas presented here limit the number of loaded vnodes. In effect, the loaded vnodes become a Leonine view of the underlying db.

It is an open question how useful such Leonine views would be.

Leo's existing commands apply only to the loaded vnodes. New commands would add, delete and update the underlying db using either sql queries or selected (trees of) vnodes.

All comments welcome.

Edward

P.S. The associative model of data is based on a DAG, just as normal Leo outlines are. However, at this time I see no way to leverage that to Leo's advantage. Indeed, most people want sql queries, which the associative model does not support.

Edward

Thomas Passin

Oct 24, 2020, 12:53:02 PM
to leo-editor
Remembering back to long ago when we only had 640k of RAM at the most, there were editors that kept three screens of data in memory at once - the current page, the previous page, and the next page.  When the user scrolled forward (say), parts of the next page would be brought in, and when necessary another page would be read in from disk. An analogy would be a long continuous (paper) scroll, where only one part in the middle would be visible at any one time.

This scheme would seem to fit right in with your thoughts above.
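
The three-page scheme described above can be sketched roughly as below. The class name `ThreePageWindow` and the fixed page size are assumptions for illustration, and the sketch loads whole pages rather than parts of a page as the scroll position moves.

```python
import io


class ThreePageWindow:
    """Keep only the previous, current, and next page of a stream in memory."""

    def __init__(self, stream, page_size):
        self.stream = stream
        self.page_size = page_size
        self.pages = {}  # page index -> page contents
        self.current = 0
        self.goto(0)

    def _read_page(self, index):
        # Seek to the page's offset and read one page from the backing store.
        self.stream.seek(index * self.page_size)
        return self.stream.read(self.page_size)

    def goto(self, index):
        """Make `index` the current page, reading only pages not yet held."""
        wanted = [i for i in (index - 1, index, index + 1) if i >= 0]
        self.pages = {
            i: self.pages[i] if i in self.pages else self._read_page(i)
            for i in wanted
        }
        self.current = index

    def text(self):
        """Return the contents of the current page."""
        return self.pages[self.current]


if __name__ == "__main__":
    window = ThreePageWindow(io.BytesIO(b"AAAABBBBCCCCDDDD"), page_size=4)
    window.goto(2)
    print(window.text(), sorted(window.pages))
```

Scrolling forward by one page reuses two of the three pages already held, so each step costs at most one read from the backing store.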

Edward K. Ream

Oct 24, 2020, 1:07:15 PM
to leo-editor
On Sat, Oct 24, 2020 at 11:53 AM Thomas Passin <tbp1...@gmail.com> wrote:
> Remembering back to long ago when we only had 640k of RAM at the most, there were editors that kept three screens of data in memory at once - the current page, the previous page, and the next page.  When the user scrolled forward (say), parts of the next page would be brought in, and when necessary another page would be read in from disk. An analogy would be a long continuous (paper) scroll, where only one part in the middle would be visible at any one time.
>
> This scheme would seem to fit right in with your thoughts above.

Alas not. Leo's clone-find commands would pull in the entire db. That's why we need a search that is disconnected from positions and generators.

Edward

David Szent-Györgyi

Dec 9, 2020, 8:35:54 AM
to leo-editor


On Saturday, October 24, 2020 at 1:07:15 PM UTC-4 Edward K. Ream wrote:
That the clone-find commands are written to work on the entire database does not change the engineering considerations that arise from the goal of working on a database that is either too large to fit in RAM or is so large that the existing code becomes too slow for a database of the desired size even if that database does fit in RAM. 

I suggest looking up the user manuals for TECO, the text editor in which the original Emacs was first written. TECO was written for DEC computers with a tiny address space in which to hold text, and was written in terms of editing the current contents of the buffer that held the address space's worth of the file under modification; it features flexible commands for paging the file's contents into the editor's buffer, allowing one to modify a file too big to fit.

TECO was the power user's text editor on the PDP-8/e that was the first computer I used, back in the early Seventies - a machine with 12K words of RAM, which were divided into 4096-word fields; the computer could directly address only the current page of 128 words (plus page zero), meaning that addressing the rest of the current field required indirect addressing. TECO allowed me to think in terms of a buffer without tracking fields and pages - an abstraction you might want to consider for Leo to handle "huge outlines".

It might be of interest to note that the original Emacs was a series of macros written for TECO - Emacs originally stood for "Editor Macros". 

tbp1...@gmail.com

Dec 9, 2020, 10:11:01 AM
to leo-editor
Ah, the pdp-8, a trip down memory lane.  I used an 8i extensively in the early 70s, but did not make the acquaintance of TECO.  After looking it up on Wikipedia, I'm glad I didn't.  Remember how 3 ascii characters were packed into two 12-bit words?  And while the 8e may have come with 12k of RAM, the 8i came with 4k, unless you had the money to get the extension to 8k (which ours had).

Edward K. Ream

Dec 9, 2020, 10:30:54 AM
to leo-editor
On Wed, Dec 9, 2020 at 7:35 AM David Szent-Györgyi <das...@gmail.com> wrote:


> On Saturday, October 24, 2020 at 1:07:15 PM UTC-4 Edward K. Ream wrote:
>> On Sat, Oct 24, 2020 at 11:53 AM Thomas Passin wrote:
>>> Remembering back to long ago when we only had 640k of RAM at the most, there were editors that kept three screens of data in memory at once - the current page, the previous page, and the next page.  When the user scrolled forward (say), parts of the next page would be brought in, and when necessary another page would be read in from disk. An analogy would be a long continuous (paper) scroll, where only one part in the middle would be visible at any one time.
>>>
>>> This scheme would seem to fit right in with your thoughts above.
>>
>> Alas not. Leo's clone-find commands would pull in the entire db. That's why we need a search that is disconnected from positions and generators.
>
> That the clone-find commands are written to work on the entire database does not change the engineering considerations that arise from the goal of working on a database that is either too large to fit in RAM or is so large that the existing code becomes too slow for a database of the desired size even if that database does fit in RAM.

Yes, and those considerations have nothing to do with teco :-)

Edward

jkn

Dec 9, 2020, 11:42:02 AM
to leo-editor
On Wednesday, December 9, 2020 at 3:11:01 PM UTC tbp1...@gmail.com wrote:
> Ah, the pdp-8, a trip down memory lane.  I used an 8i extensively in the early 70s, but did not make the acquaintance of TECO.  After looking it up on Wikipedia, I'm glad I didn't.  Remember how 3 ascii characters were packed into two 12-bit words?  And while the 8e may have come with 12k of RAM, the 8i came with 4k, unless you had the money to get the extension to 8k (which ours had).

OT: I used a similar editor to TECO on 8-bit CP/M systems back in the early 80's. This was PMATE, which IIRC was derived from "Mike Aronsen's Text Editor". You had a command line and an edit screen. The commands you could use in the command line were exceedingly crude, but surprisingly capable with a bit of practice. It was entertaining to speculate what set of commands a random bit of line noise would correspond to...

One of my jobs in my early working life was to customise installations like this so that PMATE would work in the given system. You had to configure simple things like screen width and height, then harder things like "how to move cursor to position X, Y", and then on to even more complicated things via custom assembly language routines.

Happy days...
   J^n

David Szent-Györgyi

Dec 10, 2020, 8:25:45 PM
to leo-editor
On Wednesday, December 9, 2020 at 10:30:54 AM UTC-5 Edward K. Ream wrote:
> Yes, and those considerations have nothing to do with teco :-)

Leo doesn't need the engineering underlying TECO - fortunately, it doesn't have to work on an architecture of segmented memory - but editing text files too large to fit in RAM needs careful design of the user interface. Here's a short description of TECO's design:

The original TECO editors were created when computer systems were very memory limited, and were therefore optimized to run in small memory configurations. One way that this was accomplished was that TECO was a pipeline editor. Text was read from the input file into an edit buffer, and then written out to the output buffer. The only part of the file which was resident was the edit buffer, and this was typically kept quite small. Once text was paged out to the output file, it could not be called up again without writing out the entire contents of the files, and then re-reading to the point in question.

If that is not what you plan to deliver, you're going to need to figure out an abstraction that doesn't kill performance, and for that you're going to need to define use cases.
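
The pipeline design in the description quoted above can be sketched as follows. This is an illustration of the idea only, not TECO's actual command set: the class `PipelineEditor`, its method names, and the line-based page size are assumptions.

```python
import io
from itertools import islice


class PipelineEditor:
    """One-way editor: text flows from input, through a small buffer, to output."""

    def __init__(self, source, sink, page_lines):
        self.source = source          # readable text stream (the input file)
        self.sink = sink              # writable text stream (the output file)
        self.page_lines = page_lines  # how many lines the edit buffer holds
        self.buffer = []              # the only part of the file held in memory

    def next_page(self):
        """Write out the current buffer, then read the next page into it."""
        self.sink.writelines(self.buffer)
        self.buffer = list(islice(self.source, self.page_lines))
        return bool(self.buffer)

    def finish(self):
        """Flush the buffer and copy the rest of the input unchanged."""
        self.sink.writelines(self.buffer)
        self.buffer = []
        self.sink.writelines(self.source)


if __name__ == "__main__":
    source = io.StringIO("a\nb\nc\nd\n")
    sink = io.StringIO()
    editor = PipelineEditor(source, sink, page_lines=2)
    editor.next_page()
    editor.buffer[0] = "A\n"  # edits are possible only inside the buffer
    editor.finish()
    print(sink.getvalue())
```

There is deliberately no way to move backward: once `next_page` has written a page to the output, revisiting it means finishing the pass and starting again from the new file, which is the limitation the description points out.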

An image editor that I support as part of my job handles multi-image files that are indexed by number. Navigation among the images in a given image editor window is a matter of indicating which image is current. This editor allows tens of thousands of images in a single file; it loads into its memory buffer as many images as will fit. As the user moves from the first image to the last, the editor swaps images between a scratch file on disk and RAM. For operations that run image-by-image from first to last, that works fairly well. It slows to a crawl on other patterns of access to images, because the engineering cannot anticipate all cases. For some of those cases, the performant solution is to provide RAM enough to hold all the images in the desired file - or two such images, to provide for creation of a modified result.

For some customers, "RAM enough" means 128 GB. The image editor does not need to load all the images from disk when opening a file, but saving a multi-image file of 32 GB of data takes time, even on a current-day workstation - the data comes from memory, or from the scratch file, it has to be written to disk, and the metadata for each image and the file as a whole have to be written as well. 
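
How strongly the access pattern affects such a swapping scheme can be shown with a small sketch. The post does not say which eviction policy the image editor uses; the least-recently-used policy, the class `SwappingCache`, and the `load` callback standing in for a scratch-file read are all assumptions for illustration.

```python
from collections import OrderedDict


class SwappingCache:
    """Hold at most `capacity` items in memory; reload evicted ones on demand."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load               # index -> item; stands in for a disk read
        self.resident = OrderedDict()  # items in RAM, least recently used first
        self.disk_reads = 0

    def get(self, index):
        if index in self.resident:
            # Cache hit: mark the item as most recently used.
            self.resident.move_to_end(index)
            return self.resident[index]
        # Cache miss: read from "disk", then evict the oldest item if over capacity.
        self.disk_reads += 1
        item = self.load(index)
        self.resident[index] = item
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)
        return item


def count_reads(pattern, capacity=4):
    """Return how many disk reads an access pattern costs with a fresh cache."""
    cache = SwappingCache(capacity, load=lambda i: f"image-{i}")
    for index in pattern:
        cache.get(index)
    return cache.disk_reads


if __name__ == "__main__":
    print(count_reads([0, 1, 2, 3] * 2))      # working set fits in memory
    print(count_reads(list(range(8)) * 2))    # working set is twice the capacity
```

With a capacity of four, revisiting the same four images costs four reads in total, while cycling twice through eight images costs sixteen, one per access. A real editor could add prefetching or a different policy, but some access pattern will always defeat a fixed-size buffer.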

David Szent-Györgyi

Dec 10, 2020, 8:51:30 PM
to leo-editor
On Wednesday, December 9, 2020 at 10:11:01 AM UTC-5 tbp1...@gmail.com wrote:
> Ah, the pdp-8, a trip down memory lane.  I used an 8i extensively in the early 70s, but did not make the acquaintance of TECO.  After looking it up on Wikipedia, I'm glad I didn't.  Remember how 3 ascii characters were packed into two 12-bit words?  And while the 8e may have come with 12k of RAM, the 8i came with 4k, unless you had the money to get the extension to 8k (which ours had).

Some of TECO's ugliness has to do with the media it supported, including (if I recall correctly) paper tape(!). I used it with files on floppy disk, fortunately for me. 

The 8/e I used had two ASR-33 teletypes and two (two!) eight-inch floppy disk drives, using disks which held 250KB if I recall.  We used OS/8 for single-user computing with access to the disk drives, and a time-sharing BASIC for two-user computing with paper tape for program storage. One of our projects was to modify a time-sharing BASIC with support for DEC's floppy disk drive unit to work with the third-party drive unit we used; after several years of work by us high school students, we figured out that the BASIC depended on the interrupts generated by DEC's floppy controller; the third-party unit used programmed I/O, and could not generate interrupts!

Ah, nostalgia. 

tbp1...@gmail.com

Dec 10, 2020, 9:31:41 PM
to leo-editor
I remember writing the manual for the electronic device I had designed on our 8k 8/i using the asr-33 teletype. We had no other text display, so that was the output display as well. I can't remember any more how I edited it after first typing it.  We did have a fixed head disk unit, and I must have stored it there, but it's been too long...