
Are there any real development editors out there?


Qfan

May 9, 2000, 3:00:00 AM

Is it just me, or is anyone else out there simply dissatisfied with
the program development editors/environments available today? I have
been a full-time programmer for about 10 years now and have been
primarily using Windows for most of that time (some Unix and some
VAX). Recently I began using Emacs and XEmacs for Windows and think
these editors are by far the best I've tried so far. As a matter of
fact, I suspect with some perseverance I could make either of these do
much of what I want them to do. But I'm still not happy. I have seen
some things that make me want significantly more from a development
environment.

It seems to me that the best existing Windows development environments
give users a glimpse of what it would be like to have a really helpful
environment to develop code in. They by no means deliver; they just
give you a glimpse of some possibilities. As far as Unix goes, these
environments give you the ability to configure/do anything within the
limits of the tools available, but don't break any new ground in
actually helping the developer write and understand code. I'm not
trying to start a flame war; Emacs and its ilk (which I like very
much) can do some amazing things with their preprogrammed language
features and their integration with tools such as CVS, diff,
formatting, etc. It's just not enough to make me happy.

What I want to know is why, in this day and age, I can't have the
following features in an editing environment:

1) Cross platform and open source with the possible exception of a
custom user interface for each platform supported. I want a graphical
user interface that allows multiple windows and dialogs so I can
display data in various fonts and use icons, etc., to display
information intuitively and efficiently.
2) Fast. Did I say fast? I mean FAST. No lagging after hitting
keystrokes, no waiting for searches to complete, no waiting 10 minutes
for the damn thing to start because I want to type a 1 line email.
3) Virtually unlimited file sizes and line lengths (without
sacrificing significant performance).
4) Multiuser. Allow multiple people to edit the same file at the same
time. This isn't so important but this leads into number 5.
5) Fully multi-threaded, so that I can have tons of user agents
examining, parsing, and hyperlinking my document as I type with NO
interference with my editing. I want instant gratification while I'm
typing. NO slow batch mode agents allowed.
6) Separate the document from the views using a communication protocol
and the Model View Controller paradigm, which should allow multiple
viewports to view the same document at the same time. I also want
this so that I can run the editing server and the interface on
different machines over a network, and I'm still not willing to
sacrifice speed. This should also allow multiple user interface
clients to be developed in any suitable language.
7) Nice object oriented, flexible, powerful, standard embedded
extension language(s) running on the server side. I'm leaning toward
Python but any extension language should be ok if the back end is done
properly.
8) Extensive hyperlinking and tagging facilities inside the document
engine.
9) Support for advanced display techniques like slicing and folding. I
also want support for folding using user-definable display criteria,
such as all functions accessing a specific class member.
10) A pattern matching engine that is more flexible than regular
expressions. I'm not sure but maybe something like LPM. Something that
allows language parsing grammars to be developed dynamically and
entirely (or almost entirely) using the pattern matching engine.
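Item 6 is essentially the observer half of Model View Controller: one
document, any number of views, and every change reported to each view.
The C sketch below shows only that in-process notification step. The
Document type, its fixed capacities, the ViewNotify callback signature
and the CountingView example view are all hypothetical, and the network
protocol between server and interface is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_VIEWS 8
#define DOC_CAP 1024

/* Hypothetical callback: a view learns where text changed and how many
   bytes were inserted, so it can redraw only the affected region. */
typedef void (*ViewNotify)(void *view_state, size_t offset, size_t inserted);

/* Hypothetical model: it owns the text; views never modify it directly. */
typedef struct {
    char text[DOC_CAP];
    size_t len;
    ViewNotify notify[MAX_VIEWS];
    void *state[MAX_VIEWS];
    size_t nviews;
} Document;

/* Register a view; returns 0 on success, -1 if the table is full. */
int doc_attach_view(Document *d, ViewNotify fn, void *state) {
    if (d->nviews == MAX_VIEWS) return -1;
    d->notify[d->nviews] = fn;
    d->state[d->nviews] = state;
    d->nviews++;
    return 0;
}

/* Insert s at offset, then tell every attached view about the change.
   Returns 0, or -1 if the document would overflow. */
int doc_insert(Document *d, size_t offset, const char *s) {
    size_t n = strlen(s);
    assert(offset <= d->len);
    if (d->len + n > DOC_CAP) return -1;
    memmove(d->text + offset + n, d->text + offset, d->len - offset);
    memcpy(d->text + offset, s, n);
    d->len += n;
    for (size_t i = 0; i < d->nviews; i++)
        d->notify[i](d->state[i], offset, n);
    return 0;
}

/* Example view (hypothetical): it only records what it was told. */
typedef struct { size_t changes, last_offset, last_inserted; } CountingView;

void counting_view_notify(void *view_state, size_t offset, size_t inserted) {
    CountingView *v = view_state;
    v->changes++;
    v->last_offset = offset;
    v->last_inserted = inserted;
}
```

Two views attached to the same document receive identical notifications,
which is what lets several viewports show one buffer at once.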

With all of the above features, it should be possible to develop
advanced language-sensitive agents that are actually helpful and
display information in an intuitive manner. Helper modules for
differencing, revision control, compiling, debugging, and integration
with other development environments should also be possible. I know
that a document management back end capable of doing the above is
possible because I'm in the process of writing one right now. I'll
never be able to develop the smart agent stuff alone, or even some of
the advanced user interface parts, but I can supply a cross-platform
back end capable of supporting the above features. For now I'm writing
the back end for two reasons: for the fun of it, and so that I have a
back end that can be embedded in other programs needing the ability to
flexibly modify text files. Are there any serious programmers out there
interested in developing a next generation development environment, or
am I on crack?


Mark A. Odell

May 9, 2000, 3:00:00 AM
On Tue, 09 May 2000 12:53:42 -0500, Qfan <Qf...@dontbother.replying.com> wrote:

I don't know if you're on crack but look at www.premia.com for
CodeWright. I'm just a satisfied user of 6 years.

Qfan

May 9, 2000, 3:00:00 AM
On Tue, 09 May 2000 18:36:30 GMT, Mark A. Odell <ma...@embeddedfw.com>
wrote:

Unfortunately I am doing more and more development on Unix systems
(and loving every minute of it). I need cross platform capability and
I believe CodeWright is Windows only.


Jim Wilson

May 9, 2000, 3:00:00 AM
On Tue, 09 May 2000 12:53:42 -0500, Qfan
<Qf...@dontbother.replying.com> wrote:

>
>Is it just me, or is there anyone else out there that is just
>unsatisfied with the program development editor/environments that are
>out today. I have been a full time programmer for about 10 years now
>and have been primarily using Windows for most of that time (some Unix
>and some VAX). Recently I began using Emacs and XEmacs for Windows and
>think these editors are by far the best that I've tried so far. As a
>matter of fact, I suspect with some perseverance I could make either
>of these do much of what I want them to do. But I'm still not happy. I
>have seen some things that make me want significantly more from a
>development environment.

[snip]

You're most certainly going to hear from the vi and emacs contingent
but have you personally checked out the other editors? CodeWright,
CRiSP, SlickEdit, Zeus, MultiEdit and MED are all pretty decent. Most
of what you seek is probably included right out of the box, but if not
they can be 'extended' to include custom features. While they all have
their own strengths and weaknesses they are actually pretty impressive
editors.

Regards,
Jim Wilson
(Cheap spam protection in place: remove the 'SpamThis'
from my email ID to reply to me personally)

Mark A. Odell

May 9, 2000, 3:00:00 AM
On Tue, 09 May 2000 14:37:21 -0500, Qfan <Qf...@dontbother.spaming.me> wrote:

>Unfortunately I am doing more and more development on Unix systems
>(and loving every minute of it). I need cross platform capability and
>I believe CodeWright is Windows only.

Indeed it is; you have my sympathies. But for very good multi-platform
support there is Visual SlickEdit at http://www.slickedit.com/unix.htm

- Mark


Qfan

May 9, 2000, 3:00:00 AM


I have checked out Zeus, and am currently looking at SlickEdit. I even
used SlickEdit on QNX years ago and thought it was very good. Part of
my problem is that I don't just want the basic features; I am
interested in a test bed for radically new text editor/development
features that I have swirling around in my head. Open sourcing it is
also very attractive to me, since if the license were open enough,
companies like Borland, Cygnus/RedHat or others might be interested
in embedding it into their commercial products and possibly devoting
some resources to it.

Cross Platform capability is also a requirement for me and that alone
excludes numerous editor/environments.


Paul Jackson

May 9, 2000, 3:00:00 AM

When I go inside editors to hack on them to add some
little special feature I want (usually something I
liked from some other editor in my past), I find that
there is poor separation of display, editing and
buffer handling, and poor documentation of the key
editing data structures.

I'd like to see a couple of things in my dream editor
(well, 4 or 5 or ...):

1) Not just undo/redo, but multiway undo/redo support in
the core engine -- like multiple parallel branches of
development in RCS or SCCS, only on the keystroke
(or "keyphrase") editing level.

2) A public file format, documented, for the editor
buffer, so that other tools, knowing only that format,
could be written if desired, that could access a live
editing session (both read and write). Separate out
the "drivers" that read and write external files, so
that someone else can easily add the ability to do
things like edit via ftp or edit piped streams or
whatever.

3) Clean buffer core engine, that allows for multiple
concurrent readers and writers on the same editor
buffer file. To support the multiway undo/redo,
it would have to support cheap insertions and deletions
within the middle of a large file. Perhaps the
SCCS s.file format, with embedded change lines,
is the best model I know for how the next layer (4)
might want to represent changes here. Or perhaps the
sort of "journaling file system" structures one sees
in XFS (SGI's file system) or Reiserfs, tuned more
to bytes than blocks.

4) On top of that, a clean, likely C language (though
C++ by someone who has done enough C++ to know what
not to use when would be ok too) core editing library,
with a well-documented API exposing such basic
edit ops as needed to code all manner of buffer
display, searching and modifying. This editing core
would use the underlying buffer core in (3) to
keep track of all past variants of the file, in
support of the undo + multiple redo feature.

5) Embed this C-like editing API into a Python module.

6) On top of your choice of (4) or (5) various user
interfaces, including command line stream editing
(sed), scrolling tty line editing (ed), termcap
style gui editing (vi), basic X11 gui editing
(like Rob Pike's Sam, which has both command line
and X11 style editing, in two related windows, on
the same buffer at the same time), and various
good looking gui toolkit interfaces.

Some of these user interfaces, especially in their
earlier incarnations, would likely not expose all
of the horse power of the core editing engine, such
as in particular the multiway undo/redo.

7) Various other tools, resembling diff, list, tail -f,
that understood the file buffer format (see (2)),
and could display and manipulate the many versions
of a file (each few keystrokes creates another version)
present there.
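Item 1's multiway undo/redo amounts to keeping the edit history as a
tree instead of a stack: undoing and then making a fresh edit starts a
new branch rather than throwing the old redo path away. A minimal C
sketch, assuming hypothetical UndoTree/UndoNode types that track only
version numbers; the edit contents themselves would live in the buffer
core of item 3.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical history node: one per edit ("keyphrase"). Undo walks to
   the parent; a new edit after an undo adds a sibling branch instead of
   discarding the old redo path. */
typedef struct UndoNode {
    int version;                  /* version number this edit produced */
    struct UndoNode *parent;
    struct UndoNode **children;
    size_t nchildren;
} UndoNode;

typedef struct {
    UndoNode *root;               /* the unedited file, version 0 */
    UndoNode *current;
    int next_version;
} UndoTree;

static UndoNode *node_new(int version, UndoNode *parent) {
    UndoNode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->version = version;
    n->parent = parent;
    return n;
}

void undo_init(UndoTree *t) {
    t->root = t->current = node_new(0, NULL);
    t->next_version = 1;
}

/* Record a new edit as a child of the current state; returns its version. */
int undo_record(UndoTree *t) {
    UndoNode *p = t->current;
    UndoNode **grown = realloc(p->children, (p->nchildren + 1) * sizeof *grown);
    assert(grown != NULL);
    p->children = grown;
    UndoNode *n = node_new(t->next_version++, p);
    p->children[p->nchildren++] = n;
    t->current = n;
    return n->version;
}

/* Step back to the parent state; returns 0 if already at the root. */
int undo(UndoTree *t) {
    if (t->current->parent == NULL) return 0;
    t->current = t->current->parent;
    return 1;
}

/* Redo along a chosen branch (0 = oldest); returns 0 if there is none. */
int redo(UndoTree *t, size_t branch) {
    if (branch >= t->current->nchildren) return 0;
    t->current = t->current->children[branch];
    return 1;
}

void undo_free(UndoNode *n) {
    for (size_t i = 0; i < n->nchildren; i++) undo_free(n->children[i]);
    free(n->children);
    free(n);
}
```

A simple front end that does not expose branches could always redo along
the newest child, while a richer one lets the user pick.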

All open source code, of course, so that various folks
could contribute easily to various parts. I am partial
to the SCCS-like low level file buffer code, myself.
Perhaps SourceForge would be a good place to host this.

Hmmm ... this could keep someone busy for several days ;)

[My favorite editors so far: ed, Rick Davis' zip (SGI),
Rob Pike's sam, Brief. The editor I've spent the most
time hating: Emacs - the Swiss Army Chainsaw.]
--

=======================================================================
I won't rest till it's the best ... Software Production Engineer
Paul Jackson (p...@sgi.com; p...@usa.net) 3x1373 http://sam.engr.sgi.com/pj

CBates

May 9, 2000, 3:00:00 AM
I'll throw in a plug for SlickEdit. After using CodeWright from v. 2
through the current version (6.0c), I've found SlickEdit to be head and
shoulders above CodeWright in some important areas. (Important to me,
anyway. Past performance cannot guarantee future results, etc.) That
SlickEdit is multi-platform is a bonus; its punitive pricing is not...

CBates

CBates

May 9, 2000, 3:00:00 AM

The Microsoft Editor for Programmers (MEP) started down this road. Every
time you saved a buffer, it saved a copy (or maybe just the diffs, I don't
remember). Unfortunately, it was rather painful getting a previous state of
a file out of the archive: you ran a command line utility to extract a
snapshot to a new file, compared that snapshot with the current file, and
if that wasn't what you wanted, went and got another snapshot, compared
again, etc.

It seemed to me the next step would be to somehow browse all the changes you
made in an editing session (or all the snapshots you wanted to stash in this
archive), and allow you to "undo" by "merging" the current file with any or
all of your previous edits on that file. An X-way merge, so to speak.

<schnip>


> 3) Clean buffer core engine, that allows for multiple
> concurrent readers and writers on the same editor
> buffer file. To support the multiway undo/redo...


> Or perhaps the
> sort of "journaling file system" structures one sees
> in XFS (SGI's file system) or Reiserfs, tuned more
> to bytes than blocks.

<and again>


Mark A. Odell

May 9, 2000, 3:00:00 AM
On Tue, 9 May 2000 17:45:21 -0400, "CBates" <cb...@accessvt.com> wrote:

>I'll throw in a plug for SlickEdit. After using CodeWright from v. 2
>through the current version (6.0c), I've found SlickEdit to be head and
>shoulders above CodeWright in some important areas. (Important to me,
>anyway. Past performance can not guarantee future results, etc.) That
>SlickEdit is multi-platform is a bonus; it's punitive pricing is not...

What's it cost? CodeWright is $299; as a professional that doesn't bother
me, but if it were cheaper I wouldn't mind. One thing that CW v6.0 does
out of the box is live, hot parsing of functions, macros, etc.
If you type

void someFunc(char abc, int xyx)
{
}

as soon as the parsing thread gets some CPU time the function shows
up in the project windows. Very nice when you have 10 functions in
a file. It will list them in order or sorted alphabetically too.

This is the feature I like most about CodeWright. Especially because
it does it out of the box and I'm very lazy.

- Mark

Qfan

May 9, 2000, 3:00:00 AM
On 9 May 2000 21:12:58 GMT, p...@sgi.com (Paul Jackson) wrote:

>
>When I go inside editors to hack on them to add some
>little special feature I want (usually something I
>liked from some other editor in my past), I find that
>there is poor separation of display, editing and
>buffer handling, and poor documentation of the key
>editing data structures.
>
>I'd like to see a couple of things in my dream editor
>(well, 4 or 5 or ...):
>
>1) Not just undo/redo, but multiway undo/redo support in
> the core engine -- like multiple parallel branches of
> development in RCS or SCCS, only on the keystroke
> (or "keyphrase") editing level.
>
>2) A public file format, documented, for the editor
> buffer, so that other tools, knowing only that format,
> could be written if desired, that could access a live
> editing session (both read and write). Separate out
> the "drivers" that read and write external files, so
> that someone else can easily add the ability to do
> things like edit via ftp or edit piped streams or
> whatever.
>

>3) Clean buffer core engine, that allows for multiple
> concurrent readers and writers on the same editor
> buffer file. To support the multiway undo/redo,
> it would have to support cheap insertions and deletions
> within the middle of a large file. Perhaps the
> SCCS s.file format, with embedded change lines,
> is the best model I know for how the next layer (4)
> might want to represent changes here. Or perhaps the
> sort of "journaling file system" structures one sees
> in XFS (SGI's file system) or Reiserfs, tuned more
> to bytes than blocks.
>

It's eerie how close numbers 2 & 3 are to my thinking.

I am working on a core buffer engine right now that should support
arbitrary readers and writers in a very efficient manner. These
readers and writers could be separate threads reading and writing to
the buffer concurrently. The file/data would be encapsulated into a
single logical buffer. Each buffer is able to handle up to 2^32
characters per line and 2^32 lines per buffer. The data making up the
file is currently broken up into arbitrarily sized blocks of data (I
am experimenting with 4K chunks), all linked together using doubly
linked lists, with balanced binary tree buckets providing entry points
into the buffer list at bucket-sized increments (I'm currently
experimenting with 250 lines per bucket). At most, inserting a single
character would require shifting 1/2 of a full block to allow room
for the data to be inserted. Simple tests show that my 400 MHz PII
computer can shift 4K bytes of data about 16,000 times a second. For
now this seems reasonable for insertions, as this is a worst-case
scenario. When files are read in, the data is packed into the blocks,
which then become slightly fragmented as editing takes place. It seems
possible that a low-priority thread could be run to repack these
blocks if they became fragmented or memory usage was getting too
large.
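The insertion cost described above comes down to one memmove of the
bytes after the insertion point within a single block. A rough C sketch,
assuming blocks hold only raw text (no sequence headers, links or tree
buckets) and assuming a full block is split in half before inserting.
The 4K size is taken from the post; the Block type and function names
are hypothetical. This naive version only shifts toward the end of the
block, so its worst case is nearly a full block rather than the half
block estimated above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096   /* the 4K chunk size from the post */

/* Hypothetical text block in the doubly linked block list. */
typedef struct Block {
    struct Block *prev, *next;
    size_t used;              /* bytes of text in data[] */
    char data[BLOCK_SIZE];
} Block;

/* Move the upper half of a full block into a new block linked after it. */
static Block *block_split(Block *b) {
    Block *nb = calloc(1, sizeof *nb);
    assert(nb != NULL);
    size_t keep = b->used / 2;
    nb->used = b->used - keep;
    memcpy(nb->data, b->data + keep, nb->used);
    b->used = keep;
    nb->prev = b;
    nb->next = b->next;
    if (b->next) b->next->prev = nb;
    b->next = nb;
    return nb;
}

/* Insert one character at offset pos in block b. The cost is a single
   memmove of the bytes after pos, never more than one block's worth.
   Returns the block that actually received the character. */
Block *block_insert_char(Block *b, size_t pos, char c) {
    assert(pos <= b->used);
    if (b->used == BLOCK_SIZE) {
        Block *nb = block_split(b);
        if (pos > b->used) {      /* target now lies in the new block */
            pos -= b->used;
            b = nb;
        }
    }
    memmove(b->data + pos + 1, b->data + pos, b->used - pos);
    b->data[pos] = c;
    b->used++;
    return b;
}
```

At the quoted rate of 16,000 4K shifts per second, even this worst case
stays well under a millisecond per keystroke.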

Each block of data is structured with the data starting at the
beginning of the block and growing downward. At the other end of the
block are sequence headers growing from the bottom of the block
toward the top. These sequence headers are responsible for keeping
track of each contiguous line of data. Depending on the line length, an
arbitrary number of sequences fit into each block. If a line won't fit
into a single block, it is continued in the next block. Current tests
show most of my source files average about 20-30 chars per line,
which means that each block currently contains between 100 and 200
lines. The engine also allows arbitrary links to any line/col in the
buffer. All links are maintained as offsets into the current block
only, so that when text is inserted, the engine only has to adjust
links that exist in that block. Links currently have a type, a unique
Id, a position and an arbitrary amount of data attached to them. I'm
planning on implementing cursors using specially typed links, which
should allow easy implementation of multiple entities
editing/reading the same buffer at the same time.
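Keeping links as offsets relative to their own block is what confines
the fix-up work to one block. A C sketch of that adjustment, with a
hypothetical BlockLinks type. What happens to a link sitting exactly at
the insertion point (here it moves past the new text) is an assumed
policy, and moving links when a block splits is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINKS 16

/* Hypothetical link (cursor, tag, hyperlink anchor...) stored as an
   offset from the start of its own block, never as a file position. */
typedef struct {
    int type;        /* e.g. a special type for cursors */
    int id;          /* unique id */
    size_t offset;   /* position within this block */
} Link;

typedef struct {
    size_t used;              /* bytes of text in the block */
    Link links[MAX_LINKS];
    size_t nlinks;
} BlockLinks;

/* Called after n bytes were inserted at pos in this block. Links in all
   other blocks keep their block-relative offsets, so no global pass is
   needed. Links at pos itself are pushed past the new text. */
void links_after_insert(BlockLinks *b, size_t pos, size_t n) {
    assert(pos <= b->used);
    b->used += n;
    for (size_t i = 0; i < b->nlinks; i++)
        if (b->links[i].offset >= pos)
            b->links[i].offset += n;
}

/* Called after n bytes starting at pos were deleted from this block.
   Links inside the deleted range collapse onto pos. */
void links_after_delete(BlockLinks *b, size_t pos, size_t n) {
    assert(pos + n <= b->used);
    b->used -= n;
    for (size_t i = 0; i < b->nlinks; i++) {
        size_t *off = &b->links[i].offset;
        if (*off >= pos + n) *off -= n;
        else if (*off > pos) *off = pos;
    }
}
```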

Once the data management routines were finished, I figured a person
could implement undo facilities using additional internal buffers
which could contain your multiple parallel mods.

I'll have to do some research into the SCCS file format as I don't
know anything about it. It is intriguing though.

Your numbers 4-7 are very similar to my thoughts on how to extend a
beast such as this.

Qfan

May 10, 2000, 3:00:00 AM
On 11 May 2000 01:50:16 GMT, p...@sgi.com (Paul Jackson) wrote:

>
>Interesting ...


>
>|> Each buffer as is able to handle up to 2^32 characters per line
>

>Consider thinking of "lines" not as a low level concept
>affecting your basic buffer structure, but as a higher
>level structure, visible at the editing API, where the
>"line" separator might be any regular expression (or
>other programmably function). At the low level file
>buffering level, think of it is an editable char stream
>(including perhaps 2 to 4 byte wchars?).
>
>Look at Pike's editor Sam for ideas on how to view editing as
>applying to a byte stream, not an array of lines.
>
>I might want to edit a mailbox using the '\n\nFrom '
>pattern (the message separator) instead of the newline
>as the basic record separator.
>
>Let for example, a table mapping line numbers to entry points
>be a higher level optional side table, not part of the basic
>structure.

I used the word line when I perhaps should have used the word
'record'. That's exactly how I was planning on doing it: using some
regex/function or other user-defined equivalent to determine just what
a record is. The basic idea is to have an input stream fed to the
buffer by some arbitrary third party and the regex/function would
allow the buffer to identify and link the start and ends of record
sequences as the data was read in.

I wasn't planning on actually distinguishing between lines or
anything, I just want the subsystem to be able to define links to
beginning and end of record sequences so that fast random access is
possible.
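Identifying records "as the data was read in" means the separator scan
has to survive chunk boundaries: with Paul's mailbox separator
'\n\nFrom ', the separator may arrive split across two reads. A C sketch
under stated assumptions: a literal separator string stands in for the
regex/function, records do not overlap, and the scan uses the
Knuth-Morris-Pratt failure table (a swapped-in technique, not something
proposed in the thread) so partial and self-overlapping matches carry
over correctly. RecordIndex and its functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical side table of record start offsets, built while the data
   streams in. A literal separator stands in for a regex or function. */
typedef struct {
    const char *sep;     /* e.g. "\n", or "\n\nFrom " for a mailbox */
    size_t seplen;
    size_t *fail;        /* KMP failure table for sep */
    size_t matched;      /* how much of sep is matched so far */
    size_t consumed;     /* total bytes fed in earlier chunks */
    size_t *starts;      /* record start offsets; starts[0] == 0 */
    size_t nstarts, cap;
} RecordIndex;

void rindex_init(RecordIndex *r, const char *sep) {
    memset(r, 0, sizeof *r);
    r->sep = sep;
    r->seplen = strlen(sep);
    assert(r->seplen > 0);
    r->fail = calloc(r->seplen, sizeof *r->fail);
    r->cap = 16;
    r->starts = malloc(r->cap * sizeof *r->starts);
    assert(r->fail != NULL && r->starts != NULL);
    r->starts[r->nstarts++] = 0;
    /* fail[i]: length of the longest proper prefix of sep[0..i]
       that is also a suffix of it */
    for (size_t i = 1, k = 0; i < r->seplen; i++) {
        while (k > 0 && sep[i] != sep[k]) k = r->fail[k - 1];
        if (sep[i] == sep[k]) k++;
        r->fail[i] = k;
    }
}

/* Feed the next chunk; chunks may split a separator anywhere. */
void rindex_feed(RecordIndex *r, const char *buf, size_t n) {
    for (size_t i = 0; i < n; i++) {
        while (r->matched > 0 && buf[i] != r->sep[r->matched])
            r->matched = r->fail[r->matched - 1];
        if (buf[i] == r->sep[r->matched]) r->matched++;
        if (r->matched == r->seplen) {
            if (r->nstarts == r->cap) {
                size_t *grown = realloc(r->starts, 2 * r->cap * sizeof *grown);
                assert(grown != NULL);
                r->starts = grown;
                r->cap *= 2;
            }
            /* the next record begins just past the separator */
            r->starts[r->nstarts++] = r->consumed + i + 1;
            r->matched = 0;   /* records do not overlap */
        }
    }
    r->consumed += n;
}

void rindex_free(RecordIndex *r) {
    free(r->fail);
    free(r->starts);
}
```

Because of the failure table, three newlines followed by "From " are
still recognized as a separator, and feeding the same data in different
chunk sizes yields the same table.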

I just read and printed out the Sam document this morning. I will be
studying it thoroughly as I move forward (a lot of good ideas in
there).

>
>|> Once the data management routines were finished I figured a person
>|> could implement undo facitilites using additional internal buffers
>|> which could contain your multiple parallel mods.
>

>Consider moving the multiway undo/redo lower (making them,
>not lines, the basic structure!). For example, an edit that
>changes a char 'x' to two chars 'ab' doesn't shift the block
>tail up 1, and overwrite the 'x' and subsequent newly opened up
>hole with 'ab', but rather such an edit inserts a record by the
>'x', stating that in version N-1, it was an 'x', but in version
>N, it's an 'ab'. This idea is from the SCCS s.file format.
>
>Notice how I've move 'lines' up in the hierarchy of abstractions,
>and versions down.
>

Doing something like that would severely break up the continuity of the
buffer data, which would seem to have two adverse side effects. The
first is excessive fragmentation, which would make searches and
pattern matching much slower. The second is that it would be impossible
to calculate the location of the end of the record. Am I correct?

How would you identify which revision the edits would be labeled as?
Would the user say, "I'm making edits for revision 'bugfixes'," and
then any edits after that point would be considered that version until
it was changed?

Could this also be managed with hyperlinks, where a link is created,
data identifying the revision is attached to it, and the link is then
pointed at the changed location in the buffer? I'm not sure if this
makes sense or not. With my current idea of hyperlinks, the links are
not actually part of the document, just external data blobs that point
to a specific position in the document; their positions are maintained
by the text engine when insertions and deletions occur above them.

I'm going to have to study SCCS some more. Revision/Undo/Redo will be
coming up real soon now and if I'm going to invest the effort into
working on a serious text editing engine it had better break new
ground in its implementation.

Can you expand on this some more? This seems like too good of an idea
to leave out.

>Consider reading in files in a separate thread, so that the
>editor can start up quickly on huge files.

I will definitely be doing that. I'll probably eventually start the
user interface by itself and, once it is up and the screen is
displayed, kick-start the input thread mechanism. I want instantaneous
startup. Instant gratification for me all the way! :)

I'll probably use pipes or sockets or something like that to hook the
user interface with the background service. Maybe I will run the
backend as a service and leave it running internally all the time; who
knows.

>Let some (even
>all) of the bytes in the edit buffer be represented not by the
>actual copy of the bytes, but possibly also by references to
>other bytes, such as "the next 12,345 bytes of this file are the
>first 12,345 bytes of this other named file". The idea for this
>comes from Mark Nudelman's less, which goes as works hard to
>allow access to partially read files, while still reading them.
>(Of course, doing so for editing, not just browsing, is not
>so easy ...).

Are you talking about references to blocks in other memory based
buffers here? If you always read the data behind the window first,
during startup, it will appear to be virtually instantaneous. I did
this on a list box implementation once where row heights were variable
sized. In this case, I calculated the height information behind the
window first (so I could display it) and then sicced a worker thread
on the rest to let it handle the remaining record calculations in the
background. If the window was scrolled into an area that hadn't been
calculated it redirected the worker thread to do that block next while
it waited. Once the thread completed that window region everything
continued as normal. I do anticipate doing something like this. I sure
as hell wouldn't wait for an editor to read in 100 megs of shit
over a slow connection before I saw my first page of data :)
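The list-box trick described above (do the region behind the window
first, let a worker do the rest, and redirect the worker when the user
scrolls ahead of it) can be sketched single-threaded in C. Assumptions:
the per-chunk work is just counting newlines as a stand-in for record
calculations, LazyIndex and its functions are hypothetical, and the
worker thread plus the locking it would need around the priority and
done[] fields are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCHUNKS 64

/* Hypothetical lazily built index. A worker thread would call
   index_step() in a loop; the UI calls index_request() when the
   window scrolls into a chunk that has not been processed yet. */
typedef struct {
    const char *text;
    size_t len, chunk_size, nchunks;
    bool done[NCHUNKS];
    size_t newlines[NCHUNKS];   /* result of the per-chunk work */
    size_t cursor;              /* next chunk in background order */
    long priority;              /* chunk the UI is waiting on, or -1 */
} LazyIndex;

void index_init(LazyIndex *ix, const char *text, size_t len, size_t chunk_size) {
    assert(chunk_size > 0);
    ix->text = text;
    ix->len = len;
    ix->chunk_size = chunk_size;
    ix->nchunks = (len + chunk_size - 1) / chunk_size;
    assert(ix->nchunks <= NCHUNKS);
    for (size_t i = 0; i < NCHUNKS; i++) {
        ix->done[i] = false;
        ix->newlines[i] = 0;
    }
    ix->cursor = 0;
    ix->priority = -1;
}

static void process_chunk(LazyIndex *ix, size_t c) {
    size_t start = c * ix->chunk_size;
    size_t end = start + ix->chunk_size < ix->len ? start + ix->chunk_size : ix->len;
    size_t count = 0;
    for (size_t i = start; i < end; i++) count += ix->text[i] == '\n';
    ix->newlines[c] = count;
    ix->done[c] = true;
}

/* Do one unit of work, preferring the chunk the UI asked for.
   Returns the chunk processed, or -1 when everything is done. */
long index_step(LazyIndex *ix) {
    if (ix->priority >= 0) {
        long c = ix->priority;
        ix->priority = -1;
        if (!ix->done[c]) {
            process_chunk(ix, (size_t)c);
            return c;
        }
    }
    while (ix->cursor < ix->nchunks && ix->done[ix->cursor]) ix->cursor++;
    if (ix->cursor == ix->nchunks) return -1;
    process_chunk(ix, ix->cursor);
    return (long)ix->cursor;
}

/* The window scrolled to offset: redirect the worker there next. */
void index_request(LazyIndex *ix, size_t offset) {
    size_t c = offset / ix->chunk_size;
    if (c < ix->nchunks && !ix->done[c]) ix->priority = (long)c;
}
```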

There is another aspect of this that has been perplexing me lately,
and that is the possibility of using parts of the original file as the
cache. I intend to be able to specify X number of bytes as the maximum
amount of memory to be used for each buffer and if there isn't enough
memory to read the entire file only read a portion of it and allow
viewing/editing of that portion. If another portion is needed some of
it would have to be swapped out so that the other parts could be
accessed. Of course when this happens the user just signed away
his/her rights to high performance global searches and such.

Paul Jackson

May 11, 2000, 3:00:00 AM

Interesting ...

|> Each buffer as is able to handle up to 2^32 characters per line

Consider thinking of "lines" not as a low level concept
affecting your basic buffer structure, but as a higher
level structure, visible at the editing API, where the
"line" separator might be any regular expression (or
other programmable function). At the low level file
buffering level, think of it as an editable char stream
(including perhaps 2 to 4 byte wchars?).

Look at Pike's editor Sam for ideas on how to view editing as
applying to a byte stream, not an array of lines.

I might want to edit a mailbox using the '\n\nFrom '
pattern (the message separator) instead of the newline
as the basic record separator.

Let, for example, a table mapping line numbers to entry points
be a higher level optional side table, not part of the basic
structure.

|> Once the data management routines were finished I figured a person
|> could implement undo facitilites using additional internal buffers
|> which could contain your multiple parallel mods.

Consider moving the multiway undo/redo lower (making them,
not lines, the basic structure!). For example, an edit that
changes a char 'x' to two chars 'ab' doesn't shift the block
tail up 1 and overwrite the 'x' and the newly opened-up
hole with 'ab'; rather, such an edit inserts a record by the
'x', stating that in version N-1 it was an 'x', but in version
N it's an 'ab'. This idea is from the SCCS s.file format.

Notice how I've moved 'lines' up in the hierarchy of abstractions,
and versions down.
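The SCCS-style representation described here can be pictured as a
"weave": every run of text is tagged with the version that inserted it
and, if any, the version that deleted it, and any version is rebuilt by
one pass that keeps the runs visible in it. A simplified C sketch,
assuming a linear sequence of versions (real SCCS deltas can also
branch) and a hypothetical Run type; it is not the actual s.file format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical weave entry: a run of text plus its version lifetime. */
typedef struct {
    const char *text;
    int inserted_in;   /* version that added this text */
    int deleted_in;    /* version that removed it, or 0 if still present */
} Run;

/* A run is part of version v if it was inserted at or before v and had
   not yet been deleted at v. */
static int visible(const Run *r, int v) {
    return r->inserted_in <= v && (r->deleted_in == 0 || r->deleted_in > v);
}

/* Rebuild version v into out (capacity cap, NUL-terminated).
   Returns its length, or -1 if it does not fit. */
long weave_extract(const Run *runs, size_t n, int v, char *out, size_t cap) {
    size_t len = 0;
    assert(cap > 0);
    for (size_t i = 0; i < n; i++) {
        if (!visible(&runs[i], v)) continue;
        size_t k = strlen(runs[i].text);
        if (len + k + 1 > cap) return -1;
        memcpy(out + len, runs[i].text, k);
        len += k;
    }
    out[len] = '\0';
    return (long)len;
}
```

For Paul's example, a version 1 holding "fox" whose 'x' becomes 'ab' in
version 2 is the weave {"fo", 1, 0}, {"x", 1, 2}, {"ab", 2, 0}:
extracting version 1 gives "fox" and version 2 gives "foab", and the old
'x' is kept rather than overwritten.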

|> When files are read in, the data is packed into the blocks

Consider reading in files in a separate thread, so that the
editor can start up quickly on huge files. Let some (even
all) of the bytes in the edit buffer be represented not by the
actual copy of the bytes, but possibly also by references to
other bytes, such as "the next 12,345 bytes of this file are the
first 12,345 bytes of this other named file". The idea for this
comes from Mark Nudelman's less, which works hard to
allow access to partially read files while still reading them.
(Of course, doing so for editing, not just browsing, is not
so easy ...).
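The "next 12,345 bytes are bytes of that other file" idea amounts to
describing the buffer as a list of pieces, some held in memory and some
only referring to a range of a file that is read when touched. A C
sketch using standard stdio; the Piece layout and pieces_read are
hypothetical, and the bookkeeping needed for editing (splitting pieces
on insert, noticing that the file changed underneath) is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical piece: either bytes in memory, or "these len bytes are
   bytes [off, off+len) of that file". */
typedef struct {
    FILE *file;          /* NULL for an in-memory piece */
    long off;            /* offset into file (file pieces only) */
    const char *mem;     /* bytes (memory pieces only) */
    size_t len;
} Piece;

/* Copy up to n bytes starting at buffer position pos into out. File
   pieces are read only when touched. Returns the bytes copied. */
size_t pieces_read(const Piece *p, size_t np, size_t pos, char *out, size_t n) {
    size_t copied = 0;
    for (size_t i = 0; i < np && copied < n; i++) {
        if (pos >= p[i].len) {       /* skip pieces before pos */
            pos -= p[i].len;
            continue;
        }
        size_t take = p[i].len - pos;
        if (take > n - copied) take = n - copied;
        size_t got;
        if (p[i].file == NULL) {
            memcpy(out + copied, p[i].mem + pos, take);
            got = take;
        } else {
            if (fseek(p[i].file, p[i].off + (long)pos, SEEK_SET) != 0) break;
            got = fread(out + copied, 1, take, p[i].file);
        }
        copied += got;
        if (got < take) break;       /* short read: file changed or I/O error */
        pos = 0;                     /* later pieces are read from their start */
    }
    return copied;
}
```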

--

Joseph Van Valen

May 11, 2000, 3:00:00 AM
"Mark A. Odell" <ma...@embeddedfw.com> wrote in message
news:pv1hhskv3rergdps7...@4ax.com...

> On Tue, 9 May 2000 17:45:21 -0400, "CBates" <cb...@accessvt.com> wrote:
... snip ...

>
> What's it cost? CodeWrigth is $299, as a professional that doesn't bother
> me but if it was cheaper I wouldn't mind.

I believe that Visual Slick is in the $300 price range as well.

> One thing that CW v6.0 does out of the box is a live, hot parsing of
> functions, macros, and etc.
> if you type
>
> void someFunc(char abc, int xyx)
> {
> }
>
> as soon as the parsing thread gets some CPU time the function shows
> up in the project windows. Very nice when you have 10 functions in
> a file. It will list them in order or sorted alphabetically too.
>
> This is the feature I like most about CodeWright. Especially because
> it does it out of the box and I'm very lazy.
>
> - Mark

Visual Slick does this also for C(++), Java, and a host of other languages
"out of the box".

Codewright is a little better about adding support for this feature to new
languages that are not covered "out of the box" (like Python, etc.) All you
need is a regular expression to locate the function names. With Visual
Slick, a macro or extension is required. Visual Slick also doesn't have a
snippet window, but it does have an alias feature similar to Codewright's
template features.
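The "all you need is a regular expression" approach can be tried with
the POSIX regex functions regcomp and regexec from <regex.h>, available
on Unix systems but not part of ISO C. A C sketch; find_functions and
its pattern are an illustrative heuristic, not CodeWright's actual
expression, and it will also list prototypes and miss some coding
styles.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Illustrative heuristic: a line starting with a type-ish word, then the
   function name (captured), then "(". */
static const char *FUNC_RE =
    "^[A-Za-z_][A-Za-z0-9_ \t*]*[ \t*]([A-Za-z_][A-Za-z0-9_]*)[ \t]*\\(";

/* Hypothetical helper: store up to max names (each truncated to 63
   chars) found in src. Returns the count, or -1 if regcomp fails. */
int find_functions(const char *src, char names[][64], int max) {
    regex_t re;
    regmatch_t m[2];
    int count = 0;
    if (regcomp(&re, FUNC_RE, REG_EXTENDED | REG_NEWLINE) != 0) return -1;
    const char *p = src;
    /* REG_NOTBOL: after the first match p is mid-line, so "^" must not
       match at p itself (it still matches after each newline). */
    while (count < max && regexec(&re, p, 2, m, p == src ? 0 : REG_NOTBOL) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len > 63) len = 63;
        memcpy(names[count], p + m[1].rm_so, len);
        names[count][len] = '\0';
        count++;
        p += m[0].rm_eo;   /* continue after this match */
    }
    regfree(&re);
    return count;
}
```

Run over Mark's someFunc example, it reports someFunc. Re-running such a
scan when the editor is idle is one simple way a background parsing
thread could keep a function list current.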

Lately though, SlickEdit has been outshining Codewright in quite a few areas
(its stronger intellisense features, java and javadoc support, faster
searching, and smoother large file handling to name a few). With Premia's
recent product diversification and with it being bought out by Starbase, I
wonder if it can ever catch up.

I use both Codewright 6.0c and Visual Slick 5.0b. Up to now, Codewright was
my primary editor, but I am finding it harder to keep it in that role.

Have you looked at Visual Slick? If so, what are you seeing in Codewright
that I am missing?

Joseph Van Valen

Paul Jackson

May 11, 2000, 3:00:00 AM

|> I believe that Visual Slick is in the $300 price range as well.

Unfortunately, that's per supported Operating System.
So for someone like myself who desires support on
say 3 O.S.'s (Irix, Linux, Windows), this gets expensive.

Steve Kirkendall

May 11, 2000, 3:00:00 AM
Qfan wrote:
>
> On 11 May 2000 01:50:16 GMT, p...@sgi.com (Paul Jackson) wrote:
> >Consider thinking of "lines" not as a low level concept
> >affecting your basic buffer structure, but as a higher
> >level structure, visible at the editing API, where the
> >"line" separator might be any regular expression (or
> >other programmably function). At the low level file
> >buffering level, think of it is an editable char stream
> >(including perhaps 2 to 4 byte wchars?).
>
> I used the word line when I perhaps should have used the word
> 'record'.That's exactly how I was planning on doing it. Using some
> regex/function or other user defined equivalent to determine just what
> a record is. The basic idea is to have an input stream fed to the
> buffer by some arbitrary third party and the regex/function would
> allow the buffer to identify and link the start and ends of record
> sequences as the data was read in.
>
> I wasn't planning on actually distinguishing between lines or
> anything, I just want the subsystem to be able to define links to
> begining and end of record sequences so that fast random access is
> possible.

It would be nice if you could change the definition of "line" without
rereading the buffer, though. Elvis' low-level editing engine treats
edit buffers *MOSTLY* as a string of chars. Different display modes
may chop that long string into lines however they like. The normal
display mode chops after each newline, the hex display mode chops it
into 16-byte chunks, the HTML display mode chops it up as it processes
tags. You can switch between display modes instantly.

This makes each display mode responsible for defining what a "line" is.
Each display mode has a move() function which implements line-relative
cursor motions.
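Letting each display mode decide where lines break can be reduced to
one function per mode that returns the start of the next display line.
A C sketch with a hypothetical NextLineFn interface (not Elvis's actual
internals), showing a normal mode and a 16-byte hex mode over the same
unchanged bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical display-mode hook: given the start of one display line,
   return the start of the next. The buffer itself is just bytes. */
typedef size_t (*NextLineFn)(const char *buf, size_t len, size_t pos);

/* Normal mode: a line ends after each newline. */
size_t normal_next_line(const char *buf, size_t len, size_t pos) {
    while (pos < len && buf[pos] != '\n') pos++;
    return pos < len ? pos + 1 : len;
}

/* Hex mode: every 16 bytes is a line; newlines are ordinary data. */
size_t hex_next_line(const char *buf, size_t len, size_t pos) {
    (void)buf;
    return pos + 16 < len ? pos + 16 : len;
}

/* Count display lines under a mode; switching modes needs no change
   to, or reread of, the buffer. */
size_t count_lines(const char *buf, size_t len, NextLineFn next) {
    size_t n = 0;
    for (size_t pos = 0; pos < len; pos = next(buf, len, pos)) n++;
    return n;
}
```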

I said "mostly" character-oriented, because the low-level mechanism
does offer some support for line searches. This seemed important because
ex commands are strongly oriented toward newline-terminated lines,
regardless of the display mode (or even during initialization when there
is no display mode because the first window hasn't been created yet).
But if you're starting a new editor, then it probably isn't worth the
effort.

> >Consider moving the multiway undo/redo lower (making them,
> >not lines, the basic structure!). For example, an edit that
> >changes a char 'x' to two chars 'ab' doesn't shift the block
> >tail up 1, and overwrite the 'x' and subsequent newly opened up
> >hole with 'ab', but rather such an edit inserts a record by the
> >'x', stating that in version N-1, it was an 'x', but in version
> >N, it's an 'ab'. This idea is from the SCCS s.file format.
>

> Doing something like that would severely break up the continuity of the
> buffer data, which would seem to have two adverse side effects. The
> first being excessive fragmentation which would make searches and
> pattern matching much slower. The other making it impossible to
> calculate the location of end of the record. Am I correct?

These logs are always relative to a base version of the file. It is
usually most convenient to make the base version be the CURRENT version,
not the oldest. If you use blocks or a buffer gap to store the current
version of the buffer, in addition to the change log, then you get the
best of both worlds. So you're both correct.
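To make "the base version is the CURRENT version" concrete, here is a minimal Python sketch under stated assumptions: the current text is a plain string, and each edit is logged as a hypothetical `Change(offset, old, new)` record. Because every record holds both the old and the new text, the same log drives undo and redo. A real editor would keep the current text in blocks or a buffer gap rather than rebuilding a string on every edit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Change:
    offset: int  # where the edit happened
    old: str     # text before the edit (needed to undo)
    new: str     # text after the edit (needed to redo)


class Buffer:
    """Keeps the CURRENT text, plus a log of changes relative to it."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.undo_log: List[Change] = []
        self.redo_log: List[Change] = []

    def _swap(self, offset: int, remove: str, insert: str) -> None:
        # Sanity check: the text being replaced must be what the log says.
        if self.text[offset:offset + len(remove)] != remove:
            raise ValueError("log does not match buffer")
        self.text = self.text[:offset] + insert + self.text[offset + len(remove):]

    def replace(self, offset: int, length: int, new: str) -> None:
        old = self.text[offset:offset + length]
        self._swap(offset, old, new)
        self.undo_log.append(Change(offset, old, new))
        self.redo_log.clear()  # a fresh edit discards the redo history

    def undo(self) -> None:
        change = self.undo_log.pop()
        self._swap(change.offset, change.new, change.old)
        self.redo_log.append(change)

    def redo(self) -> None:
        change = self.redo_log.pop()
        self._swap(change.offset, change.old, change.new)
        self.undo_log.append(change)


buf = Buffer("an x here")
buf.replace(3, 1, "ab")  # version N: the 'x' becomes 'ab'
buf.undo()               # back to version N-1: "an x here"
buf.redo()               # forward again: "an ab here"
print(buf.text)
```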

Actually, this brings up an interesting point: This type of log, in
which each entry contains enough information to either undo a change
or redo it, is called an "idempotent" log. And it is critical in the
design of databases. Since your edit buffers will be used by multiple
users almost as though it was a table of one-character records (or longer
records if you ignore that suggestion), you might benefit from adding
a few more traditional database features:

* Idempotent log. This is useful not only for undo/redo, but also for
crash recovery and aborting transactions.

* Transactions. This would ensure that each committed version of the
file is consistent (in the sense that other users won't see your
partially-completed changes). If multiple users are editing the same
buffer, this could be important.

The definition of "transaction" could use some refinement. Each time
you requested a different version (or branch?) of the buffer, the
previous transaction would end and a new one would begin. Or maybe
you would want finer-grained changes, such as vi's "everything you do
in input mode counts as a single change" model.

* Block structure. Many databases use fixed-size blocks to store a
variable number of records. Sometimes they'll add some extra information
at the end of the block, if there's room, describing some significant
features of the records in the block, such as their starting offsets.
You might consider storing offsets to the starts of lines that way.
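A rough Python sketch of that suggestion, assuming a hypothetical 64-byte block size (a real editor would match its disk or page size) and hypothetical names (`Block`, `line_start`): each block carries a list of its newline offsets, so finding the start of line N skips whole blocks by their newline counts instead of rescanning their text.

```python
from typing import List

BLOCK_SIZE = 64  # hypothetical block size; a real editor might use its page size


class Block:
    """A fixed-size chunk of text plus a small index of its newlines."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        # Offsets (within this block) of every '\n', kept alongside the data.
        self.newlines: List[int] = [i for i, b in enumerate(data) if b == 0x0A]


def split_into_blocks(text: bytes) -> List[Block]:
    return [Block(text[i:i + BLOCK_SIZE]) for i in range(0, len(text), BLOCK_SIZE)]


def line_start(blocks: List[Block], line: int) -> int:
    """Absolute offset where 0-based line `line` begins."""
    if line == 0:
        return 0
    remaining = line  # newlines still to skip past
    base = 0          # absolute offset of the current block
    for block in blocks:
        if remaining <= len(block.newlines):
            # The wanted newline is in this block; the line starts just after it.
            return base + block.newlines[remaining - 1] + 1
        remaining -= len(block.newlines)  # skip the whole block without scanning it
        base += len(block.data)
    raise IndexError("line number past end of buffer")


text = b"".join(b"line %d\n" % n for n in range(100))
blocks = split_into_blocks(text)
print(line_start(blocks, 42) == text.index(b"line 42\n"))
```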

I'm not sure how you could add branching changes. I suppose you could add
a branch identifier to the change log, but that's only part of the challenge.
You must then come up with some way to allow different people to support
different branches of the same file. And what happens if somebody edits
text which is common to both branches?

Elvis doesn't work that way, though. I wanted an easy way to support vi's
line-undo command (shift-U), so I just designed the buffers in such a way
that they could be easily cloned, by copying just three blocks regardless
of how large the buffer is. Elvis creates a clone whenever you make the
first change on a given line, and also creates clones for a configurable
number of undo steps. Identifying the exact change between versions is
difficult, but I don't need to know exactly what changed so that's okay.
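How Elvis lays out its blocks isn't described here, so this is only a generic copy-on-write sketch in Python with hypothetical names (`CowBuffer`, `clone`, `replace_block`). The buffer is an immutable tuple of immutable blocks: a clone simply shares that tuple, and an edit builds a new tuple that replaces one block while still sharing all the others. It does not reproduce Elvis's fixed three-block cost; here cloning is free and each edit pays for rebuilding the tuple of references, but none of the unchanged text is copied.

```python
from typing import Tuple


class CowBuffer:
    """Text stored as immutable blocks that clones share instead of copying."""

    def __init__(self, blocks: Tuple[bytes, ...]) -> None:
        self.blocks = blocks  # a tuple of bytes; neither can be mutated in place

    @classmethod
    def from_bytes(cls, text: bytes, size: int = 4) -> "CowBuffer":
        return cls(tuple(text[i:i + size] for i in range(0, len(text), size)))

    def clone(self) -> "CowBuffer":
        # The clone shares the same immutable tuple; nothing is copied.
        return CowBuffer(self.blocks)

    def replace_block(self, index: int, data: bytes) -> None:
        # Build a new tuple of references; untouched blocks stay shared with clones.
        self.blocks = self.blocks[:index] + (data,) + self.blocks[index + 1:]

    def text(self) -> bytes:
        return b"".join(self.blocks)


current = CowBuffer.from_bytes(b"hello world!")
saved = current.clone()            # e.g. taken before the first change on a line
current.replace_block(1, b"O wo")  # only block 1 differs between the versions
print(current.text(), saved.text())
print(current.blocks[0] is saved.blocks[0])  # unchanged block is shared
```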

Fun stuff!

Qfan

May 12, 2000, 3:00:00 AM
Hey everyone thanks for the comments. I am getting a lot of good
ideas.

I think that I've decided to put some real effort into this
editor backend project. I've been interested in writing an editor for
quite some time now and I guess now is as good of a time as any (I
have some time over the next 6 months that I don't have to spend
working on work type projects).

As I mentioned above, if I'm going to do this I want it to combine the
best aspects of the other editors out there and leave out the cruft if
possible. It would be extremely helpful to have some peer review from
other people who are experienced in these areas, especially now, in
the early phases of the project, when I'm trying to figure out the
best approach to handling the low-level structures and whatnot.

What is the best method of starting a project like this? I want it to
be open source but I don't expect anyone to want to help until they
see something that works. It would probably be easiest for me to work
on the base code by myself until I get the basic document management
code working and then officially release it.

Assuming the back end part was successful in meeting its design
goals, is anyone out there interested in working on this project at
some point? Does anyone want to volunteer to be on a peer review board
while I'm fleshing out the back end? Would an LGPL'd editor back end be
of any use to anyone else besides myself?


Definition of back end:

The back end to me would initially be just the core document
management code for a single document buffer. It should encompass
methods/functionality for the following:

* Retrieving and storing the document
* Fast random access to any position in the document
* Fast insertions/modifications to the document, at any point,
regardless of file size.
* Allow multiple threads to read and write to the document
concurrently. This implies either some sort of transaction management
or a locking mechanism that allows free-form editing of the document by
multiple entities at the same time.
* Implement a comprehensive undo/redo/versioning system for the
buffer.
* Methods of allowing an external entity to map a view into the
document and receive asynchronous notifications and updates of events
that are pertinent to the parts of the document that the view is
interested in.


Once the back end is complete, steps could be taken to provide a
higher level front end which would manage multiple buffers, views and
users, provide clean streaming type protocol access to these buffers,
allow loadable agents which could access the buffers being edited,
etc...


And finally, one or more user interfaces could be built so people
could get some work done :)


Here are my priorities for the back end, listed in order of importance
to me:

1 - Open source (probably LGPL back end library)
2 - Robust implementation
3 - Cross platform
4 - Support 16-bit Unicode as well as 8-bit ASCII.
5 - VERY FAST!
6 - Multi-threaded and multiuser buffer management
7 - Clean separation of the back end, the views of the document, and the
controlling entities, à la MVC.
8 - Able to handle virtually unlimited (larger than available memory)
file sizes. Performance should still be excellent for extremely large
documents.
9 - Clean efficient methods of hooking in new functionality to the
backend so extensions are easy to implement.


I'd like your comments on the above. If you think it is a stupid idea
or maybe I'm not thinking of something important or whatever. Please
let me know.


I started this thread with a news account that I use for the
occasional download/upload of MP3s, which doesn't list my email
address. If you'd like to email me, feel free to do so.


Name: Brian Semrad
Email: bse...@adsoft.netX

PS: remove the X to email me (damn spam bots)

Qfan

May 12, 2000, 3:00:00 AM
I meant to post another reply to this thread but it got posted as a
new msg. See my "Comments Anyone?" post below for a continuation of
this topic.

Brian Semrad/Qfan

May 12, 2000, 3:00:00 AM
On 13 May 2000 01:07:53 GMT, p...@sgi.com (Paul Jackson) wrote:

>We should reread our DBMS text before proceeding. Jim Gray's
>"Transaction Processing" is certainly my favorite.

I ordered that book just now so I can bone up on some of the ideas you
guys are throwing around. That book costs more than most of the
already high-priced computer books that I buy :) Do you have any more
recommendations of other books which might be of value for this
particular project?

>And since
>the basic stuff of our dbms is a byte stream, not a collection
>of structured data, we will have to intuitively grok what we
>read and transform it to our domain.
>
>Then on top of the char-stream-dbms, we build an editor backend
>that supports a high powered C API of suitable 'editor-like'
>ops, such as mark, display, search, replace, or whatever.
>
>Then on top of that we build various frontends (stream, tty,
>termcap, x11, gui toolkits, special purpose utilities, ...)
>

That all sounds fine and dandy, but is this transactional system going
to cause this beast to act like a batch mode MSSQL editor? <grin> Not
knowing much about databases and transactions (hence the above
purchase) I'm wondering what the performance penalties of something
like this might be. To play devil's advocate here, is all this really
necessary? If you are recording these transactions at a character (or
single sequence of characters) level it's easy to tell if your
transaction is going to fail or not (the character you are getting
ready to modify doesn't exist for example). In this case there is no
backing out to do since you didn't even start making the changes until
all of the change areas are locked down and then you know your
transaction will succeed right?

I'm just now getting a picture of two global search-and-replace
operations occurring at the same time on a file, and some of this
heavier-weight atomicity etc... stuff is starting to make some sense.
These transactions allow multiple entities to modify multiple portions
of the database as an all-or-nothing deal. Am I going in the right
direction with this? Bear with me, please; transaction processing and
databases are new ground for me.
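The "all or nothing" reading is the atomicity part of a transaction. A minimal single-process Python sketch (no locking, no on-disk log, no concurrency; `apply_transaction` is a hypothetical name) applies a batch of edits to a `bytearray` in place, records each applied step, and rolls them all back if any step finds text other than what it expected:

```python
from typing import List, Tuple

# (offset, text expected there, replacement). Offsets in later edits refer to
# the buffer as already modified by earlier edits in the same transaction.
Edit = Tuple[int, bytes, bytes]


def apply_transaction(buf: bytearray, edits: List[Edit]) -> None:
    """Apply all edits to buf in place, or leave buf exactly as it was."""
    done: List[Edit] = []  # edits applied so far, kept for rollback
    try:
        for offset, old, new in edits:
            if buf[offset:offset + len(old)] != old:
                raise ValueError(f"conflict at offset {offset}")
            buf[offset:offset + len(old)] = new
            done.append((offset, old, new))
    except ValueError:
        # Undo in reverse order so each recorded offset is valid again.
        for offset, old, new in reversed(done):
            buf[offset:offset + len(new)] = old
        raise


doc = bytearray(b"foo bar foo")
apply_transaction(doc, [(0, b"foo", b"baz"), (8, b"foo", b"baz")])
print(doc)  # bytearray(b'baz bar baz')

try:
    # The second edit expects text that isn't there, so the first is undone too.
    apply_transaction(doc, [(0, b"baz", b"qux"), (4, b"zzz", b"yyy")])
except ValueError:
    pass
print(doc)  # bytearray(b'baz bar baz')
```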

I'm not sure just how an editor should act when two people are editing
the same buffer at the same time. Does each user see multiple cursors
dancing around, with each character typed inserted immediately, or does
each user make a series of changes and then commit them by pressing a
key (hoping that they don't conflict with another operation)? Just what
should happen when one user is cutting a large block of text from the
buffer and another has his cursor in the middle of the text to be cut,
and they both hit delete at the same time? Would this just cause a
quiet failure for the unfortunate second person, or what? I'm getting a
picture of a transaction for each keystroke for each user, with some
of them being single character inserts and others being higher level
commands causing a whole slew of changes (such as a global
substitution).

Paul Jackson

May 13, 2000, 3:00:00 AM

|> >Let for example, a table mapping line numbers to entry points
|> >be a higher level optional side table, not part of the basic
|> >structure.
|>
|> I used the word line when I perhaps should have used the word
|> 'record'. That's exactly how I was planning on doing it. Using some

Someone else (Steve Kirkendall) already added a good response
here, suggesting that 'record' (be that 'line' or whatnot)
should not be too low in the layers of abstraction, so that,
for instance, one could dynamically switch between different
record boundaries, instantly.

If 'chunk boundaries' (yet another term for line or record) are
a higher level abstraction, then one could have multiple chunk
definitions (by line, paragraph or chapter, in classic large
documents, for example), each with a table of entry points
(Chapter 17 starts at this offset, ...).

Pushed to the extreme (seems to be the direction we are going)
these lines/records/chunks become a hierarchical outline, with
such operators as expand, collapse, move, copy, ... GrandView
(my favorite outliner of all time, and about the only remaining
DOS program that I keep a working copy of) here we come.

But I could for example edit and append some lines to a file,
_without_ ever waiting long enough for the existing lines in the
file to be parsed (for the existing '\n' chars to be scanned).
Only when I issued a command that required knowing a line
number would I have to wait for the existing file contents to
finish being scanned for line numbers.
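A minimal Python sketch of that lazy behaviour, assuming the text sits in one `bytearray` (a real editor would scan its blocks or pieces) and using a hypothetical `LazyLineIndex` class: appending never triggers a scan, and a line-number request scans forward only as far as it needs. It only handles appends; an edit in the middle would have to invalidate the index from that point on.

```python
from typing import List


class LazyLineIndex:
    """Finds line starts on demand, scanning only as far as a request needs."""

    def __init__(self, data: bytearray) -> None:
        self.data = data
        self.starts: List[int] = [0]  # line-start offsets discovered so far
        self.scanned = 0              # bytes before this offset have been scanned

    def append(self, text: bytes) -> None:
        # Appending at the end never forces a scan of the existing contents.
        self.data.extend(text)

    def line_start(self, line: int) -> int:
        """Return the offset where 0-based `line` begins."""
        while len(self.starts) <= line:
            nl = self.data.find(b"\n", self.scanned)
            if nl == -1:
                raise IndexError("no such line")
            self.starts.append(nl + 1)
            self.scanned = nl + 1
        return self.starts[line]


idx = LazyLineIndex(bytearray(b"a\nbb\nccc\n"))
idx.append(b"dddd\n")      # no scanning happens here
print(idx.line_start(3))   # 9
print(idx.scanned)         # 9: the "dddd" line itself was never scanned
```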

Even further, a large file could be edited near its beginning
or end without the bulk of the file _even_ being read until I
asked to write out the new file contents. The temporary editor
buffer file would be a short statement that bytes 1..1000000 come
from this existing disk file, and there is 1 edit changing byte
22 from 'x' to 'ab' (which would result in the editor buffer
file now stating that bytes 1..21 come from the pre-existing
disk file, followed by either 1 byte from that file or (after
the 1st edit) 2 bytes 'ab', followed by another 999978 bytes
available in the pre-existing file as bytes 23..1000000).

The temp editor buffer file need never hold any unchanged bytes
from the original data stream, unless the editor decides to copy
them there as a performance optimization or because it determines
that the input data is only available as a stream, not as some
file you can seek and reread.
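The scheme described above is essentially what is usually called a piece table. A minimal Python sketch, assuming the original file is represented by an in-memory `bytes` object standing in for the read-only disk file, and using hypothetical names (`Piece`, `PieceTable`), reproduces the example with 0-based offsets (byte 22 above is index 21):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Piece:
    source: str  # "orig" (the unmodified file) or "add" (newly typed text)
    start: int   # offset into that source
    length: int


class PieceTable:
    def __init__(self, original: bytes) -> None:
        self.orig = original    # never modified; stands in for the disk file
        self.add = bytearray()  # the only place new bytes are ever stored
        self.pieces: List[Piece] = [Piece("orig", 0, len(original))]

    def _split(self, pos: int) -> int:
        """Ensure a piece boundary at `pos`; return the index of the piece starting there."""
        offset = 0
        for i, p in enumerate(self.pieces):
            if pos == offset:
                return i
            if pos < offset + p.length:
                cut = pos - offset
                self.pieces[i:i + 1] = [
                    Piece(p.source, p.start, cut),
                    Piece(p.source, p.start + cut, p.length - cut),
                ]
                return i + 1
            offset += p.length
        return len(self.pieces)  # pos is at the very end of the document

    def replace(self, pos: int, length: int, text: bytes) -> None:
        """Replace `length` bytes at `pos` with `text`; no original bytes are copied."""
        first = self._split(pos)
        last = self._split(pos + length)
        new = [Piece("add", len(self.add), len(text))] if text else []
        self.add.extend(text)
        self.pieces[first:last] = new

    def text(self) -> bytes:
        out = bytearray()
        for p in self.pieces:
            src = self.orig if p.source == "orig" else self.add
            out += src[p.start:p.start + p.length]
        return bytes(out)


data = bytearray(b"." * 1_000_000)
data[21] = ord("x")  # byte 22, counting from 1
pt = PieceTable(bytes(data))
pt.replace(21, 1, b"ab")
for piece in pt.pieces:
    print(piece)
```

The edit leaves three pieces: 21 original bytes, 2 added bytes, then the last 999,978 original bytes, and the temporary "add" buffer holds only the two new bytes.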

|> Doing something like that would severely break up the continuity of the
|> buffer data, which would seem to have two adverse side effects. The
|> first being excessive fragmentation which would make searches and
|> pattern matching much slower. The other making it impossible to
|> calculate the location of end of the record. Am I correct?

The breaks need only occur where edits have happened. And so
long as the editing and regex code can use a macro that accesses
characters inline (except when crossing a break), this seems
like it has a good chance of being suitably fast. It would
mean that a global search, if done after a global replace that
touched a huge file in many places, would be slower than a
global search done before such mass changes, but I doubt that
this must needs be bad enough to be fatal.
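A minimal Python sketch of the "inline except when crossing a break" access pattern, assuming the text is a list of byte segments such as the pieces left behind by earlier edits. `chars` and `find` are hypothetical names, and since Python has no macros, a generator stands in for the C macro described above. The search never joins the segments, yet a match may straddle several of them:

```python
from typing import Iterator, List


def chars(segments: List[bytes]) -> Iterator[int]:
    """Yield each byte of the text without joining the segments together."""
    for seg in segments:  # crossing a break: the only extra work
        yield from seg    # within a segment: direct, sequential access


def find(segments: List[bytes], needle: bytes) -> int:
    """Naive search over segmented text; returns a 0-based offset or -1."""
    if not needle:
        return 0
    window = bytearray()
    for offset, byte in enumerate(chars(segments)):
        window.append(byte)
        if len(window) > len(needle):
            del window[0]
        if window == needle:
            return offset - len(needle) + 1
    return -1


# "say hello world", left in three pieces by earlier edits; the match spans all three.
segments = [b"say he", b"l", b"lo world"]
print(find(segments, b"hello"))  # 4
```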

And I don't grok your concern about finding the end of the record.

|> How would you identify which revision the edits would be labeled as?

Certainly not by asking the user. These revisions are on a
'keyphrase', or what Steve describes as:

vi's "everything you do in input mode counts as a single change" model.

One can at best record the branch structure, when, how much and
what was changed.

|> I'll probably use pipes or sockets or something like that to hook the
|> user interface with the background service. Maybe I will run the
|> backend as a service and leave it running internally all the time who
|> knows.

I'd hope that the primary API of the backend (and its internals)
didn't know whether the frontend code was directly linked
or working via a socket to some simple wrappers linked with
the backend.

The primary API of the backend should be a C-like library
interface, and corresponding Python module, ideally, with any
multithreading (such as for background tasks to scan or pack)
hidden from normal view.

|> There is another aspect of this that has been perplexing me lately,
|> and that is the possibility of using parts of the original file as the
|> cache.

Why just cache - why not take all unmodified bytes directly
from the primary source (higher level code might intentionally
make that primary source a temp disk copy, as when editing from
a pipe, editing remote ftp files, editing some file that seems
to read dog-slow [perhaps its via nfs to the Mir space station],
the user asked for it ...).

Paul Jackson

May 13, 2000, 3:00:00 AM

|> Actually, this brings up an interesting point: This type of
|> log, in which each entry contains enough information to either
|> undo a change or redo it, is called an "idempotent" log.
|> And it is critical in the design of databases. Since your
|> edit buffers will be used by multiple users almost as though
|> it was a table of one-character records (or longer records if
|> you ignore that suggestion), you might benefit from adding a
|> few more traditional database features:

Ah yes - ACID: Atomic, Consistent, Isolated and Durable.
Good note - thanks.

We're looking to build a 'dbms' suitable for making overlapping
sequences of changes to a byte stream efficiently.

We should reread our DBMS text before proceeding. Jim Gray's
"Transaction Processing" is certainly my favorite. And since
the basic stuff of our dbms is a byte stream, not a collection
of structured data, we will have to intuitively grok what we
read and transform it to our domain.

Then on top of the char-stream-dbms, we build an editor backend
that supports a high powered C API of suitable 'editor-like'
ops, such as mark, display, search, replace, or whatever.

Then on top of that we build various frontends (stream, tty,
termcap, x11, gui toolkits, special purpose utilities, ...)

|> And what happens if somebody edits text which is common to
|> both branches?

Just like source code control systems, you can't change what is,
only add variants.

|> Elvis doesn't work that way, though.

Ah - so you're the author of Elvis. Good to meet you here.

Paul Jackson

May 13, 2000, 3:00:00 AM

Hello, Brian.

I'm a little concerned that your 'back end' is too much stuff in
one subsystem. The challenges for the editing engine and the
challenges for the ACID byte-stream dbms are both significant,
and should not be lumped.

Perhaps it would make sense to co-operate openly on the APIs
for these (1 or 2) subsystems, working up draft documentation
for each, before (or in parallel with) actual coding.

The byte-stream dbms (designed and written in such a manner that
it's not obvious that its primary purpose was to support an
editor) seems like a sufficiently challenging leap from what I
already know that likely some prototype proofs of concept code
would be required as part of discovering a suitable API for it.

Given that foundation byte-stream dbms, the next layer, the
editing engine, seems easier to get started to a minimum level
of usefulness, though the versioning interface to it will still
require some inspiration.

|> 5 - VERY FAST!

Yes, but careful that this doesn't blind us to getting
the right structures in place. I would think that much of
the apparent slowness of editors is a combination of too
much runtime startup (reading a lot of Lisp code, say, not
to mention any editor in particular) and the user's choice
of UI. I've seen nothing here that would make it difficult
to keep the speed of editing typical modest size text files
dependent on anything other than the choice of GUI (whether
tty, termcap, x11 or toolkit).


|> 9 - Clean efficient methods of hooking in new functionality
|> to the backend so extensions are easy to implement.

Could you hint further at what sorts of new functionality you
are most interested in experimenting with?

Brian Semrad/Qfan

May 13, 2000, 3:00:00 AM
On 13 May 2000 00:47:05 GMT, p...@sgi.com (Paul Jackson) wrote:

>Someone else (Steve Kirkendall) already added a good response
>here, suggesting that 'record' (be that 'line' or whatnot)
>should not be too low in the layers of abstraction, so that,
>for instance, one could dynamically switch between different
>record boundaries, instantly.

Yes, I agree with all of the above.

>Pushed to the extreme (seems to be the direction we are going)
>these lines/records/chunks become a hierarchical outline, with
>such operators as expand, collapse, move, copy,
>

>But I could for example edit and append some lines to a file,
>_without_ ever waiting long enough for the existing lines in the
>file to be parsed (for the existing '\n' chars to be scanned).
>Only when I issued a command that required knowing a line
>number would I have to wait for the existing file contents to
>finish being scanned for line numbers.

Yes, this is reasonable.

>
>Even further, a large file could be edited near its beginning
>or end without the bulk of the file _even_ being read until I
>asked to write out the new file contents. The temporary editor
>buffer file would be a short statement that bytes 1..1000000 come
>from this existing disk file, and there is 1 edit changing byte
>22 from 'x' to 'ab' (which would result in the editor buffer
>file now stating that bytes 1..21 come from the pre-existing
>disk file, followed by either 1 byte from that file or (after
>the 1st edit) 2 bytes 'ab', followed by another 999978 bytes
>available in the pre-existing file as bytes 23..1000000).
>
>The temp editor buffer file need never hold any unchanged bytes
>from the original data stream,

The key word being "unless"

> unless the editor decides to copy
>them there as a performance optimization or because it determines
>that the input data is only available as a stream, not as some
>file you can seek and reread.

.
.
.


>The breaks need only occur where edits have happened. And so
>long as the editing and regex code can use a macro that accesses
>characters inline (except when crossing a break), this seems
>like it has a good chance of being suitably fast. It would
>mean that a global search, if done after a global replace that
>touched a huge file in many places, would be slower than a
>global search done before such mass changes, but I doubt that
>this must needs be bad enough to be fatal.
>

All this caching and read on demand has a serious problem in my book,
and that is when I load a large file I want a screenful of data to
display instantly so that I can get started with whatever I'm going to
do. Then, it had better not stop reading just because I haven't
accessed the next page of the document yet because I can guarantee the
next thing that I will want to do after I type a few characters on the
first page is search for the function at the end of the file and I
don't want to wait if there is still memory available to do
additional buffering of that file. I suppose continuing the read is
only a matter of deciding when and what to read though.

>And I don't grok your concern about finding the end of the record.

My concern was in mixing buffer data with revision data in the same
sequence as this would complicate random access to buffer data when
only the start position and length of each sequence is known and some
character in the middle needed to be accessed.

Were you suggesting that the sequence be broken in that case and the
revision info placed outside or at the end of the sequence or
something?

>
>|> How would you identify which revision the edits would be labeled as?
>

>Certainly not by asking the user. These revisions are on a
>'keyphrase', or what Steve describes as:
>
> vi's "everything you do in input mode counts as a single change" model.
>
>One can at best record the branch structure, when, how much and
>what was changed.

Most of my experience in revision control/versioning is renaming my
project directory as projectOld and then starting with another copy of
it for the next revision. This is no joke unfortunately. I have been
researching this lately and I know the folly of my ways, but this
seems to be the norm for most programmers I know.

>I'd hope that the primary API of the backend (and its internals)
>didn't know whether the frontend code was directly linked
>or working via a socket to some simple wrappers linked with
>the backend.

yes

>
>The primary API of the backend should be a C-like library
>interface, and corresponding Python module, ideally, with any
>multithreading (such as for background tasks to scan or pack)
>hidden from normal view.
>

yes

>Why just cache - why not take all unmodified bytes directly
>from the primary source (higher level code might intentionally
>make that primary source a temp disk copy, as when editing from
>a pipe, editing remote ftp files, editing some file that seems
>to read dog-slow [perhaps its via nfs to the Mir space station],
>the user asked for it ...).

This seems ok for data that I don't want to read (or can't read) into
memory but I haven't met a byte of data that I didn't want to put in
memory ASAP.


Brian Semrad/Qfan

May 13, 2000, 3:00:00 AM
On 13 May 2000 01:39:14 GMT, p...@sgi.com (Paul Jackson) wrote:

>
>Hello, Brian.
>
>I'm a little concerned that your 'back end' is too much stuff in
>one subsystem. The challenges for the editing engine and the
>challenges for the ACID byte-stream dbms are both significant,
>and should not be lumped.

In light of further thought about the transaction system and previous
comments I'm inclined to agree with you on that.

>
>Perhaps it would make sense to co-operate openly on the APIs
>for these (1 or 2) subsystems, working up draft documentation
>for each, before (or in parallel with) actual coding.

Sounds good to me, as this lowest level of detail is new territory to
me. I will definitely need to do some research on transaction
management to complete this task.

>
>The byte-stream dbms (designed and written in such a manner that
>it's not obvious that its primary purpose was to support an
>editor) seems like a sufficiently challenging leap from what I
>already know that likely some prototype proofs of concept code
>would be required as part of discovering a suitable API for it.

Why is making the implementation's primary purpose an editing engine
necessarily a bad deal? Is a generic implementation of this just
another small step from a complete ACID-compliant database or
something? I'm too ignorant of this at this point to even know what
the major steps in this are. I assume that there are some fairly
complicated locking mechanisms to write. Is that basically all there
is to it? Normally I'll go for a generic implementation every time,
even if it takes significantly longer, but I have a friend who is
helping me break that habit somewhat by always questioning whether it
is really necessary. Sometimes an intentional shortcut saves a lot of
time and can be rewritten later if necessary.

>
>Given that foundation byte-stream dbms, the next layer, the
>editing engine, seems easier to get started to a minimum level
>of usefulness, though the versioning interface to it will still
>require some inspiration.

I'd rather do the hardest part first. In this case it seems to be the
transaction mechanism. I doubt that the editor portion will be usable
in the near future anyway. I was hoping that I could embed the text
engine into other programs soon, even if it were only a single-tasking
implementation. I don't want to sacrifice design for a quick
implementation, though.

It seems reasonable that if we could get the transactions working, a
program comprising multiple threads could be easily written to test
out the implementation. This will also give me much-needed performance
statistics on various transactions. Once I'm sure that I can write the
implementation for the transaction API I don't have a problem moving
on but I'd want to be sure I basically knew how I was going to do it
before moving too far forward in the design stage. Unless I miss my
guess, once the design of the transactions portions are complete, the
basic editing engine will be relatively easy to design and then the
rest will be downhill from there.

Previously I was thinking of using a 32-bit unsigned int for a line
number and the same for a column number to access the document, but
that doesn't fit the direction this project seems to be going now.
Now I'm thinking of using a 64-bit unsigned int as an absolute
character offset for indexing into the buffer. This would allow
for near-infinite file lengths and use the same amount of memory
for indexes (one 64-bit value instead of two 32-bit values). What do
you think of this? The reason I bring this up is that I believe that
the transaction mechanisms will need to lock various ranges of the
buffer down before actually committing changes. Am I out in left
field here? Should I just shut up until I do some research on
transaction systems?
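A minimal sketch of the range-locking bookkeeping, assuming absolute byte offsets that fit in 64 bits (Python ints are unbounded, so the limit appears only as a check) and a hypothetical `RangeLockTable` class: an owner may lock [start, end) only if no other owner holds an overlapping range. A real implementation would also need to queue waiters and shift locked ranges when text before them changes. For scale, 2^64 = 18,446,744,073,709,551,616, about 16 EiB.

```python
import threading
from typing import Dict, List, Tuple

MAX_OFFSET = 2**64 - 1  # largest offset a 64-bit unsigned index can hold


class RangeLockTable:
    """Grants non-overlapping half-open [start, end) byte ranges to owners."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()  # protects the table itself
        self._held: Dict[str, List[Tuple[int, int]]] = {}

    def try_lock(self, owner: str, start: int, end: int) -> bool:
        if not 0 <= start < end <= MAX_OFFSET + 1:
            raise ValueError("range outside the 64-bit offset space")
        with self._mutex:
            for other, ranges in self._held.items():
                if other == owner:
                    continue  # an owner never conflicts with itself
                for s, e in ranges:
                    if start < e and s < end:  # half-open ranges overlap
                        return False
            self._held.setdefault(owner, []).append((start, end))
            return True

    def release_all(self, owner: str) -> None:
        with self._mutex:
            self._held.pop(owner, None)


locks = RangeLockTable()
print(locks.try_lock("replace-1", 100, 200))  # True
print(locks.try_lock("replace-2", 150, 250))  # False: overlaps replace-1
print(locks.try_lock("replace-2", 200, 300))  # True: touching is not overlapping
locks.release_all("replace-1")
print(locks.try_lock("replace-2", 150, 250))  # True now that replace-1 is done
```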

>
>|> 5 - VERY FAST!
>
>Yes, but careful that this doesn't blind us to getting
>the right structures in place. I would think that much of
>the apparent slowness of editors is a combination of too
>much runtime startup (reading a lot of Lisp code, say, not
>to mention any editor in particular) and the user's choice
>of UI. I've seen nothing here that would make it difficult
>to keep the speed of editing typical modest size text files
>dependent on anything other than the choice of GUI (whether
>tty, termcap, x11 or toolkit).
>

It would be nice if I could write a user interface for this editor in
Python and still have it work fast.

>
>|> 9 - Clean efficient methods of hooking in new functionality
>|> to the backend so extensions are easy to implement.
>
>Could you hint further at what sorts of new functionality you
>are most interested in experimenting with?

One of the things that I haven't mentioned much is some of my ideas
for hyperlinking. My thought is that I'd like to be able to attach a
link not inside the data, but to a position in the data and then
attach additional information to the link itself. I'd want the engine
to move these links around if necessary when data is inserted above
the linked position. Then from outside the engine I'd like to be able
to query all links of type X and know where they were currently
residing with respect to my specific record type. I can conceive of
many uses for marked regions in the text where the user interface
would highlight these areas graphically in some manner. They could be
used as bookmarks or breakpoints for an interactive debugger, for
example. More importantly I'd like to be able to write agents which
could parse these buffers and set these links for arbitrary purposes.
I suppose this could be done properly just above the editor primitive
level, but I'm not sure. I suppose this could also be done by
inserting special text markers into the buffer and then not displaying
them but this seems like a poor way of implementing bookmarks and
breakpoints. It would also complicate the caching and file operations.
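Links attached to positions, rather than embedded in the text, can be sketched as markers that the engine adjusts on every insert and delete. A minimal Python version follows, with hypothetical names (`Marker`, `MarkerSet`) and a plain list rather than the balanced tree a large document would want. Here an insertion exactly at a marker pushes it forward; an editor might reasonably choose the other convention.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Marker:
    offset: int  # current position in the buffer
    kind: str    # e.g. "bookmark", "breakpoint"
    data: Dict[str, Any] = field(default_factory=dict)  # extra info on the link


class MarkerSet:
    def __init__(self) -> None:
        self.markers: List[Marker] = []

    def add(self, offset: int, kind: str, **data: Any) -> Marker:
        m = Marker(offset, kind, dict(data))
        self.markers.append(m)
        return m

    def on_insert(self, pos: int, length: int) -> None:
        # Text inserted at or before a marker pushes it forward.
        for m in self.markers:
            if m.offset >= pos:
                m.offset += length

    def on_delete(self, pos: int, length: int) -> None:
        for m in self.markers:
            if m.offset >= pos + length:
                m.offset -= length  # after the deleted span: shift back
            elif m.offset > pos:
                m.offset = pos      # inside the deleted span: collapse to its start

    def of_kind(self, kind: str) -> List[Marker]:
        return [m for m in self.markers if m.kind == kind]


marks = MarkerSet()
bp = marks.add(120, "breakpoint", condition="i > 3")
marks.add(40, "bookmark", name="top")
marks.on_insert(10, 5)    # five bytes typed near the top of the file
marks.on_delete(100, 30)  # a span containing the breakpoint is deleted
print([(m.kind, m.offset) for m in marks.of_kind("breakpoint")])  # [('breakpoint', 100)]
```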

Another is possibly using some internal agents to precalculate font
and text metrics for using multiple font sizes to display text in a
buffer. I'm not sure where I'm going with this, but I would definitely
not preclude using this engine to write a WYSIWYG editor or page
layout tool.

Another possibility is using this engine for implementing interfaces
to other hardware devices. In-place editing of hardware device
register displays (the registers may be constantly changing and can
be overwritten in real time), such as those of PLCs (Programmable
Logic Controllers), springs to mind. PLCs are used to control actual
hardware devices such as elevators, conveyors, robots, strappers,
etc. I write factory automation software for a living, and a
multithreaded text engine would come in extremely handy for this.

I definitely want to allow for dynamically installing advanced
language parsers, used in incremental class browsers, syntax
checkers, lint tools etc... I'd like for these agents to be able to
tag/hyperlink hot spots for display by the user interface without
mucking up the actual data in the buffer. I don't think I'll be
writing these tools as much as I hope to provide an attractive
environment for people who are doing research into these areas such as
the Harmonia project and others.

Upon rethinking some of the above ideas, I suppose a case could be
made that all of these might have their place at the editor
primitive level or above, so low-level extensions may not be relevant
anymore. A robust transaction mechanism might alleviate the need to
insert low level agents into the mix. As usual I'll know what I want
to do after it's too late to do anything about it.

Paul Jackson

May 15, 2000, 3:00:00 AM
Brian wrote:
|> That all sounds fine and dandy, but is this transactional system going
|> to cause this beast to act like a batch mode MSSQL editor? <grin>

Well, no (though I'm sure you knew that, and hoped that
I knew that).

SQL is a query language for relational databases, which are
(roughly) arrays of homogeneous structures (tables of records).

We're not considering here an editor for tables of records.

We are considering an editor for mutating byte sequences.

Steve (earlier in this thread) wrote:
|> Block structure. Many databases use fixed-size blocks to
|> store a variable number of records. Sometimes they'll add some
|> extra information at the end of the block, if there's room,

We must be leery of applying too directly the structures used
by 'many databases'. Not only (as noted above) are we dealing
with mutating byte sequences, not tables of records, but also
our performance profile is different.

A mature dbms has to deal with such things as using
direct raw disk io and careful disk layout in order to
efficiently manage, say, 100 Gbytes of data with only 1
Gbyte of main memory.

An editor should be 90% focused on editing files
that fit entirely in main memory, and if it also happens
to reliably edit larger files, without substantially more
disk head motion than simple linear scan algorithms would
suggest, that's gravy, but it shouldn't be the primary
determinant of the low level data structures.

Someone (can't seem to find the reference) mentioned:
|> [ ... something about reading the file into memory ... ]

Don't confuse 'reading into memory' with 'reading into the
current process's address space'.

Try the following experiment, on some system that hasn't
had its files in /usr/include accessed for a few hours:

$ cd /usr/include
$ time grep blahblahblah * 2>/dev/null
$ time grep blahblahblah * 2>/dev/null

I just did this (on SuSE 6.3). The first time was 0.82
seconds total. The second time was 0.03 total seconds.
The first time, I heard disk motion. Not the second time.

Obviously, the second time was more than 20x faster because
the files were 'in memory' (though not in the second grep's
memory, prior to that grep being spawned).

For speed on (the typical) file to be edited that fits in
memory, it's more important that we get the file into memory
than that it be copied into some array or structure within
the editor's memory space. The high order performance bit is
disk versus memory. The editor can use an ordinary (as in
"stdio") buffering library to read the contents of the file
to be edited from the file system buffers, on demand, with
suitable (or near suitable) performance.
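Here is a minimal sketch of the read-on-demand approach, written in Python for illustration. The editor fetches fixed-size blocks only when a caller asks for a byte range and keeps just a few of them. The operating system's file cache does most of the work on repeat access. The class `LazyFile`, the 4096-byte block size and the 64-block cache limit are assumptions for this sketch, not anything specified in this thread.

```python
import os


class LazyFile:
    """Read-only view of a file that fetches fixed-size blocks on demand.

    Hypothetical illustration: the OS file cache already keeps recently
    read blocks in memory, so this class keeps only a small local cache.
    """

    def __init__(self, path, block_size=4096, max_cached=64):
        self._f = open(path, "rb")  # the original file is never written
        self.block_size = block_size
        self.max_cached = max_cached
        self._cache = {}  # block index -> bytes
        self.size = os.fstat(self._f.fileno()).st_size

    def _block(self, index):
        if index not in self._cache:
            if len(self._cache) >= self.max_cached:
                # Evict the oldest inserted block (dicts keep insertion order).
                self._cache.pop(next(iter(self._cache)))
            self._f.seek(index * self.block_size)
            self._cache[index] = self._f.read(self.block_size)
        return self._cache[index]

    def read(self, offset, length):
        """Return up to `length` bytes starting at `offset`."""
        end = min(offset + length, self.size)
        out = bytearray()
        while offset < end:
            index, within = divmod(offset, self.block_size)
            chunk = self._block(index)[within:within + (end - offset)]
            if not chunk:
                break
            out += chunk
            offset += len(chunk)
        return bytes(out)

    def close(self):
        self._f.close()
```

Only the blocks that cover the requested range are ever read, so showing one screen of a large file touches a few kilobytes rather than the whole file.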

--

Imagine the following:

1) The editor never copies original data from the files
to be edited into the editor temp file. Rather it either
works directly from the original file to be edited, treating
it as read-only, or it works from a plain copy of that input
file (useful for slow or changing input files). But even if
it's a copy, it doesn't mutate that copy in place.
2) Basically all that's in the editor temp file is a 'script'
(as in 'sed') that if applied to the original data, will
produce the edited result. When we invoke the 'write'
file out option within the editor, we are essentially
asking for this 'script' to be applied against the input
data to produce the edited result. Also the editor temp
file has references to the original (or copy of) data
files, noting which external file bytes go where in the
mutating byte stream we are composing in the editor
(by composing the 'script' commands to generate it, not
by directly assembling it).
3) The 'script' in this editor temp file is not stored in the
order that we issued editing commands, but rather in the
order of the original data to which it applies (that is,
it's a one-pass script, just like 'sed'). The script order
resembles that of the SCCS s.file format, in the way that it
entwines multiple revisions.
4) We have a low level i/o library that lets us efficiently write
into the middle of a file -- not just overwriting, but also
extending and deleting, efficiently (not forcing the tail
of the script to be shifted over each time). That is, the
low level i/o library supports a byte stream with insert
and delete operators, in addition to the usual append at the
end and write over operators. These low level insert and
delete operators would _not_ directly correspond to any such
similarly named commands at the User Interface. Rather, they
(insert and delete) would be used to add new 'script' commands.
5) These 'script' commands would be the most basic editing
commands. Simple delete character range and insert character
string at offset. The 'insert' and 'delete' in the previous
sentence are _neither_ the UI insert/delete, nor the low
level io insert/delete. For example, if the user, at the UI,
issues a command to replace 'x' with 'ab' at current offset
1234 bytes into the file, this could turn into the 'script'
commands to (1) delete 1 byte at offset 1234, and (2) insert
the 2 bytes 'ab' at offset 1234. In turn, the low level io
'insert' would be used twice, to insert each of these two
script commands, into the editor temp file.
6) Given this, the size of the editor temp file is more of a
function of how many edit commands have been issued by the
user in that session, not so much of the size of the file to
be edited.
7) The format of the editor temp file is open and parsable by
other tools. Think XML here. Yes, XML. Steve's idempotent
log entries become additional XML phrases, inserted into the
midst of the existing XML 'script' (growing the editor temp
file in the middle, not just at the end, thanks to the low
level io library).
8) Each command added to the script creates a new 'version'
of the file being edited. Each user issued command that
creates a new variant of the text (rather than just browsing
what's there already) causes 1 to several new script commands
to be added. For example, a global search-and-replace of
'x' with 'ab', given 20 instances of 'x' scattered through
the file being edited, might cause 40 new script commands
to be added (20 deletes and 20 inserts). Adding these 40
script commands would cause 40 low level inserts. Undo,
redo and locking against parallel edits occurs on the
granularity of the higher, user-issued commands, not on the
finer granularity of the low level io inserts/deletes.
So the high level code needs 'micro-session' (per user
issued command, roughly) support from the low level code
for locking and versioning (the basis of undo/redo).
9) Undo doesn't add to the 'script' but just changes the
'current version' to something other than the most recently
created version.
10) We represent in the editor temp file not just the script
that can generate each variant of the file being edited, but
also such concepts as marks, dot (current selected text,
as in Sam) and cursor position. These meta-data (marks, dot,
and such) are versioned, just like the edit commands to
change data. That is, we keep track of the history of
selection and motion, as well as of editor commands to
change text, and can undo/redo a mixed stream of text
manipulation, motion and selection correctly.
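Items 1, 2, 5, 8 and 9 can be sketched in a few dozen lines, with a Python string standing in for the read-only original. This sketch makes several simplifying assumptions:

- Commands are kept in the order issued rather than in original-data order (item 3).
- The script lives in memory rather than in a temp file with the low level insert/delete of item 4.
- Marks, dot and cursor (item 10) are omitted.
- The names `EditScript`, `Delete`, `Insert` and the XML element names are all hypothetical.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Delete:
    offset: int
    length: int


@dataclass(frozen=True)
class Insert:
    offset: int
    text: str


class EditScript:
    """Read-only original text plus a versioned script of primitive edits."""

    def __init__(self, original):
        self.original = original  # never mutated (item 1)
        self.versions = []        # one tuple of commands per user command
        self.current = 0          # number of versions currently applied

    def commit(self, *commands):
        """Record one user-level command as a new version (item 8)."""
        # Committing after an undo discards the undone versions (no branches).
        del self.versions[self.current:]
        self.versions.append(tuple(commands))
        self.current = len(self.versions)

    def undo(self):
        # Undo only moves the 'current version' pointer (item 9).
        self.current = max(0, self.current - 1)

    def redo(self):
        self.current = min(len(self.versions), self.current + 1)

    def render(self):
        """Apply the script to the original to produce the edited text."""
        text = self.original
        for version in self.versions[:self.current]:
            for cmd in version:
                if isinstance(cmd, Delete):
                    text = text[:cmd.offset] + text[cmd.offset + cmd.length:]
                else:
                    text = text[:cmd.offset] + cmd.text + text[cmd.offset:]
        return text

    def to_xml(self):
        """Serialize the script in an open, parsable form (item 7)."""
        root = ET.Element("script", current=str(self.current))
        for n, version in enumerate(self.versions, start=1):
            v = ET.SubElement(root, "version", n=str(n))
            for cmd in version:
                if isinstance(cmd, Delete):
                    ET.SubElement(v, "delete", offset=str(cmd.offset),
                                  length=str(cmd.length))
                else:
                    ins = ET.SubElement(v, "insert", offset=str(cmd.offset))
                    ins.text = cmd.text
        return ET.tostring(root, encoding="unicode")
```

Committing `Delete(2, 1)` and `Insert(2, "ab")` as one version mirrors item 5's replace-'x'-with-'ab' example. A single `undo()` reverts both, because undo works per user command. For a global replace in this sketch, emitting the delete/insert pairs from the end of the text backwards keeps the earlier offsets valid.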

--

The value of Jim Gray's work is that he has been in dbms work
since the early days, and understands why and how choices were
made, as dbms history unfolded, to solve the problems they
were facing. We're solving a different problem (albeit with
some similarities). We'll need different solutions. The bulk
of his work details the solutions needed in transactional dbms
land, and is not directly relevant to us. We must ask much
the same questions as arise in the earlier, motivational parts
of his book. We will get different answers.

--

It may well be that the editor you have in mind, Brian,
and the one I have in mind, are not the same editor.

If so, then that's ok and good luck to you. The world needs
a couple more editors (about as much as it needs one more
editor <grin>).

Paul Jackson

May 15, 2000, 3:00:00 AM

Brian wrote:
|> All this caching and read on demand has a serious problem in my book,
|> and that is when I load a large file I want a screenful of data to
|> display instantly ...

The read-on-demand doesn't get in the way of instant display
of the first page. Runtime interpreters with slow startup,
big gui's, and insisting on reading the entirety of a large
file before displaying any of it get in the way. The read-
on-demand is actually _similar_ to your desire to read the
first page and show it, NOW, approach.
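To illustrate that point, here is a hypothetical `first_screen()` helper. It assumes a plain text file and a 24-line screen. Because `itertools.islice` stops after `rows` lines, only about the first buffered chunk of a large file is read before something can be displayed.

```python
import itertools


def first_screen(path, rows=24, encoding="utf-8"):
    """Return the first `rows` lines of a file without reading the rest."""
    with open(path, encoding=encoding, errors="replace") as f:
        # islice stops pulling lines once `rows` have been produced, so the
        # remainder of a large file is never read from disk here.
        return [line.rstrip("\n") for line in itertools.islice(f, rows)]
```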

Paul Jackson

May 15, 2000, 3:00:00 AM

Brian wrote:
|> The key word being "unless"

No, not as I intended it. I intended that regardless
of whether we were working from a copy of the file, or
from the original, we would treat that copy or original
as read-only. Whether we used the original or copy is
driven simply by such trade offs as:

1) we want copies of stuff that's hard or slow to get,

2) we don't want copies of stuff that's too big to fit
on our disk, or that would cause us to waste too much
time and disk i/o bandwidth making copies for little use, and

3) making copies provides the desired (because users are
used to it) semantics that changes made to an underlying
file after an editor session has begun are not noticed in
that editor session (unless explicitly requested, as with
a user's 're-read' command, or unless actually caused by
that editor session, such as writing the file back out.)

Paul Jackson

May 15, 2000, 3:00:00 AM

Brian wrote:
|> One of the things that I haven't mentioned much is some of
|> my ideas for hyperlinking. ... marked regions ... interactive
|> debugger ... agents ... for arbitrary purposes ... font and text
|> metrics ... WYSIWYG editor or page layout tool ... interfaces
|> to other hardware devices ... In place editing of hardware
|> device register ... dynamically installing advanced language
|> parsers, used in incremental class browsers, syntax checkers,
|> lint tools etc...

This feels to me like feature overreach.

Qfan

May 16, 2000, 3:00:00 AM
On 15 May 2000 19:34:15 GMT, p...@sgi.com (Paul Jackson) wrote:

>
>Brian wrote:
>|> One of the things that I haven't mentioned much is some of
>|> my ideas for hyperlinking. ... marked regions ... interactive
>|> debugger ... agents ... for arbitrary purposes ... font and text
>|> metrics ... WYSIWYG editor or page layout tool ... interfaces
>|> to other hardware devices ... In place editing of hardware
>|> device register ... dynamically installing advanced language
>|> parsers, used in incremental class browsers, syntax checkers,
>|> lint tools etc...
>
>This feels to me like feature overreach.

I don't actually plan on doing all of these things, and certainly not
all myself. I was merely listing some possibilities. I'm of the mind
that if the back end design is done properly for a basic text editor,
the same product (probably only the back end) might be useful as the
basis for all of the above without any other changes. In particular, I
mean that a basic engine for modifying text buffers of arbitrary
sizes would be useful in many places.

Qfan

May 16, 2000, 3:00:00 AM
On 15 May 2000 17:13:12 GMT, p...@sgi.com (Paul Jackson) wrote:

>The read-on-demand doesn't get in the way of instant display
>of the first page. Runtime interpreters with slow startup,
>big gui's, and insisting on reading the entirety of a large
>file before displaying any of it get in the way. The read-
>on-demand is actually _similar_ to your desire to read the
>first page and show it, NOW, approach.

Yes, you are correct.


Paul Jackson

May 16, 2000, 3:00:00 AM
I (pj) had written:

|>This feels to me like feature overreach.

Brian replied:
|> ... I was merely listing some possibilities. ... a basic engine for
|> modifying text buffers of arbitrary sizes would be useful in many places.

Yes - that makes sense.

Qfan

May 16, 2000, 3:00:00 AM
On 15 May 2000 17:05:59 GMT, p...@sgi.com (Paul Jackson) wrote:

>Brian wrote:
>|> That all sounds fine and dandy but is this transactional system going
>|> to cause this beast to act like a batch mode MSSQL editor? <grin>
>
>well, no (though I'm sure you knew that, and hoped that
>I knew that).

Yes

>We must be leery of applying too directly the structures used
>by 'many databases'. Not only (as noted above) are we dealing
>with mutating byte sequences, not tables of records, but also
>our performance profile is different.

I also agree.

>Don't confuse 'reading into memory' with 'reading into the
>current process's address space'.

>disk versus memory. The editor can use an ordinary (as in
>"stdio") buffering library to read the contents of the file
>to be edited from the file system buffers, on demand, with
>suitable (or near suitable) performance.

Yes, this is true; file system caching has a huge impact on the
performance of reading files from disk.

That sounds like an excellent way of dealing with changes and
versioning, but I have a couple of usage habits and goals which might
make storing the edited files only as a set of changes awkward.

First of all, I tend to keep my primary editing environment up for
weeks on end without closing it, and I make a significant amount of
modifications to some of the files opened in it. I might do the
equivalent of spending a whole day working with only a couple of files,
just to end up rewriting them completely several times over. This
frequently occurs when I'm fleshing out some new designs. I would
think that a backend which maintains my edits only as change
information would eventually become inefficient when so many changes
are made. I'm not sure if this is true, as I don't have any real
world performance info. I suppose every time you save the file these
fragmented blocks could be defragged; is this correct?
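A piece table can make the fragmentation question concrete. It is a common editor data structure, assumed here rather than proposed in this thread. The buffer is a list of `(source, start, length)` pieces pointing into either the read-only original or an append-only add buffer. Each edit may split pieces, so the list grows over a long session. The class name and the `compact()` method, which plays the role of a possible "defrag on save", are hypothetical.

```python
class PieceTable:
    """Text as pieces referencing a read-only original or an add buffer."""

    def __init__(self, original):
        self.buffers = {"orig": original, "add": ""}
        self.pieces = [("orig", 0, len(original))] if original else []

    def text(self):
        return "".join(self.buffers[src][start:start + n]
                       for src, start, n in self.pieces)

    def _split(self, offset):
        """Ensure a piece boundary at `offset`; return the piece index there."""
        pos = 0
        for i, (src, start, n) in enumerate(self.pieces):
            if offset == pos:
                return i
            if offset < pos + n:
                k = offset - pos
                self.pieces[i:i + 1] = [(src, start, k),
                                        (src, start + k, n - k)]
                return i + 1
            pos += n
        return len(self.pieces)  # offset is at the end of the text

    def insert(self, offset, s):
        if not s:
            return
        start = len(self.buffers["add"])
        self.buffers["add"] += s  # the add buffer is append-only
        i = self._split(offset)
        self.pieces.insert(i, ("add", start, len(s)))

    def delete(self, offset, length):
        i = self._split(offset)
        j = self._split(offset + length)
        del self.pieces[i:j]

    def compact(self):
        """'Defragment': fold all pieces into one fresh original buffer."""
        merged = self.text()
        self.buffers = {"orig": merged, "add": ""}
        self.pieces = [("orig", 0, len(merged))] if merged else []
```

Compaction trades the structure held in the pieces for a single contiguous buffer, which is roughly the "defrag when you save" idea. A real editor would still have to decide whether undo history survives that step.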

Also, many times I know that I will never need revision info for a
group of files that I'm working on until I reach a milestone (such as
getting it to compile or even working). I might like to branch at
that point and then begin keeping change information after that. In my
case I might want the versioning off as often as I would have it on.

I realize that the fragmentation of a couple of heavily over-edited
files shouldn't be a huge problem, but this brings me to the real
reason I'm considering doing this at all. I am looking for an editor
which can support dynamic code analyzers. My primary goal is to
build a text editor which can house a significant number of agents
which are incrementally examining the current changed contents of the
files I'm editing as well as ones I haven't changed. I'm looking for
as much help as I can get in quickly understanding the code that I'm
looking at/writing: class information, usage statistics, parameter
auto completion, slicing, basically the whole kabob. I don't have all
of the knowledge to build all of these tools yet, but I'm intensely
disappointed with the level of actual help in writing code that I get
from modern development environments today. In order for tools to be
developed that can do this, they (agents) need full and efficient
access to the changed versions of the buffers while they are being
edited. The areas being edited/looked at are precisely where I need
all available help at that given moment. These agents could also make
use of a good hyperlinking system for tagging code blocks with special
information which isn't actually part of the edited text (to be used
by the user interface to display language specific information in an
intuitive manner).
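One way to tag code blocks "without mucking up the actual data in the buffer" is to keep annotations beside the text and adjust their offsets as edits arrive. This minimal sketch assumes half-open `[start, end)` ranges and one particular policy:

- Text inserted exactly at a range's start lands before the range.
- Text inserted at a range's end lands after it.
- A range whose text is deleted entirely is dropped.

`AnnotatedBuffer` and `annotate()` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    start: int  # half-open range [start, end) into the buffer text
    end: int
    info: str   # e.g. a hyperlink target or a lint message


class AnnotatedBuffer:
    """Text plus out-of-band annotations whose offsets follow the edits."""

    def __init__(self, text):
        self.text = text
        self.annotations = []

    def annotate(self, start, end, info):
        ann = Annotation(start, end, info)
        self.annotations.append(ann)
        return ann

    def insert(self, offset, s):
        self.text = self.text[:offset] + s + self.text[offset:]
        for a in self.annotations:
            if a.start >= offset:     # inserted at or before the range
                a.start += len(s)
                a.end += len(s)
            elif a.end > offset:      # inserted strictly inside the range
                a.end += len(s)

    def delete(self, offset, length):
        end = offset + length
        self.text = self.text[:offset] + self.text[end:]

        def shift(pos):
            # Positions inside the deleted span collapse to `offset`;
            # positions after it move left by `length`.
            if pos <= offset:
                return pos
            return offset if pos < end else pos - length

        for a in self.annotations:
            a.start, a.end = shift(a.start), shift(a.end)
        # Drop annotations whose text was deleted entirely.
        self.annotations = [a for a in self.annotations if a.start < a.end]
```

The buffer text itself never contains the tags. An agent can re-read `text[a.start:a.end]` after any edit to see what its annotation now covers.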

Most of these lexer type tools are extremely inefficient, require
forward and backward examination of the current buffer data, have
algorithms that require intense amounts of CPU to run, or all of the
above. Add to that the fact that I want it to be incremental/dynamic in
nature. I don't need parameter or code completion information 5
seconds after I've typed it in myself. I want suggestions as soon as I
know I don't know what I need to type. These agents/lexers are one of
the primary reasons that I like the thought of a text engine that
doesn't fragment the buffers being edited, and in fact that could be
tuned to defragment the buffer blocks in the background as it runs.

This is why I seem to keep harping on efficiency. I need a system that
allows reasonably efficient editing while providing an extremely
efficient environment for concurrent agents to help me
write/understand code just a little better and faster.

The cool thing about all of this is that it seems to me that if I can
pull off the development of an environment that is conducive to these
agents, all of the other features seem relatively easy to build from
there.


Paul J, based on the above ideas, what are your thoughts on where I'm
going with this. Does this diverge too far from your interests?

Paul Jackson

May 16, 2000, 3:00:00 AM

Excellent explanation of what you have in mind, and it's a nice
contrast to my way of working.

I tend to fire up tiny editors every few minutes, to make a few
changes, then quit, staying almost always at the shell prompt.
And I tend to use really 'dumb' editors.

If I can invent a storage format that provides the speed you
need (multiple tools accessing each byte in the current version
of the file, with very few cycles per char access cost, and
with thousands of changes made over a day or three in the same
session), and still fits my radically different working style
and goals, then we've got something.

It's head scratching time ... let's see if I can find a plausible
solution. The key design element is the format of the editor
temp buffer file. It may take me a few days, to either get
excited by a possible answer, or to become discouraged and
wander off.

scot...@my-deja.com

May 30, 2000, 3:00:00 AM
Hello,

Give CRiSP editor a try (http://www.vital.com). Their personal copy
price is about $150.00.

In article <pv1hhskv3rergdps7...@4ax.com>,

> What's it cost? CodeWright is $299; as a professional, that doesn't
> bother me, but if it were cheaper I wouldn't mind. One thing that CW
> v6.0 does out of the box is a live, hot parsing of functions, macros,
> etc. If you type
>
> void someFunc(char abc, int xyx)
> {
> }
>
> as soon as the parsing thread gets some CPU time the function shows
> up in the project windows. Very nice when you have 10 functions in
> a file. It will list them in order or sorted alphabetically too.
>
> This is the feature I like most about CodeWright. Especially because
> it does it out of the box and I'm very lazy.
>
> - Mark
>
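Here is a crude sketch of the quoted CodeWright behaviour. It lists C function definitions in file order or sorted, and runs the scan on a worker thread. It relies on a regular expression, which is far weaker than a real parser: it misses K&R definitions, macros, multi-line return types and nested parentheses, among others. `list_functions` and `FunctionScanner` are hypothetical names.

```python
import re
import threading

# Heuristic for a C function definition: one or more type words, the name,
# a parameter list, then an opening brace, e.g.
#     void someFunc(char abc, int xyx)
#     {
# Return type and name must share a line; nested parentheses are not handled.
FUNC_DEF = re.compile(
    r"^[ \t]*(?:[A-Za-z_]\w*[ \t*]+)+([A-Za-z_]\w*)[ \t]*\([^;{)]*\)\s*\{",
    re.MULTILINE,
)

# Control-flow keywords that can look like "type name(...) {", e.g. "else if".
KEYWORDS = {"if", "for", "while", "switch", "return"}


def list_functions(source, sort=False):
    """Return names of functions defined in `source`, in file order or sorted."""
    names = [m.group(1) for m in FUNC_DEF.finditer(source)
             if m.group(1) not in KEYWORDS]
    return sorted(names) if sort else names


class FunctionScanner:
    """Runs list_functions on a worker thread and reports via a callback."""

    def __init__(self, on_update, sort=False):
        self.on_update = on_update
        self.sort = sort

    def scan_async(self, source):
        worker = threading.Thread(
            target=lambda: self.on_update(list_functions(source, self.sort)),
            daemon=True,
        )
        worker.start()
        return worker
```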


