Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Lists vs Vectors; specifically in a text editor

65 views
Skip to first unread message

nor...@example.com

unread,
Jun 25, 2015, 12:39:23 PM6/25/15
to
I'm new to C++, and took on writing a VI-like editor as my "break the seal"
project.

After 6k lines of code, I have an editor that stumbles along, and handles most
things; it's not pretty, and I'll be rewriting it again to incorporate all the
stuff I've learned since March.

Early on in the design, I made the choice to use lists rather than vectors
to hold the working set of lines. That is, in a ten line file, there are
ten list elements, each effectively holding a std::string.

I based this decision on the run-time characteristics of the VI command
"g/^/.m0" -- tag each line and move it to the beginning of the file,
effectively reversing the order of the lines in the file.

Stroustrup's talk (mentioned in another thread somewhere in this newsgroup)
made me want to revisit the decision to use lists.

I'm wondering if in this specific domain (an editor), lists are in fact the
better choice over vectors as the holder-of-all-lines?

Thanks in advance,
-RK

--
Robert Krten

Visit me at http://www.ironkrten.com

Message has been deleted

Richard

unread,
Jun 25, 2015, 1:18:29 PM6/25/15
to
[Please do not mail me a copy of your followup]

nor...@example.com spake the secret code
<Q8Wix.1$FC...@fx23.iad> thusly:

>Stroustrup's talk (mentioned in another thread somewhere in this newsgroup)
>made me want to revisit the decision to use lists.
>
>I'm wondering if in this specific domain (an editor), lists are in fact the
>better choice over vectors as the holder-of-all-lines?

There are two kinds of forces at work here: design and performance.

The design forces revolve around the readability of your code. Is
there any benefit to comprehension in choosing one container over
another? For std::list<std::string> vs. std::vector<std::string>,
there probably isn't going to be much difference in comprehension in
choosing one over the other. This is particularly true if your code is
making use of iterators and a typedef for the container. In other
words, if you have code like:

typedef std::list<std::string> line_container_t;

void munge_lines(line_container_t::iterator begin,
line_container_t::iterator end)
{
// do stuff with lines through iterators
}

Then switching from list to vector is just a matter of changing the
typedef and possibly adjusting a few places where you directly interact
with list specific members.

The performance forces are how fast the code runs. This can only be
determined by running your code against some sort of test case and
measuring the results. The surprising thing about all these talks in
the past few years at C++ conferences over "list vs. vector" or "map
vs. vector" is that the performance analysis shows that vector does
much, much better than most people expect. This appears to be true for
both short lists and long lists, short vectors and long vectors.

The costs for traversing up the computer memory hierarchy have been
getting bigger over time. Whereas before a cache miss was undesirable
but tolerable, today it can absolutely kill your performance.
<https://en.wikipedia.org/wiki/Memory_hierarchy>
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

nor...@example.com

unread,
Jun 25, 2015, 1:35:35 PM6/25/15
to
Richard <legaliz...@mail.xmission.com> wrote:
> [Please do not mail me a copy of your followup]
>
> nor...@example.com spake the secret code
> <Q8Wix.1$FC...@fx23.iad> thusly:
>
>>Stroustrup's talk (mentioned in another thread somewhere in this newsgroup)
>>made me want to revisit the decision to use lists.
>>
>>I'm wondering if in this specific domain (an editor), lists are in fact the
>>better choice over vectors as the holder-of-all-lines?
>
> There are two kinds of forces at work here: design and performance.
>
> The design forces revolve around the readability of your code. Is
> there any benefit to comprehension in choosing one container over
> another? For std::list<std::string> vs. std::vector<std::string>,
> there probably isn't going to be much difference in comprehension in
> choosing one over the other. This is particularly true if your code is
> making use of iterators and a typedef for the container. In other
> words, if you have code like:
>
> typedef std::list<std::string> line_container_t;
>
> void munge_lines(line_container_t::iterator begin,
> line_container_t::iterator end)
> {
> // do stuff with lines through iterators
> }

Yes, this is exactly how the code is structured. The one interesting
tradeoff is that I can no longer really use "line numbers" in the editor
(when using lists) because there's no direct index into the list, whereas
with a vector it would be directly supported.

> Then switching from list to vector is just a matter of changing the
> typedef and possibly adjusting a few places where you directly interact
> with list specific members.
>
> The performance forces are how fast the code runs. This can only be
> determined by running your code against some sort of test case and
> measuring the results. The surprising thing about all these talks in
> the past few years at C++ conferences over "list vs. vector" or "map
> vs. vector" is that the performance analysis shows that vector does
> much, much better than most people expect. This appears to be true for
> both short lists and long lists, short vectors and long vectors.
>
> The costs for traversing up the computer memory hierarchy have been
> getting bigger over time. Whereas before a cache miss was undesirable
> but tolerable, today it can absolutely kill your performance.
> <https://en.wikipedia.org/wiki/Memory_hierarchy>

I will try both, then, and see how they perform. Vectors would certainly
be easier for maintaining "dot" (the current line in ED/EX/VI parlance),
whereas today "dot" is maintained as an iterator (effectively like a pointer).

Thanks,

Chris M. Thomasson

unread,
Jun 25, 2015, 2:25:11 PM6/25/15
to
> wrote in message news:Q8Wix.1$FC...@fx23.iad...

> I'm new to C++, and took on writing a VI-like editor as my "break the
> seal"
> project.

> After 6k lines of code, I have an editor that stumbles along, and handles
> most
> things; it's not pretty, and I'll be rewriting it again to incorporate all
> the
> stuff I've learned since March.

[...]

FWIW, you just might be interested in:

https://en.wikipedia.org/wiki/Rope_(data_structure)

;^)

Christian Gollwitzer

unread,
Jun 25, 2015, 2:27:09 PM6/25/15
to
Am 25.06.15 um 18:39 schrieb nor...@example.com:
> Early on in the design, I made the choice to use lists rather than vectors
> to hold the working set of lines. That is, in a ten line file, there are
> ten list elements, each effectively holding a std::string.
>
>[...]
> I'm wondering if in this specific domain (an editor), lists are in fact the
> better choice over vectors as the holder-of-all-lines?

Look up a gap buffer. AFAIK that is a data structure optimized for a
speed/size tradeoff used in text editors.

If you ever need an editing component in a real project, use a library
like Scintilla to do that. Real editors consist of many man-years of
effort to make them fast, memory-efficient and pleasant to use.

Christian

Chris M. Thomasson

unread,
Jun 25, 2015, 2:28:32 PM6/25/15
to
> "Chris M. Thomasson" wrote in message
> news:mmhh1q$as8$1...@speranza.aioe.org...

> > [noreply]:wrote in message news:Q8Wix.1$FC...@fx23.iad...
Here is a paper on the idea:

http://www.cs.rit.edu/usr/local/pub/jeh/courses/QUARTERS/FP/Labs/CedarRope/rope-paper.pdf

Hope is it of interest to you.

:^)

nor...@example.com

unread,
Jun 25, 2015, 2:40:09 PM6/25/15
to
Thanks! So much to learn :-)

I'm thinking that the rope concept can be used for the lists of lines, rather than
just the contents of each line.

Cheers,

nor...@example.com

unread,
Jun 25, 2015, 2:42:01 PM6/25/15
to
Christian Gollwitzer <auri...@gmx.de> wrote:
> Am 25.06.15 um 18:39 schrieb nor...@example.com:
>> Early on in the design, I made the choice to use lists rather than vectors
>> to hold the working set of lines. That is, in a ten line file, there are
>> ten list elements, each effectively holding a std::string.
>>
>>[...]
>> I'm wondering if in this specific domain (an editor), lists are in fact the
>> better choice over vectors as the holder-of-all-lines?
>
> Look up a gap buffer. AFAIK that is a data structure optimized for a
> speed/size tradeoff used in text editors.

Yes, but this (gap buffer) is at the *string* level; my main data structure problems
(so far) are at the *line* level -- i.e., how to organize the collection of
std::strings that make up the content of the file being editted.

> If you ever need an editing component in a real project, use a library
> like Scintilla to do that. Real editors consist of many man-years of
> effort to make them fast, memory-efficient and pleasant to use.

For sure. This is purely as a learning exercise :-)

Cheers,

Mr Flibble

unread,
Jun 25, 2015, 2:51:52 PM6/25/15
to
Hi,

A good data structure for an editor is something like my container
called a "segmented array". A segmented array has relative fast (O(lg
N)) performance for random inserts/erases which is essential for an editor.

For more information go to http://www.i42.co.uk/stuff/segmented_array.htm

neolib::segmented_array<char> would be ideal.

/Flibble

Richard

unread,
Jun 25, 2015, 3:59:27 PM6/25/15
to
[Please do not mail me a copy of your followup]

nor...@example.com spake the secret code
<xZWix.2$oa...@fx01.iad> thusly:

>Richard <legaliz...@mail.xmission.com> wrote:
>> typedef std::list<std::string> line_container_t;
>>
>> void munge_lines(line_container_t::iterator begin,
>> line_container_t::iterator end)
>> {
>> // do stuff with lines through iterators
>> }
>
>Yes, this is exactly how the code is structured. The one interesting
>tradeoff is that I can no longer really use "line numbers" in the editor
>(when using lists) because there's no direct index into the list, whereas
>with a vector it would be directly supported.

Have you considered using a value type that explicitly represents a
line of text and the file line number on which it resides?

typedef std::pair<unsigned, std::string> line_t;
typedef std::list<line_t> line_container_t;

>I will try both, then, and see how they perform. Vectors would certainly
>be easier for maintaining "dot" (the current line in ED/EX/VI parlance),
>whereas today "dot" is maintained as an iterator (effectively like a pointer).

In addition to looking at rope classes (a good suggestion), there is
another possibility. You might also want to take a look at "string
views". The idea here is to have a class that represents a view into
some string data held elsewhere but doesn't actually hold the string.
There is a standards proposal paper from 2013 on this:
<https://isocpp.org/files/papers/N3762.html>

I wrote up something similar to a string view that I called a "string
chain": <https://gist.github.com/LegalizeAdulthood/7b67968bd93fbd4f9dbb>.
In my case I was parsing an email file and wanting to get a a single
thing that I could manipulate that would be all the text associated
with a header field. (Header fields can be split across multiple
physical lines with intervening whitespace, but the whitespace is not
part of the header field value.) In my case the source data is a
memory-mapped file, so I simply needed to scan through it looking for
interesting segments to join together into a whole unit that I called a
chain.

nor...@example.com

unread,
Jun 25, 2015, 4:09:12 PM6/25/15
to
Very interesting!

Why is everyone suggesting that the editor be character oriented, instead
of line oriented?

I would have thought something more like:

neolib::segmented_array<string>

?

I've been going on the model of "line oriented" because in reality, VI
is a full screen front end to a line-oriented underlay (ED/EX)...

?

Cheers,

nor...@example.com

unread,
Jun 25, 2015, 4:13:28 PM6/25/15
to
Richard <legaliz...@mail.xmission.com> wrote:
> [Please do not mail me a copy of your followup]
>
> nor...@example.com spake the secret code
> <xZWix.2$oa...@fx01.iad> thusly:
>
>>Richard <legaliz...@mail.xmission.com> wrote:
>>> typedef std::list<std::string> line_container_t;
>>>
>>> void munge_lines(line_container_t::iterator begin,
>>> line_container_t::iterator end)
>>> {
>>> // do stuff with lines through iterators
>>> }
>>
>>Yes, this is exactly how the code is structured. The one interesting
>>tradeoff is that I can no longer really use "line numbers" in the editor
>>(when using lists) because there's no direct index into the list, whereas
>>with a vector it would be directly supported.
>
> Have you considered using a value type that explicitly represents a
> line of text and the file line number on which it resides?
>
> typedef std::pair<unsigned, std::string> line_t;
> typedef std::list<line_t> line_container_t;

I did, and rejected it on the basis that if I do a deletion then I
have to go and renumber all the lines (after the deletion). Same
with an insertion / move / copy. The pathological case "g/^/.m0"
would be a disaster.

>>I will try both, then, and see how they perform. Vectors would certainly
>>be easier for maintaining "dot" (the current line in ED/EX/VI parlance),
>>whereas today "dot" is maintained as an iterator (effectively like a pointer).
>
> In addition to looking at rope classes (a good suggestion), there is
> another possibility. You might also want to take a look at "string
> views". The idea here is to have a class that represents a view into
> some string data held elsewhere but doesn't actually hold the string.
> There is a standards proposal paper from 2013 on this:
> <https://isocpp.org/files/papers/N3762.html>
>
> I wrote up something similar to a string view that I called a "string
> chain": <https://gist.github.com/LegalizeAdulthood/7b67968bd93fbd4f9dbb>.
> In my case I was parsing an email file and wanting to get a a single
> thing that I could manipulate that would be all the text associated
> with a header field. (Header fields can be split across multiple
> physical lines with intervening whitespace, but the whitespace is not
> part of the header field value.) In my case the source data is a
> memory-mapped file, so I simply needed to scan through it looking for
> interesting segments to join together into a whole unit that I called a
> chain.

More food for thought, thanks!

Cheers,

Mr Flibble

unread,
Jun 25, 2015, 4:40:28 PM6/25/15
to
No, that wouldn't be a good idea. A custom data structure based on
segmented_array<char> that also has updatable iterators to lines within
that data structure would be more appropriate.

>
> I've been going on the model of "line oriented" because in reality, VI
> is a full screen front end to a line-oriented underlay (ED/EX)...

Editors can also have paragraphs and pages not just lines.

/Flibble

Jorgen Grahn

unread,
Jun 26, 2015, 3:20:30 PM6/26/15
to
Or google "The Craft of Text Editing: Emacs for the Modern World".
That's what Colin Plumb told me to do, when I asked about data
structures for a text editor, 20 years ago ...

(I never finished that text editor.)

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
0 new messages