Incorrect linesAdded in notification

Mike Lischke

Jan 19, 2012, 7:41:52 AM

to scintilla...@googlegroups.com

Hey Neil,

I just saw that the reported number of lines added in an SCI_SETTEXT call is consistently off by one. That's true for the first notification, which reports the lines removed, as well as the following text insertion notification. Is that a bug or is it intentional?

Mike
--
www.soft-gems.net

Neil Hodgson

Jan 19, 2012, 9:38:19 PM

to scintilla...@googlegroups.com

Mike Lischke:

> I just saw that the reported number of lines added in an SCI_SETTEXT call is
> consistently off by one. That's true for the first notification, which reports
> the lines removed, as well as the following text insertion notification. Is that
> a bug or is it intentional?

Seems correct to me. The code involved is very simple, counting the
lines before and after the change and then reporting the difference.
Perhaps you are counting lines differently to Scintilla. For Scintilla
an empty document contains 1 line which is empty. An empty file has to
have a line for the caret to appear on.
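
The counting described above can be sketched in a few lines of Python. This is only an illustrative model, not Scintilla's code: `count_lines` and `lines_added` are hypothetical helper names, and the sketch assumes a line end is CR, LF or CR+LF.

```python
import re

# A line end is CR+LF, a lone CR, or a lone LF (CR+LF must be tried first).
LINE_END = re.compile(r"\r\n|\r|\n")

def count_lines(text):
    """Lines in this model: one more than the number of line ends."""
    return len(LINE_END.findall(text)) + 1

def lines_added(before, after):
    """Signed difference in line count between two document states."""
    return count_lines(after) - count_lines(before)

# Ten lines of text joined by nine line ends.
ten_lines = "\n".join("line %d" % i for i in range(1, 11))

print(count_lines(""))             # 1: the empty document still has one line
print(count_lines(ten_lines))      # 10
print(lines_added(ten_lines, ""))  # -9: clearing ten lines leaves one behind
```

Under this model, clearing a ten-line document yields -9 because the document goes from ten lines to one, never to zero.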

Neil

Mike Lischke

Jan 20, 2012, 3:26:41 AM

to scintilla...@googlegroups.com

Hmm, ok, that's something to consider, but imagine you have 10 lines in your document and remove all of them: then I'd expect linesAdded to contain -10, not -9. The single line you mentioned is (logically) added after all of the previous lines have been removed. It is not the case that only 9 lines are removed and one is merely cleared. The actual code might do it differently, but when I clear such a document I remove 10 lines, regardless of what Scintilla additionally does for its internal state.

Mike
--
www.soft-gems.net

Neil Hodgson

Jan 22, 2012, 8:42:25 PM

to scintilla...@googlegroups.com

Mike Lischke:

> Hmm, ok, that's something to consider, but imagine you have 10 lines in your
> document and remove all of them: then I'd expect linesAdded to contain -10, not -9.
> The single line you mentioned is (logically) added after all of the previous lines
> have been removed. It is not the case that only 9 lines are removed and one is
> merely cleared.

You can imagine it being added in later if that is the model you
want to present but it's not how Scintilla works. A document always
contains a line. If you want to expose an interface that works your
way then you will have to add some scaffolding to present that facade
over Scintilla.

> The actual code might do it differently, but when I clear such a document I
> remove 10 lines, regardless of what Scintilla additionally does for its internal state.

There is no command in Scintilla to 'remove 10 lines'. What you are
doing is removing a block of text. That removal has consequences for
the number of lines.

Neil

Mike Lischke

Jan 23, 2012, 3:29:40 AM

to scintilla...@googlegroups.com

> You can imagine it being added in later if that is the model you
> want to present but it's not how Scintilla works. A document always
> contains a line. If you want to expose an interface that works your
> way then you will have to add some scaffolding to present that facade
> over Scintilla.
>
>> The actual code might do it differently, but when I clear such a document I
>> remove 10 lines, regardless of what Scintilla additionally does for its internal state.
>
> There is no command in Scintilla to 'remove 10 lines'. What you are
> doing is removing a block of text. That removal has consequences for
> the number of lines.

I'm sorry, but I fail to follow your logic. It simply sounds weird to me. When I have a parser attached to my document which relies on the number of lines to parse when they are added and Scintilla fails to report the correct number then something in Scintilla is not correct. It doesn't matter for this particular function how Scintilla internally represents that (lines vs a block etc.). The notification carries a count of lines that have been added and that should reflect what really happened.

I'll probably create a workaround for that behavior by adding 1 to the reported count and hope that's the only needed adjustment. Having to know how Scintilla internally works to get correct behavior violates the isolation rule IMO.

Thanks anyway, Neil.

Mike
--
www.soft-gems.net

Lex Trotman

Jan 23, 2012, 4:56:01 AM

to scintilla...@googlegroups.com

Neil, to check my understanding, would it be fair to say that
Scintilla is reporting how many end of line marks are added/removed
from the buffer, a ten line file having nine end of line marks?

This means you always get an integer line count even if I delete two
and a half lines.

Cheers
Lex

Neil Hodgson

Jan 23, 2012, 3:50:20 PM

to scintilla...@googlegroups.com

Mike Lischke:

> ... and Scintilla fails to
> report the correct number then something in Scintilla is not correct.
> It doesn't matter for this particular function how Scintilla internally
> represents that (lines vs a block etc.). The notification carries a
> count of lines that have been added and that should reflect what
> really happened.

Scintilla is trying to present a consistent model of a document. It
doesn't matter what the internal implementation is that makes that
model work. I suspect that you want a model where the single character
document containing "x" contains 1 line and the empty document ""
contains 0 lines. Possibly you mean something else where the number of
lines reported in the insertion and deletion notifications is not the
same as the difference in the number of lines between the current
state and the previous state.

> I'll probably create a workaround for that behavior by adding 1
> to the reported count and hope that's the only needed adjustment.

That is unlikely to produce consistent results.

Lex Trotman:

> Neil, to check my understanding, would it be fair to say that
> Scintilla is reporting how many end of line marks are added/removed
> from the buffer, a ten line file having nine end of line marks?

That is a consequence of the number of lines always being 1 more
than the number of line end marks, where a line end mark may be CR, LF
or CR+LF. It wasn't the purpose of the current behaviour.

> This means you always get an integer line count even if I delete two
> and a half lines.

Returning a fractional number would require defining what 'two and
a half lines' means and there are multiple candidates.

Neil

Mike Lischke

Jan 24, 2012, 3:17:29 AM

to scintilla...@googlegroups.com

Hey Neil,

thanks for your time.

> Scintilla is trying to present a consistent model of a document. It
> doesn't matter what the internal implementation is that makes that
> model work. I suspect that you want a model where the single character
> document containing "x" contains 1 line and the empty document ""
> contains 0 lines. Possibly you mean something else where the number of
> lines reported in the insertion and deletion notifications is not the
> same as the difference in the number of lines between the current
> state and the previous state.

What I mean is fairly simple and honestly, I'm surprised that it needs so much explanation. What I'm after is this:

When I set the text of a Scintilla document to 10 lines of text, I want linesAdded to contain 10. The same goes for removals and additions. That's all.

>
>> I'll probably create a workaround for that behavior by adding 1
>> to the reported count and hope that's the only needed adjustment.
>
> That is unlikely to produce consistent results.

I'm all ears for a better idea.

>> Neil, to check my understanding, would it be fair to say that
>> Scintilla is reporting how many end of line marks are added/removed
>> from the buffer, a ten line file having nine end of line marks?
>
> That is a consequence of the number of lines always being 1 more
> than the number of line end marks, where a line end mark may be CR, LF
> or CR+LF. It wasn't the purpose of the current behaviour.
>
>> This means you always get an integer line count even if I delete two
>> and a half lines.
>
> Returning a fractional number would require defining what 'two and
> a half lines' means and there are multiple candidates.

As someone who also wrote a fairly complex editor with syntax highlighting and Unicode support more than 10 years ago, I might be able to give some valuable input: In fact, for the editor itself there is no such thing as a line break, line end terminator or whatever you wanna call it. The line separator is purely a thing to help parse text input and export it later when needed. Internally the editor works with arrays of lines (and I think Scintilla uses a similar approach). So, a line here is not text that is ended by a line break but an entry in the line array. Hence you cannot remove half a line as removing a line means to remove it entirely from the line array. Even making it empty doesn't remove it. With this paradigm it is extremely easy to tell how many lines have been added/removed/touched.

Mike
--
www.soft-gems.net

Lex Trotman

Jan 24, 2012, 4:26:20 AM

to scintilla...@googlegroups.com

[...]

Hi Mike,

> As someone who also wrote a fairly complex editor with syntax highlighting and Unicode support more than 10 years ago, I might be able to give some valuable input:
> In fact, for the editor itself there is no such thing as a line break, line end terminator or whatever you wanna call it. The line separator is purely a thing to help parse text
> input and export it later when needed. Internally the editor works with arrays of lines (and I think Scintilla uses a similar approach).

IIUC Scintilla uses a gap buffer, lines are as you say above, a
parsing entity imposed on top.

> So, a line here is not text that is ended by a line break but an entry in the line array.

In Scintilla I don't add or delete line array entries, I add or delete
a number of bytes. If the bytes I delete or add happen to have line
endings in them then Scintilla will add or remove line entries, but
that is invisible to me.

> Hence you cannot remove half a line as removing a line means to remove it entirely from the line array. Even making it empty doesn't remove it.

Correct, and therefore the last line cannot be removed, since you
can't take its non-existent line end away.

> With this paradigm it is extremely easy to tell how many lines have been added/removed/touched.

Except that as I said above, with Scintilla you can't remove lines,
only sequences of characters, they are only lines by virtue of the
fact that they are ended by a line end sequence, but the (possibly
empty) last line is not ended by a line end sequence.

Cheers
Lex

Mike Lischke

Jan 24, 2012, 9:13:38 AM

to scintilla...@googlegroups.com

Hey Lex,

interesting discussion :-)

>> So, a line here is not text that is ended by a line break but an entry in the line array.
>
> In Scintilla I don't add or delete line array entries, I add or delete
> a number of bytes. If the bytes I delete or add happen to have line
> endings in them then Scintilla will add or remove line entries, but
> that is invisible to me.

The bytes you add are just the transport format used to describe what comes in or goes out (like an abstraction). However from the user's point of view as well as internally by the editor you have individual lines and usually no line breaks. As I said line breaks are neither for the inner working of the editor nor the user as such relevant. Even if the user presses the return key he doesn't enter a line break but tells the editor to start a new line, ergo a new entry in said line array. The line end code is added when the text is read out later for the pure purpose to be able to restore the line structure later.

>> Hence you cannot remove half a line as removing a line means to remove it entirely from the line array. Even making it empty doesn't remove it.
>
> Correct, and therefore the last line cannot be removed, since you
> can't take its non-existent line end away.

That's a brave conclusion :-) Since there are no line ends in the editor they have no relevance for the ability to remove lines. Lines are removed when the user presses the back-delete key while the cursor is at the beginning of a line or the normal delete key when it is at the end (corner cases not considered here).

Regards,

Mike
--
www.soft-gems.net

Neil Hodgson

Jan 24, 2012, 3:38:46 PM

to scintilla...@googlegroups.com

Mike Lischke:

> When I set the text of a Scintilla document to 10 lines of text, I want
> linesAdded to contain 10. The same goes for removals and additions.
> That's all.

I wonder if the issue here is that you want the linesAdded field to
reflect the number of lines in the added/removed text instead of the
change to the state of the document.

The main purpose of the linesAdded field is to allow synchronizing
the application's view of the document with the document as it changes
state. The application may be maintaining data about lines in the
document and needs to move that data or tag it with another line
number when lines are added or removed. For example, if there is a
breakpoint on line 2 and then line 1 is deleted then the application
will (probably) want to move the breakpoint to line 1.
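
The breakpoint example can be sketched roughly as follows. This is a simplified model under stated assumptions, not Scintilla code: `shift_line_data` is a hypothetical helper, `change_line` is assumed to be the line on which the change starts, and data on lines that are removed is assumed to collapse onto that line.

```python
from typing import Iterable, Set

def shift_line_data(lines: Iterable[int], change_line: int, lines_added: int) -> Set[int]:
    """Renumber per-line data (for example breakpoints) after a change.

    change_line is the line on which the change starts; lines_added is the
    signed line-count delta reported for the change.
    """
    shifted = set()
    for ln in lines:
        if ln <= change_line:
            shifted.add(ln)                # at or before the change: unaffected
        elif lines_added < 0 and ln <= change_line - lines_added:
            shifted.add(change_line)       # line was merged into change_line
        else:
            shifted.add(ln + lines_added)  # after the change: shift by the delta
    return shifted

# Breakpoints on lines 2 and 5; line 1 is deleted, so the change starts on
# line 1 and the reported delta is -1.
print(sorted(shift_line_data({2, 5}, change_line=1, lines_added=-1)))  # [1, 4]
```

The point of the sketch is that the delta alone is enough to keep application-side line data aligned with the document, whatever the size of the inserted or removed text.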

Neil

Lex Trotman

Jan 24, 2012, 5:57:28 PM

to scintilla...@googlegroups.com

On Wed, Jan 25, 2012 at 1:13 AM, Mike Lischke
<mike.l...@googlemail.com> wrote:
> Hey Lex,

Hi Mike,

>
> interesting discussion :-)

Yes.

Line ends are interesting to me because in Geany we had a bug report
because Scintilla does not support the Unicode U+2028 line separator.
Having seen how it works and after Neil's advice on performance
impact, I don't think we are going to do anything unless the OP
provides a patch acceptable to Neil.

>
>>> So, a line here is not text that is ended by a line break but an entry in the line array.
>>
>> In Scintilla I don't add or delete line array entries, I add or delete
>> a number of bytes. If the bytes I delete or add happen to have line
>> endings in them then Scintilla will add or remove line entries, but
>> that is invisible to me.
>
> The bytes you add are just the transport format used to describe what comes in or goes out (like an abstraction). However from the user's point of view as well as internally by the editor you have individual lines and usually no line breaks.

1. My mental model doesn't match this, my model has line breaks as
characters in it.

2. the Scintilla code's mental model also has line breaks in it, it
just memmove()s the bytes to the buffer, including any line break
sequences, then it scans for them and adds the extra structure to help
render as separate lines on the display.

> As I said line breaks are neither for the inner working of the editor nor the user as such relevant.

AFAICT wrong on both counts, as above. (I'm not claiming all users,
but one counter example is all I need :)

> Even if the user presses the return key he doesn't enter a line break but tells the editor to start a new line, ergo a new entry in said line array. The line end code is added when the text is read out later for the pure purpose to be able to restore the line structure later.

That may be the model your previous editor used, but it isn't my
mental model, nor, AFAICT the way Scintilla works. I guess we have
different backgrounds here.

>
>>> Hence you cannot remove half a line as removing a line means to remove it entirely from the line array. Even making it empty doesn't remove it.
>>
>> Correct, and therefore the last line cannot be removed, since you
>> can't take its non-existent line end away.
>
> That's a brave conclusion :-) Since there are no line ends in the editor they have no relevance for the ability to remove lines. Lines are removed when the user presses the back-delete key while the cursor is at the beginning of a line or the normal delete key when it is at the end (corner cases not considered here).

But it is the corner case that we are talking about :) when all the
content has been removed from the buffer, the cursor still exists on a
line on the screen, and as Neil said that last "line" can never be
removed.

Cheers
Lex

Mike Lischke

Jan 25, 2012, 3:23:10 AM

to scintilla...@googlegroups.com

Hey Lex,

>> The bytes you add are just the transport format used to describe what comes in or goes out (like an abstraction). However from the user's point of view as well as internally by the editor you have individual lines and usually no line breaks.
>
> 1. My mental model doesn't match this, my model has line breaks as
> characters in it.

That's why things here are more complicated than they need to be and require such a long discussion to make that clear.

>
> 2. the Scintilla code's mental model also has line breaks in it, it
> just memmove()s the bytes to the buffer, including any line break
> sequences, then it scans for them and adds the extra structure to help
> render as separate lines on the display.

This parsing process is just a transformation to prepare the text from its transportation format into the inner Scintilla structures. After that line breaks are no longer needed. They are just (simple) markup, not content.

>
>> As I said line breaks are neither for the inner working of the editor nor the user as such relevant.
>
> AFAICT wrong on both counts, as above. (I'm not claiming all users,
> but one counter example is all I need :)

I see. So we agree not to agree :-D

> But it is the corner case that we are talking about :) when all the
> content has been removed from the buffer, the cursor still exists on a
> line on the screen, and as Neil said that last "line" can never be
> removed.

I guess that's what I have to live with anyway, but actually this is not the point here. If the design is to always have at least one line in the editor, so be it. What I have trouble with is that the reported count of added/removed lines does not correspond to the action the user did, but reflects inner states of Scintilla which are not at all relevant to the user. And my task is to tell the user/parser that 10 lines were added when he has added 10 lines.

Regards,

Mike
--
www.soft-gems.net

Lex Trotman

Jan 25, 2012, 6:19:21 AM

to scintilla...@googlegroups.com

Hi Mike,

[...]

> That's why things here are more complicated than they need to be and require such a long discussion to make that clear.
>

Things are simple for me, since my model matches reality, oh ok, my
reality, see below.

[...]

> I see. So we agree not to agree :-D
>

No problem :)

[...]

>
> I guess that's what I have to live with anyway, but actually this is not the point here. If the design is to always have at least one line in the editor, so be it. What I have trouble with is that the reported count of added/removed lines does not correspond to the action the user did, but reflects inner states of Scintilla which are not at all relevant to the user. And my task is to tell the user/parser that 10 lines were added when he has added 10 lines.

I think you have given yourself a headache. Some simple examples as I (and
I believe Scintilla) see them:

All three start with an empty buffer.

1. Add "abc", did this add a line? I see just adding characters to the
line that was already visible on the screen and the cursor still on
that line, line count reported = 0

2. Add "abc\n", did this add one or two lines? I see adding one line
to the existing line and the cursor is on the original line, line
count reported = 1

3. Add "abc\ndef", did this add one or two lines? I see adding one
line plus some characters to the existing line and the cursor is still
on the original line after the 'f', line count reported = 1

By the model your user/parser has, only 2 is correct, the others are
wrong. But always adding one to the reported count will then make 2
wrong. But then the answers change again if you insert those ranges
of text in the middle of an existing line. So all in all you have a
headache.
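
The three cases above, and the effect of always adding one, can be checked with a small model. This is a sketch assuming the reported count is the difference in line count before and after the insertion; `count_lines` and `insert_delta` are hypothetical helpers, not Scintilla functions.

```python
import re

# A line end is CR+LF, a lone CR, or a lone LF.
LINE_END = re.compile(r"\r\n|\r|\n")

def count_lines(text):
    """Lines in this model: one more than the number of line ends."""
    return len(LINE_END.findall(text)) + 1

def insert_delta(document, pos, inserted):
    """Line-count delta for inserting `inserted` at offset `pos`."""
    after = document[:pos] + inserted + document[pos:]
    return count_lines(after) - count_lines(document)

# The three insertions into an empty buffer.
for inserted in ("abc", "abc\n", "abc\ndef"):
    delta = insert_delta("", 0, inserted)
    print(repr(inserted), "reported:", delta, "reported + 1:", delta + 1)

# The same text inserted in the middle of an existing line.
print(insert_delta("hello world", 5, "abc\ndef"))  # 1
```

In this model the reported values are 0, 1 and 1. Adding one gives 1, 2 and 2, which matches a "lines in the inserted text" reading for cases 1 and 3 but not for case 2.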

The Geany solution is to not tell the user a delta line count. For
your parser, I guess it depends on how it works. I wonder if the range
of changed text or lines would be more useful? After all I can change
an existing line without adding extra ones and the parser should
consider that change.
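
A rough sketch of the "range of changed lines" idea, assuming the application knows the offset and length of an insertion and can read the document text after the change. `line_of` and `changed_line_range` are hypothetical helpers, and only LF line ends are handled to keep the example short.

```python
def line_of(text, pos):
    """0-based line containing character offset pos (LF line ends only)."""
    return text.count("\n", 0, pos)

def changed_line_range(after, pos, length):
    """First and last line (0-based, inclusive) touched by an insertion of
    `length` characters at `pos`, measured in the document after the change."""
    return line_of(after, pos), line_of(after, pos + length)

before = "alpha\nbeta\ngamma"
pos = 8  # inside "beta"

# An insertion containing a line end touches two lines.
inserted = "X\nY"
after = before[:pos] + inserted + before[pos:]
print(changed_line_range(after, pos, len(inserted)))  # (1, 2)

# An insertion without a line end adds no lines but still changes line 1.
after = before[:pos] + "Z" + before[pos:]
print(changed_line_range(after, pos, 1))              # (1, 1)
```

The second case is the one a delta line count cannot express: the count is 0, yet a parser still has to look at line 1 again.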

Cheers
Lex

Neil Hodgson

Jan 25, 2012, 7:17:31 PM

to scintilla...@googlegroups.com

Mike Lischke:

> This parsing process is just a transformation to prepare the text from
> its transportation format into the inner Scintilla structures. After that
> line breaks are no longer needed. They are just (simple) markup, not
> content.

Scintilla supports mixed line ends where a file may contain "\r",
"\n", and "\r\n" line ends. It preserves and can display those line
ends. In the future, it could support other line ends such as U+2028.
To me, the line end bytes are an integral part of the document. A
document, like the file it was loaded from, is a sequence of octets.
Scintilla then derives data structures representing lines but the
sequence of octets remains the primary source of data.
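
As a rough illustration of lines being derived from the octets rather than replacing them, the following sketch builds a table of line start offsets from a byte string with mixed line ends. It is an assumption-laden model, not Scintilla's implementation, and `line_starts` is a hypothetical name.

```python
def line_starts(data):
    """Byte offsets at which each line starts; CR, LF and CR+LF end a line."""
    starts = [0]  # the first line always exists, even in an empty document
    i = 0
    while i < len(data):
        if data[i] == 0x0D:  # CR
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                i += 1  # CR+LF counts as a single line end
            starts.append(i + 1)
        elif data[i] == 0x0A:  # LF
            starts.append(i + 1)
        i += 1
    return starts

doc = b"one\rtwo\nthree\r\nfour"
starts = line_starts(doc)
print(starts)                 # [0, 4, 8, 15]
print(len(starts))            # 4 lines
print(len(line_starts(b"")))  # 1 line in the empty document
```

The byte string is left untouched, so the original mix of line ends can still be displayed or written back out; the table is only an index over it.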

Neil
