Scintilla background processing

Mike Lischke

Oct 22, 2013, 9:31:06 AM
to scintilla...@googlegroups.com
Hey Neil,

I haven't found any discussion related to that with a quick search, so let me ask here: what would you think of letting Scintilla allow certain states to be changed from a background thread? Say, I have a big script file and want to add markers for each statement + error markers for syntax errors. Currently this is only possible from the main thread, which adds a visual hiccup when opening/pasting large text.

Additionally, have you ever thought about letting the lexers do their job in a background thread? For large files it can take 10+ seconds until the lexer has finished lexing the entire file. By letting it do that in the background we can let the user interact more quickly with a just-opened file. What makes this worse is that lexing is only done when I bring a line into view that has not been lexed yet. So even if I wait 1 minute after loading, it will not speed up lexing.

Neil Hodgson

Oct 22, 2013, 7:27:22 PM
to scintilla...@googlegroups.com
Mike Lischke:

> I haven't found any discussion related to that with a quick search, so let me ask here: what would you think of letting Scintilla allow certain states to be changed from a background thread? ...

GUI toolkits often restrict threading. Events must, almost always, be (initially) handled on a designated GUI thread which is generally the main thread of the process. Some platforms allow drawing on another thread but it's uncommon to implement, there are often limitations, and threaded calls receive less testing and may have consequent bugs. Documentation of threading limitations is poor. Platforms change threading capabilities over time: GTK+ deprecated threaded GUI operations in version 3.6.

Some of the major time sinks in Scintilla are when drawing or preparing for drawing. There are some candidates for threaded code in these areas. For example, it might be possible to divide measuring text up between threads, but the font objects and measurement contexts provided by the platform and needed for this may not be thread safe. Thus there would have to be a platform opt-in to threaded measurement and a non-threaded code path. Text measurement relies strongly on two caches (PositionCache and LineLayoutCache) and access to these would have to be synchronized with mutexes or similar, which would reduce performance.

Because of the complexity, uncertain performance gains, and having reasonable performance for most uses, I've not implemented any threading inside Scintilla. The only work I've done was to make loading and saving on background threads work reasonably well since I/O latencies can be large.

> Additionally, have you ever thought about letting the lexers do their job in a background thread?

Yes, a lot. Also moving lexing into a separate tightly sandboxed process for security. There are 111 lexers delivered with Scintilla and these differ in complexity, quality and degree of testing. The only security advisory ever received for Scintilla was for a lexer.

For a threaded lexer/folder, access to the text, style information, and fold state must be made safe for both the lexer and other potentially concurrent activities like drawing. It may be possible to synchronize these uses by locking some or all of the state of Scintilla, but there are costs to locking and the granularity of locking may cause performance problems in either the lexer or the other code. Lexers would have to handle situations where text was changed before their current position, causing them to abort and restart.

Folding is more difficult than lexing since each change in fold level can be and normally is reported to the container. This allows the container to ensure its folding rules are followed such as showing sections which have had their fold headers removed. The report is done as a normal platform-mediated event so is delivered on the GUI thread. It may be possible to buffer fold level changes and deliver them as a block after fold level discovery has occurred.
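A rough illustration of the buffering idea in the last sentence, not Scintilla code: the `FoldChangeBuffer` class, its `notify` callback, and the level values are all hypothetical. Changes are collected per line and handed to the container as one block, so only a single notification has to be delivered on the GUI thread.

```python
class FoldChangeBuffer:
    """Collects fold level changes so they can be reported as one block."""

    def __init__(self, notify):
        self._notify = notify   # hypothetical container callback
        self._pending = {}      # line -> (first old level, latest new level)

    def record(self, line, old_level, new_level):
        # Keep the first old level so repeated changes to a line collapse.
        first_old = self._pending.get(line, (old_level, None))[0]
        self._pending[line] = (first_old, new_level)

    def flush(self):
        # Drop lines that ended up back at the level they started with.
        changes = [(line, old, new)
                   for line, (old, new) in sorted(self._pending.items())
                   if old != new]
        self._pending.clear()
        if changes:
            self._notify(changes)
        return changes


received = []
fold_buffer = FoldChangeBuffer(received.append)
fold_buffer.record(10, 1024, 1025)   # arbitrary example levels
fold_buffer.record(10, 1025, 1026)
fold_buffer.record(11, 1025, 1024)
fold_buffer.record(11, 1024, 1025)   # line 11 returns to its first level
fold_buffer.flush()
```

The container would then apply its folding rules once per block instead of once per changed line.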

Instead of locking Scintilla data structures while lexing, it may be possible (particularly when using a separate process) to make a copy of the text being lexed plus some related information and send this to the lexer; have the lexer/folder use this to produce a block of styles and a block of fold levels; then send the results back to the main thread, which checks that no changes occurred while the lexing was performed and integrates the results. For copying, you would want to limit the amount copied since only a limited range should be lexed each time and the document may be many megabytes. However, copying is made more difficult since lexers may backtrack before the range being lexed to discover additional state information.

Threaded lexing is a big messy job and it's unlikely I'll implement it unless a customer needs it enough to pay for the work.
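The copy, lex, check, integrate cycle described above could be sketched roughly as follows. This is a toy model under stated assumptions, not Scintilla's design: `Document`, its `version` counter, `toy_lexer`, `snapshot` and `integrate` are all invented for illustration, and backtracking before the lexed range is ignored.

```python
from concurrent.futures import ThreadPoolExecutor


class Document:
    def __init__(self, text):
        self.text = text
        self.version = 0                  # bumped on every modification
        self.styles = [0] * len(text)

    def replace_text(self, text):
        self.text = text
        self.version += 1
        self.styles = [0] * len(text)


def toy_lexer(text):
    # Stand-in for a real lexer: style 1 for digits, 0 for everything else.
    return [1 if ch.isdigit() else 0 for ch in text]


def snapshot(doc, start, end):
    # Copy only the range to be lexed, plus the version it was taken at.
    return doc.version, start, doc.text[start:end]


def integrate(doc, version, start, styles):
    # Runs on the main thread: discard results if the document has changed.
    if version != doc.version:
        return False
    doc.styles[start:start + len(styles)] = styles
    return True


doc = Document("abc 123 def")
with ThreadPoolExecutor(max_workers=1) as pool:
    version, start, text = snapshot(doc, 4, 7)
    future = pool.submit(toy_lexer, text)
    applied = integrate(doc, version, start, future.result())
    styles_after_first = list(doc.styles)

    # Second round: an edit arrives before the results are integrated.
    version, start, text = snapshot(doc, 0, 3)
    future = pool.submit(toy_lexer, text)
    doc.replace_text("xyz 456")
    stale_applied = integrate(doc, version, start, future.result())
```

The worker never touches the document itself, so no locking is needed; the price is that stale results are thrown away and the range has to be lexed again.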

> For large files it can take 10+ seconds until the lexer has finished lexing the entire file. By letting it do that in the background we can let the user interact more quickly with a just-opened file. What makes this worse is that lexing is only done when I bring a line into view that has not been lexed yet. So even if I wait 1 minute after loading, it will not speed up lexing.

The application can ask for ranges beyond current visibility to be lexed with SCI_COLOURISE. Implement an idler/timer and ask for an incremental lex in each call.
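As a minimal sketch of that idler/timer approach: `FakeEditor` below is a stand-in written for this example, with methods named after the messages they would wrap (SCI_GETLENGTH, SCI_GETENDSTYLED, SCI_COLOURISE). The `lex_one_chunk` helper and the chunk size of 4096 are assumptions, and how the handler gets scheduled depends on the toolkit.

```python
class FakeEditor:
    """Stand-in for a Scintilla wrapper, used only to make the sketch run."""

    def __init__(self, length):
        self._length = length
        self._end_styled = 0
        self.calls = []

    def get_length(self):            # would send SCI_GETLENGTH
        return self._length

    def get_end_styled(self):        # would send SCI_GETENDSTYLED
        return self._end_styled

    def colourise(self, start, end):  # would send SCI_COLOURISE(start, end)
        self.calls.append((start, end))
        self._end_styled = max(self._end_styled, end)


def lex_one_chunk(editor, chunk_size=4096):
    """Call from an idle or timer handler. Returns True while work remains."""
    length = editor.get_length()
    start = editor.get_end_styled()
    if start >= length:
        return False
    end = min(start + chunk_size, length)
    editor.colourise(start, end)
    return end < length


editor = FakeEditor(length=10000)
ticks = 0
while lex_one_chunk(editor):   # a real application returns to the event loop here
    ticks += 1
```

Each call does a bounded amount of lexing, so the chunk size trades total lexing time against how long a single idle call can delay user input.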

The 'Responsive Scrolling' feature of OS X 10.9 works (partly) by asking the view to draw not-yet-visible rectangles in idle time, which also causes additional lexing in Scintilla. Scintilla cannot yet turn on responsive scrolling, mostly because it doesn't invalidate these speculative rectangles when required by lexing, but there are also issues with the animated find indicator and line versus pixel scrolling.

> Say, I have a big script file and want to add markers for each statement + error markers for syntax errors. Currently this is only possible from the main thread, which adds a visual hiccup when opening/pasting large text.

A technique that may work but is dependent on the particulars of your operation is to divide the operation into work that can be performed in the background and the minimal work that changes Scintilla. The implementation of spell-checking in SciTE for OS X works with a loop that copies the text from a particular number of lines; performs the spell checking on another thread producing a vector of mistake ranges; then, on the main thread, sets an indicator for each mistake range. This works well since the spell checking calls are much slower than setting indicators in Scintilla.
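The shape of that loop, sketched with invented names (this is not the SciTE code): `find_mistakes` stands in for the slow spell-checking call, `set_indicator` for whatever sets an indicator range in the editor, and the word list is made up.

```python
import queue
import threading


def find_mistakes(text, known_words):
    # Slow, editor-free work: return (start, length) for each unknown word.
    ranges, pos = [], 0
    for word in text.split(" "):
        if word and word.lower() not in known_words:
            ranges.append((pos, len(word)))
        pos += len(word) + 1
    return ranges


def worker(chunks, known_words, results):
    # Background thread: works only on copies of the text.
    for offset, text in chunks:
        found = [(offset + s, n) for s, n in find_mistakes(text, known_words)]
        results.put(found)
    results.put(None)                    # sentinel: no more work


def drain_on_main_thread(results, set_indicator):
    # Only this function touches the editor. A real GUI would poll the queue
    # from an idle handler instead of blocking on it as this demo does.
    while (batch := results.get()) is not None:
        for start, length in batch:
            set_indicator(start, length)


chunks = [(0, "the quick brwon fox"), (20, "jumps ovre the dog")]
known = {"the", "quick", "brown", "fox", "jumps", "over", "dog"}
results = queue.Queue()
indicators = []
checker = threading.Thread(target=worker, args=(chunks, known, results))
checker.start()
drain_on_main_thread(results, lambda s, n: indicators.append((s, n)))
checker.join()
```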

Neil

Mike Lischke

Oct 24, 2013, 6:21:53 AM
to scintilla...@googlegroups.com

Hey Neil,


> Some of the major time sinks in Scintilla are when drawing or preparing for drawing. There are some candidates for threaded code in these areas. For example, it might be possible to divide measuring text up between threads, but the font objects and measurement contexts provided by the platform and needed for this may not be thread safe. Thus there would have to be a platform opt-in to threaded measurement and a non-threaded code path. Text measurement relies strongly on two caches (PositionCache and LineLayoutCache) and access to these would have to be synchronized with mutexes or similar, which would reduce performance.

Well, I did not mean to do (part of) the rendering in a thread. I was talking about doing some state changes via a separate thread and then letting this render normally via the main thread. For instance, adding markers does not need to happen in the main thread. However, it is clear that manipulating states in a thread needs some locking mechanism. Maybe it would just help if I could apply a set of markers in one call to Scintilla? I have a situation with big files containing like 60K commands so I need to add 60K markers (+ error markup if needed). Doing 60K (actually 120K since I need SCI_MARKERGET also to combine existing markers with the new one) calls to Scintilla is certainly slowing down processing.


> For a threaded lexer/folder access to the text, style information, and fold state must be made safe for both the lexer and other potentially concurrent activities like drawing.

Usually concurrency is only a problem for multiple write accesses. In cases like this where you have a single writer (the styler) + one or more readers it's by far not as critical as it seems. It can only be a problem if multiple values combined have a meaning but threading can be interrupted between two writes. I can imagine that writing the styling byte is not such a case and even if it is it would normally only be a small visual irritation that is corrected after a very short time.

And I have seen editors that do exactly that. They start out with a standard styling and after a short while elements are converted to their final styling in parallel to normal operations like editing.

I can also imagine a split approach. Do the styling of the visible content in the main thread but continue with the rest in a worker thread. That should certainly bring any friction between threads down to a minimum.


> It may be possible to synchronize these uses by locking some or all of the state of Scintilla but there are costs to locking and the granularity of locking may cause performance problems in either the lexer or the other code. Lexers would have to handle situations where text was changed before their current position causing them to abort and restart.

Not necessarily. Lexers can continue with the line even if the text changed. That's only a very short visual mis-styling. I know editors that do it this way. They do not immediately style new text but only after a short pause. This is especially important when the styling depends on further context (like being a keyword in a specific context or not).



> Folding is more difficult than lexing since each change in fold level can be and normally is reported to the container. This allows the container to ensure its folding rules are followed such as showing sections which have had their fold headers removed. The report is done as a normal platform-mediated event so is delivered on the GUI thread. It may be possible to buffer fold level changes and deliver them as a block after fold level discovery has occurred.

Also here: fold the visible part in the main thread, but do everything currently not visible in a background thread. Or at least make it safe to be used by a background thread if devs want that.


>> For large files it can take 10+ seconds until the lexer has finished lexing the entire file. By letting it do that in the background we can let the user interact more quickly with a just-opened file. What makes this worse is that lexing is only done when I bring a line into view that has not been lexed yet. So even if I wait 1 minute after loading, it will not speed up lexing.

> The application can ask for ranges beyond current visibility to be lexed with SCI_COLOURISE. Implement an idler/timer and ask for an incremental lex in each call.

Sounds interesting. Can I call SCI_COLOURISE in a background thread (only for the currently not visible editor part)?


>> Say, I have a big script file and want to add markers for each statement + error markers for syntax errors. Currently this is only possible from the main thread, which adds a visual hiccup when opening/pasting large text.

> A technique that may work but is dependent on the particulars of your operation is to divide the operation into work that can be performed in the background and the minimal work that changes Scintilla. The implementation of spell-checking in SciTE for OS X works with a loop that copies the text from a particular number of lines; performs the spell checking on another thread producing a vector of mistake ranges; then, on the main thread, sets an indicator for each mistake range. This works well since the spell checking calls are much slower than setting indicators in Scintilla.

Yes, this is what I already do. I start a thread every time a text change occurred to scan statement borders and find errors. Unfortunately, due to multi-line comments and other syntactic elements that may change the following structure, I have to do this for the entire text. Once this is done I call the main thread to run my markup task. And this task then blocks any user input until done, which is irritating if you have a big file where the actual work takes several seconds in which the user can start typing or just scrolling in the document, only to be blocked all of a sudden.

However, I'm already planning some optimizations to lower the number of actual changes (e.g. only changing statement markers for ranges affected by a text change etc.), so I'm certainly not relying only on Scintilla to do the hard work. But it is for sure a valid concern how to make use of threads also in the context of text editing (also in light of the fact that there are still critics who say programmers don't use threading as often as would make sense, thereby under-utilizing modern hardware).
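One way the "lower the number of actual changes" idea might look, as a sketch only: compare the marker lines from the previous scan with those from the new scan and touch just the lines that differ. The `marker_changes` helper and the sample line numbers are hypothetical.

```python
def marker_changes(old_lines, new_lines):
    """Return (lines_to_delete, lines_to_add) that turn old into new."""
    old_lines, new_lines = set(old_lines), set(new_lines)
    return sorted(old_lines - new_lines), sorted(new_lines - old_lines)


# Statement start lines before and after an edit that moved one statement.
old = {0, 5, 9, 14, 20}
new = {0, 5, 10, 14, 20}
to_delete, to_add = marker_changes(old, new)
```

Here only two editor calls would be needed (one delete on line 9, one add on line 10) instead of re-marking all five statements.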

Mike
--
www.soft-gems.net


Neil Hodgson

Oct 24, 2013, 7:36:37 PM
to scintilla...@googlegroups.com
Mike Lischke:

> Maybe it would just help if I could apply a set of markers in one call to Scintilla? I have a situation with big files containing like 60K commands so I need to add 60K markers (+ error markup if needed). Doing 60K (actually 120K since I need SCI_MARKERGET also to combine existing markers with the new one) calls to Scintilla is certainly slowing down processing.

If you are concerned about the user being blocked during the changes, then perform batches of changes during idle time, perhaps prioritising changes that affect currently visible lines.
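A small sketch of that ordering, with invented helper names (`order_for_idle`, `batches`) and an arbitrary batch size: pending lines are sorted so that visible ones come first, then split into batches, one of which would be applied per idle call.

```python
def order_for_idle(lines, first_visible, last_visible):
    """Visible lines first, then the rest by distance from the view."""
    def distance(line):
        if first_visible <= line <= last_visible:
            return 0
        if line < first_visible:
            return first_visible - line
        return line - last_visible
    return sorted(lines, key=lambda line: (distance(line), line))


def batches(items, size):
    # Split the ordered work into fixed-size pieces, one per idle call.
    for i in range(0, len(items), size):
        yield items[i:i + size]


pending = [2, 40, 41, 43, 90, 300]
ordered = order_for_idle(pending, first_visible=40, last_visible=60)
work = list(batches(ordered, 2))
```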

SCI_MARKERADD and SCI_MARKERDELETE leave other markers alone.

> Usually concurrency is only a problem for multiple write accesses. In cases like this where you have a single writer (the styler) + one or more readers it's by far not as critical as it seems. It can only be a problem if multiple values combined have a meaning but threading can be interrupted between two writes.

The reader has to see a consistent view of the data structures which contain multiple elements. The threads may interleave in such a way that the writer has caused the gap to move or the text to be reallocated but the fields referring to that state have not yet been completely updated. The reader (lexer) may then use the inconsistent state and read beyond the allocation causing a crash or read freed and overwritten memory.
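To make the failure concrete, here is a deterministic toy model of a gap buffer. The field names `part1_length` and `gap_length` are chosen for this example, and the interleaving is simulated with a generator rather than real threads. In Python the torn read merely returns a gap slot (`None`); in C++ the equivalent read could land in freed or unallocated memory.

```python
class GapBuffer:
    def __init__(self, before, after, gap_length):
        self.part1_length = len(before)
        self.gap_length = gap_length
        self.body = list(before) + [None] * gap_length + list(after)

    def char_at(self, position):
        # A reader combines two fields with the storage to find a character.
        if position < self.part1_length:
            return self.body[position]
        return self.body[position + self.gap_length]

    def grow_gap_steps(self, new_gap_length):
        """Yield after each write so a reader can be interleaved."""
        before = self.body[:self.part1_length]
        after = self.body[self.part1_length + self.gap_length:]
        self.body = before + [None] * new_gap_length + after   # write 1
        yield
        self.gap_length = new_gap_length                       # write 2
        yield


buf = GapBuffer("hello", " world", gap_length=3)
consistent_before = buf.char_at(6)

steps = buf.grow_gap_steps(8)
next(steps)                    # storage replaced, gap_length still stale
torn_read = buf.char_at(6)     # reader runs between the two writes
next(steps)                    # update completes
consistent_after = buf.char_at(6)
```

Each individual write is harmless on its own; the problem is that the reader needs both to agree.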

> I can also imagine a split approach. Do the styling of the visible content in the main thread but continue with the rest in a worker thread. That should certainly bring any friction between threads down to a minimum.

Maybe, but having two active lexers could create contention over style writing.

> Not necessarily. Lexers can continue with the line even if the text changed. That's only a very short visual mis-styling.

It's work that is not needed and will have to be redone, slowing the delivery of correct styling.

> Sounds interesting. Can I call SCI_COLOURISE in a background thread (only for the currently not visible editor part)?

Not in a useful way since the lexing has to be done on the main thread. You can have another thread request styling but that call will have to be marshalled onto the main thread. On Windows, SendMessage will perform the marshalling.
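For toolkits without a built-in equivalent, the marshalling could be sketched like this. It is a generic pattern, not a Scintilla or Windows API: `call_on_main_thread`, `pump_one_call` and the `colourise` stand-in are hypothetical. The worker blocks until the GUI thread has run the call, much like a synchronous cross-thread message.

```python
import queue
import threading
from concurrent.futures import Future

gui_thread_id = threading.get_ident()    # the thread that owns the editor
main_thread_calls = queue.Queue()


def call_on_main_thread(fn, *args):
    """Called from a worker: post the call, then block for its result."""
    future = Future()
    main_thread_calls.put((fn, args, future))
    return future.result()


def pump_one_call():
    """Called by the GUI loop: run one posted call on the GUI thread."""
    fn, args, future = main_thread_calls.get()
    try:
        future.set_result(fn(*args))
    except Exception as exc:
        future.set_exception(exc)


def colourise(start, end):
    # Stand-in for the real styling request; reports which thread ran it.
    return (start, end, threading.get_ident() == gui_thread_id)


answers = []
requester = threading.Thread(
    target=lambda: answers.append(call_on_main_thread(colourise, 0, 4096)))
requester.start()
pump_one_call()        # a real GUI loop would do this from its event handling
requester.join()
```

The lexing work itself still runs on the GUI thread, which is why requesting it from another thread does not make the interface more responsive.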

> But it is for sure a valid concern how to make use of threads also in the context of text editing (also in light of the fact that there are still critics who say programmers don't use threading as often as would make sense, thereby under-utilizing modern hardware).

Threading is a source of complexity and bugs so should only be used when there is a significant benefit.

Neil

Matthew Brush

Oct 24, 2013, 9:17:28 PM
to scintilla...@googlegroups.com
On 13-10-24 03:21 AM, Mike Lischke wrote:
>
> [snip]
>
> And I have seen editors that do exactly that. They start out with a
> standard styling and after a short while elements are converted to
> their final styling in parallel to normal operations like editing.
>

GtkSourceView does this extremely well. It never blocks no matter how
large the file is. It fills in the view incrementally and highlights it
incrementally without ever freezing the GUI and making you wait before
you can use it.

Cheers,
Matthew Brush

Neil Hodgson

Oct 25, 2013, 5:08:42 PM
to scintilla...@googlegroups.com
Matthew Brush:

> GtkSourceView does this extremely well. It never blocks no matter how large the file is. It fills in the view incrementally and highlights it incrementally without ever freezing the GUI and making you wait before you can use it.

You could implement an option that allows Scintilla to paint unlexed text. There are several issues that would need to be addressed: how the lexing is scheduled; what the appearance of unlexed text should be (current style bytes or a fixed style); and how the scheduling is prioritised against other tasks. The prioritising is more interesting when wrapping is on since wrapping is also a background task and it prioritises the currently visible text and depends on that text being lexed.

Neil

Matthew Brush

Oct 26, 2013, 4:20:35 PM
to scintilla...@googlegroups.com
I only know a little about GTK+ and very little about Scintilla or
GtkSourceView's implementations but...

- how the lexing is scheduled
    I'd say in low-priority idle handlers (ex. g_idle_add or
    custom GSource) on the main loop, in chunks small enough
    not to block the UI.
- what the appearance of unlexed text should be
    On the first pass, STYLE_DEFAULT, on subsequent passes
    whatever the previous styles were.
- how the scheduling is prioritized against other tasks
    In low/idle priority, after more important stuff like
    all the other events pending in the main/message loop, including
    updating the rest of the application's user-interface.
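
The three answers above can be pictured with a toy model. It is neither Scintilla nor GtkSourceView code: `StyledText`, `idle_lex` and `digits_lexer` are invented for this sketch, and only the value 32 for STYLE_DEFAULT comes from Scintilla. Painting reads whatever style bytes exist at that moment, while an idle handler lexes one small chunk per call.

```python
STYLE_DEFAULT = 32   # Scintilla's predefined default style number


class StyledText:
    def __init__(self, text):
        self.text = text
        self.styles = [STYLE_DEFAULT] * len(text)  # first pass: default style
        self.next_to_lex = 0

    def style_for_paint(self, position):
        # Painting never waits: it uses the style bytes as they are right now.
        return self.styles[position]

    def idle_lex(self, lexer, chunk_size):
        """Lex one small chunk; return True if more idle work is needed."""
        start = self.next_to_lex
        end = min(start + chunk_size, len(self.text))
        self.styles[start:end] = lexer(self.text[start:end])
        self.next_to_lex = end
        return end < len(self.text)


def digits_lexer(chunk):
    # Stand-in lexer: style 1 for digits, 0 for everything else.
    return [1 if ch.isdigit() else 0 for ch in chunk]


view = StyledText("ab12cd34")
painted_early = [view.style_for_paint(i) for i in range(8)]
more = view.idle_lex(digits_lexer, chunk_size=4)
painted_mid = [view.style_for_paint(i) for i in range(8)]
while more:
    more = view.idle_lex(digits_lexer, chunk_size=4)
painted_final = [view.style_for_paint(i) for i in range(8)]
```

Because the style list is never reset, a later re-lex would start from the previous styles rather than from STYLE_DEFAULT.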

I'm not sure about line wrapping, it makes things way more complicated.

I don't really know enough about Scintilla internals to add this option
myself, I've only ever used the thin user API. I only mentioned
GtkSourceView because it seems to accomplish the goal being discussed to
not block while display/lexing text, and IIUC without using threads,
which would make things super complicated as you discussed previously.

Cheers,
Matthew Brush