Re: [scintilla] Parallel layout

Neil Hodgson

Feb 2, 2022, 3:53:42 AM

to Scintilla mailing list, scite-i...@googlegroups.com

Me:

I wrote a proposal to multithread layout along with an implementing patch on SourceForge. This achieves a better than 4x speed up for wide lines on a 4 core 8 thread processor with Windows 10 and DirectWrite.

https://sourceforge.net/p/scintilla/feature-requests/1427/

The parallel layout feature has been committed. The application has to ask for it with the SCI_SETLAYOUTTHREADS(int threads) API as the default is single threaded. The number of threads is capped by the hardware concurrency of the CPU = number of cores, doubled if hyper-threaded.
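
To make the cap concrete, here is a minimal sketch of clamping a requested thread count to the hardware concurrency. `ClampLayoutThreads` is a hypothetical helper written for this illustration and is not Scintilla's own code; the lower bound of 1 and the handling of an unknown (0) concurrency are assumptions. Scintilla applies its own cap when it handles SCI_SETLAYOUTTHREADS, so an application would only need something like this to predict the effective value.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper (not part of Scintilla): clamp a requested layout
// thread count to the number of hardware threads.
int ClampLayoutThreads(int requested, unsigned int hardwareThreads) {
    // hardware_concurrency() may report 0 when the value is unknown,
    // so this sketch treats that case as a single thread (an assumption).
    const int maxThreads = std::max(1, static_cast<int>(hardwareThreads));
    // Never go below 1 thread or above the hardware limit.
    return std::min(std::max(requested, 1), maxThreads);
}

// Convenience overload that asks the standard library for the concurrency.
int ClampLayoutThreads(int requested) {
    return ClampLayoutThreads(requested, std::thread::hardware_concurrency());
}
```

On a 4 core, 8 thread machine, asking for 16 threads with this helper yields 8.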

Sometimes it is faster to use fewer than the maximum number of threads but this is mostly in cases which are very fast anyway, such as when the file is (almost) all ASCII and the ASCII Monospace feature is active.

It is available on macOS, Win32 when using DirectWrite, and GTK (except on Win32 and macOS). The Pango library used for text by GTK is not supposed to be thread-safe on Win32. Enabling threaded layout is determined with the SC_SUPPORTS_THREAD_SAFE_MEASURE_WIDTHS support flag which can also be checked by the application (SCI_SUPPORTSFEATURE). Other platform layers should return true for SC_SUPPORTS_THREAD_SAFE_MEASURE_WIDTHS when their implementations of MeasureWidths and MeasureWidthsUTF8 are reentrant and thread-safe. There was minimal performance impact making macOS and DirectWrite thread-safe but there is a small (maybe 5%) cost on GTK due to recreating more state on each call.
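
An application that wants to check the support flag itself before asking for threads could do something along these lines. This is only a sketch: `SendFn` and `EnableLayoutThreads` are hypothetical names standing in for whatever message-sending call the platform provides, and the numeric values are my understanding of Scintilla.h and should be taken from the real header rather than from here. Scintilla performs its own check as described above, so this is optional.

```cpp
#include <cstdint>
#include <functional>

namespace sketch {

// Assumed values; real code should include Scintilla.h instead of
// redefining these.
constexpr unsigned int SCI_SUPPORTSFEATURE = 2750;
constexpr unsigned int SCI_SETLAYOUTTHREADS = 2775;
constexpr int SC_SUPPORTS_THREAD_SAFE_MEASURE_WIDTHS = 5;

// Hypothetical stand-in for the platform's way of sending a message to
// a Scintilla instance (SendMessage, scintilla_send_message, and so on).
using SendFn = std::function<std::intptr_t(unsigned int msg,
                                           std::uintptr_t wParam,
                                           std::intptr_t lParam)>;

// Ask for 'threads' layout threads only when the platform layer reports
// thread-safe width measurement; otherwise stay single threaded.
// Returns the value that was sent with SCI_SETLAYOUTTHREADS.
int EnableLayoutThreads(const SendFn &send, int threads) {
    const bool threadSafe =
        send(SCI_SUPPORTSFEATURE,
             static_cast<std::uintptr_t>(SC_SUPPORTS_THREAD_SAFE_MEASURE_WIDTHS),
             0) != 0;
    const int wanted = threadSafe ? threads : 1;
    send(SCI_SETLAYOUTTHREADS, static_cast<std::uintptr_t>(wanted), 0);
    return wanted;
}

}  // namespace sketch
```

Passing the sender in as a function keeps the sketch independent of any one platform layer and makes it easy to exercise with a fake.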

Qt is a complex case as it is layered over various graphics APIs that may or may not be thread-safe.

SciTE has a threads.layout property which defaults to 16 which should be maximum concurrency on most machines.

Threaded layout will commonly use less energy than single threaded as the computer ‘races to sleep’ but there are rare scenarios where more energy could be used.

https://en.wikichip.org/wiki/race-to-sleep

The current implementation works on very wide lines only. One future area of investigation is how to use threading for narrow (<150 byte) lines that are normal in source code, perhaps by processing lines on separate threads.

At this point, it's most important to find any problems, particularly on older and less common setups. Additional checks can be added to disable threading when it could fail.

This has only been tested on moderately parallel systems (4 cores, with and without hyper-threading) so it will be interesting to see how it handles wider CPUs. There are some changes to locking that could be tried if benefits drop off with more parallelism.

The committed changes can be examined either in the repositories

hg clone http://hg.code.sf.net/p/scintilla/code scintilla
hg clone http://hg.code.sf.net/p/scintilla/scite
or from
https://www.scintilla.org/scite.zip Source
https://www.scintilla.org/wscite.zip Windows executable (64-bit)

Neil

Mitchell

Feb 2, 2022, 1:28:51 PM

to scite-i...@googlegroups.com

Hi Neil,

On Wed, 2 Feb 2022 19:53:37 +1100
"'Neil Hodgson' via scite-interest" <scite-i...@googlegroups.com> wrote:

> Me:
>
> > I wrote a proposal to multithread layout along with an implementing patch on SourceForge. This achieves a better than 4x speed up for wide lines on a 4 core 8 thread processor with Windows 10 and DirectWrite.
> >
> > https://sourceforge.net/p/scintilla/feature-requests/1427/
>
> The parallel layout feature has been committed. The application has to ask for it with the SCI_SETLAYOUTTHREADS(int threads) API as the default is single threaded. The number of threads is capped by the hardware concurrency of the CPU = number of cores, doubled if hyper-threaded.
>
> Sometimes it is faster to use fewer than the maximum number of threads but this is mostly in cases which are very fast anyway, such as when the file is (almost) all ASCII and the ASCII Monospace feature is active.
>
> It is available on macOS, Win32 when using DirectWrite, and GTK (except on Win32 and macOS). The Pango library used for text by GTK is not supposed to be thread-safe on Win32.

Do you have literature on this limitation? I'm very tempted to test this on both Win32 and macOS.

Cheers,
Mitchell

Mitchell

Feb 2, 2022, 1:42:36 PM

to scite-i...@googlegroups.com

Never mind, I saw the gitlab link in the ticket.

Cheers,
Mitchell

Neil Hodgson

Feb 2, 2022, 5:49:35 PM

to scite-i...@googlegroups.com

Mitchell:

> I'm very tempted to test this on both Win32 and macOS.

There’s some possibility of it working on macOS as all Unixes have more in common than they do with Win32. I haven’t seen a definitive statement on macOS multithreaded Pango like the one about Win32 from the main Pango developer Behdad Esfahbod.

The technique used for multi-threading Pango is to use thread-local resources and avoid locking. macOS and Linux share more threading infrastructure (like pthreads) so macOS may be able to reuse the Linux work unless it uses Linux-specific APIs. OTOH, macOS GTK seems to get even less attention than Win32 GTK.
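
For anyone unfamiliar with the thread-local approach, here is a generic C++ sketch of the idea. It is not Pango or Scintilla code: `MeasureContext`, `LocalContext` and `MeasureInParallel` are hypothetical names, and the 8 units per byte width is a made-up stand-in for real text measurement. Each thread lazily creates its own context, so no lock is needed around the measuring call.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an expensive measuring context that is not
// safe to share between threads.
struct MeasureContext {
    std::size_t calls = 0;
    std::size_t Width(const std::string &text) {
        ++calls;
        return text.size() * 8;  // Made-up width: 8 units per byte.
    }
};

// Each thread gets its own context on first use, so callers never
// contend for a shared object and no mutex is required.
MeasureContext &LocalContext() {
    thread_local MeasureContext context;
    return context;
}

// Measure each piece on its own thread. Every worker writes to a
// distinct element of 'widths', so the results need no locking either.
std::vector<std::size_t> MeasureInParallel(const std::vector<std::string> &pieces) {
    std::vector<std::size_t> widths(pieces.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < pieces.size(); ++i) {
        workers.emplace_back([&pieces, &widths, i] {
            widths[i] = LocalContext().Width(pieces[i]);
        });
    }
    for (std::thread &worker : workers) {
        worker.join();
    }
    return widths;
}
```

The cost of this style is that per-thread state has to be created again on every new thread, which is the same kind of trade-off as the small overhead mentioned for GTK earlier in the thread.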

Neil
