[scintilla:feature-requests] #1569 Unicode 16.0

Zufu Liu

Oct 18, 2025, 8:45:42 PM

to scintill...@googlegroups.com

[feature-requests:#1569] Unicode 16.0

Status: open
Group: Initial
Labels: unicode
Created: Sun Oct 19, 2025 12:45 AM UTC by Zufu Liu
Last Updated: Sun Oct 19, 2025 12:45 AM UTC
Owner: Neil Hodgson
Attachments:

CaseConvert-1019.diff (2.9 kB; text/plain)

Unicode data can be updated to Unicode 16.0 (Python 3.14) or 17.0 (Python 3.15 alpha 1).

Download the Windows embeddable package (64-bit) from https://www.python.org/downloads/windows/, add scintilla\scripts to python314._pth (or python315._pth), then run the generating scripts with the new python.exe.
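Before regenerating, it may help to confirm which Unicode version the chosen interpreter bundles. A minimal check, using only the standard unicodedata module (the printed wording is illustrative, not part of the Scintilla scripts):

```python
import sys
import unicodedata

# unidata_version is the version of the Unicode database built into this Python.
version = unicodedata.unidata_version
major = int(version.split(".")[0])

print(f"Python {sys.version_info.major}.{sys.version_info.minor} bundles Unicode {version}")
```

Running this with the new python.exe shows whether the generated tables would follow Unicode 16.0 or 17.0.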

The size of symmetricCaseConversionRanges (in CaseConvert.cxx) can be halved by merging the range length and pitch (each always less than 255) with lower and upper (the largest Unicode code point needs only 3 bytes):
(lower << 8) | range length, (upper << 8) | range pitch, e.g. 0x0061'1A, 0x0041'01,
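The packing described above can be sketched in Python. This is only an illustration under stated assumptions: pack_range and unpack_range are hypothetical names, not functions from the Scintilla scripts or the attached diff, and the 4-int order is assumed to be lower, upper, range length, range pitch.

```python
def pack_range(lower, upper, length, pitch):
    """Join a 4-int symmetric range into the proposed 2-int form."""
    # Assumption: length and pitch each fit in one byte, code points in 3 bytes.
    assert 0 < length <= 0xFF and 0 < pitch <= 0xFF
    assert lower <= 0x10FFFF and upper <= 0x10FFFF
    return (lower << 8) | length, (upper << 8) | pitch


def unpack_range(packed_lower, packed_upper):
    """Recover (lower, upper, length, pitch) from the packed pair."""
    return (packed_lower >> 8, packed_upper >> 8,
            packed_lower & 0xFF, packed_upper & 0xFF)


# a-z <-> A-Z: 26 characters, consecutive (pitch 1).
packed = pack_range(0x61, 0x41, 26, 1)
print(", ".join(f"0x{value:06X}" for value in packed))  # 0x00611A, 0x004101
```

Each range then occupies two 32-bit values instead of four, at the cost of a shift and a mask when the table is read.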

Zufu Liu

Oct 18, 2025, 8:46:04 PM

to scintill...@googlegroups.com

labels: unicode --> unicode, Scintilla, lexilla

Neil Hodgson

Nov 3, 2025, 7:04:37 PM

to scintill...@googlegroups.com

status: open --> accepted
Comment:

Committed data changes with [5ce570] and also in Lexilla with https://github.com/ScintillaOrg/lexilla/commit/2362ea7cb2066608b59c1a16e46a0864639b4307

The main part of the script updates replaces the 4-int tuples for symmetric cases with 2-int tuples by joining bits together. To me, this makes the code more difficult to understand and change for only a small saving in size.

There are some other minor changes that may be beneficial.

The first is hoisting and simplifying the surrogate test. However, it only succeeds for low surrogates and lets high surrogates through. This actually doesn't matter as Python is OK with lone surrogates and no surrogates have upper, lower, or fold counterparts. The surrogate test isn't needed - I expect I thought it was incorrect to try lone surrogates.
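The claim that no surrogate has a case counterpart can be checked directly. A minimal sketch, assuming the str methods of the running interpreter reflect the same Unicode data the scripts use:

```python
# Lone surrogates are valid str elements in Python; only encoding them fails.
surrogates = [chr(ch) for ch in range(0xD800, 0xE000)]

# Collect any surrogate whose upper, lower, or case-folded form differs from itself.
changed = [
    f"U+{ord(s):04X}"
    for s in surrogates
    if s.upper() != s or s.lower() != s or s.casefold() != s
]

print(len(surrogates), "surrogates checked,", len(changed), "with case counterparts")
```

An empty changed list means that skipping the surrogate test leaves the generated tables unchanged.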

There are some uses of more explicit formatting and more descriptive variable names that are positive.

Zufu Liu

Nov 4, 2025, 5:28:42 AM

to scintill...@googlegroups.com

> it only succeeds for low surrogates and lets high surrogates through.

That was a copy-and-paste error; the correct test would be 0xD800 <= ch <= 0xDFFF (merge the two tests, or take it from UniConversion.h).
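The difference between the two tests can be sketched as follows. The function names are hypothetical, and is_low_surrogate_only is only a stand-in for the faulty check, whose exact form is not quoted in this thread.

```python
def is_surrogate(ch):
    """True for any UTF-16 surrogate code point, high or low."""
    return 0xD800 <= ch <= 0xDFFF


def is_low_surrogate_only(ch):
    """Hypothetical stand-in for the faulty test: matches low surrogates only."""
    return 0xDC00 <= ch <= 0xDFFF


# Code points the merged test catches but the low-only test lets through.
missed = [
    ch for ch in range(0xD800, 0xE000)
    if is_surrogate(ch) and not is_low_surrogate_only(ch)
]
print(f"missed U+{missed[0]:04X}..U+{missed[-1]:04X} ({len(missed)} code points)")
```

The missed range is exactly the high surrogates, U+D800 to U+DBFF.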

> The surrogate test isn't needed

Indeed, the result is the same.

> There are some uses of more explicit formatting

These (string concatenation or percent formatting) could be found with pylint, e.g.

C0209: Formatting a regular string which could be an f-string (consider-using-f-string)
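For illustration, here is the kind of rewrite that message points at. The values and output format are made up for this example and are not taken from the generating scripts.

```python
lower, upper, length, pitch = 0x61, 0x41, 26, 1

# Percent formatting: the style that consider-using-f-string (C0209) reports.
old_style = "%d,%d,%d,%d," % (lower, upper, length, pitch)

# Equivalent f-string.
new_style = f"{lower},{upper},{length},{pitch},"

print(new_style)  # 97,65,26,1,
```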

[feature-requests:#1569] Unicode 16.0

Status: accepted

Group: Initial
Labels: unicode Scintilla lexilla

Created: Sun Oct 19, 2025 12:45 AM UTC by Zufu Liu

Last Updated: Tue Nov 04, 2025 12:04 AM UTC
Owner: Neil Hodgson
Attachments:
