[feature-requests:#1569] Unicode 16.0
Status: open
Group: Initial
Labels: unicode
Created: Sun Oct 19, 2025 12:45 AM UTC by Zufu Liu
Last Updated: Sun Oct 19, 2025 12:45 AM UTC
Owner: Neil Hodgson
Attachments:
Unicode data can be updated to Unicode 16.0 (Python 3.14) or 17.0 (Python 3.15 alpha 1).
Download the Windows embeddable package (64-bit) from https://www.python.org/downloads/windows/, add scintilla\scripts to python314._pth (or python315._pth), then run the generation scripts with the new python.exe.
The size of symmetricCaseConversionRanges (in CaseConvert.cxx) can be halved by merging the range length and pitch (each always less than 255) into the lower and upper values (the maximum Unicode code point needs only 3 bytes):
(lower << 8 | range length), (upper << 8 | range pitch), e.g. 0x0061'1A, 0x0041'01.
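The packing described above can be sketched in Python roughly as follows. The function names are hypothetical, and the field order (lower, upper, length, pitch) is an assumption based on the example values in this post, not a copy of the actual generator script.

```python
# Sketch only: helper names are hypothetical, not taken from the Scintilla scripts.

def pack_symmetric_range(lower, upper, length, pitch):
    """Merge a 4-int range description into two ints, 8 low bits for length/pitch."""
    assert 0 < length < 256 and 0 < pitch < 256   # must fit in one byte
    assert lower <= 0x10FFFF and upper <= 0x10FFFF  # code points fit in 3 bytes
    return (lower << 8) | length, (upper << 8) | pitch

def unpack_symmetric_range(packed_lower, packed_upper):
    """Recover (lower, upper, length, pitch) from the two packed ints."""
    return (packed_lower >> 8, packed_upper >> 8,
            packed_lower & 0xFF, packed_upper & 0xFF)

def format_packed(value):
    """Format like the example in the post, with a C++ digit separator."""
    return f"0x{value >> 8:04X}'{value & 0xFF:02X}"

# ASCII letters: 'a'..'z' map to 'A'..'Z', 26 characters, pitch 1.
packed = pack_symmetric_range(0x61, 0x41, 26, 1)
line = ",".join(format_packed(value) for value in packed) + ","
print(line)  # 0x0061'1A,0x0041'01,
```

Unpacking costs one shift and one mask per value, which is the extra complexity weighed against the size saving.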
[feature-requests:#1569] Unicode 16.0
Status: open
Group: Initial
Labels: unicode Scintilla lexilla
Committed data changes with [5ce570] and also in Lexilla with https://github.com/ScintillaOrg/lexilla/commit/2362ea7cb2066608b59c1a16e46a0864639b4307
The main part of the script updates is replacing 4-int tuple symmetric cases with 2-int tuples by joining bits together. To me, this is making the code more difficult to understand and change with only a small payback in size.
There are some other minor changes that may be beneficial.
The first is hoisting and simplifying the surrogate test. However, it only succeeds for low surrogates and lets high surrogates through. This actually doesn't matter as Python is OK with lone surrogates and no surrogates have upper, lower, or fold counterparts. The surrogate test isn't needed - I expect I thought it was incorrect to try lone surrogates.
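A quick way to check this claim is to ask Python directly whether any surrogate code point changes under upper, lower, or fold. This is a minimal sketch with hypothetical helper names, not the code from the scripts; the full surrogate range is 0xD800 to 0xDFFF.

```python
# Sketch only: helper names are hypothetical.

def is_surrogate(ch):
    """True for both high (0xD800-0xDBFF) and low (0xDC00-0xDFFF) surrogates."""
    return 0xD800 <= ch <= 0xDFFF

def has_case_counterpart(ch):
    """True if the code point changes under upper, lower, or case folding."""
    s = chr(ch)  # Python accepts lone surrogates in str
    return s.upper() != s or s.lower() != s or s.casefold() != s

surrogates_with_case = [ch for ch in range(0xD800, 0xE000)
                        if has_case_counterpart(ch)]
print(len(surrogates_with_case))
```

If the list is empty, skipping the surrogate test cannot change the generated tables.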
There are some uses of more explicit formatting and more descriptive variable names that are positive.
[feature-requests:#1569] Unicode 16.0
Status: accepted
Group: Initial
Labels: unicode Scintilla lexilla
> it only succeeds for low surrogates and lets high surrogates through.
That was a copy-and-paste error; the correct test would be 0xD800 <= ch <= 0xDFFF (merge the two tests, or use the one from UniConversion.h).
> The surrogate test isn't needed
Indeed, the result is the same.
> There are some uses of more explicit formatting
These (string concatenation or percent formatting) can be found with pylint, e.g.
C0209: Formatting a regular string which could be an f-string (consider-using-f-string)
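As an illustration of the three styles involved, the sketch below builds the same string each way; the variable names and the output text are made up for the example and do not come from the scripts.

```python
# Sketch only: names and the formatted text are illustrative.
name = "symmetricCaseConversionRanges"
count = 3

by_concatenation = "// " + name + ": " + str(count) + " ranges"
by_percent = "// %s: %d ranges" % (name, count)  # the style C0209 reports
by_fstring = f"// {name}: {count} ranges"        # the suggested replacement

print(by_fstring)
```

All three produce identical text, so the conversion is purely a readability change.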
[feature-requests:#1569] Unicode 16.0
Status: accepted
Group: Initial
Labels: unicode Scintilla lexilla
Created: Sun Oct 19, 2025 12:45 AM UTC by Zufu Liu
Last Updated: Tue Nov 04, 2025 12:04 AM UTC
Owner: Neil Hodgson
Attachments: