[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Thu Jul 03, 2025 02:05 PM UTC
Owner: Neil Hodgson
Attachments:
Some ideas to speed up CreateFoldMap() in ScintillaWin.cxx.
1. WideCharLenFromMultiByte + WideCharFromMultiByte can be change to a single call ::MultiByteToWideChar(codePage, MB_ERR_INVALID_CHARS, pair.chars, 2, codePoint, _countof(codePoint));, where codePoint has size 2 or 4 (each byte converted to two surrogate pairs).
2. MultiByteFromWideChar can be called with std::wstring_view(wFolded, charsConverted) to eliminate the temporary string.
3. CaseFolderDBCS::Fold() should output ch and ch2 when pair[0] is zero (unconverted, e.g. ch2 is not trail byte):
const char *pair = foldingMap.at(ind).chars;
if (pair[0]) {
folded[lenOut++] = pair[0];
folded[lenOut++] = pair[1];
} else {
folded[lenOut++] = ch;
folded[lenOut++] = ch2;
}
Sine DBCS encoding has only hundred (100 - 200) case sensitive characters (Latin, Greek, Cyrillic, etc.), further optimization could only handle blocks (lead bytes) that has case sensitive characters.
for lead byte generated with attached script:
std::initializer_list<unsigned char> leadBytes;
switch (codePage) {
case cp932:
leadBytes = {0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0xFA};
break;
case cp936:
leadBytes = {0xA2, 0xA3, 0xA6, 0xA7};
break;
case cp949:
leadBytes = {0xA1, 0xA3, 0xA5, 0xA7, 0xA8, 0xA9, 0xAC};
break;
case cp950:
leadBytes = {0x88, 0xA2, 0xA3, 0xC7, 0xC8};
break;
default:
leadBytes = {0xD9, 0xDA, 0xDB, 0xDC, 0xDD, 0xDE};
break;
}
for (const unsigned char byte1 : leadBytes) {
global CodePageToFoldMap cpToFoldMap; may has problem as std::map is not nothrow default constructable (Visual C++ has a warning for it). it could be replaced with std::vector as only has max 5 entries (most time only one - the system ANSI code page). when use above leadBytes code, CreateFoldMap() is very fast, it may not worth to cache the result globally.
Sent from sourceforge.net because scintill...@googlegroups.com is subscribed to https://sourceforge.net/p/scintilla/feature-requests/
To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/scintilla/admin/feature-requests/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
Due to vacation, it's unlikely I will progress new issues before mid-September.
At around 4ms on an older machine, FoldMap creation time doesn't appear to be significant. Caching avoids memory cost of multiple allocations.
As here are only five DBCS code pages, for most application only one (the system DBCS code page for East Asian) is active used,
I think CodePageToFoldMap can be changed to std::vector<std::pair<int, FoldMap>>.
std::vector is noexcept default constructible but std::map is not, using std::vector may benefit load/unload Scintilla.dll (avoid running complex constructor/destructor?).
For Notepad4, I just put FoldMap into CaseFolderDBCS, I think 64KB extra memory per document is not a thing, open 1000 documents in DBCS is just 64MB.
The size for FoldMap could be reduced, as the script DBCSFoldMap.py shows, for each DBCS code page, only at most eight blocks (lead bytes) contains case sensitive characters (less than 200 in total).
struct FoldMap {
// offset to blockList, indexed by lead byte, if the value is zero, then no case mapping.
uint16_t offset[128];
// each block has 256 DBCSPair, at most eight blocks.
std::array<DBCSPair, 8*256> blockList;
//std::vector<DBCSPair> blockList;
// index = offset[lead & 0x7f];
// mapping = index ? blockList[index - 1 + trail] : zero;
};
CreateFoldMap() could be optimized (cp932 is 2x faster than other code pages):
1. merge the two MultiByteToWideChar() calls, since the expected result length is 1, a buffer of 2 or 4 is enough to catch all failure.
2. write back to foldingMap[index] can be delayed only when here is case mapping, zero means no change.
3. change char chars[2]; to char chars[2]{}; ensure FoldMap is zero initialized, no read of uninitialized when ch2 = mixed[i]; isn't trail byte.
Attachments:
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Thu Jul 03, 2025 10:46 PM UTC
Owner: Neil Hodgson
Attachments:
std::vectoris noexcept default constructible butstd::mapis not
Throwing inside DllMain may be unpleasant although its unlikely. Possibly from memory exhaustion. A reasonable implementation of std::vector<std::pair<int, FoldMap>> would be OK.
to
char chars[2]{};ensureFoldMapis zero initialized
If I'm understanding code behaviour correctly, the std::map inserter is ensuring initialization to zeros and other uses provide values. It wouldn't hurt to be more explicit.
Other changes are less of a concern or apparent benefit to me.
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Tue Sep 30, 2025 10:38 PM UTC
Owner: Neil Hodgson
Attachments:
Change CodePageToFoldMap to std::vector (but with custom structure other than std::pair to avoid create large temporary on stack).
Other changes have minor speed or code size optimization,
CreateFoldMap():
https://github.com/zufuliu/notepad4/blob/main/scintilla/win32/ScintillaWin.cxx#L3076
CaseFolderDBCS::Fold():
https://github.com/zufuliu/notepad4/blob/main/scintilla/win32/ScintillaWin.cxx#L3122
and when ch2 not trail byte:
const char *pair = foldingMap.at(ind).chars;
if (pair[0]) {
folded[lenOut++] = pair[0];
folded[lenOut++] = pair[1];
} else {
folded[lenOut++] = ch;
folded[lenOut++] = ch2;
}
Attachments:
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Wed Nov 05, 2025 09:52 AM UTC
Owner: Neil Hodgson
Attachments:
std::mapis not nothrow default constructable
Seems only has problem for MSVC, see https://godbolt.org/z/3EGa5TMaT, the default constructor (dynamic initializer for cpToFoldMap_V0) does a 64 KB allocation, while libc++ and libstdc++ just zero some bytes. it's some sort of cost for app not even use DBCS encoding.
In CaseFolderDBCS() constructor:
if (!DBCSHasFoldMap(cp)) {
CreateFoldMap(cp, DBCSGetMutableFoldMap(cp));
}
foldingMap = DBCSGetFoldMap(cp);
The three lookups can be merged into one, e.g. by declare CreateFoldMap() as extern or pass it to DBCSGetFoldMap() function, e.g.
const FoldMap *DBCSGetFoldMap(int codePage, CreateFoldMapFunc CreateFoldMap) {
// Constructs if needed
const auto [it, inserted] = cpToFoldMap.try_emplace(codePage);
if (inserted) {
CreateFoldMap(codePage, &it->second);
}
return &it->second;
}
CodePageToFoldMap can also be changed to std::unique_ptr<std::map<int, FoldMap>> to avoid default initialization allocation.
Attachments:
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Wed Nov 05, 2025 12:54 PM UTC
Owner: Neil Hodgson
Attachments:
Can't strongly recall the reasoning for the convoluted API but it was likely from wanting a clear way to add thread safety with locks if needed for background searching. Sharing implementation between platforms was also a goal.
Adding another test inside CaseFolderDBCS::Fold would likely have worse performance than filling in those entries.
Since the loop is 1:1 input:output, and the output is commonly conservatively over-allocated, the bounds checks can probably be moved outside the loop. There's a minor hassle with an odd final lead byte but that could possibly be handled after the loop.
One benefit of FoldMap is that the core behaviour, folding DBCS text can be made platform-independent even when the creation is per-platform. It may be possible to expose FoldMap to the DBCS branch of Document::FindText in a way that the compiler will be able to inline.
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Thu Nov 06, 2025 10:56 AM UTC
Owner: Neil Hodgson
Attachments:
Since the loop is 1:1 input:output, and the output is commonly conservatively over-allocated, the bounds checks can probably be moved outside the loop. There's a minor hassle with an odd final lead byte but that could possibly be handled after the loop.
pcf->Fold() is only used inside Document::FindText(), and is over-allocated, boundary checks inside Fold() can all be omitted.
constexpr size_t maxFoldingExpansion = 4; for DBCS can also be omitted. for UTF-8, maybe it can be reduced, e.g. use maxExpansionCaseConversion instead?
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Sun Nov 23, 2025 11:29 PM UTC
Owner: Neil Hodgson
Attachments:
FoldMap only covers cases where folding produces the same length as input. This wasn't certain before FoldMap was implemented but it turned out to be true in all tested successful cases. While the DBCS encodings include Greek letters, there are only the basic letters, not the more complex cases.
pcf->Fold() is only used inside Document::FindText(), and is over-allocated, boundary checks inside Fold() can all be omitted.
Still want some tests in case it is called in other circumstances. Maybe protect against lenMixed>sizeFolded as a precondition. Still needs a check somewhere for an incomplete final lead byte.
@@ -2862,22 +2862,25 @@
size_t CaseFolderDBCS::Fold(char *folded, size_t sizeFolded, const char *mixed, size_t lenMixed) {
// This loop outputs the same length as input as for each char 1-byte -> 1-byte; 2-byte -> 2-byte
+ assert(lenMixed <= sizeFolded);
size_t lenOut = 0;
for (size_t i = 0; i < lenMixed; i++) {
- const ptrdiff_t lenLeft = lenMixed - i;
const char ch = mixed[i];
- if ((lenLeft >= 2) && DBCSIsLeadByte(cp, ch) && ((lenOut + 2) <= sizeFolded)) {
+ if (((i+1) < lenMixed) && DBCSIsLeadByte(cp, ch)) {
i++;
const char ch2 = mixed[i];
const uint16_t ind = DBCSIndex(ch, ch2);
const char *pair = foldingMap->at(ind).chars;
+ assert(pair[0]);
+ assert(pair[1]);
folded[lenOut++] = pair[0];
folded[lenOut++] = pair[1];
- } else if ((lenOut + 1) <= sizeFolded) {
+ } else {
const unsigned char uch = ch;
folded[lenOut++] = mapping[uch];
}
}
+ assert(lenOut == lenMixed);
return lenOut;
}
Can handle the case where the byte after a lead byte isn't a trail byte by moving code inside CreateFoldMap before the DBCSIsTrailByte test:
@@ -2813,10 +2813,10 @@
if (DBCSIsLeadByte(codePage, ch1)) {
for (unsigned char byte2 = highByteLast; byte2 >= minTrailByte; byte2--) {
const char ch2 = byte2;
+ const DBCSPair pair{ ch1, ch2 };
+ const uint16_t index = DBCSIndex(ch1, ch2);
+ (*foldingMap)[index] = pair;
if (DBCSIsTrailByte(codePage, ch2)) {
- const DBCSPair pair{ ch1, ch2 };
- const uint16_t index = DBCSIndex(ch1, ch2);
- (*foldingMap)[index] = pair;
const std::string_view svBytes(pair.chars, 2);
const int lenUni = WideCharLenFromMultiByte(codePage, svBytes);
if (lenUni == 1) {
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Mon Nov 24, 2025 10:43 AM UTC
Owner: Neil Hodgson
Attachments:
BothFold() and FindText() loops don't check DBCS trail byte (unlike UTF-8), filling foldingMap with any value for invalid bytes should not mater, invalid bytes rarely match.
A more correct/robust implement would like UTF-8 CaseConvertString():
Character has no conversion so copy the input to output.
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Mon Nov 24, 2025 10:29 PM UTC
Owner: Neil Hodgson
Attachments:
not the more complex cases.
Some code pages (949, 1361) contains complex cases, e.g. ß to ss, but the output is still same 2-bytes length.
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Tue Nov 25, 2025 10:16 AM UTC
Owner: Neil Hodgson
Attachments:
Improved behaviour when lead byte followed by non trail byte with [2f4c2f].
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Tue Nov 25, 2025 02:31 PM UTC
Owner: Neil Hodgson
Attachments:
The condition could be simplified by ensure mixed is NUL-terminated (const char *search parameter) or larger than lenMixed (local const char bytes).
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Wed Nov 26, 2025 09:47 PM UTC
Owner: Neil Hodgson
Attachments:
It may be possible to expose
FoldMapto the DBCS branch ofDocument::FindTextin a way that the compiler will be able to inline.
I tried following in Notepad4 but abandoned due to binary size increment (not figured out why).
1. move CaseFolderDBCS into CaseFolder.h.
2. cast pcf.get() to CaseFolderDBCS pointer.
3. use following method to replace bytes folding:
const char *FoldDualByte(unsigned char lead, unsigned char trail) const noexcept {
const uint16_t ind = DBCSIndex(lead, trail);
const char *pair = foldingMap.data()[ind].chars;
return pair;
}
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Mon Dec 01, 2025 11:42 AM UTC
Owner: Neil Hodgson
Attachments:
but abandoned due to binary size increment (not figured out why)
OK. Tracing some DBCS examples, I haven't seen the current Fold be prominent, so this may not be an area that will repay more effort.
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Mon Dec 01, 2025 11:52 AM UTC
Owner: Neil Hodgson
Attachments:
Committed the Fold simplification with [750626].
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Wed Dec 03, 2025 06:20 AM UTC
Owner: Neil Hodgson
Attachments:
Commit [8b3f2c] uses a vector for FoldMaps. Similar to patches above but without passing a function since that's a technique that has made evolution more difficult in the past.
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Wed Dec 03, 2025 10:32 PM UTC
Owner: Neil Hodgson
Attachments:
Yeah, pass function is good. not sure whether it has security or related problem, so it's declared as reference in the patch.
using CreateFoldMapFunc = void (&)(int codePage, FoldMap *foldingMap);
[feature-requests:#1564] Optimize CreateFoldMap
Status: open
Group: Initial
Labels: Scintilla encoding dbcs win32
Created: Thu Jul 03, 2025 02:05 PM UTC by Zufu Liu
Last Updated: Thu Dec 04, 2025 02:23 AM UTC
Owner: Neil Hodgson
Attachments: