A branch of current Scintilla implementing Unicode line ends and substyles (rainbow identifiers) can be found at
https://bitbucket.org/nyamatongwe/unicodelineends
I intend to commit these to the main repository soon after 3.2.4 is released. These features will initially be 'provisional'. That is, the API they present may change before it becomes permanent. The features may even be removed if there is a major problem. Applications that wish to avoid using provisional APIs will be able to define a preprocessor symbol that will hide the API definitions.
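As a rough illustration of how such a preprocessor symbol could hide provisional definitions, here is a minimal sketch. The names EXAMPLE_DISABLE_PROVISIONAL and EXAMPLE_PROVISIONAL_MESSAGE, and the value 4020, are hypothetical and chosen only for this example; this message does not give the name of the real symbol.

```cpp
#include <cassert>

// Hypothetical: an application defines this before including the header
// when it wants to avoid provisional APIs.
#define EXAMPLE_DISABLE_PROVISIONAL

// Hypothetical header fragment: provisional definitions sit inside a guard,
// so they disappear when the symbol above is defined.
#ifndef EXAMPLE_DISABLE_PROVISIONAL
#define EXAMPLE_PROVISIONAL_MESSAGE 4020
#endif

// Reports whether the provisional definition is visible in this build.
constexpr bool ProvisionalVisible() {
#ifdef EXAMPLE_PROVISIONAL_MESSAGE
    return true;
#else
    return false;
#endif
}
```

With the symbol defined, any accidental use of a provisional definition becomes a compile error rather than a silent dependency on an API that may change.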
This is new code so there may be bugs. Each API is documented in ScintillaDoc but the documentation is, as always, brief.
The branch contains 16 commits: the first 11 are for Unicode line ends and the last 5 are for substyles. The two features touch closely related pieces of code, so they are not truly independent. Some of this is deliberate, as both add methods to ILexer and I didn't want to support more variants of ILexer than necessary. Each commit should build and run. Committing in increments should make it easier to check for correctness. When these are committed to the mainline, there may be some reordering and merging of commits.
The most likely change to cause trouble is that StyleContext now decodes all the bytes in a UTF-8 encoded character as one character instead of as multiple bytes. This should make it easier for lexers to treat particular non-ASCII characters as syntactically significant.
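To show what "one character instead of multiple bytes" means in practice, here is a simplified sketch of decoding a UTF-8 sequence into a single code point. DecodeUTF8At and Decoded are hypothetical names for this example and are not the StyleContext code; the sketch also skips some validity checks (overlong forms, surrogates) that a real decoder would probably want.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: the decoded code point and how many bytes it used.
struct Decoded {
    unsigned int codePoint;
    std::size_t width;
};

// Decode the character starting at byte offset pos.
// Invalid or truncated sequences are passed through as a single byte.
Decoded DecodeUTF8At(const std::string &s, std::size_t pos) {
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    if (lead < 0x80)
        return {lead, 1};   // ASCII
    std::size_t width;
    unsigned int cp;
    if (lead >= 0xC2 && lead <= 0xDF) {
        width = 2; cp = lead & 0x1F;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        width = 3; cp = lead & 0x0F;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        width = 4; cp = lead & 0x07;
    } else {
        return {lead, 1};   // stray trail byte or invalid lead byte
    }
    if (pos + width > s.size())
        return {lead, 1};   // truncated sequence
    for (std::size_t i = 1; i < width; i++) {
        const unsigned char trail = static_cast<unsigned char>(s[pos + i]);
        if ((trail & 0xC0) != 0x80)
            return {lead, 1};   // not a continuation byte
        cp = (cp << 6) | (trail & 0x3F);
    }
    return {cp, width};
}
```

A lexer that previously saw the three bytes E2 80 A8 one at a time would, under this model, see the single character U+2028 (LINE SEPARATOR) and could compare against it directly.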
A SciTE patch to allow experimentation with substyles is attached. A set of properties for this is:
unicode.line.ends=1
substyles.cpp.11=2
substylewords.11.1.$(file.patterns.cpp)=CharacterSet LexAccessor SString WordList
substylewords.11.2.$(file.patterns.cpp)=std map string vector
style.cpp.11.1=fore:#AA00EE
style.cpp.11.2=fore:#EE00AA
style.cpp.75.1=$(style.cpp.75),fore:#663388
style.cpp.75.2=$(style.cpp.75),fore:#883366
substyles.cpp.17=1
style.cpp.17.1=$(style.cpp.17),fore:#00AAEE
substylewords.17.1.$(file.patterns.cpp)=random
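The effect of the substyles and substylewords properties above can be modelled roughly as a lookup from identifier to style number. This is only a sketch of the idea, not the Scintilla implementation: the SubStyleTable class and the starting style number 128 are assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical model of substyles for one base style (e.g. 11, identifiers).
class SubStyleTable {
    int baseStyle;      // style used when no substyle word list matches
    int firstSubStyle;  // style number given to substyle 1 (assumed value)
    std::map<std::string, int> wordToStyle;
public:
    SubStyleTable(int baseStyle_, int firstSubStyle_)
        : baseStyle(baseStyle_), firstSubStyle(firstSubStyle_) {}

    // subStyle is 1-based, matching the ".1", ".2" suffixes in the properties.
    // words is a space-separated list like a substylewords value.
    void SetIdentifiers(int subStyle, const std::string &words) {
        std::istringstream in(words);
        std::string word;
        while (in >> word)
            wordToStyle[word] = firstSubStyle + (subStyle - 1);
    }

    // Returns the substyle for a listed identifier, else the base style.
    int StyleFor(const std::string &identifier) const {
        const auto it = wordToStyle.find(identifier);
        return (it == wordToStyle.end()) ? baseStyle : it->second;
    }
};
```

With the two word lists for style 11 loaded, "WordList" would map to the first substyle and "vector" to the second, while any other identifier keeps the base style 11.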
Neil