Hi Lou,
To put the editor in UTF-8 mode, you get an instance and apply:
CwScintillaEditor>>setCodePage: 65001
And I didn't convert anything... it just goes into a mode where it interprets the bytes in that byte array as UTF-8.
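To illustrate what "interprets the bytes correctly" means, here is a minimal sketch in Python (not VAST code, just a neutral way to show the idea). The four bytes are the UTF-8 encoding of U+1D40C, and Latin-1 stands in for a single-byte code page; both choices are assumptions for illustration. The byte array never changes, only the way it is read.

```python
# The same four bytes, read two different ways. Nothing is converted.
raw = bytes([0xF0, 0x9D, 0x90, 0x8C])  # UTF-8 encoding of U+1D40C

# Read as UTF-8: the four bytes form one character.
as_utf8 = raw.decode("utf-8")

# Read as a single-byte code page (Latin-1 here): four unrelated characters.
as_latin1 = raw.decode("latin-1")

print(len(as_utf8), hex(ord(as_utf8)))
print(len(as_latin1), [hex(ord(c)) for c in as_latin1])
```

Switching the editor's code page is the equivalent of choosing the first reading instead of the second.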
"How about my idea (maybe crazy idea) to subtract something from each chunk (3 or 4 bytes) to bring it down to a range we can work with?"
That's not going to work. Except *mostly* for the first 128 chars, UTF-8 encoded bytes are not relatable to code pages. Again, it's just a value
assigned by a group of folks, so you can't really do anything mathematical to it to accomplish much with regard to conversion.
The first 4 bytes in your example are the UTF-8 encoded value for Mathematical Bold Capital M (https://www.compart.com/en/unicode/U+1D40C).
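Its UTF-8 value is different from its UTF-16 value, which is different from its Unicode scalar value. The encoded forms come from the scalar value by bit-packing rules, not by a fixed offset, and none of them relates to a code page position... the scalar is just an assigned value.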
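A quick way to see the three values side by side, sketched in Python for illustration only. The helper `fits_code_page` and the choice of cp1252 as the code page are hypothetical, picked just to show that no arithmetic on the bytes lands this character in a single-byte code page.

```python
# U+1D40C MATHEMATICAL BOLD CAPITAL M in its three forms.
ch = "\U0001D40C"

scalar = ord(ch)                # Unicode scalar value
utf8 = ch.encode("utf-8")       # four bytes
utf16 = ch.encode("utf-16-be")  # a surrogate pair, two 16-bit code units

print(f"scalar U+{scalar:04X}")
print("UTF-8 ", utf8.hex(" "))
print("UTF-16", utf16.hex(" "))


def fits_code_page(text: str, code_page: str = "cp1252") -> bool:
    """Hypothetical helper: True if every character exists in the code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


print(fits_code_page("M"))  # plain ASCII letter
print(fits_code_page(ch))   # the bold M has no slot in cp1252
```

The bold M simply has no entry in cp1252, so subtracting from its bytes can only produce some other, unrelated character.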
"I am just trying to hack my way around it for my limited case until you guys do the real conversion."
- We won't be doing any conversion. As a first step, we are offering a UnicodeString container that can ingest those UTF-8 encoded bytes and do interesting things with them, and we will show them in our editor, which we will switch to UTF-8 mode. This means a UnicodeString will need to convert itself to UTF-8 when it hands Scintilla bytes to display.
Next steps would be upgrading our CFS APIs to use them, followed by switching our APIs on Windows from narrow to wide APIs. At that point, UTF-8 would still require conversion to UTF-16 if you are going to show it in table cells that are Windows widgets. But the new Unicode support library gives you a first-class container and plenty of easy APIs to make that feasible.
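The data flow described above can be sketched roughly like this, again in Python purely for illustration. A Python `str` stands in for the UnicodeString container, and the variable names are hypothetical; none of this is the actual VAST API.

```python
# Incoming UTF-8 bytes, e.g. read from a file or a socket.
incoming = "Caf\u00e9 \U0001D40C".encode("utf-8")

# Ingest into an encoding-neutral container (stand-in for UnicodeString).
text = incoming.decode("utf-8")

# Hand UTF-8 bytes to an editor running in UTF-8 mode.
for_scintilla = text.encode("utf-8")

# Hand UTF-16 code units to a wide-API Windows widget.
for_wide_api = text.encode("utf-16-le")

print(len(text), "code points")
print(len(for_scintilla), "UTF-8 bytes")
print(len(for_wide_api) // 2, "UTF-16 code units")
```

The counts differ in each form, which is why the container, not the caller, should own the conversions.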
- Seth