representations for non-ASCII code points. Scintilla converts the
>
> -Brian
>
>> On May 24, 2016, at 11:04 PM, Brian Griffin <bgriffin...@gmail.com> wrote:
>>
>>>
>>> On May 24, 2016, at 10:50 PM, Lex Trotman <ele...@gmail.com> wrote:
>>>
>>> On 25 May 2016 at 15:03, Brian Griffin <bgriffin...@gmail.com> wrote:
>>>> On May 24, 2016, at 9:51 PM, Paul K <paulc...@gmail.com> wrote:
>>>>
>>>>
>>>> Hi Brian,
>>>>
>>>>> From the results I'm seeing, it appears as if Scintilla does not really
>>>>> understand utf8 strings. What am I missing? Is there more detailed
>>>>> documentation on string handling and interpretation in Scintilla?
>>>>
>>>> I think you need to be more specific about what you are trying to do and the
>>>> results you see (and how they deviate from the expected results).
>>>>
>>>>
>>>> When adding UTF-8 strings containing multi-byte codes (anything outside of
>>>> simple Latin), the indexing no longer works and deleting characters deletes
>>>> the individual bytes instead of the whole character. Of course, once one part
>>>> of a character is removed, the rendering goes sour.
>>>
>>> This is the documented behaviour, see
>>> http://www.scintilla.org/ScintillaDoc.html#TextRetrievalAndModification.
>>> In particular be aware that the term "character" throughout the
>>> documentation means "byte" not code point. Indexing is by bytes, and
>>> UTF-8 uses more than one byte for code points over 127.
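
To make the byte versus code point distinction concrete, here is a minimal
sketch in plain Python. It does not call Scintilla at all; the helper names
char_start and delete_char_before are made up for illustration. It only shows
why a byte offset has to be moved back to a lead byte before deleting.
Scintilla documents SCI_POSITIONBEFORE and SCI_POSITIONAFTER for stepping by
whole characters; the code below is a standalone illustration, not Scintilla's
implementation.

```python
# Byte offsets vs. code points in UTF-8 (illustrative only, no Scintilla calls).
text = "naïve café"
data = text.encode("utf-8")

print(len(text))  # number of code points
print(len(data))  # number of bytes; larger, since ï and é take two bytes each


def char_start(buf: bytes, pos: int) -> int:
    """Move pos back to the lead byte of the UTF-8 sequence containing it.

    Continuation bytes have the bit pattern 10xxxxxx.
    """
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


def delete_char_before(buf: bytes, pos: int) -> bytes:
    """Delete the whole code point that ends just before byte offset pos."""
    if pos <= 0:
        return buf
    start = char_start(buf, pos - 1)
    return buf[:start] + buf[pos:]


# Deleting a single byte leaves half of "é" behind, so decoding fails.
broken = data[:-1]
try:
    broken.decode("utf-8")
except UnicodeDecodeError:
    print("single-byte delete left an invalid sequence")

# Deleting back to the lead byte removes the whole character.
fixed = delete_char_before(data, len(data))
print(fixed.decode("utf-8"))
```

The same idea applies to indexing: an offset handed to a byte-indexed API
must be a byte offset that falls on a lead byte, not a count of characters.
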
>>
>> That's all well and good, but what does SC_CP_UTF8 mean in that context? UTF-8 is a (variable-length) multibyte encoding. If Scintilla only deals in bytes, then how does it support UTF-8?
>>
>> -Brian
>