Better caret placement for ligatures and Thai & Indic languages with DirectWrite

Neil Hodgson

Nov 16, 2015, 1:59:24 AM
to Scintilla mailing list
   A recent message from ‘sn’ brought up the topic of ligatures.

   Here is a picture of text drawn in the Candara font by DirectWrite which substitutes ligatures for some character sequences such as “ffi”. There are two copies of SciTE with the left being the current release and the right containing some changes.

   Line 1 shows some normal Roman text where each character is drawn independently.
   Line 2 has some sequences where some characters are joined together automatically by substituting ligatures. DirectWrite will only do this when the font supports it. In English, ligatures are a slightly nicer way to draw some character sequences but are not generally regarded as necessary. In other languages, they are more important.
   Line 3 is the Devanagari ddhrya-ligature (द् + ध् + र् + य = द्ध्र्य) used for Sanskrit as mentioned on the Wikipedia page about ligatures.
   Line 4 is an emoji for Cyclone and was included to ensure emoji wasn’t broken by the code changes.
   Line 5 is Thai. In Thai, vowels may appear above, below, left, or right of their consonant but commonly appear above. So the cap is the vowel and the hook-with-ring is the consonant.


   The problem is that Scintilla has been using the position at the end of the cluster (ligature) as the end position of each character in the ligature. Thus when the caret is before ‘ffi’, pushing the right arrow moves the caret after the ‘i’. Pressing right again leaves the caret in the same location, and the same happens for one more right arrow. This can be confusing and makes it difficult to select individual characters.

   A possible solution is to divide the cluster width up evenly between all the characters so that each cursor key press visibly moves the caret. This may not be completely accurate in English since ‘f’ will be wider than ‘i’ but it’s close enough.
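   The difference between the two approaches can be sketched in a few lines of C++. This is only an illustration of the idea, not the code in the attached patch: the Cluster struct and both function names are hypothetical, and a real implementation would take cluster widths from the DirectWrite layout rather than from hand-written values.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one shaped cluster: the number of characters
// it covers and the total advance width reported for it.
struct Cluster {
    std::size_t characters;
    float width;
};

// Current behaviour: every character in a cluster is given the position
// of the cluster's end, so carets inside a ligature all coincide.
std::vector<float> EndPositionsAtClusterEnd(const std::vector<Cluster> &clusters) {
    std::vector<float> positions;
    float x = 0.0f;
    for (const Cluster &c : clusters) {
        x += c.width;
        positions.insert(positions.end(), c.characters, x);
    }
    return positions;
}

// Proposed behaviour: the cluster width is shared evenly, so each
// character ends at a distinct x position.
std::vector<float> EndPositionsApportioned(const std::vector<Cluster> &clusters) {
    std::vector<float> positions;
    float x = 0.0f;
    for (const Cluster &c : clusters) {
        for (std::size_t i = 1; i <= c.characters; i++) {
            positions.push_back(
                x + c.width * static_cast<float>(i) / static_cast<float>(c.characters));
        }
        x += c.width;
    }
    return positions;
}
```

   With a made-up 12-pixel ‘ffi’ cluster starting at x = 6, the first function gives 18, 18, 18 for the three characters while the second gives 10, 14, 18, which is what makes each arrow key press visible.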

   For Thai, it would be great if the character area were divided vertically and a horizontal caret appeared between the consonant and vowel, but this would be very difficult to implement and I don’t know of any editors that can do this. Moving the caret horizontally into the character area can help.

   This blog entry talks about the issue although it mentions further refinements for Thai and Indic that I haven’t considered fully and which would require deeper changes in Scintilla.

   The red boxes in the illustration are from a script that places an indicator behind every second character. The caret can be positioned on either side of every box. The right window is changed to apportion space evenly between the characters and, to me, appears much better.

   GDI on Windows does not support ligatures so should have no problems. Other platforms may already be returning partial cluster positions (IIRC GTK+ does this) but there may be more work to do on others. Cocoa doesn’t handle the Thai or Sanskrit text the same as the new DirectWrite code: there are 2 internal positions in the Sanskrit instead of 6, no middle position for the Thai, and it doesn’t use ligatures for English although I haven’t tried using many fonts.
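   The position counts quoted above can be checked with a small sketch. It assumes that each Unicode code point is one caret stop, which matches the 6 internal positions for the Sanskrit sample and the single middle position for the Thai sample; the function names are hypothetical and are not part of Scintilla.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in UTF-8 text by counting every byte that is
// not a continuation byte (continuation bytes have the form 10xxxxxx).
std::size_t CountCodePoints(const std::string &utf8) {
    std::size_t count = 0;
    for (unsigned char ch : utf8) {
        if ((ch & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}

// Caret stops strictly inside the text, assuming one stop between each
// pair of adjacent code points.
std::size_t InternalCaretPositions(const std::string &utf8) {
    const std::size_t codePoints = CountCodePoints(utf8);
    return codePoints > 0 ? codePoints - 1 : 0;
}
```

   The Sanskrit sample is 7 code points so this gives 6, the Thai sample is 2 code points so it gives 1, and the emoji is a single 4-byte code point so it gives 0.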

   The text appearing is

abcdefghij
ffi fj fl ff ffl fh fb
द्ध्र्य
🌀
อี

   The code to set the indicators from Lua in SciTE is:

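-- Place an indicator behind every second character in the document.
function onOff()
       -- Indicator 3 is drawn as a dark red rounded box (colours are 0xBBGGRR)
       editor.IndicatorCurrent = 3
       editor.IndicStyle[3] = INDIC_ROUNDBOX
       editor.IndicFore[3] = 0x0000A0
       local lengthDocument = editor.Length
       -- Remove any indicators left from a previous run
       editor:IndicatorClearRange(0, lengthDocument)
       local i = 0
       while i < lengthDocument do
               -- a is the end of the character at i, b the end of the next one
               local a = editor:PositionAfter(i)
               local b = editor:PositionAfter(a)
               -- Mark the character at i and skip the one after it
               editor:IndicatorFillRange(i,a-i)
               i = b
       end
end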

   Attached is a patch with the proposed code changes.

   Neil
clusters.patch

Neil Hodgson

Nov 20, 2015, 7:47:40 PM
to scintilla...@googlegroups.com