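Base Consonant: श (U+0936, DEVANAGARI LETTER SHA)
Virama: ् (U+094D, DEVANAGARI SIGN VIRAMA)
Consonant: र (U+0930, DEVANAGARI LETTER RA)
Dependent Vowel Sign: ी (U+0940, DEVANAGARI VOWEL SIGN II)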
In summary, the Unicode sequence for "श्री" is a base consonant followed by a virama, a second consonant (र) that joins the first to form a conjunct, and a dependent vowel sign. UTF-8 only stores each of these code points as bytes; it is the rendering engine's text shaping that has to combine them into the correct composite glyph.
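To make the byte-level part concrete, here is a small Python sketch that prints the UTF-8 bytes for each code point of the word. The helper name utf8_bytes_per_code_point is my own, and this only inspects the encoding; it says nothing about how ReportLab shapes the text.

```python
# The four code points quoted in this thread for the word.
word = "\u0936\u094D\u0930\u0940"


def utf8_bytes_per_code_point(text):
    """Return (code point label, UTF-8 bytes as hex) for each code point."""
    return [(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" ")) for ch in text]


for label, hex_bytes in utf8_bytes_per_code_point(word):
    print(label, hex_bytes)

# Each Devanagari code point takes three bytes in UTF-8.
print(len(word.encode("utf-8")), "bytes in total")
```

The bytes are the same whether or not the conjunct displays correctly, so a rendering problem like this one sits in the shaping step and not in the encoding.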
I hope someone at ReportLab is reading these forum posts and has an idea how to improve the rendering of these complex composite characters in some Asian languages. Some of them already work, for example "កាំ" (kâm), a sequence of a base consonant, a dependent vowel and a diacritical mark.
My interpretation was confirmed by Copilot - the virama is used to join consonants into a conjunct. Here is the shorter answer from Copilot:
The word “श्री” is a ligature used in the Devanagari script, which is used to write Hindi, Sanskrit, and several other South Asian languages. It’s a combination of two characters: “श” and “्री”.
In Unicode, each character has a unique identifier known as a code point. The Unicode code points for “श” and “्री” are as follows:
“श”: U+0936
“्री”: U+094D, U+0930, U+0940
So, the sequence for “श्री” would be U+0936, U+094D, U+0930, U+0940. This sequence represents the individual characters that make up the ligature. When these Unicode points are rendered in the correct sequence, they form the ligature “श्री”.
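For anyone who wants to check this sequence themselves, here is a minimal Python sketch using the standard unicodedata module. The function name describe is just my choice for illustration.

```python
import unicodedata

# The sequence quoted above: U+0936, U+094D, U+0930, U+0940.
word = "\u0936\u094D\u0930\u0940"


def describe(text):
    """Return (code point label, official Unicode name) for each code point."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for label, name in describe(word):
    print(label, name)
```

This should list the letter SHA, the sign VIRAMA, the letter RA and the vowel sign II, which matches the breakdown given in this thread.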
Again, let's hope someone from ReportLab reads these comments.
Matthias