changing search behavior in XTF for Tibetan Unicode


gerry

Sep 13, 2011, 10:02:56 AM
to XTF Developer list
We are trying to build a digital library consisting of Tibetan
Buddhist literature all entered using Tibetan Unicode. We have tried
using XTF and are quite impressed with its capabilities.

We have noticed one issue that we would like help with. Tibetan text
uses a special intersyllabic character called a tsheg. It looks like
a little triangle. Each Tibetan word consists of one or more syllables
divided by tshegs. There is no space character to delimit word
boundaries in Tibetan and words are recognized through context.
Tibetan sentences consist of a number of syllables strung together
using tshegs -- again there are no visible spaces. Here is the Unicode
table information for the tsheg character.

U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG
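
To make the tokenization we have in mind concrete, here is a minimal Python sketch. It is not XTF code; the helper name `split_syllables` and the sample phrase are only illustrative assumptions. It treats the tsheg as the sole delimiter and splits a run of Tibetan text into syllables.

```python
import unicodedata

# U+0F0B, the intersyllabic mark described above.
TSHEG = "\u0f0b"


def split_syllables(text: str) -> list[str]:
    """Split Tibetan text on the tsheg, dropping empty pieces.

    Hypothetical helper for illustration; a real indexer would likely
    also need to handle other Tibetan punctuation.
    """
    return [piece for piece in text.split(TSHEG) if piece]


# Illustrative sample: four syllables joined by tshegs, with no spaces.
sample = TSHEG.join(["བཀྲ", "ཤིས", "བདེ", "ལེགས"])

if __name__ == "__main__":
    print(unicodedata.name(TSHEG))
    print(split_syllables(sample))
```

With the tsheg treated as a boundary, the sample yields four separate tokens instead of one long unbroken string.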

When we try to use the search function in XTF, we notice that it does
not match partial words. For example, in English it finds "africa" but
does not find the leading letters "afr". Because Tibetan has no space
character, a whole run of tsheg-separated syllables appears to be
treated as a single word, so searches for Tibetan words typically fail.

Is there a way that we can configure (or modify) XTF so that it
recognizes the tsheg character as a word boundary character? This would
allow us to search Tibetan texts for a string of syllables that makes
up a Tibetan word.
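
As a rough sketch of the search behavior we are hoping for (plain Python, not XTF's API; `syllables` and `find_syllable_run` are hypothetical names), the query and the document are both split on the tsheg, and a hit is any position where the query's syllables occur consecutively:

```python
# U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG
TSHEG = "\u0f0b"


def syllables(text: str) -> list[str]:
    """Split text on the tsheg, ignoring empty pieces."""
    return [piece for piece in text.split(TSHEG) if piece]


def find_syllable_run(text: str, query: str) -> list[int]:
    """Return the syllable offsets in text where query's syllables
    appear consecutively. Whole syllables only; no partial matches."""
    doc = syllables(text)
    wanted = syllables(query)
    if not wanted:
        return []
    width = len(wanted)
    return [
        start
        for start in range(len(doc) - width + 1)
        if doc[start:start + width] == wanted
    ]


if __name__ == "__main__":
    # Illustrative text: four syllables with no spaces between them.
    text = TSHEG.join(["བཀྲ", "ཤིས", "བདེ", "ལེགས"])
    # A two-syllable word taken from the middle of the text.
    word = TSHEG.join(["བདེ", "ལེགས"])
    print(find_syllable_run(text, word))
```

This mirrors how phrase search works on space-delimited English words, with the tsheg taking the place of the space.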

Thanks for any help you can offer.

--Gerry Wiener