gerry
Sep 13, 2011, 10:02:56 AM
to XTF Developer list
We are trying to build a digital library consisting of Tibetan
Buddhist literature all entered using Tibetan Unicode. We have tried
using XTF and are quite impressed with its capabilities.
We have noticed one issue that we would like help with. Tibetan text
uses a special intersyllabic character called a tsheg. It looks like
a little triangle. Each Tibetan word consists of one or more syllables
divided by tshegs. There is no space character to delimit word
boundaries in Tibetan and words are recognized through context.
Tibetan sentences consist of a number of syllables strung together
using tshegs -- again there are no visible spaces. Here is the Unicode
table information for the tsheg character.
U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG
When we use the search function in XTF, we notice that it does not
match partial words. For example, in English it matches africa but
not the leading letters afr. Because Tibetan has no space character,
a whole run of syllables appears to be treated as a single word, so
searches for individual Tibetan words typically fail.
Is there a way to configure (or modify) XTF so that it recognizes the
tsheg character as a word boundary character? This would allow us to
search Tibetan texts for the string of syllables that makes up a
Tibetan word.
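
To make the desired behavior concrete, here is a minimal Python sketch of the matching we have in mind. It is not XTF code and does not use any XTF or Lucene API; the function names `syllables` and `find_phrase` are hypothetical and exist only for illustration. It assumes that the tsheg (U+0F0B) and ordinary whitespace are the only token boundaries, and ignores other Tibetan punctuation.

```python
TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG


def syllables(text):
    """Split text into syllables, treating tsheg and whitespace as boundaries."""
    tokens = []
    for chunk in text.split():
        # Drop empty strings produced by leading, trailing, or doubled tshegs.
        tokens.extend(s for s in chunk.split(TSHEG) if s)
    return tokens


def find_phrase(text, query):
    """Return the syllable offsets where the query's syllables occur consecutively."""
    doc = syllables(text)
    wanted = syllables(query)
    if not wanted:
        return []
    span = len(wanted)
    return [i for i in range(len(doc) - span + 1) if doc[i:i + span] == wanted]


# Four syllables joined by tshegs, with a trailing tsheg and no spaces.
text = TSHEG.join(["བཀྲ", "ཤིས", "བདེ", "ལེགས"]) + TSHEG

# A two-syllable word is found as a sequence of adjacent syllable tokens.
word = "བདེ" + TSHEG + "ལེགས"
print(find_phrase(text, word))
```

With syllables as the indexed unit, a multi-syllable Tibetan word becomes a phrase query over adjacent tokens, which is the behavior we are hoping XTF can be configured to provide.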
Thanks for any help you can offer.
--Gerry Wiener