I'm also interested to know what is the longest unbroken sequence of
joining letters -- what I would call a lettergroup and Tom Milo calls a
fusion-- in Persian.
This is to assist me in working out maximum context lengths for some
OpenType Layout features. Thanks.
JH
--
Tiro Typeworks www.tiro.com
Gulf Islands, BC ti...@tiro.com
For chant, far more than the pairing of a text
and a melody, is first of all an act in which
sound becomes the expression of a memory, the memory
of a body immersed in the movement of an ancestral
gesture. - Marcel Pérès
Well, it depends a lot on your orthography. Certain orthographies, for
example, say that a spelled-out number should be attached (with a
zero-width non-joiner where needed) to a following suffix ساله
(-years-old). So considering
the 10^15 limit set for spelling numbers in the Persian locale
requirements document here
<http://www.farsiweb.ir/mediawiki/images/3/3b/Locale-12.pdf>, a rather
long example in such an orthography could be something like:
چهارصدوپنجاهوچهارهزاروچهارصدوپنجاهوچهارمیلیاردوچهارصدوپنجاهوچهارمیلیونوچهارصدوپنجاهوچهارهزاروچهارصدوپنجاهوچهارساله
That's 120 characters, six of which are ZWNJs.
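
A minimal sketch for checking counts like these mechanically. The helper name zwnj_stats is hypothetical, not something from this thread. Note that mail archives and copy/paste can silently strip ZWNJs, so a pasted copy of the string above may not reproduce the quoted figures.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def zwnj_stats(text):
    """Return (total code points, ZWNJ count, lengths of the ZWNJ-separated parts)."""
    parts = text.split(ZWNJ)
    return len(text), text.count(ZWNJ), [len(part) for part in parts]
```

The third value shows how long each visually separate piece is, which is closer to what matters for a contextual lookup than the total length.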
> I'm also interested to know what is the longest unbroken sequence of joining
> letters -- what I would call a lettergroup and Tom Milo calls a fusion-- in
> Persian.
That needs some good parsing. But there are famous examples of
unbroken joining, like قسطنطنیه or نستعلیق. As a Nastaliq exercise,
people like to join words in ways that make them hard to write, so they
can show off their skill. A famous example is ایحجتخدا (the joined
version of ای حجت خدا).
> This is to assist me in working out maximum context lengths for some
> OpenType Layout features. Thanks.
I suggest not limiting your font if possible. It's sad to see a new
word getting invented and not being able to get rendered in a font.
Roozbeh
>> I'm also interested to know what is the longest unbroken sequence of joining
>> letters -- what I would call a lettergroup and Tom Milo calls a fusion-- in
>> Persian.
> That needs some good parsing. But there are famous examples of
> unbroken joining, like قسطنطنیه or نستعلیق. As a Nastaliq exercise,
> people like to join words in ways that make them hard to write, so they
> can show off their skill. A famous example is ایحجتخدا (the joined
> version of ای حجت خدا).
Thanks. These are helpful.
>> This is to assist me in working out maximum context lengths for some
>> OpenType Layout features. Thanks.
> I suggest not limiting your font if possible. It's sad to see a new
> word getting invented and not being able to get rendered in a font.
Some kind of limit is unavoidable with the kind of things I am doing.
OpenType contextual lookup structure is pretty simplistic. What I try to
do is ensure that my contexts exceed anticipated real-world maximums.
Oh, I just found the most famous example:
منمشتعلعشقعلیمچهکنم
That's 19 letters. People try to write that in Nastaliq to show
mastery of the art. It is the joined version of:
من مشتعل عشق علیم چه کنم
which means: "I am burning in Ali's love, what should I do?" The word
is even mentioned in the Persian Wikipedia in its joined form.
I also went through the whole Persian Wikipedia to see what I could
find. I used Hooman's مستضعفینی (9-letter) as the starting point,
searching for continuous joint sequences of at least 10 letters. I
couldn't find anything longer that was Persian. But I found these from
other languages written in the Arabic script.
From Arabic (all 10-letter):
ليستخلفنهم (Perhaps the longest such sequence in the Koran. From 24:55.)
الفلسطينيين (The Palestinians)
القسطنطينية (Constantinople)
From Uighur (11-letter):
لىختېنشتېين (Liechtenstein)
ئېينشتېينىي (Einsteinium)
فىلىپىلىكلەرگە (Philippians?)
From Azerbaijani (11-letter):
آلتمیشینجیسی (sixtieth)
Hoping these help. (Well, at least I enjoyed finding them :))
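
A rough sketch of how a corpus scan like this could be done with a regular expression. This is not the tool used in this thread. The character classes are an assumption: they transcribe the Joining_Type values for U+0620..U+06FF only and should be checked against ArabicShaping.txt in the Unicode Character Database. The name long_groups and the min_len parameter are hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Assumed right-joining letters in the basic Arabic block (regex class body).
_RIGHT = ("\u0622-\u0625\u0627\u0629\u062F-\u0632\u0648\u0671-\u0673"
          "\u0675-\u0677\u0688-\u0699\u06C0\u06C3-\u06CB\u06CD\u06CF"
          "\u06D2\u06D3\u06D5\u06EE\u06EF")
# Assumed joining letters of either kind (hamza U+0621 and U+0674 excluded).
_LETTER = ("\u0620\u0622-\u063F\u0641-\u064A\u066E\u066F\u0671-\u0673"
           "\u0675-\u06D3\u06D5\u06EE\u06EF\u06FA-\u06FC\u06FF")

# One or more dual-joining letters, optionally closed by a right-joining one.
GROUP_RE = re.compile(f"(?:(?![{_RIGHT}])[{_LETTER}])+[{_RIGHT}]?")


def long_groups(lines, min_len=10):
    """Count joined letter groups of at least min_len letters in lines."""
    counts = Counter()
    for line in lines:
        # Combining marks and tatweel do not interrupt joining, so drop them.
        bare = "".join(c for c in line
                       if unicodedata.category(c) != "Mn" and c != "\u0640")
        for match in GROUP_RE.finditer(bare):
            if len(match.group()) >= min_len:
                counts[match.group()] += 1
    return counts
```

Anything outside the letter class, including spaces, digits, and ZWNJ, ends a group. Passing an open text file as lines would process a dump line by line.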
Roozbeh
فلسطینیها (Palestinians)
فمینیستها (Feminists)
قسطنطینیه (a less commonly used spelling of Constantinople)
مکسیلیتین (Mexiletine)
نیهیلیستی (Nihilistic)
کلیتمنسترو (Clytemnestra)
سیمپلیسیوس (Simplicius)
And a few others...
Roozbeh
فیلیپینیها (Filipinos)
Roozbeh
We are talking joint sequences here. فیلیپینیهایی is a 10-letter joint
sequence plus a 2-letter one. No improvement.
Roozbeh
Oh, that's what John was originally asking for. In his own words: "the
longest unbroken sequence of joining letters". Technically a sequence
of dual-joining letters ending in a dual-joining or a right-joining
letter.
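
That definition can be sketched as a small state machine. The following is only an illustration under stated assumptions: the right-joining ranges are a transcription of Joining_Type for U+0620..U+06FF and should be verified against ArabicShaping.txt, every other letter in that range is treated as dual-joining, and ZWJ is (simplistically) treated as a break. The function names are hypothetical.

```python
import unicodedata


def _chars(*ranges):
    """Expand inclusive (low, high) code point ranges into a set of characters."""
    return {chr(cp) for low, high in ranges for cp in range(low, high + 1)}


# Assumed Joining_Type=R letters in the basic Arabic block.
RIGHT_JOINING = _chars(
    (0x0622, 0x0625), (0x0627, 0x0627), (0x0629, 0x0629), (0x062F, 0x0632),
    (0x0648, 0x0648), (0x0671, 0x0673), (0x0675, 0x0677), (0x0688, 0x0699),
    (0x06C0, 0x06C0), (0x06C3, 0x06CB), (0x06CD, 0x06CD), (0x06CF, 0x06CF),
    (0x06D2, 0x06D3), (0x06D5, 0x06D5), (0x06EE, 0x06EF),
)
NON_JOINING = {"\u0621", "\u0674"}  # hamza, high hamza
TATWEEL = "\u0640"


def joined_groups(text):
    """Split text into maximal runs of letters that join to one another."""
    groups, current = [], []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Mn" or ch == TATWEEL:
            continue  # marks and tatweel neither count nor break joining
        if category == "Lo" and "\u0620" <= ch <= "\u06FF" and ch not in NON_JOINING:
            current.append(ch)
            if ch not in RIGHT_JOINING:
                continue  # dual-joining letter: the group stays open
        # A right-joining letter, or any non-letter (space, ZWNJ, ...), closes the group.
        if current:
            groups.append("".join(current))
            current = []
    if current:
        groups.append("".join(current))
    return groups


def longest_group(text):
    """Return the longest joined group in text, or "" if there is none."""
    return max(joined_groups(text), key=len, default="")
```

Applied to فیلیپینیهایی, joined_groups should give a 10-letter group followed by a 2-letter one, in line with the count given earlier.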
Roozbeh
Roozbeh, thank you in particular for finding the sample words from other
languages. I wonder if you, or anyone else, has a source for long words
in Urdu?
Using common techniques of lengthening, it could be elongated a bit:
اگزیستانسیالیستهایتان
Sample sentence for Connie (!):
شما هم بروید با این اگزیستانسیالیستهایتان! گندش را در آوردهاند!
Roozbeh
Running my joint sequence tool on the Urdu Wikipedia, I get these
12-letter joint sequences:
اسپیسیفیکیشنز (specifications)
کمپیٹیبیلیٹی (compatibility)
Also, I have a feeling there are longer joint sequences to be found
for Turkic languages written in Arabic (Uighur, Azerbaijani, Turkmen,
Kazakh, Uzbek, Kirghiz, ...) because of the grammatical nature of
these languages.
Roozbeh
For example, these are 15-letter joint sequences from the Uighur
Wikipedia. I don't know their meanings, but they appear valid:
بىسىمئىشلىتىشىدە
يىغىننىڭيېپىلىش
Roozbeh
> On Sat, Aug 7, 2010 at 11:05 AM, John Hudson <jo...@tiro.ca> wrote:
>> Roozbeh, thank you in particular for finding the sample words from other
>> languages. I wonder if you, or anyone else, has a source for long words in
>> Urdu?
> Running my joint sequence tool on the Urdu Wikipedia, I get these
> 12-letter joint sequences:
> اسپیسیفیکیشنز (specifications)
> کمپیٹیبیلیٹی (compatibility)
Excellent. Many thanks.
> Also, I have a feeling there are longer joint sequences to be found
> for Turkic languages written in Arabic (Uighur, Azerbaijani, Turkmen,
> Kazakh, Uzbek, Kirghiz, ...) because of the grammatical nature of
> these languages.
Quite possibly. My immediate needs are only Arabic, Persian and Urdu.
ZWNJ before "haa"?
b
kthxbye
behdad
Well, it depends on the orthography, as I said in my first reply. Some
orthographies require this to be connected (like Iran University
Press's, I believe), and some prefer it disconnected (I think the most
official orthography, the Persian Academy's, would require a ZWNJ
there since the word would be spelled with too many "teeth"
otherwise).
Roozbeh
--
http://persian-computing.org/
http://groups.google.com/group/persian-computing/