Longest Persian word


John Hudson

Aug 3, 2010, 6:57:57 PM
to Persian Computing
Does anyone happen to know, or be able to suggest candidates for, the
longest word in the Persian language?

I'm also interested to know what is the longest unbroken sequence of
joining letters -- what I would call a lettergroup and Tom Milo calls a
fusion-- in Persian.

This is to assist me in working out maximum context lengths for some
OpenType Layout features. Thanks.

JH

--

Tiro Typeworks www.tiro.com
Gulf Islands, BC ti...@tiro.com

For chant, far more than the association of a text
and a melody, is first of all an act in which
sound becomes the expression of a memory, the memory
of a body immersed in the movement of an ancestral
gesture. - Marcel Pérès

Roozbeh Pournader

Aug 4, 2010, 4:53:14 PM
to John Hudson, Persian Computing
On Tue, Aug 3, 2010 at 3:57 PM, John Hudson <jo...@tiro.ca> wrote:
> Does anyone happen to know, or be able to suggest candidates for, the
> longest word in the Persian language?

Well, it depends a lot on your orthography. Certain orthographies, for
example, say that a spelled-out number followed by the suffix ساله
(-years-old) should be written joined to it, with a ZWNJ inserted
wherever one is needed. So considering the 10^15 limit set for spelling
numbers in the Persian locale requirements document here
<http://www.farsiweb.ir/mediawiki/images/3/3b/Locale-12.pdf>, a rather
long example in such an orthography could be something like:
چهارصدوپنجاه‌وچهارهزاروچهارصدوپنجاه‌وچهارمیلیاردوچهارصدوپنجاه‌وچهارمیلیون‌وچهارصدوپنجاه‌وچهارهزاروچهارصدوپنجاه‌وچهارساله

That's 120 characters, six of which are ZWNJs.
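A count like this can be checked mechanically. The following is a minimal Python sketch, not anything used in the thread: the helper name count_characters is hypothetical, and it simply counts code points and occurrences of U+200C (ZWNJ). The sample word is a short stand-in rather than the long number above.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def count_characters(word):
    """Return (total code points, ZWNJ count, remaining characters)."""
    total = len(word)
    zwnj = word.count(ZWNJ)
    return total, zwnj, total - zwnj


# Short stand-in: "mi" + ZWNJ + "ravam" (I go), written with escapes so the
# invisible ZWNJ is explicit in the source.
sample = "\u0645\u06cc\u200c\u0631\u0648\u0645"
print(count_characters(sample))  # (6, 1, 5)
```

Pasting the long word into count_characters should give the total and the ZWNJ count in the same way.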

> I'm also interested to know what is the longest unbroken sequence of joining
> letters -- what I would call a lettergroup and Tom Milo calls a fusion-- in
> Persian.

That needs some good parsing. But there are famous examples of
unbroken joining, like قسطنطنیه or نستعلیق. As a Nastaliq exercise,
people like to join words in ways that make them hard to write, so they
can show off their skill. A famous example is ایحجتخدا (joined
version of ای حجت خدا).

> This is to assist me in working out maximum context lengths for some
> OpenType Layout features. Thanks.

I suggest not limiting your font if possible. It's sad to see a new
word getting invented and not being able to get rendered in a font.

Roozbeh

John Hudson

Aug 4, 2010, 10:17:31 PM
to Roozbeh Pournader, Persian Computing
Roozbeh wrote:

>> I'm also interested to know what is the longest unbroken sequence of joining
>> letters -- what I would call a lettergroup and Tom Milo calls a fusion-- in
>> Persian.

> That needs some good parsing. But there are famous examples of
> unbroken joining, like قسطنطنیه or نستعلیق. As a Nastaliq exercise,
> people like to join words in ways that make them hard to write, so they
> can show off their skill. A famous example is ایحجتخدا (joined
> version of ای حجت خدا).

Thanks. These are helpful.

>> This is to assist me in working out maximum context lengths for some
>> OpenType Layout features. Thanks.

> I suggest not limiting your font if possible. It's sad to see a new
> word getting invented and not being able to get rendered in a font.

Some kind of limit is unavoidable with the kind of things I am doing.
OpenType contextual lookup structure is pretty simplistic. What I try to
do is ensure that my contexts exceed anticipated real-world maximums.

Connie Bobroff

Aug 6, 2010, 4:34:47 AM
to John Hudson, Roozbeh Pournader, Persian Computing
There are also multi-word adjectives like "man-dar-aavorde" which are derived from actual *sentences* where the spaces between words change to ZWNJ when the sentence becomes one "word." (I put hyphens in this example.) There are many such examples taken from Arabic phrases and sentences as well as Persian. Of course, there are no rules about changing space to ZWNJ.
Also, we made some alphabet practice tests with "blocks" of letters

Behnam

Aug 6, 2010, 4:54:51 PM
to Connie Bobroff, John Hudson, Roozbeh Pournader, Persian Computing
I think قسطنطنیه is a good example. The idea is a sequence of characters in which each and every one needs to take a contextual shape. ZWNJ, or characters such as alef or daal, break the kind of sequence that John is looking for.
-b

Hooman Mehr

Aug 7, 2010, 2:58:36 AM
to John Hudson, Persian Computing, Connie Bobroff, Roozbeh Pournader, Behnam
One of the cases that can produce some of the longest continuously joined sequences is any word that is an Arabic plural (dual) noun with the base form مستفعلین.

One such word that became common for a while in Iran is مستضعفین. It can even accept a yah suffix to become مستضعفینی, which is the longest sequence I know of that can legitimately appear in modern-day Persian text. It is a sequence of nine joined letters.

- Hooman Mehr

Roozbeh Pournader

Aug 7, 2010, 5:45:14 AM
to John Hudson, Persian Computing
On Wed, Aug 4, 2010 at 7:17 PM, John Hudson <jo...@tiro.ca> wrote:
> Thanks. These are helpful.

Oh, I just found the most famous example:
منمشتعلعشقعلیمچهکنم
That's 19 letters. People try to write that in Nastaliq to show
mastery of the art. It is the joined version of:
من مشتعل عشق علیم چه کنم
which means: "I am burning in Ali's love, what should I do?" The word
is even mentioned in the Persian Wikipedia in its joint form.

I also went through the whole Persian Wikipedia to see what I could
find. I used Hooman's مستضعفینی (9-letter) as the starting point,
searching for continuous joint sequences of at least 10 letters. I
couldn't find anything longer that was Persian. But I found these from
other languages written in the Arabic script.

From Arabic (all 10-letter):
ليستخلفنهم ‎(Perhaps the longest such sequence in the Koran. From 24:55.)
الفلسطينيين (The Palestinians)
القسطنطينية (Constantinople)

From Uighur (11-letter):
لىختېنشتېين (Liechtenstein)
ئېينشتېينىي (Einsteinium)
فىلىپىلىكلەرگە (Philippians?)

From Azerbaijani (11-letter):
آلتمیشینجیسی (sixtieth)

Hoping these help. (Well, at least I enjoyed finding them :))
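The tool used for this search is not shown in the thread. As a rough illustration only, a corpus scan for joint sequences of at least N letters could be sketched in Python with a regular expression. Everything named here (RIGHT_JOINING, build_pattern, find_long_runs) is hypothetical, and the table of right-joining letters is a partial one assumed for Arabic, Persian, Urdu and Uighur; a real tool would better derive joining types from the Unicode ArabicShaping.txt data file.

```python
import re
import unicodedata

# Partial, assumed table of right-joining letters (they join to the letter
# before them only): alef forms, dal, thal, reh, zain, jeh, waw forms,
# teh marbuta, Urdu ddal/rreh/yeh barree, heh with yeh above, ae, and
# four Uighur vowels.
RIGHT_JOINING = (
    "\u0622\u0623\u0625\u0627\u0671"
    "\u062f\u0630\u0631\u0632\u0698"
    "\u0648\u0624\u0629"
    "\u0688\u0691\u06d2\u06d3"
    "\u06c0\u06d5"
    "\u06c6\u06c7\u06c8\u06cb"
)
HAMZA = "\u0621"  # non-joining


def build_pattern(min_len):
    """Regex for min_len or more joined letters (dual..., then dual or right)."""
    # Assumption: every other letter (category Lo) in U+0620..U+06FF is
    # treated as dual-joining.
    dual = "".join(
        chr(cp)
        for cp in range(0x0620, 0x0700)
        if unicodedata.category(chr(cp)) == "Lo"
        and chr(cp) not in RIGHT_JOINING
        and chr(cp) != HAMZA
    )
    return re.compile(f"[{dual}]{{{min_len - 1},}}[{dual}{RIGHT_JOINING}]")


def find_long_runs(text, min_len=10):
    """Return the distinct joined runs of at least min_len letters, longest first."""
    # Combining marks (vowel signs etc.) do not break joining, so drop them.
    stripped = "".join(c for c in text if unicodedata.category(c) != "Mn")
    runs = set(build_pattern(min_len).findall(stripped))
    return sorted(runs, key=len, reverse=True)


sample = "فیلیپینیهایی و مستضعفینی"
print([len(run) for run in find_long_runs(sample, 9)])  # [10, 9]
```

Spaces, ZWNJ, digits and punctuation fall outside both character classes, so they end a run without any special handling.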

Roozbeh

Roozbeh Pournader

Aug 7, 2010, 6:06:27 AM
to John Hudson, Persian Computing
... and some Persian words with the same joined sequence length as
مستضعفینی‎ (9-letter):

فلسطینیها (Palestinians)
فمینیستها (Feminists)
قسطنطینیه (a lesser-used spelling of Constantinople)
مکسیلیتین (Mexiletine)
نیهیلیستی (Nihilistic)
کلیتمنسترو (Clytemnestra)
سیمپلیسیوس (Simplicius)

And a few others...

Roozbeh

Roozbeh Pournader

Aug 7, 2010, 6:08:24 AM
to John Hudson, Persian Computing
OK, here's a valid 10-letter one in Persian I just made:

فیلیپینیها (Filipinos)

Roozbeh

Connie Bobroff

Aug 7, 2010, 6:12:24 AM
to Roozbeh Pournader, John Hudson, Persian Computing
Why not make it 12 letters?!
فیلیپینیهایی
'some Filipinos'

Roozbeh Pournader

Aug 7, 2010, 6:14:03 AM
to Connie Bobroff, John Hudson, Persian Computing
On Sat, Aug 7, 2010 at 3:12 AM, Connie Bobroff <con...@gmail.com> wrote:
> Why not make it 12 letters?!
> فیلیپینیهایی
> 'some Filipinos'

We are talking joint sequences here. فیلیپینیهایی is a 10-letter joint
sequence plus a 2-letter one. No improvement.

Roozbeh

Connie Bobroff

Aug 7, 2010, 6:16:52 AM
to Roozbeh Pournader, John Hudson, Persian Computing
Please define "joint sequence!"

Roozbeh Pournader

Aug 7, 2010, 6:27:36 AM
to Connie Bobroff, John Hudson, Persian Computing
On Sat, Aug 7, 2010 at 3:16 AM, Connie Bobroff <con...@gmail.com> wrote:
> Please define "joint sequence!"

Oh, that's what John was originally asking for. In his own words: "the
longest unbroken sequence of joining letters". Technically, that is a
sequence of dual-joining letters ending in a dual-joining or a
right-joining letter.
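That definition can be turned into a small check. Below is a minimal Python sketch under stated assumptions: the function name longest_joined_run is hypothetical, the RIGHT_JOINING table is a partial one covering common Arabic, Persian, Urdu and Uighur letters, every other letter in the basic Arabic block is assumed to be dual-joining, and anything that is not an Arabic letter (space, ZWNJ, digits, tatweel) is treated as a break.

```python
import unicodedata

# Partial, assumed table of right-joining letters: they connect to the
# letter before them but never to the one after.
RIGHT_JOINING = set(
    "\u0622\u0623\u0625\u0627\u0671"  # alef forms
    "\u062f\u0630\u0631\u0632\u0698"  # dal, thal, reh, zain, jeh
    "\u0648\u0624\u0629"              # waw, waw with hamza, teh marbuta
    "\u0688\u0691\u06d2\u06d3"        # Urdu ddal, rreh, yeh barree forms
    "\u06c0\u06d5"                    # heh with yeh above, ae
    "\u06c6\u06c7\u06c8\u06cb"        # Uighur oe, u, yu, ve
)
NON_JOINING = {"\u0621"}  # standalone hamza


def longest_joined_run(text):
    """Return the longest run of dual-joining letters, optionally closed
    by one dual-joining or right-joining letter."""
    best = run = ""
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Mn":
            continue  # combining marks sit on a letter and do not break joining
        is_letter = category == "Lo" and "\u0620" <= ch <= "\u06ff"
        if not is_letter or ch in NON_JOINING:
            run = ""  # space, ZWNJ, digit, hamza, ...: the run is broken
        elif ch in RIGHT_JOINING:
            if run:  # a right-joining letter can close a run, not start one
                best = max(best, run + ch, key=len)
            run = ""
        else:
            run += ch
            best = max(best, run, key=len)
    return best


print(len(longest_joined_run("مستضعفینی")))     # 9
print(len(longest_joined_run("فیلیپینیهایی")))  # 10: the alef closes the run
```

With a ZWNJ typed before the plural suffix, the same word splits into shorter runs, which is why the result depends on the orthography.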

Roozbeh

Connie Bobroff

Aug 7, 2010, 6:32:29 AM
to Roozbeh Pournader, John Hudson, Persian Computing
Thanks. I somehow did not "see" the alef was breaking it up. Sorry!

John Hudson

Aug 7, 2010, 2:05:55 PM
to Persian Computing
Thank you so much to everyone who assisted with my hunt. It is great to
have such good examples of long lettergroups, and also great to see that
my font handles them nicely. The font is still in development and under
NDA, but I'll be sure to post some specimens of these long words once I
am able to do so.

Roozbeh, thank you in particular for finding the sample words from other
languages. I wonder if you, or anyone else, has a source for long words
in Urdu?

Roozbeh Pournader

Aug 7, 2010, 2:37:35 PM
to Sina Siadatnejad, Persian Computing
On Sat, Aug 7, 2010 at 4:12 AM, Sina Siadatnejad <sia...@gmail.com> wrote:
> اگزيستانسياليزم (existentialism)
> It was the longest I found among 21759 common dictionary words.

Using common techniques of lengthening, it could be elongated a bit:

اگزیستانسیالیستهایتان

Sample sentence for Connie (!):

شما هم بروید با این اگزیستانسیالیستهایتان! گندش را در آورده‌اند!‏

Roozbeh

Roozbeh Pournader

Aug 7, 2010, 2:42:46 PM
to John Hudson, Persian Computing
On Sat, Aug 7, 2010 at 11:05 AM, John Hudson <jo...@tiro.ca> wrote:
> Roozbeh, thank you in particular for finding the sample words from other
> languages. I wonder if you, or anyone else, has a source for long words in
> Urdu?

Running my joint sequence tool on the Urdu Wikipedia, I get these
12-letter joint sequences:

اسپیسیفیکیشنز (specifications)
کمپیٹیبیلیٹی (compatibility)

Also, I have a feeling there are longer joint sequences to be found
for Turkic languages written in Arabic (Uighur, Azerbaijani, Turkmen,
Kazakh, Uzbek, Kirghiz, ...) because of the grammatical nature of
these languages.

Roozbeh

Roozbeh Pournader

Aug 7, 2010, 2:50:04 PM
to John Hudson, Persian Computing
On Sat, Aug 7, 2010 at 11:42 AM, Roozbeh Pournader <roo...@gmail.com> wrote:
> Also, I have a feeling there are longer joint sequences to be found
> for Turkic languages written in Arabic (Uighur, Azerbaijani, Turkmen,
> Kazakh, Uzbek, Kirghiz, ...) because of the grammatical nature of
> these languages.

For example, these are 15-letter joint sequences from the Uighur
Wikipedia. I don't know their meanings, but they appear valid:

بىسىمئىشلىتىشىدە
يىغىننىڭيېپىلىش

Roozbeh

John Hudson

Aug 7, 2010, 3:43:06 PM
to Roozbeh Pournader, Persian Computing
Roozbeh wrote:

> On Sat, Aug 7, 2010 at 11:05 AM, John Hudson <jo...@tiro.ca> wrote:
>> Roozbeh, thank you in particular for finding the sample words from other
>> languages. I wonder if you, or anyone else, has a source for long words in
>> Urdu?

> Running my joint sequence tool on the Urdu Wikipedia, I get these
> 12-letter joint sequences:

> اسپیسیفیکیشنز (specifications)
> کمپیٹیبیلیٹی (compatibility)

Excellent. Many thanks.

> Also, I have a feeling there are longer joint sequences to be found
> for Turkic languages written in Arabic (Uighur, Azerbaijani, Turkmen,
> Kazakh, Uzbek, Kirghiz, ...) because of the grammatical nature of
> these languages.

Quite possibly. My immediate needs are only Arabic, Persian and Urdu.

Behdad Esfahbod

Aug 9, 2010, 5:24:21 PM
to Roozbeh Pournader, John Hudson, Persian Computing
On 08/07/10 06:08, Roozbeh Pournader wrote:
> OK, here's a valid 10-letter one in Persian I just made:
>
> فیلیپینیها (Filipinos)

ZWNJ before "haa"?

b

Behdad Esfahbod

Aug 9, 2010, 5:25:38 PM
to Roozbeh Pournader, John Hudson, Persian Computing
If you compile a list, that would be useful for testing Nastaliq fonts.

kthxbye
behdad

Roozbeh Pournader

Aug 9, 2010, 6:55:23 PM
to Behdad Esfahbod, John Hudson, Persian Computing
On Mon, Aug 9, 2010 at 2:24 PM, Behdad Esfahbod <beh...@behdad.org> wrote:
>> فیلیپینیها (Filipinos)
>
> ZWNJ before "haa"?

Well, it depends on the orthography, as I said in my first reply. Some
orthographies require this to be connected (like Iran University
Press's, I believe), and some prefer it disconnected (I think the most
official orthography, the Persian Academy's, would require a ZWNJ
there since the word would be spelled with too many "teeth"
otherwise).

Roozbeh

erfan.f...@gmail.com

Nov 20, 2015, 1:44:24 PM
to Persian Computing
I think you should consider that some people may type meaningless words just for fun, or a kid may play with the keyboard!
So unbroken letter sequences of any length may be created. If you have in time rendering, it is a real problem.

I'm really interested to see the results of your work. Would you please show us?
Thanks
Erfan

Nasser Hadjloo

Nov 21, 2015, 4:41:05 AM
to erfan.f...@gmail.com, Persian Computing
Hi guys

In Dinkard, we have the word Amarenidarihatar, which we can pronounce like عماریندارهاتر; it means به طرز محاسبه کرانه تر

Although it is a Pahlavi word, it is perhaps the longest single Persian word I know of across Old Persian, Avestan, Pahlavi and Farsi. There are other long words too, but they are formed with a prefix or suffix.

All in all, I couldn't tell whether you are looking for a single word or a word with prefixes or suffixes, and whether by Farsi you mean only current Farsi or words from older stages of the language too.

Regards



--
Nasser Hadjloo
http://www.Hadjloo.ir

--
--
http://persian-computing.org/
http://groups.google.com/group/persian-computing/
