Fwd: Announcing Persian Databases for MT and TTS

John Hudson

Nov 16, 2017, 1:15:17 PM
to Persian Computing
Of possible interest to Persian language technology developers. I can't speak to the quality of these resources or endorse them, but I've been impressed by Jack's work in CJK dictionaries.

J.



-------- Forwarded Message --------
Subject: Announcing Persian Databases for MT and TTS
Date: Thu, 16 Nov 2017 17:06:55 +0900
From: Jack Halpern <cjk_...@cjki.org>
To: jo...@tiro.ca


Hello again,

This is Jack Halpern from The CJK Dictionary Institute (CJKI,
http://www.cjk.org). I hope this email finds you well.

As you know, we have spent many years compiling and refining our Database of
Arab Names (DAN), which covers over 6.5 million romanized Arab names and
variants, along with its companion Database of Arab Names in Arabic (DANA),
which covers several hundred thousand Arabic script variants.

We are now pleased to announce that we have employed the same in-depth
linguistic knowledge behind DAN and DANA, in partnership with a team of
native-speaking Persian linguists, to launch two new lexical resources for
Persian:

Database of Persian Names (DPN)
---------------------------------------------------
Our Database of Persian Names (DPN) covers approximately 27,000 individual
Persian personal names, with currently over 440,000 romanized variants. As
you can see in the data sample at the link below, each variant is given a
confidence rank indicating the relative likelihood that the variant will be
encountered in the real world.

DPN is currently being expanded in size and will continue to grow over the
next several months. The following page contains more details
about this database:

http://www.cjk.org/cjk/samples/dpn.htm

Persian Phonetic Corpus (PPC)
---------------------------------------------------
Our ever-expanding Persian Phonetic Corpus includes phonetic
annotation, which can be on a sentence level, on a word level, or on both.
Since the annotation is in IPA, it is an accurate representation of the
phonetic realization of the Persian text, rather than a phonemic
representation, making it ideal for text-to-speech applications. Please see
the following for more information including samples:

http://www.cjk.org/cjk/samples/percorp.htm

Both of these Persian resources are unique and I'm confident that they can
contribute to your language technology. Furthermore, as with all of our
data resources, these are not shrink-wrapped products but rather resources
that can be tailored to your specific needs and budgets.

Perhaps we can have a phone conference in the next week or two to discuss
these Persian resources or any other language resources you may require.

I look forward to hearing from you.

Regards, Jack Halpern
	CEO, The CJK Dictionary Institute, Inc. 
	http://www.cjk.org Phone: +81-48-473-3508  
