phone transcription for Turkish


ahmet

Nov 4, 2016, 5:33:22 AM
to kaldi-help
Hi, I am trying to create a model for the Turkish words Yes (Evet) and No (Hayır). What should I write in lexicon.txt, or where can I find phone transcriptions?

Thanks in advance
Ahmet

Daniel Povey

Nov 4, 2016, 5:39:11 PM
to kaldi-help
If you are training only on "yes" and "no" data, actually the phone transcriptions won't matter because there is no overlap between the words.  You could just arbitrarily name the phones like this:

evet e v e t
hayir h a y i r
or
evet evet1 evet2 evet3 evet4
hayir hayir1 hayir2 hayir3 hayir4 hayir5
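
For a longer word list, the second naming scheme above can be generated rather than typed by hand. The following is a minimal sketch, not part of Kaldi: the function name `arbitrary_phone_lexicon` is hypothetical, and it assumes one placeholder phone per letter, as in the example above.

```python
def arbitrary_phone_lexicon(words):
    """Return lexicon.txt lines that give every word its own placeholder phones.

    Each word gets one phone per letter, named <word>1 .. <word>N, so no two
    words share a phone (assumption: this mirrors the hand-written example).
    """
    lines = []
    for word in words:
        phones = [f"{word}{i}" for i in range(1, len(word) + 1)]
        lines.append(f"{word} {' '.join(phones)}")
    return lines


lexicon_lines = arbitrary_phone_lexicon(["evet", "hayir"])
print("\n".join(lexicon_lines))
```

The printed lines can be saved as lexicon.txt. This only makes sense while the vocabulary is tiny and the words do not need to share acoustic units.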

Dan




ahmet

Nov 10, 2016, 10:04:13 AM
to kaldi-help
Thank you. But if I want to add some words, is there any application that gives phone transcriptions?
Best Regards
Ahmet

Jan Trmal

Nov 10, 2016, 11:21:46 AM
to kaldi-help
Turkish is quite a phonetic language, so you can train a graphemic system, i.e. one whose lexicon simply spells each word out letter by letter. Or you can do that and add some ad-hoc rules to fix the most obvious errors, if you speak the language.
Otherwise, do not bother: we tried both phonetic and graphemic lexicons for Turkish for Babel and the difference was negligible, IIRC (say 42 % vs 42.3 %. I don't remember the exact numbers, but it wasn't a concern at all).
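
To illustrate what a graphemic lexicon looks like, here is a minimal sketch that uses the letters of each word as its "phones". The helper names `turkish_lower` and `graphemic_entry` are hypothetical, and the sketch assumes the input is a plain word list. Turkish casing is handled explicitly because the default lowercasing in Python maps "I" to "i" rather than to the dotless "ı".

```python
# Turkish casing: "I" lowercases to dotless "ı", and "İ" to dotted "i".
TURKISH_LOWER = str.maketrans({"I": "ı", "İ": "i"})


def turkish_lower(word):
    """Lowercase a word using the Turkish rules for I and İ."""
    return word.translate(TURKISH_LOWER).lower()


def graphemic_entry(word):
    """Build one lexicon line: the lowercased word followed by its letters."""
    lowered = turkish_lower(word)
    # Keep only alphabetic characters as graphemes (e.g. drop apostrophes).
    graphemes = [ch for ch in lowered if ch.isalpha()]
    return f"{lowered} {' '.join(graphemes)}"


graphemic_lines = [graphemic_entry(w) for w in ["Evet", "Hayır", "IŞIK"]]
print("\n".join(graphemic_lines))
```

Any ad-hoc pronunciation rules would be applied to the grapheme list before it is joined into a line.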
y.


Jan Trmal

Nov 10, 2016, 12:15:05 PM
to kaldi-help
When I said "phonetic language", what I meant is that there is a (mostly) systematic 1:1 mapping between the graphemes and the phonemes. 
y.

Danijel Korzinek

Nov 11, 2016, 3:34:21 AM
to kaldi-help
Not sure if anyone here is familiar with Turkish per se. If you have a reasonably sizable phonetic dictionary, you can train a grapheme-to-phoneme (G2P) system that will be able to approximate transcriptions for words that are not in the dictionary.

As an example of how to train such a system, you can look at the ./kaldi/egs/librispeech/s5/local/g2p/train_g2p.sh script. You will probably have to make some minor modifications to use a Turkish dictionary instead of cmudict.

If you want to do this, you will also need to go to the tools directory and install Sequitur G2P:  cd tools; ./extras/install_sequitur.sh
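
As a rough sketch of what the training amounts to, the snippet below only builds and prints the Sequitur command lines; it does not run them. The flags follow the Sequitur G2P README as far as I recall and should be checked against `g2p.py --help`. The file names `turkish_lexicon.txt` and `new_words.txt`, the `model` prefix, and the number of rounds are assumptions for illustration, not values taken from train_g2p.sh.

```python
import shlex


def sequitur_commands(lexicon, words, rounds=3, model_prefix="model"):
    """Build g2p.py command lines: `rounds` training passes, then one apply pass.

    Every pass after the first starts from the previous model and uses
    --ramp-up to grow the model. Nothing is executed here.
    """
    commands = []
    for n in range(1, rounds + 1):
        cmd = ["g2p.py", "--encoding", "UTF-8"]
        if n > 1:
            cmd += ["--model", f"{model_prefix}-{n - 1}", "--ramp-up"]
        cmd += ["--train", lexicon, "--devel", "5%",
                "--write-model", f"{model_prefix}-{n}"]
        commands.append(cmd)
    # Final step: transcribe words that are missing from the dictionary.
    commands.append(["g2p.py", "--encoding", "UTF-8",
                     "--model", f"{model_prefix}-{rounds}", "--apply", words])
    return commands


g2p_commands = sequitur_commands("turkish_lexicon.txt", "new_words.txt")
for cmd in g2p_commands:
    print(shlex.join(cmd))
```

The training lexicon is expected to have one word per line followed by its phones, the same layout as lexicon.txt.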

You can also search the Turkish internet (most likely it will be easier for you than for anyone else here) to see whether someone already provides a trained G2P system for Turkish, be it statistical or rule-based.

ahmet

Nov 11, 2016, 3:58:24 AM
to kaldi-help
Thank you for your answers. I will try them.
Ahmet

Nickolay Shmyrev

Nov 12, 2016, 3:09:53 AM
to kaldi-help
Hi all

If you plan to work on Turkish LVCSR, there is good work on a Turkish phonetic dictionary done here:

ahmet

Nov 14, 2016, 1:26:20 AM
to kaldi-help
Thank you so much.

Ahmet