Kaldi for dummies questions

Patrice Yemmene

Jan 3, 2017, 10:50:42 PM
to kaldi...@googlegroups.com, Chiang, Chi-Lung
Hello,

I have a few questions about the Kaldi for dummies tutorial, and I hope that someone will be able to assist me. They are about the language data section:

- In the examples listed, each file begins with !SIL sil and <UNK> spn. Are these entries required, and what do they mean? If they are required, I am curious why.
- Files titled "phone", "silence phones", and "non silence phones": does the word "phone" here refer to the linguistic term "phoneme", or to a "sound" (a phonetic sound)?

Thank you kindly for your consideration.

Patrice Yemmene
Masters of Science in Software Engineering candidate
University of Saint Thomas - Minnesota

Daniel Galvez

Jan 4, 2017, 1:58:18 AM
to kaldi-help, cch...@stthomas.edu
Hello,

1) I assume you're talking about the lexicon.txt file. The lines in that file are space-delimited: the first token is a word and the following tokens are its phones. So !SIL is a word and sil is a phone. I'm actually not sure whether this entry has special meaning, in the sense of whether the word !SIL will show up in decoded text. <UNK> does have special meaning: decoded results contain the <UNK> word when no other word in the ASR system's vocabulary is found to be likely. spn is a stand-in pseudo-phone for a spoken sound (in Kaldi recipes it conventionally stands for "spoken noise"). You don't necessarily have to use <UNK> as your out-of-vocabulary word. Whatever is in your oov.txt is the out-of-vocabulary word, but oov.txt conventionally contains <UNK>.
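
To make the lexicon.txt layout concrete, here is a minimal Python sketch, not Kaldi code. It assumes the format described above (first token is the word, remaining tokens are its phones). The function names parse_lexicon and map_oov are hypothetical, the "eight" entry is an illustrative sample, and replacing unknown transcript words with the OOV symbol is shown only as one plausible use of that symbol.

```python
def parse_lexicon(lines):
    """Map each word to a list of pronunciations (each a list of phones)."""
    lexicon = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        word, phones = tokens[0], tokens[1:]
        # A word may appear on several lines, one per pronunciation.
        lexicon.setdefault(word, []).append(phones)
    return lexicon


def map_oov(words, lexicon, oov_word="<UNK>"):
    """Replace words missing from the lexicon with the OOV word.

    oov_word is an assumption here; in practice it would be whatever
    single word oov.txt contains.
    """
    return [w if w in lexicon else oov_word for w in words]


# Illustrative sample lines (the "eight" entry is hypothetical).
sample_lines = [
    "!SIL sil",
    "<UNK> spn",
    "eight ey t",
]

lexicon = parse_lexicon(sample_lines)
mapped = map_oov(["eight", "zebra"], lexicon)
print(lexicon["<UNK>"])
print(mapped)
```

Here "zebra" is not in the sample lexicon, so it is replaced by <UNK>, whose only pronunciation is the single pseudo-phone spn.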

2) It's been a long time since my phonetics class, but in ASR "phones" do not necessarily have to be phonemes. Something you will see in egs/wsj/s5/, for example, is that there are several position-dependent phones for each phoneme: a beginning phone, an intermediate phone, and an end phone, depending on where that phoneme occurs in the word (plus a variant for words that consist of a single phone). In practice, the phones do end up being phonemes (maybe with some small deviations from IPA) because most lexicons (like CMUdict) are written using linguistic phonemes.
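
The position-dependent idea can be sketched in a few lines of Python. This is an illustration, not the Kaldi script itself: the function name add_position_markers is hypothetical, and the suffixes _B, _I, _E, and _S (the last for single-phone words) follow the convention I understand Kaldi's language preparation to use.

```python
def add_position_markers(phones):
    """Tag each phone with its position in the word.

    _B = beginning, _I = intermediate, _E = end, _S = single-phone word.
    The suffix names are assumed to match Kaldi's convention.
    """
    if not phones:
        return []
    if len(phones) == 1:
        return [phones[0] + "_S"]
    marked = [phones[0] + "_B"]
    marked += [p + "_I" for p in phones[1:-1]]
    marked.append(phones[-1] + "_E")
    return marked


# One phoneme ("t") becomes a different phone depending on its position.
print(add_position_markers(["t", "uw"]))
print(add_position_markers(["k", "ae", "t"]))
print(add_position_markers(["ah"]))
```

The same phoneme t yields t_B in the first word and t_E in the second, which is why the phone inventory can be larger than the phoneme inventory of the underlying lexicon.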

Patrice Yemmene

Jan 7, 2017, 9:54:26 AM
to kaldi-help, cch...@stthomas.edu
This was helpful.

Thank you very much.

Patrice