Need help understanding phones.txt

ModifiedDuck

Apr 6, 2017, 1:48:35 PM

to kaldi-help

I think I have a pretty good understanding of how Kaldi works at this point: which files need to be created, which scripts to use, and so on. What I'm saying is that I'm not a complete noob.

One thing I'm having trouble with is the phones.txt file. I think I know what phones are in general, but what must be put in the phones.txt file (data/lang directory)? Where do the phones come from? Do I choose them from somewhere (like the International Phonetic Alphabet), or is it something else entirely? For instance, egs/yesno uses only 2 phones (Y N), even though the spoken Hebrew words for 'yes' and 'no' clearly use more than that. And in egs/voxforge I'm seeing a ridiculous amount of phones, 167 I think, even though I read that English uses around 40 different phones. A lot of the voxforge phones also seem complex: they have suffixes made of other letters. What is that about?

I'd really appreciate it if someone could clear this up for me; I haven't found anything in the documentation that really answers this. Thank you in advance.

Daniel Povey

Apr 6, 2017, 2:09:36 PM

to kaldi-help

The phones don't have to coincide with a linguistic notion of phones. Read the HTK Book to understand how they relate to the HMMs.

The voxforge setup probably has word-position-dependent phones: they encode the word-boundary information in addition to the actual phones.
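To make the suffixes concrete, here is a minimal Python sketch of how a base phone set can be expanded into word-position-dependent variants. It assumes the suffix convention commonly seen in Kaldi lang directories (_B for word-begin, _I for word-internal, _E for word-end, _S for a single-phone word) and assumes silence phones also keep a bare, unsuffixed form. The tiny phone set and the function names are hypothetical, and this is not the code Kaldi itself runs. With four variants per phone, a base inventory of about 40 phones already lands close to the count mentioned in the question.

```python
# Sketch only: illustrates word-position-dependent phone naming.
# Assumed suffix convention: _B begin, _E end, _I internal, _S singleton.
POSITION_SUFFIXES = ["_B", "_E", "_I", "_S"]


def expand_position_dependent(nonsilence, silence):
    """Return the expanded phone list for the given base phones."""
    expanded = []
    for phone in silence:
        # Assumption: silence phones keep a bare form plus the suffixed forms.
        expanded.append(phone)
        expanded.extend(phone + suffix for suffix in POSITION_SUFFIXES)
    for phone in nonsilence:
        expanded.extend(phone + suffix for suffix in POSITION_SUFFIXES)
    return expanded


def mark_positions(pron):
    """Tag the phones of one word's pronunciation with position suffixes."""
    if len(pron) == 1:
        return [pron[0] + "_S"]
    middle = [phone + "_I" for phone in pron[1:-1]]
    return [pron[0] + "_B"] + middle + [pron[-1] + "_E"]


base = ["AA", "K", "T"]  # hypothetical tiny phone set
phones = expand_position_dependent(base, ["SIL"])
print(len(phones))                       # 3 * 4 + 5 = 17
print(mark_positions(["K", "AA", "T"]))  # ['K_B', 'AA_I', 'T_E']
```

The same base phone therefore appears under several names, which lets the models distinguish, for example, a word-initial K from a word-final one.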

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ModifiedDuck

Apr 6, 2017, 2:50:22 PM

to kaldi-help

Thank you for the fast reply.

Even though they don't have to be, they still could be, right? Right now, I'm trying to train a small-vocabulary system (~25 words) and I'm not sure how to create the phones.txt file. My first thought was to simply break down all my potential words into their phonetic representation, according to the IPA, and run with that. Can I expect at least reasonably good results this way, or do I need to study some more and structure the problem in a different way? I'm reading the HTK Book at the moment; it's a bit over my head, but I'll manage eventually.

Daniel Povey

Apr 6, 2017, 2:53:03 PM

to kaldi-help

Yes, they could be real phones, and using IPA phones is a very reasonable choice. However, you will probably find it easiest to use an ASCII representation of them rather than Unicode. (Unicode would work, though, if you encode as UTF-8.)

I think ARPABET is a common choice of ASCII phone-set representation for US English, but I don't follow this aspect of things very closely.
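For a small vocabulary, the phone inventory simply falls out of the pronunciation lexicon. The Python sketch below assumes a lexicon where each line is a word followed by its phones, collects the distinct phones, and numbers them the way a symbol table such as phones.txt does, with `<eps>` at ID 0. The words, the ASCII phone symbols, and the function names are hypothetical, not an established Greek phone set. In a standard recipe, phones.txt is normally generated by utils/prepare_lang.sh from the dictionary directory, and the generated file also contains disambiguation symbols (#0, #1, ...) that this sketch leaves out.

```python
# Hypothetical ASCII lexicon: each line is a word followed by its phones.
LEXICON = """\
NAI n e
OXI o x i
ENA e n a
"""


def phone_inventory(lexicon_text):
    """Collect the distinct phones used in the lexicon, sorted by name."""
    phones = set()
    for line in lexicon_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        phones.update(fields[1:])
    return sorted(phones)


def symbol_table(symbols):
    """Number symbols from 1, reserving ID 0 for <eps>."""
    rows = ["<eps> 0"]
    rows.extend(f"{sym} {idx}" for idx, sym in enumerate(symbols, start=1))
    return rows


inventory = phone_inventory(LEXICON)
print(inventory)  # ['a', 'e', 'i', 'n', 'o', 'x']

# Assumption: a single silence phone named SIL is added by hand.
table = symbol_table(["SIL"] + inventory)
print("\n".join(table))
```

Whatever symbols appear on the right-hand side of the lexicon become the phones, which is why a setup like egs/yesno can get away with one unit per word.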

ModifiedDuck

Apr 6, 2017, 3:10:37 PM

to kaldi-help

Thank you so much, that's exactly what I was thinking! ASCII is definitely the way to go. I was looking at this to do the trick; Unicode is going to be more trouble than it's worth. I'm not working on US English right now but on Greek, so Unicode is going to be necessary eventually, but I think it's better to do the conversion from ASCII to Unicode (Greek) after Kaldi has done its job. Easier to manage that way.

Daniel Povey

Apr 6, 2017, 3:27:26 PM

to kaldi-help

Your choice, but for the actual text (in words.txt) we normally use the most natural representation, i.e. Unicode where necessary. Kaldi actually deals very little with words.txt at the C++ level; it doesn't really do string manipulations on its contents where the Unicode would be a problem.
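As an illustration of why Unicode words are unproblematic, here is a small Python sketch that writes and reads a words.txt-style symbol table in UTF-8. It assumes the usual "symbol integer-id" layout with `<eps>` at ID 0; the three Greek words and the function names are hypothetical, and a real words.txt produced by the Kaldi scripts typically contains further special symbols that are omitted here.

```python
import os
import tempfile

words = ["ναι", "όχι", "ένα"]  # hypothetical Greek vocabulary


def write_words_txt(path, vocabulary):
    """Write a symbol table: one 'word id' pair per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("<eps> 0\n")
        for idx, word in enumerate(sorted(set(vocabulary)), start=1):
            f.write(f"{word} {idx}\n")


def read_words_txt(path):
    """Read the table back into a dict mapping word -> integer ID."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            symbol, idx = line.split()
            table[symbol] = int(idx)
    return table


path = os.path.join(tempfile.mkdtemp(), "words.txt")
write_words_txt(path, words)
table = read_words_txt(path)
print(table["<eps>"])  # 0
print(len(table))      # 4
```

Each word is only ever treated as an opaque, whitespace-free token paired with an integer, so no per-character handling of the Greek text is involved.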
