implementation of GOP with Kaldi

Hunt Rui

Jul 9, 2019, 5:51:36 PM

to kaldi-help

Hi Dan,

I am trying to implement GOP (Goodness of Pronunciation) with Kaldi. The steps I am taking are:

1) Force-align the audio and use ali-to-phones to get the phone sequence.

2) Decode with a 2-gram phone decoder, then use lattice-to-post and post-to-phone-post to get the phone posterior probabilities.

3) Use get-post-on-ali to obtain the frame-level probability of each aligned phone, and average that probability over all frames belonging to the phone.

Does this make sense? I also found that the phones decoded by my 2-gram phone decoder sometimes don't make sense. I used an acoustic model trained on TED, built the HCLG graph from a 2-gram phone LM estimated on TED speech material, and decoded with nnet3-latgen-faster. Should I use a 3-gram instead?

I also tried another approach to GOP: computing decodable.LogLikelihood(frame, tid) for each frame and aggregating the values, but the resulting numbers don't look right. I don't know whether this approach makes sense.

I have struggled with this for some time, and your time and help are much appreciated.

Thanks,

Andy
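
As an illustration of step 3 in the post above, here is a minimal Python sketch of the averaging. It assumes the per-frame phone alignment (for example from ali-to-phones with --per-frame=true) and the per-frame posterior of the aligned phone (for example from get-post-on-ali) have already been read into plain lists; the function name and the data are made up for illustration and are not part of Kaldi.

```python
from itertools import groupby


def average_posterior_per_phone(phone_ali, frame_post):
    """Return one (phone, n_frames, mean_posterior) tuple per phone segment.

    phone_ali:  per-frame phone ids (hypothetical input, one id per frame)
    frame_post: per-frame posterior of the aligned phone, same length

    Limitation: consecutive frames with the same phone id are treated as one
    segment, so two adjacent instances of the same phone would be merged.
    """
    if len(phone_ali) != len(frame_post):
        raise ValueError("alignment and posteriors must have the same length")
    segments = []
    start = 0
    for phone, run in groupby(phone_ali):
        n = len(list(run))
        mean = sum(frame_post[start:start + n]) / n
        segments.append((phone, n, mean))
        start += n
    return segments


# Made-up example: three phone segments over five frames.
ali = [1, 1, 7, 7, 3]
post = [0.5, 1.0, 0.25, 0.75, 1.0]
print(average_posterior_per_phone(ali, post))
# -> [(1, 2, 0.75), (7, 2, 0.5), (3, 1, 1.0)]
```

A low mean posterior for a segment suggests the decoder did not agree with the forced alignment there; where to put the threshold is something that has to be tuned on data.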

Daniel Povey

Jul 9, 2019, 5:54:47 PM

to kaldi-help

> Does this make sense? I also found that the phones decoded by my 2-gram phone decoder sometimes don't make sense. I used an acoustic model trained on TED, built the HCLG graph from a 2-gram phone LM estimated on TED speech material, and decoded with nnet3-latgen-faster. Should I use a 3-gram instead?

Sounds reasonable. I think you'll have to use a stronger LM; phones tend to be acoustically quite ambiguous, so the LM needs to be reasonable to get anything accurate.

> I also tried another approach to GOP: computing decodable.LogLikelihood(frame, tid) for each frame and aggregating the values, but the resulting numbers don't look right. I don't know whether this approach makes sense.

Log-likelihoods are quantities that, in general, won't even sum to one, and their values can be very different depending on what type of model you have, whether you have a decision tree, the prior, etc. Getting something that makes sense is complicated. If I were you I would focus on your first approach.

Dan
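
To illustrate the point about log-likelihoods, here is a small Python sketch with made-up numbers. It assumes a hybrid setup in which the acoustic score for a state is the log posterior from the network minus the log prior of that state; whether this matches what decodable.LogLikelihood returns for a particular model is an assumption. The posteriors sum to one over states, but the exponentiated scores do not, so they cannot be read as probabilities directly.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_log_likelihoods(posteriors, priors):
    """log p(state | frame) - log p(state), one value per state."""
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, priors)]


# Made-up network outputs and state priors for a single frame.
logits = [2.0, 1.0, 0.1]
priors = [0.7, 0.2, 0.1]

post = softmax(logits)
loglikes = pseudo_log_likelihoods(post, priors)

post_sum = sum(post)                                # approximately 1.0
scaled_sum = sum(math.exp(v) for v in loglikes)     # well above 1.0 here
print(post_sum, scaled_sum)
```

Changing the priors changes the scores even though the network outputs stay the same, which is one reason aggregated log-likelihoods are hard to compare across models.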


Hunt Rui

Jul 9, 2019, 6:25:55 PM

to kaldi...@googlegroups.com

Hi Dan,

I got it, and thank you so much for your comments.

Andy


achintyaha

Jul 12, 2019, 7:48:37 AM

to kaldi-help

Hi Andy,

I am also working on pronunciation evaluation (I'm a beginner), so I wanted to ask: how accurate is your GOP implementation with Kaldi?

Thanks
