Confidence scores for forced-alignment results

Jackie

Jun 24, 2021, 1:27:01 PM

to kaldi-help

Dear all,

I used a pre-trained model to run forced alignment on some non-native speech data that has only orthographic transcriptions.

The transcriptions don't annotate the pronunciation errors made by the non-native speakers.

I'd like to know whether it's possible to obtain confidence scores from the alignment results.

If it's feasible, could anyone give me some suggestions/comments about how to do this?

Many thanks in advance!

Daniel Povey

Jun 27, 2021, 8:58:26 AM

to kaldi-help

You could look at the egs/gop_speechocean762 recipe.

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/15f8eccd-f6cc-42c8-ba12-09b3af402d94n%40googlegroups.com.

John Locke

Oct 16, 2021, 4:36:09 PM

to kaldi-help

Hello Dan,

I noticed that in the gop_speechocean762 recipe, the training dataset has pronunciation ratings/scores. Do you know if I could use the compute-gop binary with a model trained, say, with the unmodified WSJ recipe (i.e., without any pronunciation ratings)?

Sorry for the dumb question; I could find very little documentation on compute-gop.

Daniel Povey

Oct 17, 2021, 3:25:47 AM

to kaldi-help, Junbo Zhang

Junbo (cc'd) may be able to answer.

Junbo Zhang

Oct 18, 2021, 2:42:14 AM

to kaldi-help

The compute-gop binary does not require pronunciation rating labels. As inputs, it needs only (a) the output probabilities of the neural network and (b) the alignment.

Yes, stage 8 of egs/gop_speechocean762/s5/run.sh uses a "text-phone.int" file, but that stage is just for generating the alignment.

For the WSJ corpus, you can use your own method (for example, stage 2 of egs/wsj/s5/run.sh) to get the alignment.

To learn about the traditional NN-GOP algorithm, you may want to read the paper https://www.researchgate.net/publication/270596198.
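
The two inputs listed above (network output probabilities plus an alignment) are enough to compute a per-phone score. Below is a small Python sketch of one common NN-GOP formulation. For each aligned phone, it averages, over that phone's frames, the log posterior of the aligned phone minus the log posterior of the best-scoring phone at that frame. This is not the compute-gop source code. It assumes you already have frame-level phone log posteriors and a per-frame phone alignment. How compute-gop maps the network output to phones is not covered here. The function names `segment_alignment` and `gop_scores` are hypothetical, and the numbers in the demo are made up.

```python
import math
from typing import List, Sequence, Tuple


def segment_alignment(frame_phones: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Collapse a per-frame phone alignment into (phone, start, end) runs; end is exclusive."""
    segments: List[Tuple[int, int, int]] = []
    start = 0
    for t in range(1, len(frame_phones) + 1):
        # Close the current run at the end of the utterance or when the phone changes.
        # Note: two consecutive instances of the same phone are merged into one run.
        if t == len(frame_phones) or frame_phones[t] != frame_phones[start]:
            segments.append((frame_phones[start], start, t))
            start = t
    return segments


def gop_scores(
    log_post: Sequence[Sequence[float]], frame_phones: Sequence[int]
) -> List[Tuple[int, float]]:
    """Per-segment GOP as a frame-averaged log posterior ratio.

    log_post[t][q] is assumed to be log P(q | o_t) for phone q at frame t.
    Each score is <= 0. It equals 0 when the aligned phone is the most likely
    phone at every frame of its segment.
    """
    if len(log_post) != len(frame_phones):
        raise ValueError("posteriors and alignment must have the same number of frames")
    scores: List[Tuple[int, float]] = []
    for phone, start, end in segment_alignment(frame_phones):
        total = 0.0
        for t in range(start, end):
            # Compare the aligned phone with the best-scoring phone at this frame.
            total += log_post[t][phone] - max(log_post[t])
        scores.append((phone, total / (end - start)))
    return scores


if __name__ == "__main__":
    # Toy posteriors over 3 phones for 4 frames (made-up numbers, not real model output).
    post = [
        [0.7, 0.2, 0.1],
        [0.6, 0.3, 0.1],
        [0.1, 0.2, 0.7],
        [0.5, 0.3, 0.2],
    ]
    log_post = [[math.log(p) for p in row] for row in post]
    # Frames 0-1 are aligned to phone 0, frames 2-3 to phone 2.
    for phone, score in gop_scores(log_post, [0, 0, 2, 2]):
        print(phone, round(score, 3))
```

In the demo, phone 0 scores 0 because it is the most likely phone on both of its frames. Phone 2 gets a negative score because the model prefers phone 0 on frame 3. Scores near 0 suggest the model agrees with the aligned phone. More negative scores point to possible mispronunciations, which is the kind of confidence signal asked about at the top of the thread. Any threshold for flagging errors would likely need tuning on rated data.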

John Locke

Oct 19, 2021, 7:49:34 AM

to kaldi-help

Hello,

That is interesting and confirms my assumptions. I have one more question: do you know how much the model is influenced by clean training data (no noise, clear recordings, etc.) versus noisy training data? I assume the GOP would be evaluated better if all the training data were clean.
