Huge difference between SER and WER

mrinalini kannan

Feb 29, 2016, 4:52:06 AM
to kaldi-help
Hi

I have been working with Kaldi for the past few weeks. I have no access to LDC databases except TIMIT. I am trying to build a recognition system using my own English data, which contains about 1 hour of speech in around 1000 utterances. I have built monophone and triphone systems using the sample scripts from the rm recipe. However, on decoding I got the following results:

Monophone system: WER = 38.8%, SER = 95%
Triphone system: WER = 38.04%, SER = 91%

I tried increasing the insertion penalty up to 2.0, but saw a difference of only 0.5% in the results.

I would like to know why there is such a large difference between the WER and SER values. How can I narrow this difference? And how can I improve the performance of my system in general?

For training the triphone system, I used the same number of leaves and number of Gaussians as the rm recipe (1800 and 9000). Since my data is small and comes from a single speaker, how should I change these parameters?

Kindly help.

remi....@gmail.com

Feb 29, 2016, 5:57:45 AM
to kaldi-help
The sentence error rate is expected to be higher, because a sentence only counts as correct if every word in it is 100% right. With 38% WER you have a very small probability of getting everything right in an utterance.
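To make this concrete, here is a small Python sketch (not part of Kaldi) that computes WER and SER on made-up reference/hypothesis pairs, plus a rough estimate of the chance that an utterance comes out error-free. The estimate assumes word errors are independent with a fixed per-word error probability; that is a simplification, since real errors cluster and WER also counts insertions, so treat the numbers as illustrative only.

```python
"""Toy illustration of why SER is much higher than WER (not Kaldi code)."""


def edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = cur
    return prev[-1]


def wer_ser(pairs):
    """Return (WER, SER) in percent for a list of (reference, hypothesis) strings."""
    errors = words = wrong_sents = 0
    for ref, hyp in pairs:
        r, h = ref.split(), hyp.split()
        e = edit_distance(r, h)
        errors += e
        words += len(r)
        wrong_sents += e > 0  # any error at all makes the whole sentence wrong
    return 100.0 * errors / words, 100.0 * wrong_sents / len(pairs)


def p_sentence_correct(word_error_rate, n_words):
    """Probability that an n-word utterance has no errors, assuming independent errors."""
    return (1.0 - word_error_rate) ** n_words


if __name__ == "__main__":
    toy_pairs = [
        ("the cat sat on the mat", "the cat sat on a mat"),
        ("hello world", "hello world"),
        ("open the door please", "open a door please"),
    ]
    wer, ser = wer_ser(toy_pairs)
    print(f"WER = {wer:.1f}%  SER = {ser:.1f}%")
    print(f"P(10-word utterance correct at 38% WER) = {p_sentence_correct(0.38, 10):.4f}")
```

On the toy pairs, 2 word errors out of 12 reference words give a WER of about 16.7%, yet 2 of the 3 sentences are wrong (SER about 66.7%). Under the independence assumption, a hypothetical 10-word utterance at 38% per-word error is fully correct with probability 0.62^10, roughly 0.8%, so an SER above 90% is not surprising.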
With less data you need fewer parameters. I don't know what would work for you, probably a low number of leaves (around 400), but with only one hour of data you should be able to try many settings and see what works best.
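Since one hour of data makes each training run cheap, the sweep can be scripted. The Python sketch below only builds candidate command lines; it assumes the rm-style call `steps/train_deltas.sh <num-leaves> <tot-gauss> <data-dir> <lang-dir> <alignment-dir> <exp-dir>`, and the directory names (`data/train`, `data/lang`, `exp/mono_ali`, `exp/tri1_*`) and the grid values are hypothetical placeholders, not recommendations.

```python
"""Build a hypothetical grid of triphone sizes to try on a small corpus."""

import itertools


def make_sweep(leaves=(200, 400, 600, 800), gauss_per_leaf=(3, 5, 8)):
    """Return (num_leaves, tot_gauss, command) tuples, one per configuration.

    Directory names are placeholders; add --cmd and any other options
    your run.sh already passes to steps/train_deltas.sh.
    """
    jobs = []
    for n_leaves, ratio in itertools.product(leaves, gauss_per_leaf):
        tot_gauss = n_leaves * ratio  # the rm setting (1800/9000) is a ratio of 5
        exp_dir = f"exp/tri1_{n_leaves}_{tot_gauss}"
        cmd = (
            f"steps/train_deltas.sh {n_leaves} {tot_gauss} "
            f"data/train data/lang exp/mono_ali {exp_dir}"
        )
        jobs.append((n_leaves, tot_gauss, cmd))
    return jobs


if __name__ == "__main__":
    for _, _, cmd in make_sweep():
        print(cmd)
```

Each resulting system would then be decoded and scored as usual, keeping whichever setting gives the lowest WER on held-out data rather than on the training set.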
By the way, the TED-LIUM corpus is free, and there is a recipe for it in Kaldi. I think it's about 150 hours, so it would already give you a decent system.