WER Over 100%

Stefan Watson

Nov 17, 2015, 2:16:58 PM

to kaldi-help

Hey

I am using Kaldi for some recognition experiments on a data set different from the ones used in the example recipes. Whenever I decode with the monophone model, the WER is always over 100 percent. I do get insertions and deletions; however, the majority of my recognized words are substitutions. Below is an example from my most recent attempt:

WER: 101% ins 1414, del 1546, subs 6948

SER: 94% 9908/10487

I'm not too certain why this problem persists. If anyone could help me out, it would be much appreciated.
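
For readers puzzled by a WER above 100%: WER is usually defined as (insertions + deletions + substitutions) divided by the number of words in the reference transcript, so insertions can push it past 100%. The sketch below illustrates this. The thread does not state the reference word count, so the toy count used here is hypothetical, and `wer_percent` is an illustrative helper, not a Kaldi function.

```python
def wer_percent(ins, dels, subs, n_ref):
    """Word error rate in percent: all errors over the reference word count."""
    return 100.0 * (ins + dels + subs) / n_ref

# Error counts quoted in the post above.
total_errors = 1414 + 1546 + 6948
print(total_errors)

# Hypothetical toy case (not from the thread): a 4-word reference where
# every word is substituted and one extra word is inserted.
print(wer_percent(ins=1, dels=0, subs=4, n_ref=4))
```

The three quoted counts sum to 9908, which is also the numerator shown on the SER line. That may mean the two lines were mixed up when pasted, though this is only a guess.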

Tony Robinson

Nov 17, 2015, 2:26:00 PM

to kaldi...@googlegroups.com

Do you need monophone models for some reason? If not, I suggest you just continue; monophone models aren't expected to be very good (unless you are doing something fancy like LSTMs/CTC). If this isn't appropriate, we need more info as to what your problem is.

Tony

--
Speechmatics is a trading name of Cantab Research Limited
We are hiring: www.speechmatics.com/careers
Dr A J Robinson, Founder, Cantab Research Ltd
Phone direct: 01223 794096, office: 01223 794497
Company reg no GB 05697423, VAT reg no 925606030
51 Canterbury Street, Cambridge, CB4 3QG, UK

Stefan Watson

Nov 17, 2015, 2:47:07 PM

to kaldi-help

Whenever I continue onward, the WER doesn't change by much, if at all. I've looked at the logs and they seem pretty normal compared to others. I'm currently decoding a rerun I did with some changes. But there are a couple of things that maybe you could clarify, as I'm new to this toolkit. How do you get a visual representation of the MFCC features, and how do you deal with stereo data? In the tutorial I see that they use an stm file and a reco2file_and_channel file to select which channel to use. However, the data I'm working with doesn't have a .stm file. I know that compute-mfcc-feats.cc uses channel zero; however, I'm unsure as to the correct approach to take.

Tony Robinson

Nov 17, 2015, 3:00:00 PM

to kaldi...@googlegroups.com

Okay, so have you run through a few of the examples? You'll want to know that you can run them end-to-end and get the same results as in the RESULTS files. I'd recommend the TEDLIUM recipe to really get into it. Then you'll want to modify the local scripts to deal with your data. I'd start by just changing the acoustic model unless you think that you have a very specific language modelling task.

Make your audio single channel and downsample to 16kHz to match TEDLIUM.
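
One way to do this without an .stm file is to let sox convert each recording on the fly from wav.scp, since Kaldi accepts a command ending in `|` in place of a file name. The sketch below only builds such lines. It assumes sox is installed, and the helper name `wav_scp_line`, the utterance ID and the path are hypothetical.

```python
import shlex

def wav_scp_line(utt_id, wav_path, channel=None, rate=16000):
    """Build one wav.scp entry that pipes the audio through sox.

    channel=None mixes all channels down to mono (-c 1); channel=1 or 2
    keeps only that channel via sox's remix effect (channels count from 1).
    """
    path = shlex.quote(wav_path)  # protect paths containing spaces
    if channel is None:
        cmd = f"sox {path} -t wav -r {rate} -c 1 -"
    else:
        cmd = f"sox {path} -t wav -r {rate} - remix {channel}"
    return f"{utt_id} {cmd} |"

# Hypothetical recording: mix down to mono, or keep only the left channel.
print(wav_scp_line("rec01", "/data/rec01.wav"))
print(wav_scp_line("rec01", "/data/rec01.wav", channel=1))
```

Whether to mix both channels or keep one depends on the recordings. If each channel carries a different speaker, keeping them separate is probably the safer choice.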

There's no good visualisation of MFCC features.

Of course, you could be doing everything right and your data is just very, very difficult. If you like, send me an email and I'll make our system available for, say, a 4-hour test set. That way you'll know roughly whether ASR is ever going to do the job you want it to do, or whether the data is just too noisy.

Tony

Stefan Watson

Nov 17, 2015, 3:18:50 PM

to kaldi-help

Tony

I did run through the examples as best I could, as I do not have access to the data for the examples online. I do not have access to the TEDLIUM recipe. I did downsample the audio to 16kHz, but it's still stereo, so I will try converting it to a single channel. Thanks for letting me know about the MFCC features. I would appreciate that test set, to see if I can get better results.

Thanks for your help, it is much appreciated.
