About FST based lexicon transducer

Seongmin Lim

Jul 23, 2019, 8:29:45 PM

to kaldi-help

Hello.

I am implementing an end-to-end speech recognition model.

I am trying to compare the performance of the phoneme model and the character model.

To compute WER for the phoneme model, I built L.fst using make_lexicon_fst.py, as in Kaldi.

However, when I use fstcompose and there are strange characters in the middle of the phoneme sequence, the composition does not work and the output FST is empty.

Example lexicon:

WE w i

CAN k ae n

FLY f l a i

Input sequences:

w i k ae n f l a i --> WE CAN FLY

w i k ae l n f l a i --> X

How does the Kaldi decoder solve this problem?
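To make the failure concrete, here is a minimal pure-Python sketch of what an unweighted lexicon mapping does to the two input sequences above. It is only a stand-in for composing a linear phone acceptor with L.fst, not Kaldi or OpenFst code: it ignores optional silence and disambiguation symbols, and the names `LEXICON` and `segment_exact` are hypothetical.

```python
from typing import Dict, List, Optional

# Toy lexicon from the post: word -> phone sequence.
LEXICON: Dict[str, List[str]] = {
    "WE": ["w", "i"],
    "CAN": ["k", "ae", "n"],
    "FLY": ["f", "l", "a", "i"],
}


def segment_exact(phones: List[str],
                  lexicon: Dict[str, List[str]]) -> Optional[List[str]]:
    """Split `phones` into lexicon words, or return None if no split exists.

    Returning None corresponds to the empty FST produced by composition:
    there is no complete path, so nothing survives.
    """
    n = len(phones)
    reachable = [False] * (n + 1)  # reachable[i]: phones[:i] is fully covered
    back = [None] * (n + 1)        # back[j] = (start index, word ending at j)
    reachable[0] = True
    for i in range(n):
        if not reachable[i]:
            continue
        for word, pron in lexicon.items():
            j = i + len(pron)
            if phones[i:j] == pron and not reachable[j]:
                reachable[j] = True
                back[j] = (i, word)
    if not reachable[n]:
        return None
    words = []
    pos = n
    while pos > 0:
        pos, word = back[pos]
        words.append(word)
    return words[::-1]


print(segment_exact("w i k ae n f l a i".split(), LEXICON))    # ['WE', 'CAN', 'FLY']
print(segment_exact("w i k ae l n f l a i".split(), LEXICON))  # None
```

A single inserted phone leaves no complete path through the lexicon, so the whole utterance is lost rather than just the affected word.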

Daniel Povey

Jul 23, 2019, 8:41:33 PM

to kaldi-help

That situation doesn't arise in Kaldi because the phone set is defined in advance.

It seems to me you should have thought about this kind of thing before designing your model.

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/fea3f54d-db07-41e1-bcc6-286674edbe02%40googlegroups.com.

Seongmin Lim

Jul 23, 2019, 8:55:48 PM

to kaldi-help

Sorry for the poorly chosen word "strange".

I mean that even if every phone in the recognized sequence is in the phone set,

my basic FST transducer cannot handle a slightly wrong recognition result.

If the recognized sequence is "w i k ae l n f l a i",

w i --> WE

k ae l n --> <UNK>

f l a i --> FLY

So I want to get a result like "WE <UNK> FLY".

But if I run

>> fstcompose input.fst L.fst decoded.fst

the result is empty, because the composition has no complete path once it meets a phone sequence that is not in the lexicon.

It might be solved by adding a transition from every state to the LOOP state,

but I don't think that is a good idea.


Daniel Povey

Jul 24, 2019, 5:15:00 PM

to kaldi-help

You could just ensure that the lexicon contains an entry for each of the phones.

You'd need some kind of probabilities/costs so that those paths were less probable than the "real" ones.

You can't view it as a deterministic mapping; it needs to have some kind of model.
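One way to picture this suggestion is the pure-Python sketch below, which finds the cheapest segmentation of a phone sequence when every single phone also has a fallback entry. It is a dynamic-programming stand-in for a shortest path through a weighted composition, not Kaldi or OpenFst code. Several details are assumptions made for illustration: the fallback entries output `<UNK>`, each fallback phone costs 1.0 while real words cost 0, consecutive `<UNK>` outputs are merged, and the names `LEXICON` and `decode_with_fallback` are hypothetical.

```python
from typing import Dict, List, Tuple

UNK = "<UNK>"

# Toy lexicon from the thread: word -> phone sequence.
LEXICON: Dict[str, List[str]] = {
    "WE": ["w", "i"],
    "CAN": ["k", "ae", "n"],
    "FLY": ["f", "l", "a", "i"],
}


def decode_with_fallback(phones: List[str],
                         lexicon: Dict[str, List[str]],
                         unk_cost: float = 1.0) -> Tuple[List[str], float]:
    """Return the lowest-cost word sequence and its total cost.

    Real lexicon entries cost 0 (assumption); any single phone may instead
    be consumed by a fallback entry that emits <UNK> at `unk_cost`.
    """
    n = len(phones)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: cheapest cost to cover phones[:i]
    back = [None] * (n + 1)  # back[j] = (start index, emitted label)
    best[0] = 0.0
    for i in range(n):
        if best[i] == inf:
            continue
        # Real words: preferred because they add no cost.
        for word, pron in lexicon.items():
            j = i + len(pron)
            if phones[i:j] == pron and best[i] < best[j]:
                best[j] = best[i]
                back[j] = (i, word)
        # Per-phone fallback: always available, but penalized.
        if best[i] + unk_cost < best[i + 1]:
            best[i + 1] = best[i] + unk_cost
            back[i + 1] = (i, UNK)
    # Backtrace, merging runs of <UNK> into a single token.
    words: List[str] = []
    pos = n
    while pos > 0:
        pos, label = back[pos]
        if not (label == UNK and words and words[-1] == UNK):
            words.append(label)
    return words[::-1], best[n]


print(decode_with_fallback("w i k ae n f l a i".split(), LEXICON))
# (['WE', 'CAN', 'FLY'], 0.0)
print(decode_with_fallback("w i k ae l n f l a i".split(), LEXICON))
# (['WE', '<UNK>', 'FLY'], 4.0)
```

Because the fallback is always available, every input has at least one complete path, and the costs keep the "real" pronunciations preferred whenever they match. In a real setup the costs would come from some model rather than a fixed constant.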

