How to filter lexicon.txt down to the words in words.txt in Kaldi

Sage Khan

Jul 24, 2022, 3:06:14 AM
to kaldi-help

I am trying to use this script to filter a lexicon dictionary down to the words in my corpus. I want to train my ASR system, and I am following this tutorial (http://www.eleanorchodroff.com/tutorial/kaldi/training-acoustic-models.html#create-files-for-datatrain). I get the following error when I run the code:

  File "/home/metanet/ProgramFiles/kaldi/kaldi/egs/ASR-for-urdu/s5/data/filterdict.py", line 20, in <module>
    pron = columns[1]
IndexError: list index out of range

The original code did not work as written, so I altered it. Here is the code:

                                      
#!/bin/sh

#  filter_dict.py
#  
#
#  Created by Eleanor Chodroff on 2/22/15.
# This script filters out words which are not in our corpus.
# It requires a list of the words in the corpus: words.txt

import os

ref = dict()
phones = dict()

with open("lexicon.bak") as f:
    for line in f:
        line = line.strip()
        columns = line.split(" ", 1)
        word = columns[0]
        pron = columns[1]
        try:
            ref[word].append(pron)
        except:
            ref[word] = list()
            ref[word].append(pron)

print (ref)

lex = open("lexicon.txt", "wb")
lex.write("<oov> <oov>\n")

with open("words.txt") as f:
    for line in f:
        line = line.strip()
        if line in ref.keys():
            for pron in ref[line]:
                lex.write(line + " " + pron+"\n")
        else:
            print ("Word not in lexicon:" + line)


Desh Raj

Jul 25, 2022, 9:02:24 PM
to kaldi...@googlegroups.com
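It looks like some of the entries in your lexicon.bak do not have corresponding pronunciations. For a line that contains only a word (or a blank line), line.split(" ", 1) returns a list with a single element, so columns[1] raises the IndexError you are seeing.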
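
One way to handle this is to skip the incomplete lexicon lines and report them, so they can be fixed in lexicon.bak afterwards. The following is only a minimal sketch, not the tutorial author's script. The function names load_lexicon and filter_lexicon are made up for illustration, and the sketch makes two assumptions that are not stated in the thread: the files are UTF-8 encoded, and the word and its pronunciation may be separated by any whitespace (a space or a tab), not only a single space. It also opens lexicon.txt in text mode, because under Python 3 writing a str to a file opened with "wb" raises a TypeError.

```python
from collections import defaultdict


def load_lexicon(path):
    """Read 'word pronunciation' lines, skipping lines with no pronunciation.

    Returns (ref, skipped): ref maps each word to its list of pronunciations,
    skipped holds (line_number, text) for every line that was ignored.
    """
    ref = defaultdict(list)
    skipped = []
    # Assumption: the lexicon is UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            # Assumption: split on any whitespace (space or tab), at most once.
            columns = line.split(None, 1)
            if len(columns) < 2:
                # Blank line, or a word without a pronunciation.
                skipped.append((lineno, line.rstrip("\n")))
                continue
            word, pron = columns
            ref[word].append(pron.strip())
    return ref, skipped


def filter_lexicon(lexicon_path, words_path, out_path):
    """Write only the lexicon entries whose word appears in words_path.

    Returns (skipped, missing): the ignored lexicon lines and the corpus
    words that have no entry in the lexicon.
    """
    ref, skipped = load_lexicon(lexicon_path)
    missing = []
    with open(words_path, encoding="utf-8") as words, \
            open(out_path, "w", encoding="utf-8") as lex:
        lex.write("<oov> <oov>\n")
        for line in words:
            word = line.strip()
            if not word:
                continue
            if word in ref:
                for pron in ref[word]:
                    lex.write(word + " " + pron + "\n")
            else:
                missing.append(word)
    return skipped, missing


# Example call with the file names from the question:
#   skipped, missing = filter_lexicon("lexicon.bak", "words.txt", "lexicon.txt")
#   for lineno, text in skipped:
#       print("No pronunciation on line %d: %r" % (lineno, text))
#   for word in missing:
#       print("Word not in lexicon: " + word)
```

Printing the skipped line numbers shows which entries in lexicon.bak need a pronunciation added. Any corpus word that ends up in the missing list will have no entry in lexicon.txt, so it is worth checking that list before training.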
