Is there any utility to convert a phone-level CTM (alignment) to word level?

Agrover112

Aug 22, 2022, 11:14:21 AM
to kaldi-help
I need to convert a .ctm file that contains phones into a .ctm file that contains words or,
failing that, find a way to get the word boundaries.

Daniel Povey

Aug 23, 2022, 12:21:09 AM
to kaldi...@googlegroups.com
There isn't a straightforward way to do this from a CTM file. From the alignment files there would be a way.
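
For readers who only have a phone-level CTM, a partial workaround can be sketched under one strong assumption: the phones were written with Kaldi's word-position-dependent suffixes (_B, _I, _E, _S). In that case the suffixes mark where words begin and end, so word spans can be recovered, though not the word identities. The function name and the sample lines below are hypothetical, and a setup built without position-dependent phones would not have these suffixes at all.

```shell
#!/usr/bin/env bash
# Sketch: group word-position-dependent phones into word spans.
# Input:  phone CTM lines "utt channel start duration phone".
# Output: "utt start end", one line per word (word identity is NOT recovered).
# Assumption: phones carry _B/_I/_E/_S suffixes; unsuffixed phones
# (for example optional silence) are skipped.
phones_to_word_spans() {
  awk '
    $5 ~ /_S$/ { printf "%s %.2f %.2f\n", $1, $3, $3 + $4; next }  # single-phone word
    $5 ~ /_B$/ { start = $3; next }                                  # first phone of a word
    $5 ~ /_E$/ { printf "%s %.2f %.2f\n", $1, start, $3 + $4 }       # last phone of a word
  ' "$@"
}

# Hypothetical sample input, not real output from any recipe.
printf '%s\n' \
  'utt1 1 0.00 0.25 SIL' \
  'utt1 1 0.25 0.25 Y_B' \
  'utt1 1 0.50 0.25 EH_I' \
  'utt1 1 0.75 0.25 S_E' \
  'utt1 1 1.00 0.50 OW_S' | phones_to_word_spans
```

The spans could then be paired with the transcript in order to attach word labels, but that pairing is exactly the part that is more reliable when done from the alignment files.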

Nayan JHA

Aug 23, 2022, 3:07:23 AM
to kaldi-help
@Dan Povey, can you please elaborate on how to get word-level boundary timestamps from the alignment files?

Agrover112

Aug 31, 2022, 6:49:20 AM
to kaldi-help
Cool, I found a way and am leaving it in this thread for others seeking a similar answer:

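# Step 1
# Convert the alignments plus the integer-mapped transcripts to n-best (linear lattice) format

linear-to-nbest ark:exp/mono0a/ali.1 "ark:utils/sym2int.pl -f 2- data/lang/words.txt data/train_yesno/split1/1/text|" '' '' 'ark:1.nbest'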

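# Step 2
# Align the lattice to word boundaries using the lexicon

lattice-align-words-lexicon "data/lang/phones/align_lexicon.int" "exp/mono0a/final.mdl" "ark:1.nbest" "ark:aligned.1"
# If the lang directory uses word-position-dependent phones (i.e. data/lang/phones/word_boundary.int is present), lattice-align-words can be used instead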

# Step 3: word level
# Convert the n-best lattice to CTM, then map the integer word IDs back to symbols (the symbols here are words)

nbest-to-ctm "ark:aligned.1" - | utils/int2sym.pl -f 5- data/lang/words.txt > words.1.ctm
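
If only the word boundaries are needed, the resulting CTM can be post-processed with a few lines of awk. This is a minimal sketch, assuming the usual five-column CTM layout (utterance, channel, start, duration, word); the function name and the sample lines are hypothetical and are not output from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: turn word-level CTM lines into "utt word start end".
# Assumes columns: utt channel start duration word (times in seconds).
ctm_to_boundaries() {
  awk '{ printf "%s %s %.2f %.2f\n", $1, $5, $3, $3 + $4 }' "$@"
}

# Hypothetical sample input; with a real file one would run
#   ctm_to_boundaries words.1.ctm
printf '%s\n' \
  'utt1 1 0.50 0.25 YES' \
  'utt1 1 1.00 0.50 NO' | ctm_to_boundaries
```

The end time is simply start plus duration, so the output keeps one line per word in the same order as the CTM.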
