How to get phone alignments using Kaldi?

Iris Luo

Dec 23, 2017, 3:21:22 AM

to kaldi-help

Hi everyone,

I'm trying to get phone alignments using ali-to-phones in Kaldi, but after training, the ali.*.gz files contain alignments only for the "train" portion of the dataset.

So I wonder how I can get alignments for the "test" portion.

I have tried using gmm-align-compiled to generate alignment files from the "test" features, but it keeps failing with an error message. I don't know whether that is because the fsts files contain information only for the "train" utterances.

I also can't understand why the fsts files are needed here; aren't they used for decoding to text?

Can someone help me? I have been puzzled for quite a long time.

Sincerely, Iris.

Iris Luo

Dec 23, 2017, 3:31:53 AM

to kaldi-help

Here is my code:

#!/bin/bash

stage=0
srcdir=`pwd`/tri3b
alidir=`pwd`/ali
phonedir=`pwd`/phone
#featdir=`pwd`/pdeltafeats
featdir=`pwd`/pwsjfeats
cmd=utils/run.pl
lang=utils/lang_nosp

#. utils/parse_options.sh
. ./utils/path.sh

# Begin configuration.
nj=10
boost_silence=1.0 # Factor by which to boost silence during alignment.
mdl="gmm-boost-silence --boost=$boost_silence `cat $lang/phones/optional_silence.csl` $srcdir/final.mdl - |"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
beam=10
retry_beam=40
careful=false

for d in $phonedir; do
  if [ ! -d "$d" ]; then
    mkdir $d
  fi
done

if [ $stage -le 1 ]; then
  echo "$stage: aligning data in $featdir using model from $srcdir, putting alignments in $alidir"
  for x in train test; do
    # for file in `find -L $featdir/$x -iname '*.ark'` ;do
    #   feats="ark,t:$file ark:- |"
    # done
    #feats="ark:apply-cmvn --utt2spk=$featdir/$x/utt2spk scp:$featdir/$x/cmvn.scp scp:$featdir/$x/feats.scp ark:- | add-deltas --delta-window=3 --delta-order=2 ark:- ark:- |"
    # $cmd JOB=1:$nj $alidir/log/align.JOB.log \
    #   gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$retry_beam --careful=$careful $srcdir/final.mdl \
    #   ark:$srcdir/fsts.JOB.gz $feats ark:$alidir/ali.JOB || exit 1;

    feats=/media/toshiba2/gan/pdeltafeats/train/raw_mfcc_train.6.ark
    $cmd JOB=1:$nj $alidir/log/align.JOB.log \
      gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$retry_beam --careful=$careful $srcdir/final.mdl \
      "ark:gunzip -c $srcdir/fsts.JOB.gz|" $feats ark:$alidir/ali.JOB || exit 1;

    # tra="ark:utils/sym2int.pl --map-oov $oov -f 2- $lang/words.txt $sdata/JOB/text|";
    # # We could just use gmm-align in the next line, but it's less efficient as it compiles the
    # # training graphs one by one.
    # $cmd JOB=1:$nj $alidir/log/align.JOB.log \
    #   compile-train-graphs --read-disambig-syms=$lang/phones/disambig.int $srcdir/tree $srcdir/final.mdl $lang/L.fst "$tra" ark:- \| \
    #   gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$retry_beam --careful=$careful "$mdl" ark:- \
    #   $feats "ark,t:|gzip -c >$dir/ali.JOB.gz" || exit 1;
  done
fi

if [ $stage -le 2 ]; then
  echo "$stage: converting ali to phone, putting phone in $phonedir"
  for x in train test; do
    for file in `find -L $srcdir/$x -iname 'ali.*.gz'` ;do
      basename=`basename $file .gz`
      gunzip -c $file > $phonedir/temp
      ali-to-phones --per-frame $srcdir/final.mdl ark:$phonedir/temp ark,t:$phonedir/$basename
      rm $phonedir/temp
    done
  done
fi

Daniel Povey

Dec 23, 2017, 3:59:58 PM

to kaldi-help

If you trained your model with train_sat.sh then you can get alignments for your testing data with align_fmllr.sh; otherwise, with align_si.sh. However, these alignments will use the 'text' file in your testing directory so the testing data would be treated as just another dataset with supervision.
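As a rough sketch of the supervised route described above, something like the following could be used. The directories data/test, data/lang, exp/tri3b and exp/tri3b_ali_test are hypothetical, as are the `run` helper and the `sat` switch; the script only prints the commands unless DRY_RUN=0 is set, so it can be read without a Kaldi installation.

```shell
#!/bin/bash
# Sketch only: every path below is hypothetical. Set DRY_RUN=0 inside a Kaldi
# recipe directory (with steps/ and utils/ present) to actually execute.
DRY_RUN=${DRY_RUN:-1}
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = 0 ]; then "$@" || exit 1; fi
}

data=data/test               # must contain a 'text' file (the supervision)
lang=data/lang
srcdir=exp/tri3b             # model directory
alidir=exp/tri3b_ali_test    # where ali.*.gz for the test data will be written
nj=2
sat=true                     # true if the model was trained with train_sat.sh

# Speaker-adapted models need align_fmllr.sh; otherwise use align_si.sh.
if [ "$sat" = true ]; then
  align=steps/align_fmllr.sh
else
  align=steps/align_si.sh
fi

run "$align" --nj "$nj" --cmd run.pl "$data" "$lang" "$srcdir" "$alidir"

# Convert each alignment archive to per-frame phone IDs in text form.
for job in $(seq 1 "$nj"); do
  run ali-to-phones --per-frame=true "$srcdir/final.mdl" \
    "ark:gunzip -c $alidir/ali.$job.gz|" "ark,t:$alidir/phones.$job.txt"
done
```

The resulting files hold integer phone IDs, one utterance per line; piping them through utils/int2sym.pl -f 2- with the lang directory's phones.txt should map them back to phone symbols.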

If you don't want to make use of the supervision, i.e. you want the alignments to be derived from the decoding output, probably the most straightforward way to get the ali.*.gz files would be to use the script decode_nolats.sh.
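For the unsupervised route, a minimal sketch might look like this. It assumes decode_nolats.sh takes the usual <graph-dir> <data-dir> <decode-dir> arguments and --nj/--cmd options like the other decoding scripts, and that the decode directory sits under the model directory; all paths and the `run` helper are hypothetical, and nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch only: every path below is hypothetical. Set DRY_RUN=0 inside a Kaldi
# recipe directory (with steps/ and utils/ present) to actually execute.
DRY_RUN=${DRY_RUN:-1}
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = 0 ]; then "$@" || exit 1; fi
}

srcdir=exp/tri3b                        # model directory
graphdir=$srcdir/graph                  # decoding graph, e.g. from utils/mkgraph.sh
data=data/test                          # no 'text' file is needed for this route
decodedir=$srcdir/decode_nolats_test    # assumed to be a sub-directory of $srcdir
nj=2

# Decode without lattices; the alignments come from the decoding output.
run steps/decode_nolats.sh --nj "$nj" --cmd run.pl "$graphdir" "$data" "$decodedir"

# Convert each alignment archive to per-frame phone IDs in text form.
for job in $(seq 1 "$nj"); do
  run ali-to-phones --per-frame=true "$srcdir/final.mdl" \
    "ark:gunzip -c $decodedir/ali.$job.gz|" "ark,t:$decodedir/phones.$job.txt"
done
```

Because these alignments follow the recognized word sequence rather than a reference transcript, any recognition errors carry over into the phone alignments.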

Dan

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/853938fd-289e-471d-8436-e6cf60339556%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Iris Luo

Dec 25, 2017, 6:32:34 AM

to kaldi-help

Thank you for your reply!

It's very helpful.
