How to get phone alignments using kaldi?

Iris Luo

Dec 23, 2017, 3:21:22 AM
to kaldi-help
Hi everyone,
I'm trying to get phone alignments using ali-to-phones in Kaldi, but after training, the ali.*.gz files only contain alignments for the "train" part of the dataset.
How can I get alignments for the "test" part?
I tried using gmm-align-compiled to generate alignment files from the "test" features, but it keeps failing with an error. I don't know whether that is because the fsts files only contain information about the "train" utterances.
I also can't understand why the fsts files are needed here. Aren't they used for decoding to text?

Can someone help me? I have been puzzled for quite a long time.

Sincerely, Iris.

Iris Luo

Dec 23, 2017, 3:31:53 AM
to kaldi-help

Here is my code:
#!/bin/bash

stage=0
srcdir=`pwd`/tri3b
alidir=`pwd`/ali
phonedir=`pwd`/phone
#featdir=`pwd`/pdeltafeats
featdir=`pwd`/pwsjfeats
cmd=utils/run.pl
lang=utils/lang_nosp

#. utils/parse_options.sh
. ./utils/path.sh

# Begin configuration.
nj=10
boost_silence=1.0 # Factor by which to boost silence during alignment.
# $mdl is only used by the commented-out compile-train-graphs variant below.
mdl="gmm-boost-silence --boost=$boost_silence `cat $lang/phones/optional_silence.csl` $srcdir/final.mdl - |"
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
beam=10
retry_beam=40
careful=false

for d in $phonedir; do
   if [ ! -d "$d" ]; then
     mkdir $d
   fi
done

if [ $stage -le 1 ]; then
   echo "$stage: aligning data in $featdir using model from $srcdir, putting alignments in $alidir"
   for x in train test; do
#      for file in `find -L $featdir/$x -iname '*.ark'` ;do
#          feats="ark,t:$file ark:- |"
#      done
      #feats="ark:apply-cmvn --utt2spk=$featdir/$x/utt2spk scp:$featdir/$x/cmvn.scp scp:$featdir/$x/feats.scp ark:- | add-deltas --delta-window=3 --delta-order=2 ark:- ark:- |"
#      $cmd JOB=1:$nj $alidir/log/align.JOB.log \
#          gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$retry_beam --careful=$careful $srcdir/final.mdl \
#             ark:$srcdir/fsts.JOB.gz $feats ark:$alidir/ali.JOB || exit 1;

      feats=/media/toshiba2/gan/pdeltafeats/train/raw_mfcc_train.6.ark
      $cmd JOB=1:$nj $alidir/log/align.JOB.log \
          gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$retry_beam --careful=$careful $srcdir/final.mdl \
             "ark:gunzip -c $srcdir/fsts.JOB.gz|" $feats ark:$alidir/ali.JOB || exit 1;
#      tra="ark:utils/sym2int.pl --map-oov $oov -f 2- $lang/words.txt $sdata/JOB/text|";
#      # We could just use gmm-align in the next line, but it's less efficient as it compiles the
#      # training graphs one by one.
#      $cmd JOB=1:$nj $alidir/log/align.JOB.log \
#        compile-train-graphs --read-disambig-syms=$lang/phones/disambig.int $srcdir/tree $srcdir/final.mdl  $lang/L.fst "$tra" ark:- \| \
#        gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$retry_beam --careful=$careful "$mdl" ark:- \
#        $feats "ark,t:|gzip -c >$dir/ali.JOB.gz" || exit 1;

   done

fi

if [ $stage -le 2 ]; then
   echo "$stage: converting ali to phone, putting phone in $phonedir"
   for x in train test; do
      for file in `find -L $srcdir/$x -iname 'ali.*.gz'` ;do
         basename=`basename $file .gz`
         gunzip -c $file > $phonedir/temp
         ali-to-phones --per-frame $srcdir/final.mdl ark:$phonedir/temp ark,t:$phonedir/$basename
         rm $phonedir/temp
      done
   done
fi

Daniel Povey

Dec 23, 2017, 3:59:58 PM
to kaldi-help
If you trained your model with train_sat.sh then you can get alignments for your testing data with align_fmllr.sh; otherwise, with align_si.sh.  However, these alignments will use the 'text' file in your testing directory so the testing data would be treated as just another dataset with supervision.
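As an illustration of the supervised route described here, a minimal sketch might look like the following. The directory names (data/test, data/lang, exp/tri3b, exp/tri3b_ali_test) and the dry-run `run` helper are assumptions for illustration and do not come from the thread. The argument order follows the usual usage of steps/align_si.sh and steps/align_fmllr.sh: <data-dir> <lang-dir> <src-dir> <align-dir>.

```shell
#!/bin/bash
# Sketch: align the test set against its own transcripts (data/test/text).
# Every path below is a hypothetical placeholder; adjust to your own setup.
data=data/test               # must contain feats.scp, cmvn.scp, text, utt2spk
lang=data/lang
srcdir=exp/tri3b             # trained model directory (final.mdl, tree)
alidir=exp/tri3b_ali_test    # output directory for ali.*.gz
nj=4                         # should not exceed the number of test speakers
sat=true                     # true if the model was trained with train_sat.sh
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the command; 0 = actually run it

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Pick the alignment script that matches how the model was trained.
align_script() {
  if [ "$1" = true ]; then echo steps/align_fmllr.sh; else echo steps/align_si.sh; fi
}

align_test_set() {
  run "$(align_script "$sat")" --nj "$nj" --cmd utils/run.pl \
    "$data" "$lang" "$srcdir" "$alidir"
}

align_test_set
```

Run this from the recipe directory (where steps/ and utils/ live), and set DRY_RUN=0 once the printed command looks right. The ali.*.gz files written to $alidir can then be passed to ali-to-phones in the same way as the training alignments.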
If you don't want to make use of the supervision, i.e. you want the alignments to be derived from the decoding output, probably the most straightforward way to get the ali.*.gz files would be to use the script decode_nolats.sh.
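For the unsupervised route, a rough sketch is shown below. It assumes that a decoding graph already exists (for example, one built with utils/mkgraph.sh), that steps/decode_nolats.sh takes <graph-dir> <data-dir> <decode-dir> like steps/decode.sh, and that it leaves ali.*.gz files in the decode directory, as the reply suggests. All paths and the dry-run `run` helper are hypothetical.

```shell
#!/bin/bash
# Sketch: get alignments from decoding output, then convert them to phones.
# All paths are hypothetical placeholders; adjust to your own setup.
graphdir=exp/tri3b/graph
data=data/test
decodedir=exp/tri3b/decode_nolats_test
model=exp/tri3b/final.mdl
phones=data/lang/phones.txt
nj=4
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the commands; 0 = actually run them

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

decode_test_set() {
  run steps/decode_nolats.sh --nj "$nj" --cmd utils/run.pl \
    "$graphdir" "$data" "$decodedir"
}

# Convert one gzipped alignment archive to per-frame phone symbols.
# $1 = ali.N.gz (input), $2 = text file to write (output)
ali_to_phone_text() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "ali-to-phones --per-frame=true $model 'ark:gunzip -c $1|' ark,t:- | utils/int2sym.pl -f 2- $phones > $2"
  else
    ali-to-phones --per-frame=true "$model" "ark:gunzip -c $1|" ark,t:- \
      | utils/int2sym.pl -f 2- "$phones" > "$2"
  fi
}

decode_test_set
for n in $(seq 1 "$nj"); do
  ali_to_phone_text "$decodedir/ali.$n.gz" "$decodedir/phones.$n.txt"
done
```

This sketch does not handle speaker adaptation; for a model trained with train_sat.sh, the features fed to the decoder would need to match the model.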

Dan



--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/853938fd-289e-471d-8436-e6cf60339556%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Iris Luo

Dec 25, 2017, 6:32:34 AM
to kaldi-help
Thank you for your reply!
It's very helpful.