Forced alignment cannot be done directly with any switch, but it is
possible to create a network consisting of a linear string of models and
word-end nodes.
This can be used for decoding.
Set these lines in the config file:
[decoder]
type=stkint
[networks]
default=$C/net/network
gen_phn_loop=false
gen_kws_net=false
The network is read from the 'network' file; it can be generated by a
script. The network must be generated (and the tool run) separately for
each speech file. Another possibility is to modify the main file.
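As a rough illustration of the per-file network generation, here is a
hypothetical sketch of a helper that emits a linear network in HTK SLF
format from a known phone sequence (one node per phone, forcing the
decoder through that exact string). The function name, phone labels, and
the node/link layout are assumptions; the exact network conventions your
decoder expects may differ.

```shell
# Hypothetical helper: build a linear SLF network from a phone string.
# Usage: mk_linear_net "sil ah b sil" > $C/net/network
mk_linear_net() {
  phones="$1"
  set -- $phones
  nodes=$(($# + 2))        # one node per phone, plus !NULL start/end
  links=$((nodes - 1))     # a single chain of links
  echo "VERSION=1.0"
  echo "N=$nodes L=$links"
  echo "I=0 W=!NULL"
  i=1
  for p in $phones; do
    echo "I=$i W=$p"
    i=$((i + 1))
  done
  echo "I=$i W=!NULL"
  j=0
  while [ "$j" -lt "$links" ]; do
    echo "J=$j S=$j E=$((j + 1))"
    j=$((j + 1))
  done
}

mk_linear_net "sil ah b sil"
```

A wrapper script would call this once per utterance, writing the result
to the path named in the [networks] section before running the tool.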
Petr
When I've done forced alignment in the past, I adapted the lattice
script: I used the HVite -a option and removed the -z, -w, -n, and -l
options, i.e.:
-bash-3.2$ diff lattice.sh tim_forcedAlign.sh
49c49
< -T 1 -y 'rec' -z 'latt' \
---
> -T 1 -y 'rec' \
51,52c51
< -w ${OutputDir}/scoring/monophones_lnet.hvite \
< -n 2 1 \
---
> -a \
55d53
< -l ${OutputDir}/lattice \
The known label file (without timing information) then just needs to go
into the htkout directory, e.g. /tmp/phnrec_lat/htkout/test.lab
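For reference, such a timing-free HTK label file is just one label per
line. A minimal sketch of creating one (the phone sequence here is purely
illustrative, not from any real utterance):

```shell
# Hypothetical example: write a timing-free HTK label file.
# The phone labels are illustrative; use your utterance's known transcript.
mkdir -p /tmp/phnrec_lat/htkout
cat > /tmp/phnrec_lat/htkout/test.lab <<'EOF'
sil
ah
b
sil
EOF
```

HVite -a then reads this transcript and fills in the time boundaries
during alignment.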
This seems to produce good results. Petr - would this produce similar
results to the solution you mention, or is this suboptimal?
I'd be interested to know if anyone has compared the performance of phnrec
forced alignment with the state of the art (e.g. Hosom 2009 on TIMIT).
Thanks,
Tim