Why does one load the stats file, generated by HERest, when generating
the MMF for use in HEAdapt?
According to the handbook, the regression tree is generated using HHEd by
specifying the number of terminal nodes, and the data is then split into
these nodes using a Euclidean distance measure. So where does it take
into account the contents of the stats file?
thanks
Shona
I am a bit confused by what you mean by 'stats' file. In my mind it has
two possible meanings:
a) The occupation stats generated by the -s option of HERest
b) The actual models produced (MMF) by HERest
Which one do you mean?
Cheers
Alastair
thanks
Shona
I can't actually see where it says you need to pass the -s stats file to
HEAdapt! As far as I can see there is no way of passing it. Which flag
are you using?
Alastair
Shona
I think the answer is this:
The regression tree created by HHEd stores the occupation information
for each node. This information is taken from the 'stats' file that was
created by HERest. The regression tree *just stores* it for each node.
The reason this is stored in the regression tree is that HEAdapt will
need the occupation information to decide how far down the regression
tree to go. See the -m option in HEAdapt.
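To make "how far down the tree to go" a bit more concrete, here is a minimal sketch in Python. It assumes a simple rule: for each leaf, walk up towards the root until you reach a node whose occupancy meets a minimum threshold. The tree, the occupancy numbers, the threshold values and the function name are all made up for illustration, and this is not necessarily how HEAdapt does it internally.

```python
# Hypothetical tree, stored as child -> parent links (the root has no parent).
parent = {"N1": None, "S1": "N1", "N2": "N1", "S2": "N2", "S3": "N2"}

# Hypothetical occupancy stored at each node (made-up numbers).
occ = {"N1": 195.75, "N2": 75.75, "S1": 120.0, "S2": 45.5, "S3": 30.25}


def lowest_node_with_enough_data(leaf, parent, occ, min_occ):
    """Walk up from `leaf` and return the first node with occupancy >= min_occ.

    Returns None if even the root does not have enough data.
    """
    node = leaf
    while node is not None and occ[node] < min_occ:
        node = parent[node]
    return node


if __name__ == "__main__":
    for leaf in ("S1", "S2", "S3"):
        print(leaf, "->", lowest_node_with_enough_data(leaf, parent, occ, 70.0))
```

Raising the threshold pushes the chosen node closer to the root, and lowering it lets the choice stay nearer the leaves.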
Does this make sense?
Alastair James
S
Yes, HHEd created the regression tree using the splitting method.
However, HHEd cannot work out the occupancy information on its own (it
has no MFCCs), and therefore needs to load this information from
somewhere. This is where the HERest -s stats file is used?
Make sense?
Alastair
Shona
The occupancy is the result of TRAINING. The occupancy basically says
how much data is available for that state. This MUST be worked out by
aligning all the training data (MFCCs) to the models. This is done in
HERest.
You can't work out occupancies in HHEd on its own. It does not have
any MFCC files.
Say I have 3 states in an HMM...
From HERest:
State   Occupancy
-----------------
S1      o1
S2      o2
S3      o3
Then HHEd builds the following regression tree:
N1 ----------- S1
 |
 |------ N2 ------ S2
          |
          |-------- S3
(I hope you can see that!)
I.e. S1 is directly below N1. N2 is directly below N1. S2 and S3 are
below N2.
The occupancy is worked out as:
Node    Occupancy
-----------------
N1      o1 + o2 + o3   (as all states are below it)
N2      o2 + o3        (as states 2 and 3 are below it)
S1      o1
S2      o2
S3      o3
So you can see that this process still needs the ORIGINAL occupancy
information from HERest!
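The summing in the table above can be sketched in a few lines of Python. The occupancy numbers are made up (they stand in for o1, o2, o3 from the HERest stats), and the dictionary layout and function name are just my own illustration, not anything HHEd actually uses.

```python
# Made-up per-state occupancies, standing in for o1, o2, o3 from HERest.
state_occ = {"S1": 120.0, "S2": 45.5, "S3": 30.25}

# The example tree: N1 has children S1 and N2; N2 has children S2 and S3.
tree = {"N1": ["S1", "N2"], "N2": ["S2", "S3"]}


def node_occupancy(node, tree, state_occ):
    """Occupancy of a node = sum of the occupancies of all states below it."""
    if node in state_occ:
        # A state carries its own occupancy from training.
        return state_occ[node]
    # An internal node sums over its children.
    return sum(node_occupancy(child, tree, state_occ) for child in tree[node])


if __name__ == "__main__":
    for name in ("N1", "N2", "S1", "S2", "S3"):
        print(name, node_occupancy(name, tree, state_occ))
```

Note that nothing here can be computed without `state_occ`, which is the point: the tree only adds up numbers that had to come from training.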
Regards
Alastair
N1 ----------- S1
 |
 |--------- N2 ---------- S2
             |
             |-------- S3
Al
Shona
HTK is not the easiest bit of software to learn!
Alastair
Shona
It does not really go into this in the HTK book. I think it's because you
need to know both sets of occupancies. You need to know the occupancy of
the training data to see if the occupancy of the adaptation data is
statistically significant.
Obviously, the adaptation set of occupancies can be computed at run-time
by HEAdapt, but the training ones need to be pre-computed.
Perhaps one of the Cambridge guys could confirm this?
Alastair
Shona