Many thanks for the previous suggestions and help (including the ICASSP 2018 paper).
Your suggestion improved the accuracy a lot.
I value every second you put into my case, so I wanted to try my best first; here comes my follow-up.
Working on smaller chunks, say 2 s, partially solves the problem, but not completely.
After reading your docs/responses, I understand I can use nnet3-copy to manipulate nnet3 models.
To achieve my streaming goal, I split the originally trained DNN (final.raw) into two parts/steps (something may be wrong, so I provide the details here):
# step 1: keep only the first 5 layers
# (named this DNN dnn_step1; it will be used in a for loop to handle streaming data):
nnet3-copy --prepare-for-test=true --nnet-config='echo output-node name=output input=tdnn5.batchnorm |' --edits='remove-orphans' final.raw dnn_step1
# step 2: the rest of the layers, i.e., the statistics extraction/pooling layers + layer 6 (named this DNN dnn_step2):
nnet3-copy --binary=false final.raw dnn_step2.tmp
# Then remove the component-nodes of the first 5 layers and change the input node and input dim:
grep -v "component-node name=tdnn[1-5]" dnn_step2.tmp | sed -e "s|input-node name=input dim=23|input-node name=input dim=1500|; s|input=tdnn5.batchnorm|input=input|" > tmp.net
# remove orphans:
nnet3-copy --edits='remove-orphans' tmp.net dnn_step2
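In case it helps to reproduce or sanity-check the surgery, the grep/sed step above can also be expressed in plain Python. This is only a sketch over a trimmed config; the node names are the ones from my config below, and everything else is illustrative:

```python
import re

def split_step2(config_text):
    """Drop the tdnn1-5 component-nodes and rewire the input node so the
    second half reads tdnn5.batchnorm activations (dim 1500) directly."""
    kept = []
    for line in config_text.splitlines():
        if re.search(r"component-node name=tdnn[1-5]\.", line):
            continue  # first-half layers live in dnn_step1
        line = line.replace("input-node name=input dim=23",
                            "input-node name=input dim=1500")
        line = line.replace("input=tdnn5.batchnorm", "input=input")
        kept.append(line)
    return "\n".join(kept)

# Trimmed example config (illustrative, not the full final.raw text):
cfg = """input-node name=input dim=23
component-node name=tdnn1.affine component=tdnn1.affine input=input
component-node name=tdnn5.batchnorm component=tdnn5.batchnorm input=tdnn5.relu
component-node name=stats-extraction-0-10000 component=stats-extraction-0-10000 input=tdnn5.batchnorm
"""
print(split_step2(cfg))
```

The result would then go through `nnet3-copy --edits='remove-orphans'` exactly as above to drop the now-unreferenced components.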
In other words, I want to divide the DNN into two parts like the following:
######## dnn_step1
input-node name=input dim=23
component-node name=tdnn1.affine component=tdnn1.affine input=Append(Offset(input, -2), Offset(input, -1), input, Offset(input, 1), Offset(input, 2))
component-node name=tdnn1.relu component=tdnn1.relu input=tdnn1.affine
component-node name=tdnn1.batchnorm component=tdnn1.batchnorm input=tdnn1.relu
...
component-node name=tdnn5.affine component=tdnn5.affine input=tdnn4.batchnorm
component-node name=tdnn5.relu component=tdnn5.relu input=tdnn5.affine
component-node name=tdnn5.batchnorm component=tdnn5.batchnorm input=tdnn5.relu
########### (I want to split the dnn: final.raw from here) ##################
######## dnn_step2
component-node name=stats-extraction-0-10000 component=stats-extraction-0-10000 input=tdnn5.batchnorm
component-node name=stats-pooling-0-10000 component=stats-pooling-0-10000 input=stats-extraction-0-10000
component-node name=tdnn6.affine component=tdnn6.affine input=Round(stats-pooling-0-10000, 1)
...
component-node name=output.log-softmax component=output.log-softmax input=output.affine
output-node name=output input=output.log-softmax objective=linear
====
My plan is to rely on the function RunNnetComputation from the following file:
to do the following operations:
step 1: in a for loop, process the MFCC feature stream with dnn_step1
step 2: get the mean & stddev and derive the x-vector with dnn_step2
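My understanding (please correct me if wrong) is that the stats-pooling step reduces to running sums, so step 1 can stream chunks while step 2 only needs the accumulated first/second-order stats. A toy stdlib-Python sketch of that accumulation (not Kaldi code; I assume population variance here):

```python
import math

def accumulate(chunks):
    """Running first/second-order stats over streamed chunks of frames
    (1-D 'frames' for brevity; the real frames are 1500-dim vectors)."""
    n, s, s2 = 0, 0.0, 0.0
    for chunk in chunks:        # step 1 would emit one chunk per loop
        for x in chunk:
            n += 1
            s += x
            s2 += x * x
    mean = s / n
    var = s2 / n - mean * mean  # population variance (my assumption)
    return mean, math.sqrt(max(var, 0.0))

# Chunked accumulation matches pooling over the whole stream:
frames = [0.5, 1.5, 2.0, 4.0, 3.0, 1.0]
assert accumulate([frames[:2], frames[2:5], frames[5:]]) == accumulate([frames])
```

If this reduction holds, the split version should produce the same mean/stddev as the unsplit model, which is why the system4 mismatch below surprises me.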
The No. 1 challenge here is that it complains:
cindex output(0, 0, 0) is not computable for the following reason
Detailed info is attached:
kNotComputable.log.txt
Since I could not make this work, I also tried the looped version (as mentioned in my earlier emails; thanks for Dan's suggestion).
This makes step 1 run successfully.
Then I continued to rely on RunNnetComputation to finish step 2.
Surprisingly, this approach (let's name it system4) gives different x-vectors and therefore different results.
I assume we should get results nearly identical to the non-split version (system2 or 3; see the following table).
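One boundary effect I suspect (an assumption on my part, not something I have verified in the Kaldi code): a TDNN layer with temporal context needs frames from the neighboring chunk, so naive per-chunk processing changes the frames near each chunk edge. A toy stdlib-Python illustration with a fake context-{-1, 0, +1} layer and edge padding:

```python
def layer(frames):
    """Toy 'TDNN layer' with context {-1, 0, +1}: each output frame sums
    the previous, current, and next input frame."""
    padded = [frames[0]] + frames + [frames[-1]]  # repeat edge frames
    return [padded[i - 1] + padded[i] + padded[i + 1]
            for i in range(1, len(frames) + 1)]

frames = [1.0, 2.0, 3.0, 4.0]
full = layer(frames)                             # whole utterance at once
chunked = layer(frames[:2]) + layer(frames[2:])  # naive 2-frame chunks
print(full)     # [4.0, 6.0, 9.0, 11.0]
print(chunked)  # [4.0, 5.0, 10.0, 11.0]  -> frames near the cut differ
```

If something like this happens at the dnn_step1 chunk boundaries, the pooled stats (and hence the x-vector) would drift from the unsplit model, which could explain part of the system4 gap.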
To sum up, I list the performance (EER) of the different systems here:
system1: i-vector: 9.8% (based on the egs/sre10/v1 framework)
system2: x-vector: 8.5% (based on the egs/sre16/v2 framework, max_chunk_size=5s)
system3: x-vector: 11.2% (my modification, mean version of DecodableNnetSimpleLooped::GetOutputForFrame; previously EER=15.6%, due to a mismatch between training & testing)
system4: x-vector: 14.2% (looped_step1 + step2)
system5: x-vector: 8.8% (based on the egs/sre16/v2 framework, but with a smaller chunk size: max_chunk_size=2s; thanks for David's suggestion)