Hi Roman, there may be many reasons.
1) number of parameters in the neural networks
2) how the data is randomized (the file lists should be joined together
and shuffled as one list)
3) learning rate (should be appropriate to your set size, but if the set
is randomized, it has less impact)
4) what normalization is used
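
For point 2, a minimal sketch of what "joined together" could look like, assuming the file lists are plain Python lists of utterance names. The function name, the utterance names, and the fixed seed are all made up for illustration and are not part of any toolkit:

```python
import random


def joint_shuffled_list(file_lists, seed=0):
    """Concatenate several file lists and shuffle them as ONE list.

    Shuffling each list separately and then concatenating would still
    present one corpus after the other during training.
    """
    merged = [name for file_list in file_lists for name in file_list]
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    rng.shuffle(merged)
    return merged


# Hypothetical utterance names standing in for two corpora.
speechdat = ["sd_%03d" % i for i in range(5)]
conversational = ["conv_%03d" % i for i in range(5)]

mixed = joint_shuffled_list([speechdat, conversational])
print(mixed)
```

The point is only that the shuffle happens after the merge, so utterances from both sets are interleaved.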
In my experience, the utterances in SpeechDat are very short and
contain a lot of silence compared to conversational recordings.
If you apply per-sentence mean and variance normalization, the silence
frames weigh heavily on the statistics, so the melbank features come out
shifted after normalization. It may be worth looking at this.
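
A toy sketch of that effect, assuming a single feature dimension and made-up energy values (2.0 for silence, 10.0 for speech). It is not the normalization code of any particular toolkit, only an illustration of how the silence ratio moves the normalized value of the same speech frame:

```python
import statistics

# Made-up log-energy values for one melbank channel (assumption).
SILENCE = 2.0
SPEECH = 10.0


def normalize_utterance(frames):
    """Per-utterance mean and variance normalization of one dimension."""
    mean = statistics.fmean(frames)
    std = statistics.pstdev(frames)
    return [(x - mean) / std for x in frames]


def speech_value_after_norm(n_silence, n_speech):
    """Normalized value of a speech frame for a given silence/speech mix."""
    frames = [SILENCE] * n_silence + [SPEECH] * n_speech
    return normalize_utterance(frames)[-1]  # last frame is a speech frame


mostly_speech = speech_value_after_norm(n_silence=20, n_speech=80)
mostly_silence = speech_value_after_norm(n_silence=80, n_speech=20)

print(mostly_speech)   # about 0.5
print(mostly_silence)  # about 2.0
```

The identical speech frame lands at about 0.5 in the utterance that is mostly speech and at about 2.0 in the one that is mostly silence, so a network trained on one kind of data sees shifted inputs on the other.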
On 21.7.2015 at 15:10, Roman wrote: