dev file test.txt: 11697 sentences, 65592 words, 0 OOVs, 0 zeroprobs
3gram.kn012.gz logprob= -205753.1 ppl= 459.3325 ppl1= 1370.451 WER=21.32
3gram.me.gz logprob= -219334.9 ppl= 688.4208 ppl1= 2207.637 WER=17.23
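As a sanity check on those numbers: SRILM's ppl and ppl1 can be reproduced from the reported logprob. A minimal sketch, using the standard SRILM definitions (ppl counts the end-of-sentence token of each sentence, ppl1 counts words only):

```python
# Reproduce SRILM's ppl/ppl1 from the base-10 logprob reported by ngram.
S, W = 11697, 65592  # sentences and words in the dev set above

def ppl(logprob, denom):
    # perplexity = 10^(-logprob / token_count)
    return 10 ** (-logprob / denom)

# ppl uses W + S tokens (words plus one </s> per sentence); ppl1 uses W.
print(f"3gram.kn012.gz: ppl={ppl(-205753.1, W + S):.2f} ppl1={ppl(-205753.1, W):.2f}")
print(f"3gram.me.gz:    ppl={ppl(-219334.9, W + S):.2f} ppl1={ppl(-219334.9, W):.2f}")
```

Both quoted lines check out (459.3/1370.5 and 688.4/2207.6), so the perplexities are internally consistent; the odd part is only that the model with the worse perplexity gets the better WER.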
Yes, that really looks strange. One possibility is that the training text is not as in-domain as you think.
$ steps/info/chain_dir_info.pl exp/chain/tdnn1i_sp/
exp/chain/tdnn1i_sp/: num-iters=422 nj=3..12 num-params=10.4M dim=40+100->3512 combine=-0.043->-0.042 (over 8) xent:train/valid[280,421,final]=(-0.972,-0.820,-0.815/-0.984,-0.857,-0.844) logprob:train/valid[280,421,final]=(-0.060,-0.043,-0.042/-0.061,-0.047,-0.048)
Then I augmented the data with reverb, noise, and babble:
$ steps/info/chain_dir_info.pl exp/chain/tdnn1i_online_cmvn_aug_sp/
exp/chain/tdnn1i_online_cmvn_aug_sp/: num-iters=979 nj=3..12 num-params=10.4M dim=40+100->3512 combine=-0.072->-0.071 (over 10) xent:train/valid[651,978,final]=(-1.16,-0.976,-0.956/-1.41,-1.26,-1.22) logprob:train/valid[651,978,final]=(-0.089,-0.068,-0.068/-0.100,-0.086,-0.083)
The WER results for LM weights between 7 and 17 are almost the same.
I think my acoustic model is overfitting, because my train and test sentences were generated automatically and almost all of them follow the same patterns. (I think the phone-sequence statistics are effectively the same in the train and test sets.)
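One quick way to probe that claim is the train/valid gap in the chain_dir_info.pl outputs quoted above. A small sketch that just tabulates the final objectives already shown (a large gap suggests overfitting, though chain objectives are only a rough proxy for WER):

```python
# Final train/valid objective values copied from the two
# chain_dir_info.pl outputs above; gap = train - valid.
finals = {
    "tdnn1i_sp":                 {"xent": (-0.815, -0.844), "logprob": (-0.042, -0.048)},
    "tdnn1i_online_cmvn_aug_sp": {"xent": (-0.956, -1.220), "logprob": (-0.068, -0.083)},
}

for name, objectives in finals.items():
    for obj, (train, valid) in objectives.items():
        print(f"{name:28s} {obj:8s} train={train:+.3f} valid={valid:+.3f} gap={train - valid:.3f}")
```

Interestingly, the gaps here are small for the non-augmented model (0.006 chain logprob, 0.029 xent), which by itself does not look like severe overfitting; if the test text shares the train text's patterns, the problem may show up only on genuinely unseen data.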