Possibly this is the effect of cepstral mean normalization.
David Snyder is going to commit an example script soon showing speech-silence-music segmentation, which might be useful here.
It could also be the effect of roundoff in the decoder, simply from the length of the file. Recompiling with
-DKALDI_DOUBLEPRECISION=1 in kaldi.mk would show whether that is the case, although this is unlikely.
Most of the effect of cepstral mean normalization will disappear, though, if you use adaptation in your decoding pipeline.
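
To give a feel for the roundoff point, here is a rough sketch in plain Python (not Kaldi code) of how a running sum drifts when it is kept in single precision over many steps. The function names, the step size of 0.1, and the count of one million are all made up for illustration; the decoder's actual arithmetic is different, so this only shows the general mechanism that a double-precision build would rule out.

```python
import ctypes


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return ctypes.c_float(x).value


def accumulate(step, count, single_precision):
    """Add `step` to a running total `count` times.

    With single_precision=True the total is rounded to float32 after every
    addition, which mimics an accumulator stored as a 32-bit float.
    """
    if single_precision:
        step = to_float32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single_precision:
            total = to_float32(total)
    return total


# Hypothetical numbers: one million additions of 0.1 (think "many frames").
COUNT = 1_000_000
STEP = 0.1
EXACT = COUNT / 10  # the mathematically exact sum

sum32 = accumulate(STEP, COUNT, single_precision=True)
sum64 = accumulate(STEP, COUNT, single_precision=False)
err32 = abs(sum32 - EXACT)
err64 = abs(sum64 - EXACT)

print("float32 total:", sum32, "abs error:", err32)
print("float64 total:", sum64, "abs error:", err64)
```

The single-precision error grows with the number of additions, which is why a very long file is the case where this could matter at all. If results do not change after rebuilding in double precision, roundoff can be ruled out.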