I tried to run speech recognition on a long wav file. When I set do-endpointing to false, it decodes the complete sentences. However, when I enable endpoint detection, words near segment boundaries are sometimes missing. By counting chunk times, I found that the audio for the missing words actually falls at the end of the previous segment.
So my questions are:
1. What is a possible reason for this? My guess is that it is caused by the language model (i.e. the probability of "<s> how" is much higher than that of "how </s>", so a word like "how" may be dropped at the end of a segment), but I'm not sure.
2. Would you suggest using "EndpointDetected", or simply breaking the audio up into smaller chunks (e.g. 5 minutes each)?
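For reference, this is roughly what I mean by option 2: a minimal sketch of naive fixed-length chunking using Python's standard `wave` module (the function name and file naming scheme are just for illustration):

```python
import wave

def split_wav(path, chunk_seconds=300, prefix="chunk"):
    """Split a WAV file into fixed-length chunks (naive, no overlap)."""
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = chunk_seconds * src.getframerate()
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            # Write this chunk with the same format as the source.
            with wave.open(f"{prefix}_{index:04d}.wav", "wb") as dst:
                dst.setparams(params)
                dst.writeframes(frames)
            index += 1
    return index  # number of chunks written
```

Of course, this naive version can cut a word exactly at a chunk boundary, which is the same kind of loss I'm worried about; adding a small overlap between chunks (or cutting at detected silences) would presumably mitigate that.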
Thank you.