I am using the v1 streaming API to recognize answers to spoken questions. I start a StreamingRecognize stream and then play the question. I stop playing the question as soon as I receive the first interim result, since that means the user has barged in. Because the answers are simple single words, I have been using single_utterance mode.
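For reference, here is a minimal sketch of the flow, assuming the google-cloud-speech Python client; `play_prompt`, `stop_prompt`, and `mic_chunks` are placeholders for my own audio/telephony layer:

```python
from google.cloud import speech

client = speech.SpeechClient()

streaming_config = speech.StreamingRecognitionConfig(
    config=speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    ),
    interim_results=True,   # needed so I can detect barge-in early
    single_utterance=True,  # answers are single words
)

def requests():
    # The first request carries the config; subsequent ones carry audio.
    yield speech.StreamingRecognizeRequest(streaming_config=streaming_config)
    for chunk in mic_chunks():  # placeholder: yields raw LINEAR16 bytes
        yield speech.StreamingRecognizeRequest(audio_content=chunk)

play_prompt()  # placeholder: starts playing the question asynchronously

for response in client.streaming_recognize(requests=requests()):
    for result in response.results:
        stop_prompt()  # first interim result means the user barged in
        if result.is_final:
            print("Answer:", result.alternatives[0].transcript)
```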
The problem occurs when the question is quite long, around 8 seconds, and the user waits until it has finished before answering. In those circumstances I get no results, even though the user has definitely spoken. If I don't use single_utterance mode, I get results regardless of how long the question is.
It seems that the long period of silence before the answer is the problem, as shorter questions work fine.
Is this a reasonable use case for single_utterance mode?