Some more detail on this. The call flow is an MRCPRecog that plays a prompt while listening for user input, and then plays a response.
The incoming RTP packets look, very crudely, like this: the caller is silent, then says something in response to the prompt, and is then silent again.
_______________nnnnnnnnnnnnnn_________________
The flat lines are RTP packets containing silence, and the n's represent audio. The outgoing RTP packets from our side over the same period are:
nnnnnnnnnnnnnn nnnnnnnnnnnnnn
Note we send no RTP packets at all during the listening state.
What happens when we send nothing at all is that the first packet after the silence appears to get rewritten by the VOIP provider. The packet they receive from us after the silence has a sequence number one greater than the prior packet, and its timestamp is 5 seconds after the prior packet's. Our provider seems to rewrite that timestamp to be the timestamp of the prior packet plus its length (160ms). The final destination then sees a mismatch between arrival time and timestamp, interprets it as jitter, and ignores the first set of packets, so the initial part of the prompt after recognition is not heard by the caller.
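To make the timestamp arithmetic concrete, here is a rough sketch of the mismatch. It assumes an 8 kHz RTP clock and 20 ms packets (160 timestamp units per packet), which is typical for G.711 but is my assumption, not something confirmed by the provider. The function names are made up for illustration.

```python
CLOCK_RATE_HZ = 8000       # assumed RTP clock rate (typical for G.711)
SAMPLES_PER_PACKET = 160   # assumed 20 ms of audio per packet at 8 kHz


def timestamp_after_gap(prev_ts, gap_seconds, clock_rate=CLOCK_RATE_HZ):
    """Timestamp the sender puts on the first packet after sending nothing."""
    return (prev_ts + int(gap_seconds * clock_rate)) % 2**32


def rewritten_timestamp(prev_ts, samples_per_packet=SAMPLES_PER_PACKET):
    """Timestamp if an intermediary treats the stream as continuous."""
    return (prev_ts + samples_per_packet) % 2**32


def apparent_lateness_seconds(prev_ts, gap_seconds, clock_rate=CLOCK_RATE_HZ):
    """How late the packet looks to a receiver that trusts the rewritten timestamp."""
    sent = timestamp_after_gap(prev_ts, gap_seconds, clock_rate)
    rewritten = rewritten_timestamp(prev_ts)
    return ((sent - rewritten) % 2**32) / clock_rate


if __name__ == "__main__":
    prev = 1000
    print("sent timestamp:     ", timestamp_after_gap(prev, 5))
    print("rewritten timestamp:", rewritten_timestamp(prev))
    print("apparent lateness:  ", apparent_lateness_seconds(prev, 5), "s")
```

Under these assumptions the first packet after the gap looks almost 5 seconds late relative to its rewritten timestamp, which would be consistent with the far end treating it as extreme jitter and dropping it.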
Our provider says that we should be sending packets during the listening state, even though we're not saying anything at that point, i.e.:
nnnnnnnnnnnnnn____________________nnnnnnnnnnnnnn
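For illustration, this is roughly what those silent packets would look like on the wire. It is only a sketch of the RTP framing, not an Asterisk or UniMRCP API, and it assumes PCMU (payload type 0, silence encoded as 0xFF bytes) with 160 samples per packet. The helper names are hypothetical.

```python
import struct

PCMU_PAYLOAD_TYPE = 0      # static RTP payload type for G.711 mu-law
PCMU_SILENCE_BYTE = 0xFF   # mu-law encoding of digital silence
SAMPLES_PER_PACKET = 160   # assumed 20 ms per packet at 8 kHz


def build_rtp_packet(seq, timestamp, ssrc, payload, payload_type=PCMU_PAYLOAD_TYPE):
    """Build a minimal RTP packet: version 2, no padding, extension, CSRC or marker."""
    header = struct.pack(
        "!BBHII",
        0x80,                    # V=2, P=0, X=0, CC=0
        payload_type & 0x7F,     # M=0, 7-bit payload type
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def silence_packets(start_seq, start_ts, ssrc, count):
    """Yield `count` consecutive silence packets with continuous seq and timestamp."""
    payload = bytes([PCMU_SILENCE_BYTE]) * SAMPLES_PER_PACKET
    for i in range(count):
        yield build_rtp_packet(
            start_seq + i,
            start_ts + i * SAMPLES_PER_PACKET,
            ssrc,
            payload,
        )


if __name__ == "__main__":
    for pkt in silence_packets(start_seq=10, start_ts=1000, ssrc=0x1234, count=3):
        print(len(pkt), pkt[:12].hex())
```

The point is that both the sequence number and the timestamp keep advancing in step through the listening state, so there is no gap for an intermediary to paper over.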
As mentioned, transmit_silence = yes does not seem to produce the required behaviour.
I have tried this with a different VOIP provider and I also experience prompt clipping, so it does not appear to be unique to our provider.
Is there any way to cause silent RTP packets to be transmitted during the listening phase of MRCPRecog?