Capture conversation audio using recordChannelAudio


Spiros Dimopoulos

Jun 16, 2015, 11:25:25 AM
to si...@googlegroups.com
Hi,

I am using version 1.9.8 of resiprocate and the sipXtapi 3.3.0 test17 library package for Ubuntu to set up a simple conversation server using recon.
I want to capture the audio stream during a conversation, in either a file or a buffer. Recon has a class for remote participants, which is instantiated every time a new remote caller enters the conversation. I have found that every remoteParticipant stores the sipXtapi ConnectionID and the MediaInterface pointer. Using those two, I call the recordChannelAudio function of the media interface like this:

participant->getMediaInterface()->getInterface()->recordChannelAudio(participant->getMediaConnectionId(),"/tmp/test.wav");

sipXtapi confirms that recording starts (although it reports the connection ID as -1 rather than the ConnectionID saved in the participant). The WAV file is created and its length is close to the duration of the participant's call, but the audio is total silence. Do I have to adjust the flowgraph to connect the recorder to the channel and capture the audio? Maybe I have to modify the mixer weights somewhere?


Regards,
Spiros

Daniel Petrie

Jun 16, 2015, 12:52:27 PM
to si...@googlegroups.com
Hi Spiros:
In the latest version checked into subversion in the sipX main branch, the connectionId is no longer used at all in recordChannelAudio.  I am not sure where 3.3.0 took its snapshot, but connectionId has been ignored for at least 6 months, perhaps longer.

The weights for the recorder output on the bridge should all be set to 1.0 by default.  It is possible that something in recon is setting them to something different.  I would turn on debug logging for sipX and look for a message like the following:
 "MprBridge::handleSetMixWeightsForOutput(outputPort: %d, numWeights: %d, weights[%s]"

The default output port on the bridge mixer for the recorder is 0 unless someone has customized the topology.

You can set the bridge recorder port output weights back to 1.0 using:
participant->getMediaInterface()->getInterface()->setMixWeightForOutput(0, 1.0)
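
To illustrate why zeroed weights produce a silent recording even though the recorder itself runs normally, here is a minimal, self-contained sketch of a weighted mix for a single bridge output. It is a toy model and not sipX code: the helper name mixOutput, the weight layout, and the sample values are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of one bridge output (hypothetical, not part of sipXtapi):
// each output sample is the weighted sum of the matching sample from
// every bridge input, clamped to the 16-bit PCM range.
std::vector<int16_t> mixOutput(const std::vector<std::vector<int16_t>>& inputs,
                               const std::vector<float>& weights)
{
    assert(inputs.size() == weights.size());
    const std::size_t frameLen = inputs.empty() ? 0 : inputs[0].size();
    std::vector<int16_t> out(frameLen, 0);
    for (std::size_t s = 0; s < frameLen; ++s) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < inputs.size(); ++i) {
            assert(inputs[i].size() == frameLen);  // all frames same length
            acc += weights[i] * static_cast<float>(inputs[i][s]);
        }
        if (acc > 32767.0f) acc = 32767.0f;
        if (acc < -32768.0f) acc = -32768.0f;
        out[s] = static_cast<int16_t>(acc);
    }
    return out;
}
```

With every weight at 0.0 the output frame is all zeros no matter what the inputs carry, which matches a WAV file of the right length that contains only silence. With the weights at 1.0 the inputs are summed into the output.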

Cheers,
Dan





Spiros Dimopoulos

Jun 17, 2015, 4:52:08 AM
to si...@googlegroups.com, dpe...@sipez.com
Dan,

Thanks for your answer. Your help was invaluable. It seems that recon sets all the mixer output weights to zero except those for the calls the participant is attending, which mutes the recorder input. Setting the weight back to 1.0 does the trick, and the audio signal is restored for the recorder resource.

Spiros

Spiros Dimopoulos

Jun 18, 2015, 3:24:25 PM
to si...@googlegroups.com, dpe...@sipez.com
OK, let me ask another question. I set up a recorder in the participant class in recon and, by modifying the bridge weights for output 0, I can record the participant of my choice. Each participant's network RTP stream is connected to a different bridge port. Now let's say that I want to record the call audio but keep each participant's audio contribution in a separate audio file. Given the default topology, which comes with one recorder in each media interface, this seems impossible. The media interface that comes with the MediaAdapterLib and its recordChannelAudio method seem to use only the default recorder. What is the recommended way to solve this? Is it possible to modify the resource topology, add another recorder, and connect it to a different bridge port so that each participant's audio is captured to a different file?

S.

Daniel Petrie

Jun 24, 2015, 3:36:22 PM
to Spiros Dimopoulos, si...@googlegroups.com, Scott Godin
Hi Spiros:
Sorry, I missed your message earlier.  Yes, there is one recorder in the default topology.  The flowgraph construct was originally intended to be used on a single call or conference (where typically you want to record the whole conference mix or a subset of it).  In most applications, there are many flowgraphs instantiated.  In those cases you have a separate recorder for each call or conference/conversation.  Participants are moved from one flowgraph to another to change the participants of a conference/conversation.

Recon uses the flowgraphs a little differently.  It only uses a single flowgraph.  The upside of this is that you can change conversation/conference/mix participants instantly on the fly (moving participants from one flowgraph to another requires temporarily holding the media).  The downside is that there is only a single flowgraph and its associated resources.

Options to have multiple recorders are:
1) Change recon to use multiple flowgraphs
2) Add multiple recorders to the topology, in line with the bridge inputs from the RTP and mic streams, so that each source has a dedicated recorder.
3) Add multiple recorders on additional bridge outputs, which lets you set the output mix for each additional recorder.
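
As a rough illustration of option 3, each additional recorder output would get its own row of mix weights with a single non-zero entry, so that it hears exactly one bridge input. The sketch below only builds such a row. The helper soloWeights is hypothetical, and how the row is applied to a real bridge is left out on purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build the weight row for one recorder output so that it carries only
// the bridge input at index sourceInput (hypothetical helper, not sipX API).
std::vector<float> soloWeights(std::size_t numInputs, std::size_t sourceInput)
{
    assert(sourceInput < numInputs);
    std::vector<float> weights(numInputs, 0.0f);  // mute every input...
    weights[sourceInput] = 1.0f;                  // ...except the chosen one
    return weights;
}
```

For a bridge with five inputs, soloWeights(5, 3) yields a row that passes input 3 at unity gain and mutes the rest. One such row per recorder output gives one file per source.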

I am not sure how much work the 1st option would be, but if you have many 2-way calls or lots of small conversations, it is much more efficient to have more flowgraphs with smaller numbers of bridge mixer inputs and outputs than a single flowgraph with many participants.  The cost of mixing is related to the square of the number of inputs/outputs on the bridge.
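
A back-of-the-envelope sketch of that scaling, assuming the cost is simply proportional to the square of the port count and ignoring the fixed ports (file, tone generator, recorder). Both function names are hypothetical and the units are arbitrary.

```cpp
#include <cassert>
#include <cstddef>

// Relative mixing cost of one bridge with `ports` inputs and outputs,
// under the simplifying assumption cost = ports * ports.
std::size_t bridgeMixCost(std::size_t ports)
{
    return ports * ports;
}

// Total relative cost of several identical, independent flowgraphs.
std::size_t totalMixCost(std::size_t numFlowgraphs, std::size_t portsEach)
{
    assert(portsEach > 0);
    return numFlowgraphs * bridgeMixCost(portsEach);
}
```

For ten 2-way calls, one flowgraph holding all 20 RTP legs costs bridgeMixCost(20) = 400 units under this model, while ten flowgraphs with 2 legs each cost totalMixCost(10, 2) = 40 units.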

The latter two options require that you modify the default topology definitions and then send messages to the new recorders.  This would require lower-level interfaces than the CpTopologyGraphInterface, which is primarily used by applications.

Cheers,
Dan





Spiros Dimopoulos

Jul 27, 2015, 12:52:29 PM
to sipX, sgo...@sipspectrum.com, dpe...@sipez.com
Hi Dan,

I managed to modify the default topology graph and add another recorder connected to a different bridge output port. By modifying the weights as you suggested, I can get each incoming audio stream in a different recorder instance. I also modified the recorder class and added TCP relaying of the audio streams (instead of filesystem output) to another IP address and port. Unfortunately, I am now facing some audio corruption issues. If I use the default frame size of 10 ms, inserted zero samples/frames appear in the call audio (for both participants). The original incoming audio gets time-expanded by these insertions. Increasing the frame size to 40 ms seems to reduce this effect in about 20-30% of cases. When I increase the frame size further, towards 100-120 ms, the audio starts to deteriorate again. This time audio frames are replaced by zero frames, so the audio sounds corrupted.
Also, it seems that the speex and gsm codecs are not working as expected. Although I have managed to compile them into the sipXtapi installation, when speex is used by the SIP clients the call is not relayed at all. The GSM codec loses data (~1-2 seconds at the beginning of the call) and randomly crashes the RTP relay agent. I have used the latest Ubuntu speex and gsm dev packages for compilation and linking. Do you have any suggestions about these issues?

Thank you,
Spiros

Daniel Petrie

Jul 27, 2015, 4:59:22 PM
to si...@googlegroups.com, Spiros Dimopoulos
Hi Spiros:
I think that I need to understand a little more before I can come up with ideas of where the problem is.

The default resource topology, centered around the bridge mixer, looks like the following:
(hopefully not too mangled in email)
Inputs       --------  Outputs
             |Bridge|
             |Mixer |
3+) RTP in  -|      |- 3+) RTP out
2) fromMic  -|      |- 2) toSpeaker
1) ToneGen  -|      |- 1) null
0) FromFile -|      |- 0) recorder
             --------

If I understand correctly you have added more recorders to the topology, something like the following:
Inputs       --------  Outputs
             |Bridge|
             |Mixer |
6+) RTP in  -|      |- 6+) RTP out
5) fromMic  -|      |- 5) toSpeaker
4) ToneGen  -|      |- 4) null
3) ??       -|      |- 3) recorder-3
2) ??       -|      |- 2) recorder-2
1) ??       -|      |- 1) recorder-1
0) FromFile -|      |- 0) recorder-0
             --------

Is this the exact topology or is it different?  Did you add any resources for the corresponding inputs on the bridge to the additional recorders?  It may not be necessary, but I am just trying to understand the situation.

You said there is corruption in the call audio.  Are you referring to the audio sent to a remote client via RTP?  Also, you said that there are extra frames of audio samples that are all zeroed out.  How do you know this?  Are you looking at an RTP capture of what was sent, or some sort of recording or local debug output?  Knowing where and how you observed this is helpful in understanding what is going on.

When using 10 ms frames, what is the nature of the zeroed frames?  Are the extra frames of zero-magnitude samples at seemingly random times or are they at a regular interval?
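
One way to answer the random-versus-regular question is to scan a decoded recording for long runs of zero-valued samples and look at where they start. The following is a minimal sketch under assumptions: it takes 16-bit PCM samples already loaded into memory, and the names ZeroRun and findZeroRuns are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One run of consecutive zero-valued samples (hypothetical type).
struct ZeroRun {
    std::size_t start;   // index of the first zero sample in the run
    std::size_t length;  // number of consecutive zero samples
};

// Return every run of zero samples that is at least minRun samples long.
std::vector<ZeroRun> findZeroRuns(const std::vector<int16_t>& samples,
                                  std::size_t minRun)
{
    assert(minRun > 0);
    std::vector<ZeroRun> runs;
    std::size_t runStart = 0;
    std::size_t runLen = 0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        if (samples[i] == 0) {
            if (runLen == 0) runStart = i;  // a new run begins here
            ++runLen;
        } else {
            if (runLen >= minRun) runs.push_back({runStart, runLen});
            runLen = 0;
        }
    }
    if (runLen >= minRun) runs.push_back({runStart, runLen});  // run at end
    return runs;
}
```

If the recording were 8 kHz mono, for example, a 10 ms frame would be 80 samples, so a minRun of 80 would flag only gaps at least one frame long. Evenly spaced start indices would point to a periodic cause, while scattered ones would point to something load- or timing-dependent.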

Have you reviewed the sipX log to look for error messages?

Cheers,
Dan

Spiridon Dimopoulos

Jul 29, 2015, 5:58:39 PM
to Daniel Petrie, si...@googlegroups.com
Hi Dan,

The topology diagram is great, thanks, really informative. Because recon changes the weights every time a participant is added to or removed from the conversation, I had to keep the first ports intact, so I added the new recorder on the first available port after port 3. That was port 4, since my application does simple 2-way call relaying with local audio disabled (in effect acting as a B2BUA). So my topology looks like this:

Inputs       --------  Outputs
             |Bridge|
             |Mixer |
4) ???????  -|      |- 4) recorder-new
3) RTP in2  -|      |- 3) RTP out2
2) RTP in1  -|      |- 2) RTP out1
1) ToneGen  -|      |- 1) null
0) FromFile -|      |- 0) recorder
             --------

I did not touch the port 4 input, so I am not sure what is connected there; most probably nothing.
My testing setup is to call the recon-based app by using either a SIP client or Asterisk and then record the audio in another Asterisk:

Asterisk ===>>==<<== recon-app ===<<==>>==== Asterisk ---> recorded1.wav
                        |
                        |
                        +-> recorded2.wav

So I have the recorded channel audio in two files, which are identical, and both show distortions that appear to be zero-frame insertions at random positions in the signal: not one or two samples, but big chunks of zero-valued samples. They usually occur during silent sections of the signal, expanding its total duration. But they can also occur inside voiced sections and make the voice sound broken, like a scratched CD. When I increase the number of concurrent calls processed by the app (for example, 10 calls in parallel), the sound degrades further: the zeroed sections become very frequent and the voice becomes impossible to understand.
I did not manage to find the sipX log file, but the sipXtapi log seems to be mapped through a recon class function and is output together with the resiprocate logs. I read through it and did not find any errors.

Spiros