Hello,
I'm trying to implement the same mechanism used by WebRTC-compliant browsers (Chrome, Firefox, etc.) to achieve lip synchronization at presentation time.
I'm using an SFU that allows me to access all the decoded RTP and RTCP packet data, including timestamp, NTP timestamp, sequence number, source ID, etc. (RFC 3550).
I use libav to remux these packets and produce a recording of the WebRTC streams. Everything works fine except that, in my recording, the audio is often not in sync with the video (usually the audio comes later than the corresponding video).
This is the "lip-synchronization" problem mentioned above and, as far as I know, it can be solved by using the RTP timestamp / NTP timestamp pair carried in RTCP SR packets.
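My current understanding, reduced to a minimal sketch (the types and names below are my own assumptions, not taken from libav or any other library), is that every SR ties an NTP timestamp to the RTP timestamp of the same instant, so any RTP timestamp of that stream can be mapped onto the sender's NTP clock:

#include <stdint.h>

typedef struct {
    double   ntp_seconds; /* NTP timestamp of the last SR, converted to seconds */
    uint32_t rtp_ts;      /* RTP timestamp carried in the same SR */
    uint32_t clock_rate;  /* media clock rate, e.g. 90000 for video, 48000 for Opus */
} LastSenderReport;

/* Map an RTP timestamp to the sender's NTP time using the last received SR. */
static double rtp_to_ntp_seconds(const LastSenderReport *sr, uint32_t rtp_ts)
{
    /* signed 32-bit difference also handles RTP timestamp wrap-around */
    int32_t diff = (int32_t)(rtp_ts - sr->rtp_ts);
    return sr->ntp_seconds + (double)diff / (double)sr->clock_rate;
}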
I checked the Chromium source code and saw that it uses an algorithm based on RTCP SR reports to deal with this problem, but unfortunately I'm not able to fully understand the code since it is quite complex and split across several files.
Does anyone know, and can explain in detail, how this mechanism works?
Up to now I can extract the data necessary to implement an algorithm, but I'm not sure how to use it.
I can extract:
- RTP timestamp
- RTP sequence number
- RTCP SR RTP timestamp
- RTCP SR NTP timestamp
- Other RTCP SR fields listed in RFC 3550
My guess is that my recordings are not synced because at the moment I do not use the RTCP SR info to compensate for delays in packet reception.
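If the mapping above is right, my rough guess for the fix (again just a sketch with assumed names, not working recorder code) is that once audio and video timestamps live on the same NTP clock, I can pick a common start instant and derive every packet's PTS from it instead of from the raw RTP timestamps:

#include <math.h>
#include <stdint.h>

/* PTS of a packet whose RTP timestamp maps to ntp_seconds, expressed in the
 * muxer's time base (timebase_den ticks per second, e.g. 1000 for 1/1000 s),
 * relative to the NTP instant chosen as the start of the recording. */
static int64_t ntp_to_pts(double ntp_seconds, double start_ntp_seconds,
                          int timebase_den)
{
    return (int64_t)llround((ntp_seconds - start_ntp_seconds) * timebase_den);
}

/* e.g., with rtp_to_ntp_seconds() from the sketch above:
 *   audio_pts = ntp_to_pts(rtp_to_ntp_seconds(&audio_sr, a_ts), start_ntp, 1000);
 *   video_pts = ntp_to_pts(rtp_to_ntp_seconds(&video_sr, v_ts), start_ntp, 1000);
 * where start_ntp is the smallest NTP time seen on either stream. */

Is this roughly what browsers (and Chromium in particular) do, or is the real algorithm more involved?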
Note that these very same packets are also forwarded to the receiving client, which plays the media with audio and video correctly synced, so obtaining a synced recording should be possible.
Thanks in advance.
Don't hesitate to ask for clarifications.
Simone