Synchronizing an inbound video track from a remote JS peer with Native API local frame decoding

Francesco Pretto

Oct 11, 2018, 9:43:20 AM
to discuss-webrtc
Hello,

Here is the design I am trying to implement properly: I have a JS peer that is sending a video track to a Native API peer. At some point during the transmission (in practice immediately after the connection has been established, but it could be at any moment) I want to start a stopwatch on the JS peer side and perform some timed operations, specifically some rendering on a canvas overlaying the video playback. On the Native peer side I want to be able to synchronize on the instant the stopwatch started on the JS peer, and to consider only received frames recorded after that instant, performing some other kind of processing. What I am doing now (a fragile and limiting solution):
- As soon as the peers connect, tracked through RTCPeerConnection.iceConnectionState, I start the stopwatch on the JS peer;
- As soon as the first webrtc::VideoFrame arrives on the Native peer, I store the frame timestamp;
- On the Native peer I use the first frame timestamp to compute a relative time, similar to what the stopwatch gives me on the JS peer (see the sketch after this list).
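
For clarity, here is a minimal sketch of that workaround. OnFrame() and VideoFrame::timestamp_us() are real Native API entry points; the sink class, its registration, and the Process() hook are illustrative glue, and header paths may differ between branches.

    #include <cstdint>

    #include "api/video/video_frame.h"
    #include "api/video/video_sink_interface.h"

    // Records the first frame's local timestamp and derives a relative time,
    // mirroring the stopwatch started on the JS peer at connection time.
    class RelativeTimeSink : public rtc::VideoSinkInterface<webrtc::VideoFrame> {
     public:
      void OnFrame(const webrtc::VideoFrame& frame) override {
        if (first_timestamp_us_ < 0)
          first_timestamp_us_ = frame.timestamp_us();  // "stopwatch zero"
        const int64_t elapsed_ms =
            (frame.timestamp_us() - first_timestamp_us_) / 1000;
        Process(frame, elapsed_ms);  // hypothetical processing hook
      }

     private:
      void Process(const webrtc::VideoFrame& frame, int64_t elapsed_ms) {}
      int64_t first_timestamp_us_ = -1;
    };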

This design is limiting because I may want to synchronize on any instant, not just on connection establishment, and also fragile because I think the WebRTC protocol is allowed to drop the very first received frames for any reason (delays or transmission errors). Ideally I would like to take a timestamp at the chosen synchronization point on the JS peer, send it to the Native peer, and be able to compare it against webrtc::VideoFrame timestamps. I am unable to do this naively because VideoFrame::timestamp_us() is clearly skewed by some amount I am not aware of. Also, I can't interpret VideoFrame::timestamp(), which is poorly documented in api/video/video_frame.h, and VideoFrame::ntp_time_ms() is deprecated and actually always returns -1. What should I do to accomplish this kind of synchronization between the two peers?

Regards,
Francesco 

Francesco Pretto

Oct 15, 2018, 6:22:47 AM
to discuss-webrtc
Bump. The question looks interesting to me and lacks a definitive answer as to whether the Native Code offers facilities to compare inbound webrtc::VideoFrame timestamps against the remote endpoint clock. I posted the same question to StackOverflow[1] and got the suggestion to encode a signal/metadata in the sent frames on the JS side: even supposing the API support to perform such a task is widely available on the JS side, the question remains whether I can get more remote endpoint timestamp information on the Native Code side, information which I still think should be available somewhere in the stack. Thank you very much for any hint.

Regards,

Francesco Pretto

Oct 17, 2018, 6:07:22 PM
to discuss-webrtc
In an attempt to be constructive I did some research on the matter. From reading RFC 3550 and debugging the Native Code implementation I discovered that:
- the RTP timestamps present in every incoming webrtc::VideoFrame start from a random initial value (5.1 RTP Fixed Header Fields);
- RFC 3550 requires a sender to periodically send a Sender Report (6.4.1 SR: Sender Report RTCP Packet) which contains an NTP timestamp, supposed to be the sender's system wallclock, and an RTP timestamp for the current SSRC (source identifier);
- the SR allows correlating RTP timestamps to the sender's absolute time, allowing for example the synchronization of different RTP sources (e.g. audio and video), as sketched after this list;
- empirically verified: all major browsers I tested (Chrome, Firefox, Edge, Safari) respect the prescription of RFC 3550, so the NTP wallclock is seconds since 1 January 1900 UTC (4. Byte Order, Alignment, and Time Format).
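
To make the correlation concrete, here is an illustrative computation of what a single SR enables, assuming the standard 90 kHz video RTP clock. A real implementation (like the RtpToNtpEstimator discussed below) fits a line over several SRs and handles clock drift.

    #include <cstdint>

    // Illustrative only: maps an RTP timestamp to the sender's NTP wallclock
    // using the most recent Sender Report. Assumes a 90 kHz video RTP clock;
    // the modular difference below tolerates a single 32-bit wraparound.
    int64_t RtpToSenderNtpMs(uint32_t rtp_ts,
                             uint32_t sr_rtp_ts,   // RTP timestamp in the SR
                             int64_t sr_ntp_ms) {  // NTP wallclock in the SR
      const int32_t rtp_delta = static_cast<int32_t>(rtp_ts - sr_rtp_ts);
      return sr_ntp_ms + rtp_delta / 90;  // 90 RTP ticks per millisecond
    }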

To achieve my intended design, specifically to correlate a sync point in the remote JS peer with the Native Code incoming video frame timestamps, I should do the following:
- in the JS peer, take Date.now() at the sync point and convert it to NTP wallclock time (seconds since 1 January 1900 UTC), then send this timestamp to the Native Code peer;
- correlate the RTP timestamps of incoming webrtc::VideoFrame to the sender NTP wallclock, and compare these NTP timestamps against the sync point NTP timestamp above (see the sketch after this list).
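
If frames carried the sender NTP time (the sender_ntp_time_ms() accessor proposed later in this thread, which does not exist upstream), the receiving side would reduce to a per-frame comparison. In this sketch sync_point_ntp_ms is the JS-side timestamp, received for example over a data channel:

    #include <cstdint>

    #include "api/video/video_frame.h"
    #include "api/video/video_sink_interface.h"

    // Hypothetical: sender_ntp_time_ms() is the accessor this thread proposes,
    // not an upstream API. Frames captured before the JS-side sync point are
    // skipped (as are frames still at -1, i.e. before the first usable SR).
    class SyncPointSink : public rtc::VideoSinkInterface<webrtc::VideoFrame> {
     public:
      explicit SyncPointSink(int64_t sync_point_ntp_ms)
          : sync_point_ntp_ms_(sync_point_ntp_ms) {}

      void OnFrame(const webrtc::VideoFrame& frame) override {
        const int64_t frame_ntp_ms = frame.sender_ntp_time_ms();  // proposed
        if (frame_ntp_ms < sync_point_ntp_ms_)
          return;  // captured before the sync point
        Process(frame, frame_ntp_ms - sync_point_ntp_ms_);
      }

     private:
      void Process(const webrtc::VideoFrame& frame, int64_t offset_ms) {}
      const int64_t sync_point_ntp_ms_;
    };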

From static analysis and live debugging of the Native Code, it seems all the utility classes to perform this task are available but **not** properly exposed, nor fed with useful data. In particular:
- webrtc::RtpToNtpEstimator is supposed to correlate RTP to NTP timestamps;
- in webrtc::vcm::RtpVideoStreamReceiver an instance of RtpToNtpEstimator is wrapped by webrtc::RemoteNtpTimeEstimator;
- RemoteNtpTimeEstimator has a completely different purpose than RtpToNtpEstimator: it is meant to estimate an NTP timestamp in the **local** wallclock, not the sender's (see the sketch after this list);
- interestingly enough, RemoteNtpTimeEstimator is never fed with parameters, because to work it also needs the RTT (round-trip time, refer to RtpVideoStreamReceiver::DeliverRtcp()). The RTT seems to be computed only when Receiver Reference Time Reports (RRTR), described by RFC 3611[2], are received, and even though RRTR were "exposed" in a recent patch in M67, they are disabled and non-functional by default.
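
For reference, this is roughly how RtpToNtpEstimator is driven and queried. Signatures are from around M70 and move between branches, so treat names and argument order as approximate:

    #include <cstdint>

    #include "system_wrappers/include/rtp_to_ntp_estimator.h"

    // Sketch: feed the estimator on every received RTCP Sender Report, then
    // query it per frame. Returns -1 until enough SRs have been received.
    int64_t EstimateSenderNtpMs(webrtc::RtpToNtpEstimator& rtp_to_ntp,
                                uint32_t sr_ntp_secs, uint32_t sr_ntp_frac,
                                uint32_t sr_rtp_timestamp,
                                int64_t frame_rtp_timestamp) {
      bool new_sr = false;
      rtp_to_ntp.UpdateMeasurements(sr_ntp_secs, sr_ntp_frac, sr_rtp_timestamp,
                                    &new_sr);
      int64_t sender_ntp_ms = -1;
      rtp_to_ntp.Estimate(frame_rtp_timestamp, &sender_ntp_ms);
      return sender_ntp_ms;
    }

    // webrtc::RemoteNtpTimeEstimator wraps the same machinery but shifts the
    // result into the *local* NTP clock, which is why it also needs the RTT.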

Summarizing, what I'm asking for is actually a missing feature, and it requires quite some internals knowledge to put in place. I sketched an attack plan to implement the feature (see the sketch after this list for the VideoFrame-facing end):
- add a publicly accessible "sender_ntp_time_ms" field to webrtc::VideoFrame and to all the intermediate result classes that precede it in the chain, to allow propagation;
- make it possible for webrtc::RemoteNtpTimeEstimator to use an external instance of webrtc::RtpToNtpEstimator;
- add two instances, respectively webrtc::RtpToNtpEstimator and webrtc::RemoteNtpTimeEstimator, to the base class webrtc::RtpData, or inherit from it, so they can be used by both webrtc::vcm::RtpVideoStreamReceiver and webrtc::Channel (used for audio?). The RemoteNtpTimeEstimator instance should use the external RtpToNtpEstimator;
- on received RTCP packets, inheritors should update either the RemoteNtpTimeEstimator or the RtpToNtpEstimator, depending on the availability of the RTT;
- RtpVideoStreamReceiver (and the corresponding classes for audio) should always estimate "sender_ntp_time_ms" from the local RtpToNtpEstimator and set it on the intermediate classes in the pipeline (probably in RtpVideoStreamReceiver::OnReceivedPayloadData), so it will eventually be propagated to webrtc::VideoFrame.
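
As a sketch, the VideoFrame-facing end of that plan could look like the following; the field and accessor names are the proposal's, not upstream API:

    // Proposed addition to api/video/video_frame.h (not upstream API):
    class VideoFrame {
     public:
      // NTP wallclock of the *sender* for this frame, estimated from RTCP
      // Sender Reports; -1 until the first usable SR has been received.
      int64_t sender_ntp_time_ms() const { return sender_ntp_time_ms_; }
      void set_sender_ntp_time_ms(int64_t ms) { sender_ntp_time_ms_ = ms; }

     private:
      int64_t sender_ntp_time_ms_ = -1;
      // ... existing VideoFrame members ...
    };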

What do you think? Does it make sense?

Regards,
Francesco

Alex Cohn

Oct 21, 2018, 3:27:39 AM
to discuss-webrtc
Hi Francesco,

I believe that what you ask for is a very legitimate feature, one which would be nice to have for many real-life scenarios. Unfortunately, it's not there, and while the native side can be patched, the JS (browser) side will take much longer to comply.

In the meanwhile, the question is: for practical purposes, do you see the wall clock skew between the two ends as so significant that synchronization by the clock, rather than by packet timestamps, is really critical?

BR,
Alex

Francesco Pretto

Oct 22, 2018, 5:06:26 AM
to discuss...@googlegroups.com
On Sun, 21 Oct 2018 at 09:27, Alex Cohn <sasha...@gmail.com> wrote:
> I believe that what you ask for is a very legitimate feature, one which would be nice to have for many real-life scenarios. Unfortunately, it's not there, and while the native side can be patched, the JS (browser) side will take much longer to comply.
>

As far as my use case is concerned, not much is actually needed on the JS side:
- it would be useful to have something like a standardized method WebRTC.ntpNow() which returns the current system implementation of NTP timestamps. According to RFC 3550 (6.4.1), depending on system features, this may be system uptime or, in the worst case, 0 if the system doesn't have a counter for elapsed time;
- there should be some prescription in the RFC that RTP timestamps are computed relative to send time, not frame presentation/encoding time. For now, I'm just assuming it already works like this: I may be wrong.

> In the meanwhile, the question is: for practical purposes, do you see the wall clock skew between the two ends as so significant that synchronization by the clock, rather than by packet timestamps, is really critical?
>

The Native Code doesn't specify very well what VideoFrame::timestamp_us() is; the only certain thing is that it is in the local clock time base. It could be an arbitrarily chosen presentation time, which would be useless to me. It could be the RTP timestamps converted to absolute time in the local clock timebase: to use those for a meaningful comparison, the Native Code should compute the RTT, which according to my investigation it is not doing. I should also compute the RTT myself, or find it somewhere, to convert a timestamp taken on the JS side with Date.now() into the Native side clock. In short, I would still have to do much research on the Native Code, and it could still be a risky approach: I would prefer the Native Code stack to evolve so that things can be done in a much more robust way. For practical purposes I already have the workaround I explained in the first post, which currently fits my needs well enough.

Francesco Pretto

Feb 16, 2019, 12:22:44 PM
to discuss-webrtc
I worked on my original plan and got results. Since I received no comments from project members here, I opened the following issue:


The patch is posted there. I'm also attaching it here together with the introduction message that follows.

----------------------------------------------------
Hello,

Here follows a proposed patch that adds a webrtc::VideoFrame::sender_ntp_time_ms() method, identifying the NTP wallclock time of the sender for the specific frame. Unlike other timestamps present in webrtc::VideoFrame, like VideoFrame::timestamp (the RTP timestamp) or VideoFrame::ntp_time_ms (the timestamp of the frame in local NTP time), which have little or no usefulness for final user code and are in fact marked for future removal, the sender NTP time is extremely useful for applications that require synchronization of events that happen in the remote peer. For example, consider the following design:

- a JS peer is connecting to a Native peer, sending a video stream and having feedback of the transmitted frames on a <video> element. The JS peer wants to perform some timed operations on the UI, like rendering visual cues on an overlay canvas at precise timepoints (NOTE: the cues won't be transmitted in the video stream). The JS peer can synchronize on the RTC connection established event, but ideally it may want to synchronize on any other event, like for example a connection quality assessment performed by the remote Native peer;
- the remote Native peer is receiving the video stream: it wants to process frames according to the timed overlay cues that happen on the client side. The Native peer can synchronize on the first received frame, but the first frame could get lost, making the design fragile. Ideally it would want to synchronize at more accurate timepoints signaled by the JS client.

Since Native WebRTC currently focuses only on sending synchronized audio/video, accurate synchronization of events that happen in the peers is not currently possible. This design has already been illustrated on discuss-webrtc[1], sadly without attracting the interest of any WebRTC project member, despite the fact that such a feature would allow very interesting applications to be developed without the use of hacks.

The positive thing is that facilities to allow synchronization of events between remote peers can be added to Native WebRTC very easily. In fact, any compliant peer periodically sends Sender Reports pairing its NTP wallclock with an RTP timestamp. Estimating the sender NTP time of RTP timestamps[2] is currently performed in two places in Native WebRTC:
1) while synchronizing audio and video, in webrtc::RtpStreamsSynchronizer;
2) when estimating the round-trip time (RTT), needed to compute frame timestamps in local NTP time, which is **not** enabled by default[4]: in webrtc::RtpVideoStreamReceiver::DeliverRtcp() for video and webrtc::ChannelReceive::ReceivedRTCPPacket() for audio.

Because I'm just processing video and not audio, I can't plug into RtpStreamsSynchronizer very easily. Instead, it's quite easy to plug into RtpVideoStreamReceiver and ChannelReceive separately, by enabling RemoteNtpTimeEstimator to independently estimate the sender NTP time even when no RTT is available. The proof of concept patch does exactly this and, with boring propagation steps, exposes the sender NTP time on webrtc::VideoFrame. The only chain link missing to close the circle in the above design is a simple JavaScript function to add on the JS peer side:

    function WebRTC() {};

    // Returns NTP wallclock, milliseconds since 1 Jan 1900
    WebRTC.ntpNow = function()
    {
        // Reference: https://tools.ietf.org/html/rfc5905, NTP Date Format
        // Reference2: https://tools.ietf.org/html/rfc3550, 4. Byte Order, Alignment, and Time Format
        // Reference3: https://chromium.googlesource.com/external/webrtc/+/master/system_wrappers/include/clock.h -> kNtpJan1970
        const NtpJan1970Ms = 2208988800000;
        return Date.now() + NtpJan1970Ms;
    };

This function WebRTC.ntpNow() just returns the current timestamp in the NTP reference time (1 Jan 1900, RFC 3550, 4. Byte Order, Alignment, and Time Format). This timestamp can be sent through the wire, and the Native peer can use it to synchronize on an event that happens on the JS peer (a receive sketch follows). Even if the time in the remote peer is not accurate, the only requirement for accurate synchronization is that it matches the platform's WebRTC NTP implementation. It has been tested to work on Chrome/Firefox/Edge on Windows and Chrome on Android.
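
On the Native side, receiving that timestamp could be a simple data channel observer; this is a hedged sketch, and the wire format (a plain decimal string) is an assumption, not part of any protocol:

    #include <cstdint>
    #include <string>

    #include "api/data_channel_interface.h"

    // Receives the JS-side NTP sync timestamp over a data channel. The decimal
    // string encoding is an assumption made for this sketch.
    class SyncPointObserver : public webrtc::DataChannelObserver {
     public:
      void OnMessage(const webrtc::DataBuffer& buffer) override {
        const std::string text(buffer.data.data<char>(), buffer.data.size());
        sync_point_ntp_ms_ = std::stoll(text);  // NTP ms since 1 Jan 1900
      }
      void OnStateChange() override {}

     private:
      int64_t sync_point_ntp_ms_ = -1;
    };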

The Native WebRTC patch applies to M72 (commit 784fccbd71). While I would love to personally commit these bits to WebRTC, I am currently asking if WebRTC project members could sponsor the feature, for the following reasons:
1) merging the patch as-is would result in two more NtpTimeEstimator(s) always working when also synchronizing audio/video. This is because what is currently performed in RtpVideoStreamReceiver::DeliverRtcp() and ChannelReceive::ReceivedRTCPPacket() is done only when the RTT is available, which again is **not** enabled by default[4]. A lot of cleaning would be required, for example to have webrtc::RtpStreamsSynchronizer use the NtpTimeEstimator(s) of RtpVideoStreamReceiver and ChannelReceive;
2) I currently handle just video, so on the audio side the patch ends with exposing sender_ntp_time_ms on audio_frame (which is still compressed audio, I think);
3) some discussion would be required on the current naming of member variables. For example, having sender_ntp_time_ms makes it unclear what ntp_time_ms is. A better qualified name for ntp_time_ms would be a must.

Please let me know what you think about the work, and don't hesitate to ask me for any clarification.

[1] https://groups.google.com/d/msg/discuss-webrtc/npYIyxSBOLI/oiN_49o8BgAJ
[2] Through "Sender Reports", 6.4.1 https://tools.ietf.org/html/rfc3550
[3] Through "Receiver Reference Time Reports" (RRTR), 4.4 https://tools.ietf.org/html/rfc3611
[4] https://bugs.chromium.org/p/webrtc/issues/detail?id=9108
videoframe_sender_ntp_time_ms.patch

Francesco Pretto

Mar 8, 2019, 9:07:46 PM
to discuss-webrtc


On Saturday, February 16, 2019 at 6:22:44 PM UTC+1, Francesco Pretto wrote:
> I worked on my original plan and got results. [...]

The patch has been updated to also support adding the sender NTP time to audio sinks; I attached it here and in the issue above. I tested with a video and an audio track from a JavaScript client: it works, but the number of SR reports for the audio track is very low compared to video, meaning that for audio the sender time of samples can be estimated only after 5-6 seconds, compared to ~0.5 seconds for video. Can anyone suggest a way to request that the client send more SR reports for the audio track, by editing the SDP or using the Native WebRTC API?
add_sender_ntp_time_ms.patch