DataChannel and video frame synchronization


Lisen Mu

Jan 7, 2018, 10:49:41 PM
to discuss-webrtc
Hi,

I've searched Google and this group, and found that the short answer is NO: with the current JavaScript API, I cannot precisely synchronize metadata sent through a DataChannel with video frames.

I think this is a very common and important feature, especially for use cases in the AI industry.

One example would be overlaying video frames with bounding boxes and labels, generated on the remote peer by modern CV algorithms to annotate objects in the video, and sent through a DataChannel.

In this case, unlike lyrics or movie subtitles, frame-accurate synchronization is mandatory; otherwise the bounding boxes would be misplaced for moving objects.

Currently, many research and industry efforts are pushing AI capability from clusters in data centers out to endpoint devices (cellphones, cameras, smart gadgets, etc.). The hardware/software stacks suited to these low-power devices (TensorFlow Lite and Tengine, to name a few) are maturing, driving the trend called 'edge computing', in which raw data is increasingly processed where it is generated.

If WebRTC could provide such a synchronization method, it has the potential to become the de facto access protocol for devices like IP cameras with CV capabilities: we could provide a unified user experience by building the UI in HTML/JS and using WebRTC to communicate with the remote devices.


Correct me if anything is wrong,

Thanks

Harald Alvestrand

Jan 7, 2018, 11:27:23 PM
to WebRTC-discuss
You are not the first. Which times do you want to know: when the frame was produced, when it was received, when it was successfully decoded, when the browser intends to display it, or when the browser actually displayed it? Unfortunately these are all different, and the relationships between them vary. Before designing a mechanism, we should be sure we are solving the right problem.

--

---
You received this message because you are subscribed to the Google Groups "discuss-webrtc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/discuss-webrtc/2691d45d-f5ae-4633-af41-35527cc61448%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Lisen Mu

Jan 8, 2018, 2:38:50 AM
to discuss-webrtc
Thanks for the reply,

In my scenario, the problem is: the data generated by processing a frame should be rendered together with that frame.

Which means there are two problems for the receiver: a) which metadata messages in the DataChannel are associated with each frame, and b) what the right time is to render that data.

For a), some kind of ID mechanism could map DataChannel data to frames. Since frames have no user-defined attributes (and supporting them seems too heavy), the frame's capture timestamp could serve as such an ID, as long as that timestamp is accessible from the API.

For b), a callback hook at render time would be sufficient.
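Points a) and b) can be sketched together as a small receiver-side buffer: metadata arriving on the DataChannel is keyed by a frame ID (the capture timestamp, in this proposal), and a render-time hook looks the ID up as the frame is about to be displayed. All names here (`FrameMetadataSync`, `frameId`) are hypothetical, and no such render hook exists in the current API; the sketch only shows the bookkeeping:

```javascript
// Receiver-side matcher, assuming the sender tags each metadata message
// with the same frame ID that can somehow be recovered for the decoded
// frame. The class and method names are illustrative, not a WebRTC API.
class FrameMetadataSync {
  constructor(maxEntries = 120) {
    this.pending = new Map();     // frameId -> metadata, in arrival order
    this.maxEntries = maxEntries; // cap memory if frames get dropped
  }

  // Called from the DataChannel's onmessage handler.
  onMetadata(frameId, metadata) {
    this.pending.set(frameId, metadata);
    // Evict the oldest entry once the buffer is full (Map keeps insertion order).
    if (this.pending.size > this.maxEntries) {
      const oldest = this.pending.keys().next().value;
      this.pending.delete(oldest);
    }
  }

  // Called from the hypothetical render hook, once per displayed frame.
  onFrame(frameId) {
    const metadata = this.pending.get(frameId) ?? null;
    this.pending.delete(frameId);
    return metadata; // null if the metadata has not arrived (yet)
  }
}
```

The eviction cap matters because frames may be dropped by the jitter buffer, so some metadata entries are never consumed.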

gustav hallström

Aug 27, 2018, 9:56:48 AM
to discuss-webrtc
Hi,

I would also like to see a way to synchronize DataChannel messages with frames. I work on an application where we perform calculations on a frame before it is sent. The results of these calculations are then sent via a DataChannel to the remote peer, where they are used for image processing. If the remote peer could check when the frame was created on the sender side, or see a frame count, I could synchronize the frame with the message on the DataChannel.

I have looked through the source code and seen multiple timestamps. It seems to me that all of these are set on the remote peer; is that correct? I have also seen a variable named "transport_frame_id", which sounds like what I am looking for, but right now it only returns the RTP timestamp. Is this variable destined for a different purpose in the future?

Thanks
//Egga

Harald Alvestrand

Aug 28, 2018, 1:06:51 AM
to WebRTC-discuss
It seems you are looking for a way to identify frames when you capture them on the sending side, and a way to get the same identifier when you capture the same frame on the receiving side. Is that correct?

Tying the question to time makes it harder to solve, since we then need to consider which clock we're referring to at any given moment (the sender's and receiver's clocks are generally not completely in sync).



gustav hallström

Aug 28, 2018, 7:36:08 AM
to discuss-webrtc
Clocks being out of sync would not be a problem for us, since we want to compare two timestamps that are both set on the sending side.

We first capture a frame via Windows' built-in APIs. The frame then goes through some calculations, and the results are sent over a DataChannel together with a timestamp we set ourselves. While still on the sending side, we convert the same frame to a WebRTC VideoFrame, which gets its own timestamps. Our problem, however, is that the timestamps on this frame are set anew on the receiving side. We would like to keep the timestamps set on the sending side, so we could compare them to the timestamps of the calculations.

In short we want to match the result of our calculations with the frame they were performed on.

The best solution would be if we had a frame sequence number since we then could compare the frame number for the calculations with the frame number of a received frame.
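The sender-side half of this pairing scheme is straightforward to sketch, assuming the application can read its own capture timestamp before handing the frame to WebRTC (`dataChannel` and `captureTimestampMs` are placeholders, not WebRTC API; the open problem in this thread is precisely that the receiver cannot currently read that timestamp off the decoded frame):

```javascript
// Stamp each calculation result with the sender-side capture timestamp
// before it goes over the DataChannel; the receiver could then match the
// message against any frame carrying the same timestamp. In the scenario
// above, the timestamp would come from the Windows capture API.
function sendFrameResult(dataChannel, captureTimestampMs, result) {
  const message = JSON.stringify({
    t: captureTimestampMs,  // the shared key, set once on the sending side
    result,                 // output of the per-frame calculations
  });
  dataChannel.send(message);
  return message;
}
```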

gustav hallström

Aug 30, 2018, 5:42:32 AM
to discuss-webrtc
It seems to me that a way to track frames from the sender by looking at a sequence number on the receiving side would be very useful in many scenarios, as it would enable syncing DataChannel messages with frames.

Is this something you are considering implementing?

Harald Alvestrand

Aug 30, 2018, 7:05:28 AM
to WebRTC-discuss
Yes, it's been considered; there's no active project that has it on its TODO list just now, AFAIK.

One problem is finding (or synthesizing) an identifier that is both available as early as possible and visible at both the sender and the receiver. The RTP timestamp, for instance, is visible in the code at both ends, but it is only assigned to the video frame when it is being sent, while the capture timestamp is only visible on the sender side; it is not carried over the wire in any standard format AFAIK, so adding it would be a protocol extension.

On Wed, Aug 29, 2018 at 4:11 PM, gustav hallström <gust...@gmail.com> wrote:
It seems to me that a way to track frames from the sender by looking at a sequence number on the receiving side would be very useful in many scenarios. This would enable syncing messages over a datachannel with frames.

Is this something you are considering implementing?


Eric Davies

Aug 30, 2018, 1:36:52 PM
to discuss-webrtc
One admittedly ugly approach: embed a code in the image itself along one of the edges, and mask that edge. The code would have to be coarse enough to survive a certain degree of lossy compression. For example, with a row of black or white blocks of size 4x4, you might be safe at reasonable loss levels. On the sending side, you could do this by rendering the local video into a canvas, drawing your code on the canvas, and getting a media stream from the canvas (workable in Chrome; not sure about other browsers). On the receiving side, you crop that edge by putting the video element in a slightly smaller div.

I believe people have used this approach in the past to synchronize audio with video, by encoding the audio as part of each video frame.
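The encode/decode half of this idea can be sketched over raw RGBA pixel data as it would come from canvas `getImageData()`. Each bit of the frame ID is painted as a block-wide black (0) or white (1) run along the top edge; the decoder samples the centre of each block and thresholds, so moderate compression noise is tolerated. All names here are illustrative, and the block/bit sizes are assumptions:

```javascript
const BLOCK = 4;   // pixels per bit, in both directions
const BITS = 16;   // frame IDs wrap at 2^16

// Paint the frame ID as a row of black/white 4x4 blocks along the top edge.
// `pixels` is RGBA data, so each pixel occupies 4 consecutive bytes.
function encodeFrameId(pixels, width, frameId) {
  for (let bit = 0; bit < BITS; bit++) {
    const value = (frameId >> bit) & 1 ? 255 : 0;
    for (let y = 0; y < BLOCK; y++) {
      for (let x = bit * BLOCK; x < (bit + 1) * BLOCK; x++) {
        const i = (y * width + x) * 4;
        pixels[i] = pixels[i + 1] = pixels[i + 2] = value; // RGB
        pixels[i + 3] = 255;                               // opaque alpha
      }
    }
  }
}

// Sample the centre of each block and threshold, tolerating lossy noise.
function decodeFrameId(pixels, width) {
  let frameId = 0;
  const y = Math.floor(BLOCK / 2);
  for (let bit = 0; bit < BITS; bit++) {
    const x = bit * BLOCK + Math.floor(BLOCK / 2);
    const i = (y * width + x) * 4;
    if (pixels[i] > 127) frameId |= 1 << bit; // threshold on the red channel
  }
  return frameId;
}
```

In practice the block size and bit count would need tuning against the actual codec settings; heavy compression or downscaling can still destroy the stripe.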

gustav hallström

Aug 31, 2018, 5:46:29 AM
to discuss-webrtc
OK, thank you for your response! Just as a note, we would greatly appreciate any implementation of frame/DataChannel synchronization in the future.

Cao Phong

Aug 31, 2018, 5:46:53 AM
to discuss-webrtc
This can be achieved using RTP header extensions, but it's only available in the native APIs. Here are some examples:
https://github.com/phongcao/webrtc-metadata-example
https://github.com/phongcao/webrtc-mrvc-sample

gustav hallström

Aug 31, 2018, 5:47:31 AM
to discuss-webrtc
Thanks for the suggestion, Eric. Yes, this is something we are thinking about, as it seems to be the best solution to the problem right now. I did not know that people have used this for audio syncing, though; that is good to know!

Alexandre GOUAILLARD

Sep 1, 2018, 12:22:22 AM
to discuss...@googlegroups.com
RTP header extensions help embed the data, but finding the right data to sync on can be problematic. Also, while an audio frame usually fits in a single RTP packet, a video frame is usually split across several RTP packets, so if you have metadata attached to a single frame, you may end up duplicating it in each packet belonging to that frame. Finally, in WebRTC the maximum size of an RTP packet is limited to fit inside an MTU (1400 bytes?) so that it does not get further fragmented at the network layer. If the metadata you are trying to add is too large, you will have a problem.
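The duplication cost can be made concrete with a back-of-the-envelope calculation. The 12-byte fixed header and 1400-byte MTU below are illustrative assumptions, not values taken from the WebRTC source:

```javascript
// If per-frame metadata rides in a header extension of every RTP packet,
// a large encoded frame multiplies the cost: the frame is split across
// several packets and the metadata is repeated in each one.
function metadataOverhead(encodedFrameBytes, metadataBytes, mtu = 1400) {
  const rtpHeaderBytes = 12;                      // fixed RTP header, no CSRCs
  const payloadPerPacket = mtu - rtpHeaderBytes - metadataBytes;
  if (payloadPerPacket <= 0) return null;         // metadata alone blows the MTU
  const packets = Math.ceil(encodedFrameBytes / payloadPerPacket);
  return {
    packets,                                  // packets needed for the frame
    duplicatedBytes: packets * metadataBytes, // total metadata bytes sent
  };
}
```

For example, a 30 kB keyframe with 100 bytes of per-frame metadata ends up carrying the metadata 24 times, and metadata larger than the MTU budget cannot be sent this way at all.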

You could embed it directly in the video frame, before encoding, using the native multiplex codec class. "Currently it is a completely modular/optional codec you can add that can send/receive RGBA frames + custom metadata. It is used in some of [Google's] AR/VR projects." It avoids most of the above problems.


On Fri, Aug 31, 2018 at 5:47 PM gustav hallström <gust...@gmail.com> wrote:
Thanks for the suggestion Eric. Yes this is something we are thinking about as it seems to be the best solution to the problem right now. I did not know that people have used this for audio syncing however. That is good to know!



--
Alex. Gouaillard, PhD, PhD, MBA
------------------------------------------------------------------------------------
President - CoSMo Software Consulting, Singapore
------------------------------------------------------------------------------------

Alexandre GOUAILLARD

Sep 1, 2018, 12:26:33 AM
to discuss...@googlegroups.com
Sorry, wrong link for the multiplex encoder class; please find the correct one below:

gustav hallström

Sep 3, 2018, 4:53:45 AM
to discuss-webrtc
Thanks, Alexandre!
Very interesting suggestion. Unfortunately we are not able to use the extension if it is not included in the main source. It would be great to be able to send RGBA frames, though.

gustav hallström

Sep 3, 2018, 4:53:45 AM
to discuss-webrtc
Thank you, Cao Phong! Unfortunately we need to be compatible with the JavaScript API, so rebuilding WebRTC is not an option. Otherwise it would have been a great solution.

Jesse G

Dec 19, 2018, 2:48:35 AM
to discuss-webrtc
Hello all, thanks for the great info in this thread!  

I believe my goal is similar to those expressed here: in particular, I'm looking to preserve the association of data with a specific video frame across the WebRTC link, where the receiving end of the connection is in JavaScript (in-browser). This could take the form of the sender assigning the frame a unique ID that is accessible to the receiver, or it could even be done through a read-only value already present and accessible at both ends of the link that could be used as a key (such as a consistent timestamp).

I'm getting the picture from this thread that it may not be possible, which is a surprising answer!

It sounds like there is no such timestamp or other value usable as a key that is already accessible both to the sender and to a (javascript) receiver? 

If I had control over the native code at both ends, it looks like I could try the custom RTP header strategy that Cao Phong created, or the custom codec strategy that Alexandre Gouaillard mentioned, however in my case I need to use a receiver running in a stock browser.

Alexandre, it looks like the custom codec you described is in the chromium repository, is there a chance this strategy is destined to make it into the stock chrome browser?  Or does anyone know of any other plans in the works that would make this synchronization possible in the future?

Gustav did you end up finding a solution?  Or did you have to go with a visual barcode or similar?

Thanks very much for any hints!!
Jesse

Alexandre GOUAILLARD

Dec 19, 2018, 3:26:49 AM
to discuss...@googlegroups.com

This could take the form of the sender assigning the frame a unique ID that is accessible by the receiver, or it could even be done through a read-only value that may be already present and accessible by both ends of the link that could be used as a key (such as a consistent timestamp).

Unless you generate the frames yourself (from a canvas) and have a clock synchronized across both ends, this is not possible. Specifically, today the output of a capturer is a video track, which is opaque. Correspondingly, on the receiving side, unless you bypass the <video> element completely (not recommended by any means), you do not control the rendering speed or timing, so you cannot re-sync data and frame.

I'm getting the picture from this thread that it may not be possible, which is a surprising answer!

That was not in the original requirements for WebRTC 1.0. It is now in the requirements of WebRTC NV.

It sounds like there is no such timestamp or other value usable as a key that is already accessible both to the sender and to a (javascript) receiver? 

Yes, the media path is pretty opaque. 

If I had control over the native code at both ends, it looks like I could try the custom RTP header strategy that Cao Phong created, or the custom codec strategy that Alexandre Gouaillard mentioned, however in my case I need to use a receiver running in a stock browser.

True. 

Alexandre, it looks like the custom codec you described is in the chromium repository, is there a chance this strategy is destined to make it into the stock chrome browser?

Not that I know of.
 
 Or does anyone know of any other plans in the works that would make this synchronization possible in the future?

It is on the table for WebRTC NV.

Harald has done some preliminary work to study things at the frame level and the document is shared here: https://alvestrand.github.io/webrtc-framelog/

Right now, several people have expressed the hope that a common transport protocol (*cough*QUIC*cough*) would implicitly solve the problem. Others have objected, claiming this is not necessary to address the problem, without proposing any alternative solution so far. It feels like it's going to take some time for the Working Group to reach a consensus. (There is a shortage of consensus nowadays...)

There was consensus at the interim face-to-face meeting in Stockholm in April about opening lower-level APIs, for example providing access in JS to raw frames and encoded frames. I believe this would be a first step in the right direction.

HTH.


gustav hallström

Dec 19, 2018, 5:02:19 AM
to discuss-webrtc
Hi, we have experimented a bit with adding info into the frame. However, we saw this issue and thought that it may be interesting:


We have not investigated further but it sounds interesting for sure.

Alexandre GOUAILLARD

Dec 19, 2018, 10:46:56 AM
to discuss...@googlegroups.com
That's the class I pointed to in my September 1st e-mail in this thread.


Jesse Gray

Jan 2, 2019, 11:27:33 PM
to discuss-webrtc
Thank you Alexandre and Gustav for your responses!

I will keep an eye on that bug/feature-request!

Alexandre, you bring up a great point: the entire render loop is out of my control in a <video> element, so my JavaScript code can't really interact with individual frames. Whether or not the frames have a unique ID is beside the point in this case, since I wouldn't know or control when they are displayed!

Unless you generate the frames yourself (from a canvas) and have a clock synchronized across both ends, this is not possible. Specifically, today the output of a capturer is a video track, which is opaque. Correspondingly, on the receiving side, unless you bypass the <video> element completely (not recommended by any means), you do not control the rendering speed or timing, so you cannot re-sync data and frame.

You mentioned that it might be possible (but not recommended) to bypass the <video> element on the receiving end; is that possible in pure JavaScript in the browser? All the ways I've found to get data out of the MediaStreamTrack (via JavaScript) involve going through a <video> element. If there is a way to bypass the <video> element in the receiver, it could be an interesting option, although perhaps it comes with a huge performance hit, or isn't possible from JavaScript? (I have a bit more flexibility on the sending side: in my case the sending side will have native-level access; it is the receiving end that must run in-browser.)

Harald has done some preliminary work to study things at the frame level and the document is shared here: https://alvestrand.github.io/webrtc-framelog/

Thank you for the link to this document on frame-level logging. It looks like this extension would make a number of interesting timestamps available to the receiver, some of which would also have been available to the sender, so they could function as a unique ID "key" for the frame. That is what I was initially hoping for, but as you say, they couldn't actually be used to synchronize metadata display with frame rendering while the <video> element is opaquely handling the render timing!

Thanks very much!
Jesse

Alexandre GOUAILLARD

Jan 4, 2019, 6:19:06 AM
to discuss...@googlegroups.com
Unless you generate the frames yourself (from a canvas) and have a clock synchronized across both ends, this is not possible. Specifically, today the output of a capturer is a video track, which is opaque. Correspondingly, on the receiving side, unless you bypass the <video> element completely (not recommended by any means), you do not control the rendering speed or timing, so you cannot re-sync data and frame.

You mentioned that it might be possible (but not recommended) to bypass the <video> element on the receiving end; is that possible in pure javascript in the browser?

What I meant is to bypass the <video> element for the rendering only; you unfortunately still need it, for now, to access the video frames.
 
All the ways I've found to get data out of the MediaStreamTrack (via javascript) involve going through a <video> element.  If there is a way to bypass the <video> element in the receiver,

that is also my understanding.
 
it could be an interesting option, although perhaps it comes with a huge performance hit, or isn't possible from javascript?  (I have a bit more flexibility on the sending side: in my case it happens that on the sending side I will have native-level access, it is the receiving end that must run in-browser.)

There is no solution I could think of that would be acceptable in terms of speed and delay. Our hope is to get one of those lower-level APIs discussed in the context of WebRTC NV available soon, to be able to work at the frame level. While the design is clear, consensus hasn't been reached within the working group, so we don't know when we can hope to have it.

Another hope is QUIC, but there again... well, let's say we're even further away from a consensus.
 

Harald has done some preliminary work to study things at the frame level and the document is shared here: https://alvestrand.github.io/webrtc-framelog/

Thank you for the link to this document on frame level logging, it looks like this extension would make available a number of interesting time-stamps to the receiver, some of which would also have been available to the sender, so they could function as a unique id "key" for the frame.  That is what I was initially hoping for, but as you say they couldn't actually be used to synchronize metadata display with image frame rendering while the <video> element is opaquely handling the render timing!

Thanks very much!
Jesse
