Error Message: Non-Monotonically Increasing dts to muxer

2,737 views
Skip to first unread message

John L. Poole

unread,
Jun 14, 2020, 11:08:15 PM6/14/20
to moonfire-nvr-users

Dear subscribers of moonfire-nvr,

I wanted to give a heads up on a task I'm undertaking.

I'm researching and working up a bug submission for errors which occur when streams from my Reolink cameras come into an ffmpeg client.  Scott Lamb has indicated (Issue #36 ) this is a bug in the code of ffmpeg; however, the bug has not yet been submitted to on the ffmpeg Trac system.  The significance of this bug is that Scott feels it's a gating factor to handling audio streams in Moonfire-nvr.

Here's a screen shot of the error "Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1738 >= 1738" generated in a session of ffmepg accessing the rtsp URL for my Reolink Camera.

The error has been reproduced running ffmpeg against a public streaming server which streams a video:
./ffmpeg -report -rtsp_transport tcp -i rtsp://wowzaec2demo.streamlock.net/vod/mp4:BigBuckBunny_115k.mov -map 0:v -filter:v showinfo -frames:v 10 -f null /dev/null
ffmpeg started on 2015-11-19 at 22:18:19
See: https://trac.ffmpeg.org/ticket/5018

I confess I'm still working on this and trying to understand what is causing this and that has resulted in my spending a lot of time reading and doing some experiments, especially to standardize a testing script which can be used to duplicate the error.  I spent Saturday morning fine tuning a bash script that can serve as a template for mass testing and produce preserved-color logs with line numbering.  (I may have some live video cameras made available to me from others soas to test against a variety of commercial cameras.)  I've determined my Reolink cameras are using LIVE555 streaming servers.  I installed a LIVE555 streaming server on a Gentoo system, but after reading the fine print and learning it is licensed only for 1 client and the project owner's opinion that anyone with a gmail.com email address is not a professional worthy of being able to submit email to their email list, I reconsidered venturing into that area.  I wanted to rule out the possibility that the LIVE555 server was the cause of the problem.

Right now, I'm trying to configure nginx running on Gento to act as a streaming server for content fed to it from ffmpeg.  I want to confirm that an MP4 file served up from ffmpeg [server] ->nginx->another ffmpeg [client] causes the error and that it is reproducible from the same file being streamed.  This kind of scenario, coupled with problem I have encountered with streams from my Reolink cameras and the Big Buck Bunny test from wowzaec2demo, ought to convincingly demonstrate the problem lies with ffmpeg.

I believe if I submit something from my Reolink cameras through my network, there will be wiggle room for the ffmpeg developers to raise questions and suggest it is not their code that is causing the problem.  So, my current goal is to have an MP4 source file, e.g. a short snippet of the Big Bunny video, send by an ffmpeg session to nginx which streams to an ffmpeg client that causes these errors to appear.  The problem with using live video cameras is you do not have a reproducible control situation and there are factors which cannot be reproduced, e.g. faulty camera, network problems.  My preliminary test against a streaming server sending a snippet of Big Buck Bunny, as Scott pointed out in FFmpeg's Trac ticket # 5018,  was the same: my ffmpeg session flagged "non monotonically" errors.

I also want to know and understand why this is occuring so I can articulate the problem and give Scott's opinion and unassailable position.  Hence, my venture into the Rust code which has lead me to the ISO standards.  Scott in Issue #36 points to ffmpegs code in libavformat/rtpdec.c asserting the problem is occurring there.  Unfortunately, I do not see why the referred-to code has a problem as I do not have the expertise Scott does, nor do I have a working familiarity with the packet processing, so I'm facing a large learning curve.

I said to Scott I hoped to have something this weekend worked up, but I don't.  I've probably spent over 10 hours so far and still feel I have a way to go.

Cheers,
John

Scott Lamb

unread,
Jun 15, 2020, 2:36:09 AM6/15/20
to John, moonfire-nvr-users
On Sun, Jun 14, 2020 at 8:08 PM John L. Poole <jlpo...@gmail.com> wrote:

Dear subscribers of moonfire-nvr,

I wanted to give a heads up on a task I'm undertaking.

I'm researching and working up a bug submission for errors which occur when streams from my Reolink cameras come into an ffmpeg client.  Scott Lamb has indicated (Issue #36 ) this is a bug in the code of ffmpeg; however, the bug has not yet been submitted to on the ffmpeg Trac system.  The significance of this bug is that Scott feels it's a gating factor to handling audio streams in Moonfire-nvr.

Here's a screen shot of the error "Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1738 >= 1738" generated in a session of ffmepg accessing the rtsp URL for my Reolink Camera.

I'm thinking now this particular variant of the message has to be a red herring. In particular, notice that it has a very low number that may just match the frame rather than be a timestamp in 90,000ths of a second. Maybe this is an unrelated bug in the "show_info" filter.

The problem I experienced was less frequent. My theory is that the bug only happens on SNTP time jumps, so no more often than the SNTP interval configured in the camera's UI. My SNTP interval on my cameras was set to 1 minute, so this would happen about once every minute. (Maybe about once every two minutes if it jumps forward and back equally.)

Furthermore, maybe the SNTP problem is more rare/esoteric than I'd realized. I don't seem to be experiencing it on my Dahua IPC-HDW5231R-Z anymore. I don't think the relevant ffmpeg code has changed—the snippet I quoted before is still there. I think maybe my camera's behavior has changed as I've updated firmware since then. It's possible it now has "proper" NTP support and slews time. (I'll explain this more below.) It's also possible that the likelihood of this bug occuring literally depends on the weather. I believe temperature is a factor in how well these cameras keep time.

I let a stream run for a while and don't actually see any non-increasing ptses in it. Stream command:

./ffmpeg -loglevel debug -rtsp_transport tcp -rtsp_flags prefer_tcp -i 'rtsp://admin:reda...@192.168.5.103:554/Streaming/Channels/102?transportmode=unicast&profile=Profile_2' -f null -filter:v showinfo /dev/null > logs 2>&1

Command to search for non-decreasing ptses. (I hope the perl usage warms your heart.)

perl -ne 'm{pts:(\d+)} or next; if ($1 <= $pts) { print; } $pts = $1' logs

It didn't find anything, where my notes I just dug up from 25 Sep 2017 say this same command did then.

The error has been reproduced running ffmpeg against a public streaming server which streams a video:
./ffmpeg -report -rtsp_transport tcp -i rtsp://wowzaec2demo.streamlock.net/vod/mp4:BigBuckBunny_115k.mov -map 0:v -filter:v showinfo -frames:v 10 -f null /dev/null
ffmpeg started on 2015-11-19 at 22:18:19
See: https://trac.ffmpeg.org/ticket/5018

I confess I'm still working on this and trying to understand what is causing this and that has resulted in my spending a lot of time reading and doing some experiments, especially to standardize a testing script which can be used to duplicate the error.  I spent Saturday morning fine tuning a bash script that can serve as a template for mass testing and produce preserved-color logs with line numbering.  (I may have some live video cameras made available to me from others soas to test against a variety of commercial cameras.)  I've determined my Reolink cameras are using LIVE555 streaming servers.  I installed a LIVE555 streaming server on a Gentoo system, but after reading the fine print and learning it is licensed only for 1 client and the project owner's opinion that anyone with a gmail.com email address is not a professional worthy of being able to submit email to their email list, I reconsidered venturing into that area.  I wanted to rule out the possibility that the LIVE555 server was the cause of the problem.

Right now, I'm trying to configure nginx running on Gento to act as a streaming server for content fed to it from ffmpeg.  I want to confirm that an MP4 file served up from ffmpeg [server] ->nginx->another ffmpeg [client] causes the error and that it is reproducible from the same file being streamed.  This kind of scenario, coupled with problem I have encountered with streams from my Reolink cameras and the Big Buck Bunny test from wowzaec2demo, ought to convincingly demonstrate the problem lies with ffmpeg.

I believe if I submit something from my Reolink cameras through my network, there will be wiggle room for the ffmpeg developers to raise questions and suggest it is not their code that is causing the problem.  So, my current goal is to have an MP4 source file, e.g. a short snippet of the Big Bunny video, send by an ffmpeg session to nginx which streams to an ffmpeg client that causes these errors to appear.  The problem with using live video cameras is you do not have a reproducible control situation and there are factors which cannot be reproduced, e.g. faulty camera, network problems.  My preliminary test against a streaming server sending a snippet of Big Buck Bunny, as Scott pointed out in FFmpeg's Trac ticket # 5018,  was the same: my ffmpeg session flagged "non monotonically" errors.

If the problem is indeed the RTCP Sender Reports, though, I don't think you'll ever be able to reproduce that from .mp4 files. They're not saved there; it's a very RTSP-specific concept.
 
I also want to know and understand why this is occuring so I can articulate the problem and give Scott's opinion and unassailable position.  Hence, my venture into the Rust code which has lead me to the ISO standards.  Scott in Issue #36 points to ffmpegs code in libavformat/rtpdec.c asserting the problem is occurring there.  Unfortunately, I do not see why the referred-to code has a problem as I do not have the expertise Scott does, nor do I have a working familiarity with the packet processing, so I'm facing a large learning curve.

Let me explain my theory a little more and what that code snippet does.

An RTSP session is divided into one or more streams. Currently Moonfire only cares about video, but sometimes there's also an audio stream and/or an "ONVIF metadata" stream (which might have results from motion detection / video analytics, camera self-monitoring data like CPU usage, etc). The streams don't necessarily use the same "time base". Eg, video typically is marked off in 90,000ths of a second. Audio might be in a variety of time bases, like 8,000ths of a second or 44,100ths of a second. The camera will report the pts for each "packet" (in ffmpeg terminology, not necessarily exactly corresponding to a packet at the IP level, think closer to a video frame or a group of audio samples) of data in that stream's time base, starting from 0 at the beginning of the stream. It doesn't directly report wall time for these packets.

Many cameras however do report wall time in the form of "RTCP Sender Reports". These come periodically (every 10 seconds or something?) and look like this:

image.png

(That screenshot is from Wireshark. If you have a packet capture of the RTSP stream, you can find these packets easily in Wireshark by typing in "rtcp" as the display filter.)

Each report says that some timestamp in this stream's base (the "RTP timestamp" here) matches some wall time (the "NTP time"). So you can use one of those to anchor all the other timestamps to a wall time, or two of them to confuse yourself thoroughly if they are inconsistent.

If you trust these timestamps, this is useful for knowing when a frame of video happened, so you can match them up with real events, other cameras, etc.

ffmpeg also appears to be using them to relate two streams in the same RTSP session with that code. I'll annotate it with my understanding of this code:

If it has gotten a RTCP Sender Report and has more than one stream:

If it has received a wall clock time and has more than one stream:

    if (s->last_rtcp_ntp_time != AV_NOPTS_VALUE && s->ic->nb_streams > 1) {
        int64_t addend;
        int delta_timestamp;

Compute how long it's been (in the stream's time base) since the last time it received a wall clock time:

        /* compute pts from timestamp with received ntp_time */
        delta_timestamp = timestamp - s->last_rtcp_timestamp;

Compute the difference between the first and last NTP timestamps, and convert that to the stream's time base:

        /* convert to the PTS timebase */

        addend = av_rescale(s->last_rtcp_ntp_time - s->first_rtcp_ntp_time,
                            s->st->time_base.den,
                            (uint64_t) s->st->time_base.num << 32);

Then form a timestamp as the time (in the stream's timebase) when it first received a NTP timestamp, plus those two previous computations.

        pkt->pts = s->range_start_offset + s->rtcp_ts_offset + addend +

                   delta_timestamp;
        return;
    }

Honestly I have no idea why ffmpeg does this at all. Maybe the author of this code believed the streams' time bases are computed from two entirely separate clocks and was trying to correct for this. But as far as I know, that's not true. It seems reasonable to assume instead that all streams' timestamps are taken from the same camera's monotonic clock then just divided by different factors to get to the right time base for each stream. So I have no reason to believe this code solves a real problem. I'd like to simply delete it, if ffmpeg developers could be convinced to accept such a patch, or allow it to be enabled and disabled by an option. (Using an option also has the nice property that because ffmpeg tells you if it understood your options or not, we could reliably tell if we have a "fixed" version of ffmpeg.)

Here's the problem I think this code causes: the camera likely used a "monotonic" clock to calculate the stream ptses. But by definition, it uses a wall clock for the NTP timestamps. Wall times on computers infamously aren't monotonically increasing. They can "step" forward or backward in time. http://www.ntp.org/ntpfaq/NTP-s-algo.htm#Q-CLOCK-DISCIPLINE talks about this. Steps are supposed to be rare, and they if the clock is managed with a "full" NTP implementation as on a "real" computer. But I think these cameras use "SNTP", a simplified form that may just step the time any time it does an adjustment. If this happens, and it at least used to on my camera, it will cause the pts to behave strangely. The backward steps are most visible because negative durations are impossible. Forward steps don't seem great either; this means you have a video frame that incorrectly claims a really long duration. Likewise audio may hiccup.

I tried looking a little in ffmpeg's git history to figure out why this code exists.
  • 2003-10-29: revision 8b1ab7b added the original form of this code with the comment /* XXX: is it really necessary to unify the timestamp base ? */ which I think means "do we need this code at all?" I'd assert the answer is "no".
  • 2008-01-04: revision 8eb793c moved it from libavformat/rtp.c to libavformat/rtpdec.c, unchanged.
  • 2008-07-02: revision d6b9e57 makes it use the stream's time base in the calculation rather than a hardcoded 90,000 (which was wrong for audio streams). It also removed the doubtful comment.


I said to Scott I hoped to have something this weekend worked up, but I don't.  I've probably spent over 10 hours so far and still feel I have a way to go.

I know that feeling. :-(


Cheers,
John

--
You received this message because you are subscribed to the Google Groups "moonfire-nvr-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to moonfire-nvr-us...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/moonfire-nvr-users/016b937a-c98c-35d2-418e-452703d83d98%40gmail.com.


--

John

unread,
May 12, 2021, 12:41:54 PM5/12/21
to moonfire-nvr-users
For posterity, I want to document how I determined that Reolink (model RLC-420-5MP, build 19013001  Hw No. IPC_51516M5M  Cfg. version v2.0.0.0 Fw Version V2.0.0.354_19013001) was using a LIVE555 server (2013 version) in its camera to serve up an rtsp connection.  I discovered this using ffmpeg to connect to the camera directly in a session conducted 2020-06-10 09:41:00-07:00.

The command I used was:

      ffmpeg -loglevel debug -rtsp_transport tcp -rtsp_flags prefer_tcp -i rtsp://[account:password redacted]@192.168.1.180:554/h264Preview_01_main -f null /dev/null 

At the beginning of the output was the following, I have highlighted in red the line " a=tool:LIVE555 Streaming Media v2013.04.08"

[tcp @ 0x556acd18a880] Successfully connected to 192.168.1.180 port 554
[rtsp @ 0x556acd1883c0] SDP:
v=0

o=- 1591558887036372 1 IN IP4 192.168.1.180

s=Session streamed by "preview"

i=h264Preview_01_main

t=0 0

a=tool:LIVE555 Streaming Media v2013.04.08

a=type:broadcast

a=control:*

a=range:npt=0-

a=x-qt-text-nam:Session streamed by "preview"

a=x-qt-text-inf:h264Preview_01_main

m=video 0 RTP/AVP 96

c=IN IP4 0.0.0.0

b=AS:500

a=rtpmap:96 H264/90000

a=fmtp:96 packetization-mode=1;profile-level-id=640033;sprop-parameter-sets=Z2QAM6zoAoAPGQ==,aO48sA==

a=control:trackID=1

m=audio 0 RTP/AVP 97

c=IN IP4 0.0.0.0

b=AS:256

a=rtpmap:97 MPEG4-GENERIC/16000

a=fmtp:97 streamtype=5;profile-level-id=15;mode=AAC-hbr;sizelength=13;indexlength=3;indexdeltalength=3;config=1408; profile=1;

a=control:trackID=2


Failed to parse interval end specification ''


If there were a place on the web where I could stage my colorized HTML file, ffmpeg_peck_west.log_LINED.html, of that session, I'd place it there, pastebin will not display the colors making it more difficult to read.
Reply all
Reply to author
Forward
0 new messages