Finding the reason for dropped frames

2,535 views
Skip to first unread message

Xianzhu Wang

unread,
Jul 20, 2021, 4:27:30 PM7/20/21
to graphics-dev
Hi,

In the attached screenshot of a trace [1], there are some long frames, but I can't find the reason. The CPU usage is low, both the main thread and the compositor thread update quickly, and there are no big tile jobs. I guess the reason could be found from the trace. Any hints would be appreciated.

Also I found that the "dropped frame" alerts at the top seem not to correspond to the dropped frames in FrameSequenceTracker. The latter shows much more dropped frames (which are counted in the UMA metrics like Graphics.Smoothness.PercentDroppedFrames.AllSequences) than the former. Does this have any reason, or is it a known issue?

[1] The trace is from the 3rd run of commit 89824 in https://pinpoint-dot-chromeperf.appspot.com/job/1682012e320000

Thanks,
Xianzhu
CAP1.png

Mingjing Zhang

unread,
Jul 21, 2021, 12:18:51 PM7/21/21
to Graphics-dev, Xianzhu Wang
Hi Xianzhu,

In this particular trace, it appears that most frames are actually not dropped, but rather "partially presented", i.e. main updates do get displayed but are lagging behind. Such frames are reported with STATE_PRESENTED_PARTIAL state in the PipelineReporter, and will not show up in the dropped frame alerts. I believe these partially presented frames are no longer counted into PercentDroppedFrame metrics either. See https://crbug.com/1210636.

In general, FrameSequenceTracker alone may not be sufficient to show the existence of dropped frames, since the long frame durations in the tracker could be due to reasons other than frame dropping. It is worthwhile to also look at PipelineReporter to confirm each frame's state.

Regards,
Mingjing

Xianzhu Wang

unread,
Jul 21, 2021, 1:19:39 PM7/21/21
to Mingjing Zhang, Graphics-dev
Thanks Mingjing for the answer.

However, it seems that some of such dropped frames are counted into PercentDroppedFrame. The trace's PercentDroppedFrames.AllSequences is 2.333 (from 3 values: 6, 2, 0). The frame counts in the sequences are (produced/expected): 265/281, 268/284, 298/299, 298/299, 185/185, 182/182. There are 34 dropped frames reported in PercentDroppedFrames, but only 9 dropped frame alerts (some of which seem to happen before the real measurement starts).

In PipelineReporter, there are constantly STATE_PRESENTED_PARTIAL frames during the whole test, but during most of the time, there are no long frames in FrameSequenceTracker. Are there other reasons causing the long frames at the beginning of the sequence (between 5500ms and 6900ms)?

Also I would like to know the reason for the dropped frame alerts at 5496ms and 5530ms, so that I can find a way to avoid them. PipelineReporter shows STATE_DROPPED (do we only report such frames as dropped frame alerts?), but the CPU looks mostly idle, and there are no long tasks in the main thread or the compositor thread. Any hints about the reason for such dropped frames?

Chris Harrelson

unread,
Jul 21, 2021, 1:21:43 PM7/21/21
to Xianzhu Wang, Michal Mocny, beh...@chromium.org, jon...@chromium.org, Mingjing Zhang, Graphics-dev
+Michal Mocny +Behdad Bakhshinategh +Jonathan Ross This sounds like more instances of the things being discussed as part of the smoothness metric project? Maybe there are more insights in this example?

Behdad Bakhshinategh

unread,
Jul 21, 2021, 1:33:11 PM7/21/21
to Chris Harrelson, Xianzhu Wang, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev
Just to confirm, is the trace in question from the 3rd run of Commit 898254 of the pinpoint run (https://pinpoint-dot-chromeperf.appspot.com/job/1682012e320000)?

Xianzhu Wang

unread,
Jul 21, 2021, 1:38:39 PM7/21/21
to Behdad Bakhshinategh, Chris Harrelson, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev
On Wed, Jul 21, 2021 at 10:33 AM Behdad Bakhshinategh <beh...@chromium.org> wrote:
Just to confirm, is the trace in question from the 3rd run of Commit 898254 of the pinpoint run (https://pinpoint-dot-chromeperf.appspot.com/job/1682012e320000)?

Yes. Sorry for the typo.

Michal Mocny

unread,
Jul 22, 2021, 10:28:28 AM7/22/21
to Xianzhu Wang, Behdad Bakhshinategh, Chris Harrelson, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev
On Wed, Jul 21, 2021 at 1:38 PM Xianzhu Wang <wangx...@chromium.org> wrote:
On Wed, Jul 21, 2021 at 10:33 AM Behdad Bakhshinategh <beh...@chromium.org> wrote:
Just to confirm, is the trace in question from the 3rd run of Commit 898254 of the pinpoint run (https://pinpoint-dot-chromeperf.appspot.com/job/1682012e320000)?

Yes. Sorry for the typo.
 

On Wed, Jul 21, 2021 at 1:21 PM Chris Harrelson <chri...@chromium.org> wrote:
+Michal Mocny +Behdad Bakhshinategh +Jonathan Ross This sounds like more instances of the things being discussed as part of the smoothness metric project? Maybe there are more insights in this example?

Thanks, will dig in.


On Wed, Jul 21, 2021 at 10:19 AM Xianzhu Wang <wangx...@chromium.org> wrote:
Thanks Mingjing for the answer.

However, it seems that some of such dropped frames are counted into PercentDroppedFrame. The trace's PercentDroppedFrames.AllSequences is 2.333 (from 3 values: 6, 2, 0). The frame counts in the sequences are (produced/expected): 265/281, 268/284, 298/299, 298/299, 185/185, 182/182. There are 34 dropped frames reported in PercentDroppedFrames, but only 9 dropped frame alerts (some of which seem to happen before the real measurement starts).

In PipelineReporter, there are constantly STATE_PRESENTED_PARTIAL frames during the whole test, but during most of the time, there are no long frames in FrameSequenceTracker. Are there other reasons causing the long frames at the beginning of the sequence (between 5500ms and 6900ms)?

Also I would like to know the reason for the dropped frame alerts at 5496ms and 5530ms, so that I can find a way to avoid them. PipelineReporter shows STATE_DROPPED (do we only report such frames as dropped frame alerts?), but the CPU looks mostly idle, and there are no long tasks in the main thread or the compositor thread. Any hints about the reason for such dropped frames?


On Wed, Jul 21, 2021 at 9:18 AM Mingjing Zhang <mjz...@chromium.org> wrote:
Hi Xianzhu,

In this particular trace, it appears that most frames are actually not dropped, but rather "partially presented", i.e. main updates do get displayed but are lagging behind. Such frames are reported with STATE_PRESENTED_PARTIAL state in the PipelineReporter, and will not show up in the dropped frame alerts. I believe these partially presented frames are no longer counted into PercentDroppedFrame metrics either. See https://crbug.com/1210636.

Just a clarification on this--  STATE_PRESENTED_PARTIAL can still count as dropped for PercentDroppedFrame metrics.  They will only not count if the new partial frame has a new main frame update to go alongside the new impl side update.  We deem that equivalent to PRESENTED_ALL, since the only visual difference between them is Latency, which we are not attempting to account for with smoothness.

However, many STATE_PRESENTED_PARTIAL frames have only a impl side update and will still count as DROPPED, so, it's not as easy as looking at the state.

(Perhaps we can consider adding more values to STATE to make it easier to differentiate various cases without looking at multiple signals)

Behdad Bakhshinategh

unread,
Jul 22, 2021, 12:09:37 PM7/22/21
to Xianzhu Wang, Chris Harrelson, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev
I will try to explain what I understand from the trace for dropped frame alerts at 5496ms and 5530ms below:

For the dropped frame at 5496ms:
Looking at the PipelineReporter (PR) events, there are two PRs that start at that time with the frame_sequence of 149. This is because the initial PR (PR_1) has been in the stage of "EndActivateToSubmitCompositorFrame", one vsync (16.6ms) after the end of the last frame (as shown below), so we reached the deadline before we get a chance to submit the frame.
Screenshot 2021-07-22 11.32.44 AM.png
But because the work was not ready to be submitted the second PR was reported (PR_2) for the possible available compositor updates of that frame (vsync). PR_2 has been then submitted and presented but because some main thread is missing for that vsync (is still in "EndActivateToSubmitCompositorFrame" stage), PR_2 is reported as "STATE_PRESENTED_PARTIAL".
For PR_1, which is the cause of "Dropped frame" alert, it is reported as "STATE_DROPPED" because "EndActivateToSubmitCompositorFrame" took too long to the point that the next PR (frame_sequence 150) reached its "Activation" stage and replaced the active tree of previous frame (PR_1). In other words, a new (more recent) update from the main thread was received and there was no point in submitting the older update, thus dropping the PR_1.

For the dropped frame at 5530ms:
Looking at the PipelineReporter (PR) events, there are two PRs that start at that time with the frame_sequence of 151. This is because the initial PR (PR_1) has been in the stage of "EndActivateToSubmitCompositorFrame", one vsync (16.6ms) after the end of the last frame (as shown below).
Screenshot 2021-07-22 11.01.49 AM.png
But because the work was not ready to be submitted the second PR was reported (PR_2) for the possible available compositor updates of that frame (vsync).
But looking at the attributes in that PR there is no scroll nor compositor animation (e.g scroll_state is "SCROLL_NONE" and has_compositor_animation is false) so there was no compositor update to be shown  (There might be another reason for this, but given the existing signals I would assume there were no compositor update to be shown). This is also apparent from the breakdowns of PR_2 as it never submits any update. This resulted in that frame being reported as STATE_DROPPED because we only had the main thread update for that vsync and we were not able to present it in time.

In terms of avoiding such cases, we would need to see why the "EndActivateToSubmitCompositorFrame" took so long. Looking at the exact time that we were approaching the deadline, on the "Graphics.Pipeline.DrawAndSwap" line of the "Browser" process (shown below), shows that "DrawAndSwap" was taking longer and made the frame drop.
image.png
This can be rather because the "DrawAndSwap" was started too late, or that it took longer than expected.



On Wed, Jul 21, 2021 at 1:38 PM Xianzhu Wang <wangx...@chromium.org> wrote:

Xianzhu Wang

unread,
Jul 23, 2021, 8:24:14 PM7/23/21
to Behdad Bakhshinategh, Chris Harrelson, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev
Thanks Behdad for the detailed explanation.

I still have several questions:
1. The reason for PR_2 seems that PR_1 is slow. However when starting PR_2, how do we know PR_1 will be slow? (I feel my understanding of "starting PR_2" is wrong, but I'm still asking like this, hoping the answer will help me correctly understand it :) )
2. What is "EndActivateToSubmitCompositorFrame" waiting for exactly? The end of PR_1 seems much later than the end of the "DrawAndSwap" during the same time span.
3. For PR_2 of frame 151, is it possible that the frame doesn't need submit because there is no change?
4. The main thread updates are faster around the dropped frames. It seems that if the main thread updates were slower, the frames wouldn't be dropped because the next frame wouldn't be activated before the current frame was submitted. It's weird, but can we say faster main thread updates could cause dropped frames? A trace before the regression doesn't have the faster main thread updates.

There are also long frames in FrameSequenceTracker without dropped frame alerts, e.g. the one starting at 5554ms, duration 86ms (5 vsyncs). The frame covers 10 PRs, 5 of them are STATE_PRESENTED_ALL and others are STATE_PRESENTED_PARTIAL. There is no compositor animation or scroll. The test is a JS animation test. I compared the traces during long frames and during short/smooth frames, and didn't find any difference. What am I missing? 

Thanks,
Xianzhu

Michal Mocny

unread,
Jul 26, 2021, 10:04:40 AM7/26/21
to Xianzhu Wang, Behdad Bakhshinategh, Chris Harrelson, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev
On Fri, Jul 23, 2021 at 8:24 PM Xianzhu Wang <wangx...@chromium.org> wrote:
Thanks Behdad for the detailed explanation.

I still have several questions:
1. The reason for PR_2 seems that PR_1 is slow. However when starting PR_2, how do we know PR_1 will be slow? (I feel my understanding of "starting PR_2" is wrong, but I'm still asking like this, hoping the answer will help me correctly understand it :) )

First a clarification: PR2 is not really "started" even though it looks like that in the breakdowns.  It is just "reported", as a fork of PR1, only after PR1 has already missed its deadline to show the main thread update.  Notice that the startime and initial breakdowns are exactly the same between PR1 and 2, because it is a copy, and the final breakdowns of PR2 are shared with the previous sequence... while it looks like it started mid-frame in the timeline, it wasn't actually decided/created at that time, the trace just looks like that.

 
2. What is "EndActivateToSubmitCompositorFrame" waiting for exactly? The end of PR_1 seems much later than the end of the "DrawAndSwap" during the same time span.
3. For PR_2 of frame 151, is it possible that the frame doesn't need submit because there is no change?
4. The main thread updates are faster around the dropped frames. It seems that if the main thread updates were slower, the frames wouldn't be dropped because the next frame wouldn't be activated before the current frame was submitted. It's weird, but can we say faster main thread updates could cause dropped frames? A trace before the regression doesn't have the faster main thread updates.

Good point.  FWIW in our smoothness analysis work, we try to look only at the LAST Impl-side PR of each frame for this reason.  In this case, the "DROPPED" state is really just informative because there was a 2nd available frame already, which shouldn't be a problem, I guess.

That said, the real "drop" was the PARTIAL frame before.  But, it looks like that one came with a new update from the previous Frame sequence-- and we currently do not consider PARTIAL updates which came with new main updates as a drop.

Behdad Bakhshinategh

unread,
Jul 26, 2021, 11:55:15 AM7/26/21
to Xianzhu Wang, Chris Harrelson, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev
I tried answering some of your questions below.
But overall, I should note that the current dropped frame alerts are simply looking at the state of each PR and create and alert for every dropped frame and that is not fully matching the definition of dropped frame used for calculation of "Graphics.Smoothness.PercentDroppedFrames.AllSequences". For example the initial three dropped frame alerts (around 3080ms time) are not contributing to the percent dropped frames because no sequence is started yet. Also based on the length of sequence (around 13s) we would have about 700 frames and an increase of 1.7 percent dropped frames would mean around 10 more frames being dropped which are not shown with comparing the dropped frame alerts of traces before and after the change.

On Fri, Jul 23, 2021 at 8:24 PM Xianzhu Wang <wangx...@chromium.org> wrote:
Thanks Behdad for the detailed explanation.

I still have several questions:
1. The reason for PR_2 seems that PR_1 is slow. However when starting PR_2, how do we know PR_1 will be slow? (I feel my understanding of "starting PR_2" is wrong, but I'm still asking like this, hoping the answer will help me correctly understand it :) )

As Michal explained, PR_1 and PR_2 are reporting the same frame, and they are not two different pieces of work that might block each other. We usually like to have one PR per frame (example of that would be PR with frame sequence of 150), but for a frame that is reaching it's deadline, and we will be submitting it partially (the compositor side update is ready and main thread update is not), we would report that frame using two PRs being the PR_1 and PR_2 to report both the partial update and it's breakdowns (PR_2) and later on the full update including the main thread work (PR_1).
 
2. What is "EndActivateToSubmitCompositorFrame" waiting for exactly? The end of PR_1 seems much later than the end of the "DrawAndSwap" during the same time span.

I'm not sure about that, to know the details of that we need to look into the scheduler and how it decides how much to wait.
In terms of "DrawAndSwap" I think the initial one goes unused and the later "DrawAndSwap" is used for the frame. To check for the corresponding "DrawAndSwap" look at the last breakdown of the PR, "SubmitCompositorFrameToPresentationCompositorFrame" and more specifically its own breakdown of "StartDrawToSwapStart".
 
3. For PR_2 of frame 151, is it possible that the frame doesn't need submit because there is no change?

No, if that was the case, the state would be reported as "NO_UPDATE_DESIRED", also looking at other flags of the PR, that frame has an active has_main_animation.
 
4. The main thread updates are faster around the dropped frames. It seems that if the main thread updates were slower, the frames wouldn't be dropped because the next frame wouldn't be activated before the current frame was submitted. It's weird, but can we say faster main thread updates could cause dropped frames? A trace before the regression doesn't have the faster main thread updates. 

There are also long frames in FrameSequenceTracker without dropped frame alerts, e.g. the one starting at 5554ms, duration 86ms (5 vsyncs). The frame covers 10 PRs, 5 of them are STATE_PRESENTED_ALL and others are STATE_PRESENTED_PARTIAL. There is no compositor animation or scroll. The test is a JS animation test. I compared the traces during long frames and during short/smooth frames, and didn't find any difference. What am I missing? 

There is an issue with how FrameSequenceTracker separates frames that we noticed in these traces, the issue is most likely related to out of sync presentation feedback that is used for cutting the time into frame segments in frame sequence tracker. 

Xianzhu Wang

unread,
Jul 26, 2021, 2:22:04 PM7/26/21
to Behdad Bakhshinategh, Chris Harrelson, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev
Thanks Michal and Behdad for the detailed answers!

On Mon, Jul 26, 2021 at 8:55 AM Behdad Bakhshinategh <beh...@chromium.org> wrote:
I tried answering some of your questions below.
But overall, I should note that the current dropped frame alerts are simply looking at the state of each PR and create and alert for every dropped frame and that is not fully matching the definition of dropped frame used for calculation of "Graphics.Smoothness.PercentDroppedFrames.AllSequences". For example the initial three dropped frame alerts (around 3080ms time) are not contributing to the percent dropped frames because no sequence is started yet. Also based on the length of sequence (around 13s) we would have about 700 frames and an increase of 1.7 percent dropped frames would mean around 10 more frames being dropped which are not shown with comparing the dropped frame alerts of traces before and after the change.

On Fri, Jul 23, 2021 at 8:24 PM Xianzhu Wang <wangx...@chromium.org> wrote:
Thanks Behdad for the detailed explanation.

I still have several questions:
1. The reason for PR_2 seems that PR_1 is slow. However when starting PR_2, how do we know PR_1 will be slow? (I feel my understanding of "starting PR_2" is wrong, but I'm still asking like this, hoping the answer will help me correctly understand it :) )

As Michal explained, PR_1 and PR_2 are reporting the same frame, and they are not two different pieces of work that might block each other. We usually like to have one PR per frame (example of that would be PR with frame sequence of 150), but for a frame that is reaching it's deadline, and we will be submitting it partially (the compositor side update is ready and main thread update is not), we would report that frame using two PRs being the PR_1 and PR_2 to report both the partial update and it's breakdowns (PR_2) and later on the full update including the main thread work (PR_1).

Are the following correct about PR_2 of frame 149:
- The works are actually from frame 148.
- The STATE_PRESENTED_PARTIAL means that the updates from frame 148 are presented (after frame 149 is started), but the updates for frame 149 haven't been presented yet.

If the above are correct, then can I understand STATE_PRESENTED_DROPPED of frame 151 (question 3) like the following?
- It's not like frame 149 which is STATE_PRESENTED_PARTIAL because frame 150 finished in time.
- Compared to before "regression" (where no frame completed on time in that sequence), this STATE_PRESENTED_DROPPED is actually a progression of frame 150.
- However, one faster frame update among slow updates may cause a jitter, which may not be a good thing, but we can only do as fast as possible in blink code.

2. What is "EndActivateToSubmitCompositorFrame" waiting for exactly? The end of PR_1 seems much later than the end of the "DrawAndSwap" during the same time span.

I'm not sure about that, to know the details of that we need to look into the scheduler and how it decides how much to wait.
In terms of "DrawAndSwap" I think the initial one goes unused and the later "DrawAndSwap" is used for the frame. To check for the corresponding "DrawAndSwap" look at the last breakdown of the PR, "SubmitCompositorFrameToPresentationCompositorFrame" and more specifically its own breakdown of "StartDrawToSwapStart".
 
3. For PR_2 of frame 151, is it possible that the frame doesn't need submit because there is no change?

No, if that was the case, the state would be reported as "NO_UPDATE_DESIRED", also looking at other flags of the PR, that frame has an active has_main_animation.
 
4. The main thread updates are faster around the dropped frames. It seems that if the main thread updates were slower, the frames wouldn't be dropped because the next frame wouldn't be activated before the current frame was submitted. It's weird, but can we say faster main thread updates could cause dropped frames? A trace before the regression doesn't have the faster main thread updates. 

There are also long frames in FrameSequenceTracker without dropped frame alerts, e.g. the one starting at 5554ms, duration 86ms (5 vsyncs). The frame covers 10 PRs, 5 of them are STATE_PRESENTED_ALL and others are STATE_PRESENTED_PARTIAL. There is no compositor animation or scroll. The test is a JS animation test. I compared the traces during long frames and during short/smooth frames, and didn't find any difference. What am I missing? 

There is an issue with how FrameSequenceTracker separates frames that we noticed in these traces, the issue is most likely related to out of sync presentation feedback that is used for cutting the time into frame segments in frame sequence tracker. 

The issue seems to cause false dropped frames (i.e. fewer main-frames-produced than expected) and false *.PercentDroppedFrames.* regressions. Is there already a bug for this?

Behdad Bakhshinategh

unread,
Jul 26, 2021, 3:30:25 PM7/26/21
to Xianzhu Wang, Chris Harrelson, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev
On Mon, Jul 26, 2021 at 2:22 PM Xianzhu Wang <wangx...@chromium.org> wrote:
Thanks Michal and Behdad for the detailed answers!

On Mon, Jul 26, 2021 at 8:55 AM Behdad Bakhshinategh <beh...@chromium.org> wrote:
I tried answering some of your questions below.
But overall, I should note that the current dropped frame alerts are simply looking at the state of each PR and create and alert for every dropped frame and that is not fully matching the definition of dropped frame used for calculation of "Graphics.Smoothness.PercentDroppedFrames.AllSequences". For example the initial three dropped frame alerts (around 3080ms time) are not contributing to the percent dropped frames because no sequence is started yet. Also based on the length of sequence (around 13s) we would have about 700 frames and an increase of 1.7 percent dropped frames would mean around 10 more frames being dropped which are not shown with comparing the dropped frame alerts of traces before and after the change.

On Fri, Jul 23, 2021 at 8:24 PM Xianzhu Wang <wangx...@chromium.org> wrote:
Thanks Behdad for the detailed explanation.

I still have several questions:
1. The reason for PR_2 seems that PR_1 is slow. However when starting PR_2, how do we know PR_1 will be slow? (I feel my understanding of "starting PR_2" is wrong, but I'm still asking like this, hoping the answer will help me correctly understand it :) )

As Michal explained, PR_1 and PR_2 are reporting the same frame, and they are not two different pieces of work that might block each other. We usually like to have one PR per frame (example of that would be PR with frame sequence of 150), but for a frame that is reaching it's deadline, and we will be submitting it partially (the compositor side update is ready and main thread update is not), we would report that frame using two PRs being the PR_1 and PR_2 to report both the partial update and it's breakdowns (PR_2) and later on the full update including the main thread work (PR_1).

Are the following correct about PR_2 of frame 149:
- The works are actually from frame 148.
 
Parts of it are from frame 148 and parts are from frame 149. The pixels displayed when PR_2 of frame 149 ends, are from the compositor thread updates of frame 149 and main thread updates of frame 148. This is also the reason why the "SubmitCompositorFrameToPresentationCompositorFrame" block of the two PRs match perfectly.

- The STATE_PRESENTED_PARTIAL means that the updates from frame 148 are presented (after frame 149 is started), but the updates for frame 149 haven't been presented yet.
 
 Yes, but the STATE_PRESENTED_PARTIAL only is about frame 149 not having its update presented yet, and frame 148 would not impact the state of PR_2 of frame 149.

Philip Rogers

unread,
Jul 27, 2021, 3:08:34 PM7/27/21
to Behdad Bakhshinategh, Xianzhu Wang, Chris Harrelson, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev, Stefan Zager
We have an upcoming project to move compositing decisions off the main thread and these issues will likely impact that project too (+szager).

Is it correct to say that there are two bugs in PercentDroppedFrames?
1. If a compositor frame finds out there is a new main frame due to faster main frames, the compositor frame can be dropped when it wouldn't have been with a slower main frame.
2. FrameSequenceTracker separates frames for an unknown reason, possibly related to ordering of frames.

--
To unsubscribe from this group and stop receiving emails from it, send an email to graphics-dev...@chromium.org.

Behdad Bakhshinategh

unread,
Jul 28, 2021, 10:51:34 AM7/28/21
to Philip Rogers, Xianzhu Wang, Chris Harrelson, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev, Stefan Zager
On Tue, Jul 27, 2021 at 3:08 PM Philip Rogers <p...@chromium.org> wrote:
We have an upcoming project to move compositing decisions off the main thread and these issues will likely impact that project too (+szager).

Is it correct to say that there are two bugs in PercentDroppedFrames?
1. If a compositor frame finds out there is a new main frame due to faster main frames, the compositor frame can be dropped when it wouldn't have been with a slower main frame.
 
I don't think so, the  cause of the frame dropping is it taking longer than a vsync to be produced, the faster next main frame is not the cause of frame being dropped, but just a cutting 
point for giving up on the previous frame because we have a more recent update ready to be presented.

2. FrameSequenceTracker separates frames for an unknown reason, possibly related to ordering of frames.
 
Yes, there is a bug filed for that here: https://bugs.chromium.org/p/chromium/issues/detail?id=1233115  

Michal Mocny

unread,
Jul 28, 2021, 11:23:20 AM7/28/21
to Behdad Bakhshinategh, Philip Rogers, Xianzhu Wang, Chris Harrelson, Michal Mocny, Jonathan Ross, Mingjing Zhang, Graphics-dev, Stefan Zager
On Wed, Jul 28, 2021 at 10:51 AM Behdad Bakhshinategh <beh...@chromium.org> wrote:


On Tue, Jul 27, 2021 at 3:08 PM Philip Rogers <p...@chromium.org> wrote:
We have an upcoming project to move compositing decisions off the main thread and these issues will likely impact that project too (+szager).

Is it correct to say that there are two bugs in PercentDroppedFrames?
1. If a compositor frame finds out there is a new main frame due to faster main frames, the compositor frame can be dropped when it wouldn't have been with a slower main frame.
 
I don't think so, the  cause of the frame dropping is it taking longer than a vsync to be produced, the faster next main frame is not the cause of frame being dropped, but just a cutting 
point for giving up on the previous frame because we have a more recent update ready to be presented.

My interpretation of the question was:  For a main frame update that misses its deadline and bleeds into the next frame (i.e. PR_1 in the first example)... It seems its status can be either PRESENTED_ALL or DROPPED.  Presumably, that depends on the next main frame update.  If the next update happens to finish within deadline, then the means the previous was never needed and can be dropped.  However, if that next update was itself also late, then we did present what we had.

However, the question as worded asked if "the compositor frame can be dropped" which I think is not right.

Xianzhu Wang

unread,
Jul 28, 2021, 11:57:39 AM7/28/21
to Michal Mocny, Behdad Bakhshinategh, Philip Rogers, Chris Harrelson, Jonathan Ross, Mingjing Zhang, Graphics-dev, Stefan Zager
The first question is mainly about the regression we are investigating.
- Before the regression, every frame couldn't finish on time, and PRESENTED_PARTIAL and PRESENTED_ALL were reported, and PercentDroppedFrames was low.
- After the regression, one frame (150) finishes on time, causing the next frame (151) to be reported as DROPPED, which seems to increase PercentDroppedFrames.

I agree that the frame update time is long, but it's not a regression and it may not be addressable because the complexity of the benchmark needs the time to process the frame updates. However, for frame 150, we do have progression because it becomes on-time, but it seems to cause false regression of PercentDroppedFrames.

Jonathan Ross

unread,
Jul 28, 2021, 4:07:47 PM7/28/21
to Xianzhu Wang, Michal Mocny, Behdad Bakhshinategh, Philip Rogers, Chris Harrelson, Mingjing Zhang, Graphics-dev, Stefan Zager
Hi,

I believe what we are seeing at frame 150 -> 151 we are seeing a complicated interaction.

(TLDR: 151 misses 2 VSyncs and is never shown.)

PipelineReporters' presentation state is tied to the frame_sequence number, and this is applied retroactively. So if a Compositor-thread frame submits for frame X, it won't decide on PARTIAL/ALL until the Main-thread has completed for frame X as well. The previous Main-thread frame (X-1) submitting within that VSync won't directly affect the submission result. 

Looking at 150,151
8jUhKH9Z2xDEgGW_2.png
  • 150 is both Main and Compositor(Impl) successfully submitting in time.
  • 151 Impl has no damage, and there is no update from 151 Main.
    • This would have been PARTIAL if 151 Main had submitted within the next VSync. However it did not, 152 starts and the content of 151 is never shown. That is why 151 is DROPPED
151 Impl itself appears to have no updates, confirmed by it not attempting to GenerateRenderPasses
QgtUrXCje9Vau6K.png

While you are now dropping more frames, other metrics have improved. Such as CompositorLatency.TotalLatency, which while narrow on average shows a large reduction in worst cases. (Patch is on the left)
latency.png

This case seems to be a rare edge-case. behdadb@ has noticed other traces where the increase in PercentDroppedFrames appears to be tied to long running Graphics.Pipeline.DrawAndSwap taking over 1 VSync. 

It also seems to be unfortunate that with the patch EndActivateToSubmitCompositorFrame is taking very close to 1 VSync. So we've gone from constantly being late (higher latency.) To better latency, but occasional frame drops due to missing deadlines.

Regards,
Jonathan

Behdad Bakhshinategh

unread,
Jul 28, 2021, 5:48:39 PM7/28/21
to Jonathan Ross, Xianzhu Wang, Michal Mocny, Philip Rogers, Chris Harrelson, Mingjing Zhang, Graphics-dev, Stefan Zager
Hi all,

I looked more into the differences of traces before and after the change and I believe the jump in values is because of what we discussed earlier about the FrameSequenceTracker frame detection and unrelated to the state of pipeline reporters.
Lookin at the instances of "Graphics.Smoothness.PercentDroppedFrames.AllSequences" we have three sequences with values (rounded) of 0%, 1% and 6% averaging to 2.33%.
image.png
And that matches the three sequences we have in the trace. Clicking on each of the sequences would give us the number of frames in each category. The first sequence is the one with the 6% dropped frames:
image.png
Which gives us:
 main-frames-expected: 284
 main-frames-ontime: 1
 main-frames-produced: 268
so based on that 16 frames are dropping, but we are seeing only 4 dropped frame alerts, and comparing that with the first frame sequence of a trace from before this change (for example the 3rd trace of Commit 898253 [1]) we see:
 main-frames-expected: 300,
 main-frames-ontime: 0,
 main-frames-produced: 297
so only 3 frames are dropping , while we still have the 4 dropped frame alerts, making me believe that the alerts are not pointing at the actual issue.

And then I noticed that the 16 frames dropped matches the number of frames which are merged together (not separated correctly).
So the cause of the change in values is that the CL somehow changes the signal that frame_sequence_tracker uses to separate frames resulting in higher values for "Graphics.Smoothness.PercentDroppedFrames.AllSequences".

Michal Mocny

unread,
Jul 28, 2021, 8:45:32 PM7/28/21
to Jonathan Ross, Xianzhu Wang, Michal Mocny, Behdad Bakhshinategh, Philip Rogers, Chris Harrelson, Mingjing Zhang, Graphics-dev, Stefan Zager
On Wed, Jul 28, 2021 at 4:07 PM Jonathan Ross <jon...@chromium.org> wrote:
Hi,

I believe what we are seeing at frame 150 -> 151 we are seeing a complicated interaction.

(TLDR: 151 misses 2 VSyncs and is never shown.)

PipelineReporters' presentation state is tied to the frame_sequence number, and this is applied retroactively. So if a Compositor-thread frame submits for frame X, it won't decide on PARTIAL/ALL until the Main-thread has completed for frame X as well. The previous Main-thread frame (X-1) submitting within that VSync won't directly affect the submission result. 

Looking at 150,151
8jUhKH9Z2xDEgGW_2.png
  • 150 is both Main and Compositor(Impl) successfully submitting in time.
  • 151 Impl has no damage, and there is no update from 151 Main.
    • This would have been PARTIAL if 151 Main had submitted within the next VSync. However it did not, 152 starts and the content of 151 is never shown. That is why 151 is DROPPED
I don't think that's quite right, or at least I haven't seen that in other examples.

My understanding was that every time the main update is not available on time (vsync), that we fork the PR event and report the impl state.  Since in this case there is nothing to show PARTIALLY (i.e. neither a previous main update, nor a new compositor update), we report a state of DROPPED.  I didn't think it matters if the main update misses 2 frames.  (Actually, if that did happen, I think we would get another DROPPED impl update in the next frame.  You can see this pattern happening in frames 141 - 143 of that trace, where a single long Main PR has multiple short DROPPED impl PR).

In this specific case, to me it looks like 151 Main only missed a single vsync.  The total wall time is 24.225 ms.

Screen Shot 2021-07-28 at 8.10.04 PM.png

I think Xianzhu's observations were correct: had frame 150 missed its vsync deadline, frame 151 would have been labelled PARTIAL.  What happened in frame 151 doesn't matter at all.

FWIW, while this is an odd case, it's not actually super rare.  The jank (i.e. change in frame rate) that frame 150 introduces with its faster wall time, causes two frames to become labelled DROPPED (the main update before, and the impl update after) -- even though it was a performance improvement.

While that seems a bit weird, from the visual perspective of the user, they went from seeing an experience with a steady stream of slightly latent frame updates, to seeing 1 frame jumping ahead too quickly and then repeating twice.  All because the average frame latency improved.  It would have been visually smoother to just delay the frame 150 main update!
Reply all
Reply to author
Forward
0 new messages