Attention is currently required from: Sami Kyöstilä.
Khushal would like Sami Kyöstilä to review this change.
gfx/android: Introduce rendering pipeline integration with ADPF.
Adds the setup to track execution time of rendering threads to provide
feedback of actual vs targeted execution time for these threads to the
platform CPU scheduler.
Change-Id: I20ca94679f1bb5f50fafe669e675d2cc6243428e
---
M base/task/current_thread.cc
M base/task/current_thread.h
M ui/gfx/BUILD.gn
A ui/gfx/rendering_pipeline_manager.cc
A ui/gfx/rendering_pipeline_manager.h
A ui/gfx/rendering_stage_scheduler.cc
A ui/gfx/rendering_stage_scheduler.h
7 files changed, 441 insertions(+), 0 deletions(-)
Attention is currently required from: Sami Kyöstilä.
1 comment:
Patchset:
I still need to think through the integration with cc/viz but figured I'll send it to get an early review on the direction this is headed.
Attention is currently required from: Khushal, Sami Kyöstilä.
2 comments:
Patchset:
Thanks for sharing! 😊
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #1, Line 43: get_and_reset_time_since_last_frame
This is interesting :) We report the sum of walltime we actually spent working on tasks on the set of threads, as opposed to the walltime elapsed between starting and finishing the frame in the pipeline, correct?
Would this also mean that you could in theory end up with more |total_time| here than |target_time_| and still meet the deadline? - e.g. when there's parallel work on multiple threads that all belong to the same RenderingPipeline (more concretely, thread A + B both run a 1 second task in the same second -> 2 seconds total_time in 1 second wall time). Does that make sense? Would we have to tweak the |target_time_| accordingly in such a case? Or do we intentionally say that we have to be able to serialize all the work on the set of threads and still be able to meet the deadline?
Naively, what's the reason we don't just report the walltime between pipeline start & finish events (timedelta between when any thread in the pipeline started work on frame X and all threads completed work on frame X, or something similar)?
Another question related to this: Does taking the time before/after every task on all these threads incur any significant additional overhead? I remember that time tracking in SequenceManager queues is optional, but don't remember in which cases it's already enabled anyway.
Attention is currently required from: Eric Seckler, Sami Kyöstilä.
Khushal would like Eric Seckler to review this change.
gfx/android: Introduce rendering pipeline integration with ADPF.
Adds the setup to track execution time of rendering threads to provide
feedback of actual vs targeted execution time for these threads to the
platform CPU scheduler.
Change-Id: I20ca94679f1bb5f50fafe669e675d2cc6243428e
---
M base/task/current_thread.cc
M base/task/current_thread.h
M ui/gfx/BUILD.gn
A ui/gfx/rendering_pipeline_manager.cc
A ui/gfx/rendering_pipeline_manager.h
A ui/gfx/rendering_stage_scheduler.cc
A ui/gfx/rendering_stage_scheduler.h
7 files changed, 441 insertions(+), 0 deletions(-)
Attention is currently required from: Eric Seckler, Sami Kyöstilä.
2 comments:
Patchset:
Thanks for the comments Eric. You brought up some really good points!
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #1, Line 43: get_and_reset_time_since_last_frame
This is interesting :) We report the sum of walltime we actually spent working on tasks on the set of threads, as opposed to the walltime elapsed between starting and finishing the frame in the pipeline, correct?
The idea is to keep tracking the wall time for all tasks until the event which marks the finishing of a frame. It's hard to reason about what constitutes starting a frame. We do have explicit BeginFrame signals, which tie to the Android choreographer, but a page could be using setTimeout to do most of its work outside of BeginFrame/Draw. So the strategy of measuring runtime between draw events seemed more reasonable to me.
Also, if the page is doing some non-rendering work between frames, like requestIdleCallback, we do still want to account for that in the expected workload for that thread in a frame.
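To make that concrete, here's a rough sketch of the accumulation I have in mind (simplified; the names are illustrative and the real thing hooks into the task observer machinery):
// Per-thread accumulator: adds up the wall time of every task on the thread,
// and is drained once per frame when the draw/frame-finished event fires.
// (Assumes base/time/time.h; names here are hypothetical.)
class ThreadFrameTimeTracker {
 public:
  void DidProcessTask(base::TimeTicks start_time, base::TimeTicks end_time) {
    time_since_last_frame_ += end_time - start_time;
  }

  base::TimeDelta GetAndResetTimeSinceLastFrame() {
    base::TimeDelta total = time_since_last_frame_;
    time_since_last_frame_ = base::TimeDelta();
    return total;
  }

 private:
  base::TimeDelta time_since_last_frame_;
};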
Would this also mean that you could in theory end up with more |total_time| here than |target_time_| and still meet the deadline? - e.g. when there's parallel work on multiple threads that all belong to the same RenderingPipeline (more concretely, thread A + B both run a 1 second task in the same second -> 2 seconds total_time in 1 second wall time). Does that make sense? Would we have to tweak the |target_time_| accordingly in such a case? Or do we intentionally say that we have to be able to serialize all the work on the set of threads and still be able to meet the deadline?
That's an excellent question and clearly something I hadn't thought through. I checked with Android folks and the way ADPF will use this data is to keep giving the target threads more boost until execution time ~= target time. So for the case you mentioned, it is indeed equivalent to hitting the deadline as if the threads were executing serially.
So probably the reasonable thing would be to report max of execution time of each thread involved? As long as the threads together can finish all work within the frame budget, we won't be dropping any frames. What do you think?
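Just to make the suggestion concrete, the aggregation I'm picturing is something like this (illustrative only, not what the patch currently does):
// Report the slowest thread in the pipeline rather than the sum, so parallel
// work isn't double-counted against the frame budget.
base::TimeDelta ComputePipelineTime(
    const std::vector<base::TimeDelta>& per_thread_times) {
  base::TimeDelta max_time;
  for (const base::TimeDelta& time : per_thread_times)
    max_time = std::max(max_time, time);
  return max_time;
}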
> Naively, what's the reason we don't just report the walltime between pipeline start & finish events (timedelta between when any thread in the pipeline started work on frame X and all threads completed work on frame X, or something similar)?
>
This is for the idle work part I mentioned above. Even in the internal threads, there is some work we'll do immediately after a frame ends and before the next one starts. I wanted to make sure that's accounted for.
Another question related to this: Does taking the time before/after every task on all these threads incur any significant additional overhead? I remember that time tracking in SequenceManager queues is optional, but don't remember in which cases it's already enabled anyway.
This one I have no idea about. :) I'm expecting it won't be significant for internal threads, since there is a predictable number of tasks run between frames for them. But for renderer main, it's dependent on what the page is doing. Maybe we can run this through our benchmarks and see?
Attention is currently required from: Eric Seckler, Khushal.
11 comments:
Patchset:
Very cool!
Thinking about the way the power scheduler could use this, I'm guessing the main knob we can tweak is the frame interval? Or should we override the per-thread wall times instead? For example, if the scheduler says a particular thread isn't important, should we report its runtime as zero here?
(Feel free to ignore the nits for now -- I tend to get carried away :)
File ui/gfx/rendering_pipeline_manager.h:
nit: PipelineType to make this more explicit?
Patch Set #1, Line 41: base::flat_map<Type, std::vector<Thread>>
nit: Should we wrap this into a Pipeline object? I could easily see us adding more properties there in the future.
Patch Set #1, Line 46: SetActive
Could we use a RAII pattern for this? That way there's no risk for the count to get unbalanced.
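e.g., a scoped helper along these lines (sketch only; the actual SetActive() signature and names may differ):
// RAII helper: marks the pipeline active for its lifetime, so the active
// count can never get unbalanced by a missed "deactivate" call.
class ScopedPipelineActive {
 public:
  explicit ScopedPipelineActive(RenderingPipelineManager* manager, Type type)
      : manager_(manager), type_(type) {
    manager_->SetActive(type_, /*active=*/true);
  }
  ~ScopedPipelineActive() { manager_->SetActive(type_, /*active=*/false); }

  ScopedPipelineActive(const ScopedPipelineActive&) = delete;
  ScopedPipelineActive& operator=(const ScopedPipelineActive&) = delete;

 private:
  RenderingPipelineManager* const manager_;
  const Type type_;
};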
Patch Set #1, Line 66: lock_
nit: Let's annotate which members this protects
File ui/gfx/rendering_pipeline_manager.cc:
year += 3 :)
Patch Set #1, Line 43: get_and_reset_time_since_last_frame
Also, if the page is doing some non-rendering work between frames, like requestIdleCallback, we do still want to account for that in the expected workload for that thread in a frame.
It's probably not easy to do, but if we can we shouldn't report any idle work to ADPF as being part of the workload. Idle work is designed to be best effort and IMO should never cause the system to move to a higher operating point.
The same goes for setTimeout(); however, one practical issue I can see here is that if we *don't* account for the time spent there, that code could still block other work on the main thread and cause us to miss our deadlines. I wonder: if we were able to more precisely track whether BeginMainFrame specifically is hitting the deadline, maybe that would automatically end up ignoring work we shouldn't care about (requestIdleCallback). WDYT?
One way to do this would be to observe the priority of tasks that run and ignore everything that's below USER_BLOCKING. Alternatively we could only track time for specific task queues (e.g., compositing and input handling).
So probably the reasonable thing would be to report max of execution time of each thread involved?
That seems reasonable, yes. Otherwise making this more parallel would cause ADPF to throttle up ever harder, which isn't necessary as long as we can hit the deadline in each individual stage.
Does taking the time before/after every task on all these threads incur any significant additional overhead?
That ship has already sailed I think :) Using TaskTimeObserver means that the timestamps will only be sampled once no matter how many other observers there are.
Patch Set #1, Line 51: target_time_
nit: Should we be grabbing the lock here? :)
nit: We could probably make this an atomic swap
File ui/gfx/rendering_stage_scheduler.cc:
Patch Set #1, Line 43: dlopen
nit: We should probably use base/native_library.h for this.
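i.e., roughly something like this instead of raw dlopen()/dlsym() (from memory and untested; CreateSessionFn and the symbol name are placeholders):
// Load the ADPF library and resolve the entry point through base's
// cross-platform wrappers rather than calling dlopen()/dlsym() directly.
base::NativeLibraryLoadError error;
base::NativeLibrary library =
    base::LoadNativeLibrary(base::FilePath("libAdpf.so"), &error);
if (!library)
  return;  // Library not available on this device; skip the integration.
auto create_session = reinterpret_cast<CreateSessionFn>(
    base::GetFunctionPointerFromNativeLibrary(library, "CreateSession"));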
Patch Set #1, Line 65: PlatformThreadId
nit: Is sizeof(base::PlatformThreadId) == sizeof(int) always on Android?
Attention is currently required from: Khushal, Sami Kyöstilä.
1 comment:
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #1, Line 43: get_and_reset_time_since_last_frame
So probably the reasonable thing would be to report max of execution time of each thread involved?
I think this might work in some cases, but maybe not in others. In general I guess there are two categories of work split across threads: sequential and parallel. For sequential work, you want the sum of work time. For parallel work, you want the max. The tricky bit is separating the two kinds. (Though maybe the pipeline split does some of this for us? I'm not totally sure.)
To give a contrived example: Say in one pipeline you have the compositor thread + raster threads, and for some reason the rasterization blocks the frame submission. To simplify further, say the compositor thread has to do some work for the frame before rasterization can start, and after rasterization is complete. That is, something like this:
cc |----| |-------|
raster |---------||------|
raster |------------|
raster |----------|
Here, you'd want to report sum(max(work on a single raster thread) + cc work).
This example is contrived (not sure how close to reality it is..), and I dunno if such dependencies exist within a single RenderingPipeline in reality 😊 But there's probably some degree of parallel and sequential work dependencies?
I guess my point is, this might be a little harder to get right and more complicated to reason about 😊
Attention is currently required from: Eric Seckler, Khushal.
1 comment:
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #1, Line 43: get_and_reset_time_since_last_frame
> So probably the reasonable thing would be to report max of execution time of each thread involved? […]
Eric and I chatted a bit more offline and came to the conclusion that it's probably best if we can explicitly track each step along the journey of a BeginFrame (and input events) through the system and properly account for serial/parallel fan-out. Khushal, do you think that would be feasible? For the trickier parts (e.g., raster threads) maybe we can start with some kind of approximation, since continuous raster work is already unlikely to consistently hit vsync cadence.
Attention is currently required from: Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
Khushal would like Sadrul Chowdhury to review this change.
gfx/android: Introduce rendering pipeline integration with ADPF.
Adds the setup to track execution time of rendering threads to provide
feedback of actual vs targeted execution time for these threads to the
platform CPU scheduler.
Change-Id: I20ca94679f1bb5f50fafe669e675d2cc6243428e
---
M base/task/current_thread.cc
M base/task/current_thread.h
M ui/gfx/BUILD.gn
A ui/gfx/rendering_pipeline_manager.cc
A ui/gfx/rendering_pipeline_manager.h
A ui/gfx/rendering_stage_scheduler.cc
A ui/gfx/rendering_stage_scheduler.h
7 files changed, 441 insertions(+), 0 deletions(-)
Attention is currently required from: Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
2 comments:
Patchset:
Very cool!
Thinking about the way the power scheduler could use this, I'm guessing the main knob we can tweak is the frame interval? Or should we override the per-thread wall times instead? For example, if the scheduler says a particular thread isn't important, should we report its runtime as zero here?
I think reporting a ~0 value for runtime/target time seems like a reasonable thing. I'd expect the platform to schedule the thread in the most efficient way, but I'm not sure what numbers we should use; the Android implementation details are still pretty murky on this. We can push to get more clarity once we have something up for rendering that shows good results.
Also, the power scheduler shouldn't need to use a RenderingPipelineManager for cases where the thread is in a binary important/not-important state and we use the strategy above, since we don't need to track execution times there. We can set up a RenderingStageScheduler directly instead (and probably rename it at that point :P).
(Feel free to ignore the nits for now -- I tend to get carried away :)
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #1, Line 43: get_and_reset_time_since_last_frame
Sorry about the late reply. I had a draft response but you raised some pretty good questions that I was taking time to mull over.
Idle work is designed to be best effort and IMO should never cause the system to move to a higher operating point
That's a fair point. I'm actually curious about how the renderer scheduler deals with idle work in situations where frames are running too tight. Do we pause processing them during active rendering?
The same goes for setTimeout()
This one is tricky. It's very much possible for a page to drive its rendering work using setTimeout. In the ideal case all work for a frame happens in the BeginMainFrame callback and script uses rAF. This would be much easier to measure: we'd just need to track the time spent in the BeginMainFrame task (excluding blocking on the cc commit). And if this work is eating up all the frame budget, then any lower-priority work outside of BeginMainFrame is automatically throttled/paused.
But I suspect there will be cases where work outside BeginMainFrame is not optional, setTimeout being a big one. I do like the idea of using task priority/queues to make this distinction though. If we know the category or queues of tasks that the renderer scheduler will pause to keep them from blocking rendering, then we could certainly ignore those. Are there such queues we can explicitly ignore?
To give a contrived example:
That's actually pretty close to what the common frame for a raster-inducing animation should look like. I started with the ASCII art style you used and it got complicated very quickly. I added a diagram here for the control flow I was hoping to see in the high-latency state: https://docs.google.com/document/d/1WlCqvmZusclIfgzq8bcZO3EK24Y9Xt0zA_cs8LdxBRY/edit.
It was an excellent idea to trace the life of a frame to think through all the nuances. It's not as simple as I was hoping it could be. A few points that came to my mind:
1) The commit, which happens on the compositor thread, blocks renderer main from starting the next frame so maybe renderer main's execution time should include commit time.
2) We can assume serial execution for CC + raster. We only have one foreground raster worker on Android, so that makes it easier to ignore the parallelization among raster workers. CPU usage for workers is also pretty low in our stats. That makes sense since serializing paint commands for a tile is very fast; the only expensive work we do on these threads is image decoding, which is sporadic because of caching.
3) Reasoning about viz + gpu main is even harder. I think we could consider all work on viz thread + work for display context (composite) on GPU thread serial. And all other GPU work (raster or WebGL) could be parallelized with the viz thread. I suppose this could be reported as max (all viz + composite, all gpu)?
The other issue here is that we still need to account for latency in the driver + execution on the GPU. We recently added a metric to compute this latency that could be leveraged here.
Scrolling is going to be a beast of its own. There are so many edge cases to consider about workload changes due to pre-rendering and the impact of slow execution on checkerboarding that timing data wouldn't capture. I'm kinda hoping we can defer that until there is some data to show that it's worth tackling that. Maybe we only try to put renderer main on a slower core for composited scrolls...
In terms of implementation, I'm open to considering an approach for explicit tracking of work for rendering steps instead of accumulating runtime for threads through a TaskObserver. I did peek around the gfx metrics setup for that: https://source.chromium.org/chromium/chromium/src/+/master:cc/metrics/compositor_frame_reporter.h;l=81. But multiple factors dissuaded me from it:
1) I was just not sure whether this would exhaustively capture everything, particularly for renderer main as highlighted above.
2) The gfx metrics do a great job at capturing latency, but don't always drill down into CPU execution time. For instance, kEndActivateToSubmitCompositorFrame or kReceivedCompositorFrameToStartDraw could include time waiting for downstream work to finish because of back pressure.
3) In some instances we're not capturing CPU time somewhat by design, because it's hard to associate it with a specific frame. Like the raster time on the GPU thread. It could partially show up in StartDrawToSwapStart, which measures time from draw start on viz to composite finishing on the GPU thread, depending on how parallelized draw and raster were.
That's why I was more inclined to take the approach of measuring all work on a thread and isolating the stages we need to report separately (commit and display context).
Also adding Sadrul who's well versed with gfx scheduling and metrics in case he has an opinion.
Attention is currently required from: Khushal, Sadrul Chowdhury, Sami Kyöstilä.
1 comment:
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #1, Line 43: get_and_reset_time_since_last_frame
Lots of good points 😊 Thanks for the diagram, it's looking complicated indeed. Got a few more comments/questions on what you brought up.
so maybe renderer main's execution time should include commit time
I think the main thread is actually blocked while the commit is pending/happening. So likely this would already be included in the main thread's "BeginMainFrame" task's execution.
Scrolling is going to be a beast of its own. There are so many edge cases to consider about workload changes due to pre-rendering and the impact of slow execution on checkerboarding that timing data wouldn't capture. I'm kinda hoping we can defer that until there is some data to show that it's worth tackling that. Maybe we only try to put renderer main on a slower core for composited scrolls...
Hmm, sounds like we should definitely break out scrolling explicitly in our RAIL mode analysis to make an informed decision here. Definitely de-prioritizing the main thread on composited scrolls makes sense. But I'd imagine ADPF would also be super useful for composited scrolls, if we can make it work.
If we don't target scrolling with ADPF initially, which use cases are we thinking of? CSS/JS animations? Video? Games?
1) I was just not sure whether this would exhaustively capture everything, particularly for renderer main as highlighted above.
If we chose to follow a latency-metric driven approach, could we limit ourselves to situations where we are reasonably certain that we are tracking all important work? (e.g. composited animations or video)?
could include time waiting for downstream work to finish because of back pressure.
IIUC this would be blocked "idle" time waiting on some other compositing stage to complete an earlier frame? This seems somewhat similar to the main frame + commit interplay you mentioned earlier.
I think this sort of blocked time does have an effect on whether we can hit the deadlines or not, though. Just excluding it entirely doesn't seem right either.
Example: deadline 16ms, execution time 10ms, blocked 8ms. If you report 10ms to ADPF, it would assume we hit the deadline and schedule us less. If you report 18ms, it would try to schedule us more.
What would we want ADPF to do? Schedule us more so that we can get the 10ms execution down to 8ms? Schedule the downstream stage more, so that the 8ms blocked time go down to 6ms? A combination of both?
Maybe we could split the upstream stage's blocked time (evenly?) across the upstream's & downstream's reported time (sort of to tell ADPF that the downstream stage "took time away" from the upstream one)? Not sure, this may not really work because the downstream stage could hit its deadline and still block the upstream stage for some time..
In some instances we're not capturing CPU time.
Do you mean CPU time durations, or wall time duration during which there was "work to do" and the thread wasn't blocked by some other stage?
Not sure if we should be reporting CPU time durations to ADPF if we tell ADPF wall time deadlines. Example: deadline 16ms, 5ms CPU time executed, but 20ms wall execution time, because we were scheduled very infrequently. If we reported 5ms to ADPF, I'd expect it would schedule us even less / reduce CPU frequencies or something similar, because 5ms is clearly less than the 16ms deadline...
Attention is currently required from: Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
1 comment:
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #1, Line 43: get_and_reset_time_since_last_frame
Lots of good points 😊 Thanks for the diagram, it's looking complicated indeed. Got a few more comments/questions on what you brought up.
so maybe renderer main's execution time should include commit time
I think the main thread is actually blocked while the commit is pending/happening. So likely this would already be included in the main thread's "BeginMainFrame" task's execution.
True that, thanks for pointing that out. So nothing special needed for commit either.
Scrolling is going to be a beast of its own. There are so many edge cases to consider about workload changes due to pre-rendering and the impact of slow execution on checkerboarding that timing data wouldn't capture. I'm kinda hoping we can defer that until there is some data to show that it's worth tackling that. Maybe we only try to put renderer main on a slower core for composited scrolls...
Hmm, sounds like we should definitely break out scrolling explicitly in our RAIL mode analysis to make an informed decision here. Definitely de-prioritizing the main thread on composited scrolls makes sense. But I'd imagine ADPF would also be super useful for composited scrolls, if we can make it work.
I'm hoping we can start out slow. For instance, try to use little core if there is no raster but request boost if we know raster is about to come (based on how close we are to running out of pre-painted content). But since this is going to be more effort to get right, we should decide based on data.
We should definitely have all accelerated gestures (scrolling/pinch-zoom) separately in RAIL numbers. It's actually interesting to see how this intersects with Response. Do we time out Response x milliseconds after the first touch event, or keep extending it while there is continuous input?
If we don't target scrolling with ADPF initially, which use cases are we thinking of? CSS/JS animations? Video? Games?
CSS/JS animations, canvas (probably common for ads), video. Basically any animation that is not driven by user input.
1) I was just not sure whether this would exhaustively capture everything, particularly for renderer main as highlighted above.
If we chose to follow a latency-metric driven approach, could we limit ourselves to situations where we are reasonably certain that we are tracking all important work? (e.g. composited animations or video)?
That seems unnecessarily limiting and will likely exclude too many cases. Even on m.youtube.com, I see a significant chunk of work happening in tasks outside BeginMainFrame. I assume all the loading work has to happen on the main thread and is bound to be outside the rendering task?
could include time waiting for downstream work to finish because of back pressure.
IIUC this would be blocked "idle" time waiting on some other compositing stage to complete an earlier frame? This seems somewhat similar to the main frame + commit interplay you mentioned earlier.
I think it's different. Commit is special because the main thread synchronously blocks, but in other instances we get asynchronously notified of the next stage's progress. The Activate/SubmitCompositorFrame is an example of that. CC doesn't dispatch the next frame until the previous one has been ack'd, and it's notified of that via an IPC back from the GPU. But the compositor thread can run other work while it waits.
I think this sort of blocked time does have an effect on whether we can hit the deadlines or not, though. Just excluding it entirely doesn't seem right either. Example: deadline 16ms, execution time 10ms, blocked 8ms. If you report 10ms to ADPF, it would assume we hit the deadline and schedule us less. If you report 18ms, it would try to schedule us more.
What would we want ADPF to do? Schedule us more so that we can get the 10ms execution down to 8ms? Schedule the downstream stage more, so that the 8ms blocked time go down to 6ms? A combination of both?
For the commit case where the main thread is blocked and can't execute, I'd err on the aggressive side and include the commit time in both threads' feedback. And it looks like this will naturally fall out if we're using task time. But for all other cases, I'm hoping we shouldn't need to include this potentially idle wait time.
Ideally if ADPF makes sure the downstream stage is meeting its deadline then we won't get into this backpressure state. We should also try to utilize the parallelization opportunities as much as we can.
Maybe we could split the upstream stage's blocked time (evenly?) across the upstream's & downstream's reported time (sort of to tell ADPF that the downstream stage "took time away" from the upstream one)? Not sure, this may not really work because the downstream stage could hit its deadline and still block the upstream stage for some time..
In some instances we're not capturing CPU time.
Do you mean CPU time durations, or wall time duration during which there was "work to do" and the thread wasn't blocked by some other stage?
I meant the latter: wall time while we were running a task. We should absolutely report wall time; if the thread is de-scheduled due to contention, then that should be accounted for.
Not sure if we should be reporting CPU time durations to ADPF if we tell ADPF wall time deadlines. Example: deadline 16ms, 5ms CPU time executed, but 20ms wall execution time, because we were scheduled very infrequently. If we reported 5ms to ADPF, I'd expect it would schedule us even less / reduce CPU frequencies or something similar, because 5ms is clearly less than the 16ms deadline...
Attention is currently required from: Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
Patch set 3: Commit-Queue +1
11 comments:
Patchset:
I've cleaned up the patch with end to end integration for a formal review. Looked at a couple of scrolling traces as well to make sure the values make sense.
File content/renderer/render_thread_impl.cc:
Patch Set #2, Line 139: #include "third_party/blink/renderer/platform/scheduler/public/thread.h"
This include is failing the DEPS rules. Sami, is there a reason this needs to be disallowed?
File ui/gfx/rendering_pipeline_manager.h:
nit: PipelineType to make this more explicit?
Done
Patch Set #1, Line 41: base::flat_map<Type, std::vector<Thread>>
nit: Should we wrap this into a Pipeline object? I could easily see us adding more properties there […]
We use Pipeline objects internally, to track relevant state for the implementation. But for these init methods I added explicit args for each thread involved since I anticipate we'll have special logic for the Gpu pipeline going forward.
I might be misunderstanding your comment but let me know if you prefer having structs instead of an argument list for these init methods.
Patch Set #1, Line 46: SetActive
Could we use a RAII pattern for this? That way there's no risk for the count to get unbalanced.
Done, good idea!
Patch Set #1, Line 66: lock_
nit: Let's annotate which members this protects
Done
File ui/gfx/rendering_pipeline_manager.cc:
year += 3 :)
Done
Patch Set #1, Line 51: target_time_
nit: Should we be grabbing the lock here? :)
All methods on Pipeline objects should be called with the global lock on RenderingPipelineManager acquired. At least that's the pattern I was going for.
But given that all pipelines should be initialized at process startup, we could assume state on RenderingPipelineManager doesn't need a lock and each RenderingPipeline could have its own, if we wanted to limit the scope. I don't think it matters much right now, we're using RenderingPipelineManager from the same thread after init anyway.
nit: We could probably make this an atomic swap
This got complicated since I was trying to ensure that an ongoing task is also accounted for in our measurement; the values can otherwise oscillate significantly depending on which frame boundary the task ends up falling into.
Do let me know if there is a better way to structure this though.
File ui/gfx/rendering_stage_scheduler.cc:
Patch Set #1, Line 43: dlopen
nit: We should probably use base/native_library.h for this.
Done
Patch Set #1, Line 65: PlatformThreadId
nit: Is sizeof(base::PlatformThreadId) == sizeof(int) always on Android?
I'm actually not sure myself, I was hoping this assert would tell me. I assumed it is because the Android NDK API takes thread id using an int.
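For reference, the kind of compile-time guard I mean (paraphrasing; not necessarily the exact line from the patch):
// The NDK API takes the thread id as an int, so make sure the conversion
// can't silently truncate.
static_assert(sizeof(base::PlatformThreadId) == sizeof(int),
              "PlatformThreadId must fit in the int expected by the NDK API");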
Attention is currently required from: Jonathan Backer, Bo, Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
Khushal would like Jonathan Backer and Bo to review this change.
gfx/android: Introduce rendering pipeline integration with ADPF.
Adds the setup to track execution time of rendering threads to provide
feedback of actual vs targeted execution time for these threads to the
platform CPU scheduler.
Change-Id: I20ca94679f1bb5f50fafe669e675d2cc6243428e
---
M android_webview/browser/gfx/viz_compositor_thread_runner_webview.cc
M android_webview/browser/gfx/viz_compositor_thread_runner_webview.h
M base/android/java_handler_thread.cc
M base/android/java_handler_thread.h
M base/task/current_thread.cc
M base/task/current_thread.h
M cc/scheduler/scheduler.cc
M cc/scheduler/scheduler.h
M cc/scheduler/scheduler_state_machine.h
M chrome/browser/about_flags.cc
M chrome/browser/flag-metadata.json
M chrome/browser/flag_descriptions.cc
M chrome/browser/flag_descriptions.h
M components/viz/service/display/display.cc
M components/viz/service/display/display_scheduler.cc
M components/viz/service/display/display_scheduler.h
M components/viz/service/main/viz_compositor_thread_runner.h
M components/viz/service/main/viz_compositor_thread_runner_impl.cc
M components/viz/service/main/viz_compositor_thread_runner_impl.h
M components/viz/service/main/viz_main_impl.cc
M components/viz/service/main/viz_main_impl.h
M components/viz/service/main/viz_main_impl_unittest.cc
M content/gpu/gpu_child_thread.cc
M content/gpu/gpu_child_thread.h
M content/public/common/content_features.cc
M content/public/common/content_features.h
M content/renderer/categorized_worker_pool.cc
M content/renderer/categorized_worker_pool.h
M content/renderer/render_thread_impl.cc
M content/renderer/render_thread_impl.h
M third_party/blink/public/platform/platform.h
M third_party/blink/renderer/platform/exported/platform.cc
M tools/metrics/histograms/enums.xml
M ui/gfx/BUILD.gn
A ui/gfx/rendering_pipeline_manager.cc
A ui/gfx/rendering_pipeline_manager.h
A ui/gfx/rendering_stage_scheduler.cc
A ui/gfx/rendering_stage_scheduler.h
38 files changed, 760 insertions(+), 6 deletions(-)
Attention is currently required from: Jonathan Backer, Bo, Khushal, Sadrul Chowdhury, Sami Kyöstilä.
11 comments:
Patchset:
Looks great overall!
Generally agree that we should reuse the pipeline active/inactive state for power mode voting, too 😊
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #11, Line 58: start_time_ongoing_task_
should we use start_time just in case we didn't observe the WillProcessTask() for this task?
Patch Set #11, Line 80: ongoing_task
bikeshed: maybe "active_task"?
nit: braces for if/else 😊
>=? or is there a reason we expect exactly 1 here? (maybe add a comment)
Patch Set #11, Line 167: AggregatingRenderingPipeline
Might be worth adding some comments somewhere (maybe class comments for the ..Pipeline classes?) why we choose this approach of adding up task durations for most pipelines / why we do something else for GPU.
Patch Set #11, Line 179: base::TimeDelta total_time;
Should we add a TODO to discount idle tasks and similar on the main thread?
Patch Set #11, Line 213: scheduler_->ReportCpuCompletionTime(compositor_thread_time +
Do we always have gpu_latency_ set here? (will anything break if not? / could we add a DCHECK or similar?)
Patch Set #11, Line 275: // bug with setting up the task observer.
Uh, weird :) What's the error you get in that case?
File ui/gfx/rendering_stage_scheduler.cc:
Patch Set #11, Line 41: class
maybe a "struct" instead given all public fields?
Patch Set #11, Line 98: (int)
static_cast 😊
Attention is currently required from: Jonathan Backer, Bo, Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
Patch set 13: Commit-Queue +1
10 comments:
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #11, Line 58: start_time_ongoing_task_
should we use start_time just in case we didn't observe the WillProcessTask() for this task?
I assumed the Observer API contract is to always call WillProcessTask before DidProcessTask. The DCHECK above should also enforce that.
Patch Set #11, Line 80: ongoing_task
bikeshed: maybe "active_task"?
Done
nit: braces for if/else 😊
Done
>=? or is there a reason we expect exactly 1 here? (maybe add a comment)
Multiple pipelines on the same thread are hard to reason about. Each will run independently and have its own frame events which trigger reporting of feedback to ADPF. Maybe that's okay; each pipeline should probably have the same metrics given that their work will be interleaved on the same thread.
This is also very rare right now. Probably only in multi-window mode where we can have 2 tabs active in the same renderer and 2 separate windows being composited in the Gpu. I left a TODO with the details to sort this out going forward.
Patch Set #11, Line 167: AggregatingRenderingPipeline
Might be worth adding some comments somewhere (maybe class comments for the .. […]
Done. Added a comment on all RenderingPipeline classes.
Patch Set #11, Line 179: base::TimeDelta total_time;
Should we add a TODO to discount idle tasks and similar on the main thread?
Good point, I had forgotten about that. Done.
Patch Set #11, Line 213: scheduler_->ReportCpuCompletionTime(compositor_thread_time +
Do we always have gpu_latency_ set here? (will anything break if not? / could we add a DCHECK or sim […]
Good question. Gpu latency is only available with SurfaceControl right now, because we use fences to get that data. We can consider adding it to the other path as well. But given that SurfaceControl is default on R+, it should be there for the OS versions where we'd use ADPF. Something to keep in mind though when we do a study just to collect this data. That should be limited to OS versions where this data is available.
I still don't think we can DCHECK that it's set. For one, it wouldn't be set for the first few frames. OnFrameFinished is called when we're done doing work on the CPU but haven't finished on the GPU. So the GPU feedback lags behind by a frame. And also because there can be edge cases where we don't get a signal time on a fence, and we handle errors like these safely.
Patch Set #11, Line 275: // bug with setting up the task observer.
Uh, weird :) What's the error you get in that case?
It's a segfault. I tried getting a tombstone so I could get a symbolized stack but no luck. Figured it's good to land the framework and fix these things in follow up patches.
File ui/gfx/rendering_stage_scheduler.cc:
Patch Set #11, Line 41: class
maybe a "struct" instead given all public fields?
Done
Patch Set #11, Line 98: (int)
static_cast 😊
Done
Attention is currently required from: Jonathan Backer, Bo, Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
1 comment:
Patchset:
Thanks for the review Eric! :)
Attention is currently required from: Jonathan Backer, Bo, Eric Seckler, Khushal, Sadrul Chowdhury.
17 comments:
Patchset:
Looks great!
File android_webview/browser/gfx/viz_compositor_thread_runner_webview.cc:
Patch Set #14, Line 100: return
DCHECK(viz_thread_.IsRunning())?
File base/android/java_handler_thread.h:
Patch Set #14, Line 106: thread_id_
thread_id_{} to default-initialize this.
File base/android/java_handler_thread.cc:
Patch Set #14, Line 135: thread_id_
DCHECK(thread_id_);
File cc/scheduler/scheduler.h:
Patch Set #14, Line 389: main_pipeline_
nit: main_pipeline_active_ and compositor_pipeline_active_? It took me a while to figure out what these were :)
File cc/scheduler/scheduler.cc:
For now this also includes cases where the rendering
// loop doesn't lead to any visual updates
Should we phrase this part as a TODO to highlight the potential improvement?
File chrome/browser/flag-metadata.json:
Patch Set #14, Line 52: add-web-content-drop-interaction",
Accidental change?
File components/viz/service/display/display.cc:
Should this be a DCHECK instead?
File components/viz/service/display/display_scheduler.h:
Patch Set #14, Line 100: gpu_pipeline_
ditto: gpu_pipeline_active_
File ui/gfx/rendering_pipeline_manager.h:
Patch Set #14, Line 51: ScopedPipelineActive
explicit
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #14, Line 31: get_and_reset_time_since_last_frame
nit: Should probably use CamelCase since this isn't quite a trivial getter.
Patch Set #14, Line 47: base::AutoLock hold(time_lock_);
I wonder if there's an alternative to grabbing and releasing two locks for each task we run? They're probably uncontended, but still it feels like we should try to avoid that. If both time_since_last_frame_ and start_time_active_task_ were relaxed atomics would that work?
Patch Set #14, Line 67: enabled
nit: {}
nit: {} (here and below)
Patch Set #14, Line 330: base::AutoLock hold(lock_);
This is called quite often from cc -- could we avoid calling into here if the duration hasn't changed from what it was before?
Should we DCHECK this instead of failing silently?
File ui/gfx/rendering_stage_scheduler.cc:
Patch Set #14, Line 34: dlsym
IIRC there's a way to use dlsym in native_library.h too?
Attention is currently required from: Jonathan Backer, Bo, Eric Seckler, Khushal, Sadrul Chowdhury.
Khushal uploaded patch set #15 to this change.
gfx/android: Introduce rendering pipeline integration with ADPF.
Adds the setup to track execution time of rendering threads to provide
feedback of actual vs targeted execution time for these threads to the
platform CPU scheduler.
Bug: 1157620
38 files changed, 792 insertions(+), 6 deletions(-)
Attention is currently required from: Jonathan Backer, Bo, Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
16 comments:
File android_webview/browser/gfx/viz_compositor_thread_runner_webview.cc:
Patch Set #14, Line 100: return
DCHECK(viz_thread_. […]
Done
File base/android/java_handler_thread.h:
Patch Set #14, Line 106: thread_id_
thread_id_{} to default-initialize this.
Done
File base/android/java_handler_thread.cc:
Patch Set #14, Line 135: thread_id_
DCHECK(thread_id_);
Done
File cc/scheduler/scheduler.h:
Patch Set #14, Line 389: main_pipeline_
nit: main_pipeline_active_ and compositor_pipeline_active_? It took me a while to figure out what th […]
Done
File cc/scheduler/scheduler.cc:
For now this also includes cases where the rendering
// loop doesn't lead to any visual updates
Should we phrase this part as a TODO to highlight the potential improvement?
Yeah, that makes sense. My aim was to outline a potential improvement we could do to be aggressive about being in Idle mode.
File chrome/browser/flag-metadata.json:
Patch Set #14, Line 52: add-web-content-drop-interaction",
Accidental change?
Oops. Fixed.
File components/viz/service/display/display.cc:
Should this be a DCHECK instead?
Since these timestamps come from signal time on Android fences, we need to handle errors like these safely. We already do this for some values in the feedback here: https://source.chromium.org/chromium/chromium/src/+/master:components/viz/service/display/display.cc;l=93;bpv=0;bpt=1.
It's better to do all the sanitization logic in that function to keep it in one spot. But because this affects metrics, I'd prefer to do it as a separate change.
File components/viz/service/display/display_scheduler.h:
Patch Set #14, Line 100: gpu_pipeline_
ditto: gpu_pipeline_active_
Done
File ui/gfx/rendering_pipeline_manager.h:
Patch Set #14, Line 51: ScopedPipelineActive
explicit
Done
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #14, Line 31: get_and_reset_time_since_last_frame
nit: Should probably use CamelCase since this isn't quite a trivial getter.
Done
Patch Set #14, Line 47: base::AutoLock hold(time_lock_);
I wonder if there's an alternative to grabbing and releasing two locks for each task we run? They're […]
I'm generally wary of using atomics because they are so hard to reason about. And in this case most times the lock will be acquired on the target thread. The only scenario where it is acquired by the calling thread is when we sample, which is once per frame.
That said, I did try to think about how I would do this. I'm curious to know if it sounds right. And obviously, I'd love to hear if you have a better suggestion.
// Sketch only; assumes the members support atomic TimeTicks/TimeDelta ops.
base::TimeDelta GetAndResetTimeSinceLastFrame() {
  auto start_time = start_time_active_task_.load();
  auto now = base::TimeTicks::Now();
  base::TimeDelta active_task_execution_time;
  // If the exchange fails, the task finished executing and will be accounted
  // for in time_since_last_frame_ in the current or next frame.
  if (!start_time.is_null() &&
      start_time_active_task_.compare_exchange_strong(start_time, now)) {
    active_task_execution_time = now - start_time;
  }
  return time_since_last_frame_.exchange(base::TimeDelta()) +
         active_task_execution_time;
}

void WillProcessTask(base::TimeTicks start_time) {
  start_time_active_task_.store(start_time);
}

void DidProcessTask(base::TimeTicks start_time, base::TimeTicks end_time) {
  start_time = start_time_active_task_.load();
  // If the exchange fails, the start time was updated to now in
  // GetAndResetTimeSinceLastFrame and the time for this task is already
  // included in the sampled value.
  if (start_time_active_task_.compare_exchange_strong(start_time,
                                                      base::TimeTicks())) {
    time_since_last_frame_.fetch_add(end_time - start_time);
  }
  start_time_active_task_.store(base::TimeTicks());
}
I would argue against the complication unless we see any perf impact, because for most tasks there will be no contention.
Patch Set #14, Line 67: enabled
nit: {}
Done
nit: {} (here and below)
Done
Patch Set #14, Line 330: base::AutoLock hold(lock_);
This is called quite often from cc -- could we avoid calling into here if the duration hasn't change […]
The only edge case which needs this is multiple pipelines. While we won't report any feedback if they are active simultaneously, the value set by the last one wins. Otherwise, if there is only one pipeline, this could be checked at the call-site instead of keeping shared state between pipelines in the pipeline object. Or, as a convenience, I was considering moving this function to ScopedPipelineActive to cache the state there.
I think we can punt on this because currently all feedback is being reported from the same thread, cc in the renderer and viz in the gpu process. So there shouldn't be any lock contention.
If the pattern remains the same as we add more pipelines then we can loosen the thread safety guarantee this class needs to provide. RenderingPipelineManager itself doesn't have any state which changes after initialization and it's likely that state for each pipeline object will be accessed from the same thread. For instance, when we add something for video or offscreen canvas, activation and feedback should come from the same thread. So we might not need any synchronization at all (other than the TaskObserver).
Should we DCHECK this instead of failing silently?
The only case where this would be null is if the feature is disabled and no pipelines are initialized. That was the aim with all the null-checks in this class, so the calling code doesn't need to be aware of feature state.
I could move this feature flag to ui/base and replace it with an explicit check here instead. If the feature is enabled and the APIs are used in a process, then the respective pipeline must have been initialized. Do you think that's better?
File ui/gfx/rendering_stage_scheduler.cc:
Patch Set #14, Line 34: dlsym
IIRC there's a way to use dlsym in native_library. […]
Done
Attention is currently required from: Jonathan Backer, danakj, Avi Drissman, Bo, Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
Khushal would like danakj and Avi Drissman to review this change.
38 files changed, 797 insertions(+), 6 deletions(-)
Attention is currently required from: Jonathan Backer, danakj, Avi Drissman, Bo, Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
1 comment:
Patchset:
Covering all bases for owners now.
+danakj for base/, content/renderer, third_party/blink/. And if you have any advice on atomics. 😊
+avi for everything else content/
boliu/backer/sadrul for overall review. sadrul in particular for ui/ OWNERS, where the bulk of the new code is.
Attention is currently required from: Jonathan Backer, danakj, Avi Drissman, Bo, Eric Seckler, Khushal, Sadrul Chowdhury.
7 comments:
File content/renderer/render_thread_impl.cc:
Patch Set #2, Line 139: #include "third_party/blink/renderer/platform/scheduler/public/thread.h"
This include is failing the DEPS rules. […]
Yep, this is making sure we go through the public Blink API layer. What do you need from thread.h here? Could we move that to blink/public/platform/scheduler instead?
File ui/gfx/rendering_pipeline_manager.h:
Patch Set #1, Line 41: base::flat_map<Type, std::vector<Thread>>
We use Pipeline objects internally, to track relevant state for the implementation. […]
Nah, what you have here works fine too!
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #14, Line 47: base::AutoLock hold(time_lock_);
I'm generally wary of using atomics because they are so hard to reason about. […]
I was curious about the overhead myself and found an interesting reference: https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html
According to that, uncontended mutexes on a Cortex A72 are about 5x more expensive than atomics (40ns), and atomics themselves are about 5x more expensive than barrier-less operations. If we can expect up to about 1000 tasks per second, the locks here might add 200us of CPU time per second, which is not all that much.
Patch Set #14, Line 330: base::AutoLock hold(lock_);
The only edge case which needs this is multiple pipelines. […]
It just seems a little wasteful to call in here every frame with the same value (1/60s for 60 fps) over and over again. Maybe as a compromise we could only call it when the pipeline becomes active -- or is it possible for that interval to change dynamically?
I do like the idea of restricting each pipeline to a single thread. Would it work if we had a per-pipeline SequenceChecker that enforces that, which would eliminate the need for locking in this part? I'm worried that removing the locks will be much more difficult later when we don't really have a guarantee of what's being called from where.
The only case where this would be null is if the feature is disabled and no pipelines are initialize […]
Ah, I forgot about the feature flag issue. Seems reasonable to null-check here because of that.
File ui/gfx/rendering_stage_scheduler.cc:
Patch Set #1, Line 65: PlatformThreadId
I'm actually not sure myself, I was hoping this assert would tell me. […]
Well, this assert should tell us if things are otherwise.
File ui/gfx/rendering_stage_scheduler.cc:
Patch Set #18, Line 53: android so
nit: Please mention libAdpf.so in this message so it's a little less mysterious for anyone stumbling on it. Could also say something like "disabling ADPF rendering stage scheduler".
Attention is currently required from: Jonathan Backer, Avi Drissman, Bo, Eric Seckler, Khushal, Sadrul Chowdhury.
5 comments:
Patchset:
Sami's feedback sounds good, a few more things on top of that
File base/android/java_handler_thread.cc:
Patch Set #18, Line 93: thread_id_ = base::PlatformThread::CurrentId();
The CurrentId() is useful for logging but is probably not what you want here. Thread-checkers use the current ref instead.
File ui/gfx/rendering_pipeline_manager.h:
Patch Set #18, Line 37: static RenderingPipelineManager* GetInstance();
Why can't we do this without globals?
Patch Set #18, Line 78: RenderingPipelineManager(const RenderingPipelineManager&) = delete;
deleted member function should be public (https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-equals-delete.html)
Please fix.
Patch Set #18, Line 79: RenderingPipelineManager& operator=(const RenderingPipelineManager&) = delete;
deleted member function should be public (https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-equals-delete.html)
Please fix.
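i.e., something like this (the rest of the class is elided here):
class RenderingPipelineManager {
 public:
  // Public deleted copy operations give a clearer "use of deleted function"
  // diagnostic at the call site, which is what the clang-tidy check wants.
  RenderingPipelineManager(const RenderingPipelineManager&) = delete;
  RenderingPipelineManager& operator=(const RenderingPipelineManager&) = delete;
  // ... rest of the class unchanged ...
};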
Attention is currently required from: Jonathan Backer, danakj, Bo, Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
Khushal removed Avi Drissman from this change.
Attention is currently required from: Jonathan Backer, danakj, Bo, Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
Patch set 19: Auto-Submit +1, Commit-Queue +1
8 comments:
Patchset:
On the road to avoiding the use of globals, I realized it would be much easier to split this CL for review at this point. So this change adds the basic setup needed for all pipelines, and dependent changes integrate them with the renderer and GPU pipelines.
-avi since there is no content review needed now.
File base/android/java_handler_thread.cc:
Patch Set #18, Line 93: thread_id_ = base::PlatformThread::CurrentId();
The CurrentId() is useful for logging but is probably not what you want here. […]
This id is used with a platform API to identify a thread (not for logging), so we indeed need the id.
File content/renderer/render_thread_impl.cc:
Patch Set #2, Line 139: #include "third_party/blink/renderer/platform/scheduler/public/thread.h"
Yep, this is making sure we go through the public Blink API layer. What do you need from thread. […]
This was on an old patchset, I probably got confused with scheduler/public. Not an issue anymore.
File ui/gfx/rendering_pipeline_manager.h:
Patch Set #18, Line 37: static RenderingPipelineManager* GetInstance();
Why can't we do this without globals?
We can. The pipelines can be stored on the root object in that process and plumbed through to where the dependency is needed.
Patch Set #18, Line 78: RenderingPipelineManager(const RenderingPipelineManager&) = delete;
> deleted member function should be public (https://clang.llvm. […]
Done
Patch Set #18, Line 79: RenderingPipelineManager& operator=(const RenderingPipelineManager&) = delete;
> deleted member function should be public (https://clang.llvm. […]
Done
File ui/gfx/rendering_pipeline_manager.cc:
Patch Set #14, Line 47: base::AutoLock hold(time_lock_);
I was curious about the overhead myself and found an interesting reference: https://travisdowns. […]
Interesting read. I assumed uncontended mutexes are cheap but nice to see it backed by data. Definitely something I'll keep in mind when doing things in a tight loop. :)
Patch Set #14, Line 330: base::AutoLock hold(lock_);
It just seems a little wasteful to call in here every frame with the same value (1/60s for 60 fps) over and over again. Maybe as a compromise we could only call it when the pipeline becomes active -- or is it possible for that interval to change dynamically?
It is. In fact, the interesting case where this would change is video playback. We start with 90Hz and, if no other content is animating, then go down to 60Hz. The interval changes while the pipeline is active in that case.
I do like the idea of restricting each pipeline to a single thread. Would it work if we had a per-pipeline SequenceChecker that enforces that, which would eliminate the need for locking in this part? I'm worried that removing the locks will be much more difficult later when we don't really have a guarantee of what's being called from where.
This sounds reasonable. I changed the threading assumptions to assume a pipeline is used on a single thread. It was easier to structure by making RenderingPipeline the public API (in the process of avoiding a singleton as well).
Attention is currently required from: Jonathan Backer, danakj, Bo, Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
Khushal uploaded patch set #20 to this change.
gfx/android: Introduce rendering pipeline integration with ADPF.
Adds the setup to track execution time of rendering threads to provide
feedback of actual vs targeted execution time for these threads to the
platform CPU scheduler.
Bug: 1157620
Change-Id: I20ca94679f1bb5f50fafe669e675d2cc6243428e
---
M base/android/java_handler_thread.cc
M base/android/java_handler_thread.h
M base/task/current_thread.cc
M base/task/current_thread.h
M ui/gfx/BUILD.gn
A ui/gfx/rendering_pipeline.cc
A ui/gfx/rendering_pipeline.h
A ui/gfx/rendering_stage_scheduler.cc
A ui/gfx/rendering_stage_scheduler.h
9 files changed, 586 insertions(+), 0 deletions(-)
Attention is currently required from: Jonathan Backer, danakj, Bo, Eric Seckler, Khushal, Sadrul Chowdhury.
Patch set 20: Commit-Queue +2
3 comments:
Patchset:
Looks great, thanks for splitting this up!
File ui/gfx/rendering_pipeline.h:
Patch Set #20, Line 65: durartion
typo: duration
File ui/gfx/rendering_pipeline.cc:
Patch Set #20, Line 293: constexpr
Should these strings be static? I guess in theory they might live on the stack otherwise.
Attention is currently required from: Jonathan Backer, danakj, Bo, Eric Seckler, Khushal, Sadrul Chowdhury.
CL chromium-review.googlesource.com/2518721 must be approved before triggering CQ
CL is missing the approving vote on Code-Review label.
Attention is currently required from: Jonathan Backer, danakj, Bo, Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
Patch set 21:Commit-Queue +1
2 comments:
File ui/gfx/rendering_pipeline.h:
Patch Set #20, Line 65: durartion
typo: duration
Done
File ui/gfx/rendering_pipeline.cc:
Patch Set #20, Line 293: constexpr
Should these strings be static? I guess in theory they might live on the stack otherwise.
Nice catch. You're right, this should be static to ensure there is a single instance of it instead of it being recreated on every invocation of the function: https://stackoverflow.com/questions/13865842/does-static-constexpr-variable-inside-a-function-make-sense
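Roughly the difference being discussed, with placeholder identifiers (not the actual strings in rendering_pipeline.cc):

// Without `static`, the constexpr array is a distinct automatic object per
// call; handing its address to anything that outlives the call would dangle.
void ReportStageLocal() {
  constexpr char kStageName[] = "CompositorGpu";
  // e.g. passing kStageName to an API that retains the pointer is unsafe here.
}

// With `static`, there is exactly one instance with static storage duration,
// so a retained pointer stays valid for the lifetime of the program.
void ReportStageStatic() {
  static constexpr char kStageName[] = "CompositorGpu";
}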
Attention is currently required from: Jonathan Backer, danakj, Eric Seckler, Khushal, Sadrul Chowdhury, Sami Kyöstilä.
2 comments:
Patchset:
finished reading this CL, had a design question:
So in the meetings we discussed that the pipelines report wall-clock time. I thought that meant taking the wall-clock time at the start of a frame (of a pipeline) and at the end of that frame, and subtracting to get the difference; it wouldn't matter how many threads the pipeline uses, or whether they are (effectively) serial or parallel.
This implementation is more complicated: it counts the wall-clock time of individual tasks and doesn't count any idle time on the task scheduler. In that case, I'm not sure wall-clock time still makes sense. The example I'm thinking of is the renderer compositor pipeline, which runs on the compositor and raster threads but is largely serial. Say the compositor thread is descheduled for a while *after* it has kicked off more work for the raster thread, and the time the compositor thread is descheduled is less than the time the raster thread takes. Then the descheduling didn't really hurt the performance of the frame, but counting task wall-clock time would count that descheduled time as well, which seems like an overcount.
And just a note that webview's "display gpu thread" doesn't have a task runner, but let's not worry about webview for now.
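A concrete instance of that overcount, with made-up numbers:
- The compositor thread does 4 ms of work, kicks off raster at the 4 ms mark, then gets descheduled for 3 ms before its task returns (its task ends at 7 ms).
- The raster thread runs for 6 ms, from 4 ms to 10 ms, so the frame finishes at 10 ms.
- Frame-span accounting (start of frame to end of frame) reports 10 ms, unaffected by the descheduling.
- Per-task wall-time accounting reports 7 ms + 6 ms = 13 ms, because the 3 ms the compositor thread spent descheduled inside its task is counted even though it never delayed the frame.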
File ui/gfx/rendering_pipeline.cc:
Patch Set #20, Line 92: base::TimeTicks start_time_active_task_;
add GUARDED_BY annotations
Attention is currently required from: Jonathan Backer, danakj, Eric Seckler, Khushal, Sadrul Chowdhury, Sami Kyöstilä.
1 comment:
Patchset:
finished reading this CL, had a design question: […]
VC'd with Khushal. My summary is: while the above is a problem, any other design we discussed to address it has even bigger problems. So consider this comment addressed :)
Attention is currently required from: Jonathan Backer, danakj, Eric Seckler, Khushal, Sami Kyöstilä.
1 comment:
Patchset:
[sorry about the review delay, but I am still looking at the follow up CL to understand how this would work for cc]
Attention is currently required from: Jonathan Backer, danakj, Eric Seckler, Sadrul Chowdhury, Sami Kyöstilä.
1 comment:
Patchset:
[sorry about the review delay, but I am still looking at the follow up CL to understand how this wou […]
I haven't uploaded a patchset for cc yet; the dependent change integrates this with viz.
If you want to look at the cc integration, check PS18 on this change: https://chromium-review.googlesource.com/c/chromium/src/+/2518721/18. The changes in cc/* capture that well.
Attention is currently required from: Jonathan Backer, danakj, Eric Seckler, Khushal, Sadrul Chowdhury, Sami Kyöstilä.
1 comment:
Patchset:
I'm picking up this chain of CLs from Khushal. I've made some changes to this CL:
Ping for reviews
Attention is currently required from: Jonathan Backer, Bo, Eric Seckler, Khushal, Sadrul Chowdhury, Sami Kyöstilä.
Patch set 23:Code-Review +1
3 comments:
Patchset:
base LGTM
File base/android/java_handler_thread.cc:
Patch Set #23, Line 135: DCHECK(thread_id_);
Can we DCHECK that Start() was called, as the documentation requests?
File ui/gfx/rendering_stage_scheduler.h:
Patch Set #23, Line 27: virtual ~RenderingStageScheduler() {}
use '= default' to define a trivial destructor (https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-equals-default.html)
Please fix.
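i.e. the form the check suggests:

virtual ~RenderingStageScheduler() = default;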
Attention is currently required from: Jonathan Backer, Eric Seckler, Khushal, Sadrul Chowdhury, Sami Kyöstilä.
2 comments:
File base/android/java_handler_thread.cc:
Patch Set #23, Line 135: DCHECK(thread_id_);
Can we DCHECK that Start() was called, as the documentation requests?
added a DCHECK-only bool initialized_
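Roughly this pattern (the call sites in java_handler_thread.cc are abridged; only |initialized_| is from the actual change):

// Header: the flag only exists in DCHECK-enabled builds.
#if DCHECK_IS_ON()
  bool initialized_ = false;
#endif

// Start() records that initialization ran...
#if DCHECK_IS_ON()
  initialized_ = true;
#endif

// ...and the method that previously DCHECKed |thread_id_| asserts ordering:
#if DCHECK_IS_ON()
  DCHECK(initialized_);
#endif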
File ui/gfx/rendering_stage_scheduler.h:
Patch Set #23, Line 27: virtual ~RenderingStageScheduler() {}
> use '= default' to define a trivial destructor (https://clang.llvm. […]
Done
Attention is currently required from: Jonathan Backer, Bo, Khushal, Sadrul Chowdhury, Sami Kyöstilä.
Patch set 24:Code-Review +1
4 comments:
File ui/gfx/rendering_pipeline.h:
Patch Set #24, Line 45: CreateRendererMain
Wondering if we should just have a single Create(const char* pipeline_name)... WDYT?
File ui/gfx/rendering_pipeline.cc:
Patch Set #20, Line 92: base::TimeTicks start_time_active_task_;
add GUARDED_BY annotations
Resolving.
File ui/gfx/rendering_pipeline.cc:
Patch Set #24, Line 72: DCHECK(start_time_active_task_.is_null());
Could we end up here with a non-null start time if we had a very quick succession of SetEnabled(false) and SetEnabled(true), without having executed any of the UpdateOnTargetThread tasks?
(imagining a case where we skip a DidProcessTask because enabled_ was briefly false, but don't manage to unregister the task observer before enabled_ turns back to true, and another WillProcessTask gets executed)
// TODO(crbug.com/1157620): Also add foreground worker thread. There seems to
// be a bug with setting up the task observer.
should this comment move somewhere else?
Attention is currently required from: Jonathan Backer, Eric Seckler, Khushal, Sadrul Chowdhury, Sami Kyöstilä.
4 comments:
Patchset:
just replying
File ui/gfx/rendering_pipeline.h:
Patch Set #24, Line 45: CreateRendererMain
Wondering if we should just have a single Create(const char* pipeline_name).. […]
I thought about it and left it like this. The reason is that this class shouldn't be created "randomly", and any new usage should probably be reviewed by an owner here. (And it might be a good idea to DCHECK that each type is created only once.)
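If we add that DCHECK, a minimal sketch of the guard each Create*() factory could start with (hypothetical, not in the current patchset):

#if DCHECK_IS_ON()
  static bool created = false;
  DCHECK(!created) << "each pipeline type should be created at most once";
  created = true;
#endif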
File ui/gfx/rendering_pipeline.cc:
Patch Set #24, Line 72: DCHECK(start_time_active_task_.is_null());
Could we end up here with a non-null start time if we had a very quick succession of SetEnabled(fals […]
I think this is the sequence you are worried about?
1) SetEnabled(false)
2) WillProcessTask
3) SetEnabled(true)
4) UpdateOnTargetThread(false) from 1)
5) # DidProcessTask skipped here
6) UpdateOnTargetThread(true) from 3)
7) WillProcessTask
Most things are lock-protected, so it's okay to think of this as a "sequence", and to consider SetEnabled separately from UpdateOnTargetThread. In that sequence, 6) will reset start_time_active_task_ on line 108.
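To make that concrete, a condensed paraphrase of the pieces involved (illustrative only, not the actual patch; the reset described for line 108 is the one in UpdateOnTargetThread below):

#include "base/logging.h"
#include "base/synchronization/lock.h"
#include "base/thread_annotations.h"
#include "base/time/time.h"

class PipelineThreadTimer {
 public:
  void WillProcessTask() {
    base::AutoLock hold(lock_);
    if (!enabled_)
      return;
    DCHECK(start_time_active_task_.is_null());
    start_time_active_task_ = base::TimeTicks::Now();
  }

  void DidProcessTask() {
    base::AutoLock hold(lock_);
    if (!enabled_ || start_time_active_task_.is_null())
      return;  // Step 5 above: skipped while disabled.
    total_time_ += base::TimeTicks::Now() - start_time_active_task_;
    start_time_active_task_ = base::TimeTicks();
  }

  void UpdateOnTargetThread(bool enabled) {
    base::AutoLock hold(lock_);
    enabled_ = enabled;
    // Step 6 above: re-enabling clears any timestamp left behind by a task
    // that started before the disable took effect, so step 7's DCHECK holds.
    start_time_active_task_ = base::TimeTicks();
  }

 private:
  base::Lock lock_;
  bool enabled_ GUARDED_BY(lock_) = false;
  base::TimeDelta total_time_ GUARDED_BY(lock_);
  base::TimeTicks start_time_active_task_ GUARDED_BY(lock_);
};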
// TODO(crbug.com/1157620): Also add foreground worker thread. There seems to
// be a bug with setting up the task observer.
should this comment move somewhere else?
Oh I added that support now. Will remove this.
Attention is currently required from: Jonathan Backer, Bo, Khushal, Sadrul Chowdhury, Sami Kyöstilä.
1 comment:
File ui/gfx/rendering_pipeline.cc:
Patch Set #24, Line 72: DCHECK(start_time_active_task_.is_null());
I think this is the sequence you are worried about? […]
I was thinking a little differently (where multiple WillProcessTask & DidProcessTask are early-outing between SetEnabled(false) and SetEnabled(true)), but I think this is taken care of by the reset of the start time in line 80. So lgtm 😊
Attention is currently required from: Jonathan Backer, Khushal, Sadrul Chowdhury, Sami Kyöstilä.
3 comments:
Patchset:
Sadrul: Ping! Do you have the cycles to review this, or should I pick another ui/ owner?
File ui/gfx/rendering_pipeline.cc:
// TODO(crbug.com/1157620): Also add foreground worker thread. There seems to
// be a bug with setting up the task observer.
Oh I added that support now. Will remove this.
Removed
File ui/gfx/rendering_stage_scheduler.cc:
Patch Set #18, Line 53: android so
nit: Please mention libAdpf. […]
Done
Attention is currently required from: Jonathan Backer, Bo, Khushal, Sami Kyöstilä.
Patch set 26:Code-Review +1
1 comment:
Patchset:
LGTM. Sorry about the delay.
Attention is currently required from: Jonathan Backer, Khushal, Sami Kyöstilä.
Patch set 26:Commit-Queue +2
1 comment:
Patchset:
Thanks. I think that's all the owner approvals needed. Going in.
Chromium LUCI CQ submitted this change.
gfx/android: Introduce rendering pipeline integration with ADPF.
Adds the setup to track execution time of rendering threads to provide
feedback of actual vs targeted execution time for these threads to the
platform CPU scheduler.
Bug: 1157620
Change-Id: I20ca94679f1bb5f50fafe669e675d2cc6243428e
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2518721
Reviewed-by: Sadrul Chowdhury <sad...@chromium.org>
Reviewed-by: Eric Seckler <esec...@chromium.org>
Reviewed-by: danakj <dan...@chromium.org>
Commit-Queue: Bo <bo...@chromium.org>
Auto-Submit: Khushal <khusha...@chromium.org>
Cr-Commit-Position: refs/heads/master@{#850209}
---
M base/android/java_handler_thread.cc
M base/android/java_handler_thread.h
M base/task/current_thread.cc
M base/task/current_thread.h
M base/task/sequence_manager/sequence_manager_impl.cc
M ui/gfx/BUILD.gn
A ui/gfx/rendering_pipeline.cc
A ui/gfx/rendering_pipeline.h
A ui/gfx/rendering_stage_scheduler.cc
A ui/gfx/rendering_stage_scheduler.h
10 files changed, 625 insertions(+), 2 deletions(-)