Tracing questions and bugs?

263 views
Skip to first unread message

Bruce Dawson

unread,
Feb 24, 2015, 8:24:33 PM2/24/15
to tra...@chromium.org
I've recently started using about://tracing to investigate some performance issues in Chrome, with help from Nat Duca, and he suggested that I bring the discussion out to the tracing mailing list.

The page at http://www.chromium.org/developers/how-tos/trace-event-profiling-tool/recording-tracing-runs says to select "Manually select settings" and then leave the default boxes checked, but that doesn't work if you've ever changed the settings. Is there a way to reset to the defaults? If not then we should document what the defaults are.

My test scenario is to start Canary, start tracing, open a couple of tabs (each time navigating to slashdot.org), close the tabs, and then stop tracing.

Each trace shows a large box on the TotalDiscardableMemoryUsage timeline. If I click that box then everything disappears. If I try zooming then the screen is redrawn with garbage lines on it. The only way to recover from this is to reload the trace.

On Nat's advice I checked ipc.flow and toplevel.flow when recording the trace, and checked Flow events when viewing the trace. My goal was to trace the JumpList::PostRunUpdate callbacks. With Flow events checked I can see lots of elegantly curve arrows connecting different events, but there are no arrows connecting to the JumpList::PostRunUpdate events on the Chrome_FileThread. I've manually traced back several levels using the debugger but clearly this doesn't scale well to customer machines. Is there some additional setting I need to check, or is there code that needs to be added to trace these tasks as well?

It would be very helpful to know where these tasks fit into the overall time-scale of the browser session. In particular, I'd like to see when I opened and closed the tabs on the timeline. Is that information there?

There also seem to be some presentation bugs at high zoom levels. If I zoom in far enough then sometimes the text and box get misaligned (not fatal) and sometimes the box disappears entirely! It feels like a float precision bug, although I'm not sure.

Any help, especially with how to get the complete ancestry for JumpList::PostRunUpdate, would be greatly appreciated.

--
Bruce Dawson

Dan Sinclair

unread,
Feb 25, 2015, 9:25:29 AM2/25/15
to Bruce Dawson, tra...@chromium.org
On Tue Feb 24 2015 at 8:24:35 PM 'Bruce Dawson' via tracing <tra...@chromium.org> wrote:
The page at http://www.chromium.org/developers/how-tos/trace-event-profiling-tool/recording-tracing-runs says to select "Manually select settings" and then leave the default boxes checked, but that doesn't work if you've ever changed the settings. Is there a way to reset to the defaults? If not then we should document what the defaults are.

The only thing with defaults are the preset modes. So, if you select web developer or one of those then you get those presets. If you manually set them, there are no default 'manual' categories. You can switch between the preset modes and manual on each record run as needed.


Each trace shows a large box on the TotalDiscardableMemoryUsage timeline. If I click that box then everything disappears. If I try zooming then the screen is redrawn with garbage lines on it. The only way to recover from this is to reload the trace.

Can you use the 'save' button to dump out the trace and email it to me? I can try to take a look and see if it repros here and how to fix it. (Note, if you have other tabs open like Gmail or such, those tabs will also be grabbed, so it's best to run in a browser that you've opened just for tracing)

What do you mean garbage lines? LIke, it's trying to draw the events and they come out wrong? (If possible, a screenshot would help clarify what the issue is).

 
On Nat's advice I checked ipc.flow and toplevel.flow when recording the trace, and checked Flow events when viewing the trace. My goal was to trace the JumpList::PostRunUpdate callbacks. With Flow events checked I can see lots of elegantly curve arrows connecting different events, but there are no arrows connecting to the JumpList::PostRunUpdate events on the Chrome_FileThread. I've manually traced back several levels using the debugger but clearly this doesn't scale well to customer machines. Is there some additional setting I need to check, or is there code that needs to be added to trace these tasks as well?


It sounds like there are no FLOW events logged for that event. It's possibly being logged with a regular TRAC_EVENTx marcro. You'd need to go look at that method in the source and see how it's currently traced. If it's not, and should be, using a FLOW event, can you file a crbug.com bug with Performance-Tracing label and we can take a look (or feel free to put up a patch if you so desire).

 
There also seem to be some presentation bugs at high zoom levels. If I zoom in far enough then sometimes the text and box get misaligned (not fatal) and sometimes the box disappears entirely! It feels like a float precision bug, although I'm not sure.


There are a few issues at high zoom (the instant events disappear eventually). We haven't had time to track them down, but they do need to be fixed at some point. If this is causing persistent issues please let me know and I can try to investigate more.

 Thanks,
dan

Annie Sullivan

unread,
Feb 25, 2015, 11:29:43 AM2/25/15
to Dan Sinclair, Bruce Dawson, tra...@chromium.org
On Wed, Feb 25, 2015 at 9:25 AM, Dan Sinclair <dsin...@chromium.org> wrote:
On Tue Feb 24 2015 at 8:24:35 PM 'Bruce Dawson' via tracing <tra...@chromium.org> wrote:
The page at http://www.chromium.org/developers/how-tos/trace-event-profiling-tool/recording-tracing-runs says to select "Manually select settings" and then leave the default boxes checked, but that doesn't work if you've ever changed the settings. Is there a way to reset to the defaults? If not then we should document what the defaults are.

The only thing with defaults are the preset modes. So, if you select web developer or one of those then you get those presets. If you manually set them, there are no default 'manual' categories. You can switch between the preset modes and manual on each record run as needed.

The ones that are checked initially for manual seem to be everything under "Record Categories" and nothing under "Disabled by Default Categories", right? Maybe we could just change the button text under "Record Categories" to "All (default)" and under "Disabled by Default Categories" to "None (default)"? Sorry if I'm misunderstanding!
 

Each trace shows a large box on the TotalDiscardableMemoryUsage timeline. If I click that box then everything disappears. If I try zooming then the screen is redrawn with garbage lines on it. The only way to recover from this is to reload the trace.

Can you use the 'save' button to dump out the trace and email it to me? I can try to take a look and see if it repros here and how to fix it. (Note, if you have other tabs open like Gmail or such, those tabs will also be grabbed, so it's best to run in a browser that you've opened just for tracing)

What do you mean garbage lines? LIke, it's trying to draw the events and they come out wrong? (If possible, a screenshot would help clarify what the issue is).

I'm hoping to ramp up a bit on trace-viewer code. Would be awesome if you could file a bug with the screenshot attached. Maybe upload the trace to google drive and share with Googlers? Please note Dan's warning above that it captures other tabs. Here is the trace-viewer issue tracker link to file bugs: https://github.com/google/trace-viewer/issues/new
 
On Nat's advice I checked ipc.flow and toplevel.flow when recording the trace, and checked Flow events when viewing the trace. My goal was to trace the JumpList::PostRunUpdate callbacks. With Flow events checked I can see lots of elegantly curve arrows connecting different events, but there are no arrows connecting to the JumpList::PostRunUpdate events on the Chrome_FileThread. I've manually traced back several levels using the debugger but clearly this doesn't scale well to customer machines. Is there some additional setting I need to check, or is there code that needs to be added to trace these tasks as well?


It sounds like there are no FLOW events logged for that event. It's possibly being logged with a regular TRAC_EVENTx marcro. You'd need to go look at that method in the source and see how it's currently traced. If it's not, and should be, using a FLOW event, can you file a crbug.com bug with Performance-Tracing label and we can take a look (or feel free to put up a patch if you so desire).

 
There also seem to be some presentation bugs at high zoom levels. If I zoom in far enough then sometimes the text and box get misaligned (not fatal) and sometimes the box disappears entirely! It feels like a float precision bug, although I'm not sure.


There are a few issues at high zoom (the instant events disappear eventually). We haven't had time to track them down, but they do need to be fixed at some point. If this is causing persistent issues please let me know and I can try to investigate more.

Would be awesome to file a trace-viewer bug here as well so I can follow along!
 
 Thanks,
dan

--
You received this message because you are subscribed to the Google Groups "tracing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tracing+u...@chromium.org.
To post to this group, send email to tra...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/tracing/CAMYH%3DOh7rkX_wQhPMPcL3Y1xiXB_Cw%2BkgHP2Q%2Bdyz%3DOhQNuxRw%40mail.gmail.com.

Dan Sinclair

unread,
Feb 25, 2015, 11:34:02 AM2/25/15
to Annie Sullivan, Dan Sinclair, Bruce Dawson, tra...@chromium.org
On Wed Feb 25 2015 at 11:29:44 AM Annie Sullivan <sull...@google.com> wrote:
The ones that are checked initially for manual seem to be everything under "Record Categories" and nothing under "Disabled by Default Categories", right? Maybe we could just change the button text under "Record Categories" to "All (default)" and under "Disabled by Default Categories" to "None (default)"? Sorry if I'm misunderstanding!


It tries to be smart, so it will restore the categories you selected previously. This makes it nicer when you do multiple traces at the same time. I'm guessing, at some point in the past, you selected All and haven't changed it since?

If it is defaulting selecting everything, I think that's a bug.
 
 
Would be awesome to file a trace-viewer bug here as well so I can follow along!
 

Instant trace events disappear when you zoom in

Trace viewer clips rectangles when zooming in

dan

Annie Sullivan

unread,
Feb 25, 2015, 11:38:55 AM2/25/15
to Dan Sinclair, Bruce Dawson, tra...@chromium.org
On Wed, Feb 25, 2015 at 11:34 AM, Dan Sinclair <dsin...@chromium.org> wrote:
On Wed Feb 25 2015 at 11:29:44 AM Annie Sullivan <sull...@google.com> wrote:
The ones that are checked initially for manual seem to be everything under "Record Categories" and nothing under "Disabled by Default Categories", right? Maybe we could just change the button text under "Record Categories" to "All (default)" and under "Disabled by Default Categories" to "None (default)"? Sorry if I'm misunderstanding!


It tries to be smart, so it will restore the categories you selected previously. This makes it nicer when you do multiple traces at the same time. I'm guessing, at some point in the past, you selected All and haven't changed it since?

If it is defaulting selecting everything, I think that's a bug.

Oops, yeah, I just tried this on a fresh install, and none of the categories in either column is selected by default. "Continuous tracing" and "System tracing" are checked, and "State sampling" is unchecked.

Would it help to change the buttons to "None (default)" then?
 
Would be awesome to file a trace-viewer bug here as well so I can follow along!
 

Instant trace events disappear when you zoom in

Trace viewer clips rectangles when zooming in

Thanks!


Dan Sinclair

unread,
Feb 25, 2015, 11:44:25 AM2/25/15
to Annie Sullivan, Dan Sinclair, Bruce Dawson, tra...@chromium.org
On Wed Feb 25 2015 at 11:38:58 AM Annie Sullivan <sull...@google.com> wrote:
Oops, yeah, I just tried this on a fresh install, and none of the categories in either column is selected by default. "Continuous tracing" and "System tracing" are checked, and "State sampling" is unchecked.

Would it help to change the buttons to "None (default)" then?
 

I'm not sure how helpful that would be as tracing the default isn't really useful, heh.

Maybe just a matter of fixing up any docs which suggest to changing to the defaults?

dan

 

Bruce Dawson

unread,
Feb 25, 2015, 1:00:30 PM2/25/15
to Dan Sinclair, Annie Sullivan, tra...@chromium.org
The docs should be updated to specify what the recommend settings are, since saying "leave the default boxes checked" is not helpful advice. The functionality of remember the settings is good and shouldn't be changed.

If making it easy to return to the startup-defaults is important then we could add a reset-to-defaults button, but otherwise just a documentation should suffice, at http://www.chromium.org/developers/how-tos/trace-event-profiling-tool/recording-tracing-runs. I'd fix it but I don't have comment or edit permissions on that page. Should I file a bug?



I'll see about adding a FLOW event to my callbacks, to see if that helps with tracing.



Bug 601 describes the main issue I have when zooming in really far. With the 'instant' events it seems inevitable that people will keep zooming in until the event becomes wide enough to easily be clicked on, so this may need a UI-design fix more than anything else -- clamp the zoom level and make sure that the instant events expand to 3-5 pixels wide by this point? Having unselectable events seems cruel :-(



For the TotalDiscardableMemoryUsage issue I'll file a bug later (I don't have my github credentials handy) but for now I've uploaded one of the traces that shows the problem to here (sorry, google.com only) and here are some screenshots:

In this screen-shot you can see the purple box that I am about to click on:

Inline image 1

After clicking the same time range is being displayed, but there is no data:

Inline image 2

Zooming a bit leads to these artifacts:

Inline image 3

This happens every time I trace on canary.



Thanks for the help.
--
Bruce Dawson

Dan Sinclair

unread,
Feb 25, 2015, 1:05:42 PM2/25/15
to Bruce Dawson, Dan Sinclair, Annie Sullivan, tra...@chromium.org
On Wed Feb 25 2015 at 1:00:32 PM 'Bruce Dawson' via tracing <tra...@chromium.org> wrote:
If making it easy to return to the startup-defaults is important then we could add a reset-to-defaults button, but otherwise just a documentation should suffice, at http://www.chromium.org/developers/how-tos/trace-event-profiling-tool/recording-tracing-runs. I'd fix it but I don't have comment or edit permissions on that page. Should I file a bug?

Permissions don't work even with a @chromium account?  

 
I'll see about adding a FLOW event to my callbacks, to see if that helps with tracing.


Awesome, let me know if you have any questions.
 

Bug 601 describes the main issue I have when zooming in really far. With the 'instant' events it seems inevitable that people will keep zooming in until the event becomes wide enough to easily be clicked on, so this may need a UI-design fix more than anything else -- clamp the zoom level and make sure that the instant events expand to 3-5 pixels wide by this point? Having unselectable events seems cruel :-(


They are selectable. Clicking on them is a pain, but you can drag over them and they should get selected. If that's not the case, that's a bug.

 
dan

Bruce Dawson

unread,
Feb 25, 2015, 1:32:50 PM2/25/15
to Dan Sinclair, Annie Sullivan, tra...@chromium.org
I tried a few more times and managed to get logged in on the documentation page, with edit permissions, but it's not clear what the documentation should say. Annie's investigation showed that the two columns of check-boxes default to being all unchecked (with Continuous tracing and System tracing checked). However Dan said "I'm not sure how helpful that would be as tracing the default isn't really useful", so changing the docs to clarify what the defaults are wouldn't really be helpful, since that isn't what we want.

So, what is an appropriate set of flags? And is there documentation of what the various flags enable, to allow for easier experimentation? I would never have figured out ipc.flow and toplevel.flow without Nat's help.


--
Bruce Dawson

Dan Sinclair

unread,
Feb 25, 2015, 1:37:40 PM2/25/15
to Bruce Dawson, Dan Sinclair, nduca, Annie Sullivan, tra...@chromium.org
I think the problem is, it's highly context senstitve. This is the main driving force behind the preselected categories. So, if you want WebDeveloper stuff you select that. You should, really, only go into manual when you know what you want.

There is no master list of what is in there or what they are (we don't actually know in trace viewer, we get them from Chrome, it's the list of events that we've seen so far).

We could possibly list out the, if you want to see the flow of IPC messages turn on 'ipc.flow and toplevel.flow' but outside that, I'm not sure how much direction we can give.

Nat, any advice on this one?

dan

Nat Duca

unread,
Feb 25, 2015, 5:56:19 PM2/25/15
to Bruce Dawson, Charlie Andrews, Dan Sinclair, Annie Sullivan, tra...@chromium.org
​Wrt there being not enough documentation on what is appropriate, this is something you learn by doing.

I'm afraid that are expert tools --- we're focused on ensuring we have tech for solving hard problems but at chrome scale.... this means that its not easy to get going. Honestly, the best strategy here is determination, reaching out to this mailing list with questions anytime, writing down "how i used tracing to do X" posts here and elsewhere, and when it makes sense, helping us improve the UI.

I do think there's a world of good that could come from someone trying to learn the right UI for this and then overhauling our recording dialog experience. Its being maintained kinda, but only with incremental fixes. I do think that how we present this to users is murky at best.

Dan Sinclair

unread,
Feb 26, 2015, 1:19:39 PM2/26/15
to Bruce Dawson, Dan Sinclair, Annie Sullivan, tra...@chromium.org
What version of chrome are you seeing the rendering errors on? I'm on Linux 42.0.2305.3 (Official Build) dev (64-bit). Clicking on the purple box is fine here. Zooming in didn't get the rendering artifacts, though I did get the 'wait' message after holding down the zoom button for a while.

dan



On Wed Feb 25 2015 at 1:00:32 PM 'Bruce Dawson' via tracing <tra...@chromium.org> wrote:
The docs should be updated to specify what the recommend settings are, since saying "leave the default boxes checked" is not helpful advice. The functionality of remember the settings is good and shouldn't be changed.

If making it easy to return to the startup-defaults is important then we could add a reset-to-defaults button, but otherwise just a documentation should suffice, at http://www.chromium.org/developers/how-tos/trace-event-profiling-tool/recording-tracing-runs. I'd fix it but I don't have comment or edit permissions on that page. Should I file a bug?



I'll see about adding a FLOW event to my callbacks, to see if that helps with tracing.



Bug 601 describes the main issue I have when zooming in really far. With the 'instant' events it seems inevitable that people will keep zooming in until the event becomes wide enough to easily be clicked on, so this may need a UI-design fix more than anything else -- clamp the zoom level and make sure that the instant events expand to 3-5 pixels wide by this point? Having unselectable events seems cruel :-(



For the TotalDiscardableMemoryUsage issue I'll file a bug later (I don't have my github credentials handy) but for now I've uploaded one of the traces that shows the problem to here (sorry, google.com only) and here are some screenshots:

In this screen-shot you can see the purple box that I am about to click on:



After clicking the same time range is being displayed, but there is no data:



Zooming a bit leads to these artifacts:



This happens every time I trace on canary.



Thanks for the help.

On Wed, Feb 25, 2015 at 8:44 AM, Dan Sinclair <dsin...@chromium.org> wrote:
On Wed Feb 25 2015 at 11:38:58 AM Annie Sullivan <sull...@google.com> wrote:
Oops, yeah, I just tried this on a fresh install, and none of the categories in either column is selected by default. "Continuous tracing" and "System tracing" are checked, and "State sampling" is unchecked.

Would it help to change the buttons to "None (default)" then?
 

I'm not sure how helpful that would be as tracing the default isn't really useful, heh.

Maybe just a matter of fixing up any docs which suggest to changing to the defaults?

dan

 



--
Bruce Dawson

--
You received this message because you are subscribed to the Google Groups "tracing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tracing+u...@chromium.org.
To post to this group, send email to tra...@chromium.org.

Bruce Dawson

unread,
Feb 26, 2015, 2:16:18 PM2/26/15
to Dan Sinclair, Annie Sullivan, tra...@chromium.org
I just tried four different browsers (Stable on Linux and Windows, canary on Windows, locally built browser) and canary/locally-built browser are the only ones that shows the rendering errors.

So, as a workaround I can always load my canary traces into stable, for now. But, it looks like this is a recent regression.


--
Bruce Dawson

Jack Yang

unread,
Feb 27, 2015, 10:46:52 AM2/27/15
to tra...@chromium.org
Sorry! I clicked reply instead of reply all. See resolution to rendering issue in previous reply.

---------- Forwarded message ----------
From: Jack Yang <jack...@google.com>
Date: Fri, Feb 27, 2015 at 10:42 AM
Subject: Re: Tracing questions and bugs?
To: Bruce Dawson <bruce...@google.com>


Hey Bruce,

I believe the rendering issue should be fixed by https://github.com/google/trace-viewer/commit/a1344c66db02489f11613d725859cec2de374933 which went in last week. Once I removed that fix I was able to reproduce the exact problem you were having: 
Inline image 1

Jack



Nat Duca

unread,
Feb 27, 2015, 1:42:36 PM2/27/15
to Jack Yang, Dan Sinclair, tra...@chromium.org
Has someone rolled trace-viewer deps recently? I wonder if thats why this is slipped in.

Dan Sinclair

unread,
Feb 27, 2015, 1:44:31 PM2/27/15
to Nat Duca, Jack Yang, tra...@chromium.org
Last deps roll was 9 days ago when we rolled too: Rename source_prereqs to inputs in the rules.gni file. 

dan


Jack Yang

unread,
Feb 27, 2015, 1:47:38 PM2/27/15
to Bruce Dawson, tra...@chromium.org
+ Bruce's question from earlier. I'm not too familiar with the whole upstreaming to chromium process or how often the DEPS are updated


On Fri, Feb 27, 2015 at 12:42 PM, Bruce Dawson <bruce...@google.com> wrote:
Hmmm. That could explain my issues in my local builds (I'm synced to Feb 19th) but not canary. I'm running a canary build from yesterday and it shows the problem. Thoughts?

Do we have to wait for the trace viewer changes to roll into chrome?
--
Bruce Dawson

Jack Yang

unread,
Feb 27, 2015, 1:56:04 PM2/27/15
to Nat Duca, Bruce Dawson, tra...@chromium.org, Dan Sinclair
Haha.. I'd love to! And I think I just removed Bruce from this thread... re-adding now. I obviously don't have enough experience with email either.

On Fri, Feb 27, 2015 at 1:48 PM, Nat Duca <nd...@chromium.org> wrote:
Lets teach our fearless intern how to roll DEPS! :D

Bruce Dawson

unread,
Feb 27, 2015, 2:48:35 PM2/27/15
to Dan Sinclair, Annie Sullivan, tra...@chromium.org
I think I'm missing something, and I think there are some bugs. I think my fundamental question is how the three times associated with a flow task (posting time, execution start, execution end) are stored and correlated together from a .json file -- if I understood that then I could reverse engineer the rest.


I've been trying to figure out the flow events (lines connecting various events) and why I'm not seeing them where I need them. My hope is that flow events would say what function/source-file/line-of-code posted a task, and what the task was. I'm not seeing that happening for any events. When I select a posted task I see src_func/src_file which tell me where the message was posted from, and Start/Wall Duration which tell me about when the task was executed. Missing data is:
  • The function that was actually executed (important if multiple tasks are posted from one function)
  • When the task was posted
This information is missing for all events, even when flow arrows are shown, although when flow arrows are shown you can at least attempt to locate the posting time by following the line, however the name of the function being run seems to always be missing:

Inline image 1

I tried finding this information in the actual .json files but I was unable to figure out how PostTask and RunTask events are correlated. The PostTask events have an "id" tag which allows matching up the "s" and "f" phases, however the RunTask data doesn't have an "id" tag so I don't know how those are matched up. It seems that every posted task must have three timestamps associated with it: posting time, starting time, and ending time. I can't figure out how those are stored in the traces -- I can find two times for PostTask and a time and duration for RunTask. What am I missing? If I understood the expectations here then I could debug why my code sometimes doesn't match them. Currently I can't figure out what task is associated with a PostTask event, and what posting time is associated with a RunTask event.



As suggested I tried to figured out why my tasks don't have flow arrows. The code (in JumpList::PostRunUpdate) followed the usual PostTask path which always uses the flow macros. So I added a loop that started five fake tasks. Once I did this then I started getting flow arrows for the fake tasks and for my real task as well!!! Well that's weird... The modified code is shown below -- I can share a CL if somebody is interested.

Finally, the tracking of when a command is issued does not seem to be accurate. In the test whose trace is shown below I issued six BrowserThread::FILE callbacks each time a tab closed -- one long and then five short. All of the events show up but for some reason the start time of the last event in each batch is wrong. There should be six with the same start time, and then six more starting about a second later. Instead the grouping is 5/6/1.

Inline image 1

The code to post the events is very simple:

  BrowserThread::PostTask(
      BrowserThread::FILE, FROM_HERE,
      base::Bind(&JumpList::RunUpdateOnFileThread,
                 this,
                 incognito_availability));

  PostTestCallback();



void JumpList::PostTestCallback() {
  for (int i = 0; i < 5; ++i) {
    BrowserThread::PostTask(
        BrowserThread::FILE, FROM_HERE,
        base::Bind(&JumpList::MyTestCallback,
                   this, true));
  }
}

void JumpList::MyTestCallback(bool value) {
  (void)value;
  Sleep(200);
}

Any thoughts to unblock my confused mind would be appreciated.
--
Bruce Dawson

Dan Sinclair

unread,
Feb 27, 2015, 2:57:55 PM2/27/15
to Bruce Dawson, Annie Sullivan, tra...@chromium.org
On Fri, Feb 27, 2015 at 2:48 PM Bruce Dawson <bruce...@google.com> wrote:
I think I'm missing something, and I think there are some bugs. I think my fundamental question is how the three times associated with a flow task (posting time, execution start, execution end) are stored and correlated together from a .json file -- if I understood that then I could reverse engineer the rest.


For the format of the .json you can take a look at: 

The async event and flow events sections would be the relevant parts.


 dan

 

Dan Sinclair

unread,
Feb 27, 2015, 3:04:10 PM2/27/15
to Bruce Dawson, Annie Sullivan, tra...@chromium.org
On Fri, Feb 27, 2015 at 2:48 PM Bruce Dawson <bruce...@google.com> wrote:
I've been trying to figure out the flow events (lines connecting various events) and why I'm not seeing them where I need them. My hope is that flow events would say what function/source-file/line-of-code posted a task, and what the task was. I'm not seeing that happening for any events. When I select a posted task I see src_func/src_file which tell me where the message was posted from, and Start/Wall Duration which tell me about when the task was executed. Missing data is:
  • The function that was actually executed (important if multiple tasks are posted from one function)
  • When the task was posted

We don't know what the content of the posted message is unless it's been recorded into a trace event. From what I've seen you should be able to follow the flow arc (If you're on an event that starts an arc the arrow keys will follow the arcs) it should take you to the point when the message was processed and you'll be able to see what was run at that point.

Not sure what you mean by when the task was posted. Do you mean when it was added to the queue, or when it was dispatched?

 

As suggested I tried to figured out why my tasks don't have flow arrows. The code (in JumpList::PostRunUpdate) followed the usual PostTask path which always uses the flow macros. So I added a loop that started five fake tasks. Once I did this then I started getting flow arrows for the fake tasks and for my real task as well!!! Well that's weird... The modified code is shown below -- I can share a CL if somebody is interested.


That's .... interesting. In the case where you don't post the 5 extra tasks the even never shows up in the .json file?


Finally, the tracking of when a command is issued does not seem to be accurate. In the test whose trace is shown below I issued six BrowserThread::FILE callbacks each time a tab closed -- one long and then five short. All of the events show up but for some reason the start time of the last event in each batch is wrong. There should be six with the same start time, and then six more starting about a second later. Instead the grouping is 5/6/1.

I don't think it's necessarily wrong. Is it possible that it jumped to doing other work so your thread was asleep for the gap of time?


dan


Bruce Dawson

unread,
Feb 27, 2015, 4:15:04 PM2/27/15
to Dan Sinclair, Annie Sullivan, tra...@chromium.org
I found and looked at the .json documentation but I still couldn't figure out how it figures out the flow arrows. But I think I'm starting to figure out my confusion. It looks like the flow arrows (the "s" and "f" events) connect from a process/thread/time to a process/thread/time. I had assumed that the flow arrows also connect to a task that was run on the target thread, but it looks like they do not. It looks like that is just inferred by the person looking at the visualization. Is that correct?

The arrow keys for navigating along arcs look very handy. But that then confuses me again. If I select a task (MessageLoop::RunTask box) and type the left-arrow key then it selects a flow arrow -- how does it know which flow arrow to select? I think that is my fundamental question. My best guess is that the visualizer uses heuristics and just looks for a flow arrow that is 'nearby'. That could also explain why the flow arrows don't seem to end right at the start or end of the task boxes.

> In the case where you don't post the 5 extra tasks the even never shows up in the .json file?

When I don't post the 5 extra tasks then the flow arrows for the 'real' task don't show up in the visualization. The real task shows up in the .json as MessageLoop::RunTask events, with "ph":"X". I don't know if the flow data is in the .json file because I don't know how to find the flow data for a particular task in the .json file (see the fundamental question above).

I hope this makes sense.
--
Bruce Dawson

Dan Sinclair

unread,
Feb 27, 2015, 4:29:58 PM2/27/15
to Bruce Dawson, Annie Sullivan, tra...@chromium.org
On Fri, Feb 27, 2015 at 4:15 PM 'Bruce Dawson' via tracing <tra...@chromium.org> wrote:
I found and looked at the .json documentation but I still couldn't figure out how it figures out the flow arrows. But I think I'm starting to figure out my confusion. It looks like the flow arrows (the "s" and "f" events) connect from a process/thread/time to a process/thread/time. I had assumed that the flow arrows also connect to a task that was run on the target thread, but it looks like they do not. It looks like that is just inferred by the person looking at the visualization. Is that correct?


There are 3 types of events associated with flow events s, t and f. Start, step and finish. You can have a flow arc which goes through more then one stop in the code (I, think, input does this to track events through the system).  The events will have a process ID, thread ID, category, name and ID. Those things are how we track them, the ID has to be unique and is how they're linked together.

So, we output a MessageLoop::PostTask event with id 0x1 pid 1 tid 2 phase 's' and later we output a MessageLoop::FooTask even with id 0x1 pid 1 tid 2 phase 'f' then the viewer will link them together with the arrows. I can't remember if the arrows directly touch the events they come from/to, if they don't it's probably a bug due to me doing it badly.

The flow arcs don't know what task will be executed by the RunTask, they just know which thread picked it up and when it was processed. You're right in that you have to infer what is run based on the visualization but it isn't too bad as only one thing runs on a thread at a time and it should be the thing immediately after the task is received.

Does that make sense?

 
The arrow keys for navigating along arcs look very handy. But that then confuses me again. If I select a task (MessageLoop::RunTask box) and type the left-arrow key then it selects a flow arrow -- how does it know which flow arrow to select? I think that is my fundamental question. My best guess is that the visualizer uses heuristics and just looks for a flow arrow that is 'nearby'. That could also explain why the flow arrows don't seem to end right at the start or end of the task boxes.

The arrows aren't a special concept in the trace file. They're part of the flow event with name MessageLoop::RunTask. That MessageLoop::RunTask has only one arc coming from it so that's the arc it follows. It doesn't pick one near-by it picks the one for that event.


 
> In the case where you don't post the 5 extra tasks the even never shows up in the .json file?

When I don't post the 5 extra tasks then the flow arrows for the 'real' task don't show up in the visualization. The real task shows up in the .json as MessageLoop::RunTask events, with "ph":"X". I don't know if the flow data is in the .json file because I don't know how to find the flow data for a particular task in the .json file (see the fundamental question above).


It sounds like something else is going on. If the task has a phase 'X' then it was output as a differnt type of event then a FLOW event. The X events are output from the TRACE_EVENT{0,1,2} macros not the TRACE_EVENT_FLOW_* macros.

Could it be following a different code path or something?

dan

Bruce Dawson

unread,
Mar 2, 2015, 7:26:15 PM3/2/15
to Dan Sinclair, Annie Sullivan, tra...@chromium.org
I hacked up some Python to load my .json file and do some investigations and it looks to my untutored eye as if there is data for some of my flow arrows, but no arrows.

My expectation is that each "s"/"f" pair should result in a flow arrow, and I am definitely seeing such pairs in my data that don't have flow arrows. So. Huh.

This is based on analysis of this trace:


I looked through the trace data for MessageLoop::RunTask events on the Chrome_FileThread (which I knew in this case had pid/tid of 14180/18072). For each of these events I then looked for the preceding MessageLoop::PostTask ("ph" == "f") on the same thread, and then used the id of that trace-event to find the associated MessageLoop::PostTask ("ph" == "s") on some other thread.

One immediate problem is that while I found 13 "f" events that met this criteria I found only 3 of the associated "s" events. I assume that this means that the others were lost off of the front of the circular buffers -- I guess that is normal and expected, but it is worrisome how quickly the buffers wrap around. The last two JumpList::PostRunUpdate events seem most relevant, so here are those two RunTask events along with the PostTask "s" and "f" events that seem to be associated with them:

{"pid":14180,"tid":13272,"ts":969319964806,"ph":"s","cat":"disabled-by-default-toplevel.flow","name":"MessageLoop::PostTask","args":{},"id":"0xaf54d93559f40933"},
{"pid":14180,"tid":13272,"ts":969320874795,"ph":"s","cat":"disabled-by-default-toplevel.flow","name":"MessageLoop::PostTask","args":{},"id":"0xaf54d93659f40933"},
{"pid":14180,"tid":18072,"ts":969319964857,"ph":"f","cat":"disabled-by-default-toplevel.flow","name":"MessageLoop::PostTask","args":{"queue_duration":0},"id":"0xaf54d93559f40933"},
{"pid":14180,"tid":18072,"ts":969319964858,"ph":"X","cat":"toplevel","name":"MessageLoop::RunTask","args":{"src_file":"c:\\b\\build\\slave\\win64\\build\\src\\chrome\\browser\\jumplist_win.cc","src_func":"JumpList::PostRunUpdate"},"dur":657889},
{"pid":14180,"tid":18072,"ts":969320874824,"ph":"f","cat":"disabled-by-default-toplevel.flow","name":"MessageLoop::PostTask","args":{"queue_duration":0},"id":"0xaf54d93659f40933"},
{"pid":14180,"tid":18072,"ts":969320874825,"ph":"X","cat":"toplevel","name":"MessageLoop::RunTask","args":{"src_file":"c:\\b\\build\\slave\\win64\\build\\src\\chrome\\browser\\jumplist_win.cc","src_func":"JumpList::PostRunUpdate"},"dur":573626},

In both cases the RunTask started just one microsecond after the PostTask "f" event, on the same thread, and I assume that is how the PostTask and RunTask events are connected together by the TraceViewer. But neither of these events get flow arrows when I load the trace.

So, I think the very specific question now is: why don't the two "s"/"f" pairs shown above result in flow arrows being drawn from thread 13272 to 18072? I'd be happy to try debugging this if given some pointers as  to where to look. It looks like some filter (sanity checking) is discarding these "s"/"f" pairs for some reason.

I've attached my hacked up Python script, FWIW.

The six lines above were filtered out with this very awesome DOS command:
type "c:\Users\brucedawson\Downloads\trace (1).json" | findstr "0xaf54d93559f40933 0xaf54d93659f40933 969320874825 969319964858"

--
Bruce Dawson
match_trace_events.py

Bruce Dawson

unread,
Mar 3, 2015, 3:18:40 PM3/3/15
to Dan Sinclair, Annie Sullivan, tra...@chromium.org
It turns out most of my issues were user error (or perhaps, UI confusion). When the flow arrows cover a very small time range then they fade out entirely when zoomed out, and you may have to zoom in a very long way before they start becoming visible.

In other words, my complaints about missing flow errors were, in many cases, a lie.

I'll think about ways to avoid this in the future with UI and documentation tweaks. Thanks for your patience.
--
Bruce Dawson

Dan Sinclair

unread,
Mar 3, 2015, 3:20:56 PM3/3/15
to Bruce Dawson, Dan Sinclair, Annie Sullivan, tra...@chromium.org
Cool, thanks a lot for digging into this. When drawing there is a minimum size we can draw arrows at before they become, either noise or single pixels. Not sure of a better way we could handle this in the UI.

Since I've lost track, is there anything I need to look into/explain at this point, or has everything been explained sufficiently to let you keep moving forward?

Thanks,
dna




--
You received this message because you are subscribed to the Google Groups "tracing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tracing+u...@chromium.org.
To post to this group, send email to tra...@chromium.org.

Bruce Dawson

unread,
Mar 3, 2015, 4:21:26 PM3/3/15
to Dan Sinclair, Annie Sullivan, tra...@chromium.org
My ultimate goal was to get a customer to record a trace of JumpList::PostRunUpdate events. I now know how to do this, but the customer is hitting two problems:

1) The trace buffer fills to 100% extremely quickly which makes me worry that we won't get enough history to diagnose the problem. Is there a way to increase the buffer size? I tried --trace-to-file and --trace-to-file-name but then realized those were for testing only.
2) More seriously, when he tries starting tracing with the minimal settings for this problem (toplevel and toplevel.flow) he hits this error:
Error while recording: Error: Error occured at /json/categories at chrome://tracing/tracing.js:5615:22

Any thoughts on what might cause that? I feel like I'm really close to getting some great tracing data from this customer but that errors seems to be blocking us.

I have some UI ideas on how to make the flow events show up better -- I assume that filing bugs is probably the best way to suggest those?

--
Bruce Dawson

Dan Sinclair

unread,
Mar 3, 2015, 4:28:17 PM3/3/15
to Bruce Dawson, Dan Sinclair, nduca, Annie Sullivan, tra...@chromium.org
Crap, that's an open bug https://github.com/google/trace-viewer/issues/538. Nat, was there any resolution to that error categories thing?

There is no way to increase the buffer size. The best bet is to turn off any categories that you don't care about and trace the minimal amount of stuff. Sizing the trace buffer has been, historically, tricky in that we don't want to lost events, but we also don't want to run out of memory. If we SadTabPage we aren't going to get anything back either. We need a better way then the set (250k? we keep adjusting it down) events we have now.

For UI ideas, please file bugs against github.com/google/trace-viewer.

dan

Bruce Dawson

unread,
Mar 3, 2015, 4:35:02 PM3/3/15
to Dan Sinclair, nduca, Annie Sullivan, tra...@chromium.org
I've already trimmed down to the minimum events (toplevel and toplevel.flow), but those seem like the noisiest categories. It would be nice to have a command-line switch to allow overriding the limits, for expert users. On canary (now 64-bit) it will be possible for users with beefy machines to select very high numbers -- they will never run out of address space.

If there are any tests that this customer could run to investigate issue 538 then please let me know. I've had him attach windbg a few times which has been useful, although it is the highest latency debugging I've ever done.
--
Bruce Dawson

Bruce Dawson

unread,
Mar 3, 2015, 5:51:26 PM3/3/15
to Dan Sinclair, nduca, Annie Sullivan, tra...@chromium.org
Okay, I filed a few bugs. My github name is randomascii:

This is a regression in canary -- left arrow is misbehaving: https://github.com/google/trace-viewer/issues/786
This is an error (IMHO) in calculating which flow event to go to when using left arrow: https://github.com/google/trace-viewer/issues/787
And, this is my suggestion for making it easier to find short-duration flow events: https://github.com/google/trace-viewer/issues/788

I opened a couple of others but they are trivial/debatable.

And, I hope something can be figured out about issue 538.
--
Bruce Dawson

Ravi Kasibhatla

unread,
Mar 4, 2015, 12:46:40 AM3/4/15
to tra...@chromium.org, bruce...@google.com
I just verified on latest canary (windows) and linux (ToT) build. I am not seeing any error when enabling only toplevel and toplevel.flow categories.
Any particular build or platform where we are seeing the error of #538?

Regards,
kphanee

On Wednesday, February 25, 2015 at 6:54:33 AM UTC+5:30, Bruce Dawson wrote:
I've recently started using about://tracing to investigate some performance issues in Chrome, with help from Nat Duca, and he suggested that I bring the discussion out to the tracing mailing list.

The page at http://www.chromium.org/developers/how-tos/trace-event-profiling-tool/recording-tracing-runs says to select "Manually select settings" and then leave the default boxes checked, but that doesn't work if you've ever changed the settings. Is there a way to reset to the defaults? If not then we should document what the defaults are.

My test scenario is to start Canary, start tracing, open a couple of tabs (each time navigating to slashdot.org), close the tabs, and then stop tracing.

Each trace shows a large box on the TotalDiscardableMemoryUsage timeline. If I click that box then everything disappears. If I try zooming then the screen is redrawn with garbage lines on it. The only way to recover from this is to reload the trace.

On Nat's advice I checked ipc.flow and toplevel.flow when recording the trace, and checked Flow events when viewing the trace. My goal was to trace the JumpList::PostRunUpdate callbacks. With Flow events checked I can see lots of elegantly curve arrows connecting different events, but there are no arrows connecting to the JumpList::PostRunUpdate events on the Chrome_FileThread. I've manually traced back several levels using the debugger but clearly this doesn't scale well to customer machines. Is there some additional setting I need to check, or is there code that needs to be added to trace these tasks as well?

It would be very helpful to know where these tasks fit into the overall time-scale of the browser session. In particular, I'd like to see when I opened and closed the tabs on the timeline. Is that information there?

There also seem to be some presentation bugs at high zoom levels. If I zoom in far enough then sometimes the text and box get misaligned (not fatal) and sometimes the box disappears entirely! It feels like a float precision bug, although I'm not sure.

Any help, especially with how to get the complete ancestry for JumpList::PostRunUpdate, would be greatly appreciated.

--
Bruce Dawson

Bruce Dawson

unread,
Mar 4, 2015, 12:28:09 PM3/4/15
to Ravi Kasibhatla, tra...@chromium.org
Apologies -- I didn't mean to imply that toplevel+toplevel.flow was a full repro for #538. The partial information I have from the customer suggested that the error might have started when he switched to those categories, but he hasn't given details. I'll report back if I get more repro details from him.
--
Bruce Dawson

Bruce Dawson

unread,
Mar 4, 2015, 2:04:49 PM3/4/15
to Ravi Kasibhatla, tra...@chromium.org
Since the display of flow events in the trace viewer is a bit fussy (they are hard to find if they are very brief, the arrows will be omitted if the start event is missing, and the left-arrow key doesn't reliably select them) I wrote a quick-hack Python script to find all the start/finish/run events for a particular src_func. You run it like this:

python match_trace_events.py "trace (2).json" JumpList::PostRunUpdate

and it prints results like this (truncated horizontally to avoid excessive wrapping):

145418 events loaded from c:\Users\brucedawson\Downloads\trace (1).json.
Start time is 969313119586
Found 8 events that match JumpList::PostRunUpdate

StartEvents:
 0: None
 1: None
 2: None
 3: None
 4: None
 5: None
 6 at  6.845220: {u'name': u'MessageLoop::PostTask', u'args': {}, u'pid': 14180, u'ts': 969319964806L, 
 7 at  7.755209: {u'name': u'MessageLoop::PostTask', u'args': {}, u'pid': 14180, u'ts': 969320874795L, 

FinishEvents:
 0 at  1.383447: {u'name': u'MessageLoop::PostTask', u'args': {u'queue_duration': 0}, u'pid': 14180, u'
 1 at  2.574230: {u'name': u'MessageLoop::PostTask', u'args': {u'queue_duration': 1173}, u'pid': 14180,
 2 at  3.244510: {u'name': u'MessageLoop::PostTask', u'args': {u'queue_duration': 1801}, u'pid': 14180,
 3 at  4.323136: {u'name': u'MessageLoop::PostTask', u'args': {u'queue_duration': 905}, u'pid': 14180, 
 4 at  4.989111: {u'name': u'MessageLoop::PostTask', u'args': {u'queue_duration': 1564}, u'pid': 14180,
 5 at  5.616327: {u'name': u'MessageLoop::PostTask', u'args': {u'queue_duration': 2182}, u'pid': 14180,
 6 at  6.845271: {u'name': u'MessageLoop::PostTask', u'args': {u'queue_duration': 0}, u'pid': 14180, u'
 7 at  7.755238: {u'name': u'MessageLoop::PostTask', u'args': {u'queue_duration': 0}, u'pid': 14180, u'

RunTaskEvents:
 0 at  1.383448: {u'name': u'MessageLoop::RunTask', u'args': {u'src_func': u'JumpList::PostRunUpdate', 
 1 at  2.574232: {u'name': u'MessageLoop::RunTask', u'args': {u'src_func': u'JumpList::PostRunUpdate', 
 2 at  3.244512: {u'name': u'MessageLoop::RunTask', u'args': {u'src_func': u'JumpList::PostRunUpdate', 
 3 at  4.323137: {u'name': u'MessageLoop::RunTask', u'args': {u'src_func': u'JumpList::PostRunUpdate', 
 4 at  4.989113: {u'name': u'MessageLoop::RunTask', u'args': {u'src_func': u'JumpList::PostRunUpdate', 
 5 at  5.616329: {u'name': u'MessageLoop::RunTask', u'args': {u'src_func': u'JumpList::PostRunUpdate', 
 6 at  6.845272: {u'name': u'MessageLoop::RunTask', u'args': {u'src_func': u'JumpList::PostRunUpdate', 
 7 at  7.755239: {u'name': u'MessageLoop::RunTask', u'args': {u'src_func': u'JumpList::PostRunUpdate', 

The advantage of this, for my particular investigation, is that I can trivially see the queue_duration values for all calls to PostRunUpdate. I can also see that there will be flow arrows for the last two PostRunUpdate calls, although they will be difficult to see in the trace viewer because of the zero queue_duration for those events.

The code is pretty straightforward. It looks for events where src_func contains the parameter. Then it assumes that the previous event on the same thread is the finish event (it checks for "ph" == "f" to be sure), and then it does a second pass over the events using the "id" values to look for start events. I'm not sure if this has value for anybody else, but I thought I'd share. I can clean it up and check it in if it seems useful.

Note that looking at these results shows that "queue_duration" has different units from "ts" (ms versus microseconds), which is otherwise only obvious from looking at the code.

--
Bruce Dawson
match_trace_events.py
Reply all
Reply to author
Forward
0 new messages