Heap Profiling in the Field

Alexander Yashkin

Sep 21, 2018, 1:39:29 AM
to memor...@chromium.org, erik...@chromium.org
Hi all.
My name is Alexander. I currently work on the Yandex Browser team (a fork of Chromium), and I would like to ask several questions about the Chromium memory infrastructure.
I am mostly interested in the background tracing component that sends heap dumps to a server for some users.
We are trying to build infrastructure to collect and analyze such dumps, to detect memory leaks in our (Yandex-written) components.

As I understand it, heap dump collection is triggered from the variations server by a background tracing config.
As I see in content/browser/tracing/background_tracing_rule.cc, various events can trigger heap dump collection.
Can you give advice on what kind of trigger we should use? Which histogram and value, or maybe random intervals?

As I understand it, I will have to write a tool that symbolizes a received heap dump, decodes it, and extracts call stacks from it.
Do you have any automation for analyzing such heap dumps? Or does someone just look through the heap dump content manually?
Is there any existing code in the Chromium repo that could be helpful? I have found the symbolize_trace and diff_heap_profiler scripts so far; is there anything else useful?

Thanks.
WBR, Alexander Yashkin.

Erik Chen

Sep 21, 2018, 10:13:48 AM
to Alexander Yashkin, Etienne Bergeron, Alexei Filippov, memory-dev
Hello!
Thanks for reaching out.

As you've likely noticed, heap profiling in Chrome can be enabled via flags, command line, or Finch experiment. See chrome://flags and search for #memlog. The interesting fields are:
  * #memlog -- chooses which processes to profile
  * #memlog-sampling -- this should pretty much always be enabled. Sampling greatly reduces overhead, with minimal loss in accuracy.

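The code to automatically send reports in the field is pretty straightforward.
  * background profiling triggers <https://cs.chromium.org/chromium/src/chrome/browser/profiling_host/background_profiling_triggers.cc?q=background_profiling_&sq=package:chromium&g=0&l=1> look for interesting conditions to upload
  * ProfilingProcessHost::RequestProcessReport <https://cs.chromium.org/chromium/src/chrome/browser/profiling_host/profiling_process_host.cc?sq=package:chromium&g=0&l=149> requests and uploads a heap dump
  * settings <https://cs.chromium.org/chromium/src/components/services/heap_profiling/public/cpp/settings.cc?sq=package:chromium&q=services.*heap_profiling.*set&g=0&l=1> controls whether the feature is enabled [by command line, Finch, etc.]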

The only public code we have for symbolization is symbolize_trace and diff_heap_profiler, as you've found. You should be able to make minimal modifications to them to support symbolization of non-Chrome binaries.

> Do you have some automation in such heap dumps analyse?

Yes -- we have code that aggregates and symbolizes all uploaded heap dumps, and basically runs the logic in diff_heap_profiler in bulk. We have very high filter thresholds to avoid false positives [100MB on desktop, 20MB on Android]. As a result, most stack traces we see point at real bugs.
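
As a rough illustration of how such per-platform thresholds might be applied as a pre-filter (this is not Chromium code), here is a minimal Python sketch. The `THRESHOLD_BYTES` table and the `needs_review` helper are hypothetical names, and treating 1MB as 2^20 bytes is an assumption:

```python
MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes

# Thresholds quoted in the message above: 100MB on desktop, 20MB on Android.
THRESHOLD_BYTES = {"desktop": 100 * MB, "android": 20 * MB}


def needs_review(platform: str, allocated_bytes: int) -> bool:
    """Return True when a stack's allocations exceed the platform threshold."""
    return allocated_bytes > THRESHOLD_BYTES[platform]
```

With these numbers, a 25MB stack would pass the filter on Android but be dropped on desktop, which is the trade-off behind the high thresholds: fewer reports, but few false positives.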

alph@ is currently working on a faster implementation of heap profiling, but the public interface should stay the same.

Alexander Yashkin

Sep 26, 2018, 8:36:35 AM
to Erik Chen, Etienne Bergeron, Alexei Filippov, memory-dev
Hi Erik, thanks a lot for your answers.
I have looked through the code and the current Chrome experiments config and have even more questions :)

Do you plan to give access to http://memlog/<report_id> pages for non-Googlers?
From the Chrome variations config I see that the "OOPHeapProfiling" feature is enabled only on the canary and dev channels.
Is that enough to get the necessary data? Does it slow down the browser a lot?


--
Best regards,
Alexander Yashkin,
developer, Our Browser
http://staff.yandex-team.ru/a-v-y/

Erik Chen

Sep 26, 2018, 10:39:26 AM
to Alexander Yashkin, Etienne Bergeron, Alexei Filippov, memory-dev
On Wed, Sep 26, 2018 at 8:36 AM, Alexander Yashkin <a-...@yandex-team.ru> wrote:
> Do you plan to give access to http://memlog/<report_id> pages for non-googlers?
Unfortunately not. There's nothing that prevents this from a theoretical perspective [there's no PII], but getting all the ACLs right would be a pretty big headache.

> From chrome variations config I see that "OOPHeapProfiling" feature is enabled only on canary and dev channel.
> Is it enough to get necessary data?
We would like to eventually expand to a small set of beta/stable, but we want a super-low overhead implementation before doing so. +alph is working on this.

> Does it slow browser performance a lot?
The current implementation has no visible effect, but according to UMA, there is a ~50% degradation to omnibox typing latency at higher percentiles [which is very allocation heavy]. None of the other performance metrics have noticeably moved.

Alexander Yashkin

Nov 13, 2018, 11:10:30 AM
to Erik Chen, Etienne Bergeron, Alexei Filippov, memory-dev
Hi Erik.
I see that bug
https://bugs.chromium.org/p/chromium/issues/detail?id=803276 is closed
as Fixed.
Does that mean "sampling-v2" is the preferred way of launching heap
profiling in the field?

BTW, after looking at all the code written and issues created: great work
has been done by you guys in this area.


Erik Chen

Nov 13, 2018, 11:12:04 AM
to Alexander Yashkin, Etienne Bergeron, Alexei Filippov, memory-dev
Correct. I believe we're still running an A/B test on Android to confirm that results are comparable [alph@, could you comment on status?]

Once that's done, we should remove sampling-v1. 

Alexei Filippov

Nov 13, 2018, 1:41:49 PM
to erik...@chromium.org, a-...@yandex-team.ru, etie...@chromium.org, memor...@chromium.org, ss...@chromium.org
Hi Erik,

Just checked the Android experiment. Performance-wise, v2 is clearly better than v1, so I think we can remove the v1 experiment from Android. I'll send a patch. +ssid@

Alexander Yashkin

Nov 23, 2018, 9:52:38 AM
to Erik Chen, Etienne Bergeron, Alexei Filippov, memory-dev
Hello again.

I was able to run the heap profiling experiment in our browser, and I collected
and analyzed a couple of traces manually.
Besides our own leaks, I think I found a mojo leak in Chromium from
FieldTrialSynchronizer::NotifyAllRenderers, the same one pointed out here:
https://bugs.chromium.org/p/chromium/issues/detail?id=798025#c12

I will file a bug, and I think I will submit a fix as well.

I am thinking about automating trace analysis; can you give some advice?
My plan is to filter traces first by an allocated-size threshold
(100 MB) and after that symbolize and analyze them manually.

Do you use any stack aggregation as well? Is there a way to tell
that two collected traces belong to the same user?

Thanks.

--
WBR Alexander Yashkin.

Erik Chen

Nov 26, 2018, 11:17:00 AM
to Alexander Yashkin, Etienne Bergeron, Alexei Filippov, memor...@chromium.org
Super happy to see that this is working well for you, and that you're able to find bugs. :)

> I will file bug and I think I will submit fix as well.
Great, thank you!

> Is where a way to understand that two collected traces belong to same user?
We intentionally remove PII, so this is not possible.

> I think about automating traces analyze, can you give some advice?
> Do you use any stacks aggregation as well? 

Here's a rough overview of our pipeline:
(1) Symbolize traces and convert everything into a more manageable format [we use protos]. I assume you've already figured this out, but just in case, there's python symbolization logic here <https://cs.chromium.org/chromium/src/third_party/catapult/tracing/tracing/extras/symbolizer/symbolize_trace.py?type=cs&q=symbolize_trace.py&sq=package:chromium&g=0&l=3>.
(2) We have a list of "known bugs", which maps simple regular expressions [e.g. stack contains 'P2PSocketUdp::Init'] to crbug numbers [in this case, issue 873785].
(3) Group all allocations in all traces by stack.
(4) For each group of allocations, if there is no matching regular expression for the stack, and the largest allocation > 100MB, then it requires human follow-up.
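
Steps (2) to (4) above can be sketched roughly as follows. This is an illustrative Python sketch, not the pipeline described in this message: it assumes traces are already symbolized and represented as lists of (stack, size_in_bytes) pairs, reads "largest allocation" as the largest size recorded for a stack in any trace, and treats 1MB as 2^20 bytes. The names `KNOWN_BUGS`, `known_bug_for`, and `stacks_needing_follow_up` are hypothetical.

```python
import re
from collections import defaultdict

MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes

# Hypothetical "known bugs" table: regex over the symbolized stack -> crbug id.
# The single entry mirrors the example given in step (2).
KNOWN_BUGS = {re.compile(r"P2PSocketUdp::Init"): 873785}


def known_bug_for(stack):
    """Return the crbug number whose regex matches the stack, else None."""
    text = "\n".join(stack)
    for pattern, bug in KNOWN_BUGS.items():
        if pattern.search(text):
            return bug
    return None


def stacks_needing_follow_up(traces, threshold=100 * MB):
    """Group allocations from all traces by stack (step 3), then keep stacks
    with no known-bug match whose largest allocation exceeds the threshold
    (step 4)."""
    groups = defaultdict(list)
    for trace in traces:
        for stack, size in trace:
            groups[tuple(stack)].append(size)
    return [
        stack
        for stack, sizes in groups.items()
        if known_bug_for(stack) is None and max(sizes) > threshold
    ]
```

Keying the groups on the full tuple of frames keeps the sketch simple; a real pipeline would probably need to normalize frames (for example, stripping addresses or template arguments) before grouping, which is not covered here.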

Parallel pipeline, just in case the previous pipeline misses something:
(5) Sort traces by total size of all allocations.
(6) Emit the top 20 traces, and for each, emit total size of all stacks matching regex signatures in (2).
   e.g. Report X has 5GB total allocations, 4.7GB allocations match "macOS Preferences leak", 58MB match "LevelDB-Extensions".
(7) If there are traces with large total allocation, but no matching signatures, then they require human follow-up.
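
Steps (5) to (7) could look something like the sketch below. It is only an illustration under assumptions: traces are a dict mapping a report name to a list of (stack, size_in_bytes) pairs, signatures are passed in as a label-to-regex dict, each stack is attributed to the first signature it matches, and `min_total` is a hypothetical cutoff for what counts as a "large" trace (the message does not give one).

```python
import re


def summarize_top_traces(traces, signatures, top_n=20):
    """Sort traces by total allocated bytes (step 5) and, for the largest
    top_n, report the bytes matched by each signature (step 6)."""
    reports = []
    for name, allocations in traces.items():
        total = sum(size for _, size in allocations)
        matched = {}
        for stack, size in allocations:
            text = "\n".join(stack)
            for label, pattern in signatures.items():
                if pattern.search(text):
                    matched[label] = matched.get(label, 0) + size
                    break  # attribute each stack to its first matching signature
        reports.append({"trace": name, "total": total, "matched": matched})
    reports.sort(key=lambda r: r["total"], reverse=True)
    return reports[:top_n]


def unexplained(reports, min_total):
    """Step 7: large traces in which no signature matched anything."""
    return [
        r["trace"] for r in reports if r["total"] >= min_total and not r["matched"]
    ]
```

The per-signature totals make it easy to see at a glance how much of a large report is already explained by known bugs, as in the "Report X" example above.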

Finally, as we find and file bugs, we update the list of "known bugs".

Alexander Yashkin

Jan 14, 2019, 9:50:31 AM
to Erik Chen, Etienne Bergeron, Alexei Filippov, memor...@chromium.org
Hello again.
Another question: I am thinking about turning on heap profiling for the renderer
process type in our browser. Are there any caveats or differences from
browser process heap profiling?
As I understand it, a renderer process can "legally" crash due to OOM while
rendering extremely heavy web pages.
How do you differentiate such "legal" OOMs from leaks due to bugs in code?


Erik Chen

Jan 14, 2019, 11:31:17 AM
to Alexander Yashkin, Etienne Bergeron, Alexei Filippov, memory-dev
> Is there any caveats or differences from browser process heap profiling?
It's functionally identical, but the results are frequently less useful, since web-pages can and do [un]intentionally use too much memory. 

> How do you differentiate such "legal" OOMs from leaks due to bugs in code?
When there is a local repro of a known renderer leak, the heap profiler has proven to be useful. We have not yet found any renderer leaks in the wild, since we have no way of distinguishing WAI [working as intended] memory usage from actual Chrome bugs.

Alexander Yashkin

Jan 15, 2019, 12:40:15 AM
to Erik Chen, Etienne Bergeron, Alexei Filippov, memory-dev
Thanks for the quick and detailed answers.
With your help, our progress in fighting leaks is much faster :)

WBR, Alexander Yashkin.

Alexander Yashkin

Feb 6, 2020, 2:28:51 AM
to Erik Chen, Etienne Bergeron, Alexei Filippov, memory-dev
Hi fellow developers.

I have several questions on the memory profiling infrastructure:
1. Can you please give me access to the design doc from
https://bugs.chromium.org/p/chromium/issues/detail?id=925151

https://docs.google.com/document/d/11K0Yq5-WSUDz1NgM-hj4pqijUT_K9_-9b93HBx3xKBY/edit#heading=h.gdq4vmqjyhau

2. Will the change in the issue above affect heap dump traces
collected with the OOPHeapProfiling feature?

3. What's the difference between the SamplingProfilerReporting,
HeapProfilerReporting, and OOPHeapProfiling features?

4. Recently I found that I could not collect Android memory traces
from users of our browser. I have not been able to look at the code changes yet,
so I'll ask you instead :). Could the reason be the switch of the
"BackgroundTracingProtoOutput" feature on Android in
https://chromium-review.googlesource.com/c/chromium/src/+/1725119?

WBR, Alexander Yashkin.


Egor Pasko

Feb 6, 2020, 9:46:18 AM
to Alexander Yashkin, ss...@chromium.org, Erik Chen, Etienne Bergeron, Alexei Filippov, memory-dev
On Thu, Feb 6, 2020 at 8:28 AM Alexander Yashkin <a-...@yandex-team.ru> wrote:
> I have several questions on memory profiling infrastructure:
> 1. Can you please give me access to design doc from
> https://bugs.chromium.org/p/chromium/issues/detail?id=925151
> https://docs.google.com/document/d/11K0Yq5-WSUDz1NgM-hj4pqijUT_K9_-9b93HBx3xKBY/edit#heading=h.gdq4vmqjyhau

+s...@chromium.org will be able to provide more details about the current state of trace uploads. What I know is that the upload format is being heavily changed; it will likely take some time to stabilize.
