Support for Linux CPU/memory profiling (kernel and user spaces)

KDS

Aug 14, 2023, 9:40:03 AM
to Perfetto Development - www.perfetto.dev

Looking at possibilities of Perfetto for an embedded Linux environment.
Ref. https://perfetto.dev/docs/quickstart/linux-tracing

From this link, it looks like only basic tracing is supported on Linux: ftrace, with a system-level GUI timeline of processes and callstacks. It would be good to get the following clarified:

1. Is there support for attaching to a user-space process at runtime and tracing function-level latency, call frequency and other parameters in real time? Kindly indicate whether it is possible and, if so, what is required.

2. If application-level tracing is only possible by instrumenting the code with tracelogs through the Perfetto SDK, it looks like only C++ libraries are supported. Kindly indicate any library, SDK or other way to do similar tracing for Linux/C applications.

3. Has the use of Linux uprobes been explored? That might let Perfetto support function-level tracing without requiring the application code to be instrumented with SDK tracelogs.

Daniele Di Proietto

Aug 14, 2023, 10:13:44 AM
to KDS, Perfetto Development - www.perfetto.dev
Hi,


On Mon, Aug 14, 2023 at 2:40 PM KDS <kirubahara...@gmail.com> wrote:
> 1. Support for runtime attach to a user space process and tracing at function level latency, frequency of calls and other parameters at real-time (Kindly indicate if it is possible and if so, what is required to do this)

There's no support without manual instrumentation.

>
> 2. If application level tracing is only possible with instrumenting the code with tracelogs integrating Perfetto SDK, looks like only C++ libraries are supported. Kindly indicate possible library/SDK/other ways similar tracing can be done for Linux/C applications.

I'm not sure if this is what you're looking for or can be useful to you, but we just landed an API for instrumenting C code.

Keep in mind that the API and ABI are relatively new and are subject to change.

>
> 3. Has use of Linux uprobe possibility been explored, that may help Perfetto support function level tracing without having to require instrumentation application code with SDK tracelogs?

We haven't invested heavily in uprobe instrumentation mostly because it's not available on Android user builds.

For completeness, I should probably mention that Perfetto also supports stack sampling using the perf subsystem: https://perfetto.dev/docs/quickstart/callstack-sampling

Cheers,

Daniele

KDS

Aug 14, 2023, 11:11:21 AM
to Perfetto Development - www.perfetto.dev
Thanks Daniele.

For callstack sampling, looks like that needs an Android device or emulator to connect to.
adb devices -l
export ANDROID_SERIAL=SER123456
For callstack sampling on Linux, it would be good to know whether there are other possibilities, such as a TCP connection to localhost. I'm not sure what the device ID would be in that case.
Thanks.

Daniele Di Proietto

Aug 14, 2023, 11:53:38 AM
to KDS, Perfetto Development - www.perfetto.dev
Yes, the example explains how this works on an Android device.

You can also run it on your non-Android system. Rough steps:

* Compile [traced_perf](https://github.com/google/perfetto/blob/master/src/profiling/perf/main.cc)
* Run it with enough privileges
* Run the Perfetto tracing service (traced)
* Generate an appropriate config (tools/cpu_profile --print-config can help you)
* Pass the config to the perfetto cmdline client
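
For the config-generation step, writing the config by hand is also an option. The following is a minimal sketch in Python that emits a text-format config for callstack sampling. It assumes only the field names that appear in the configs posted later in this thread; the function name and its parameters are hypothetical.

```python
def perf_config(frequency_hz: int, duration_ms: int, buffer_kb: int = 10240) -> str:
    """Return a text-format trace config that samples callstacks on all CPUs."""
    return f"""buffers {{
  size_kb: {buffer_kb}
  fill_policy: RING_BUFFER
}}

data_sources {{
  config {{
    name: "linux.perf"
    perf_event_config {{
      all_cpus: true
      timebase {{
        frequency: {frequency_hz}
      }}
      callstack_sampling {{
        scope {{
        }}
      }}
    }}
  }}
}}

duration_ms: {duration_ms}
"""


if __name__ == "__main__":
    # Print the config so it can be redirected into a file.
    print(perf_config(frequency_hz=100, duration_ms=10000))
```

The output can be saved to a file and passed to the command-line client with the --txt and -c options, in the same way as the tracebox command shown further down the thread.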


KDS

Aug 15, 2023, 4:38:49 AM
to Perfetto Development - www.perfetto.dev
Looks like just running tracebox with the sample perf config (test/configs/traced_perf.cfg) includes callstack sampling. Hope it is the same as the steps you indicated.

buffers {
  size_kb: 10240
  fill_policy: RING_BUFFER
}

data_sources {
  config {
    name: "linux.perf"
    perf_event_config {
      all_cpus: true
      sampling_frequency: 10
    }
  }
}

data_sources {
  config {
    name: "linux.process_stats"
    target_buffer: 0
    process_stats_config {
      proc_stats_poll_ms: 100
    }
  }
}

duration_ms: 60000

sudo out/linux/tracebox -o tp.perfetto-trace --txt -c test/configs/traced_perf.cfg
There seem to be some flag-pole markers that may help find the latencies. But each one is just a snapshot at some point in the function call. It is not necessarily the aggregate latency, as the callstack may be spread across the timeline because the process gets scheduled out.
So the aggregate latency probably has to be worked out by identifying the start and the end of the function call in the timeline? Thanks.

KDS

Aug 15, 2023, 4:52:50 AM
to Perfetto Development - www.perfetto.dev
Also, does the Perfetto Linux build work on ARM32 CPUs? If so, it would be good to get the steps for configuring the build for ARM32.
There seem to be some dependencies, like protobuf:
buildtools/protobuf/src/google/protobuf/port_def.inc
with the following comment:
// Compilation fails on ARM32: b/195943306

Thanks.

Daniele Di Proietto

Aug 15, 2023, 5:01:45 AM
to KDS, Perfetto Development - www.perfetto.dev
On Tue, Aug 15, 2023 at 9:38 AM KDS <kirubahara...@gmail.com> wrote:
>
> Looks like just running tracebox with the sample perf config (test/configs/traced_perf.cfg) includes callstack sampling. Hope it is the same as the steps you indicated.
>
> sudo out/linux/tracebox -o tp.perfetto-trace --txt -c test/configs/traced_perf.cfg

Even easier!

>
> There seems to be some flag pole markers, that may help find the latencies. But that is just a snapshot of latency at some point in the function call. It is not necessarily the aggregate latency as the callstack may be distributed throughout the timeline due to the process being scheduled out.
> So the aggregate latency may have to figured out from identifying the start of the function call in the timeline to the end probably? Thanks.


Yes, I don't think this is the perfect tool for understanding latency. I was just showing another Perfetto tool that works on Linux.
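
To illustrate the difference discussed here, the sketch below contrasts an on-CPU estimate derived from samples with the wall-clock latency between the start and end of a call. It assumes that samples are only taken while the thread is running on a CPU; the function names and the numbers are hypothetical.

```python
def on_cpu_estimate_ms(sample_count: int, frequency_hz: float) -> float:
    """Estimate on-CPU time: each sample stands for one sampling interval."""
    return sample_count * 1000.0 / frequency_hz


def wall_clock_latency_ms(start_ns: int, end_ns: int) -> float:
    """Elapsed time between entry and exit, including time spent scheduled out."""
    return (end_ns - start_ns) / 1e6


# Hypothetical numbers: 3 samples at 10 Hz landed inside one call that
# started at t = 0 and returned at t = 750 ms.
estimate = on_cpu_estimate_ms(sample_count=3, frequency_hz=10)
latency = wall_clock_latency_ms(start_ns=0, end_ns=750_000_000)
print(f"on-CPU estimate: {estimate} ms, wall-clock latency: {latency} ms")
```

In this made-up case the two numbers differ because the call spent part of its lifetime off the CPU, which is why start and end timestamps are needed for aggregate latency.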

Daniele Di Proietto

Aug 15, 2023, 5:07:49 AM
to KDS, Perfetto Development - www.perfetto.dev
It certainly works on ARM32 Android (we have continuous integration there).

It should work on ARM32 Linux (it's possible that Chromium has some continuous integration there, but I'm not sure). If you find a problem, we're happy to accept patches.

The line you're referencing just avoids using the clang::musttail
attribute on __arm__ because at some point clang had a bug on that
architecture.

KDS

Sep 4, 2023, 10:40:54 AM
to Perfetto Development - www.perfetto.dev
Daniele,

Looks like a proc fs polling interval of less than 100ms is not supported in Perfetto. Essentially this puts constraints on tracing processes or timers with 10ms or less granularity.
If proc stats poll interval is set to 10ms, it automatically increases to 100ms.
[132.201] tats_data_source.cc:155 proc_stats_poll_ms 10 is less than minimum of 100ms. Increasing to 100ms.
Are there techniques that can be used to do callstack tracing at 10ms or less granularity (C/Linux) with Perfetto?
Thanks.
Attachment: perfetto_granularity.JPG

Daniele Di Proietto

Sep 4, 2023, 1:14:08 PM
to KDS, Perfetto Development - www.perfetto.dev
Hi,

the message you're seeing should have nothing to do with the callstack sampling period.

The message is about the proc_stats_poll_ms interval. What you're looking for is the callstack sampling period.

You should be able to set the callstack sampling period to a lower value.
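
As a rough illustration of what the log message describes (this is not Perfetto's actual code), the clamping behaves like the Python sketch below. The constant and function name are hypothetical; only the 100 ms floor comes from the log line. The callstack sampling rate is a separate setting and is not touched by this clamp.

```python
# Floor reported by the log message; the names here are illustrative only.
MIN_PROC_STATS_POLL_MS = 100


def effective_poll_ms(requested_ms: int) -> int:
    """Return the poll interval after applying the 100 ms minimum."""
    return max(requested_ms, MIN_PROC_STATS_POLL_MS)


for requested in (10, 100, 250):
    print(requested, "->", effective_poll_ms(requested))
```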

KDS

Sep 5, 2023, 7:41:26 AM
to Perfetto Development - www.perfetto.dev
Daniele,
Thanks for the information on sampling configuration.
With a 10ms timer, callstack is still seen at approximately 100ms intervals. It is the same with callstack sampling configuration also.

period (uint64): Per-cpu sampling will occur every period counts of event. Prefer frequency by default, as it's easier to oversample with a fixed period.

buffers {
  size_kb: 10240
  fill_policy: RING_BUFFER
}

data_sources {
  config {
    name: "linux.perf"
    perf_event_config {
      all_cpus: true
      timebase {
        period: 1
      }
      callstack_sampling {
        scope {
          target_pid: 743
        }
      }
    }
  }
}

data_sources {
  config {
    name: "linux.process_stats"
    target_buffer: 0
    process_stats_config {
      proc_stats_poll_ms: 100
    }
  }
}
Thanks.

Daniele Di Proietto

Sep 5, 2023, 10:47:41 AM
to KDS, Perfetto Development - www.perfetto.dev
With a configuration like:

buffers {
  size_kb: 10240
  fill_policy: RING_BUFFER
}

data_sources {
  config {
    name: "linux.perf"
    perf_event_config {
      all_cpus: true
      timebase {
        frequency: 100
      }
      callstack_sampling {
        scope {

        }
      }
    }
  }
}

data_sources {
  config {
    name: "linux.process_stats"
    target_buffer: 0
    process_stats_config {
      proc_stats_poll_ms: 100
    }
  }
}

duration_ms: 10000

I can see samples every 10ms (because I set frequency to 100 Hz).
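
The relationship between sampling frequency and the spacing of samples is simple arithmetic. A small Python sketch (function names are hypothetical) makes it explicit; note that the sample config posted earlier in the thread used sampling_frequency: 10, which works out to one sample every 100 ms.

```python
def sampling_interval_ms(frequency_hz: float) -> float:
    """Nominal time between samples for a given sampling frequency."""
    return 1000.0 / frequency_hz


def frequency_for_interval(interval_ms: float) -> float:
    """Sampling frequency needed for a target interval between samples."""
    return 1000.0 / interval_ms


print(sampling_interval_ms(10))    # frequency from the sample config earlier in the thread
print(sampling_interval_ms(100))   # frequency from the config above
print(frequency_for_interval(10))  # target: one sample every 10 ms
```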
