Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] perf tools: Fix build-id matching on vmlinux

725 views
Skip to first unread message

Namhyung Kim

unread,
Aug 26, 2014, 2:40:02 AM8/26/14
to
There's a problem on finding correct kernel symbols when perf report
runs on a different kernel. Although a part of the problem was solved
by the prior commit 0a7e6d1b6844 ("perf tools: Check recorded kernel
version when finding vmlinux"), there's a remaining problem still.

When perf records samples, it synthesizes the kernel map using
machine__mmap_name() and ref_reloc_sym like "[kernel.kallsyms]_text".
You can easily see it using 'perf report -D' command.

After finishing record, it goes through the recorded events to find
maps/dsos actually used. And then record build-id info of them.

During this process, it needs to load symbols in a dso and it'd call
dso__load_vmlinux() since the default value of the symbol_conf.try_
vmlinux_path is true. However it changes dso->long_name to a real
path of the vmlinux file (e.g. /lib/modules/3.16.0-rc2+/build/vmlinux)
if one is running on a custom kernel.

It resulted in that perf report reads the build-id of the vmlinux, but
cannot use it since it only knows about the [kernel.kallsyms] map. It
then falls back to possible vmlinux paths by using the recorded kernel
version (in case of a recent version) or a running kernel silently
(which might break the result). I think it's worth going to the
stable tree.

I can think of a couple of ways to fix it. In this patch, I changed
to use the name of "[kernel.kallsyms]" for the kernel build-id event
instead of not trying vmlinux paths. This way we can provide maximum
info (like annotation) with minimum change IMHO.

Before:

$ perf record -a usleep 1

$ perf buildid-list
00d5ff078efe1d30b8492854f259215fd877ce30 /lib/modules/3.16.0-rc2+/build/vmlinux
78186287bba77069a056a5ccbeb14b7fd2ca3a4b /usr/lib64/libc-2.17.so
4eadca6cb82e0a85edb87c15b5e3980742514501 /usr/lib64/ld-2.17.so
1e272ca30081e81ef41935a630eb2f4c636798b4 /usr/lib64/dri/swrast_dri.so

$ perf buildid-list -H
0000000000000000000000000000000000000000 [kernel.kallsyms]
78186287bba77069a056a5ccbeb14b7fd2ca3a4b /usr/lib64/libc-2.17.so
4eadca6cb82e0a85edb87c15b5e3980742514501 /usr/lib64/ld-2.17.so
1e272ca30081e81ef41935a630eb2f4c636798b4 /usr/lib64/dri/swrast_dri.so
0000000000000000000000000000000000000000 /tmp/perf-2523.map

After:

$ perf record -a usleep 1

$ perf buildid-list
00d5ff078efe1d30b8492854f259215fd877ce30 [kernel.kallsyms]
78186287bba77069a056a5ccbeb14b7fd2ca3a4b /usr/lib64/libc-2.17.so
4eadca6cb82e0a85edb87c15b5e3980742514501 /usr/lib64/ld-2.17.so
1e272ca30081e81ef41935a630eb2f4c636798b4 /usr/lib64/dri/swrast_dri.so

$ perf buildid-list -H
00d5ff078efe1d30b8492854f259215fd877ce30 [kernel.kallsyms]
78186287bba77069a056a5ccbeb14b7fd2ca3a4b /usr/lib64/libc-2.17.so
4eadca6cb82e0a85edb87c15b5e3980742514501 /usr/lib64/ld-2.17.so
1e272ca30081e81ef41935a630eb2f4c636798b4 /usr/lib64/dri/swrast_dri.so
0000000000000000000000000000000000000000 /tmp/perf-2523.map

Cc: sta...@vger.kernel.org
Signed-off-by: Namhyung Kim <namh...@kernel.org>
---
tools/perf/util/header.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 158c787ce0c4..5c4093dee467 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -263,6 +263,9 @@ static int __dsos__write_buildid_table(struct list_head *head,
machine__mmap_name(machine, nm, sizeof(nm));
name = nm;
name_len = strlen(nm) + 1;
+ } else if (dso__is_vmlinux(pos)) {
+ name = pos->name;
+ name_len = strlen(pos->name) + 1;
} else {
name = pos->long_name;
name_len = pos->long_name_len + 1;
--
2.0.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Namhyung Kim

unread,
Sep 3, 2014, 10:50:02 PM9/3/14
to
Ping!

Arnaldo Carvalho de Melo

unread,
Sep 4, 2014, 10:20:03 AM9/4/14
to
We are losing information, namely the pathname for the kernel used, that
may be useful in analysis.

Why not make sure that if there is a build-id in the perf.data header,
then we completely refusing anything that doesn't match the build-id?
I.e. the name is irrelevant for this purpose, the contents, as keyed by
the build-id, is what matters.

- Arnaldo

Stephane Eranian

unread,
Sep 4, 2014, 10:40:02 AM9/4/14
to
There is something I don't understand in your example above.
The -H option shows only DSO with samples. So why do you get the
buildid without -H
and you get no buildid with -H? In other words, I don't connect the
dots between what
-H does on the buildid change for the kernel. Looks like you have the
buildid in the
perf.data file.

Namhyung Kim

unread,
Sep 4, 2014, 8:20:02 PM9/4/14
to
Hi Arnaldo,
Right. That's a problem.


>
> Why not make sure that if there is a build-id in the perf.data header,
> then we completely refusing anything that doesn't match the build-id?
> I.e. the name is irrelevant for this purpose, the contents, as keyed by
> the build-id, is what matters.

The perf report rebuilds machine states from the event records only. In
this case, the kernel map was recorded in the name of [kernel.kallsyms]
so it couldn't find the build-id from the table.

Thanks,
Namhyung

Namhyung Kim

unread,
Sep 4, 2014, 8:30:02 PM9/4/14
to
Hi Stephane,
Without -H, it just prints all DSOs found in build-id table (rebuilt
during read perf data file header) and skips processing events. But
with -H, it'd process the event records and so set kernel map to
'[kernel.kallsyms]' - since the kernel mmap event always has the name -
and mark it as hit. Thus the actual vmlinux can't be marked and then
cannot be printed.

Hmm.. now I'm curious that why the -H option is needed at all.. the perf
record already wrote build-ids that are actually hits..

Thanks,
Namhyung

Arnaldo Carvalho de Melo

unread,
Sep 4, 2014, 9:20:02 PM9/4/14
to
Em Fri, Sep 05, 2014 at 09:09:49AM +0900, Namhyung Kim escreveu:
> >> Before:
> >> $ perf record -a usleep 1

> >> $ perf buildid-list
> >> 00d5ff078efe1d30b8492854f259215fd877ce30 /lib/modules/3.16.0-rc2+/build/vmlinux
> >> 78186287bba77069a056a5ccbeb14b7fd2ca3a4b /usr/lib64/libc-2.17.so

> >> $ perf buildid-list -H
> >> 0000000000000000000000000000000000000000 [kernel.kallsyms]
> >> 78186287bba77069a056a5ccbeb14b7fd2ca3a4b /usr/lib64/libc-2.17.so

> >> After:
> >> $ perf record -a usleep 1

> >> $ perf buildid-list
> >> 00d5ff078efe1d30b8492854f259215fd877ce30 [kernel.kallsyms]

> > We are losing information, namely the pathname for the kernel used, that
> > may be useful in analysis.

> Right. That's a problem.

> > Why not make sure that if there is a build-id in the perf.data header,
> > then we completely refusing anything that doesn't match the build-id?
> > I.e. the name is irrelevant for this purpose, the contents, as keyed by
> > the build-id, is what matters.

> The perf report rebuilds machine states from the event records only. In
> this case, the kernel map was recorded in the name of [kernel.kallsyms]
> so it couldn't find the build-id from the table.

Ok, but then we can special case this one, no?

Somehow mark in the buildid table that that entry is the one for the
kernel and hook it up to the synthesized event that has the
[kernel.kallsyms].ref_reloc_sym entry.

- Arnaldo

Arnaldo Carvalho de Melo

unread,
Sep 4, 2014, 9:20:02 PM9/4/14
to
Em Fri, Sep 05, 2014 at 09:23:56AM +0900, Namhyung Kim escreveu:
> Hmm.. now I'm curious that why the -H option is needed at all.. the perf
> record already wrote build-ids that are actually hits..

Probably because when 'perf buildid-list' was introduced 'perf record'
was inserting all the PERF_RECORD_MMAP dsos into the buildid-list, not
just the ones with hits.

Then, later on, 'perf record' probably moved to register just the ones
with hits.

This way -H ends up being a nop, i.e. should produce the same result as
'perf buildid-list' with no options.

If it doesn't, then it is a bug.

- Arnaldo

Namhyung Kim

unread,
Sep 4, 2014, 9:50:01 PM9/4/14
to
On Fri, Sep 5, 2014 at 10:18 AM, Arnaldo Carvalho de Melo
<ac...@kernel.org> wrote:
> Em Fri, Sep 05, 2014 at 09:09:49AM +0900, Namhyung Kim escreveu:
>> The perf report rebuilds machine states from the event records only. In
>> this case, the kernel map was recorded in the name of [kernel.kallsyms]
>> so it couldn't find the build-id from the table.
>
> Ok, but then we can special case this one, no?
>
> Somehow mark in the buildid table that that entry is the one for the
> kernel and hook it up to the synthesized event that has the
> [kernel.kallsyms].ref_reloc_sym entry.

Maybe we can search vmlinux in machine->kernel_dsos first when
processing kernel mmap event. And in this case do you want replace
the name of mapping to a fullname of vmlinux?

Thanks,
Namhyung

Namhyung Kim

unread,
Sep 4, 2014, 10:40:01 PM9/4/14
to
On Fri, Sep 5, 2014 at 10:42 AM, Namhyung Kim <namh...@kernel.org> wrote:
> On Fri, Sep 5, 2014 at 10:18 AM, Arnaldo Carvalho de Melo
> <ac...@kernel.org> wrote:
>> Em Fri, Sep 05, 2014 at 09:09:49AM +0900, Namhyung Kim escreveu:
>>> The perf report rebuilds machine states from the event records only. In
>>> this case, the kernel map was recorded in the name of [kernel.kallsyms]
>>> so it couldn't find the build-id from the table.
>>
>> Ok, but then we can special case this one, no?
>>
>> Somehow mark in the buildid table that that entry is the one for the
>> kernel and hook it up to the synthesized event that has the
>> [kernel.kallsyms].ref_reloc_sym entry.
>
> Maybe we can search vmlinux in machine->kernel_dsos first when
> processing kernel mmap event. And in this case do you want replace
> the name of mapping to a fullname of vmlinux?

Ah.. the name of mapping is not used at all after finding a matching
dso. Ok then, I'll post a new patch to search vmlinux before
kallsyms.

Thanks

Stephane Eranian

unread,
Sep 4, 2014, 11:20:02 PM9/4/14
to
Still don't follow this. You're saying because as part of processing the events,
you create or replace the mmap record corresponding to the kernel from the
synthesized mmap (actual kernel filename) to the generic kernel.kallsyms, you
lose the buildid. Why not just transfer it? It has to be the one
listed without -H.
This would certainly be much less confusing (to me at least)! Seems to me
you have one piece of information or the other (buildid or filename) but
never both.

Namhyung Kim

unread,
Sep 5, 2014, 1:00:01 AM9/5/14
to
There's a problem on finding correct kernel symbols when perf report
runs on a different kernel. Although a part of the problem was solved
by the prior commit 0a7e6d1b6844 ("perf tools: Check recorded kernel
version when finding vmlinux"), there's a remaining problem still.

When perf records samples, it synthesizes the kernel map using
machine__mmap_name() and ref_reloc_sym like "[kernel.kallsyms]_text".
You can easily see it using 'perf report -D' command.

After finishing record, it goes through the recorded events to find
maps/dsos actually used. And then record build-id info of them.

During this process, it needs to load symbols in a dso and it'd call
dso__load_vmlinux() since the default value of the symbol_conf.try_
vmlinux_path is true. However it changes dso->long_name to a real
path of the vmlinux file (e.g. /lib/modules/3.16.0-rc2+/build/vmlinux)
if one is running on a custom kernel.

It resulted in that perf report reads the build-id of the vmlinux, but
cannot use it since it only knows about the [kernel.kallsyms] map. It
then falls back to possible vmlinux paths by using the recorded kernel
version (in case of a recent version) or a running kernel silently.

Even with the recent tools, this still has a possibility of breaking
the result. As the build directory is a symbolic link, if one built a
new kernel in the same directory with different source/config, the old
link to vmlinux will point the new file. So it's absolutely needed to
use build-id when finding a kernel image.

In this patch, it's now changed to try to search a kernel dso using
"vmlinux" shortname (which always has a build-id) and, if not found,
search "[kernel.kallsyms]".

Before:

$ perf report
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ...............................
#
72.15% 0.00% swapper [kernel.kallsyms] [k] set_curr_task_rt
72.15% 0.00% swapper [kernel.kallsyms] [k] native_calibrate_tsc
72.15% 0.00% swapper [kernel.kallsyms] [k] tsc_refine_calibration_work
71.87% 71.87% swapper [kernel.kallsyms] [k] module_finalize
...

After (for the same perf.data):

72.15% 0.00% swapper vmlinux [k] cpu_startup_entry
72.15% 0.00% swapper vmlinux [k] arch_cpu_idle
72.15% 0.00% swapper vmlinux [k] default_idle
71.87% 71.87% swapper vmlinux [k] native_safe_halt
...

Cc: sta...@vger.kernel.org
Signed-off-by: Namhyung Kim <namh...@kernel.org>
---
tools/perf/util/machine.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b2ec38bf211e..d9f828f3b54f 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1060,10 +1060,14 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
strlen(kmmap_prefix));
/*
* Should be there already, from the build-id table in
- * the header.
+ * the header (but maybe with a different name: "vmlinux").
*/
- struct dso *kernel = __dsos__findnew(&machine->kernel_dsos,
- kmmap_prefix);
+ struct dso *kernel = dsos__find(&machine->kernel_dsos,
+ "vmlinux", true);
+
+ if (kernel == NULL)
+ kernel = __dsos__findnew(&machine->kernel_dsos,
+ kmmap_prefix);
if (kernel == NULL)
goto out_problem;

--
2.1.0

Adrian Hunter

unread,
Sep 5, 2014, 3:30:03 AM9/5/14
to
Isn't "vmlinux" just the basename of the original file name, so if it had a
different name this wouldn't work e.g. if the filename had been
/boot/vmlinuz-3.11.0-26-generic then you would need
"vmlinuz-3.11.0-26-generic" for this to work?

> +
> + if (kernel == NULL)
> + kernel = __dsos__findnew(&machine->kernel_dsos,
> + kmmap_prefix);
> if (kernel == NULL)
> goto out_problem;
>
>

--

Arnaldo Carvalho de Melo

unread,
Sep 5, 2014, 10:20:02 AM9/5/14
to
Em Fri, Sep 05, 2014 at 10:22:40AM +0300, Adrian Hunter escreveu:
> On 09/05/2014 07:59 AM, Namhyung Kim wrote:
> > +++ b/tools/perf/util/machine.c
> > @@ -1060,10 +1060,14 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
> > strlen(kmmap_prefix));
> > /*
> > * Should be there already, from the build-id table in
> > - * the header.
> > + * the header (but maybe with a different name: "vmlinux").
> > */
> > - struct dso *kernel = __dsos__findnew(&machine->kernel_dsos,
> > - kmmap_prefix);
> > + struct dso *kernel = dsos__find(&machine->kernel_dsos,
> > + "vmlinux", true);

> Isn't "vmlinux" just the basename of the original file name, so if it had a
> different name this wouldn't work e.g. if the filename had been
> /boot/vmlinuz-3.11.0-26-generic then you would need
> "vmlinuz-3.11.0-26-generic" for this to work?

Yeah, looking for well known pathnames to then check if the build-id
matches the one we're looking for, be it because we obtained it from
/sys/kernel/notes (for the running kernel), or from the perf.data file
build-id table is ok, as we don't know where it is.

Plain sticking "vmlinux" there is not.

- Arnaldo

Namhyung Kim

unread,
Sep 5, 2014, 11:20:01 AM9/5/14
to
Hi Adrian and Arnaldo,

2014-09-05 (금), 11:11 -0300, Arnaldo Carvalho de Melo:
> Em Fri, Sep 05, 2014 at 10:22:40AM +0300, Adrian Hunter escreveu:
> > On 09/05/2014 07:59 AM, Namhyung Kim wrote:
> > > +++ b/tools/perf/util/machine.c
> > > @@ -1060,10 +1060,14 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
> > > strlen(kmmap_prefix));
> > > /*
> > > * Should be there already, from the build-id table in
> > > - * the header.
> > > + * the header (but maybe with a different name: "vmlinux").
> > > */
> > > - struct dso *kernel = __dsos__findnew(&machine->kernel_dsos,
> > > - kmmap_prefix);
> > > + struct dso *kernel = dsos__find(&machine->kernel_dsos,
> > > + "vmlinux", true);
>
> > Isn't "vmlinux" just the basename of the original file name, so if it had a
> > different name this wouldn't work e.g. if the filename had been
> > /boot/vmlinuz-3.11.0-26-generic then you would need
> > "vmlinuz-3.11.0-26-generic" for this to work?
>
> Yeah, looking for well known pathnames to then check if the build-id
> matches the one we're looking for, be it because we obtained it from
> /sys/kernel/notes (for the running kernel), or from the perf.data file
> build-id table is ok, as we don't know where it is.
>
> Plain sticking "vmlinux" there is not.

I don't get it. AFAIK when perf record runs, it uses [kernel.kallsyms]
name for kernel map. But it can be changed only if it found a "vmlinux"
file in the vmlinux_path[]. So short name will always be "vmlinux" -
okay, it might be vmlinux-$(uname -r) too; I can add it.

As you know, the vmlinux file is a ELF image that created during kernel
build process so I guess its name is fixed. It's different from the
vmlinu"z" which resides in /boot directory.

Am I missing something?

Thanks,
Namhyung

Arnaldo Carvalho de Melo

unread,
Sep 5, 2014, 11:50:03 AM9/5/14
to
[acme@zoo linux]$ perf report --help 2>&1 | grep vmlin
-k, --vmlinux=<file>
vmlinux pathname
[acme@zoo linux]$

Yes, it is using [kernel.kallsyms].ref_reloc_sym for the synthesized
KERNEL mmap event, but it stores the full path in the build-id table.

It seems we need a way to state that an entry in the build-id table is
for the kernel, without looking at its file name. That or to put the
build-id into the synthesized kernel mmap event. Which is better,
because:

That leads to another problem that needs to get solved eventually: We
need to have the build-id into PERF_RECORD_MMAP, because we're now using
just the mmap filename as the key, not the contents, and for long
running sessions, DSOs can get updated, etc.

- Arnaldo

Stephane Eranian

unread,
Sep 7, 2014, 4:30:01 PM9/7/14
to
Can't you figure out it is from the kernel (in perf record) by remembering
you got the buildid from /sys/kernel/notes?

Arnaldo Carvalho de Melo

unread,
Sep 8, 2014, 9:50:02 AM9/8/14
to
Sure, the question was more about how to encode that in the build-id
table in the perf.data file, i.e. we don't have a flags field there that
we could use a bit for this purpose (hey, this entry is for the kernel,
please match it to the PERF_RECORD_MMAP [kernel_kallsyms] synthesized
for the kernel).

We know it is a kernel because the name is [kernel.kallsyms] in the
synthesized PERF_RECORD_MMAP. So perhaps we should have the kernel
pathname right after the reference relocation symbol, that way we would
use it to get the entry in the build-id table.

I.e. parts of what we need are in the synthesized PERF_RECORD_MMAP (how
to do relocation, start-end kernel mmap, and we have parts of it in the
build-id table: the build-id and the full pathname. How to connect both
is what we're trying to achieve here, and in a backwards compatible
way.

The (ugly) way I found to achieve all this is to concatenate the full
pathname right after the ref reloc symbol in the synthesized
PERF_RECORD_MMAP for the kernel mmap.

Older tools would just ignore what is after the ref reloc symbol (need
to check if we put the zero there, if not, we can add the zero + the
full pathname.).

Newer tools would look if the size of the record is longer than the
strlen(ref_reloc_symbol), and if it is, it means we have the kernel full
pathname after the ref_reloc_symbol, that we can use to find the
build-id in the build-id table.

In the future this is all moot if we add the build-id to all
PERF_RECORD_MMAP events, as described below:

Namhyung Kim

unread,
Sep 12, 2014, 2:20:02 AM9/12/14
to
Hi Arnaldo,
Hmm.. this is a report option to specify a vmlinux file rather than
finding it using recorded info. When this option was given by user, it
doesn't even try to search the vmlinux path (or build-id table) and just
uses the given path.


>
> Yes, it is using [kernel.kallsyms].ref_reloc_sym for the synthesized
> KERNEL mmap event, but it stores the full path in the build-id table.
>
> It seems we need a way to state that an entry in the build-id table is
> for the kernel, without looking at its file name.

Maybe we can add a new build_id2_event (like mmap2) that has a new field
to record that info.


> That or to put the build-id into the synthesized kernel mmap
> event. Which is better, because:
>
> That leads to another problem that needs to get solved eventually: We
> need to have the build-id into PERF_RECORD_MMAP, because we're now using
> just the mmap filename as the key, not the contents, and for long
> running sessions, DSOs can get updated, etc.

But it'd require a realtime processing of events at record time.

Thanks,
Namhyung

Namhyung Kim

unread,
Sep 12, 2014, 2:30:02 AM9/12/14
to
On Mon, 8 Sep 2014 10:38:47 -0300, Arnaldo Carvalho de Melo wrote:
> Em Sun, Sep 07, 2014 at 10:24:12PM +0200, Stephane Eranian escreveu:
>> Can't you figure out it is from the kernel (in perf record) by remembering
>> you got the buildid from /sys/kernel/notes?
>
> Sure, the question was more about how to encode that in the build-id
> table in the perf.data file, i.e. we don't have a flags field there that
> we could use a bit for this purpose (hey, this entry is for the kernel,
> please match it to the PERF_RECORD_MMAP [kernel_kallsyms] synthesized
> for the kernel).

So why not adding a new build_id2_event? :)


>
> We know it is a kernel because the name is [kernel.kallsyms] in the
> synthesized PERF_RECORD_MMAP. So perhaps we should have the kernel
> pathname right after the reference relocation symbol, that way we would
> use it to get the entry in the build-id table.
>
> I.e. parts of what we need are in the synthesized PERF_RECORD_MMAP (how
> to do relocation, start-end kernel mmap, and we have parts of it in the
> build-id table: the build-id and the full pathname. How to connect both
> is what we're trying to achieve here, and in a backwards compatible
> way.
>
> The (ugly) way I found to achieve all this is to concatenate the full
> pathname right after the ref reloc symbol in the synthesized
> PERF_RECORD_MMAP for the kernel mmap.
>
> Older tools would just ignore what is after the ref reloc symbol (need
> to check if we put the zero there, if not, we can add the zero + the
> full pathname.).
>
> Newer tools would look if the size of the record is longer than the
> strlen(ref_reloc_symbol), and if it is, it means we have the kernel full
> pathname after the ref_reloc_symbol, that we can use to find the
> build-id in the build-id table.

Or else, what about synthesizing a build-id event right after the kernel
mmap event (in perf_event__synthesize_kernel_mmap)? This way we can
have same effect of extending the mmap event while keeping backward
compatibility IMHO. At report time, we might need to set a flag that it
has just seen the kernel mmap event so that it expects to see a matching
build-id event (use it as a kernel dso).

Thanks,
Namhyung

Arnaldo Carvalho de Melo

unread,
Sep 12, 2014, 10:20:02 AM9/12/14
to
Em Fri, Sep 12, 2014 at 03:14:51PM +0900, Namhyung Kim escreveu:
> On Fri, 5 Sep 2014 12:44:02 -0300, Arnaldo Carvalho de Melo wrote:
> > It seems we need a way to state that an entry in the build-id table is
> > for the kernel, without looking at its file name.

> Maybe we can add a new build_id2_event (like mmap2) that has a new field
> to record that info.

Humm, take a look at machine__write_buildid_table() -> write_buildid(),
we already seem to set build_id_event.header.misc suitably, no?

> > That or to put the build-id into the synthesized kernel mmap
> > event. Which is better, because:

> > That leads to another problem that needs to get solved eventually: We
> > need to have the build-id into PERF_RECORD_MMAP, because we're now using
> > just the mmap filename as the key, not the contents, and for long
> > running sessions, DSOs can get updated, etc.

> But it'd require a realtime processing of events at record time.

Been there, done that, backtracked, yes, we can't do that at 'record
time', else we would be adding way too much noise to the recording
phase.

The idea is for the _kernel_ to do that, would take sizeof(buildid)
(about 20 bytes) on the per-DSO structure in the kernel.

When ELF loading it, stashing the contents of an ELF session
(.note.gnu.build-id) somewhere accessible at PERF_RECORD_MMAP3
generation time.

Doing that at the time we load the DSO from disk should add negligible
overhead.

With that we would not need to go over all the perf.data stream to
generate the build-id table after all record sessions, they would be at
the struct map already.

We could then have multiple entries in the build-id table for the same
pathname, with different build ids, that is something we don't support
now and that provides bogus results when it happens.

- Arnaldo

Namhyung Kim

unread,
Sep 19, 2014, 2:30:02 AM9/19/14
to
On Fri, 12 Sep 2014 11:11:16 -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, Sep 12, 2014 at 03:14:51PM +0900, Namhyung Kim escreveu:
>> On Fri, 5 Sep 2014 12:44:02 -0300, Arnaldo Carvalho de Melo wrote:
>> > It seems we need a way to state that an entry in the build-id table is
>> > for the kernel, without looking at its file name.
>
>> Maybe we can add a new build_id2_event (like mmap2) that has a new field
>> to record that info.
>
> Humm, take a look at machine__write_buildid_table() -> write_buildid(),
> we already seem to set build_id_event.header.misc suitably, no?

AFAIK the PERF_RECORD_MISC_KERNEL is set for both of kernel image and
modules so we cannot distinguish a kernel from others (without checking
names).

>
>> > That or to put the build-id into the synthesized kernel mmap
>> > event. Which is better, because:
>
>> > That leads to another problem that needs to get solved eventually: We
>> > need to have the build-id into PERF_RECORD_MMAP, because we're now using
>> > just the mmap filename as the key, not the contents, and for long
>> > running sessions, DSOs can get updated, etc.
>
>> But it'd require a realtime processing of events at record time.
>
> Been there, done that, backtracked, yes, we can't do that at 'record
> time', else we would be adding way too much noise to the recording
> phase.
>
> The idea is for the _kernel_ to do that, would take sizeof(buildid)
> (about 20 bytes) on the per-DSO structure in the kernel.
>
> When ELF loading it, stashing the contents of an ELF session
> (.note.gnu.build-id) somewhere accessible at PERF_RECORD_MMAP3
> generation time.
>
> Doing that at the time we load the DSO from disk should add negligible
> overhead.
>
> With that we would not need to go over all the perf.data stream to
> generate the build-id table after all record sessions, they would be at
> the struct map already.

Yes, it'll solve the problem and simplifies the perf record. However I
suspect it's not acceptable to keep such info in the kernel only for the
sake of perf.


>
> We could then have multiple entries in the build-id table for the same
> pathname, with different build ids, that is something we don't support
> now and that provides bogus results when it happens.

Maybe we can save and compare timestamp of build-id/map event and sample
event. I'm experimenting to separate meta events (comm, mmap, ...) from
sample events with a similar idea... :)

Thanks,
Namhyung

Arnaldo Carvalho de Melo

unread,
Sep 19, 2014, 10:20:01 AM9/19/14
to
Em Fri, Sep 19, 2014 at 03:26:25PM +0900, Namhyung Kim escreveu:
> On Fri, 12 Sep 2014 11:11:16 -0300, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Sep 12, 2014 at 03:14:51PM +0900, Namhyung Kim escreveu:
> >> On Fri, 5 Sep 2014 12:44:02 -0300, Arnaldo Carvalho de Melo wrote:
> >> > It seems we need a way to state that an entry in the build-id table is
> >> > for the kernel, without looking at its file name.

> >> Maybe we can add a new build_id2_event (like mmap2) that has a new field
> >> to record that info.

> > Humm, take a look at machine__write_buildid_table() -> write_buildid(),
> > we already seem to set build_id_event.header.misc suitably, no?

> AFAIK the PERF_RECORD_MISC_KERNEL is set for both of kernel image and
> modules so we cannot distinguish a kernel from others (without checking
> names).

Ok, not perfect, but I think we improve the situation by using this
piece of information while looking if the file ends in .ko, in which
case we can mark it as a kernel module, if it doesn't end in .ko, we
have a kernel.

Better than today, probably good enough, no?

> >> > That or to put the build-id into the synthesized kernel mmap
> >> > event. Which is better, because:

> >> > That leads to another problem that needs to get solved eventually: We
> >> > need to have the build-id into PERF_RECORD_MMAP, because we're now using
> >> > just the mmap filename as the key, not the contents, and for long
> >> > running sessions, DSOs can get updated, etc.

> >> But it'd require a realtime processing of events at record time.

> > Been there, done that, backtracked, yes, we can't do that at 'record
> > time', else we would be adding way too much noise to the recording
> > phase.

> > The idea is for the _kernel_ to do that, would take sizeof(buildid)
> > (about 20 bytes) on the per-DSO structure in the kernel.

> > When ELF loading it, stashing the contents of an ELF session
> > (.note.gnu.build-id) somewhere accessible at PERF_RECORD_MMAP3
> > generation time.

> > Doing that at the time we load the DSO from disk should add negligible
> > overhead.

> > With that we would not need to go over all the perf.data stream to
> > generate the build-id table after all record sessions, they would be at
> > the struct map already.

> Yes, it'll solve the problem and simplifies the perf record. However I
> suspect it's not acceptable to keep such info in the kernel only for the
> sake of perf.

Hey, perf is a crucial part of development, even if it was the sole
user, I think this would be acceptable.

But this is not perf specific at all, any other tool that needs to map
from sample to a symtab _needs_ this, there is no other way to, on a
system with running apps and its libraries, to safely go from DSO
pathname to its corresponding binary.

Also this enables one to unambiguously describe a running environment
without packing tons of duplicated payloads for multiple workload runs,
be it for profiling or for any other problem characterization task.

And we have build-ids available for quite a while in all DSOs in the
system.

I bet gdb, systemtap, etc would be glad to use this if available.

> > We could then have multiple entries in the build-id table for the same
> > pathname, with different build ids, that is something we don't support
> > now and that provides bogus results when it happens.

> Maybe we can save and compare timestamp of build-id/map event and sample
> event. I'm experimenting to separate meta events (comm, mmap, ...) from
> sample events with a similar idea... :)

The point is to go from an entry in /proc/pid/maps to the file where it
was loaded from, in a system where updates may have taken place for that
pathname.

- Arnaldo

Namhyung Kim

unread,
Sep 19, 2014, 11:50:02 AM9/19/14
to
2014-09-19 (금), 11:16 -0300, Arnaldo Carvalho de Melo:
> Em Fri, Sep 19, 2014 at 03:26:25PM +0900, Namhyung Kim escreveu:
> > On Fri, 12 Sep 2014 11:11:16 -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Fri, Sep 12, 2014 at 03:14:51PM +0900, Namhyung Kim escreveu:
> > >> On Fri, 5 Sep 2014 12:44:02 -0300, Arnaldo Carvalho de Melo wrote:
> > >> > It seems we need a way to state that an entry in the build-id table is
> > >> > for the kernel, without looking at its file name.
>
> > >> Maybe we can add a new build_id2_event (like mmap2) that has a new field
> > >> to record that info.
>
> > > Humm, take a look at machine__write_buildid_table() -> write_buildid(),
> > > we already seem to set build_id_event.header.misc suitably, no?
>
> > AFAIK the PERF_RECORD_MISC_KERNEL is set for both of kernel image and
> > modules so we cannot distinguish a kernel from others (without checking
> > names).
>
> Ok, not perfect, but I think we improve the situation by using this
> piece of information while looking if the file ends in .ko, in which
> case we can mark it as a kernel module, if it doesn't end in .ko, we
> have a kernel.
>
> Better than today, probably good enough, no?

Yep, I'll check .ko suffix in v3.
Hmm.. okay then. But IIUC kernel only knows about an executable (and
dynamic linker) at load time not libraries as they're loaded by the
linker. So kernel needs to check whether a file is an ELF and has
build-id for every file, right?

Thanks,
Namhyung

Arnaldo Carvalho de Melo

unread,
Sep 19, 2014, 12:50:02 PM9/19/14
to
Em Sat, Sep 20, 2014 at 12:42:46AM +0900, Namhyung Kim escreveu:
> 2014-09-19 (금), 11:16 -0300, Arnaldo Carvalho de Melo:
> > Em Fri, Sep 19, 2014 at 03:26:25PM +0900, Namhyung Kim escreveu:
> > > On Fri, 12 Sep 2014 11:11:16 -0300, Arnaldo Carvalho de Melo wrote:
> > > > Em Fri, Sep 12, 2014 at 03:14:51PM +0900, Namhyung Kim escreveu:
> > > >> On Fri, 5 Sep 2014 12:44:02 -0300, Arnaldo Carvalho de Melo wrote:
> > > > Humm, take a look at machine__write_buildid_table() -> write_buildid(),
> > > > we already seem to set build_id_event.header.misc suitably, no?
> >
> > > AFAIK the PERF_RECORD_MISC_KERNEL is set for both of kernel image and
> > > modules so we cannot distinguish a kernel from others (without checking
> > > names).
> >
> > Ok, not perfect, but I think we improve the situation by using this
> > piece of information while looking if the file ends in .ko, in which
> > case we can mark it as a kernel module, if it doesn't end in .ko, we
> > have a kernel.
> >
> > Better than today, probably good enough, no?
>
> Yep, I'll check .ko suffix in v3.

Thanks!
Yes, when it figures out it is an ELF file and invokes this DSO loading
routines, there is where we should insert the code that will read/store
the build id so that when we emit the PERF_RECORD_MMAP3 we can read it.

Namhyung Kim

unread,
Sep 21, 2014, 10:50:01 PM9/21/14
to
tools/perf/util/machine.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b2ec38bf211e..14880b81e98c 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1062,8 +1062,22 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
* Should be there already, from the build-id table in
* the header.
*/
- struct dso *kernel = __dsos__findnew(&machine->kernel_dsos,
- kmmap_prefix);
+ struct dso *kernel = NULL;
+ struct dso *dso;
+
+ list_for_each_entry(dso, &machine->kernel_dsos, node) {
+ const char *suffix;
+
+ suffix = dso->long_name + strlen(dso->long_name) - 3;
+ if (strcmp(suffix, ".ko")) {
+ kernel = dso;
+ break;
+ }
+ }
+
+ if (kernel == NULL)
+ kernel = __dsos__findnew(&machine->kernel_dsos,
+ kmmap_prefix);
if (kernel == NULL)
goto out_problem;

--
2.1.0

Ingo Molnar

unread,
Sep 22, 2014, 1:20:01 AM9/22/14
to
Is strlen(dso->long_name) guaranteed to be >= 3? If not then you
should probably check for it, otherwise we might strcmp into
invalid memory.

Thanks,

Ingo

Namhyung Kim

unread,
Sep 22, 2014, 3:40:02 AM9/22/14
to
Hi Ingo,
I *think* they are all valid kernel modules so length should be greater
than 3. But I also think that adding such an extra check won't cause
any harm. So I'll change to check it in v4. :)

Thanks for your review!
Namhyung

Namhyung Kim

unread,
Sep 22, 2014, 4:10:03 AM9/22/14
to
There's a problem on finding correct kernel symbols when perf report
runs on a different kernel. Although a part of the problem was solved
by the prior commit 0a7e6d1b6844 ("perf tools: Check recorded kernel
version when finding vmlinux"), there's a remaining problem still.

When perf records samples, it synthesizes the kernel map using
machine__mmap_name() and ref_reloc_sym like "[kernel.kallsyms]_text".
You can easily see it using 'perf report -D' command.

After finishing record, it goes through the recorded events to find
maps/dsos actually used. And then record build-id info of them.

During this process, it needs to load symbols in a dso and it'd call
dso__load_vmlinux() since the default value of the symbol_conf.try_
vmlinux_path is true. However it changes dso->long_name to a real
path of the vmlinux file (e.g. /lib/modules/3.16.0-rc2+/build/vmlinux)
if one is running on a custom kernel.

It resulted in that perf report reads the build-id of the vmlinux, but
cannot use it since it only knows about the [kernel.kallsyms] map. It
then falls back to possible vmlinux paths by using the recorded kernel
version (in case of a recent version) or a running kernel silently.

Even with the recent tools, this still has a possibility of breaking
the result. As the build directory is a symbolic link, if one built a
new kernel in the same directory with different source/config, the old
link to vmlinux will point the new file. So it's absolutely needed to
use build-id when finding a kernel image.

In this patch, it's now changed to try to search a kernel dso in the
existing dso list which was constructed during build-id table parsing
so it'll always have a build-id. If not found, search "[kernel.kallsyms]".

Before:

$ perf report
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ...............................
#
72.15% 0.00% swapper [kernel.kallsyms] [k] set_curr_task_rt
72.15% 0.00% swapper [kernel.kallsyms] [k] native_calibrate_tsc
72.15% 0.00% swapper [kernel.kallsyms] [k] tsc_refine_calibration_work
71.87% 71.87% swapper [kernel.kallsyms] [k] module_finalize
...

After (for the same perf.data):

72.15% 0.00% swapper vmlinux [k] cpu_startup_entry
72.15% 0.00% swapper vmlinux [k] arch_cpu_idle
72.15% 0.00% swapper vmlinux [k] default_idle
71.87% 71.87% swapper vmlinux [k] native_safe_halt
...

Cc: sta...@vger.kernel.org
Signed-off-by: Namhyung Kim <namh...@kernel.org>
---
tools/perf/util/machine.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b2ec38bf211e..5661195a8cf6 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -12,6 +12,7 @@
#include <stdbool.h>
#include <symbol/kallsyms.h>
#include "unwind.h"
+#include "asm/bug.h"

int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
{
@@ -1062,8 +1063,26 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
* Should be there already, from the build-id table in
* the header.
*/
- struct dso *kernel = __dsos__findnew(&machine->kernel_dsos,
- kmmap_prefix);
+ struct dso *kernel = NULL;
+ struct dso *dso;
+
+ list_for_each_entry(dso, &machine->kernel_dsos, node) {
+ const char *suffix;
+ size_t len = strlen(dso->long_name);
+
+ if (WARN_ONCE(len <= 3, "Too short dso name"))
+ continue;
+
+ suffix = dso->long_name + len - 3;
+ if (strcmp(suffix, ".ko")) {
+ kernel = dso;
+ break;
+ }
+ }
+
+ if (kernel == NULL)
+ kernel = __dsos__findnew(&machine->kernel_dsos,
+ kmmap_prefix);
if (kernel == NULL)
goto out_problem;

--
2.1.0

Ingo Molnar

unread,
Sep 24, 2014, 3:40:01 AM9/24/14
to

* Namhyung Kim <namh...@kernel.org> wrote:

Please don't break the line just to pacify checkpatch.pl. Other
than that:

Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ing

Namhyung Kim

unread,
Sep 29, 2014, 12:50:02 AM9/29/14
to
Hi Ingo and Arnaldo,

On Wed, 24 Sep 2014 09:33:56 +0200, Ingo Molnar wrote:
> * Namhyung Kim <namh...@kernel.org> wrote:
>> @@ -1062,8 +1063,26 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
>> * Should be there already, from the build-id table in
>> * the header.
>> */
>> - struct dso *kernel = __dsos__findnew(&machine->kernel_dsos,
>> - kmmap_prefix);
>> + struct dso *kernel = NULL;
>> + struct dso *dso;
>> +
>> + list_for_each_entry(dso, &machine->kernel_dsos, node) {
>> + const char *suffix;
>> + size_t len = strlen(dso->long_name);
>> +
>> + if (WARN_ONCE(len <= 3, "Too short dso name"))
>> + continue;
>> +
>> + suffix = dso->long_name + len - 3;
>> + if (strcmp(suffix, ".ko")) {
>> + kernel = dso;
>> + break;
>> + }

I just noticed that the modules can be gzip'ed on some system
(e.g. Arch) so that it no longer has the ".ko" suffix.

$ ls /lib/modules/`uname -r`/kernel/fs/btrfs/
btrfs.ko.gz


Actually in this case, the dso->long_name cannot be set since when perf
record synthesizes module map events, it checks the ".ko" suffix
also. :/

And I also guess that if one loads a custom module not in a canonical
path, it again cannot find the long name (absolute path) of the module
and it results in no ".ko" suffix in the long name - so the check will
be broken too.


>> + }
>> +
>> + if (kernel == NULL)
>> + kernel = __dsos__findnew(&machine->kernel_dsos,
>> + kmmap_prefix);
>
> Please don't break the line just to pacify checkpatch.pl. Other
> than that:

For non string literals too? Anyway in this case, it's already broken. :)


>
> Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks anyway! :)

Namhyung
0 new messages