/*
 * we currently do not support PERF_FORMAT_GROUP on inherited events
 */
if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
	goto done;
Plus there is a comment "XXX PERF_FORMAT_GROUP vs inherited events
seems difficult" next to perf_output_read_group() (but there isn't a
similar comment on perf_read_hw()).
First, what is the difficulty referred to here?
Secondly, if the difficulty is just to do with the intersection of
sampling counters, inheritance, and group readout (as seems to be the
case), could we please allow group readout on ordinary counting
(non-sampling) counters? That is, change the test above to something
like:
if (attr->inherit && attr->sample_period &&
    (attr->read_format & PERF_FORMAT_GROUP))
	goto done;
Any objections to that change? If it's OK, could we get it into .33
and .32-stable?
Paul.
IIRC it's the fact that we have to go collect the count delta from all
the child counters, which can be quite a lot of work depending on the
number of cpus and children around.
> Secondly, if the difficulty is just to do with the intersection of
> sampling counters, inheritance, and group readout (as seems to be the
> case), could we please allow group readout on ordinary counting
> (non-sampling) counters? That is, change the test above to something
> like:
>
> if (attr->inherit && attr->sample_period &&
>     (attr->read_format & PERF_FORMAT_GROUP))
> 	goto done;
>
> Any objections to that change? If it's OK, could we get it into .33
> and .32-stable?
Yeah, that's still broken, you can't do a read without collecting all
the child counts.
But we don't go and collect the count delta from children without
PERF_FORMAT_GROUP, so why would we with it?
There are two situations where PERF_FORMAT_GROUP makes a difference:
with PERF_SAMPLE_READ when storing a sample in the ring buffer, and
when you do a read() system call on a perf_event fd. In both
situations, if the counter is inherited, we don't go collecting up
child counts, we just store the value of the counter that overflowed
in the sampling case, or the value of the top-level counter in the
read() case.
Now, I can see a possible difficulty in the sampling case if you have
a group that has some inherited members and some non-inherited
members. In that case, if you get an overflow on a child counter, the
group it's in will have fewer members than the group that the
top-level counter is part of, which could get confusing. But there is
no such problem for read() since it is always returning the value of
the top-level counter.
> Yeah, that's still broken, you can't do a read without collecting all
> the child counts.
We do a read without collecting all the child counts if
PERF_FORMAT_GROUP is not set -- why would that be any different when
PERF_FORMAT_GROUP is set? PERF_FORMAT_GROUP is about the "horizontal"
dimension (across group members) not the "vertical" dimension (down to
all the child counters).
Paul.
Yes we do, see perf_event_read_value().
But now that I look at it we don't seem to do so in
perf_output_read_one()... I guess we should fix that.
There is of course the lock inversion in the .read() code reported by
Stephane, but other than that it seems to actually support inherited &&
group just fine.
So I think if we fix that lock inversion and make the PERF_SAMPLE_READ
code look like the .read() code it should all work out.
Ah, true, I should have read the code more carefully.
> But now that I look at it we don't seem to do so in
> perf_output_read_one()... I guess we should fix that.
I suppose it should give the same value as read() would, but the
possibly unbounded interrupt latency is a bit of a worry. I can't
think of a way to avoid it, though (other than not using
PERF_SAMPLE_READ with inherited sampling events :).
> There is of course the lock inversion in the .read() code reported by
> Stephane, but other than that it seems to actually support inherited &&
> group just fine.
>
> So I think if we fix that lock inversion and make the PERF_SAMPLE_READ
> code look like the .read() code it should all work out.
Cool.
Paul.
I now realize that making PERF_SAMPLE_READ collect the child counts is
going to be very complicated, because it involves sending IPIs from NMI
context.
So I might have meant that attr->inherit and PERF_SAMPLE_READ should be
mutually exclusive, i.e. reject:
	attr->inherit && (attr->sample_type & PERF_SAMPLE_READ)