Simplifying the kprof interface


Davide Libenzi

Oct 31, 2015, 1:06:45 PM
to Akaros
The current kprof interface is a bit cumbersome to use.
You have to configure it, start the timers, start the interface, stop the interface, and stop the timers.
Moreover, I think kprofinit is called at boot time, which allocates and sets up the profiler even though nobody is using it.
IMO the profiler should have zero cost if nobody is using it: the setup should be done when the first user attaches, and undone after the last one detaches.
Also, timers should be started/stopped automatically, not as a separate operation.
Even more, from a userspace POV, the whole interaction could be mediated by a script which, like Linux perf, starts kprof, runs a program, stops kprof, and spews the data into the output file passed as a script parameter.
Comments?
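
A minimal sketch of the kind of wrapper script described above, not an existing tool. It assumes the control commands quoted later in this thread (optimer, opstart, opstop, written to /prof/kpctl), that "optimer all on" is accepted the same way "optimer all off" is, and that samples are read back from /prof/kpoprofile. The function name kprof_run and the KPROF_DIR variable are made up for illustration.

```shell
# Hypothetical wrapper: kprof_run OUTPUT_FILE COMMAND [ARGS...]
# KPROF_DIR is an illustrative knob; /prof is where the thread's examples live.
KPROF_DIR="${KPROF_DIR:-/prof}"

kprof_run() {
    out="$1"
    shift
    # Assumed: "all on" mirrors the "all off" form shown in the commit message.
    echo "optimer all on" > "$KPROF_DIR/kpctl"
    echo "opstart" > "$KPROF_DIR/kpctl"
    "$@"                      # run the program being profiled
    status=$?
    echo "opstop" > "$KPROF_DIR/kpctl"
    echo "optimer all off" > "$KPROF_DIR/kpctl"
    # Assumed data file name; the old interface has to be read with cat.
    cat "$KPROF_DIR/kpoprofile" > "$out"
    return "$status"
}
```

Usage would then be a single line, e.g. "kprof_run /tmp/prof.out ./my_program", with the exit status of the profiled program passed through.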

ron minnich

Oct 31, 2015, 1:17:17 PM
to Akaros
I like the current interface, actually; it's one I've used before. It's a set of simple primitive operations, and I'm a primitive person.

But what you might do is add another file to the synthetic that combines the operations in the way that you would prefer; or add a new command that works the way you want. Then we all get what we want, how often can we say that!

There are some cases where people set everything up, and then run the profiler at fixed intervals for fixed times, e.g. 5 minutes out of an hour. 

I agree about your attach/detach idea for setup, that's very nice. 

If you'd be willing to let me retain timer control, i.e. manual start and stop of timers, the rest of what you are saying seems quite reasonable.

ron


Davide Libenzi

Oct 31, 2015, 1:28:11 PM
to Akaros
But the timers have no use alone. There is nothing that uses them, if the profiler is not running.
You could still have your run-kprof-at-intervals thing. It's just that with the simplified interface nothing will be disrupting kernel and userspace code, when prof is not active.

ron minnich

Oct 31, 2015, 1:29:55 PM
to Akaros
Actually I've used the timers for other things. They're a handy way to poke interrupts at cores. 

ron

Davide Libenzi

Oct 31, 2015, 1:37:27 PM
to Akaros
Oh, OK ☺
Wouldn't a /dev/timer be a better location for a generic timer based IRQ poking?

Davide Libenzi

Oct 31, 2015, 1:42:11 PM
to Akaros
Like:

$ echo start KERNEL_FUNCTION INTERVAL_US CORES > /dev/timer

Example:

$ echo start my_timer 1000 0,1,2 > /dev/timer
$ echo start my_timer 1000 all > /dev/timer
$ echo stop my_timer > /dev/timer
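
To make the proposed syntax concrete, here is a small sketch of a userspace helper that checks the arguments before writing them. The /dev/timer file is only a proposal in this thread and does not exist; timer_start, timer_stop, and TIMER_DEV are hypothetical names, and the validation rules (numeric interval, "all" or a comma-separated core list) are inferred from the examples above.

```shell
# Hypothetical helper for the proposed (not implemented) /dev/timer syntax:
#   start NAME INTERVAL_US CORES
TIMER_DEV="${TIMER_DEV:-/dev/timer}"

timer_start() {
    name="$1" interval_us="$2" cores="$3"
    # INTERVAL_US must be a non-empty string of digits.
    case "$interval_us" in
        ''|*[!0-9]*) echo "bad interval: $interval_us" >&2; return 1 ;;
    esac
    # CORES is "all" or a comma-separated list of core numbers.
    case "$cores" in
        all) ;;
        ''|*[!0-9,]*|,*|*,|*,,*) echo "bad core list: $cores" >&2; return 1 ;;
    esac
    echo "start $name $interval_us $cores" > "$TIMER_DEV"
}

timer_stop() {
    echo "stop $1" > "$TIMER_DEV"
}
```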



ron minnich

Oct 31, 2015, 1:46:44 PM
to Akaros
Sure, but to me it still gets back to the logic-analyzer-like interface of the current setup. Set up probes, start timer, stop timer, change or remove probes, start timer, stop timer, and so on.

I prefer it. But if you want to submit a CL and it gets approved, I'm fine with that too.

ron

Davide Libenzi

Oct 31, 2015, 1:55:14 PM
to Akaros
Maybe I don't know what those probes you are talking about are, from a SW POV.
At that point, even w/out a dedicated timer device, the timer itself could be a separate virtual file, with an interface like the one shown above, where you can actually call specific functions.

Davide Libenzi

Oct 31, 2015, 6:40:22 PM
to Akaros
What are your requirements for these timers?
Do they need to be able to trigger kernel functions, or are you using them only to generate IRQs (either local or IPI)?
Can you give me a concrete example of how you use them?

ron minnich

Nov 2, 2015, 8:05:36 AM
to Akaros
My use is very primitive :-)
Just to generate apic interrupts easily. No other use. 

I like your idea of a separate file.

ron

Davide Libenzi

Nov 2, 2015, 9:03:18 AM
to Akaros
So /dev/kron? ☺
If the profiler is running while the timer IRQs are being generated, would that still be OK for you?
If not, I will leave optimer as is, and have opstart start them (if not already running), and opstop stop them.
BTW these "op" prefixes will be gone, as the new profiler has absolutely nothing to do with oprofile anymore.

ron minnich

Nov 2, 2015, 11:07:19 AM
to Akaros
I would just change it as you think best, and I'll figure it all out once you're done :-)

kron, wow, nice ring to it.

Oh, wait, I confused it with https://www.youtube.com/watch?v=RBMY3VV5AMA crom

Dan Cross

Nov 2, 2015, 11:14:49 AM
to aka...@googlegroups.com
Hey! That's not hyperboria; that's Southern California!

Davide Libenzi

Nov 2, 2015, 3:06:36 PM
to Akaros
I also removed the constraints that data vanishes the first time it is read, and that you can only "cat" it and not "cp" it.
Basically, data will be there until the next profiler start command is issued.

Barret Rhoden

Nov 5, 2015, 1:52:49 PM
to aka...@googlegroups.com
On 2015-10-31 at 10:28 "'Davide Libenzi' via Akaros"
<aka...@googlegroups.com> wrote:
> But the timers have no use alone. There is nothing that uses them, if
> the profiler is not running.

The separation between the timer and the start/stop was done
intentionally.

Here's the rationale from when that code was written:

commit 383648aa099a3974b397cf81d1a82969bf7814fc
Author: Barret Rhoden <br...@cs.berkeley.edu>
Date: Tue Dec 9 22:44:37 2014 -0800

Per-cpu timer control for oprofile sampling

To use oprof now, you need to set the timer, then enable the profiling.
If/when we add other tracers that can be turned on and off, we'll
continue to use this model: set the collection of things to trace, then
start them all at once.

If you don't have the timers turned on, but you run opstart, other
traces, such as TRACE_ME and whatnot (like perhaps writing traces from
userspace via kpoprofile) will still be collected.

I don't have a control for various TRACE_MEs yet. Maybe we can add one
(like printx).

Some examples:

To turn on the timer on core 0. This turns on an alarm/IRQ:

$ echo optimer 0 on > /prof/kpctl

Then start the actual profiling/collection:

$ echo opstart > /prof/kpctl

To stop collecting:

$ echo opstop > /prof/kpctl

To turn off all the timers:

$ echo optimer all off > /prof/kpctl

Oh, and you can adjust the timer period if you like. Default is 1ms.

$ echo optimer period 1000 > /prof/kpctl (period is in usec)

Davide Libenzi

Nov 5, 2015, 2:05:46 PM
to Akaros
Cough ... cough ... I did not notice TRACEME() 😀
The new code does not support it. Should I add it back?


Barret Rhoden

Nov 5, 2015, 2:07:36 PM
to aka...@googlegroups.com
On 2015-10-31 at 10:06 "'Davide Libenzi' via Akaros"
<aka...@googlegroups.com> wrote:
> Moreover, I think the kprofinit is called at boot time, which
> allocates and sets up the profiler, even though nobody is using it.
> IMO the profiler should have zero cost if nobody is using it, and the
> setup should be done once the first user attaches, and be undone
> after the last detaches.

One thing is that the only cost to kprofinit() is RAM. Tracking the
attach/detach is a bit more painful, but if you want to do it, then go
for it. It'll also require being more careful about removing the
per-cpu buffers (need to make sure there are no samples in progress
across the entire machine in a race-free manner). You also need to
know when the last user detaches (there is no 'detach' command). The
hassle didn't seem worth the RAM.

Another alternative would be to only do the init stuff the first time
attach is called (use the run_once() macro). That way the RAM usage of
init() is only taken on attach, but we don't need to deal with removing
it.

As far as other cleanup goes, there's other stuff in there that might
be deletable. Anything related to Kprofdataqid can probably be
removed. That's the old sampling profiler from Plan 9, and I don't
imagine we'll be using it again. (Let me know if I'm wrong, Ron.)

Barret

Davide Libenzi

Nov 5, 2015, 2:13:23 PM
to Akaros
Currently "start" starts the timers on all CPUs and starts profiling.
Stop does the opposite.
Profile data is no longer "volatile", and you can use "cp" instead of being forced to use "cat", which was kind of weird.
The cost of profiling is not only RAM.
With hooks now in MM and process creation, there is a little cost there as well.
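
With a single start/stop pair, the profile-for-a-fixed-window use case mentioned earlier in the thread reduces to a few lines. This is only a sketch: it assumes "start" and "stop" are written to /prof/kpctl the way the old commands were, and the data file name (kpdata here) is a guess. kprof_window, KPROF_DIR, and KPROF_DATA are illustrative names.

```shell
# Hypothetical helper: kprof_window SECONDS OUTPUT_FILE
KPROF_DIR="${KPROF_DIR:-/prof}"
KPROF_DATA="${KPROF_DATA:-kpdata}"   # assumed name of the profile data file

kprof_window() {
    seconds="$1" out="$2"
    echo start > "$KPROF_DIR/kpctl"   # assumed to start timers and sampling
    sleep "$seconds"
    echo stop > "$KPROF_DIR/kpctl"
    # Data is no longer consumed on first read, so a plain cp works.
    cp "$KPROF_DIR/$KPROF_DATA" "$out"
}
```

Sampling five minutes out of every hour would then be a loop around "kprof_window 300 SOME_FILE" followed by "sleep 3300".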



Davide Libenzi

Nov 5, 2015, 2:22:03 PM
to Akaros
There is only one call to TRACEME(), in kthread.c.
IMO mixing time-based sampling with call-based tracing does not make much sense.
Maybe TRACEME could generate data for kprof_write_sysrecord(), which is a different data stream, isolated from the time-based sampling (kind of like dmesg).

Davide Libenzi

Nov 5, 2015, 4:14:19 PM
to Akaros
OK, I made TRACEME() (only used in one place) call trace_printk(), which in turn feeds into /prof/kptrace.
The trace_printk() has been enhanced to emit a backtrace as well.
Same branch.

Davide Libenzi

Nov 5, 2015, 5:22:16 PM
to Akaros
Example:

/ $ cat /prof/kptrace
[      8.102085467]:cpu0: kern/drivers/dev/kprof.c(227)
        Backtrace:
        #01 [<0xffffffffc207233e>] in trace_printk
        #02 [<0xffffffffc207250f>] in kprof_init
        #03 [<0xffffffffc203d19c>] in devtabinit
        #04 [<0xffffffffc200b40d>] in kernel_init


Davide Libenzi

Nov 5, 2015, 8:41:22 PM
to Akaros
The way it is now, /prof/kptrace looks like dmesg.
Data does not vanish at first retrieval.
Data goes in there, kernel side, via trace_printk() (TRACEME calls that).
The user can also add data in there:

$ echo Fubar > /prof/kptrace

Data pumped in from the user side has no backtrace with it (it would be pointing to the same trace all the time ☺).
A circular buffer behind it takes care of dropping old data if necessary (default buffer size is 512KB).
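
Since userspace can write into the trace, one possible use is bracketing a command with markers so its kernel-side trace lines are easy to find afterwards. A minimal sketch, assuming each write to /prof/kptrace is added to the buffer as in the "echo Fubar" example; kptrace_bracket and the KPTRACE variable are hypothetical names.

```shell
# Hypothetical helper: kptrace_bracket LABEL COMMAND [ARGS...]
KPTRACE="${KPTRACE:-/prof/kptrace}"

kptrace_bracket() {
    label="$1"
    shift
    echo "BEGIN $label" > "$KPTRACE"   # marker before the command runs
    "$@"
    status=$?
    echo "END $label (exit $status)" > "$KPTRACE"
    return "$status"
}
```

After "kptrace_bracket build make", a "cat /prof/kptrace" should show the two markers with any trace_printk() output from that interval in between, as long as the circular buffer has not wrapped.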

Barret Rhoden

Nov 9, 2015, 11:56:11 AM
to aka...@googlegroups.com
On 2015-11-05 at 14:22 "'Davide Libenzi' via Akaros"
<aka...@googlegroups.com> wrote:
> Example:
>
> / $ cat /prof/kptrace
> [ 8.102085467]:cpu0: kern/drivers/dev/kprof.c(227)
> Backtrace:
> #01 [<0xffffffffc207233e>] in trace_printk

Can we make trace_printk just do a print instead of a backtrace? Then
we can have something else for the backtrace? I basically tried to
make it just like ftrace's trace_printk(), where you can inject text
into the trace: https://lwn.net/Articles/365835/. (grep trace_printk).

Barret



Davide Libenzi

Nov 9, 2015, 11:58:22 AM
to Akaros
Yes, the new API accepts a "bool btrace" parameter. When, for example, you write to printk(), you get the raw data (with timestamp), while when used in TRACEME(), it is called with btrace==TRUE.





