Code Review - Use core_id_early() in kprof trace buffer print

4 views
Skip to first unread message

Davide Libenzi

unread,
Dec 7, 2015, 8:49:18 PM12/7/15
to Akaros
After *a whole lot* of fun with Gan (his Akaros box reset button melted basically), we traced the stuck issue to calls to core_id() which access the GS:8 address when it has not been setup.
Strange that Gan changes triggered this, and that before was working fine.


https://github.com/brho/akaros/compare/master...dlibenzi:use_core_id_early


The following changes since commit 2fa42319139e4cc5ca853546363f84443d0ead00:

  Rename 'reallocarray' to 'kreallocarray'. (2015-11-25 18:02:04 -0500)

are available in the git repository at:

  g...@github.com:dlibenzi/akaros use_core_id_early

for you to fetch changes up to a2c903e3f58df5d0bcffa6a2f64b1c8a207a1040:

  Use core_id_early() in kprof trace buffer print code (2015-12-07 17:42:25 -0800)

----------------------------------------------------------------
Davide Libenzi (1):
      Use core_id_early() in kprof trace buffer print code

 kern/drivers/dev/kprof.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kern/drivers/dev/kprof.c b/kern/drivers/dev/kprof.c
index ed16bda..db2739f 100644
--- a/kern/drivers/dev/kprof.c
+++ b/kern/drivers/dev/kprof.c
@@ -574,7 +574,7 @@ static struct trace_printk_buffer *kprof_get_printk_buffer(void)
  spin_unlock_irqsave(&ktrace_lock);
  }
 
- return cpu_tpbs + core_id();
+ return cpu_tpbs + core_id_early();
 }
 
 void trace_vprintk(bool btrace, const char *fmt, va_list args)
@@ -619,7 +619,7 @@ void trace_vprintk(bool btrace, const char *fmt, va_list args)
  if (likely(system_timing.tsc_freq))
  tsc2timespec(read_tsc(), &ts_now);
  snprintf(hdr, sizeof(hdr), "[%lu.%09lu]:cpu%d: ", ts_now.tv_sec,
- ts_now.tv_nsec, core_id());
+ ts_now.tv_nsec, core_id_early());
 
  pb.ptr = usrbuf + vsnprintf(usrbuf, usr_bufsz, fmt, args);
  pb.top = usrbuf + usr_bufsz;

Davide Libenzi

unread,
Dec 7, 2015, 9:16:59 PM12/7/15
to Akaros
As side note, given that here we are in C-land, that gs:8 hard-coding could be turned into reading the proper offset.
I see a comment on include loops there, but once the gen offsets is in place, the offsets are going to be inside an include file which has no dependency at all, so we can turn that 8 into a proper defined offset.

Kevin Klues

unread,
Dec 7, 2015, 9:19:26 PM12/7/15
to aka...@googlegroups.com
The only reason it exists at all is because of .h file dependency
hell. We would just use the struct directly if we could get the
header file included.
> --
> You received this message because you are subscribed to the Google Groups
> "Akaros" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to akaros+un...@googlegroups.com.
> To post to this group, send email to aka...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
~Kevin

Davide Libenzi

unread,
Dec 7, 2015, 9:21:09 PM12/7/15
to Akaros
Hey Kevin! 😀
Yes, I tried with offsetof(), it was hell.
I am adding these to asm-offsets.c, so that we can use pcpui_coreid in there.

EMIT_OFFSET(pcpui_stacktop, struct per_cpu_info, stacktop);
EMIT_OFFSET(pcpui_coreid, struct per_cpu_info, coreid);


Davide Libenzi

unread,
Dec 8, 2015, 9:27:11 AM12/8/15
to Akaros
After looking a bit more about this bug, this was, /methink, more triggered by memory structures being just at different positions due to new Gan code, more than fiddling with GS_BASE MSR.
There is basically a window of time, where num_cores >0 and GS/percpu is not setup correctly, but still pointing somewhere we do not GPF.
So core_id(), if we were lucky, would just be 0 (or less than num_cores), or we would just scribble random memory until GS/percpu was setup correctly.
Once the new percpu code is in, we should be able to drop the ghetto one in kprof.c.

Barret Rhoden

unread,
Dec 8, 2015, 11:22:51 AM12/8/15
to aka...@googlegroups.com
On 2015-12-08 at 06:27 "'Davide Libenzi' via Akaros"
<aka...@googlegroups.com> wrote:
> After looking a bit more about this bug, this was, /methink, more
> triggered by memory structures being just at different positions due
> to new Gan code, more than fiddling with GS_BASE MSR.
> There is basically a window of time, where num_cores >0 and GS/percpu
> is not setup correctly, but still pointing somewhere we do not GPF.

I guess this is after topology_init() and before smp_boot()
(arch_init() -> smp_boot() -> smp_final_core_init()).

This is somewhat true. Core 0's GS should be set up in assembly before
we even get into kernel_init(). k/a/x/entry64.S. So the above window
only should exist for cores other than core 0.

Any idea which core was flipping out? (some non-zero core could be
printing during smp_boot during an APIC access, which is my guess).

Also, I'm not sure this fix is enough:

static struct trace_printk_buffer *kprof_get_printk_buffer(void)
{
static struct trace_printk_buffer boot_tpb;
static struct trace_printk_buffer *cpu_tpbs;

if (unlikely(!num_cores))
return &boot_tpb;
if (unlikely(!cpu_tpbs)) {
/* Poor man per-CPU data structure. I really do no like
* littering global data structures with module specific data.
*/
spin_lock_irqsave(&ktrace_lock);
if (!cpu_tpbs)
cpu_tpbs = kzmalloc(num_cores * sizeof(struct
trace_printk_buffer), 0);
spin_unlock_irqsave(&ktrace_lock);
}

return cpu_tpbs + core_id_early();
}

(this is with your change).

If we get a printk from a core != 0, but before smp_final_core_init()
finishes (specifically, when we set core_id_ready = TRUE;), then
core_id_early will return 0, and some core other than core 0 will get
the trace buffer for core 0.

Odds are, there's a race associated with that. For instance:

char *usrbuf = tpb->buffer, *kpbuf = tpb->buffer + usr_bufsz;
const char *utop, *uptr;
char hdr[64];

if (tpb->in_use)
return;
tpb->in_use++;

That is racy if there are concurrent users of tpb.

One solution is to not use trace_printk for all printks before
smp_boot. That's lousy since we'd like to have all of that saved.

Another would be to not use per-core buffers for trace_printk. It's not
clear that we need that level of optimization for regular printk (which
is already serialized, btw), but maybe we would like it for the
tracer. I'd also like to keep the "cpuX:" formatting. So that stinks
too.

Another option would be to have a call to disable trace_vprintk
temporarily, and smp_boot can turn it off and on.

In general, it's hard to do per-cpu things when we're in the window of
having multiple cores but not having completed smp initialization.

FWIW, we'll probably run into this problem in the future with other
things.

Barret

p.s. https://www.youtube.com/watch?v=6bOy3RNyWME

Davide Libenzi

unread,
Dec 8, 2015, 11:29:06 AM12/8/15
to Akaros
Yeah, during the tests we did with Gan, I had a CAS on the in_use. Let me put that in.
I would not want the tracing code disabled in the early stages, if it is fixable.
We would lose a bunch of info spitted out before.
Setting up GS base as first thing on remote core boots, will help.


Davide Libenzi

unread,
Dec 8, 2015, 11:43:37 AM12/8/15
to Akaros
Code in the same branch.
This also should fix a potential race in case kzmalloc() ended up printk()-ing (another thing we tried with Gan yesterday - but wasn't it).

Barret Rhoden

unread,
Dec 8, 2015, 4:33:26 PM12/8/15
to aka...@googlegroups.com
On 2015-12-07 at 17:49 "'Davide Libenzi' via Akaros"
<aka...@googlegroups.com> wrote:
> The following changes since commit
> 2fa42319139e4cc5ca853546363f84443d0ead00:
>
> Rename 'reallocarray' to 'kreallocarray'. (2015-11-25 18:02:04
> -0500)
>
> are available in the git repository at:
>
> g...@github.com:dlibenzi/akaros use_core_id_early

Merged to master at 594cd9d76318..495a723da7db (from, to]

You can see the entire diff with 'git diff' or at
https://github.com/brho/akaros/compare/594cd9d76318...495a723da7db
Reply all
Reply to author
Forward
0 new messages