Question

14 views
Skip to first unread message

Davide Libenzi

unread,
Oct 17, 2015, 2:57:49 PM10/17/15
to Akaros
I was browsing in TCP code, and I see some of code like this:

{
    ERRSTACK(1);
    if (waserror()) {
        qunlock(&s->qlock);
        nexterror();
    }
    qlock(&s->qlock);
    // do some work which might "throw"
    qunlock(&s->qlock);
    poperror();
}

Shouldn't the qlock() be before the waserror()?
If qlock throws, it looks like we will be performing an unlock on a not locked object.
Even if it doesn't, shouldn't as general rule, "undo" operations done in a waserror(), happen after the matching "do" operation succeeded?

{
    ERRSTACK(1);
    qlock(&s->qlock);
    if (waserror()) {
        qunlock(&s->qlock);
        nexterror();
    }
    // do some work which might "throw"
    qunlock(&s->qlock);
    poperror();
}

Dan Cross

unread,
Oct 17, 2015, 3:40:13 PM10/17/15
to aka...@googlegroups.com
On Sat, Oct 17, 2015 at 2:57 PM, 'Davide Libenzi' via Akaros <aka...@googlegroups.com> wrote:
I was browsing in TCP code, and I see some of code like this:

{
    ERRSTACK(1);
    if (waserror()) {
        qunlock(&s->qlock);
        nexterror();
    }
    qlock(&s->qlock);
    // do some work which might "throw"
    qunlock(&s->qlock);
    poperror();
}

Shouldn't the qlock() be before the waserror()?

Not necessarily. Something may jump to the true case of the waserror(); but whatever that thing is will be *after* the qlock() call. It's just that it will jump back to a point that was before the qlock call with respect to the program text ordering, but it must be the case that qlock() was called before true-branch for waserror() is taken in the time-ordering of program execution. The error handling code will just unlock and jump somewhere else, so this construct is safe.

But I agree that it's ugly and confusing.

If qlock throws, it looks like we will be performing an unlock on a not locked object.

qlock() is really sem_down(), and sem_down doesn't use the error stack. I jus took a look at the Plan 9 code as well, and it doesn't appear to jump down the error stack either. Whether that's documented to be a part of the contract of that interface is another matter, though (it should be, if it's not).

Even if it doesn't, shouldn't as general rule, "undo" operations done in a waserror(), happen after the matching "do" operation succeeded?

Absolutely.

{
    ERRSTACK(1);
    qlock(&s->qlock);
    if (waserror()) {
        qunlock(&s->qlock);
        nexterror();
    }
    // do some work which might "throw"
    qunlock(&s->qlock);
    poperror();
}

Yes; this is better.

        - Dan C.


ron minnich

unread,
Oct 17, 2015, 6:34:09 PM10/17/15
to aka...@googlegroups.com
On Sat, Oct 17, 2015 at 11:57 AM 'Davide Libenzi' via Akaros <aka...@googlegroups.com> wrote:
I was browsing in TCP code, and I see some of code like this:

{
    ERRSTACK(1);
    if (waserror()) {
        qunlock(&s->qlock);
        nexterror();
    }
    qlock(&s->qlock);
    // do some work which might "throw"
    qunlock(&s->qlock);
    poperror();
}

Shouldn't the qlock() be before the waserror()?

No. The waserror case here is for undoing work done after the waserror, which includes the qlock. 

thanks for looking at that code, however :-)
The more eyes the better.

ron

Davide Libenzi

unread,
Oct 17, 2015, 6:40:39 PM10/17/15
to Akaros
Hmm, are you sure?
Forget the locks case, that might not longjmp. Take this example (assuming an error()-ing kzmalloc):

void* ptr;

if (waserror()) {
  kfree(ptr);
  nexterror();
}
ptr = kzmalloc();


What happens if kzmalloc() longjumps and waserror() frees ptr?
Now this below is safe. Once kzmalloc() returns, ptr is a validly allocated block of memory:

ptr = kzmalloc();
if (waserror()) {
  kfree(ptr);
  nexterror();
}





--
You received this message because you are subscribed to the Google Groups "Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akaros+un...@googlegroups.com.
To post to this group, send email to aka...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ron minnich

unread,
Oct 17, 2015, 6:43:09 PM10/17/15
to Akaros
But that's an improper use of waserror if kzmalloc can throw. If you find code like that let us know.

ron

Davide Libenzi

unread,
Oct 17, 2015, 6:43:12 PM10/17/15
to Akaros
Ron, note that 98% of the code, besides the one I posted in my patch (though, there might be some more around), follows the pattern of waserror-releases-resource-after-acquisition.

Message has been deleted

Dan Cross

unread,
Oct 17, 2015, 7:40:38 PM10/17/15
to aka...@googlegroups.com
On Sat, Oct 17, 2015 at 6:40 PM, 'Davide Libenzi' via Akaros <aka...@googlegroups.com> wrote:
Hmm, are you sure?

I addressed this specifically; did you see my email?

Forget the locks case, that might not longjmp. Take this example (assuming an error()-ing kzmalloc):

kzmalloc() doesn't error().

void* ptr;

if (waserror()) {
  kfree(ptr);
  nexterror();
}
ptr = kzmalloc();


What happens if kzmalloc() longjumps and waserror() frees ptr?

We don't do that.
 
Now this below is safe. Once kzmalloc() returns, ptr is a validly allocated block of memory:

ptr = kzmalloc();
if (waserror()) {
  kfree(ptr);
  nexterror();
}

Non-local control flows often result in non-linearity in ways that aren't evident from the program source. Don't get me wrong: your version is much clearer and, I would argue, all around better; but the earlier version was correct.

Davide Libenzi

unread,
Oct 17, 2015, 7:47:42 PM10/17/15
to Akaros
The kzmalloc() was just an example 😉
This is coming straight from inferno man page for waserror&co:

A complex function can have nested error handlers.  A
          waserror block will follow the acquisition of a resource,
          releasing it on error before calling nexterror, and a
          poperror will precede its release in the normal case.  For
          example:

void
               outer(Thing *t)
               {
>>> 1              qlock(t);
>>> 2              if(waserror()){      /* A */
                       qunlock(t);
                       nexterror();
                   }
                   m = mallocz(READSTR, 0);
                   if(m == nil)
                       error(Enomem);
                   if(waserror()){     /* B */
                       free(m);
                       nexterror();    /* invokes A */
                   }
                   inner(t);
                   poperror();         /* pops B */
                   free(m);
                   poperror();         /* pops A */
                   qunlock(t);
               }


Dan Cross

unread,
Oct 17, 2015, 7:51:07 PM10/17/15
to aka...@googlegroups.com
On Sat, Oct 17, 2015 at 7:47 PM, 'Davide Libenzi' via Akaros <aka...@googlegroups.com> wrote:
The kzmalloc() was just an example 😉
This is coming straight from inferno man page for waserror&co:

As I've said twice now, I really do think your code is an improvement....
A complex function can have nested error handlers.  A
          waserror block will follow the acquisition of a resource,
          releasing it on error before calling nexterror, and a
          poperror will precede its release in the normal case.  For
          example:
But strictly speaking, that still holds. The 'waserror' *is* happening after the resource acquisition *in the time domain*. You are not incorrect, however; the code is awkward and as I'm saying for the fourth time, now, what you've done is an improvement.

Dan Cross

unread,
Oct 17, 2015, 7:53:51 PM10/17/15
to aka...@googlegroups.com
PS: If I had to hazard a guess, I would imagine at least some of that code may be due to Michael Baldwin. He was a fan of the Icon programming language and it's exception model, which this construct vaguely resembles. Unfortunately, he passed away from cancer some years ago; he was a really great person.

Davide Libenzi

unread,
Oct 17, 2015, 7:58:49 PM10/17/15
to Akaros
I know that was not introduced by us, as the same code seems matching here:



Dan Cross

unread,
Oct 17, 2015, 8:06:19 PM10/17/15
to aka...@googlegroups.com
(Michael wasn't an Akaros person.)

ron minnich

unread,
Oct 17, 2015, 8:49:47 PM10/17/15
to aka...@googlegroups.com
I agree with Dan who agrees with you, the qlock movement is probably clearer. I just don't believe it changes correctness one way or another, though I"m not sure. 

ron

ron minnich

unread,
Oct 19, 2015, 1:38:00 AM10/19/15
to aka...@googlegroups.com
I got curious and asked a buddy of mine who used to be at the Labs who's been in the community a bit longer than me :-)


His comment: 
----
I don't have easy access to the Plan 9 codebase (a neverending story,
mostly my fault), but yes, that looks wrong.

 The usual sequence would be

        do thing
        if(waserror()){
                undo thing
                nexterror() or return as appropriate
        }
        ...
        poperror();
        undo thing

There's also an idiom (not used by me)

        say, allocate something
        if(!waserror()){
                do something that might make an error return
                poperror();
        }
        say, free the allocated thing

where you only care about catching the error, not about returning that
to a higher level.
----
So I think I'm glad davide is reading the code.

ron 

Barret Rhoden

unread,
Oct 19, 2015, 1:27:13 PM10/19/15
to aka...@googlegroups.com
On 2015-10-19 at 05:37 ron minnich <rmin...@gmail.com> wrote:
> I got curious and asked a buddy of mine who used to be at the Labs
> who's been in the community a bit longer than me :-)
>
>
> His comment:
> ----
> I don't have easy access to the Plan 9 codebase (a neverending story,
> mostly my fault), but yes, that looks wrong.
>
> The usual sequence would be
>
> do thing
> if(waserror()){
> undo thing
> nexterror() or return as appropriate
> }
> ...
> poperror();
> undo thing

This seems to be the more correct way. Doing the mallocs and qlocks
later assumes they do not throw.

It is true, however, that those do not throw - yet! Eventually, I want
to allow any function that blocks to throw. That would allow syscall
aborts in more places. Right now, you can only abort when in a
rendez. I looked into this very thing a year or so ago when I did the
syscall aborting, and I saw that qlocks are not allowed to throw (based
on their current usage).

There are some places where the waserror handling gets tricky. Check
out devwalk:

volatile int alloc; /* to keep waserror from optimizing this out */
struct walkqid *wq;

[...]

alloc = 0;
wq = kzmalloc(sizeof(struct walkqid) + nname * sizeof(struct qid),
KMALLOC_WAIT);
if (waserror()) {
if (alloc && wq->clone != NULL)
cclose(wq->clone);
kfree(wq);
poperror();
return NULL;
}

Later on, alloc gets set, so we know that wq exists. But the compiler
is smart, and it saw that alloc == 0, and it optimized out the cclose()
(until I made alloc a volatile).

> There's also an idiom (not used by me)
>
> say, allocate something
> if(!waserror()){
> do something that might make an error return
> poperror();
> }
> say, free the allocated thing
>
> where you only care about catching the error, not about returning that
> to a higher level.

Yeah, I call those the 'discard' style.

/* "discard the error" style (we run the conditional code) */
if (!waserror()) {
rendez_init(&rv);
rendez_sleep_timeout(&rv, ret_zero, 0, usec);
}

Barret


ron minnich

unread,
Oct 19, 2015, 3:59:49 PM10/19/15
to aka...@googlegroups.com
So, here's my take. And it differs from my friend, but I've been doing error/waserror stuff for 15 years so I claim to know something too. 

I've always thought of waserror as a demarcation line, and the contract you sign is simple:
if (waserror()) { restore status quo ante as of this line; all bets are off for anything that came before}

That's it. So to me, putting the qlock before waserror is confusing; waserror is restoring a status quo ante that existed before the call to it!

But jmk has been doing this for 25 years and he worked in the Unix Room so I'll take his word before mine.

ron

Davide Libenzi

unread,
Oct 19, 2015, 4:52:24 PM10/19/15
to Akaros
Ron, it should not restore the status before its line.
Example:

irq_disable();
if (waserror()) {
    irq_enable();
    nexterror();
}

If waserror() was restoring the status right before its call, it would have restored the IRQ disabled flag.
IMHO a more correct way to describe its contract is, the waserror==true path should restore whatever status was changed between its position, and the closest between the function entry, or the previous waserror.



ron minnich

unread,
Oct 19, 2015, 5:40:31 PM10/19/15
to Akaros
so far, everyone agrees with davide, so that code needs fixin'.

ron

Barret Rhoden

unread,
Oct 19, 2015, 6:38:39 PM10/19/15
to aka...@googlegroups.com
On 2015-10-19 at 19:59 ron minnich <rmin...@gmail.com> wrote:
> So, here's my take. And it differs from my friend, but I've been doing
> error/waserror stuff for 15 years so I claim to know something too.
>
> I've always thought of waserror as a demarcation line, and the
> contract you sign is simple:
> if (waserror()) { restore status quo ante as of this line; all bets
> are off for anything that came before}
>
> That's it. So to me, putting the qlock before waserror is confusing;
> waserror is restoring a status quo ante that existed before the call
> to it!

I get this, actually. It's a "something went wrong, so clean up,
restore any invariants, and pop on out". Back to the devwalk case,
which does the two types of cleanup we're discussing:

volatile int alloc;
struct walkqid *wq;

[...]
alloc = 0;
wq = kzmalloc(sizeof(struct walkqid) + nname * sizeof(struct qid),
KMALLOC_WAIT);
if (waserror()) {
if (alloc && wq->clone != NULL)
cclose(wq->clone);
kfree(wq);
poperror();
return NULL;
}

that cclose(wq->clone) is cleaning up things done later, not a thing
done before. but the kfree(wq) is cleaning up a thing done before,
not a thing done later.

it's used for both types of cleanup. the 'later' stuff is for "i mucked
with something, now i need to set up back" and the 'before' stuff is
"since i'm calling nexterror() or otherwise returning, i need to undo
things since my last waserror (or function call)".

Perhaps the 'later' style is what is responsible for the volatile flag
on alloc, or maybe we shouldn't be doing the 'later' style at all, but
I'd have to think through a few use cases to know for sure.

Barret

ron minnich

unread,
Oct 19, 2015, 6:51:39 PM10/19/15
to aka...@googlegroups.com
Both Charles and jmk confirm Davide's interpretation. That's good enough for me :-)

ron

Davide Libenzi

unread,
Oct 19, 2015, 7:36:53 PM10/19/15
to Akaros
Unfortunately this is one of the problems with setjmp/longjmp code.
In order to be done safely, you'd have to tell GCC to discard any sort of optimizations across it (neither returns_twice nor asm("":::"memory") do it), or be very careful of conditions like the one above.
And even if you actually have a way to tell it, you'd narrow the GCC optimization window to only span among waserror calls, reducing its effectiveness.



ron minnich

unread,
Oct 20, 2015, 12:19:01 AM10/20/15
to aka...@googlegroups.com
On Mon, Oct 19, 2015 at 4:36 PM 'Davide Libenzi' via Akaros <aka...@googlegroups.com> wrote:


Unfortunately this is one of the problems with setjmp/longjmp code.
In order to be done safely, you'd have to tell GCC to discard any sort of optimizations across it (neither returns_twice nor asm("":::"memory") do it), or be very careful of conditions like the one above.
And even if you actually have a way to tell it, you'd narrow the GCC optimization window to only span among waserror calls, reducing its effectiveness.


I would not want to limit my programming model according to the limitations of a not very good compiler like gcc.  Also let us not forget that no intermediate functions in the call stack  have conditionals on return values when we use error/waserror. It's important to consider the picture outside as well as inside the function.

Minix builds with both gcc and clang because, as Andy puts it, "we wrote it in C, not the gcc dialect of C". That's a good policy. We're following it in Harvey. 

ron

Davide Libenzi

unread,
Oct 20, 2015, 9:40:14 AM10/20/15
to Akaros
I do not quite agree about GCC not being a good compiler.
It is not about conditional return values.
Here GCC optimization (gimplification ☺) saw "alloc" having a single instance within the SSA tree, at the point it was checked, so it thought it to be safe to make assumptions about it.
Maybe the returns_twice attribute should have been enough hint to discard any optimizations on anything after a function with such tagging (I tried even the typical compiler barrier, and did not help here).
Or maybe a further __attribute__((fuggetaboudit)) should be introduced 😀
Clearly bugs like the one reported by Barret can be quite a PITA to trace. Tagging every local variable as volatile, is neither a natural, not a desired thing to do with coding in C.

But the point I was trying to make, is that waserror is killer for compiler optimizations, no matter what compiler you use.
Not only any register has to be spilled, but any assumption on anything before or after the waserror, should not be used after or before it.
Note that I am not trying to change your mind about it. I gave up on that ☺

About the "we wrote it in C, not GCC". Without the special GCC attribute tagging on setjmp() (register clobber, returns twice), any compiler that attempts any sort of optimization, would certainly get it wrong.
We use so many GCC extensions (any sort of attributes, plus any sort of not C99 things, like // comments, variables in the middle of code blocks, array initializations, ...), that porting to any compiler that does not support them would be some major rewrite.
But do not worry about CLang, because today, GCC is so much of a standard, that any new compiler (clang, icc, llvm, ...) which does not support its syntax and extensions, is pretty much dead in the water.



--

ron minnich

unread,
Oct 20, 2015, 1:08:04 PM10/20/15
to aka...@googlegroups.com
I have no objections, if the sense of our community is that we should remove waserror(). But I would still argue that the tradeoffs are not that simple. 

Yes, in a local sense, waserror stymies optimizations. But in the global sense, as in a deep call chain, the overall logic can be simpler as lots of error checking at intermediate layers of this form:
if (err = x()) return err
need not be present. This also means fewer uses of unlikely :-)

I regret that akaros is tied to gcc; there are many of us who have the same regrets for coreboot. I think Andy is right; code to C, not a C compiler. 

That's the direction we've taken on Harvey; removing Ken c dependencies but adding no dependencies on gcc.

ron

Davide Libenzi

unread,
Oct 27, 2015, 10:07:23 PM10/27/15
to Akaros
One more thing on it, I was reasoning today.
Given do() and undo() two functions that "do" and "undo" a given action.
Usual pattern:

do();
if (waserror()) {
    undo();
    nexterror();
}
...

That undo() there, better not be throwing 😀
So basically undo() actions, like C++ destructors, shall not be throwing.




--

Davide Libenzi

unread,
Oct 28, 2015, 10:44:40 AM10/28/15
to Akaros
I just realized this might be related to our current implementation, more than the general Plan9 one.
Our implementation of error() is jump-to-tos (tos == top of stack), while nexterror() does pop-tos-and-jump-to-new-tos.
What happen above, if undo() throws, is that we loop forever, because the waserror() target is till to TOS when undo() throws.
IMO we should have both error() and nexterror() doing pop-tos-and-jump-to-it.

That way, even though throws from within an undo() hierarchy are still evil (you might miss undoing certain actions - if a "do" is composed by many sub-actions), we avoid infinite loops.


ron minnich

unread,
Oct 28, 2015, 11:31:42 AM10/28/15
to Akaros
That's a good point. I think you caught a little thing there we need to get right.

ron

Kevin Klues

unread,
Oct 28, 2015, 5:00:20 PM10/28/15
to aka...@googlegroups.com
> IMO we should have both error() and nexterror() doing pop-tos-and-jump-to-it.

I agree. What are the original plan9 semantics? If we cahnge it to
this, we will likely need to do a full survey of all locations where
we throw an error to make sure aren't breaking anything with this
change.

ron minnich

unread,
Oct 28, 2015, 5:15:03 PM10/28/15
to aka...@googlegroups.com
The original semantics are pop tos and jump, if my quick look at the code is right.

ron

Barret Rhoden

unread,
Oct 28, 2015, 5:38:37 PM10/28/15
to aka...@googlegroups.com
On 2015-10-28 at 21:14 ron minnich <rmin...@gmail.com> wrote:
> The original semantics are pop tos and jump, if my quick look at the
> code is right.

I think the original plan 9 didn't have poperror. We talked about this
a couple years ago when we brought it over. I don't recall the exact
reasons for the way it is now. It might have had something to do with
ERRSTACK and ERRSTACKBASE, which was an Akaros addition.

The rule that differentiates Akaros waserror from Plan 9 is that every
waserror needs a matching poperror. Here's a commit talking about it a
little:

1d9a2fff786b ("Cleans up perrbuf and error documentation")

The original waserror (at least in the repo) looks like:

+#define waserror() setjmp(&(errbuf[0].jmp_buf))
+#define error(x,e) {set_errstr(x); longjmp(&e->jmp_buf, (void *)x);}
+#define nexterror(x) longjmp(&perrbuf->jmp_buf, x)

Anyway, perhaps now that we understand it, it's an artifact of the
implementation and we can get rid of it. ('it' being poperror, and have
waserror do a poperror right after we get back). I'd be careful
though.

Also, if code in a waserror block throws unexpectedly, it might be
better to deadlock in a loop than to silently return. We'll notice if
a core deadlocks and can debug it. Not so with the partial unwind.

Barret

Davide Libenzi

unread,
Oct 28, 2015, 5:39:58 PM10/28/15
to Akaros
We still need poperror() for cleaning stuff on the success path.


ron minnich

unread,
Oct 28, 2015, 6:09:32 PM10/28/15
to Akaros
no, plan 9 has poperror, but it's simple, it decrements a "stack" pointer

ron

Barret Rhoden

unread,
Oct 28, 2015, 6:18:51 PM10/28/15
to aka...@googlegroups.com
On 2015-10-28 at 14:39 "'Davide Libenzi' via Akaros"
<aka...@googlegroups.com> wrote:
> We still need poperror() for cleaning stuff on the success path.

Ah right, so it's not as easy as getting rid of it.

That also might be why we have it and use it everywhere (both waserror
and !waserror). Instead of saying you needed a poperror in one branch
and not the other (for places that didn't have a nexterror, like in
sysfile.c), we opted to explicitly need it.

I think there was also something about Plan 9's implementation of
waserror didn't require the ERRSTACKs. Perhaps there was a fixed
number of errbufs in Plan 9, and we didn't like that so we had each
function bring its own errbuf (in the form of the ERRSTACK macro).
There might have been other differences that led to the poperror
decision.

Or maybe we were lucky we got it working at all, then didn't go back to
it. =)


ron minnich

unread,
Oct 28, 2015, 6:42:23 PM10/28/15
to aka...@googlegroups.com
you don't need it in plan 9 because plan 9 has an a per-process array, errlab, that is used by waserror/poperror. We don't have that per-process kernel structure so had to use the ERRSTACK.

ron

Davide Libenzi

unread,
Oct 28, 2015, 7:12:40 PM10/28/15
to Akaros
Any reason why we did not have errbuf in struct kthread being an array/stack, so no ERRSTACK was needed?
Push/pop would have been faster as well, since all they had to is incr/decr counters and check overflow.

Barret Rhoden

unread,
Oct 28, 2015, 8:35:01 PM10/28/15
to aka...@googlegroups.com
On 2015-10-28 at 16:12 "'Davide Libenzi' via Akaros"
<aka...@googlegroups.com> wrote:
> Any reason why we did not have errbuf in struct kthread being an
> array/stack, so no ERRSTACK was needed?
> Push/pop would have been faster as well, since all they had to is
> incr/decr counters and check overflow.

I think it was because we didn't want a fixed size array of errbufs in
the kthread.

Davide Libenzi

unread,
Oct 28, 2015, 9:14:16 PM10/28/15
to aka...@googlegroups.com
I think we can do better than static array if that is a problem.
That would still get rid of errstack.

ron minnich

unread,
Oct 28, 2015, 9:29:06 PM10/28/15
to aka...@googlegroups.com
I would love to get rid of errstack.

Davide Libenzi

unread,
Oct 29, 2015, 1:13:44 AM10/29/15
to aka...@googlegroups.com
I'll give it a shot. Barrett if you are OK, just ignore the error optimization code review for now. 
There is still the open question about infinite loop or missed undo ops, which can be eventually on top of the one talked here. 

Barret Rhoden

unread,
Oct 29, 2015, 10:05:11 AM10/29/15
to aka...@googlegroups.com
On 2015-10-28 at 22:13 "'Davide Libenzi' via Akaros"
<aka...@googlegroups.com> wrote:
> I'll give it a shot. Barrett if you are OK, just ignore the error
> optimization code review for now. There is still the open question
> about infinite loop or missed undo ops, which can be eventually on
> top of the one talked here.

As far as ERRSTACK goes, the main benefit was not having a large
structure in the kthread and not requiring dynamic memory allocation.

If we can get rid of ERRSTACK and still have nothing but a pointer in
the kthread, then I'll be happy.

Are you thinking about using alloca or something?

And regarding the infinite loop vs missed undo ops, we can talk about
that after the errbuf stuff. =)

Barret

Davide Libenzi

unread,
Oct 29, 2015, 11:01:56 AM10/29/15
to Akaros
Possible solutions:

1) A simple array. A 4KB array, given jmpbuf being 24 bytes, buys us about 170 deep call chains. It's unlikely a correctly behaving kernel goes even close to that. We might blow up the kernel stack before we get to 170 call deep.

2) Allocated vector, with growth handling code.



ron minnich

unread,
Oct 29, 2015, 11:08:33 AM10/29/15
to Akaros
The plan9 stack was 64 elements deep and in 25 years that's never had an overflow.

ron

Barret Rhoden

unread,
Oct 29, 2015, 12:16:29 PM10/29/15
to aka...@googlegroups.com
On 2015-10-29 at 15:08 ron minnich <rmin...@gmail.com> wrote:
> The plan9 stack was 64 elements deep and in 25 years that's never had
> an overflow.

I recall seeing this and opting for the dynamic approach.

I would like to keep the struct kthread very small so that I can put it
on the top of the stack at some point, so that the entirety of the
kthread is a page. We talked about this several years ago, with the
intent being to ease pressure on the allocator, and I don't want to
make this more difficult.

Allocating a buffer, whether it's 64 entries or 4KB worth or whether
it's in the kthread or a separate alloc, will have the same issue:
making kthread allocation more expensive.

Anyways, I don't see why the current code needs to change.

Barret

ron minnich

unread,
Oct 29, 2015, 12:17:48 PM10/29/15
to aka...@googlegroups.com
Given there's no evidence the current code is wrong, maybe we can file a BUG on it for later looking at and move on for now?

This is a good discussion however.

Davide Libenzi

unread,
Oct 29, 2015, 3:45:15 PM10/29/15
to Akaros
IMO a, say, 2048 bytes array (~80 deep stack) is small enough to be on the stack, and big enough to never get to EOS.
IIRC the max callstack depth measured on Linux was like 36.
I am not sure about the AVG life of a kthread, but a kmalloc() at start, and a kfree() at end, should not waste many cycles.
This would get rid of ERRSTACK() and makes {was,next,pop}error() execution faster.



On Thu, Oct 29, 2015 at 9:17 AM, ron minnich <rmin...@gmail.com> wrote:
Given there's no evidence the current code is wrong, maybe we can file a BUG on it for later looking at and move on for now?

This is a good discussion however.

--

Barret Rhoden

unread,
Oct 29, 2015, 5:15:58 PM10/29/15
to aka...@googlegroups.com
On 2015-10-29 at 12:45 "'Davide Libenzi' via Akaros"
<aka...@googlegroups.com> wrote:
> IMO a, say, 2048 bytes array (~80 deep stack) is small enough to be
> on the stack, and big enough to never get to EOS.

We only use 4KB stacks. You can build with a CONFIG variable to get
2-page stacks, which is useful for debugging some parts of the Plan 9
code that use too much stack. Or at least they used to, til I changed
them.

> IIRC the max callstack depth measured on Linux was like 36.
> I am not sure about the AVG life of a kthread, but a kmalloc() at
> start, and a kfree() at end, should not waste many cycles.

You're right that it's probably not a big deal. Likewise, the current
approach to finding a kthread's pointer is to do a memory deref from
pcpui->cur_kthread. That's not too much slower than doing math on the
stack pointer. (maybe a cache miss, if we're unlucky). I'd like to not
limit ourselves from those optimizations in the future though. Either
way, we don't need to do it now.

> This would get rid of ERRSTACK() and makes {was,next,pop}error()
> execution faster.

I didn't find the ERRSTACK to be too painful - more of a minor
inconvenience. =)

Barret
Reply all
Reply to author
Forward
0 new messages