Code Review - Add support for libc's nanosleep, usleep, and sleep calls

2 views
Skip to first unread message

Kevin Klues

unread,
Nov 19, 2015, 9:18:49 PM11/19/15
to aka...@googlegroups.com
The changes in this request can be viewed online at:

https://github.com/brho/akaros/compare/63c72d8...6b9f50f

The following changes since commit 63c72d8a58dc11457604164207075d6351e9c0bb:

Create ak-kill-9pserver.sh that kills ufs server (2015-11-18 10:37:28 -0800)

are available in the git repository at:

g...@github.com:klueska/akaros sleep-support

for you to fetch changes up to 6b9f50f49f873c663c29b6f2234af23b99de4125:

A utest to test nanosleep, sleep, and usleep (2015-11-19 18:15:46 -0800)

----------------------------------------------------------------
Kevin Klues (3):
Add the nanosleep syscall
Add support for glibc sleep, nanosleep, and usleep
A utest to test nanosleep, sleep, and usleep

kern/include/ros/bits/syscall.h | 1 +
kern/src/syscall.c | 50 +++++++++
.../glibc-2.19-akaros/sysdeps/akaros/nanosleep.c | 34 ++++++
.../glibc-2.19-akaros/sysdeps/akaros/sleep.c | 75 -------------
.../glibc-2.19-akaros/sysdeps/akaros/usleep.c | 25 +++++
user/utest/sleep.c | 120 +++++++++++++++++++++
6 files changed, 230 insertions(+), 75 deletions(-)
create mode 100644
tools/compilers/gcc-glibc/glibc-2.19-akaros/sysdeps/akaros/nanosleep.c
delete mode 100644
tools/compilers/gcc-glibc/glibc-2.19-akaros/sysdeps/akaros/sleep.c
create mode 100644
tools/compilers/gcc-glibc/glibc-2.19-akaros/sysdeps/akaros/usleep.c
create mode 100644 user/utest/sleep.c

Davide Libenzi

unread,
Nov 19, 2015, 11:52:10 PM11/19/15
to Akaros
Didn't the compiler give you a possible uninitialized use (of tsc) at line 415 of syscall.c?
That longjmp based waserror, the more I think, the more it scares me.
IMO all the local variables accessed on the longjmp return path, should be made volatile.
No matter how hard you tell GCC to clobber registers and memory.
Here it is an example, compiled with GCC 4.8.4, which uses returns_twice and all the trimmings on setjmp():
#include <setjmp.h>
jmp_buf env;
extern int goo(int);

int foo(int l)
{
	int k = goo(l);

	if (setjmp(env)) {
		return k;
	}
	k += 11;
	goo(9);

	return k;
}

$ gcc -O2 -S

foo:
	pushq	%rbx
	subq	$16, %rsp
	call	goo
	movl	$env, %edi
	movl	%eax, 12(%rsp) ; Stores goo(l) call result (k) to rsp[12]
	call	_setjmp
	testl	%eax, %eax     ; Checks setjmp() return code
	movl	12(%rsp), %edx ; Loads back rsp[12] (k) to %edx
	jne	.L2            ; L2 is the longjmp() return path
	leal	11(%rdx), %ebx ; Stores k+11 to %ebx. L2 will never see this +11 operation
	movl	$9, %edi
	call	goo
	movl	%ebx, %edx
.L2:
	addq	$16, %rsp
	movl	%edx, %eax     ; Loads the original k (simply the result of goo(l), from the %edx load after testl)
	popq	%rbx
	ret

Pretty messed up. Easily verifiable by putting goo() in a separate file, and printing results of foo() calls.
That is Kevin what you do with tsc = read_tsc() at line 421.
IMO, all the local variable accessed within a waserror() return path, and (re)assigned after the waserror(), must be marked as volatile.
Or bad things happen.

One more thing Kevin about the timing tests.
These can become jittery with high load, if the test run on a node where the test app is not the only thing running.
Ask me how I know ... (Jenkins build going red at 0300 in the morning) 😀
OTOH some testing needs to be there. What I resorted in the past, was to run that test at high priority, do the time();sleep(n);time() sequence 5 times, and take the lowest of the deltas as sleep time.
Not sure we have to care of these cases for Akaros.





--
You received this message because you are subscribed to the Google Groups "Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akaros+un...@googlegroups.com.
To post to this group, send email to aka...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kevin Klues

unread,
Nov 20, 2015, 12:28:18 PM11/20/15
to aka...@googlegroups.com
On Thu, Nov 19, 2015 at 8:52 PM, 'Davide Libenzi' via Akaros
<aka...@googlegroups.com> wrote:
> Didn't the compiler give you a possible uninitialized use (of tsc) at line
> 415 of syscall.c?

I didn't get a warning, though looking at the code, I clearly should have.

> That longjmp based waserror, the more I think, the more it scares me.
> IMO all the local variables accessed on the longjmp return path, should be
> made volatile.

I've never been 100% comfortable with waserror() myself, but the
syscall abort mechanism uses error() under the hood, so I have to use
waserror() here in order to detect an abort and compensate for it. I
ran what I have by Ron, and he seemed to think it was OK, but it's
clear there are problems with accessing the computed tsc value, as you
point out below.

> No matter how hard you tell GCC to clobber registers and memory.

There is no need for a contrived example. If you look at the assembly
generated for nanosleep itself, it is clearly wrong. Below is the
code inside the waserror() block. It never even tries to do the
subtraction of the original tsc (presumably because it decides it's
value must be 0 at this point, which is a little weird because I never
initialized it to 0).

ffffffffc204dee4: 0f 31 rdtsc
ffffffffc204dee6: 48 89 d7 mov %rdx,%rdi
ffffffffc204dee9: 89 c0 mov %eax,%eax
ffffffffc204deeb: 48 8d 75 a8 lea -0x58(%rbp),%rsi
ffffffffc204deef: 48 c1 e7 20 shl $0x20,%rdi
ffffffffc204def3: 48 09 c7 or %rax,%rdi
ffffffffc204def6: e8 c5 45 00 00 callq
ffffffffc20524c0 <tsc2timespec>

Contrast this with when I make it volatile:

ffffffffc204dee4: 0f 31 rdtsc
ffffffffc204dee6: 48 8b 4d 90 mov -0x70(%rbp),%rcx
ffffffffc204deea: 48 89 d7 mov %rdx,%rdi
ffffffffc204deed: 89 c0 mov %eax,%eax
ffffffffc204deef: 48 c1 e7 20 shl $0x20,%rdi
ffffffffc204def3: 48 8d 75 a8 lea -0x58(%rbp),%rsi
ffffffffc204def7: 48 09 c7 or %rax,%rdi
ffffffffc204defa: 48 29 cf sub %rcx,%rdi
ffffffffc204defd: e8 de 45 00 00 callq
ffffffffc20524e0 <tsc2timespec>

It clearly loads the tsc value into %rcx and subtracts it.

> IMO, all the local variable accessed within a waserror() return path, and
> (re)assigned after the waserror(), must be marked as volatile.
> Or bad things happen.

Whether it's true in general or not, it's clearly true here. So it's
good that we went down this path of discussion. However, the better
answer in this specific case is to move the read_tsc() call above the
waserror() block, as that provides better timing accuracy anyway.
When this is done, we get the expected output because the
'returns_twice' attribute on the setjmp()in the waserror() macro
provides the required compiler memory barrier for us.

> One more thing Kevin about the timing tests.
> These can become jittery with high load, if the test run on a node where the
> test app is not the only thing running.
> Ask me how I know ... (Jenkins build going red at 0300 in the morning)
> OTOH some testing needs to be there. What I resorted in the past, was to run
> that test at high priority, do the time();sleep(n);time() sequence 5 times,
> and take the lowest of the deltas as sleep time.
> Not sure we have to care of these cases for Akaros.

By default, all of the utests link in the pthread library, which will
turn the process into an mcp before the test runs. Since mcps have
dedicated access to their cores, it's probably not much of an issue on
Akaros to run at "high priority" compared to Linux. That said, running
each test 5 times and taking the lowest delta seems reasonable. I've
updated the tests accordingly.

Davide Libenzi

unread,
Nov 20, 2015, 12:45:51 PM11/20/15
to Akaros
It not much even about volatile. Yes, volatile can work, but this is beyond what volatile is meant for.

void foo(void) {
    volatile int k;

    k = something();
    if (setjmp()) {
A:
        use(k);
        ...
    }
B:
    k = k + 9;
C:
    something_which_might_longjmps();
}

Here A and B, for GCC, are two disjoint domains.
At the time A is possibly executing, B cannot possibly have run, so yes, the use(k) will reload k from stack storage, but nothing tells GCC to flush k, a local variable, no more accessed after C, to storage.
The original meaning of volatile, meant, always reload from storage.
I am not sure if adding the "always flush to backing storage after modify, even if no more users follow", is a wise forward looking assumption.
IMO, on top of volatile, it would be wise not to use, within the waserror section, variables whose values are written after it.


    

ron minnich

unread,
Nov 20, 2015, 12:53:46 PM11/20/15
to Akaros
Yeah, my brain was clearly in neutral when I looked at that code.

ron

Davide Libenzi

unread,
Nov 20, 2015, 1:05:21 PM11/20/15
to Akaros
Let me rephrase my last statement:

"IMO, on top of volatile, it would be wise not to assume, within the waserror section, that the values written into variables after it, will stick."

IOW, the code within a waserror section should be as dumb as possible, using only assign-once variables whose assign happened before the waserror using them.
Typical example:

{
    res r;

    r = res_alloc();
    if (waserror()) {
        res_free(r);
        nexterror();
    }
    // from here 'r' is to be assumed read only

}




ron minnich

unread,
Nov 20, 2015, 1:13:33 PM11/20/15
to Akaros
I think you're absolutely right.

Barret Rhoden

unread,
Nov 23, 2015, 3:58:26 PM11/23/15
to aka...@googlegroups.com
On 2015-11-19 at 20:52 "'Davide Libenzi' via Akaros"
<aka...@googlegroups.com> wrote:
> IMO, all the local variable accessed within a waserror() return path,
> and (re)assigned after the waserror(), must be marked as volatile.
> Or bad things happen.

absolutely. the canonical example is in devwalk:

struct walkqid *devwalk(struct chan *c,
struct chan *nc, char **name, int nname,
struct dirtab *tab, int ntab, Devgen * gen)
{
ERRSTACK(1);
int i, j;
volatile int alloc; /* to keep waserror from optimizing this out */
struct walkqid *wq;


that was a case where bad things were happening.


> Let me rephrase my last statement:
>
> "IMO, on top of volatile, it would be wise not to assume, within the
> waserror section, that the values written into variables after it,
> will stick."

that's a great rule.

barret


Kevin Klues

unread,
Nov 24, 2015, 12:10:30 PM11/24/15
to aka...@googlegroups.com
To be clear, what I patched up last Friday shouldn't run into this
issue and it's ready for review.
> --
> You received this message because you are subscribed to the Google Groups "Akaros" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to akaros+un...@googlegroups.com.
> To post to this group, send email to aka...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
~Kevin

Barret Rhoden

unread,
Nov 24, 2015, 3:04:33 PM11/24/15
to aka...@googlegroups.com
Hi -

On 2015-11-19 at 18:18 Kevin Klues <klu...@gmail.com> wrote:
> The changes in this request can be viewed online at:
>
> https://github.com/brho/akaros/compare/63c72d8...6b9f50f
>
> The following changes since commit
> 63c72d8a58dc11457604164207075d6351e9c0bb:
>
> Create ak-kill-9pserver.sh that kills ufs server (2015-11-18
> 10:37:28 -0800)
>
> are available in the git repository at:
>
> g...@github.com:klueska/akaros sleep-support

One minor thing:

> From 85efbd5dc5142353d8fd824e36f516e99cf76424 Mon Sep 17 00:00:00 2001
> From: Kevin Klues <klu...@cs.berkeley.edu>
> Date: Thu, 19 Nov 2015 18:07:45 -0800
> Subject: Add the nanosleep syscall
>
> +static int sys_nanosleep(struct proc *p,
> + const struct timespec *req,
> + struct timespec *rem)
> +{

> + usec += kreq.tv_nsec ? (kreq.tv_nsec - 1)/1000 + 1 : 0;

We have a helper for this, which I think does what you want:

#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))

Barret

Kevin Klues

unread,
Nov 24, 2015, 3:21:01 PM11/24/15
to aka...@googlegroups.com
This only works in cases where we don't have to worry about overflow
for (n) (such as this because kreq.tv_nsec is maxed out at 999999999).
If ((n) + (d) -1) were to overflow, this would cause problems.

Either way, i've updated it to make this call and pushed it.

>
> Barret

Barret Rhoden

unread,
Nov 24, 2015, 3:34:34 PM11/24/15
to aka...@googlegroups.com
On 2015-11-24 at 12:20 Kevin Klues <klu...@gmail.com> wrote:
> >> + usec += kreq.tv_nsec ? (kreq.tv_nsec - 1)/1000 + 1 : 0;
> >
> > We have a helper for this, which I think does what you want:
> >
> > #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
>
> This only works in cases where we don't have to worry about overflow
> for (n) (such as this because kreq.tv_nsec is maxed out at 999999999).
> If ((n) + (d) -1) were to overflow, this would cause problems.
>
> Either way, i've updated it to make this call and pushed it.

Thanks, merged to master at fafa68356f11..aecd50eb71c2 (from, to]

You can see the entire diff with 'git diff' or at
https://github.com/brho/akaros/compare/fafa68356f11...aecd50eb71c2

---------
btw, these tests take a long time:

/ $ /bin/tests/utest/sleep
<-- BEGIN_USERSPACE_SLEEP_TESTS -->
PASSED [test_nanosleep](35.622515s)
PASSED [test_usleep](15.017327s)
PASSED [test_sleep](75.003688s)
<-- END_USERSPACE_SLEEP_TESTS -->
/ $

is that going to mess with the timeout you set up with jenkins? (not
sure if that was a thing or not).

barret
Reply all
Reply to author
Forward
0 new messages