setitimer vs. threads: SIGALRM returned to which thread? (process master or individual child)

Frantisek Rysanek

Apr 9, 2010, 4:10:02 PM

Dear everyone,

I hope I'm not too far off topic on this list... specifically, I
hope the issue lies in the kernel, as opposed to the user-space part
of NPTL that ships with glibc, distros etc.
At the same time, I feel ashamed to be asking this noob question on
the LKML itself - except that there doesn't seem to be a better place
to ask... :->

Some years ago, I wrote a couple of programs that use the
setitimer() syscall in a threaded environment, relying on a special
property it had at the time: setitimer() had per-thread granularity.
It used to deliver a SIGALRM from the timer to the particular thread
that called setitimer(). I believe that was around RH8 to Fedora 5.

Recently I've recompiled the programs on a newer distro (Fedora 10)
and voila: setitimer() now yields a SIGALRM to the program's master
thread, no matter what child thread called setitimer()...

Based on further reading, I assume this is related to making the NPTL
more POSIX-compliant. The new behavior is the correct POSIX one, the
old one was not. See "man pthreads", and under the NPTL heading,
find a note saying
"Threads do not share interval timers (fixed in kernel 2.6.12)."
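The note quoted from "man pthreads" can be checked directly. Below is a minimal sketch, assuming Linux with NPTL on kernel 2.6.12 or later (the function names are made up for illustration): a child thread arms ITIMER_REAL, and after joining it the main thread still sees that timer through getitimer(), i.e. the interval timer belongs to the process rather than to the thread that armed it.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <string.h>
#include <sys/time.h>

/* Worker thread: arms ITIMER_REAL for 10 s, then exits without waiting. */
static void *arm_timer(void *arg)
{
    struct itimerval itv;
    (void) arg;
    memset(&itv, 0, sizeof itv);
    itv.it_value.tv_sec = 10;   /* far enough away that it never fires here */
    setitimer(ITIMER_REAL, &itv, NULL);
    return NULL;
}

/* Returns 1 if a timer armed by another thread is visible from this one,
   0 if not, -1 if the thread could not be created. */
int itimer_is_shared(void)
{
    pthread_t t;
    struct itimerval cur, off;

    if (pthread_create(&t, NULL, arm_timer, NULL) != 0)
        return -1;
    pthread_join(t, NULL);

    getitimer(ITIMER_REAL, &cur);
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);   /* disarm before SIGALRM kills us */
    return cur.it_value.tv_sec > 0 || cur.it_value.tv_usec > 0;
}
```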

Yes, it used to be quite a relief to have Linux do the management of
timers for me. Now I have two options to choose from:
1) write my own "timer queueing" (timekeeping) code to order the
timers for me in the master thread
2) find another function, similar to setitimer(), that would function
the way setitimer() used to work in the old days...
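Option 1 boils down to keeping the pending deadlines ordered and always arming the single process-wide timer for the earliest one. A minimal sketch of such a queue as a sorted array follows; the names tq_push()/tq_pop(), the capacity and the millisecond units are hypothetical choices for this illustration.

```c
#include <stddef.h>

#define MAX_TIMERS 64   /* arbitrary capacity for this sketch */

struct timer_entry {
    long deadline_ms;   /* absolute expiry time, ms on some monotonic clock */
    int  owner;         /* hypothetical: index of the thread to wake */
};

struct timer_queue {
    struct timer_entry e[MAX_TIMERS];
    size_t n;
};

/* Insert while keeping the array sorted by deadline, earliest first. */
int tq_push(struct timer_queue *q, long deadline_ms, int owner)
{
    size_t i;
    if (q->n == MAX_TIMERS)
        return -1;
    i = q->n;
    while (i > 0 && q->e[i - 1].deadline_ms > deadline_ms) {
        q->e[i] = q->e[i - 1];   /* shift later deadlines up one slot */
        i--;
    }
    q->e[i].deadline_ms = deadline_ms;
    q->e[i].owner = owner;
    q->n++;
    return 0;
}

/* Remove and return the earliest entry. The master thread would then
   re-arm setitimer() for (new head's deadline - now). */
int tq_pop(struct timer_queue *q, struct timer_entry *out)
{
    size_t i;
    if (q->n == 0)
        return -1;
    *out = q->e[0];
    for (i = 1; i < q->n; i++)
        q->e[i - 1] = q->e[i];
    q->n--;
    return 0;
}
```

Insertion is O(n) here; with a myriad of timers, a binary heap would be the usual replacement. The re-arm after each pop is where the extra clock reads and setitimer() calls come from.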

Obviously option #2 is much easier for me to abuse :-)
For instance, does select() work in the desired per-thread way?
In the app that I'm trying to update right now, I have a serial
device open per thread, and I need to detect character timeouts
(frame breaks).
But I have other apps where I have a *myriad* of stand-alone timers,
not related to a "file descriptor like" device of any kind,
generating "spurious events" for me, used to propel a bunch of
threads doing some polling on various dumb "networked" devices
(external bus slaves)...

For a moment I was wondering how complex the relevant kernel patch
was, how difficult it would be to revert it - but then again such a
revert might disrupt various other pieces of user-space code in my
distro, so it's probably not such a good idea anyway :-) Also, if I
resort to patching my kernel, it makes my user-space code fairly non-
portable to other people's machines. Let alone the bulk of code
evolution in Linux kernel timekeeping and process management since
2.6.12, overlaying the original patch.
AIX appears to have ITIMER_REAL_TH [sob]. Not that I'm going to try
AIX for this particular reason :-)

Wouldn't it in fact be more straightforward and "cheaper" (in terms
of processing overhead) to have the timers thread-aware? If I just
call setitimer() in each thread, that costs some number of
syscalls. Now if I need to do my own timekeeping (event
queueing) in user space, I'll probably need to call getitimer() or
gettimeofday() ahead of every setitimer(), every time a thread needs
to set a timer. Not sure about the number of pointer
indirections in the kernel in either case :-)
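The extra timekeeping is mostly arithmetic: read the clock, subtract it from the next absolute deadline, and convert the result into the struct itimerval that setitimer() expects. A sketch under those assumptions (both helpers are hypothetical; the caller is assumed to obtain "now" with clock_gettime(CLOCK_MONOTONIC, ...)):

```c
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Time left until an absolute deadline, clamped at zero. Both values are
   assumed to come from the same clock. */
struct timespec ts_remaining(struct timespec now, struct timespec deadline)
{
    struct timespec r = { 0, 0 };
    if (now.tv_sec > deadline.tv_sec ||
        (now.tv_sec == deadline.tv_sec && now.tv_nsec >= deadline.tv_nsec))
        return r;                          /* already expired */
    r.tv_sec = deadline.tv_sec - now.tv_sec;
    r.tv_nsec = deadline.tv_nsec - now.tv_nsec;
    if (r.tv_nsec < 0) {                   /* borrow one second */
        r.tv_sec--;
        r.tv_nsec += 1000000000L;
    }
    return r;
}

/* setitimer() wants microseconds, and an all-zero it_value disarms the
   timer, so round up and never hand back zero for a pending deadline. */
struct itimerval itv_from_remaining(struct timespec r)
{
    struct itimerval itv;
    memset(&itv, 0, sizeof itv);
    itv.it_value.tv_sec = r.tv_sec;
    itv.it_value.tv_usec = (r.tv_nsec + 999) / 1000;
    if (itv.it_value.tv_usec == 1000000) { /* carry after rounding up */
        itv.it_value.tv_sec++;
        itv.it_value.tv_usec = 0;
    }
    if (itv.it_value.tv_sec == 0 && itv.it_value.tv_usec == 0)
        itv.it_value.tv_usec = 1;          /* expired: fire as soon as possible */
    return itv;
}
```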

I understand that POSIX compliance is a good thing, for portability
reasons. At the same time, resorting to per-process granularity of
timers somehow "feels backwards" - from thread awareness, back to the
old "no threads" UNIX world. It seems to remind me of the occasional
debate whether GCC extensions to standard C are a good thing to use,
or whether they should be avoided...

I haven't found much debate about this "timers vs. threads
granularity" point in mailing list archives or on the web.
Any further hints/pointers/kicks in the right direction/recommended
reading are welcome :-)
If you've read this far, thanks for your time...

Frank Rysanek

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Frantisek Rysanek

Apr 10, 2010, 3:30:02 AM

On 9 Apr 2010 at 23:26, bill o gallmeister wrote:
>
> Check out timer_create() rather than setitimer().
>
Oh I *see* :-) There seems to be a way to deliver an event to a
specific thread. Just a quick guess, haven't validated this by a
compiler:

============ PSEUDOCODE SNIPPET ==========
#include <pthread.h>
#include <signal.h>
#include <time.h>

struct my_thr_data
{
    pthread_t ID; /* to be set upon pthread_create() */
    /* ...further members... */
};

/* A SIGEV_THREAD notify function takes a union sigval and returns void */
static void my_fn(union sigval my_user_data)
{
    struct my_thr_data *data = my_user_data.sival_ptr;
    pthread_kill(data->ID, SIGALRM);
}

struct my_thr_data this_thread;
timer_t my_timer;
struct sigevent my_event =
{
    .sigev_notify = SIGEV_THREAD,
    .sigev_notify_function = my_fn,
    .sigev_value.sival_ptr = &this_thread,
    .sigev_notify_attributes = NULL
};

/* inside some function: */
timer_create(CLOCK_REALTIME, &my_event, &my_timer);

/* by now we're set up, but the timer doesn't tick yet. */

/* someplace later in the code: */
timer_settime(my_timer, ... );

=========== /PSEUDOCODE SNIPPET ==============
thank you :-)
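For reference, a compilable version of the idea in the snippet above might look like the following sketch. Assumptions: Linux with glibc (older glibc also needs -lrt), the function name forward_via_sigev_thread() is invented, and the target thread collects the forwarded SIGALRM with sigwait() rather than a handler. Every expiry first runs my_fn() in a library-created thread and only then reaches the target, so this costs an extra thread hop compared with direct delivery.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

struct my_thr_data {
    pthread_t ID;   /* thread that should receive the signal */
};

/* Runs in a thread started by the C library on expiry; it forwards the
   event to the target as a thread-directed signal. */
static void my_fn(union sigval sv)
{
    struct my_thr_data *d = sv.sival_ptr;
    pthread_kill(d->ID, SIGALRM);
}

/* Arms a one-shot 10 ms timer aimed at the calling thread and waits for
   the forwarded SIGALRM. Returns 0 on success, -1 on failure. */
int forward_via_sigev_thread(void)
{
    struct my_thr_data self = { .ID = pthread_self() };
    struct sigevent sev;
    struct itimerspec its;
    sigset_t set;
    timer_t tid;
    int sig = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    pthread_sigmask(SIG_BLOCK, &set, NULL);  /* keep it pending for sigwait */

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = my_fn;
    sev.sigev_value.sival_ptr = &self;
    if (timer_create(CLOCK_MONOTONIC, &sev, &tid) != 0)
        return -1;

    memset(&its, 0, sizeof its);
    its.it_value.tv_nsec = 10 * 1000000L;    /* 10 ms, one-shot */
    timer_settime(tid, 0, &its, NULL);

    sigwait(&set, &sig);
    timer_delete(tid);
    return sig == SIGALRM ? 0 : -1;
}
```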

Andi Kleen

Apr 11, 2010, 5:00:02 PM

"Frantisek Rysanek" <Frantise...@post.cz> writes:

> Yes, it used to be quite a relief to have Linux do the management of
> timers for me. Now I have two options to choose from:
> 1) write my own "timer queueing" (timekeeping) code to order the
> timers for me in the master thread
> 2) find another function, similar to setitimer(), that would function
> the way setitimer() used to work in the old days...

POSIX timers (timer_create et al.) allow specifying the signal.

So if you use custom RT signals for each thread and block them in the
threads where you don't want them, it should work. This would limit the
maximum number of threads, though, because there's only a limited
range of RT signals.

There are probably other ways to do this too, e.g. with some clever
use of timerfd_create in recent kernels.

Or you could override the clone() call in the thread library so that it
does not set signal-sharing semantics. This might have other bad side
effects, though.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
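Andi's suggestion could be sketched as follows. This is a variant under stated assumptions: instead of installing handlers and unblocking each RT signal only in its own thread, all of the RT signals are blocked everywhere and each worker collects its own one with sigwait(); the helper names and NWORKERS are hypothetical. The timers are ordinary process-directed ones, and the number of workers is bounded by SIGRTMAX - SIGRTMIN, which is the limitation Andi mentions.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

#define NWORKERS 2   /* worker i owns signal SIGRTMIN + i */

static int got[NWORKERS];

/* Each worker arms a process-directed timer raising its own RT signal and
   collects it with sigwait(). Because every thread blocks these signals,
   a pending one is consumed only by the sigwait() that names it. */
static void *worker(void *arg)
{
    int idx = (int)(long) arg;
    int signo = SIGRTMIN + idx;
    struct sigevent sev;
    struct itimerspec its;
    timer_t tid;
    sigset_t mine;
    int sig;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = signo;
    if (timer_create(CLOCK_MONOTONIC, &sev, &tid) != 0)
        return NULL;
    memset(&its, 0, sizeof its);
    its.it_value.tv_nsec = (idx + 1) * 5 * 1000000L;   /* 5 ms, 10 ms */
    timer_settime(tid, 0, &its, NULL);

    sigemptyset(&mine);
    sigaddset(&mine, signo);
    if (sigwait(&mine, &sig) == 0 && sig == signo)
        got[idx] = 1;
    timer_delete(tid);
    return NULL;
}

/* Returns 1 if every worker received exactly its own signal. */
int run_rt_signal_workers(void)
{
    pthread_t t[NWORKERS];
    sigset_t all;
    int i, ok = 1;

    sigemptyset(&all);
    for (i = 0; i < NWORKERS; i++)
        sigaddset(&all, SIGRTMIN + i);
    pthread_sigmask(SIG_BLOCK, &all, NULL);   /* inherited by the workers */

    for (i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)(long) i);
    for (i = 0; i < NWORKERS; i++) {
        pthread_join(t[i], NULL);
        ok &= got[i];
    }
    return ok;
}
```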

Davide Libenzi

Apr 11, 2010, 6:10:02 PM

On Sun, 11 Apr 2010, Andi Kleen wrote:

> "Frantisek Rysanek" <Frantise...@post.cz> writes:
>
> > Yes, it used to be quite a relief to have Linux do the management of
> > timers for me. Now I have two options to choose from:
> > 1) write my own "timer queueing" (timekeeping) code to order the
> > timers for me in the master thread
> > 2) find another function, similar to setitimer(), that would function
> > the way setitimer() used to work in the old days...
>
> POSIX timers (timer_create et al.) allow specifying the signal.
>
> So if you use custom RT signals for each thread and block them in the
> threads where you don't want them, it should work. This would limit the
> maximum number of threads, though, because there's only a limited
> range of RT signals.
>
> There are probably other ways to do this too, e.g. with some clever
> use of timerfd_create in recent kernels.

Definitely, timerfd allows you to handle the timer event wherever you
like, independently of signals. Much, much simpler routing.
But if you need to be compatible with multiple Unixes, or even older Linux
kernels, you are out of luck with timerfd.

- Davide

Thomas Gleixner

Apr 11, 2010, 6:20:01 PM

On Sun, 11 Apr 2010, Andi Kleen wrote:

> "Frantisek Rysanek" <Frantise...@post.cz> writes:
>
> > Yes, it used to be quite a relief to have Linux do the management of
> > timers for me. Now I have two options to choose from:
> > 1) write my own "timer queueing" (timekeeping) code to order the
> > timers for me in the master thread
> > 2) find another function, similar to setitimer(), that would function
> > the way setitimer() used to work in the old days...
>
> POSIX timers (timer_create et al.) allow specifying the signal.
>
> So if you use custom RT signals for each thread and block them in the
> threads where you don't want them, it should work. This would limit the
> maximum number of threads, though, because there's only a limited
> range of RT signals.
>
> There are probably other ways to do this too, e.g. with some clever
> use of timerfd_create in recent kernels.
>
> Or you could override the clone() call in the thread library so that it
> does not set signal-sharing semantics. This might have other bad side
> effects, though.

Nonsense. Just use the right flags when creating the posix
timer. posix timers support per thread delivery of a signal, i.e. you
can use the same signal for all threads.

sigev.sigev_notify = SIGEV_THREAD_ID | SIGEV_SIGNAL;
sigev.sigev_signo = YOUR_SIGNAL;
sigev.sigev_notify_thread_id = gettid();
timer_create(CLOCK_MONOTONIC, &sigev, &timer);

That signal for that timer will not be delivered to any other thread
than the one specified in sigev.sigev_notify_thread_id as long as that
thread has not exited w/o canceling the timer.

Thanks,

tglx

Frantisek Rysanek

Oct 17, 2010, 4:50:02 PM

Dear Everyone,

apologies for following up on a thread after half a year :-)
I'm not gonna pretend it took me half a year to discover the points
presented below - I just got buried by a dumptruck of other stuff,
then did my homework, and then couldn't find the time to post my
follow-up...
Before this LKML thread, I couldn't find this sort of information
anywhere (anywhere except for the source code itself). Maybe I didn't
look into enough places where Google cannot see... anyway, I guess
it's worth leaving a trace about the things I've learned, at a
relevant place for the cyber crawlers to find it - for the benefit of
future wondering apprentices who come after me.
So here it goes...

On 12 Apr 2010 at 0:09, Thomas Gleixner wrote:
>
> Just use the right flags when creating the posix
> timer. posix timers support per thread delivery of a signal, i.e. you
> can use the same signal for all threads.
>
> sigev.sigev_notify = SIGEV_THREAD_ID | SIGEV_SIGNAL;
> sigev.sigev_signo = YOUR_SIGNAL;
> sigev.sigev_notify_thread_id = gettid();
> timer_create(CLOCK_MONOTONIC, &sigev, &timer);
>
> That signal for that timer will not be delivered to any other thread
> than the one specified in sigev.sigev_notify_thread_id as long as that
> thread has not exited w/o canceling the timer.
>

Thanks for that gem of ultra-compact yet precise information :-)
It does work precisely as advertised after all - except that for me,
it was not without further homework.

I have to confess that when writing code in user space, I'm a bit
ignorant of details - such as, whether it's bare kernel syscalls or
some higher-level glibc abstraction that I'm talking to.
This snippet gave me a neat lesson in that particular "grey" area :-)
Well I shouldn't be surprised, if I ask kernel people, that I obtain
a response in kernel terms :-)

I first pasted your code snippet into my program verbatim.
Followed by some timer_settime() of course...
It took a little bit of massage to get it to compile - such as, glibc
didn't offer me a member called sigev_notify_thread_id, but I figured
(by analogy with other macros in the relevant header) that it was
pointing to a member called _tid in a union inside struct sigevent,
as declared in /usr/include/bits/siginfo.h. I merely added
#define sigev_notify_thread_id _sigev_un._tid
just below my #defines on top of the relevant C file.
Next, I couldn't find gettid() anywhere within the libraries (nothing
to link to in user space) - so I decided to instead use
* the pthread_t provided by pthread_create(). *
After all, in LinuxThreads in the old days, pthread_t and pid_t were
the same.

Guess what happened :-)
At a first run, I got an immediate SIGSEGV.

What ho? Let's ask GDB for some advice...
Hmm... timer_settime() segfaulting? Why? Old libc?
Tried compiling on a much newer distro, with the same result.
Google suggested that I was passing a 0 for the timer_t...
How could that happen? Well, maybe I should check the return value
from timer_create(), and look at errno with perror(), right?
Uh oh, that was it: timer_create() failed with EINVAL.
Why is that?
(...shuffling the various parameters, trying CLOCK_MONOTONIC instead
of CLOCK_REALTIME, googling some more...)
Found an old e-mail thread from back in 2005, suggesting in vague
terms that timer_create(SIGEV_THREAD_ID) really still worked with
PIDs, rather than TIDs, and that the per-thread logic was somehow
completely bogus and void... so, reluctantly, I tried
_tid = getpid() instead of "pthread_t my_thr_ID". That worked to the
extent that timer_create() didn't yell and timer_settime() did set up
a timer - except that of course the SIGALRM again got delivered to
the process master thread. Ah well... now, why on earth is there
something called a _tid, embedded in the struct sigevent?
Time to take a dive into more source code, right?
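The EINVAL is easy to reproduce. The sketch below (try_thread_timer() is a hypothetical helper; the #ifndef fallbacks mirror the _sigev_un._tid trick described above and only matter with headers that lack those names) creates a SIGEV_THREAD_ID timer twice: once with a pthread_t squeezed into a pid_t, which the kernel rejects, presumably because it is not the ID of a task in the calling thread group, and once with the real kernel TID obtained via syscall(), which succeeds. timer_create() itself only returns -1; the reason is in errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#ifndef sigev_notify_thread_id          /* older glibc headers lack it */
#define sigev_notify_thread_id _sigev_un._tid
#endif
#ifndef SIGEV_THREAD_ID
#define SIGEV_THREAD_ID 4               /* Linux ABI value */
#endif

/* Tries to create a SIGEV_THREAD_ID timer aimed at 'target'. Returns 0 on
   success, otherwise the errno value timer_create() left behind. The
   timer is never armed and is deleted right away. */
int try_thread_timer(pid_t target)
{
    struct sigevent sev;
    timer_t t;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_THREAD_ID | SIGEV_SIGNAL;
    sev.sigev_signo = SIGALRM;
    sev.sigev_notify_thread_id = target;
    if (timer_create(CLOCK_MONOTONIC, &sev, &t) != 0)
        return errno;
    timer_delete(t);
    return 0;
}
```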

I happened to have the source code of glibc 2.6 lying around, so I
looked at that. And at Linux 2.6.35.7.
The code did test my mediocre coding & code reading skills, but
finally it started to dawn on me. I tried googling more about the
precise mapping between NPTL and the Linux kernel threading
arrangement, and found nothing other than the usual PR factoids (N:1
vs. M:N vs. 1:1) - which meant I really had to find out the hard way,
by reading the code :-)

It turns out that:

NPTL (a part of Libc in the user space) uses something called "struct
pthread" internally. It is declared in some private header inside the
glibc source code (namely nptl/descr.h), but not in the public
headers that end up in the systemwide /usr/include. The "pthread_t"
that gets passed around among the various pthread_create() et al.
library functions, although it looks like an opaque integer (an
unsigned long int in glibc's public headers) from the outside, really
holds the value of a
struct pthread *
(pointer to the NPTL-private pthread struct). Outside of the glibc
source tree, you don't know that such a struct exists, and you have
no chance to access its internal members, such as the one called
pid_t tid.

Within the kernel, it seems that the processes or threads behind
NPTL's threading model are simply called "tasks". Each task is
described by an instance of a uniform "struct task_struct", declared
in $KERNEL_SRC/include/linux/sched.h. Each task has its own pid (and
this one is a genuine integer). Interesting point: struct task_struct
contains a member called
struct task_struct* group_leader;

And that's it. In the kernel space, there's a group of mostly equal
tasks who have a leader. This group and their leader correspond to a
user-space NPTL process containing several lightweight threads. The
kernel-space PID of the task group leader is equal to the user-space
PID, used to refer to the whole multi-threaded process.

Okay... so how do we get our hands on the back-end "tid" (really a
PID in kernel vocabulary) of a single user-space thread? We already
know that we need a function called gettid(). It turns out that this
is a syscall, implemented in the kernel, even known to glibc, but not
exported by glibc to the user space. In the kernel space,
interestingly this syscall is implemented in a file called
kernel/timer.c (I'd expect it in kernel/pid.c or maybe
kernel/sched.c) - well maybe the choice of translation unit hints at
the practical use of this syscall :-) If you follow gettid(), through
an inline function called task_pid_vnr(), all the way to
__task_pid_nr_ns(PIDTYPE_PID), you'll find out that indeed this stack
of calls will retrieve task->pid (and the function __task_pid_nr_ns
also mentions task->group_leader in a different context).

So essentially in the user space (using glibc) you have a choice
whether to
1) copy and paste the declaration of "struct pthread" from your glibc
version's source code into your program, or "publish" the relevant
header, or some such
2) call the gettid() syscall (in)directly.

I chose the latter option. In my program, I added
#include <unistd.h>
#include <sys/syscall.h>
#define gettid() ((pid_t) syscall(__NR_gettid))
...all of the gears can be found in the public headers.
This way of invoking a syscall, through the generic syscall() function
and the integer syscall number, is called an "indirect" invocation of a
syscall. It is easiest to use for syscalls with simple argument sets,
which luckily is the case for gettid().
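The task-group-leader observation above can be checked with the same syscall() trick. A minimal sketch (the helper names are hypothetical; my_gettid() is used instead of a gettid() macro because glibc 2.30 and later declare their own gettid() in <unistd.h>): called from the main thread, it reports a TID equal to getpid(), while a child thread gets a TID of its own.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Kernel TID of the calling thread, via the indirect syscall. */
static pid_t my_gettid(void)
{
    return (pid_t) syscall(SYS_gettid);
}

static void *child(void *arg)
{
    *(pid_t *) arg = my_gettid();
    return NULL;
}

/* Stores the caller's TID and one child thread's TID. When called from
   the main thread, *main_tid should equal getpid(). Returns 0 on success. */
int sample_tids(pid_t *main_tid, pid_t *child_tid)
{
    pthread_t t;
    *main_tid = my_gettid();
    if (pthread_create(&t, NULL, child, child_tid) != 0)
        return -1;
    return pthread_join(t, NULL);
}
```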

So yes, I can have my cake and eat it too.
I can deliver timer-based SIGALRM directly to a particular user-space
thread, without "rethrowing" via the process master or another
dedicated "signal dispatch" thread.
The only catch: to get my hands on the "tid" (really the PID of the
kernel-space task corresponding to my user-space thread), I have to call a Linux
syscall fairly explicitly. It feels like less of a sin than accessing
some private (however obvious) struct under the hood of glibc/NPTL.
Calling gettid() directly doesn't seem "posixly correct", but it
would appear that neither is SIGEV_THREAD_ID (what use would that be,
without a possibility to get your hands on the internal TID?)
The important point for me is that it gets the job done, over a wide
range of glibc and kernel versions.
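Putting the pieces together, a complete program following Thomas's recipe could look like this sketch. Assumptions: Linux with glibc (older glibc also needs -lrt); worker(), run_thread_timers(), NTHREADS and the 2 ms period are made up for illustration. Every thread arms its own periodic timer with the same SIGALRM and tags it through sigev_value; each thread then collects the signal with sigwaitinfo() and counts only those that carry its own tag.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid
#endif
#ifndef SIGEV_THREAD_ID
#define SIGEV_THREAD_ID 4
#endif

#define NTHREADS 3

static int hits[NTHREADS];   /* hits[i]: expiries thread i saw from its own timer */

static void *worker(void *arg)
{
    int idx = (int)(long) arg;
    struct sigevent sev;
    struct itimerspec its;
    siginfo_t si;
    sigset_t set;
    timer_t t;
    int n;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_THREAD_ID | SIGEV_SIGNAL;
    sev.sigev_signo = SIGALRM;                 /* same signal in every thread */
    sev.sigev_value.sival_int = idx;           /* tags the timer's owner */
    sev.sigev_notify_thread_id = (pid_t) syscall(SYS_gettid);
    if (timer_create(CLOCK_MONOTONIC, &sev, &t) != 0)
        return NULL;

    memset(&its, 0, sizeof its);
    its.it_value.tv_nsec = 2 * 1000000L;       /* first expiry after 2 ms */
    its.it_interval.tv_nsec = 2 * 1000000L;    /* then every 2 ms */
    timer_settime(t, 0, &its, NULL);

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    for (n = 0; n < 5; n++) {
        if (sigwaitinfo(&set, &si) == SIGALRM && si.si_value.sival_int == idx)
            hits[idx]++;
    }
    timer_delete(t);
    return NULL;
}

/* Returns 1 if every thread received only signals from its own timer. */
int run_thread_timers(void)
{
    pthread_t th[NTHREADS];
    sigset_t set;
    int i, ok = 1;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    pthread_sigmask(SIG_BLOCK, &set, NULL);    /* workers inherit the mask */

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&th[i], NULL, worker, (void *)(long) i);
    for (i = 0; i < NTHREADS; i++) {
        pthread_join(th[i], NULL);
        ok &= (hits[i] == 5);
    }
    return ok;
}
```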

It's been an exciting adventure. The kernel guts around pid.c and
sched.c are a fantastic read - the code is almost amazingly clean and
straightforward, split into neat small functions. An interesting
discovery, after all the past claims that programming language purity
and beauty don't mix well with system-level programming :-)

Thanks for your time and attention...

Frank Rysanek