EAGAIN with pthread_create() on Linux (kernel 2.6.x) after only 78 threads

Roberto Nibali

Nov 30, 2004, 4:59:03 PM

Hello,

I've got yet another "silly" programme that just doesn't really run as I
want it to and I seem to be a bit out of knowledge on how to debug it.
The problem is that under Linux (2.6.x kernel, 1.5GHz CPU, 1GB RAM) I
can only create about 80 threads before I get an EAGAIN.

Reading the man page for pthread_create() I learn:

EAGAIN not enough system resources to create a process for
the new thread.

EAGAIN more than PTHREAD_THREADS_MAX threads are already
active.

OK, 78 times a couple of bytes is not really what I'd call resource
exhaustion, and my PTHREAD_THREADS_MAX is set to 16384:

ratz@webphish:~> grep -r PTHREAD_THREADS_MAX /usr/include/* 2>/dev/null
/usr/include/bits/local_lim.h:#define PTHREAD_THREADS_MAX 16384
/usr/include/nptl/bits/local_lim.h:#undef PTHREAD_THREADS_MAX
ratz@webphish:~>

I've written an ugly extension via ifdef to make it run, however I
cannot believe that my system is not capable of creating more than 80
threads. When run under Windows XP there is virtually "no limit" on the
number of threads I can create using this programme.

I've tested it as follows:

ratz@webphish:~> make distclean
rm -f *.o *~ *# core
rm -f matrix_serial matrix_threads
ratz@webphish:~> make matrix_threads
gcc -W -Wall -O -o matrix_threads matrix_threads.c -lpthread
ratz@webphish:~> ./matrix_threads >/dev/null
thread creation: Cannot allocate memory
ratz@webphish:~> make distclean
rm -f *.o *~ *# core
rm -f matrix_threads
ratz@webphish:~> CFLAGS="-DUGLY_WORKAROUND" make matrix_threads
gcc -DUGLY_WORKAROUND -W -Wall -O -o matrix_threads matrix_threads.c -lpthread
ratz@webphish:~> ./matrix_threads >/dev/null
ratz@webphish:~>
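
A side note on the "thread creation: ..." message in the transcript above: pthread_create() reports failure through its return value and is not specified to set errno, so perror() may print a message that does not match the returned code. A minimal sketch of reporting the returned code instead; spawn_checked and noop_worker are hypothetical names used only for illustration, not part of the original programme:

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Trivial thread body used only for illustration. */
void *noop_worker(void *arg)
{
    return arg;
}

/* Create a thread and, on failure, report the error number that
 * pthread_create() itself returned rather than relying on errno. */
int spawn_checked(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    int rc = pthread_create(tid, NULL, fn, arg);

    if (rc != 0)
        fprintf(stderr, "pthread_create: %s (rc=%d)\n", strerror(rc), rc);
    return rc;
}
```

A caller would test the return value of spawn_checked() against 0 and later join or detach the thread as usual.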

Here's the code

--------------------------------------------------
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <pthread.h>

#define MIN_REQ_SSIZE 81920
#define ARRAY_SIZE 100
#define FEATURE_SIZE 50

typedef int matrix_t[ARRAY_SIZE][ARRAY_SIZE];

typedef struct {
    int id;
    int size;
    int Arow;
    int Bcol;
    matrix_t *MA, *MB, *MC;
} package_t;

matrix_t MA, MB, MC;

void mult(int size, int row, int column, matrix_t MA, matrix_t MB,
          matrix_t MC) {
    int position;

    MC[row][column] = 0;
    for (position = 0; position < size; position++) {
        MC[row][column] = MC[row][column] +
            (MA[row][position] * MB[position][column]);
    }
}

void *mult_worker(void *arg) {
    package_t *p = (package_t *)arg;

    printf("MATRIX THREAD %d: processing A row %d, B col %d\n",
           p->id, p->Arow, p->Bcol);

    mult(p->size, p->Arow, p->Bcol, *(p->MA), *(p->MB), *(p->MC));

    printf("MATRIX THREAD %d: complete\n", p->id);
    free(p);
    pthread_exit(EXIT_SUCCESS);
    //return(NULL);
}

int main(void) {
    int size, row, column, num_threads, i;
    int res;
    pthread_t *threads;
    package_t *p;

    size = ARRAY_SIZE;

    /* one thread will be created for each element of the matrix. */
    threads = (pthread_t *)malloc(size*size*sizeof(pthread_t));

    /* Fill in matrix values, currently values are hardwired */
    for (row = 0; row < size; row++) {
        for (column = 0; column < size; column++) {
            MA[row][column] = 1;
        }
    }
    for (row = 0; row < size; row++) {
        for (column = 0; column < size; column++) {
            MB[row][column] = row + column + 1;
        }
    }

    /* Process Matrix, by row, column, Create a thread to process
       each element in the resulting matrix*/
    num_threads = 0;
    for (row = 0; row < size; row++) {
        for (column = 0; column < size; column++) {
            p = (package_t *)malloc(sizeof(package_t));
            p->id = num_threads;
            p->size = size;
            p->Arow = row;
            p->Bcol = column;
            (p->MA) = &MA;
            (p->MB) = &MB;
            (p->MC) = &MC;

            res = pthread_create(&threads[num_threads],
                                 NULL,
                                 mult_worker,
                                 (void *) p);
            if (res != 0) {
                perror("thread creation");
                exit(1);
            }
#ifdef UGLY_WORKAROUND
            if ((num_threads - FEATURE_SIZE) >= 0) {
                pthread_join(threads[num_threads - FEATURE_SIZE], NULL);
                threads[num_threads - FEATURE_SIZE] = -1;
            }
#endif /* UGLY_WORKAROUND */
            printf("MATRIX MAIN THREAD: thread %d created\n", num_threads);
            num_threads++;
        }
    }
    /* Synchronize on the completion of the element in each thread. */
    for (i = 0; i < (size*size); i++) {
#ifdef UGLY_WORKAROUND
        if ((int) threads[i] != -1) {
            pthread_join(threads[i], NULL);
        }
#else
        pthread_join(threads[i], NULL);
#endif /* UGLY_WORKAROUND */
        printf("MATRIX MAIN THREAD: child %d has joined\n", i);
    }
    printf("MATRIX MAIN THREAD: The resulting matrix C is;\n");
    for (row = 0; row < size; row ++) {
        for (column = 0; column < size; column++) {
            printf("%5d ",MC[row][column]);
        }
        printf("\n");
    }
    pthread_exit(EXIT_SUCCESS);
}
------------------------------------------------

Thanks in advance for any pointers on how to debug or address my issue,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

Loic Domaigne

Dec 2, 2004, 3:37:23 PM

Hi Roberto!

> I've got yet another "silly" programme that just doesn't really run as I
> want it to and I seem to be a bit out of knowledge on how to debug it.
> The problem is that under Linux (2.6.x kernel, 1.5GHz CPU, 1GB RAM) I
> can only create about 80 threads before I get an EAGAIN.

Just detach your threads.

> Here's the code

[snip]

> int main(void) {
> int size, row, column, num_threads, i;
> int res;
> pthread_t *threads;
> package_t *p;
>
> size = ARRAY_SIZE;

[snip]

> /* Process Matrix, by row, column, Create a thread to process
> each element in the resulting matrix*/
> num_threads = 0;
> for(row = 0; row < size; row++) {
> for (column = 0; column < size; column++) {
> p = (package_t *)malloc(sizeof(package_t));
> p->id = num_threads;
> p->size = size;
> p->Arow = row;
> p->Bcol = column;
> (p->MA) = &MA;
> (p->MB) = &MB;
> (p->MC) = &MC;
>
> res = pthread_create(&threads[num_threads],
> NULL,
> mult_worker,
> (void *) p);

pthread_detach ( threads[num_threads] );

(You might want to remove the useless /threads/ array and use a single
variable thrid instead.)

If you detach the threads, you do not need your 'ugly' workaround.
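
A thread can also be created in the detached state from the start, which avoids the separate pthread_detach() call. The following is only a minimal sketch; spawn_detached and detached_worker are hypothetical names, not taken from the programme above:

```c
#include <pthread.h>

/* Illustrative thread body; a real worker would compute one matrix cell. */
void *detached_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Create a thread that is detached from the start. Its resources
 * (including its stack) are released when it exits, and it can no
 * longer be joined. Returns 0 or the pthread error number. */
int spawn_detached(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Detaching only frees resources once a thread has finished; a detached thread that is still running keeps its stack.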

Cheers,
Loic.

el_ba...@nospam.com

Dec 7, 2004, 1:41:33 PM

Hello Loic,

>> I've got yet another "silly" programme that just doesn't really run as
>> I want it to and I seem to be a bit out of knowledge on how to debug
>> it. The problem is that under Linux (2.6.x kernel, 1.5GHz CPU, 1GB
>> RAM) I can only create about 80 threads before I get an EAGAIN.
>
> Just detach your threads.

It does not work the way I want it to, see below:

Also, I cannot believe that spawning 80 threads would already cause this
behaviour. How much memory does a single thread cost, if the question can
be put that simply?

>> res = pthread_create(&threads[num_threads],
>> NULL,
>> mult_worker,
>> (void *) p);
>
> pthread_detach ( threads[num_threads] );

But then I need to spawn another thread and use a condition variable to
synchronise the peer threads. Only when the whole matrix has been
calculated can the main thread print it out. To me it looks like I have 3
choices:

1. Spawn off the ARRAY_SIZE^2 threads and wait for completion using
   pthread_join(). I need threads[num_threads] for that, I believe.
2. Spawn off the ARRAY_SIZE^2 threads and have the MC struct locked with
   a hierarchy of mutexes. Each thread takes a lock and releases it upon
   pthread_exit(). The main thread tries to acquire a lock as well and
   will have to wait until all other threads have completed.
3. Spawn off the ARRAY_SIZE^2 threads and have a global int array of
   size ARRAY_SIZE^2 serve as the predicate of a condition variable. A
   separate peer thread is created that will pthread_cond_wait() and
   check each time it is woken up whether the content at index 1 is 0.
   Each peer will put a 1 into the array at tid mod ARRAY_SIZE^2 and of
   course read_lock it.
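
A simpler variant of the third option would be a single counter protected by one mutex and one condition variable: each worker increments the counter when it is done, and the main thread waits until the counter reaches the number of workers it started. This is only a sketch under that assumption; all names in it are hypothetical:

```c
#include <pthread.h>

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int done_count = 0;

/* Worker body: do the work (omitted here), then announce completion. */
void *counting_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&done_lock);
    done_count++;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Start n detached workers and block until all started workers have
 * reported in. Returns the number of completed workers, or -1 if a
 * pthread_create() call failed part-way through. */
int run_detached_workers(int n)
{
    pthread_t tid;
    int i, started = 0, finished;

    pthread_mutex_lock(&done_lock);
    done_count = 0;
    pthread_mutex_unlock(&done_lock);

    for (i = 0; i < n; i++) {
        if (pthread_create(&tid, NULL, counting_worker, NULL) != 0)
            break;
        pthread_detach(tid);
        started++;
    }

    /* Wait in a loop: condition variables can wake up spuriously. */
    pthread_mutex_lock(&done_lock);
    while (done_count < started)
        pthread_cond_wait(&done_cond, &done_lock);
    finished = done_count;
    pthread_mutex_unlock(&done_lock);

    return started == n ? finished : -1;
}
```

With this shape no array of thread IDs is needed, and the main thread can safely print the result matrix once run_detached_workers() returns.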

> (You might want to remove the useless /threads/ array and use a single
> variable thrid instead.)

This I don't understand. I needed it to join all the threads for
synchronisation before I can print out the final matrix. It's not about
the matrix calculation (for that I could just save it and wait for the
program to complete) but it's a general issue. Also my first test
program had the worker threads complete too early. This is fixed in my
second version.

> If you detach the threads, you do not need your 'ugly' workaround.

For me the problem remains and with detached threads I would also have
to add another synchronisation primitive. Colour me stupid but it still
fails to create more than 250 threads using this stupid code below:

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <time.h>       /* nanosleep(), struct timespec */
#include <pthread.h>

#define ARRAY_SIZE 50

typedef int matrix_t[ARRAY_SIZE][ARRAY_SIZE];

typedef struct {
    int id;
    int size;
    int Arow;
    int Bcol;
    matrix_t *MA, *MB, *MC;
} package_t;

matrix_t MA, MB, MC;

void mult(int size, int row, int column, matrix_t MA, matrix_t MB,
          matrix_t MC)
{
    int position;

    MC[row][column] = 0;
    for (position = 0; position < size; position++) {
        MC[row][column] = MC[row][column] +
            (MA[row][position] * MB[position][column]);
    }
}

void *mult_worker(void *arg)
{
    package_t *p = (package_t *) arg;
    struct timespec my_timespec;
    struct timespec rem;
    int retval;

    my_timespec.tv_sec = 2;
    my_timespec.tv_nsec = 500*1000*1000;
    retval = nanosleep(&my_timespec,&rem);

    printf("MATRIX THREAD %d: processing A row %d, B col %d\n",
           p->id, p->Arow, p->Bcol);

    mult(p->size, p->Arow, p->Bcol, *(p->MA), *(p->MB), *(p->MC));

    printf("This is thread [%d] going to sleep\n", (int) pthread_self());
    retval = nanosleep(&my_timespec,&rem);
    if (retval != 0) {
        perror("nanosleep()");
    }
    printf("MATRIX THREAD %d: complete\n", p->id);

    /* Free the package only after its last use (p->id above). */
    free(p);

    pthread_exit(EXIT_SUCCESS);
}

int main(void)
{
    int row, column, num_threads, i;
    int res;
    pthread_t *threads;
    package_t *p;

    /* one thread will be created for each element of the matrix. */
    threads = (pthread_t *) malloc(ARRAY_SIZE * ARRAY_SIZE *
                                   sizeof(pthread_t));

    /* Fill in matrix values, currently values are hardwired */
    for (row = 0; row < ARRAY_SIZE; row++) {
        for (column = 0; column < ARRAY_SIZE; column++) {
            MA[row][column] = 1;
        }
    }

    for (row = 0; row < ARRAY_SIZE; row++) {
        for (column = 0; column < ARRAY_SIZE; column++) {
            MB[row][column] = row + column + 1;
        }
    }

    /* Process Matrix, by row, column, Create a thread to process
       each element in the resulting matrix */
    num_threads = 0;
    for (row = 0; row < ARRAY_SIZE; row++) {
        for (column = 0; column < ARRAY_SIZE; column++) {
            p = (package_t *) malloc(sizeof(package_t));
            p->id = num_threads;
            p->size = ARRAY_SIZE;
            p->Arow = row;
            p->Bcol = column;
            (p->MA) = &MA;
            (p->MB) = &MB;
            (p->MC) = &MC;
            res = pthread_create(&threads[num_threads],
                                 NULL, mult_worker, (void *)p);
            if (res != 0) {
                perror("thread creation");
                exit(1);
            }

            printf("MATRIX MAIN THREAD: thread %d created\n",
                   num_threads);
            pthread_detach(threads[num_threads]);
            printf("MATRIX MAIN THREAD: thread %d detached\n",
                   num_threads);
            num_threads++;
        }
    }
    /* Synchronize on the completion of the element in each thread.
       NOTE: the threads were detached above, so joining them here is
       undefined behaviour and provides no synchronisation. */
    for (i = 0; i < (ARRAY_SIZE * ARRAY_SIZE); i++) {
        pthread_join(threads[i], NULL);
        printf("MATRIX MAIN THREAD: child %d has joined\n", i);
    }
    printf("MATRIX MAIN THREAD: The resulting matrix C is:\n");
    for (row = 0; row < ARRAY_SIZE; row++) {
        for (column = 0; column < ARRAY_SIZE; column++) {
            printf("%5d ", MC[row][column]);
        }
        printf("\n");
    }
    pthread_exit(EXIT_SUCCESS);
}

I'm fully aware of the fact that with this solution there is no
synchronisation and thus it fails badly to print the correct matrix in
the end.

Best regards,

el_ba...@nospam.com

Dec 7, 2004, 2:04:47 PM

Replying to myself:

> It does not work the way I want it to, see below:
>
> Also, I cannot believe that spawning 80 threads would already cause this
> behaviour. How much memory does a single thread cost, if the question can
> be put that simply?

OK, according to the following test programme, each thread reserves quite
a lot of stack memory (8MB, it seems):

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>

#define MAX_THREADS 10000
int i;

/* Thread start routines must have the signature void *(*)(void *). */
void *run(void *arg) {
    char c;

    (void)arg;
    /* i is read without synchronisation; good enough for this rough test. */
    if (i < 10) {
        printf("Address of c = %lu KB\n",
               (unsigned long)((uintptr_t)&c / 1024));
    }
    sleep(60 * 60);
    return NULL;
}

int main(void) {
    int rc = 0;
    pthread_t thread[MAX_THREADS];

    printf("Creating threads ...\n");
    for (i = 0; i < MAX_THREADS && rc == 0; i++) {
        rc = pthread_create(&(thread[i]), NULL, run, NULL);
        if (rc == 0) {
            pthread_detach(thread[i]);
            if ((i + 1) % 100 == 0) {
                printf("%i threads so far ...\n", i + 1);
            }
        } else {
            printf("Failed with return code %i creating thread %i.\n",
                   rc, i + 1);
            exit(EXIT_FAILURE);
        }
    }
    exit(EXIT_SUCCESS);
}

#> ./max_threads
Creating threads ...
Address of c = 1058146 KB
Address of c = 1066342 KB
Address of c = 1074538 KB
Address of c = 1082734 KB
Address of c = 1090930 KB
Address of c = 1099126 KB
Address of c = 1107322 KB
Address of c = 1115518 KB
Address of c = 1123714 KB
Address of c = 1131910 KB
100 threads so far ...
Failed with return code 12 creating thread 127.

Holy cow, I'll look into way of reducing the default stack size in the
2.6.x kernel. It all makes sense now ;).
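
For reference, consecutive addresses in the output above differ by 8196 KB, that is 8192 KB (8 MB) plus 4 KB; the extra 4 KB would be consistent with a guard page, though that is an assumption. The 126 threads that were created therefore reserved about 126 x 8196 KB = 1032696 KB, roughly 1 GB of address space. The default stack size can also be queried directly instead of being inferred from addresses. A minimal sketch follows; default_thread_stack_size is a hypothetical helper name, and what a default-initialised attribute object reports is implementation-dependent:

```c
#include <pthread.h>
#include <stddef.h>

/* Return the stack size recorded in a default-initialised attribute
 * object, or 0 if it cannot be queried. */
size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```

Dividing the address space available to the process by this value gives a rough upper bound on how many threads with default attributes can exist at once.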

Cheers,

Patrick TJ McPhee

Dec 9, 2004, 12:46:05 AM

In article <41B5FECF...@nospam.com>, <el_ba...@nospam.com> wrote:
% Holy cow, I'll look into way of reducing the default stack size in the
% 2.6.x kernel. It all makes sense now ;).

You can do it with a thread attribute.
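
A minimal sketch of what that could look like, assuming a joinable thread and a caller-chosen size; run_with_stack_size and small_stack_worker are hypothetical names used only for illustration:

```c
#include <pthread.h>
#include <stddef.h>

/* Illustrative thread body. */
void *small_stack_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Run one thread with the requested stack size and wait for it.
 * Returns 0 on success or the pthread error number. */
int run_with_stack_size(size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    /* Rejected with an error if stack_bytes is below PTHREAD_STACK_MIN. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, small_stack_worker, NULL);
    pthread_attr_destroy(&attr);
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}
```

For example, run_with_stack_size(256 * 1024) asks for a 256 KB stack; the size has to be large enough for whatever the thread function and the functions it calls put on the stack.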
--

Patrick TJ McPhee
North York Canada
pt...@interlog.com

el_ba...@nospam.com

Dec 9, 2004, 4:08:22 AM

> You can do it with a thread attribute.

Yes, with the following programme and a few ulimit and proc-fs tweaks I
was able to create ~32k concurrent threads:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#ifdef USE_NPTL
#include <nptl/bits/local_lim.h>
#include <nptl/pthread.h>
#else
#include <pthread.h>
#include <bits/local_lim.h>
#include <errno.h>
#endif

#define MAX_THREADS 100000
int i;

/* Thread start routines must have the signature void *(*)(void *). */
void *run(void *arg) {
    char c;

    (void)arg;
    if (i < 10) {
        printf("Address of c = %lu KB\n",
               (unsigned long)((uintptr_t)&c / 1024));
    }
    sleep(60 * 60);
    return NULL;
}

int main(int argc, char *argv[]) {
    int rc = 0;
    pthread_t thread[MAX_THREADS];
    pthread_attr_t tattr;
    size_t size;

    if (argc < 2) {
        printf("%s <stack size>\n", argv[0]);
        exit(EXIT_FAILURE);
    } else {
        size = strtoul(argv[1], NULL, 10);
        size = size + PTHREAD_STACK_MIN;
    }
    printf("Creating threads ...\n");

    pthread_attr_init(&tattr);

    for (i = 0; i < MAX_THREADS && rc == 0; i++) {
        pthread_attr_setstacksize(&tattr, size);
        rc = pthread_create(&(thread[i]), &tattr, run, NULL);
        if (rc != 0) {
            /* pthread_create() returns the error number; it need not
               set errno, so report rc rather than using perror(). */
            printf("Failed with retcode %i creating thread %i.\n", rc, i + 1);
            fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            exit(EXIT_FAILURE);
        }
        pthread_detach(thread[i]);
        if ((i + 1) % 100 == 0) {
            printf("%i threads so far ...\n", i + 1);
            rc = pthread_attr_getstacksize(&tattr, &size);
            if (rc == 0) {
                printf("Stack size: %zu\n", size);
            }
        }
    }
    exit(EXIT_SUCCESS);
}

Is there a guideline or best practice on when to use pthread_exit() and
when to use exit()?

Best regards,

Patrick TJ McPhee

Dec 10, 2004, 12:49:51 AM

In article <41b812e8$1...@news.cybercity.ch>, <el_ba...@nospam.com> wrote:

% for (i = 0; i < MAX_THREADS && rc == 0; i++) {
% pthread_attr_setstacksize(&tattr, size);
% rc = pthread_create(&(thread[i]), &tattr, (void *)run, NULL);

Apropos of nothing at all, you don't really need to call setstacksize
inside this loop, since you never change size.

% rc = pthread_attr_getstacksize(&tattr, &size);

And you don't really need to call getstacksize to find the value of
size -- it's just going to return whatever value you passed to it
before. There's no reliable way to find out the actual stack size
set aside for the thread.

% Is there a guideline or best practice on when to use pthread_exit() and
% when to use exit()?

Use pthread_exit() when you want a thread to stop running. Use exit()
when you want a process to stop running.

el_ba...@nospam.com

Dec 10, 2004, 8:06:23 AM

> % for (i = 0; i < MAX_THREADS && rc == 0; i++) {
> % pthread_attr_setstacksize(&tattr, size);
> % rc = pthread_create(&(thread[i]), &tattr, (void *)run, NULL);
>
> Apropos of nothing at all, you don't really need to call setstacksize
> inside this loop, since you never change size.

Absolutely correct, this I forgot.

> % rc = pthread_attr_getstacksize(&tattr, &size);
>
> And you don't really need to call getstacksize to find the value of
> size -- it's just going to return whatever value you passed to it
> before. There's no reliable way to find out the actual stack size
> set aside for the thread.

Indeed, I just checked the linuxthreads glibc source code:

int __pthread_attr_getstacksize(const pthread_attr_t *attr,
                                size_t *stacksize)
{
  *stacksize = attr->__stacksize;
  return 0;
}
weak_alias (__pthread_attr_getstacksize, pthread_attr_getstacksize)

And for the NPTL part of glibc it looks as follows:

int
__pthread_attr_getstacksize (attr, stacksize)
     const pthread_attr_t *attr;
     size_t *stacksize;
{
  struct pthread_attr *iattr;

  assert (sizeof (*attr) >= sizeof (struct pthread_attr));
  iattr = (struct pthread_attr *) attr;

  /* If the user has not set a stack size we return what the system
     will use as the default. */
  *stacksize = iattr->stacksize ?: __default_stacksize;

  return 0;
}
strong_alias (__pthread_attr_getstacksize, pthread_attr_getstacksize)

> Use pthread_exit() when you want a thread to stop running. Use exit()
> when you want a process to stop running.

So if I spawn 100 threads within a process and in one of them I call
exit() the whole lot of threads will be terminated in a clean way? I
assume so but I will test.

Thanks a lot for your answers,

David Butenhof

Dec 10, 2004, 8:48:32 AM

el_ba...@nospam.com wrote:

> So if I spawn 100 threads within a process and in one of them I call
> exit() the whole lot of threads will be terminated in a clean way? I
> assume so but I will test.

No, not "clean". Threads terminated by exit() will not unwind and call
C++ destructors or POSIX cleanup handlers, and they won't call
thread-specific data destructors.

This is fine if your threads never have persistent or global state
that's visible outside the process. But if you do have any such state,
you can't use exit() until you know that all threads' data is stable.
(Usually by shutting down all the threads in some controlled manner first.)
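
One possible shape for such a controlled shutdown, sketched under the assumption that the workers poll a shared stop flag: the controlling thread sets the flag, joins every worker, and only then calls exit(). All names below are hypothetical, and sched_yield() merely stands in for real work:

```c
#include <pthread.h>
#include <sched.h>

#define MAX_WORKERS 16

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int stop_requested = 0;   /* set by the controlling thread */
static int cleaned_up = 0;       /* workers that finished their cleanup */

/* Read the stop flag under the lock. */
static int should_stop(void)
{
    int stop;

    pthread_mutex_lock(&state_lock);
    stop = stop_requested;
    pthread_mutex_unlock(&state_lock);
    return stop;
}

/* Worker: loop until asked to stop, then tidy up and return normally. */
void *stoppable_worker(void *arg)
{
    (void)arg;
    while (!should_stop())
        sched_yield();           /* stand-in for one unit of real work */

    /* A real worker would flush files, release locks, etc. here. */
    pthread_mutex_lock(&state_lock);
    cleaned_up++;
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/* Start up to MAX_WORKERS workers, ask them to stop, and join them all.
 * Returns how many workers completed their cleanup. */
int run_and_shutdown(int n)
{
    pthread_t tids[MAX_WORKERS];
    int i, started = 0, result;

    if (n > MAX_WORKERS)
        n = MAX_WORKERS;

    pthread_mutex_lock(&state_lock);
    stop_requested = 0;
    cleaned_up = 0;
    pthread_mutex_unlock(&state_lock);

    for (i = 0; i < n; i++) {
        if (pthread_create(&tids[started], NULL, stoppable_worker, NULL) != 0)
            break;
        started++;
    }

    pthread_mutex_lock(&state_lock);
    stop_requested = 1;
    pthread_mutex_unlock(&state_lock);

    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_lock(&state_lock);
    result = cleaned_up;
    pthread_mutex_unlock(&state_lock);
    return result;
}
```

Once run_and_shutdown() has returned, every worker has left its cleanup section, so a following exit() cannot cut one of them off half-way through an update.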

--
Dave Butenhof, David.B...@hp.com
HP Utility Pricing software, POSIX thread consultant
Manageability Solutions Lab (MSL), Hewlett-Packard Company
110 Spit Brook Road, ZK2/3-Q18, Nashua, NH 03062