Re: [go-nuts] Odd abort with runtime/cgo: pthread_create failed: xÓ"!


Dave Cheney

Nov 14, 2012, 5:43:39 PM
to j...@cloudflare.com, golan...@googlegroups.com
Hi John,

Can you share some more details of the platform you found this error on?

Can you share some details on the condition of the 30,000 odd goroutines at the time of the panic? How many of them were in syscall.Syscall?

Dave

On 15/11/2012, at 9:36, j...@cloudflare.com wrote:

I have a long running Go program that aborted with the 

error: runtime/cgo: pthread_create failed: xÓ"!
SIGABRT: abort
PC=0x800af5cbc

The hex of the odd characters is 78 ee 22 21 08.  At the time of the crash there were 30,427 goroutines running. This is running against 40ba4d4e4672.

Anyone seen anything like this before? 

John.


bryanturley

Nov 14, 2012, 6:01:41 PM
to golan...@googlegroups.com
Also can you duplicate it on another machine?

Dave Cheney

Nov 14, 2012, 6:10:06 PM
to John Graham-Cumming, golan...@googlegroups.com, j...@cloudflare.com
Does FreeBSD limit the per process threads to 256? Of the running goroutines, can you summarise the top of the call stack for each as I suspect almost all will be executing syscalls. 

On 15/11/2012, at 9:49, John Graham-Cumming <j...@cloudflare.com> wrote:

On Wednesday, November 14, 2012 10:44:06 PM UTC, Dave Cheney wrote:
Can you share some more details of the platform you found this error on?

64-bit FreeBSD
 
Can you share some details on the condition of the 30,000 odd goroutines at the time of the panic? How many of them were in syscall.Syscall?

Here are the counts and status of the goroutines.

    1 [finalizer wait]
    7 [running]
  210 [runnable]
 1348 [semacquire]
 1489 [syscall]
10675 [chan receive]
16697 [select]

John.


John Graham-Cumming

Nov 14, 2012, 6:16:58 PM
to golan...@googlegroups.com
On Wednesday, November 14, 2012 11:01:41 PM UTC, bryanturley wrote:
Also can you duplicate it on another machine?

That is non-trivial, as this is happening on a production system that's under heavy load (HTTP traffic). It does not occur on our test systems.

John.
 

John Graham-Cumming

Nov 14, 2012, 6:18:31 PM
to golan...@googlegroups.com, John Graham-Cumming
On Wednesday, November 14, 2012 11:10:26 PM UTC, Dave Cheney wrote:
Does FreeBSD limit the per process threads to 256? Of the running goroutines, can you summarise the top of the call stack for each as I suspect almost all will be executing syscalls. 

From the data I posted above:

    1 [finalizer wait]
    7 [running]
  210 [runnable]
 1348 [semacquire]
 1489 [syscall]
10675 [chan receive]
16697 [select]

So most are either in a select or chan receive. Only 1,489 of 30,000 are in syscall.

John.

John Graham-Cumming

Nov 14, 2012, 6:22:13 PM
to golan...@googlegroups.com, John Graham-Cumming
Almost all the ones that are in syscall are here:

func Connect(fd int, sa Sockaddr) (err error) {
	ptr, n, err := sa.sockaddr()
	if err != nil {
		return err
	}
	return connect(fd, ptr, n)
}

on the return statement.

Dave Cheney

Nov 14, 2012, 6:26:31 PM
to John Graham-Cumming, golan...@googlegroups.com
I can't speak to the strange error message, but this sounds like

http://code.google.com/p/go/issues/detail?id=4056

but could be better described as

http://code.google.com/p/go/issues/detail?id=3412

In the latter issue, you'll find a link to a CL which should reduce
the number of goroutines in syscall state, however, from memory the CL
only covers linux, not freebsd, but adapting it should not be tricky.

Cheers

Dave

John Graham-Cumming

Nov 14, 2012, 6:32:04 PM
to golan...@googlegroups.com, John Graham-Cumming
On Wednesday, November 14, 2012 11:26:43 PM UTC, Dave Cheney wrote:
I can't speak to the strange error message, but this sounds like

http://code.google.com/p/go/issues/detail?id=4056

but could be better described as

http://code.google.com/p/go/issues/detail?id=3412

Thanks. It's entirely possible that this is simply a 'you can't have any more threads' situation, but I'd like to understand the weird error message to make sure that it's not something else.

John.
 

Ian Lance Taylor

Nov 14, 2012, 8:03:27 PM
to j...@cloudflare.com, golan...@googlegroups.com
On Wed, Nov 14, 2012 at 2:36 PM, <j...@cloudflare.com> wrote:
> I have a long running Go program that aborted with the
>
> error: runtime/cgo: pthread_create failed: xÓ"!
> SIGABRT: abort
> PC=0x800af5cbc

As far as I can tell, that error message is coming from these lines in
runtime/cgo/gcc_freebsd_amd64.c:

	if (err != 0) {
		fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
		abort();
	}

But it's peculiar that these lines do not print "error: ". I don't
know where that is coming from in your output. And, of course,
strerror should not return a garbage string. This code is compiled by
gcc and invokes libc functions in the usual way. strerror should not
return a garbage pointer.

Hmmm, wait. This file does not #include <string.h>. It's possible
that strerror was never declared and that GCC is implicitly declaring
it to return int. On amd64 int is 32 bits and char* is 64 bits, so it
is possible that when the return value is moved from %rax to %rdx only
the low order 32 bits are moved. This might then be an invalid
pointer, causing printf to spit out garbage. A series of guesses, to
be sure, but a possible explanation for what you are seeing, except
for the "error: " string. But to be safe let's have all those files
#include <string.h>. Unfortunately, if these guesses are correct,
there is no way to determine what error pthread_create actually
returned.

Ian

John Graham-Cumming

Nov 14, 2012, 8:41:30 PM
to golan...@googlegroups.com, j...@cloudflare.com
On Thursday, November 15, 2012 1:03:40 AM UTC, Ian Lance Taylor wrote:
Hmmm, wait.  This file does not #include <string.h>.  It's possible
that strerror was never declared and that GCC is implicitly declaring
it to return int.  On amd64 int is 32 bits and char* is 64 bits, so it
is possible that when the return value is moved from %rax to %rdx only
the low order 32 bits are moved.  This might then be an invalid
pointer, causing printf to spit out garbage.  A series of guesses, to
be sure, but a possible explanation for what you are seeing, except
for the "error: " string.  But to be safe let's have all those files
#include <string.h>.  Unfortunately, if these guesses are correct,
there is no way to determine what error pthread_create actually
returned.

That seems like a likely explanation. The Linux version of the file does include <string.h> and there's a comment about it being for strerror(), but the FreeBSD version does not. Note that on FreeBSD strerror() requires <stdio.h> not <string.h>. 

It's likely that the problem I am seeing is actually just a thread limit on the machine being hit. If it turns out that the error message is garbled because of this issue then I'm happy because it's not something more serious.

But... shouldn't gcc be giving a warning and -Werror be used?

John.

Ian Lance Taylor

Nov 15, 2012, 1:29:15 AM
to John Graham-Cumming, golan...@googlegroups.com
On Wed, Nov 14, 2012 at 5:41 PM, John Graham-Cumming <j...@cloudflare.com> wrote:
>
> But... shouldn't gcc be giving a warning and -Werror be used?

Ideally, yes. But this is gcc being invoked as part of a cgo build of
the runtime/cgo package, and no particular options are used in that
case.

Ian

Dave Cheney

Nov 15, 2012, 1:45:35 AM
to Ian Lance Taylor, John Graham-Cumming, golan...@googlegroups.com
What about something like this

diff -r ceaa16504f36 src/pkg/runtime/cgo/cgo.go
--- a/src/pkg/runtime/cgo/cgo.go Thu Nov 15 13:59:46 2012 +1100
+++ b/src/pkg/runtime/cgo/cgo.go Thu Nov 15 17:44:53 2012 +1100
@@ -14,6 +14,7 @@
 #cgo darwin LDFLAGS: -lpthread
 #cgo freebsd LDFLAGS: -lpthread
 #cgo linux LDFLAGS: -lpthread
+#cgo CFLAGS: -Werror
 #cgo netbsd LDFLAGS: -lpthread
 #cgo openbsd LDFLAGS: -lpthread
 #cgo windows LDFLAGS: -lm -mthreads
diff -r ceaa16504f36 src/pkg/runtime/cgo/gcc_linux_amd64.c
--- a/src/pkg/runtime/cgo/gcc_linux_amd64.c Thu Nov 15 13:59:46 2012 +1100
+++ b/src/pkg/runtime/cgo/gcc_linux_amd64.c Thu Nov 15 17:44:53 2012 +1100
@@ -3,7 +3,6 @@
 // license that can be found in the LICENSE file.

 #include <pthread.h>
-#include <string.h> // strerror
 #include <signal.h>
 #include "libcgo.h"

which results in

# runtime/cgo
./gcc_linux_amd64.c: In function ‘libcgo_sys_thread_start’:
./gcc_linux_amd64.c:45:3: error: format ‘%s’ expects argument of type ‘char *’, but argument 3 has type ‘int’ [-Werror=format]
cc1: all warnings being treated as errors

Ian Lance Taylor

Nov 15, 2012, 1:52:42 AM
to Dave Cheney, John Graham-Cumming, golan...@googlegroups.com
Sounds great. In fact I think we should add all the options from
src/cmd/dist/build.c:

static char *proto_gccargs[] = {
	"-Wall",
	"-Wno-sign-compare",
	"-Wno-missing-braces",
	"-Wno-parentheses",
	"-Wno-unknown-pragmas",
	"-Wno-switch",
	"-Wno-comment",
	"-Werror",
	"-fno-common",
	"-ggdb",
	"-O2",
};

Ian

Dave Cheney

Nov 15, 2012, 2:01:47 AM
to Ian Lance Taylor, John Graham-Cumming, golan...@googlegroups.com
OK, I'll prepare a CL; it might take a bit of testing.

Dave Cheney

Nov 15, 2012, 2:17:07 AM
to Ian Lance Taylor, John Graham-Cumming, golan...@googlegroups.com