Scheduler thread creation and release


carl.mas...@gmail.com

Mar 5, 2016, 1:10:17 PM
to golang-nuts
Tl;Dr: How and where are OS threads released and cleaned up, and how do I prevent their creation?

Hello go-nuts,

Background: I am trying to write a simple web scraper in Go that is destined to run on a small, ARM-based system. The system is a BeagleBone Black, similar to a Raspberry Pi, and is fairly resource constrained. In my current program, the scraper makes heavy use of the os.Stat library call to check whether a file exists before attempting to download it. Because the system has very little internal storage, I am mounting a remote Samba share for storing the files.

The problem that I am seeing is that the Stat syscalls sometimes take longer than expected and tie up whole OS threads. This causes the scheduler to create more threads, until pthread_create finally fails and the entire program crashes. The dump of goroutines shows around 6000 goroutines, with about 280 of them waiting on the Stat syscall. I don't believe this is a recoverable error, since the failure appears to come from runtime/cgo rather than from the goroutine trying to stat.

Inspecting the scheduler source code, I looked for a place where sched.mcount is decremented and didn't find one. I also don't see any way to limit the number of threads in the system aside from runtime/debug.SetMaxThreads, and exceeding that limit is fatal.

This brings up a lot of questions in my mind about how Go works, and how to mitigate my current issue:


1.  How do I limit the number of threads created?  Working with Goroutines has been such a pleasure; it would be a shame if I had to contort my program to be aware of threads in order to not crash.

2.  How are threads ever released back to the operating system? Are they going to stick around forever, based on the highest spike of syscalls in the history of the process? I noticed that even in times of low load the thread count of my program hovered around 200 (as per /proc/pid/status).

3.  Can thread creation failure be a recoverable error? In my case, it would be much better if thread creation failure were recoverable rather than fatal. Even hanging until a new thread could be created or reused would be better, since it wouldn't abruptly abandon all my open files and network connections. Being able to profile a program slowed down by thread starvation would be a much better stability story than aborting.



I also have some tangential questions that came up in my debugging:

4.  The failure I see seems to come from cgo:

runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0xb6e94f96 m=2

I don't really have anything special going on here; it's a plain ol' Go program with no C involved on my part (and I only use the standard library). Why is cgo mentioned in the output when it crashes?

5.  What is the difference between runnable and syscall in the goroutine traceback output?  Both seem to be possible while hanging on a syscall.  Example:

goroutine 7464 [syscall]:
syscall.Syscall(0xc3, 0x14634ba0, 0x14573854, 0x0, 0x0, 0x4, 0x149d04)
/home/carl/.golive/src/syscall/asm_linux_arm.s:17 +0x8
syscall.Stat(0x14634b70, 0x25, 0x14573854, 0x0, 0x0)
/home/carl/.golive/src/syscall/zsyscall_linux_arm.go:1613 +0x8c

and

goroutine 7192 [runnable]:
syscall.Syscall(0xc3, 0x13c1fa70, 0x13bd38e4, 0x0, 0xffffffff, 0x0, 0x2)
/home/carl/.golive/src/syscall/asm_linux_arm.s:17 +0x8
syscall.Stat(0x13c1fa40, 0x25, 0x13bd38e4, 0x0, 0x0)
/home/carl/.golive/src/syscall/zsyscall_linux_arm.go:1613 +0x8c

Almost all of the hung Stat calls are in runnable, with only a few in syscall.

6.  How much memory does a thread typically take, or take by default, in Go? My ulimit is pretty high, so I am pretty sure I am not hitting it. The only other reason I can think of for pthread_create failing is memory-related.

7.  There is a SIGABRT in my output. Is this caused by the Go runtime trying to end itself, or by some outside source? Would it make any sense to try and catch SIGABRT?

Various command line stuff:

$ go version
go version devel +a162d11 Fri Mar 4 04:10:36 2016 +0000 linux/arm

$ ulimit -u
3948

$ cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 2 (v7l)
BogoMIPS        : 297.40
Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc08
CPU revision    : 2

$ free # run after program crashed.
             total       used       free     shared    buffers     cached
Mem:        508488     393084     115404          0      43108     210004
-/+ buffers/cache:     139972     368516
Swap:      2097148         60    2097088

$ uname -a
Linux beaglebone 3.8.13-bone50 #1 SMP Tue May 13 13:24:52 UTC 2014 armv7l GNU/Linux


Tamás Gulácsi

Mar 5, 2016, 2:14:23 PM
to golang-nuts
Syscalls and C calls usually occupy an extra OS thread, so limit such calls:
1. Compile with -tags netgo and CGO_ENABLED=0.
2. Use a dedicated goroutine for os.Stat calls, and communicate with it through channels.

Manlio Perillo

Mar 5, 2016, 2:28:32 PM
to golang-nuts, carl.mas...@gmail.com
On Saturday, March 5, 2016 at 19:10:17 UTC+1, carl.mas...@gmail.com wrote:
> Tl;Dr: How and where are OS threads released and cleaned up, and how do I prevent their creation?
>
> Hello go-nuts,

Hello.

> Background: I am trying to write a simple web scraper in Go that is destined to run on a small, ARM-based system. The system is a BeagleBone Black, similar to a Raspberry Pi, and is fairly resource constrained. In my current program, the scraper makes heavy use of the os.Stat library call to check whether a file exists before attempting to download it. Because the system has very little internal storage, I am mounting a remote Samba share for storing the files.


I may have the same problem in the future with a web crawler I'm writing, not because of hardware limitations but because the crawler will run on shared hosting where the number of processes/threads is limited.

It is very important to keep the number of created threads under control, since too many threads will not only crash the Go program but may also cause problems for the other executable (a Django web application).
If possible, I would prefer the application to crash rather than cause trouble for the other executable.


Thanks,
Manlio
 

Carl Mastrangelo

Mar 7, 2016, 12:16:30 PM
to golang-nuts
I did use suggestion number 2 as a workaround, but it still feels like contorting my program. The main problem is that effectively every function that makes a syscall will need this, and I have to wire the "Statter" through my execution flow. All calls to Write and Close will have the same problem, since they also interact with the slow fs.

Are there any Go team members who can answer the other questions?

Ian Lance Taylor

Mar 7, 2016, 1:06:40 PM
to carl.mas...@gmail.com, golang-nuts
On Sat, Mar 5, 2016 at 10:10 AM, <carl.mas...@gmail.com> wrote:
>
> The problem that I am seeing is that sometimes the Stat syscalls take longer
> than expected, and tie up whole OS threads. This results in more threads
> being created in the scheduler, until at last pthread_create fails and the
> entire program crashes. Looking at the dump of goroutines shows around
> 6000 goroutines, with about 280 of them waiting on the Stat syscall. I
> don't believe this is a recoverable error, since the failure appears to
> actually be coming from runtime/cgo rather than goroutine trying to stat.

See https://golang.org/issue/7903 for some discussion on this general
issue.


> Inspecting the scheduler source code, I looked for where sched.mcount is
> ever decremented and didn't find any place. I also don't see anywhere
> leading to how to limit the number of threads in the system aside from
> runtime/debug.SetMaxThreads, which is fatal.

Both correct.


> 1. How do I limit the number of threads created? Working with Goroutines
> has been such a pleasure; it would be a shame if I had to contort my program
> to be aware of threads in order to not crash.

As discussed on issue 7903, we have not found a good general solution
for this problem. It's very hard for the Go runtime to know when it
is OK to delay an operation waiting for another operation to complete.
So, yes, at present, you unfortunately need to contort your program.


> 2. How are threads ever released back to the operating system? Are they
> going to stick around forever based on the highest spike of syscalls in the
> history of the process? I noticed that even in times of low load the thread
> count of my program hovered around 200 (as per /proc/pid/status)

At present threads are never released back to the operating system.
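One way to observe this on Linux is for the program to read its own thread count, the same number the original post read from /proc/pid/status, alongside the size of the runtime/pprof "threadcreate" profile (stack traces that led to the creation of new OS threads). A minimal sketch, assuming Linux; osThreads is a hypothetical helper name. Logged periodically, the count would be expected to stay near its peak after a burst of slow syscalls rather than fall back.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime/pprof"
	"strconv"
	"strings"
)

// osThreads returns the "Threads:" value from /proc/self/status (Linux only).
func osThreads() (int, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "Threads:") {
			// The line looks like "Threads:\t7"; trim the label and whitespace.
			return strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, "Threads:")))
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no Threads line in /proc/self/status")
}

func main() {
	n, err := osThreads()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Number of entries in the threadcreate profile.
	created := pprof.Lookup("threadcreate").Count()
	fmt.Printf("OS threads now: %d, threadcreate profile entries: %d\n", n, created)
}
```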


> 3. Can thread creation failure be a recoverable error? In my case, it
> would be much better if thread creation failure was recoverable rather than
> fatal. Even hanging until a new thread could be created or reused would be
> better, since it wouldn't abruptly leave all my open files and network
> connections. Being able to profile a slow program that is caused by thread
> starvation would be a much better stability story than aborting.

New threads are created independently of any goroutine context. It's
not obvious how failure to create a thread could be reported to the
program, or what the program could do to recover.

Hanging until a new thread can be created will solve some problems but
create other ones: some kinds of programs would silently deadlock.
There is some discussion of this at https://golang.org/issue/4056 .


> I also have some tangential questions that came up in my debugging:
>
> 4. The failure I see seems to come from cgo:
>
> runtime/cgo: pthread_create failed: Resource temporarily unavailable
> SIGABRT: abort
> PC=0xb6e94f96 m=2
>
>
> I don't really have anything special going on here, it's a plain ol' Go
> program with no C involved by my actions (and I only use the standard
> library). Why is the cgo mentioned in the output when crashing?

By default, if you did not build with CGO_ENABLED=0, any program that
imports the net or os/user packages is a cgo program. In a cgo
program, every new thread is created by the runtime/cgo library. This
error message is clearly somewhat misleading and probably should be
changed.


> 5. What is the difference between runnable and syscall in the goroutine
> traceback output? Both seem to be possible while hanging on a syscall.
> Example:
>
> goroutine 7464 [syscall]:
> syscall.Syscall(0xc3, 0x14634ba0, 0x14573854, 0x0, 0x0, 0x4, 0x149d04)
> /home/carl/.golive/src/syscall/asm_linux_arm.s:17 +0x8
> syscall.Stat(0x14634b70, 0x25, 0x14573854, 0x0, 0x0)
> /home/carl/.golive/src/syscall/zsyscall_linux_arm.go:1613 +0x8c
>
>
> and
>
> goroutine 7192 [runnable]:
> syscall.Syscall(0xc3, 0x13c1fa70, 0x13bd38e4, 0x0, 0xffffffff, 0x0, 0x2)
> /home/carl/.golive/src/syscall/asm_linux_arm.s:17 +0x8
> syscall.Stat(0x13c1fa40, 0x25, 0x13bd38e4, 0x0, 0x0)
> /home/carl/.golive/src/syscall/zsyscall_linux_arm.go:1613 +0x8c
>
>
> Almost all of the hung Stat calls are in runnable, with only a tiny amount
> in syscall.

The runtime will only run a limited number of goroutines
simultaneously, as controlled by GOMAXPROCS. In the absence of other
information, my guess would be that you had many goroutines make a
system call simultaneously. As each one entered syscall.Syscall, it
went into syscall state and freed up a goroutine slot,
permitting another goroutine to enter syscall.Syscall. The goroutines
entered system calls faster than the calls completed. Then the calls
started completing. Each completed system call moved its goroutine
back to runnable state, but now the burst happened on the other side:
the system calls completed more quickly than the scheduler was able to
handle them. The result is a bunch of goroutines that have completed
the system call and are in runnable state waiting for a scheduler slot
to actually continue running.

Just a guess, though.


> 6. How much memory does a thread typically take, or take by default in Go?
> My ulimit is pretty high so I am pretty sure I am not hitting it. The only
> other reason I can think of pthread_create failing is memory related.

Threads are created with the system default thread stack size.


> 7. There is a SIGABRT in my output. Is this caused by the Go runtime
> trying to end itself, or from some outside source? Would it make any sense
> to try and catch SIGABRT?

The SIGABRT is because the fatal error when pthread_create fails calls
abort. Catching that signal would not permit the program to continue.

Ian

Ian Lance Taylor

Mar 7, 2016, 1:10:08 PM
to Carl Mastrangelo, golang-nuts
On Mon, Mar 7, 2016 at 8:56 AM, Carl Mastrangelo
<carl.mas...@gmail.com> wrote:
> I did use suggestion number 2 as a work around, but it still feels like
> having to contort my program. The main problem with this is that
> effectively every syscall causing function will need this, and I have to
> wire the "Statter" through my execution flow. All calls to Write and Close
> will have the same problem, since they are both interacting with the slow
> fs.

It's not obvious to me why you have to wire anything through your
execution flow. The simplest approach is to put a semaphore in front
of every independent call to the file system. The semaphore can be a
global variable. (The simplest way to implement a semaphore in Go is
with a buffered channel: acquire the semaphore by writing to the
channel, release it by reading from the channel.)

Ian

Carl Mastrangelo

Mar 7, 2016, 1:36:01 PM
to golang-nuts, carl.mas...@gmail.com
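That is effectively what I did:

import (
	"context" // golang.org/x/net/context before Go 1.7
	"os"
)

// statLim is a counting semaphore: each buffer slot permits one os.Stat call.
type statLim chan struct{}

// statter allows at most 10 concurrent os.Stat calls.
var statter statLim = make(chan struct{}, 10)

// Stat waits for a free slot (or for ctx to be done), then calls os.Stat.
func (s statLim) Stat(ctx context.Context, name string) (os.FileInfo, error) {
	select {
	case <-ctx.Done():
		return nil, ctx.Err()
	case s <- struct{}{}: // slot acquired
	}
	// Release the slot unconditionally; returning early on ctx.Done()
	// after the call could skip the release and leak the slot for good.
	defer func() { <-s }()
	return os.Stat(name)
}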

The wiring I was referring to is passing the statter around, but since it has to be global anyway, I believe you are right that it doesn't need to be wired through.