Is there any way to debug a hang?


Jack Palevich

Jan 24, 2010, 5:07:28 AM
to golang-nuts
I've run into a problem: my go based Taipei-Torrent application
sometimes appears to hang. Normally the program prints out log
messages every 10 seconds, but after it runs for a while (sometimes
minutes, sometimes hours) the log messages stop.

Is there any way of debugging this?

Ideally I'd like some way of printing the current stacks of all my
goroutines to see what the program is doing.

Russ Cox

Jan 24, 2010, 2:05:28 PM
to Jack Palevich, golang-nuts

kill -ABRT your-prog-pid

will kill the program but as a side effect give you all the
current stacks of your goroutines.

Russ
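
For readers on a current Go release, the same information can be collected without killing the process. The following is only a sketch: it assumes a Unix-like system, and the choice of SIGUSR1 and the helper names `allStacks` and `dumpOnSignal` are illustrative, not something from this thread.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

// allStacks returns a formatted dump of every goroutine's stack.
// It doubles the buffer until the whole dump fits.
func allStacks() []byte {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true) // true = include all goroutines
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

// dumpOnSignal prints all goroutine stacks to stderr each time the
// process receives SIGUSR1 (an assumed choice), without exiting.
func dumpOnSignal() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)
	go func() {
		for range sigs {
			fmt.Fprintf(os.Stderr, "%s\n", allStacks())
		}
	}()
}

func main() {
	dumpOnSignal()
	// Stand-in for the real program: print one dump and exit.
	os.Stderr.Write(allStacks())
}
```

With this installed, `kill -USR1 your-prog-pid` would print the stacks and leave the program running.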

Jack Palevich

Jan 24, 2010, 4:16:34 PM
to r...@golang.org, golang-nuts
Hmm, I was hoping to see what the goroutines were doing. All I get is this stack trace, which looks like a typical "waiting for the next event" state.

SIGABRT: abort
Faulting address: 0x5a208
pc: 0x5a208

mach_semaphore_wait+0xb /Users/jack/code/go/src/pkg/runtime/darwin/amd64/sys.s:206
mach_semaphore_wait()
mach_semacquire+0x1a /Users/jack/code/go/src/pkg/runtime/darwin/thread.c:423
mach_semacquire(0xb03, 0x0)
usemacquire+0x57 /Users/jack/code/go/src/pkg/runtime/darwin/thread.c:97
usemacquire(0xd92c8, 0x0)
notesleep+0x2a /Users/jack/code/go/src/pkg/runtime/darwin/thread.c:123
notesleep(0xd92c0, 0x0)
nextgandunlock+0xfc /Users/jack/code/go/src/pkg/runtime/proc.c:354
nextgandunlock()
scheduler+0xe0 /Users/jack/code/go/src/pkg/runtime/proc.c:509
scheduler()
mstart+0x47 /Users/jack/code/go/src/pkg/runtime/proc.c:400
mstart()
_rt0_amd64+0x74 /Users/jack/code/go/src/pkg/runtime/amd64/asm.s:46
_rt0_amd64()

Russ Cox

Jan 24, 2010, 4:40:11 PM
to Jack Palevich, golang-nuts
that one should have been followed by all the others,
just like a normal crash. i can't explain that.

russ

Jack Palevich

Jan 24, 2010, 5:28:55 PM
to golang-nuts
This is on OSX, if it matters.

I think I figured out the cause of the hang:

I think I have two goroutines that are deadlocked trying to send each
other messages. The channels they are using are not buffered, and
there's a situation where each side is trying to send more than one
message into their respective send channel before reading any more
messages from their receive channel.
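
A simplified sketch of that send/send situation, assuming two peers that each send once before receiving. The names `trySend` and `sendSendStalls` are made up for illustration, and the timeout exists only so the demo reports the stall instead of hanging.

```go
package main

import (
	"fmt"
	"time"
)

// trySend attempts to send v on ch and reports whether the send
// completed before the timeout expired.
func trySend(ch chan<- int, v int, timeout time.Duration) bool {
	select {
	case ch <- v:
		return true
	case <-time.After(timeout):
		return false
	}
}

// sendSendStalls starts two peers that each send before receiving
// and reports whether both sends stalled.
func sendSendStalls(capacity int) bool {
	aToB := make(chan int, capacity)
	bToA := make(chan int, capacity)
	results := make(chan bool, 2)

	peer := func(out chan<- int, in <-chan int) {
		sent := trySend(out, 1, 100*time.Millisecond)
		if sent {
			<-in // only reached when the send went through
		}
		results <- sent
	}
	go peer(aToB, bToA)
	go peer(bToA, aToB)

	first, second := <-results, <-results
	return !first && !second
}

func main() {
	fmt.Println("unbuffered stalls:", sendSendStalls(0))
	fmt.Println("capacity 1 stalls:", sendSendStalls(1))
}
```

With unbuffered channels neither send can complete, because each peer would have to be receiving while it is itself blocked in a send.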

Jack Palevich

Jan 24, 2010, 6:13:21 PM
to golang-nuts
I wrote a little deadlock detector that checks for the main goroutine
locking up and panics if it detects that happening.
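
The detector itself is not shown in this thread. One way such a watchdog could look, purely as an assumption: the main loop sends a heartbeat on each iteration, and a separate goroutine reacts if none arrives in time. `watchdog`, `beat`, and `simulate` are hypothetical names, and the callback sets a flag where a real detector would panic.

```go
package main

import (
	"fmt"
	"time"
)

// watchdog calls onStall once if no heartbeat arrives within limit.
// It returns after onStall fires or when beats is closed.
func watchdog(beats <-chan struct{}, limit time.Duration, onStall func()) {
	for {
		select {
		case _, ok := <-beats:
			if !ok {
				return // clean shutdown
			}
		case <-time.After(limit):
			onStall()
			return
		}
	}
}

// beat records a heartbeat without ever blocking the caller.
func beat(beats chan<- struct{}) {
	select {
	case beats <- struct{}{}:
	default: // a beat is already pending; skip this one
	}
}

// simulate runs a fake main loop that beats n times and then either
// stalls or shuts down cleanly. It reports whether the watchdog fired.
func simulate(n int, stall bool) bool {
	beats := make(chan struct{}, 1)
	fired := make(chan bool, 1)
	go func() {
		detected := false
		// A real detector might panic here instead of setting a flag.
		watchdog(beats, 200*time.Millisecond, func() { detected = true })
		fired <- detected
	}()
	for i := 0; i < n; i++ {
		beat(beats)
		time.Sleep(10 * time.Millisecond)
	}
	if !stall {
		close(beats)
	}
	return <-fired
}

func main() {
	fmt.Println("stalled loop detected:", simulate(3, true))
	fmt.Println("clean shutdown flagged:", simulate(3, false))
}
```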

And indeed it looks like a send/send deadlock, where my main goroutine
is sending to another goroutine that is sending back to the main
goroutine on a channel with multiple writers.

The fix in this case is to enlarge one of my channels to hold multiple
messages.

My design has a single channel that's used by N client goroutines to
send messages to the main goroutine. I guess in order to avoid
deadlocks, I need to size it to hold N * M messages, where N is the
number of clients and M is the total number of messages that a given
client might need to send at once.

Luckily N * M is bounded for my application (to about 50).
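
A small sketch of the N * M sizing argument. It assumes N clients that each try to send M messages on one shared channel while nobody is receiving; the `burst` helper and the numbers 5, 10, and 20 are illustrative only.

```go
package main

import (
	"fmt"
	"sync"
)

// burst has n clients each attempt m non-blocking sends on a shared
// channel of the given capacity while nobody receives. It returns how
// many sends were accepted; every other send would have blocked.
func burst(n, m, capacity int) int {
	inbox := make(chan int, capacity)
	var wg sync.WaitGroup
	var mu sync.Mutex
	accepted := 0

	for c := 0; c < n; c++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < m; i++ {
				select {
				case inbox <- id:
					mu.Lock()
					accepted++
					mu.Unlock()
				default:
					// Buffer full: a plain send would block here.
				}
			}
		}(c)
	}
	wg.Wait()
	return accepted
}

func main() {
	fmt.Println("capacity 50:", burst(5, 10, 50), "of 50 accepted")
	fmt.Println("capacity 20:", burst(5, 10, 20), "of 50 accepted")
}
```

With capacity N * M every client finishes its burst even if the receiver is busy; with anything smaller, some senders would block until the receiver drains the channel.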

Jack Palevich

Jan 24, 2010, 6:22:20 PM
to golang-nuts
Whoops. While the send/send scenario I described earlier is a potential deadlock, it's not why my
program was deadlocking. Upon carefully reading the goroutine stack
dump it turns out that my actual deadlock was much dumber -- I was
mistakenly doing network output in my main goroutine, and occasionally
the other end would hang up.

D'Oh!
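
One common guard against a write that never completes, in current Go releases, is a write deadline. This is a sketch under the assumption that the stall came from a peer that stopped reading; `writeWithDeadline`, `isTimeout`, and `stalledPeerTimesOut` are hypothetical helpers, and `net.Pipe` stands in for a real network connection.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// writeWithDeadline writes p to conn but gives up after timeout
// instead of blocking the calling goroutine indefinitely.
func writeWithDeadline(conn net.Conn, p []byte, timeout time.Duration) error {
	if err := conn.SetWriteDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	_, err := conn.Write(p)
	return err
}

// isTimeout reports whether err is a network timeout.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// stalledPeerTimesOut writes to an in-memory connection whose other
// end never reads and reports whether the write timed out.
func stalledPeerTimesOut() bool {
	local, remote := net.Pipe()
	defer local.Close()
	defer remote.Close()
	err := writeWithDeadline(local, []byte("hello"), 50*time.Millisecond)
	return isTimeout(err)
}

func main() {
	fmt.Println("write timed out:", stalledPeerTimesOut())
}
```

Moving the write into its own goroutine is the other obvious option; the deadline approach keeps the main loop simple at the cost of handling the timeout error.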

