Endless Running Goroutine Terminates Without Panic

Shashwat

Jan 26, 2023, 10:03:14 AM
to golang-nuts
I need to monitor the status of N sensors and log it. I am creating a separate goroutine for each sensor, which fetches the state and logs it at regular intervals. For each goroutine, a channel is created to read the termination signal. Each sensor's state is maintained in a sync.Map; the map holds a pointer to the state. Apart from the monitoring goroutines, there are processing goroutines that update the sensors' state.
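In outline, each monitoring goroutine looks roughly like this (the names here are just for illustration; the actual code is in the attachment):

    package main

    import (
        "log"
        "sync"
        "time"
    )

    // sensorState is illustrative; the real structure is in the attachment.
    type sensorState struct {
        mu    sync.Mutex
        value float64
    }

    var sensors sync.Map // map[sensorID]*sensorState

    func monitor(id string, stop <-chan struct{}) {
        defer func() {
            if r := recover(); r != nil {
                log.Printf("monitor %s panicked: %v", id, r)
                go monitor(id, stop) // re-create the monitoring goroutine
            }
        }()
        t := time.NewTicker(2 * time.Second)
        defer t.Stop()
        for {
            select {
            case <-stop:
                return
            case <-t.C:
                if v, ok := sensors.Load(id); ok {
                    s := v.(*sensorState)
                    s.mu.Lock()
                    log.Printf("sensor %s: %v", id, s.value)
                    s.mu.Unlock()
                }
            }
        }
    }

    func main() {
        sensors.Store("s1", &sensorState{})
        stop := make(chan struct{})
        go monitor("s1", stop)
        time.Sleep(7 * time.Second)
        close(stop)
    }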

In my current implementation, the goroutines log the status at regular intervals as expected. After a few days, however, some goroutines stop logging, while the processing goroutines are still able to update the status.

Since a goroutine may terminate if a panic occurs, I used recover() to log the error and re-create the monitoring goroutine. But this doesn't help either: I noticed that the goroutine had terminated, yet no error was logged.

I suspect the goroutine might be getting terminated because of a memory leak.

Please find the sample code attached.


It would be helpful if someone could explain this abnormal behavior and suggest possible solutions.
Thanks
monitoringThread.txt

Brian Candler

Jan 26, 2023, 10:38:28 AM
to golang-nuts
> As a goroutine may terminate if panic occurs

If that happens, it will crash the entire program (go isn't python :-)

> I used recover() to log the error and the monitoring thread is re-created. But this doesn't help either. I noticed that the goroutine has been terminated but error hasn't been logged.

How are you sure, then, that the goroutine has terminated and isn't just stuck?

In any case, you shouldn't attempt to recover from a system-related panic (e.g. out of memory).  The system will be in an invalid state; better let it crash and be restarted by some external supervisor.  You can recover from an explicit panic() raised in your code, which is sometimes useful when unwinding the stack many levels, e.g. to terminate some deeply-nested recursive function, but that's something else.
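For example, a contrived sketch of that pattern (the names are made up):

    package main

    import "fmt"

    // bail is a private sentinel; recovering only our own panic value
    // keeps genuine runtime panics propagating as usual.
    type bail struct{ result int }

    func find(nums []int, target int) (idx int, ok bool) {
        defer func() {
            if r := recover(); r != nil {
                if b, isBail := r.(bail); isBail {
                    idx, ok = b.result, true // our own panic: turn it into a result
                    return
                }
                panic(r) // not ours: re-panic
            }
        }()
        walk(nums, target, 0)
        return 0, false
    }

    func walk(nums []int, target, i int) {
        if i >= len(nums) {
            return
        }
        if nums[i] == target {
            panic(bail{result: i}) // unwind the whole recursion in one step
        }
        walk(nums, target, i+1)
    }

    func main() {
        fmt.Println(find([]int{3, 1, 4, 1, 5}, 4)) // 2 true
    }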

> Each sensor's state is maintained in sync.Map. The map maintains a pointer to the state.

Whilst sync.Map protects against concurrent read/write access to the map itself, it doesn't protect against how you use that data.  If you retrieve a pointer, and then read or write via that pointer from different goroutines, then you'll still need to protect those accesses with a mutex.  It looks like you're doing this in the reading loop; we don't see the code which writes to the state.
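Concretely, the writing side needs to take the same lock as the reading loop; something like this (names invented for illustration):

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // sync.Map only guards the Load/Store of the pointer,
    // not the fields the pointer refers to.
    type state struct {
        mu    sync.Mutex
        value float64
    }

    func main() {
        var sensors sync.Map
        sensors.Store("s1", &state{})

        // Processing goroutine: updates must hold the same lock the reader uses.
        go func() {
            for i := 0; ; i++ {
                if v, ok := sensors.Load("s1"); ok {
                    s := v.(*state)
                    s.mu.Lock()
                    s.value = float64(i)
                    s.mu.Unlock()
                }
                time.Sleep(10 * time.Millisecond)
            }
        }()

        // Monitoring side: same lock around the read.
        for i := 0; i < 5; i++ {
            time.Sleep(50 * time.Millisecond)
            if v, ok := sensors.Load("s1"); ok {
                s := v.(*state)
                s.mu.Lock()
                fmt.Println("sensor s1:", s.value)
                s.mu.Unlock()
            }
        }
    }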

However, the Go maxim is "share memory by communicating". Could you arrange things so that your data collection process sends data down a channel, instead of updating shared memory?
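For instance, the collectors could send readings to a single owner goroutine that does the logging. A rough sketch, with invented names, not drop-in code:

    package main

    import (
        "fmt"
        "time"
    )

    type reading struct {
        sensor string
        value  float64
    }

    func main() {
        readings := make(chan reading, 64)

        // Collector goroutines only send; they never touch shared state.
        go func() {
            for i := 0; i < 5; i++ {
                readings <- reading{sensor: "s1", value: float64(i)}
                time.Sleep(100 * time.Millisecond)
            }
            close(readings)
        }()

        // A single goroutine owns the state map, so no mutex is needed.
        latest := make(map[string]float64)
        ticker := time.NewTicker(250 * time.Millisecond)
        defer ticker.Stop()
        for {
            select {
            case r, ok := <-readings:
                if !ok {
                    fmt.Println("final status:", latest)
                    return
                }
                latest[r.sensor] = r.value
            case <-ticker.C:
                fmt.Println("status:", latest)
            }
        }
    }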

Brian Candler

Jan 26, 2023, 11:09:41 AM
to golang-nuts
On Thursday, 26 January 2023 at 15:38:28 UTC Brian Candler wrote:
> As a goroutine may terminate if panic occurs

If that happens, it will crash the entire program (go isn't python :-)

... although other goroutines may continue for a very short time afterwards.  I did see the above link generate the following output a couple of times:

Hello, 世界
panic: foo

goroutine 6 [running]:
main.main.func1()
        /tmp/sandbox319329598/prog.go:13 +0x65
created by main.main
        /tmp/sandbox319329598/prog.go:11 +0x76

Program exited.
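A program of roughly this shape produces that output; this is a sketch, not the exact code behind the link:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        go func() {
            panic("foo") // an unrecovered panic in any goroutine kills the whole program
        }()
        fmt.Println("Hello, 世界") // sometimes wins the race and prints before the crash
        time.Sleep(time.Second)
    }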

Howard C. Shaw III

Jan 27, 2023, 9:58:34 AM
to golang-nuts
Ctrl-\ in a Linux terminal sends the SIGQUIT signal. You can also send this signal with kill.

Sending SIGQUIT to a Go application (unless it traps the signal and changes the behavior) will cause it to print all of its goroutines' stack traces and then exit.

Perhaps you could use this to confirm that the goroutines have in fact exited and are not actually stuck waiting on a lock or IO?
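(You can also produce the same kind of dump from inside the program without exiting, e.g. on SIGUSR1 via runtime/pprof. A minimal sketch, assuming you are free to add a signal handler:)

    package main

    import (
        "os"
        "os/signal"
        "runtime/pprof"
        "syscall"
    )

    func main() {
        // Dump all goroutine stacks on SIGUSR1 without exiting, unlike SIGQUIT.
        sig := make(chan os.Signal, 1)
        signal.Notify(sig, syscall.SIGUSR1)
        go func() {
            for range sig {
                pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
            }
        }()

        select {} // stand-in for the rest of the program
    }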

Then if they *have* exited and have not tripped your recover, it was due to running to the end of the function and not a panic - so add a logging step at the end of the function to confirm this, and then investigate how it can reach that point without a panic. And if they have *not* exited, then the stacktrace should reveal what they are getting hung up on.
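One way to add that logging step is a defer at the top of the goroutine, since it fires on every exit path. A sketch with made-up names:

    package main

    import (
        "log"
        "time"
    )

    func monitor(id string, stop <-chan struct{}) {
        // Fires on every exit path: normal return, falling off the end, or a
        // panic caught by the recover below, so a silent exit shows up in the log.
        defer log.Printf("monitor for sensor %s exited", id)
        defer func() {
            if r := recover(); r != nil {
                log.Printf("monitor for sensor %s panicked: %v", id, r)
            }
        }()

        t := time.NewTicker(time.Second)
        defer t.Stop()
        for {
            select {
            case <-stop:
                return
            case <-t.C:
                log.Printf("sensor %s: still alive", id)
            }
        }
    }

    func main() {
        stop := make(chan struct{})
        go monitor("s1", stop)
        time.Sleep(3 * time.Second)
        close(stop)
        time.Sleep(100 * time.Millisecond) // give the deferred log a chance to run
    }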