query for runtime.reportZombies data race PoC code


Lin Lin

Nov 18, 2024, 11:31:04 PM
to golang-nuts
Hi, gophers

Quite a few issues like https://github.com/golang/go/issues/47513 are caused by data races. I ran into one myself in Go 1.17, and a data race can be found in the code. But I'm unable to reproduce the issue reliably, and I need some PoC code to convince my managers that the crash is caused by the data race so that we can fix it.

I've been struggling for a while, constructing all kinds of data races without any real progress. Now I'm diving into the runtime GC code, very slowly.

Does any gopher know of or have such PoC code, or any guide to constructing one? That would save me the effort of understanding the GC code.

Thanks.

Kurtis Rader

Nov 19, 2024, 12:00:16 AM
to Lin Lin, golang-nuts
Your question is not clear. You seem to be saying you have a program that fails for a reason that might be a data race. Have you built and run your program with race detection enabled? If you do so the resulting race detection traceback should provide a clue regarding the nature of the race sufficient to identify the problem.

A proof of concept (PoC) illustrating a data race is easy to write. But a generic data race PoC is unlikely to help you solve the problem with your code.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/dfc27fe4-a4ef-4986-9411-20dc691de404n%40googlegroups.com.


--
Kurtis Rader
Caretaker of the exceptional canines Junior and Hank

Lin Lin

Nov 19, 2024, 1:06:11 AM
to golang-nuts
Yes, I do agree that it's easy to write data race code.
Sorry, I didn't make it clear. I have the data race report; my concern is to prove the relation between the data race and the crash. The code I want is actually code that can trigger a runtime reportZombies crash.

robert engels

Nov 19, 2024, 1:30:17 AM
to Lin Lin, golang-nuts
Any data race can cause a crash anywhere; see https://go.dev/doc/articles/race_detector. So you need to fix the data races.

As for the reasons, from Google Gemini (looks accurate):

Yes, a Go data race can definitely cause a runtime crash.
Here's why:
  • Undefined Behavior:
    When two or more goroutines access the same memory location concurrently without proper synchronization, and at least one of them is a write operation, a data race occurs. This leads to undefined behavior, which means the program can behave in unpredictable ways, including crashing.
  • Memory Corruption:
    Data races can corrupt the internal state of data structures, leading to crashes or unexpected results. For example, if two goroutines try to modify the same map concurrently without proper synchronization, the map's internal structure could become corrupted, causing a crash.
  • Fatal errors:
    In some cases, the Go runtime detects unsynchronized access itself (for example, concurrent map writes) and aborts the program with a fatal error to prevent further damage. This results in a controlled crash with an error message, which can help you identify and fix the problem.

Kurtis Rader

Nov 19, 2024, 1:33:19 AM
to Lin Lin, golang-nuts
On Mon, Nov 18, 2024 at 10:06 PM Lin Lin <linsite...@gmail.com> wrote:
> Yes, I do agree that it's easy to write data race code.
> Sorry, I didn't make it clear. I have the data race report; my concern is to prove the relation between the data race and the crash. The code I want is actually code that can trigger a runtime reportZombies crash.

A "reportZombies" crash can have several causes, including, but not limited to, a data race. I still don't understand how you think a generic program that causes a "reportZombies" crash will help you identify the bug in your program. I suggest starting by fixing the data races in your program. If that doesn't eliminate the "reportZombies" crashes then you have a more difficult problem to diagnose. Start by trying to identify whether you have pure Go code that is converting pointers to uintptr values (and not correctly managing such pointers), or (more likely) you are using cgo to link with non-Go code and your Go code is not correctly managing the lifecycle of the non-Go data.
 

Lin Lin

Nov 19, 2024, 6:50:20 AM
to Kurtis Rader, golang-nuts
Thanks for your kind and prompt reply.

Allow me to explain myself a bit more.

My code has two kinds of data races. The first is a global struct without any pointer members being written and read by multiple goroutines. The second is a struct's string member being written and read by multiple goroutines. The data race report shows no other unsafe or cgo usage.

As far as I know, those kinds of data races may lead to an inconsistent string or struct value within the race time window. How can that inconsistency lead to a marked free bit in runtime.mspan? This really puzzles me.
Or maybe I went in the wrong direction and this could be a hardware issue, like a memory bit flip? But I couldn't find any such issue reported in the Go community.

Robert Engels

Nov 19, 2024, 8:32:32 AM
to Lin Lin, Kurtis Rader, golang-nuts
As I shared, ANY data race can lead to a panic. You can be interfering with the GC object tracking. 

I agree with you that it seems not possible, but I was surprised to learn that it is. 

That is not the case in Java, which gives assignment guarantees even without synchronization, although a data race can still cause a crash there when native code is involved.

On Nov 19, 2024, at 5:50 AM, Lin Lin <linsite...@gmail.com> wrote:



Ian Lance Taylor

Nov 19, 2024, 9:46:54 AM
to Lin Lin, Kurtis Rader, golang-nuts
On Tue, Nov 19, 2024 at 3:50 AM Lin Lin <linsite...@gmail.com> wrote:
>
> Thanks for your kind and prompt reply.
>
> Allow me to explain myself a bit more.
>
> My code has two kinds of data races. The first is a global struct without any pointer members being written and read by multiple goroutines. The second is a struct's string member being written and read by multiple goroutines. The data race report shows no other unsafe or cgo usage.
>
> As far as I know, those kinds of data races may lead to an inconsistent string or struct value within the race time window. How can that inconsistency lead to a marked free bit in runtime.mspan? This really puzzles me.
> Or maybe I went in the wrong direction and this could be a hardware issue, like a memory bit flip? But I couldn't find any such issue reported in the Go community.

A data race in a string can confuse the garbage collector. A string
is a length and a pointer. If goroutine A updates the length in
parallel with goroutine B updating the pointer, then the garbage
collector can see a length and a pointer that do not correspond. If
the length is longer than the pointer expects, that can cause the GC
to mark some memory as allocated although it isn't. If that memory is
not possible to allocate, because it is part of the GC metadata, the
collector will crash. That is just one possible way to crash the
program with a race.

As a practical matter, it's a waste of time to try to reason about GC
behavior when you know that the program has a race. As Kurtis says,
fix the race first.

Ian

Lin Lin

Nov 19, 2024, 7:41:05 PM
to Ian Lance Taylor, Kurtis Rader, golang-nuts
Ian, thanks for your explanation, it really shed light on that for me. I certainly will fix the data race.

Thanks to all for your time.