Hello everyone,
Over the past year, I've encountered two strange issues with net.ListenTCP and listener.Accept. Without explicitly enabling reuseport, multiple service processes on the same machine, all searching for an available port starting from 9000, managed to successfully call listen on the same IP and port. At least, net.ListenTCP returned err == nil, and the error only appeared during listener.Accept. However, at the time, we weren't checking the returned error or printing the error message. Instead, whenever the returned conn was nil, we simply retried listener.Accept in a for-loop.
We've reproduced this issue twice within a year. The environment was a virtual machine allocated on a physical host with a Linux 5.4 kernel, and it was very difficult to reproduce. Our immediate fix was to add the error checking logic and print the specific error. While handling this issue, we also ran into the problem with netError.Temporary().
I completely agree with Ian's insight: "Whether an error is temporary depends on what you were doing at the time." For the specific case of listener.Accept(), even if netError.Temporary() returns true, retrying doesn't necessarily mean the service can remain available. Errors manifest in wildly different ways. In our specific flawed usage scenario, the service had already successfully registered with the name service, and other services had already discovered it and started sending requests. However, because the listen wasn't actually successful (the IP:port was held by another process), the result was persistent access failures.
But if we don't use Temporary(), asking developers to enumerate all possible temporary errors that can be retried isn't a very straightforward task. Could several categorical functions, similar to IsTimeout, be provided to allow developers to combine them freely? For example, something like if ne.IsTimeout() || ne.IsXXX() || ne.IsYYY().
On Nov 18, 2025, at 3:36 AM, 'Brian Candler' via golang-nuts <golan...@googlegroups.com> wrote:
When the problem occurs, I suggest you look at "ss -natp" ("netstat -natp" on older systems) and see if you really do have two listening sockets on the same port and address.
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/86d641cd-4503-4568-b491-f82b5fa705c9n%40googlegroups.com.
"Double bind race" refers to a scenario where multiple threads/CPUs attempt to bind() to the same (IP, port, proto) almost simultaneously. Due to a race condition window in the kernel when creating and inserting an inet_bind_bucket (port binding bucket), the following may occur:
Both threads may believe the port is available.
Both threads may create their own inet_bind_bucket.
The kernel might ultimately insert one bucket, but in an inconsistent state.
This can lead to one thread's bind operation failing with an unexpected error (e.g., not EADDRINUSE), or, in older versions, even result in a temporary "successful duplicate bind" (which theoretically should not happen).
This type of race condition is typically difficult to reproduce and requires a multi-core environment with near-instantaneous concurrent attempts to bind to the same port.
I don't know whether this kernel bug really exists, or whether the behavior is caused by a bug in the virtualization layer.
On Nov 18, 2025, at 7:55 AM, 'Brian Candler' via golang-nuts <golan...@googlegroups.com> wrote:
SO_REUSEADDR