Hello everyone,
I'd like to get feedback on an idea for changing how Go manages goroutine stack growth.
Below is a short draft of the proposal.
## Current State
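Currently, almost every Go function prologue includes a stack growth check: it compares the stack pointer against the goroutine's stack guard (`g.stackguard0`) and calls `runtime.morestack` if the new frame might not fit.
If the remaining stack space is insufficient, the runtime allocates a new stack twice the size, copies the old stack into it, and adjusts pointers that point into the stack.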
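To make the current scheme concrete, here is a toy model of the prologue check and the copy-on-growth step. It only simulates addresses as integers and never touches real memory. `toyStack`, `call` and `grow` are hypothetical names; the real runtime logic (`runtime.morestack`, `newstack`, `copystack`) handles many more cases.

```go
package main

import "fmt"

// toyStack models a contiguous, downward-growing stack occupying the
// address range [lo, hi). Addresses are plain ints; nothing here touches
// real memory. All names are hypothetical.
type toyStack struct {
	lo, hi int   // current stack bounds
	sp     int   // stack pointer; frames are pushed toward lo
	ptrs   []int // addresses of locals that something points to
	next   int   // next unused address for "allocating" a new stack
}

// call models a function prologue: if the new frame would cross lo,
// grow the stack (possibly several times), then push the frame.
// It reports whether the stack had to grow.
func (s *toyStack) call(frame int) bool {
	grew := false
	for s.sp-frame < s.lo {
		s.grow()
		grew = true
	}
	s.sp -= frame
	return grew
}

// grow allocates a stack twice as large at a new address, conceptually
// copies the used part to the top of it, and shifts sp and every pointer
// into the old stack by the distance the stack moved.
func (s *toyStack) grow() {
	newLo := s.next
	newHi := newLo + 2*(s.hi-s.lo)
	s.next = newHi
	delta := newHi - s.hi
	for i := range s.ptrs {
		s.ptrs[i] += delta
	}
	s.sp += delta
	s.lo, s.hi = newLo, newHi
}

func main() {
	s := &toyStack{lo: 0, hi: 2048, sp: 2048, next: 1 << 20}
	s.call(512)
	s.ptrs = append(s.ptrs, s.sp+8) // a pointer to a local in the first frame

	grew := s.call(1024)
	fmt.Println("grew:", grew, "size:", s.hi-s.lo) // grew: false size: 2048
	grew = s.call(1024)
	fmt.Println("grew:", grew, "size:", s.hi-s.lo) // grew: true size: 4096
	// The pointer keeps the same distance from the top of the stack.
	fmt.Println("offset from top:", s.hi-s.ptrs[0]) // offset from top: 504
}
```

Every call pays for the comparison in `call`, even though `grow` runs rarely; that per-call cost is what the proposal aims to remove.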
**Drawbacks of this approach:**
* Increased CPU usage due to frequent stack size checks and possible reallocations
* Larger code size because of the additional prologue instructions
## Proposed Stack Management Mechanism
I would like to hear your opinion on the following stack growth mechanism and whether it's worth exploring further.
If you think the idea has potential, I'll continue by estimating its effect on CPU usage and code size and, if the estimates look good enough, build a proof of concept.
### Reallocation via Page Faults
The idea is inspired by how Linux manages system thread stacks.
In Linux, each thread reserves (by default) 8 MB of virtual memory for its stack. Physical memory is mapped lazily: a page is allocated only when the thread first touches it, via a page fault.
When the stack limit is reached, the thread runs into a guard area and the process is terminated with SIGSEGV.
In Go, however, instead of aborting, we could reuse the existing stack growth logic: when a page fault occurs near the stack boundary, relocate the stack to a larger chunk.
**Potential drawbacks**
* The Go runtime would need to handle page faults:
  * This might increase the number of page faults and add handling overhead
  * It could be tricky to distinguish between stack-related and unrelated page faults
* A large number of goroutines will consume a large amount of virtual address space
* The minimum stack size would effectively increase from 2 KB to 4 KB (one physical page). In the worst case, when all goroutines use less than 2 KB of stack, this would double stack memory consumption
* This mechanism would depend on OS-level signal handling and may require platform-specific implementations
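On telling stack faults from unrelated ones: one possible approach is to look up the faulting address in a sorted table of stack reservations and accept it only if it falls in a guard region. The sketch below assumes a hypothetical `stackReservation` type with the guard at the low end of each range. Since the signal handler runs on the faulting thread and the runtime knows which goroutine was running there, a plain bounds check against that goroutine's own stack may be enough in practice; the table lookup shows the general case.

```go
package main

import (
	"fmt"
	"sort"
)

// stackReservation describes one goroutine's reserved stack range
// [lo, hi) with a guard region of guard bytes at its low end.
// Hypothetical type, for illustration only.
type stackReservation struct {
	lo, hi, guard uintptr
}

// classifyFault reports which reservation, if any, has addr inside its
// guard region. rs must be sorted by lo and non-overlapping, so a binary
// search finds the only candidate in O(log n).
func classifyFault(rs []stackReservation, addr uintptr) (int, bool) {
	// First reservation whose end lies above addr.
	i := sort.Search(len(rs), func(i int) bool { return rs[i].hi > addr })
	if i < len(rs) && addr >= rs[i].lo && addr < rs[i].lo+rs[i].guard {
		return i, true
	}
	return -1, false
}

func main() {
	const mib = 1 << 20
	rs := []stackReservation{
		{lo: 0x10000000, hi: 0x10000000 + 8*mib, guard: 4096},
		{lo: 0x20000000, hi: 0x20000000 + 8*mib, guard: 4096},
	}
	for _, addr := range []uintptr{0x20000010, 0x20001000, 0x30000000} {
		i, ok := classifyFault(rs, addr)
		fmt.Printf("%#x: stack-guard fault=%v (index %d)\n", addr, ok, i)
	}
}
```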
The main concern, as I see it, is the increased use of virtual address space.
A rough estimate:
100k goroutines with 8 MB stacks each would reserve ~800 GB (= 10^5 * 2^23 B ~ 8.4 * 10^11 B ~ 2^39.6 B), i.e., about 1/335 of a 2^48-byte virtual address space (or about 1/170 of the 2^47 bytes available to user space on x86-64 Linux).
This seems acceptable, especially since we can reserve less than 8 MB.
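The estimate is easy to redo for smaller reservations. A small calculation, assuming 100k goroutines and a 47-bit user address space (x86-64 Linux with 4-level page tables); `reservedShare` is a hypothetical helper:

```go
package main

import "fmt"

// reservedShare returns the virtual address space reserved by n goroutines
// that each reserve perStack bytes, and that total as a fraction of a
// 2^vaBits-byte address space.
func reservedShare(n, perStack uint64, vaBits uint) (total uint64, share float64) {
	total = n * perStack
	return total, float64(total) / float64(uint64(1)<<vaBits)
}

func main() {
	// 47 bits: the user-space half of x86-64's 48-bit address space on Linux.
	for _, perStack := range []uint64{8 << 20, 1 << 20, 256 << 10} {
		total, share := reservedShare(100_000, perStack, 47)
		fmt.Printf("%5d KiB per stack: %4d GiB total, %.3f%% of 2^47 bytes\n",
			perStack>>10, total>>30, share*100)
	}
}
```

Because the share scales linearly with both the goroutine count and the per-stack reservation, a program with 1M goroutines would need a reservation well below 8 MB to stay comfortably inside the address space.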
The second concern is the larger minimum stack size (4 KB vs 2 KB). This could double memory consumption in the worst case.
I'm not yet sure whether this trade-off would be acceptable or if it can be mitigated.
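A toy model of committed stack memory per goroutine helps bound the worst case. It assumes that today's stacks start at 2 KB and double until the peak usage fits, and that proposed stacks commit only the 4 KB pages actually touched; it ignores stack shrinking, guard areas and runtime bookkeeping, and the function names are hypothetical. Under these assumptions the 2x penalty applies to goroutines that stay below 2 KB, while for larger stacks page granularity can commit less than a power-of-two stack (12 KB instead of 16 KB for 9000 bytes of peak usage).

```go
package main

import "fmt"

// currentCommit approximates today's scheme: a stack starts at 2 KB and
// doubles until the peak usage fits; the whole stack is backed by memory.
func currentCommit(used int) int {
	size := 2048
	for size < used {
		size *= 2
	}
	return size
}

// proposedCommit approximates the proposal: only pages the goroutine
// actually touched are backed, with a minimum of one page.
func proposedCommit(used, page int) int {
	if used < 1 {
		used = 1
	}
	return (used + page - 1) / page * page // round up to whole pages
}

func main() {
	const page = 4096
	for _, used := range []int{500, 2048, 5000, 9000, 40000} {
		fmt.Printf("peak %5d B: current %5d B, proposed %5d B\n",
			used, currentCommit(used), proposedCommit(used, page))
	}
}
```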
Cross-platform support is also a major concern.
## Additional notes
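* The current implementation shrinks stacks when less than 1/4 of the stack is in use. With this mechanism, I think shrinking could be done by releasing the unused pages with `madvise(MADV_DONTNEED)`.
* The stack growth check also serves as a cooperative preemption point: the scheduler requests preemption by setting `g.stackguard0` to `stackPreempt`, so the goroutine enters `morestack` at its next function call. Removing the checks would remove this mechanism and might indirectly affect the scheduler. However, since Go 1.14 the runtime also has signal-based asynchronous preemption, so this may not be a major issue.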
## Conclusion
What do you think about this idea?
Is this direction worth exploring further, i.e., producing concrete performance estimates and a PoC?
Thank you for your time and feedback!