On 11/14/2009 11:18 AM, Ian Lance Taylor wrote:
> Alex Kaushansky <kaush...@gmail.com> writes:
>
>> As I gather from the above discussion , the only justification for NOT
>> always allocating struct values in heap, java-style, is that
>> sometimes you want to save nanoseconds and allocate it in stack.
>
> I think those nanoseconds might tend to add up once you factor in the
> garbage collection costs.
If speed is your god, then even stacks can be slow, since they still map all the way back to system memory. Anything not in registers runs the risk of having horrendously slow access in the event of a cache miss. Relatively speaking, of course.
For specific routines on embedded / real-time systems needing hair-on-fire speed, I'd lock the cache for the stack and prevent writebacks until the need for speed has passed. This has resulted in 100x speed-ups with no other changes to the compiled code. The writeback prevention frees bus bandwidth for use by other threads/tasks/cores.
The general theme is to ensure "locality of access". Here pass-by-value can win by a huge margin: the item sits near the top of the stack (or in registers), so no dereferencing is needed, and there is no remote data that must also be kept in cache. Pointer access can easily cause cache thrashing. In many cases, when concurrent access from multiple threads is not required, it is beneficial to implement pass-by-reference as a pair of pass-by-value operations, copy-in and copy-back, rather than via direct access to the referenced data.
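As a rough sketch of what I mean (the struct and function names below are mine, purely illustrative), in Go:

package main

// Vec is a small illustrative struct; the names here are hypothetical,
// not from the original discussion.
type Vec struct{ x, y int }

// Pass-by-value: the struct is copied into the callee's frame (or into
// registers), so the data stays local and no dereference is needed.
func lengthSq(v Vec) int { return v.x*v.x + v.y*v.y }

// Pass-by-pointer: the callee reaches back through the pointer, possibly
// touching a cache line far from the current stack frame.
func scaleInPlace(v *Vec, k int) { v.x *= k; v.y *= k }

// Copy-in/copy-back: reference-like semantics emulated with two value
// copies, keeping all the work on the callee's own stack.
func scaled(v Vec, k int) Vec {
	v.x *= k
	v.y *= k
	return v // copied back to the caller
}

func main() {
	v := Vec{3, 4}
	_ = lengthSq(v)     // copy in
	scaleInPlace(&v, 2) // direct access through a pointer
	v = scaled(v, 2)    // copy-in, copy-back
}

The copy-in/copy-back version touches only the callee's own frame; the pointer version may drag a distant cache line in and dirty it.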
The downside is stack size. When that becomes an issue, the program needs to be restructured to manage stack use. Since most processors are highly optimized for fast stack access (well, access to at least the top stack frame), other mechanisms can be slower, even if they are also locked in cache.
Furthermore, if you want to avoid GC in Go, the current best way is to ensure your data is either on the stack or is main-global (I'm not absolutely sure whether Go allocates such globals on the heap or not). The heap can be bad when speed is everything.
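To illustrate (again, the names are mine, and the exact escape decisions are up to the compiler):

package main

type buf struct{ data [64]byte }

// Stays on the stack: the value never escapes this function.
func sumLocal() int {
	var b buf // no address escapes, so no heap allocation is required
	s := 0
	for _, v := range b.data {
		s += int(v)
	}
	return s
}

// Escapes to the heap: the returned pointer outlives the frame, so the
// compiler must heap-allocate b and the GC must then track it.
func newBuf() *buf {
	b := buf{}
	return &b
}

// Package-level ("main-global") data: allocated once and never collected
// during the program's lifetime, so it adds no per-call GC pressure.
var global buf

func main() {
	_ = sumLocal()
	_ = newBuf()
	_ = global
}

The gc toolchain should report which of these it decided to heap-allocate if you build with -gcflags=-m.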
snip
>> Because most of struct types have attached methods (via pointers), none of
>> them can be allocated in the stack. We drive java out through the door - it
>> comes back through the window.
>
> Fair enough, but I don't see why one need conclude that all struct
> values should be allocated on the heap. Allocating them on the stack
> when possible does work, and does not affect the language. The
> programmer normally doesn't have to care where a struct is allocated.
> The compiler is free to allocate it in the most efficient place.
Except when the programmer *does* care, which happens whenever profiling shows the compiler's decisions are sub-optimal. This is precisely why I hope Go will eventually provide a rich set of (possibly complex) memory and execution control features.
Once such controls exist, even the language toolchain can take advantage of them: self-directed program optimization via automated analysis of profiler runs, AKA "autotuning", would also be a HUGE help. Recent research suggests this is by far the best known way to automatically maximize performance on multicore systems (CACM, Vol. 52, No. 10, p. 56).
>> If you go to the root of the problem, it's this: garbage collection doesn't
>> live very well with explicit pointers. What do you think?
>
> This conclusion doesn't seem to follow. I don't see any difficulty in
> garbage collection and explicit pointers living together.
>
> What you may be driving at is that we might as well turn structs into
> a reference type, but I don't think that follows. A small struct like
> "type Point struct { x, y int}" often does not needs its address
> taken, and it can be efficiently passed by value. Why not simply pass
> it by value when the fields don't need to change? That is also an
> example where "func (p Point) ..." rather than "func (p *Point) ..."
> makes perfect sense.
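For concreteness, here is a minimal sketch of the two receiver forms Ian describes (the method names are mine):

package main

import "fmt"

type Point struct{ x, y int }

// Value receiver: the Point is copied; fine for a small struct the
// method does not need to modify.
func (p Point) Sum() int { return p.x + p.y }

// Pointer receiver: needed only when the method must modify the caller's
// value (or when the struct is large enough that copying is a concern).
func (p *Point) Move(dx, dy int) {
	p.x += dx
	p.y += dy
}

func main() {
	p := Point{1, 2}
	fmt.Println(p.Sum()) // works on a copy; p itself is untouched
	p.Move(3, 4)         // modifies p in place
	fmt.Println(p)       // {4 6}
}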
Clearly, there is no "one best way" that is optimal for all situations. Sometimes only the programmer will know which way is best for a given situation, and therefore the language and its toolchain must give the programmer the ability to force specific things onto the stack or the heap, or even into registers.
If all you need is "good enough" speed, and you are willing to give up the ability to occasionally obtain truly massive speedups, then leaving all decisions to the compiler simplifies the situation. But that could make Go unusable, or at least inefficient, for performance-critical applications.
At the bottom of the problem is the issue of keeping all cores fed while both instructions and data flow through a single shared system memory interface. The OS typically tries to manage this through various forms of load balancing, but it can only do so much if an application is actively thrashing the caches: one bad app on one core can limit memory access for all other cores. Conventional multicore architectures have an inherent performance limit as long as that single shared memory interface exists.
Modern parallel programming will, by necessity, move many code execution decisions away from the OS and into the language runtime. Go already does this by how it maps goroutines to threads. It is possible, perhaps likely, that Go's future thread-level load balancing optimizations will conflict with the OS's thread/task/core management optimizations. Should this become a chronic situation, it will be well beyond the ability of the programmer to manage it correctly for each application, at which point the distinction between the OS and the language runtime on multicore systems will need to be reassessed from a higher perspective.
Until that day comes, both the OS and Go will need to let the programmer manage the situation directly. Fortunately, all the modern OSs and RTOSs I've used provide such features, often through several overlapping mechanisms. Go will need to provide similar programmer control over its memory and scheduling behavior.
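It already exposes a few such knobs in package runtime; as a sketch of the sort of control I have in mind (scheduling only; memory placement would need something analogous):

package main

import "runtime"

func main() {
	// Cap the number of OS threads executing Go code simultaneously.
	// GOMAXPROCS returns the previous setting.
	prev := runtime.GOMAXPROCS(2)
	_ = prev

	done := make(chan struct{})
	go func() {
		// Pin this goroutine to its current OS thread, e.g. so the OS's
		// own affinity/priority controls can then be applied to it.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		// ... latency-sensitive work would go here ...
		close(done)
	}()
	<-done
}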
Architecturally, Go may want to expose its mechanisms via something that looks like an OS extension, rather than as a Go-specific feature. Then a patch submitted to Linus could permit Linux to quickly become "multicore language aware", much as Linux has recently become virtualization-aware.
-BobC