| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Commit-Queue | +1 |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
memclrNoHeapPointers(x, size)t hepuddsI suppose the comment was meant to say "even if span.needzero is _not_ set", but more importantly I don't follow why we need to zero out unconditionally - especially since above (line 1560) we only do it if needzero is set.
why we need to zero out unconditionally
Hi Carlo, thanks for taking a look! We need to zero out unconditionally here because the heap object was previously live and likely has non-zero user data, which we would not want to hand back to the user again without zeroing it.
For the pre-existing mallocgc code, I think there are two maybe distinct concepts:
1. the span tracks whether zeroing can be skipped (for example, because it is fresh memory from the OS that can be treated as zeroed already, so no need for the malloc code to zero it on the normal allocation path)
2. a mallocgc caller can say that they are taking responsibility for overwriting any bytes that will be visible to the user, so no need for the malloc code to zero out bytes that will be overwritten before ever being visible to user code.
For case 1, for this code on the reuse path -- because it was a previously live object, the _span_ might be saying a normal allocation doesn't need to be zeroed, but for a reused _object_ in such a span, we can't rely on that -- the memory for that object was already in the user's hands, regardless of what the rest of the span might look like.
For case 2, for this code on the reuse path -- the code you commented upon on line 1689 is in the scan case (that is, the heap object contains pointers), and case 2 in general only applies for the noscan case. (The code above on line 1560 at the time was for the noscan case).
And you are correct that I was missing a word in the original comment.
All that said, maybe I made some mistake, and I'm very happy to have more eyes on this, so thanks again!
Finally, some of the code you asked about is now in CL 698515.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
// some of which are very expensive and would never be on by default.this is already true of doubleCheckMalloc, I think reusing doubleCheckMalloc instead of a new one is fine.
if !c.hasReusableNoscan(spc) {I think I would prefer this inverted. so, rather than moving around the old code, we make this the special path. it might also be worthwhile to put all that new stuff into its own function; I am slightly worried about making this function too big and have icache problems in the no reuse case.
with reuse we're already winning. IMO it's OK if we incur the cost of an extra function call.
const (these debug constants are pretty granular. I understand wanting to avoid log spam (so debugReusableLog seems fine), but maybe the others can be grouped together?
// TODO(thepudds): no-op for now for scan. Implemented in later CL in stack.throw here maybe, for now?
//go:linkname strings_freeSized strings.runtimefreesized
func strings_freeSized(p unsafe.Pointer, size uintptr, noscan bool) bool {
return freeSized(p, size, noscan)
}I would leave this out of this CL. I'm not yet totally sold on having stdlib call this API directly.
// Clear reusable slots.
// TODO(thepudds): is this needed? Probably not, probably allocated zeroed, but confirm.
for i := range c.reusableNoscan {
c.reusableNoscan[i] = 0
}you can drop this
for i := range c.reusableNoscan {
// For noscan objects, the nodes of the linked lists are the resusable heap objects themselves,
// so we can simply clear the linked list head pointers.
c.reusableNoscan[i] = 0
}nit: `clear(c.reusableNoscan[:])`
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Commit-Queue | +1 |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Commit-Queue | +1 |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
// some of which are very expensive and would never be on by default.this is already true of doubleCheckMalloc, I think reusing doubleCheckMalloc instead of a new one is fine.
Having this `doubleCheckMallocExtended` here was pretty handy when doing initial runtime.free development because it made it more convenient to flip on the ~4 function-local `doubleCheck` consts in mbitmap.go from one spot here.
That said, happy to delete `doubleCheckMallocExtended ` now. (I left a parenthetical to remind myself and maybe other external contributors about the existence of the other ones.)
if !c.hasReusableNoscan(spc) {I think I would prefer this inverted. so, rather than moving around the old code, we make this the special path. it might also be worthwhile to put all that new stuff into its own function; I am slightly worried about making this function too big and have icache problems in the no reuse case.
with reuse we're already winning. IMO it's OK if we incur the cost of an extra function call.
Inverted as suggested. For pulling into a new function, for now I left that as a TODO and will leave this Gerrit comment unresolved for now.
const (these debug constants are pretty granular. I understand wanting to avoid log spam (so debugReusableLog seems fine), but maybe the others can be grouped together?
Done
// TODO(thepudds): no-op for now for scan. Implemented in later CL in stack.throw here maybe, for now?
Done
//go:linkname strings_freeSized strings.runtimefreesized
func strings_freeSized(p unsafe.Pointer, size uintptr, noscan bool) bool {
return freeSized(p, size, noscan)
}I would leave this out of this CL. I'm not yet totally sold on having stdlib call this API directly.
Done
// Clear reusable slots.
// TODO(thepudds): is this needed? Probably not, probably allocated zeroed, but confirm.
for i := range c.reusableNoscan {
c.reusableNoscan[i] = 0
}t hepuddsyou can drop this
Done
for i := range c.reusableNoscan {
// For noscan objects, the nodes of the linked lists are the resusable heap objects themselves,
// so we can simply clear the linked list head pointers.
c.reusableNoscan[i] = 0
}t hepuddsnit: `clear(c.reusableNoscan[:])`
Done
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Commit-Queue | +1 |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Commit-Queue | +1 |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
if !c.hasReusableNoscan(spc) {t hepuddsI think I would prefer this inverted. so, rather than moving around the old code, we make this the special path. it might also be worthwhile to put all that new stuff into its own function; I am slightly worried about making this function too big and have icache problems in the no reuse case.
with reuse we're already winning. IMO it's OK if we incur the cost of an extra function call.
Inverted as suggested. For pulling into a new function, for now I left that as a TODO and will leave this Gerrit comment unresolved for now.
Pulling out a function is now done.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
if c.hasReusableNoscan(spc) {is this used frequently in follow-up CLs? if not, I think inlining the function here can be helpful to readers in understanding that this is only used if the GOEXPERIMENT is enabled.
x := mallocgcSmallNoscanReuse(c, span, spc, size, needzero)I think we also need to add this to the mallocgc stubs for size-specialized malloc. right now, this will never fire for allocations that skip the mallocgc entrypoint and go straight to a size-specialized malloc.
// TODO(thepudds): confirm / test that is correct and the math works out, including when there's internal fragmentation.maybe a simple way to check this is to make sure that repeated alloc/free of the same size class results in a net zero change to the G's assist credit. this should be doable from the runtime tests if you add an export_test function to get this value from the current G.
this is optional though, just leaving this here since the TODO suggest you might be worried about it. :) I think the logic here is right, but you're right that it is a little far away from the deductAssistCredit call.
// See publicationBarrier comment in mallocgcSmallNoscan.it just occurred to me -- if the caller zeroes the memory, is *it* executing a publication barrier? if not, it may be possible to observe non-zero memory during a race.
I think this is *probably* not a real problem, and definitely not for fixing in this CL.
// Note that we do not update span.freeIndexForScan, profiling info,this is slightly concerning to me in how it will affect the resulting profile. we may need to process/detach any profiling specials on free and resample on new ones to get an accurate result. unfortunately, detaching the profiling special would likely be very expensive, currently.
we really need a better specials system, and for a number of reasons, not just this.
I think this current approach is fine for a disabled-by-default GOEXPERIMENT, but I think we need to resolve this before switching the default.
please leave a comment/TODO about this here.
// freeSized must be called by the effective owner of ptr who knowsp? or change the argument name to ptr.
// be used past that moment. The intended callers aremaybe "In other words, ptr must be the last and only pointer to its referent."
// be used past that moment. The intended callers are
// a limited number of places in the standard library (e.g., strings.Builder)
// or the runtime (e.g., maps grow/split code, append-handling
// code). In the future, the compiler could also insert direct freeSized calls,
// though currently a related API freeTracked serves that purpose.I think you can just say "the intended caller is the compiler" for now. we can expand this as new callsites are added.
// freeSized is a no-op if called on a stack pointer. It must not be called"pointer into a stack" maybe?
// If the size of ptr's object is less than or equal to 16 bytes or
// greater than 32KiB bytes, freeSized is currently a no-op. It must only
// be called in alloc-safe places.maybe also say it currently crashes if noscan is false?
// This pointer is on the stack, so free is a no-op."points into our stack" maybe?
// TODO(thepudds): we could enforce no free on globals in bss or data. Maybe by checking span via spanOfhigh-level comment: we generally don't care about line length, except this is frequently causing wrap-around in Gerrit for me which makes it a bit harder to review. :) if you could shrink the line length a bit, I would appreciate it.
// TODO(thepudds): is something like this KeepAlive needed? For freeSized, we want
// the pointer to be alive, though I don't know if we need the KeepAlive here.
// (For freeTracked, we don't keep the pointer alive).
KeepAlive(p)I think it is not needed. if it is dead, that's OK. the important part is a GC transition cannot occur while we're in this function, since we're non-preemptible. I think you can remove this and add a comment to that effect.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
x := mallocgcSmallNoscanReuse(c, span, spc, size, needzero)I think we also need to add this to the mallocgc stubs for size-specialized malloc. right now, this will never fire for allocations that skip the mallocgc entrypoint and go straight to a size-specialized malloc.
oops! I see you do this in a follow-up.
// no allocs get reported. (Again, not the desired long-term behavior).ah, we should add a TODO to update alloc/free stats, which currently don't move. I don't remember seeing one, so putting a reminder in freeSize and in the Reuse functions would be helpful.
// The remaining tests below here are more expensive currently.how long are these tests? maybe skip if testing.Short()?
// TODO(thepudds): confirm / test that is correct and the math works out, including when there's internal fragmentation.maybe a simple way to check this is to make sure that repeated alloc/free of the same size class results in a net zero change to the G's assist credit. this should be doable from the runtime tests if you add an export_test function to get this value from the current G.
this is optional though, just leaving this here since the TODO suggest you might be worried about it. :) I think the logic here is right, but you're right that it is a little far away from the deductAssistCredit call.
actually yeah, after reviewing the follow-up CL, I recalled that the update to the assist credit is actually split inside mallocgc. I think a test here is a great idea. we have 3 updates, and none of them are close to each other.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Commit-Queue | +1 |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
is this used frequently in follow-up CLs? if not, I think inlining the function here can be helpful to readers in understanding that this is only used if the GOEXPERIMENT is enabled.
Above, I added a new `runtimeFreegcEnabled` const, which follows the naming convention of the new-ish `sizeSpecializedMallocEnabled` const.
Here, I added a a check against that `runtimeFreegcEnabled`, which I think does help a reader know this branch is less interesting if runtime.freegc support is not enabled.
(I did not inline the whole function because `hasReusableNoscan` and the `hasReusable*` variants introduced in a later CL for !noscan support have changed a few times, and it also helps minimize how much needs to be duplicated into the malloc_stubs.go in a later CL. Hope that makes sense to you; happy to revisit if not).
// TODO(thepudds): confirm / test that is correct and the math works out, including when there's internal fragmentation.Michael Knyszekmaybe a simple way to check this is to make sure that repeated alloc/free of the same size class results in a net zero change to the G's assist credit. this should be doable from the runtime tests if you add an export_test function to get this value from the current G.
this is optional though, just leaving this here since the TODO suggest you might be worried about it. :) I think the logic here is right, but you're right that it is a little far away from the deductAssistCredit call.
actually yeah, after reviewing the follow-up CL, I recalled that the update to the assist credit is actually split inside mallocgc. I think a test here is a great idea. we have 3 updates, and none of them are close to each other.
That was a great suggestion to add that test. I was in fact missing a conditional check in the allocation credit handling here, which I've resolved now here (hopefully).
Making the test work was a little subtle and more complicated than simply doing some bracketed measurements, so I sent it as a separate CL 717520 to make it easier to review/discuss, currently immediately following this one in the stack. (And maybe you will point out a simpler testing approach, and then maybe it won't be subtle any longer).
// Note that we do not update span.freeIndexForScan, profiling info,this is slightly concerning to me in how it will affect the resulting profile. we may need to process/detach any profiling specials on free and resample on new ones to get an accurate result. unfortunately, detaching the profiling special would likely be very expensive, currently.
we really need a better specials system, and for a number of reasons, not just this.
I think this current approach is fine for a disabled-by-default GOEXPERIMENT, but I think we need to resolve this before switching the default.
please leave a comment/TODO about this here.
Done
// freeSized must be called by the effective owner of ptr who knowsp? or change the argument name to ptr.
Renamed to `ptr`.
// be used past that moment. The intended callers aremaybe "In other words, ptr must be the last and only pointer to its referent."
Done
// be used past that moment. The intended callers are
// a limited number of places in the standard library (e.g., strings.Builder)
// or the runtime (e.g., maps grow/split code, append-handling
// code). In the future, the compiler could also insert direct freeSized calls,
// though currently a related API freeTracked serves that purpose.t hepuddsI think you can just say "the intended caller is the compiler" for now. we can expand this as new callsites are added.
Done
// freeSized is a no-op if called on a stack pointer. It must not be called"pointer into a stack" maybe?
Done
// If the size of ptr's object is less than or equal to 16 bytes or
// greater than 32KiB bytes, freeSized is currently a no-op. It must only
// be called in alloc-safe places.maybe also say it currently crashes if noscan is false?
Done
// This pointer is on the stack, so free is a no-op."points into our stack" maybe?
Done
// TODO(thepudds): we could enforce no free on globals in bss or data. Maybe by checking span via spanOfhigh-level comment: we generally don't care about line length, except this is frequently causing wrap-around in Gerrit for me which makes it a bit harder to review. :) if you could shrink the line length a bit, I would appreciate it.
I wrapped this better, and took a mental note for the future.
(I did not revisit other places, but happy to do that if you were noticing a general issue or maybe this specific spot was just the worst offender).
FWIW, I tend to go wider than my norm when writing a TODO (to make more context pop out when grepping, etc.), but went too wide here.
// TODO(thepudds): is something like this KeepAlive needed? For freeSized, we want
// the pointer to be alive, though I don't know if we need the KeepAlive here.
// (For freeTracked, we don't keep the pointer alive).
KeepAlive(p)t hepuddsI think it is not needed. if it is dead, that's OK. the important part is a GC transition cannot occur while we're in this function, since we're non-preemptible. I think you can remove this and add a comment to that effect.
Done
// no allocs get reported. (Again, not the desired long-term behavior).ah, we should add a TODO to update alloc/free stats, which currently don't move. I don't remember seeing one, so putting a reminder in freeSize and in the Reuse functions would be helpful.
Related TODO about alloc/free stats added in mallocgcSmallNoscanReuse.
// The remaining tests below here are more expensive currently.how long are these tests? maybe skip if testing.Short()?
WASM was particularly slow, and I suspect especially on the tests that create a bunch of goroutines, which are below here (but I did not investigate WASM much).
Setting aside WASM, I did tune the tests to be faster and to do less work when `-short` is specified, but maybe I should do another pass. Or maybe I could leave a TODO for now for this CL to tune execution time a bit more?
The current numbers:
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
// The remaining tests below here are more expensive currently.t hepuddshow long are these tests? maybe skip if testing.Short()?
WASM was particularly slow, and I suspect especially on the tests that create a bunch of goroutines, which are below here (but I did not investigate WASM much).
Setting aside WASM, I did tune the tests to be faster and to do less work when `-short` is specified, but maybe I should do another pass. Or maybe I could leave a TODO for now for this CL to tune execution time a bit more?
The current numbers:
- `gotip test -short -run=TestFreegc runtime` takes ~0.6 secs on a decent Linux VM.
- `gotip test -run=TestFreegc runtime` (no -short) takes ~6.5 sec on that same VM, which is probably too long.
Hi @mkny...@google.com, marking this as resolved after I tuned the test execution time a little more.
Latest numbers for `go test -run=TestFreegc` are:
Hopefully that's reasonable, at least for now (though of course, like everything here, happy to discuss more or tweak further if you have a different take).
I could also add a TODO to tune more.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |