internal/abi: set MaxPtrmaskBytes to 16
When MaxPtrmaskBytes was introduced in CL 9888, it was given the value 16.
That seemed like a good tradeoff between space in the binary and
the run-time cost of computing the GC bitmask from a GC program.

In CL 10815 MaxPtrmaskBytes was increased to 2048, because that CL
changed channel sending to use typeBitsBulkBarrier, which did not
support GC programs. The value 2048 was chosen to ensure that all
types up to 64K would use a GC bitmask, as channel element types are
limited to 64K.

In CL 616255 GC programs were removed and the GC bitmask,
if not precomputed, was instead generated from the type descriptor.
As part of this change the restriction on typeBitsBulkBarrier was removed.
Thus the requirement of setting MaxPtrmaskBytes to 2048 no longer applies.

This CL restores MaxPtrmaskBytes to the original value of 16.
For reference, in tailscaled this changes the number of types
requiring the GC bitmask to be computed at run time from 6 to 49.
This saves about 100 bytes in the executable, which I admit isn't much.
On the other hand, some of those precomputed bitmasks are never used,
such as the one generated for runtime.spanQueue.
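
For illustration only (not part of the CL): a minimal sketch of what the 16-byte
cutoff means. The ptrmask holds one bit per pointer-sized word, so 16 bytes of
mask cover 128 words of pointer data; the struct types below are made up.

package main

import (
	"fmt"
	"unsafe"
)

// small spans 2 pointer-sized words, both of them pointers, so its ptrmask
// needs only 2 bits and fits well under the 16-byte (128-word) limit;
// a precomputed mask can stay in the binary.
type small struct {
	p, q *int
}

// big spans 200 pointer-sized words of pointer data, so its ptrmask would
// need 200 bits (25 bytes). That exceeds 16 bytes, so with this change the
// runtime computes the mask on demand instead of storing it in the executable.
type big struct {
	ptrs [200]*int
}

func main() {
	word := unsafe.Sizeof(uintptr(0))
	fmt.Println("small:", unsafe.Sizeof(small{})/word, "words") // 2
	fmt.Println("big:  ", unsafe.Sizeof(big{})/word, "words")   // 200
}
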
diff --git a/src/internal/abi/type.go b/src/internal/abi/type.go
index e420ce2..8e008d8 100644
--- a/src/internal/abi/type.go
+++ b/src/internal/abi/type.go
@@ -871,31 +871,10 @@
// MaxPtrmaskBytes is the maximum length of a GC ptrmask bitmap,
// which holds 1-bit entries describing where pointers are in a given type.
-// Above this length, the GC information is recorded as a GC program,
-// which can express repetition compactly. In either form, the
-// information is used by the runtime to initialize the heap bitmap,
-// and for large types (like 128 or more words), they are roughly the
-// same speed. GC programs are never much larger and often more
-// compact. (If large arrays are involved, they can be arbitrarily
-// more compact.)
+// Above this length, the runtime computes the GC ptrmask bitmap as needed.
+// The information is used by the runtime to initialize the heap bitmap.
//
-// The cutoff must be large enough that any allocation large enough to
-// use a GC program is large enough that it does not share heap bitmap
-// bytes with any other objects, allowing the GC program execution to
-// assume an aligned start and not use atomic operations. In the current
-// runtime, this means all malloc size classes larger than the cutoff must
-// be multiples of four words. On 32-bit systems that's 16 bytes, and
-// all size classes >= 16 bytes are 16-byte aligned, so no real constraint.
-// On 64-bit systems, that's 32 bytes, and 32-byte alignment is guaranteed
-// for size classes >= 256 bytes. On a 64-bit system, 256 bytes allocated
-// is 32 pointers, the bits for which fit in 4 bytes. So MaxPtrmaskBytes
-// must be >= 4.
-//
-// We used to use 16 because the GC programs do have some constant overhead
-// to get started, and processing 128 pointers seems to be enough to
-// amortize that overhead well.
-//
-// To make sure that the runtime's chansend can call typeBitsBulkBarrier,
-// we raised the limit to 2048, so that even 32-bit systems are guaranteed to
-// use bitmaps for objects up to 64 kB in size.
-const MaxPtrmaskBytes = 2048
+// We use 16 because computing the GC ptrmask bitmap has some overhead
+// the first time the bitmap is required, and processing 128 pointers
+// seems to be enough to amortize that overhead well.
+const MaxPtrmaskBytes = 16
Ian Lance Taylor:

> // the first time the bitmap is required, and processing 128 pointers
> // seems to be enough to amortize that overhead well.

I don't know that this is true anymore.

looking at the code path, after first use, we're just adding an extra indirection and a couple predictable branches (https://cs.opensource.google/go/go/+/master:src/runtime/type.go;l=116?q=getGCMaskOn&ss=go%2Fgo), it may be that this actually scales down further (but I doubt we gain much by doing so).

either way, my point is I think we should just drop this part of the comment and we should state that 16 is arbitrary. or we could just measure it again. (for instance, we could simulate the inner loop of runtime.scanObject, specifically the iteration part, in a benchmark, and find the point where the extra cost disappears.)
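
A minimal sketch of the kind of microbenchmark suggested above (not from the
review; the names and mask contents are made up, and it stands in for the
lazy-mask indirection rather than the real runtime.scanObject loop):

package ptrmask

import (
	"sync/atomic"
	"testing"
)

// A stand-in for a precomputed 16-byte ptrmask stored in the binary.
var precomputed = [16]byte{
	0x55, 0xaa, 0x55, 0xaa, 0x55, 0xaa, 0x55, 0xaa,
	0x55, 0xaa, 0x55, 0xaa, 0x55, 0xaa, 0x55, 0xaa,
}

// lazyMask mimics a mask that is computed on first use and cached, so later
// readers pay one atomic load, a branch, and an extra indirection.
type lazyMask struct {
	mask atomic.Pointer[[16]byte]
}

func (l *lazyMask) get() *[16]byte {
	if m := l.mask.Load(); m != nil {
		return m // fast path after first use
	}
	m := new([16]byte)
	*m = precomputed // pretend this is the expensive derivation
	l.mask.Store(m)
	return m
}

var sink byte

func BenchmarkPrecomputedMask(b *testing.B) {
	var sum byte
	for i := 0; i < b.N; i++ {
		for _, x := range &precomputed {
			sum += x
		}
	}
	sink = sum
}

func BenchmarkLazyMask(b *testing.B) {
	var l lazyMask
	var sum byte
	for i := 0; i < b.N; i++ {
		m := l.get()
		for _, x := range m {
			sum += x
		}
	}
	sink = sum
}

Run with "go test -bench=Mask -run=^$". This only approximates the shape of
the cost; it says nothing definitive about the real GC scan loop.
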
| Commit-Queue | +1 |
Michael Knyszek:

Good point. And now I feel bad because I haven't done any benchmarking. But not enough to actually try to do a real benchmark. It seems likely that any effect is going to be marginal either way.

Anyhow I updated the comment to be more honest.
> It seems likely that any effect is going to be marginal either way.

agreed! benchmarking it to find the true cutoff is almost certainly overkill.
thanks for updating the comment.
| Code-Review | +2 |
Thanks.