| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Look at the change description of CL 766980 for the results of those
IIUC, these results compare tiny allocations with specialized malloc today vs consolidated specialized malloc (this CL).
How do they compare against no specialized malloc (GOEXPERIMENT=nosizespecializedmalloc)? I assume that they are still faster?
if i < 16 {
See comment in malloc_table_generated.go. I think we still want to emit panic for size 0?
func mallocgcTinySC2(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
It's nice that we can easily see the actual diff here 😄
if size&7 == 0 {
off = alignUp(off, 8)
} else if goarch.PtrSize == 4 && size == 12 {
off = alignUp(off, 8)
} else if size&3 == 0 {
off = alignUp(off, 4)
} else if size&1 == 0 {
off = alignUp(off, 2)
}
I'd assume that the only real difference in microbenchmarks is that previously these branches would be eliminated, leaving exactly the right alignUp call.
The other uses below go from add/compare of register and immediate to register and register, which I'd guess is negligible.
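To make the point about eliminated branches concrete, here is a minimal standalone sketch, not the runtime's actual code. `tinyAlign` and `tinyAlignSize6` are hypothetical names, and the `ptrSize` parameter stands in for `goarch.PtrSize` so the example compiles outside the runtime. With a run-time `size` the whole chain is evaluated; when `size` is a compile-time constant (6 here), the chain can fold down to a single `alignUp` call.

```go
package main

import "fmt"

// alignUp rounds n up to a multiple of a, which must be a power of two.
func alignUp(n, a uintptr) uintptr {
	return (n + a - 1) &^ (a - 1)
}

// tinyAlign (hypothetical name) mirrors the branch chain quoted above
// for a size that is only known at run time. ptrSize is an assumption
// standing in for goarch.PtrSize.
func tinyAlign(off, size, ptrSize uintptr) uintptr {
	if size&7 == 0 {
		off = alignUp(off, 8)
	} else if ptrSize == 4 && size == 12 {
		off = alignUp(off, 8)
	} else if size&3 == 0 {
		off = alignUp(off, 4)
	} else if size&1 == 0 {
		off = alignUp(off, 2)
	}
	return off
}

// tinyAlignSize6 (hypothetical name) is what the chain reduces to when
// size is the constant 6: only the size&1 == 0 branch can be taken.
func tinyAlignSize6(off uintptr) uintptr {
	return alignUp(off, 2)
}

func main() {
	for _, off := range []uintptr{0, 1, 5, 9} {
		fmt.Println(off, tinyAlign(off, 6, 8), tinyAlignSize6(off))
	}
}
```

Both functions return the same offsets for size 6; the consolidated form simply pays for the comparisons that the per-size form had folded away.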
function was slower.
Michael Matloob
trailing whitespace
Done
See comment in malloc_table_generated.go. I think we still want to emit panic for size 0?
Done
mallocPanic,
Why did this go away?
Look at the change description of CL 766980 for the results of those
IIUC, these results compare tiny allocations with specialized malloc today vs consolidated specialized malloc (this CL).
How do they compare against no specialized malloc (GOEXPERIMENT=nosizespecializedmalloc)? I assume that they are still faster?
I need to look at this more. They are faster when invoked directly, but not anymore through the jump table. I'm going to see if I can skip the jump table in the tiny case.
Okay, I was able to make a change to them so that they are now faster:
```
goos: linux
goarch: amd64
pkg: runtime
cpu: AMD Ryzen 9 9950X3D 16-Core Processor
│ base │ sizespecializedmalloc │
│ sec/op │ sec/op vs base │
1 4.029n ± 2% 3.736n ± 1% -7.27% (p=0.000 n=10)
2 4.048n ± 1% 3.595n ± 1% -11.17% (p=0.000 n=10)
3 4.352n ± 1% 4.062n ± 1% -6.64% (p=0.000 n=10)
4 4.485n ± 1% 4.277n ± 1% -4.64% (p=0.000 n=10)
5 4.969n ± 1% 4.715n ± 1% -5.10% (p=0.000 n=10)
6 5.800n ± 1% 5.234n ± 0% -9.77% (p=0.000 n=10)
7 5.729n ± 1% 5.328n ± 0% -7.00% (p=0.000 n=10)
8 5.390n ± 2% 5.173n ± 1% -4.02% (p=0.000 n=10)
9 7.328n ± 1% 7.258n ± 1% -0.95% (p=0.015 n=10)
10 7.362n ± 2% 7.125n ± 0% -3.22% (p=0.000 n=10)
11 7.335n ± 1% 7.223n ± 1% -1.51% (p=0.001 n=10)
12 7.068n ± 1% 7.072n ± 1% ~ (p=0.529 n=10)
13 7.363n ± 2% 7.218n ± 1% -1.96% (p=0.001 n=10)
14 7.347n ± 2% 7.082n ± 1% -3.61% (p=0.000 n=10)
15 7.365n ± 1% 7.199n ± 1% -2.26% (p=0.002 n=10)
geomean 5.843n 5.571n -4.66%
```
I also checked the larger sizes to make sure that they weren't slower either, since we made changes before the jump table:
```
goos: linux
goarch: amd64
pkg: runtime
cpu: AMD Ryzen 9 9950X3D 16-Core Processor
│ base │ sizespecializedmalloc │
│ sec/op │ sec/op vs base │
1 4.029n ± 2% 3.736n ± 1% -7.27% (p=0.000 n=10)
2 4.048n ± 1% 3.595n ± 1% -11.17% (p=0.000 n=10)
3 4.352n ± 1% 4.062n ± 1% -6.64% (p=0.000 n=10)
4 4.485n ± 1% 4.277n ± 1% -4.64% (p=0.000 n=10)
5 4.969n ± 1% 4.715n ± 1% -5.10% (p=0.000 n=10)
6 5.800n ± 1% 5.234n ± 0% -9.77% (p=0.000 n=10)
7 5.729n ± 1% 5.328n ± 0% -7.00% (p=0.000 n=10)
8 5.390n ± 2% 5.173n ± 1% -4.02% (p=0.000 n=10)
9 7.328n ± 1% 7.258n ± 1% -0.95% (p=0.015 n=10)
10 7.362n ± 2% 7.125n ± 0% -3.22% (p=0.000 n=10)
11 7.335n ± 1% 7.223n ± 1% -1.51% (p=0.001 n=10)
12 7.068n ± 1% 7.072n ± 1% ~ (p=0.529 n=10)
13 7.363n ± 2% 7.218n ± 1% -1.96% (p=0.001 n=10)
14 7.347n ± 2% 7.082n ± 1% -3.61% (p=0.000 n=10)
15 7.365n ± 1% 7.199n ± 1% -2.26% (p=0.002 n=10)
geomean 5.843n 5.571n -4.66%
```
| Code-Review | +1 |
runtime: consolidate tiny sizespecializedmalloc functions

In the sizespecializedmalloc goexperiment, we specialized the tiny
function per tiny size, so there was a different allocation function
for each size from 1 to 15. This created a lot of functions for a code
path that was not executed that often. In microbenchmarks comparing the
consolidated tiny function in this CL with the per-size functions, the
specialized functions could be up to 20% faster, but for 8-byte
allocations, which are almost certainly the most common, the per-size
function was slower.

Look at the change description of CL 766980 for the results of those
microbenchmarks. The CL also contains the code used to run the
benchmark.

Since we've noticed significant icache pressure from all the functions,
the tiny functions aren't used as much as the other ones, and the
benefits seem to be mixed, consolidate the 15 functions into a single
function.

This cuts the size of the mallocgc* functions by about 20%.

For #79286