Tobias Klauser would like Keith Randall and Ian Lance Taylor to review this change.
runtime: use MADV_FREE on Linux if available
On Linux, sysUnused currently uses madvise(MADV_DONTNEED) to signal the
kernel that a range of allocated memory contains unneeded data. After a
successful call, the range (but not the data it contained before the
call to madvise) is still available but the first access to that range
will unconditionally incur a page fault (needed to 0-fill the range).
A faster alternative is MADV_FREE, available since Linux 4.5. The
mechanism is very similar, but the page fault will only be incurred if
the kernel, between the call to madvise and the first access, decides to
reuse that memory for something else.
Test in runtime.osinit whether MADV_FREE is supported and fall back to
MADV_DONTNEED in case it isn't. This requires making the return value of
the madvise syscall available to the caller, so change runtime.madvise
to return it (but only actually return it on Linux).
Fixes #23687
Change-Id: I962c3429000dd9f4a00846461ad128b71201bb04
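
A minimal user-space sketch of the probe described above, assuming a Linux host. It is not the runtime code: it uses the standard syscall package instead of the runtime's internal mmap/madvise stubs, and the names madvFree and probeMadvFree are hypothetical. The value 0x8 is taken from the defs_linux_*.go hunks in this change.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// madvFree mirrors _MADV_FREE = 0x8 from the defs_linux_*.go hunks.
// It is defined locally (hypothetical name) rather than assumed to
// exist as a constant in the syscall package.
const madvFree = 0x8

// probeMadvFree maps one scratch page, asks the kernel to apply
// MADV_FREE to it, and reports whether the kernel accepted the advice.
func probeMadvFree() (bool, error) {
	page, err := syscall.Mmap(-1, 0, os.Getpagesize(),
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return false, err
	}
	defer syscall.Munmap(page)

	page[0] = 1 // touch the page so there is something to free

	// Kernels that do not know MADV_FREE reject the advice with an error.
	return syscall.Madvise(page, madvFree) == nil, nil
}

func main() {
	ok, err := probeMadvFree()
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	if ok {
		fmt.Println("MADV_FREE accepted")
	} else {
		fmt.Println("MADV_FREE rejected, would fall back to MADV_DONTNEED")
	}
}
```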
---
M src/runtime/defs_linux_386.go
M src/runtime/defs_linux_amd64.go
M src/runtime/defs_linux_arm.go
M src/runtime/defs_linux_arm64.go
M src/runtime/defs_linux_mips64x.go
M src/runtime/defs_linux_mipsx.go
M src/runtime/defs_linux_ppc64.go
M src/runtime/defs_linux_ppc64le.go
M src/runtime/defs_linux_s390x.go
M src/runtime/mem_linux.go
M src/runtime/os_linux.go
M src/runtime/stubs2.go
M src/runtime/sys_linux_386.s
M src/runtime/sys_linux_amd64.s
M src/runtime/sys_linux_arm.s
M src/runtime/sys_linux_arm64.s
M src/runtime/sys_linux_mips64x.s
M src/runtime/sys_linux_mipsx.s
M src/runtime/sys_linux_ppc64x.s
M src/runtime/sys_linux_s390x.s
20 files changed, 32 insertions(+), 11 deletions(-)
diff --git a/src/runtime/defs_linux_386.go b/src/runtime/defs_linux_386.go
index a7e435f..0ebac17 100644
--- a/src/runtime/defs_linux_386.go
+++ b/src/runtime/defs_linux_386.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_amd64.go b/src/runtime/defs_linux_amd64.go
index e8c6a21..c0a0ef0 100644
--- a/src/runtime/defs_linux_amd64.go
+++ b/src/runtime/defs_linux_amd64.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_arm.go b/src/runtime/defs_linux_arm.go
index 62ec8fa..43946bb 100644
--- a/src/runtime/defs_linux_arm.go
+++ b/src/runtime/defs_linux_arm.go
@@ -16,6 +16,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_arm64.go b/src/runtime/defs_linux_arm64.go
index c295bc0..c2cc281 100644
--- a/src/runtime/defs_linux_arm64.go
+++ b/src/runtime/defs_linux_arm64.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_mips64x.go b/src/runtime/defs_linux_mips64x.go
index df11cb0..9dacd5d 100644
--- a/src/runtime/defs_linux_mips64x.go
+++ b/src/runtime/defs_linux_mips64x.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_mipsx.go b/src/runtime/defs_linux_mipsx.go
index 702fbb5..9532ac5 100644
--- a/src/runtime/defs_linux_mipsx.go
+++ b/src/runtime/defs_linux_mipsx.go
@@ -22,6 +22,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_ppc64.go b/src/runtime/defs_linux_ppc64.go
index 45363d1..5a4326d 100644
--- a/src/runtime/defs_linux_ppc64.go
+++ b/src/runtime/defs_linux_ppc64.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_ppc64le.go b/src/runtime/defs_linux_ppc64le.go
index 45363d1..5a4326d 100644
--- a/src/runtime/defs_linux_ppc64le.go
+++ b/src/runtime/defs_linux_ppc64le.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_s390x.go b/src/runtime/defs_linux_s390x.go
index ab90723..a6cc9c4 100644
--- a/src/runtime/defs_linux_s390x.go
+++ b/src/runtime/defs_linux_s390x.go
@@ -19,6 +19,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/mem_linux.go b/src/runtime/mem_linux.go
index 7aa4817..70b331f 100644
--- a/src/runtime/mem_linux.go
+++ b/src/runtime/mem_linux.go
@@ -102,7 +102,7 @@
throw("unaligned sysUnused")
}
- madvise(v, n, _MADV_DONTNEED)
+ madvise(v, n, adviseUnused)
}
func sysUsed(v unsafe.Pointer, n uintptr) {
diff --git a/src/runtime/os_linux.go b/src/runtime/os_linux.go
index a04c995..910e52d 100644
--- a/src/runtime/os_linux.go
+++ b/src/runtime/os_linux.go
@@ -274,8 +274,19 @@
return i / 2
}
+var adviseUnused = int32(_MADV_DONTNEED)
+
func osinit() {
ncpu = getproccount()
+
+ // MADV_FREE was added in Linux 4.5, check if it is supported.
+ p, err := mmap(nil, physPageSize, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err == 0 {
+ if err = madvise(p, physPageSize, _MADV_FREE); err == 0 {
+ adviseUnused = _MADV_FREE
+ }
+ munmap(p, physPageSize)
+ }
}
var urandom_dev = []byte("/dev/urandom\x00")
diff --git a/src/runtime/stubs2.go b/src/runtime/stubs2.go
index 02249d0..793a3fc 100644
--- a/src/runtime/stubs2.go
+++ b/src/runtime/stubs2.go
@@ -25,7 +25,8 @@
//go:noescape
func open(name *byte, mode, perm int32) int32
-func madvise(addr unsafe.Pointer, n uintptr, flags int32)
+// return value is only set on linux to be used in osinit()
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) int
// exitThread terminates the current thread, writing *wait = 0 when
// the stack is safe to reclaim.
diff --git a/src/runtime/sys_linux_386.s b/src/runtime/sys_linux_386.s
index 4e914f3..40b55a6 100644
--- a/src/runtime/sys_linux_386.s
+++ b/src/runtime/sys_linux_386.s
@@ -427,7 +427,7 @@
MOVL n+4(FP), CX
MOVL flags+8(FP), DX
INVOKE_SYSCALL
- // ignore failure - maybe pages are locked
+ MOVL AX, ret+12(FP)
RET
// int32 futex(int32 *uaddr, int32 op, int32 val,
diff --git a/src/runtime/sys_linux_amd64.s b/src/runtime/sys_linux_amd64.s
index 4492dad..7e84637 100644
--- a/src/runtime/sys_linux_amd64.s
+++ b/src/runtime/sys_linux_amd64.s
@@ -519,7 +519,7 @@
MOVL flags+16(FP), DX
MOVQ $SYS_madvise, AX
SYSCALL
- // ignore failure - maybe pages are locked
+ MOVL AX, ret+24(FP)
RET
// int64 futex(int32 *uaddr, int32 op, int32 val,
diff --git a/src/runtime/sys_linux_arm.s b/src/runtime/sys_linux_arm.s
index a709c4c..43a5833 100644
--- a/src/runtime/sys_linux_arm.s
+++ b/src/runtime/sys_linux_arm.s
@@ -195,7 +195,7 @@
MOVW flags+8(FP), R2
MOVW $SYS_madvise, R7
SWI $0
- // ignore failure - maybe pages are locked
+ MOVW R0, ret+12(FP)
RET
TEXT runtime·setitimer(SB),NOSPLIT,$0
diff --git a/src/runtime/sys_linux_arm64.s b/src/runtime/sys_linux_arm64.s
index 086c8dd..8b344be 100644
--- a/src/runtime/sys_linux_arm64.s
+++ b/src/runtime/sys_linux_arm64.s
@@ -401,7 +401,7 @@
MOVW flags+16(FP), R2
MOVD $SYS_madvise, R8
SVC
- // ignore failure - maybe pages are locked
+ MOVW R0, ret+24(FP)
RET
// int64 futex(int32 *uaddr, int32 op, int32 val,
diff --git a/src/runtime/sys_linux_mips64x.s b/src/runtime/sys_linux_mips64x.s
index 337299b..c45703d 100644
--- a/src/runtime/sys_linux_mips64x.s
+++ b/src/runtime/sys_linux_mips64x.s
@@ -291,7 +291,7 @@
MOVW flags+16(FP), R6
MOVV $SYS_madvise, R2
SYSCALL
- // ignore failure - maybe pages are locked
+ MOVW R2, ret+24(FP)
RET
// int64 futex(int32 *uaddr, int32 op, int32 val,
diff --git a/src/runtime/sys_linux_mipsx.s b/src/runtime/sys_linux_mipsx.s
index dca5f1e..f362b0f 100644
--- a/src/runtime/sys_linux_mipsx.s
+++ b/src/runtime/sys_linux_mipsx.s
@@ -302,13 +302,13 @@
UNDEF // crash
RET
-TEXT runtime·madvise(SB),NOSPLIT,$0-12
+TEXT runtime·madvise(SB),NOSPLIT,$0-16
MOVW addr+0(FP), R4
MOVW n+4(FP), R5
MOVW flags+8(FP), R6
MOVW $SYS_madvise, R2
SYSCALL
- // ignore failure - maybe pages are locked
+ MOVW R2, ret+12(FP)
RET
// int32 futex(int32 *uaddr, int32 op, int32 val, struct timespec *timeout, int32 *uaddr2, int32 val2);
diff --git a/src/runtime/sys_linux_ppc64x.s b/src/runtime/sys_linux_ppc64x.s
index 7c2f8ea..ed79b69 100644
--- a/src/runtime/sys_linux_ppc64x.s
+++ b/src/runtime/sys_linux_ppc64x.s
@@ -454,7 +454,7 @@
MOVD n+8(FP), R4
MOVW flags+16(FP), R5
SYSCALL $SYS_madvise
- // ignore failure - maybe pages are locked
+ MOVW R3, ret+24(FP)
RET
// int64 futex(int32 *uaddr, int32 op, int32 val,
diff --git a/src/runtime/sys_linux_s390x.s b/src/runtime/sys_linux_s390x.s
index 95401af..c79ceea 100644
--- a/src/runtime/sys_linux_s390x.s
+++ b/src/runtime/sys_linux_s390x.s
@@ -290,7 +290,7 @@
MOVW flags+16(FP), R4
MOVW $SYS_madvise, R1
SYSCALL
- // ignore failure - maybe pages are locked
+ MOVW R2, ret+24(FP)
RET
// int64 futex(int32 *uaddr, int32 op, int32 val,
To view, visit change 135395. To unsubscribe, or for help writing mail filters, visit settings.
TryBots beginning. Status page: https://farmer.golang.org/try?commit=f7b8de56
1 of 19 TryBots failed:
Failed on misc-vet-vetall: https://storage.googleapis.com/go-build-log/f7b8de56/misc-vet-vetall_f833ec16.log
Consult https://build.golang.org/ to see whether they are new failures.
Patch set 1:TryBot-Result -1
Tobias Klauser uploaded patch set #2 to this change.
1 of 19 TryBots failed:
Failed on misc-vet-vetall: https://storage.googleapis.com/go-build-log/d957ba8b/misc-vet-vetall_f21303af.log
Consult https://build.golang.org/ to see whether they are new failures.
Patch set 2:TryBot-Result -1
Tobias Klauser uploaded patch set #3 to this change.
runtime: use MADV_FREE on Linux if available
On Linux, sysUnused currently uses madvise(MADV_DONTNEED) to signal the
kernel that a range of allocated memory contains unneeded data. After a
successful call, the range (but not the data it contained before the
call to madvise) is still available but the first access to that range
will unconditionally incur a page fault (needed to 0-fill the range).
A faster alternative is MADV_FREE, available since Linux 4.5. The
mechanism is very similar, but the page fault will only be incurred if
the kernel, between the call to madvise and the first access, decides to
reuse that memory for something else.
Test in runtime.osinit whether MADV_FREE is supported and fall back to
MADV_DONTNEED in case it isn't. This requires making the return value of
the madvise syscall available to the caller, so change runtime.madvise
to return it.
Fixes #23687
Change-Id: I962c3429000dd9f4a00846461ad128b71201bb04
---
M src/runtime/defs_linux_386.go
M src/runtime/defs_linux_amd64.go
M src/runtime/defs_linux_arm.go
M src/runtime/defs_linux_arm64.go
M src/runtime/defs_linux_mips64x.go
M src/runtime/defs_linux_mipsx.go
M src/runtime/defs_linux_ppc64.go
M src/runtime/defs_linux_ppc64le.go
M src/runtime/defs_linux_s390x.go
M src/runtime/mem_linux.go
M src/runtime/os_linux.go
M src/runtime/stubs2.go
M src/runtime/sys_dragonfly_amd64.s
M src/runtime/sys_freebsd_386.s
M src/runtime/sys_freebsd_amd64.s
M src/runtime/sys_freebsd_arm.s
M src/runtime/sys_linux_386.s
M src/runtime/sys_linux_amd64.s
M src/runtime/sys_linux_arm.s
M src/runtime/sys_linux_arm64.s
M src/runtime/sys_linux_mips64x.s
M src/runtime/sys_linux_mipsx.s
M src/runtime/sys_linux_ppc64x.s
M src/runtime/sys_linux_s390x.s
M src/runtime/sys_netbsd_386.s
M src/runtime/sys_netbsd_amd64.s
M src/runtime/sys_netbsd_arm.s
M src/runtime/sys_openbsd_386.s
M src/runtime/sys_openbsd_amd64.s
29 files changed, 53 insertions(+), 33 deletions(-)
1 of 19 TryBots failed:
Failed on misc-vet-vetall: https://storage.googleapis.com/go-build-log/8b71d9ad/misc-vet-vetall_6631b36b.log
Consult https://build.golang.org/ to see whether they are new failures.
Patch set 3:TryBot-Result -1
Tobias Klauser uploaded patch set #4 to this change.
runtime: use MADV_FREE on Linux if available
On Linux, sysUnused currently uses madvise(MADV_DONTNEED) to signal the
kernel that a range of allocated memory contains unneeded data. After a
successful call, the range (but not the data it contained before the
call to madvise) is still available but the first access to that range
will unconditionally incur a page fault (needed to 0-fill the range).
A faster alternative is MADV_FREE, available since Linux 4.5. The
mechanism is very similar, but the page fault will only be incurred if
the kernel, between the call to madvise and the first access, decides to
reuse that memory for something else.
Test in runtime.osinit whether MADV_FREE is supported and fall back to
MADV_DONTNEED in case it isn't. This requires making the return value of
the madvise syscall available to the caller, so change runtime.madvise
to return it.
Fixes #23687
Change-Id: I962c3429000dd9f4a00846461ad128b71201bb04
---
M src/runtime/defs_linux_386.go
M src/runtime/defs_linux_amd64.go
M src/runtime/defs_linux_arm.go
M src/runtime/defs_linux_arm64.go
M src/runtime/defs_linux_mips64x.go
M src/runtime/defs_linux_mipsx.go
M src/runtime/defs_linux_ppc64.go
M src/runtime/defs_linux_ppc64le.go
M src/runtime/defs_linux_s390x.go
M src/runtime/mem_linux.go
M src/runtime/os_linux.go
M src/runtime/stubs2.go
M src/runtime/sys_dragonfly_amd64.s
M src/runtime/sys_freebsd_386.s
M src/runtime/sys_freebsd_amd64.s
M src/runtime/sys_freebsd_arm.s
M src/runtime/sys_linux_386.s
M src/runtime/sys_linux_amd64.s
M src/runtime/sys_linux_arm.s
M src/runtime/sys_linux_arm64.s
M src/runtime/sys_linux_mips64x.s
M src/runtime/sys_linux_mipsx.s
M src/runtime/sys_linux_ppc64x.s
M src/runtime/sys_linux_s390x.s
M src/runtime/sys_netbsd_386.s
M src/runtime/sys_netbsd_amd64.s
M src/runtime/sys_netbsd_arm.s
M src/runtime/sys_openbsd_386.s
M src/runtime/sys_openbsd_amd64.s
M src/runtime/sys_openbsd_arm.s
30 files changed, 54 insertions(+), 35 deletions(-)
Sorry for the TryBot mess. I now ran vetall locally and fixed the madvise implementations where applicable. Please take a look.
TryBots are happy.
Patch set 4:TryBot-Result +1
4 comments:
File src/runtime/defs_linux_386.go:
Patch Set #4, Line 21: _MADV_FREE = 0x8
May as well add the relevant line to defs_linux.go, too, even though we don't currently use that file.
File src/runtime/mem_linux.go:
Patch Set #4, Line 40: DONTNEED
Update comment--there is no DONTNEED below any more.
Patch Set #4, Line 283: p, err := mmap(nil, physPageSize, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
Since this happens for every Go program, but calls to madvise(MADV_DONTNEED/MADV_FREE) are relatively rare, I have a mild preference for not doing anything here, initializing adviseUnused to MADV_FREE, using an atomic load to fetch adviseUnused, and, if madvise fails and adviseUnused == MADV_FREE, using an atomic store to change it to MADV_DONTNEED and redoing the madvise.
File src/runtime/sys_dragonfly_amd64.s:
Patch Set #4, Line 263: MOVQ AX, ret+24(FP)
These system calls return two values: one in AX and one in the carry flag. Is it necessarily the case that AX == 0 if the carry flag is clear?
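
To make the concern concrete, here is a small Go model of the BSD convention (the real code is assembly). On failure the kernel sets the carry flag and leaves an errno in AX; on success the carry flag is clear. The helper name madviseResult is hypothetical. It mirrors the shape later used in the BSD hunks (JCC 2(PC); MOVL $-1, AX): a set carry flag is mapped to -1, otherwise AX is passed through.

```go
package main

import "fmt"

// madviseResult models what the BSD madvise stubs store in ret.
// ax is the value left in AX by the syscall; carry is the carry flag.
func madviseResult(ax int32, carry bool) int32 {
	if carry {
		// Failure: AX holds a positive errno, so normalize to -1
		// to give callers a single "failed" value.
		return -1
	}
	// Success: pass AX through unchanged.
	return ax
}

func main() {
	fmt.Println("success:", madviseResult(0, false))
	fmt.Println("failure (errno 22 in AX):", madviseResult(22, true))
}
```

Checking the flag rather than AX alone means the caller never has to guess whether a non-zero AX indicates an error.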
Tobias Klauser uploaded patch set #5 to this change.
runtime: use MADV_FREE on Linux if available
On Linux, sysUnused currently uses madvise(MADV_DONTNEED) to signal the
kernel that a range of allocated memory contains unneeded data. After a
successful call, the range (but not the data it contained before the
call to madvise) is still available but the first access to that range
will unconditionally incur a page fault (needed to 0-fill the range).
A faster alternative is MADV_FREE, available since Linux 4.5. The
mechanism is very similar, but the page fault will only be incurred if
the kernel, between the call to madvise and the first access, decides to
reuse that memory for something else.
In sysUnused, test whether MADV_FREE is supported and fall back to
MADV_DONTNEED in case it isn't. This requires making the return value of
the madvise syscall available to the caller, so change runtime.madvise
to return it.
Fixes #23687
Change-Id: I962c3429000dd9f4a00846461ad128b71201bb04
---
M src/runtime/defs2_linux.go
M src/runtime/defs_linux.go
M src/runtime/defs_linux_386.go
M src/runtime/defs_linux_amd64.go
M src/runtime/defs_linux_arm.go
M src/runtime/defs_linux_arm64.go
M src/runtime/defs_linux_mips64x.go
M src/runtime/defs_linux_mipsx.go
M src/runtime/defs_linux_ppc64.go
M src/runtime/defs_linux_ppc64le.go
M src/runtime/defs_linux_s390x.go
M src/runtime/mem_linux.go
M src/runtime/stubs2.go
M src/runtime/sys_dragonfly_amd64.s
M src/runtime/sys_freebsd_386.s
M src/runtime/sys_freebsd_amd64.s
M src/runtime/sys_freebsd_arm.s
M src/runtime/sys_linux_386.s
M src/runtime/sys_linux_amd64.s
M src/runtime/sys_linux_arm.s
M src/runtime/sys_linux_arm64.s
M src/runtime/sys_linux_mips64x.s
M src/runtime/sys_linux_mipsx.s
M src/runtime/sys_linux_ppc64x.s
M src/runtime/sys_linux_s390x.s
M src/runtime/sys_netbsd_386.s
M src/runtime/sys_netbsd_amd64.s
M src/runtime/sys_netbsd_arm.s
M src/runtime/sys_openbsd_386.s
M src/runtime/sys_openbsd_amd64.s
M src/runtime/sys_openbsd_arm.s
31 files changed, 77 insertions(+), 37 deletions(-)
Thanks for the review.
4 comments:
Patch Set #4, Line 21: _MADV_FREE = 0x8
May as well add the relevant line to defs_linux. […]
Done. Also added MADV_HUGEPAGE and MADV_NOHUGEPAGE, and added them to defs2_linux.go as well.
File src/runtime/mem_linux.go:
Update comment--there is no DONTNEED below any more.
Done
Patch Set #4, Line 283: func getRandomData(r []byte) {
Since this happens for every Go programs, but calls to madvise(MADV_DONTNEED/MADV_FREE) are relative […]
Thanks, I didn't consider that these calls are relatively rare. Now updated as you suggest.
File src/runtime/sys_dragonfly_amd64.s:
Patch Set #4, Line 263: JCC 2(PC)
These system calls return two values: one in AX and one in the carry flag. […]
Oops. Not sure if that's necessarily the case. Now updated to check the carry flag on all BSDs.
TryBots are happy.
Patch set 5:TryBot-Result +1
Patch set 5:Code-Review +2
1 comment:
File src/runtime/mem_linux.go:
Patch Set #5, Line 109: if err := madvise(v, n, int32(advise)); advise == _MADV_FREE && err != 0 {
Since err here is not type error but is an errno value, I would call it errno rather than err.
1 comment:
Patch Set #5, Line 109: if err := madvise(v, n, int32(advise)); advise == _MADV_FREE && err != 0 {
Since err here is not type error but is an errno value, I would call it errno rather than err.
Done
1 of 19 TryBots failed:
Failed on linux-amd64: https://storage.googleapis.com/go-build-log/c4f51def/linux-amd64_0fcbdc6b.log
Consult https://build.golang.org/ to see whether they are new failures.
Patch set 6:TryBot-Result -1
Tobias Klauser uploaded patch set #7 to this change.
TryBots beginning. Status page: https://farmer.golang.org/try?commit=3dafa6f1
TryBots are happy.
Patch set 7:TryBot-Result +1
Tobias Klauser merged this change.
runtime: use MADV_FREE on Linux if available
On Linux, sysUnused currently uses madvise(MADV_DONTNEED) to signal the
kernel that a range of allocated memory contains unneeded data. After a
successful call, the range (but not the data it contained before the
call to madvise) is still available but the first access to that range
will unconditionally incur a page fault (needed to 0-fill the range).
A faster alternative is MADV_FREE, available since Linux 4.5. The
mechanism is very similar, but the page fault will only be incurred if
the kernel, between the call to madvise and the first access, decides to
reuse that memory for something else.
In sysUnused, test whether MADV_FREE is supported and fall back to
MADV_DONTNEED in case it isn't. This requires making the return value of
the madvise syscall available to the caller, so change runtime.madvise
to return it.
Fixes #23687
Change-Id: I962c3429000dd9f4a00846461ad128b71201bb04
Reviewed-on: https://go-review.googlesource.com/135395
Run-TryBot: Tobias Klauser <tobias....@gmail.com>
TryBot-Result: Gobot Gobot <go...@golang.org>
Reviewed-by: Ian Lance Taylor <ia...@golang.org>
diff --git a/src/runtime/defs2_linux.go b/src/runtime/defs2_linux.go
index c10dfb8..b08c0da 100644
--- a/src/runtime/defs2_linux.go
+++ b/src/runtime/defs2_linux.go
@@ -58,7 +58,10 @@
MAP_PRIVATE = C.MAP_PRIVATE
MAP_FIXED = C.MAP_FIXED
- MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+ MADV_HUGEPAGE = C.MADV_HUGEPAGE
+ MADV_NOHUGEPAGE = C.MADV_NOHUGEPAGE
SA_RESTART = C.SA_RESTART
SA_ONSTACK = C.SA_ONSTACK
diff --git a/src/runtime/defs_linux.go b/src/runtime/defs_linux.go
index 553366a..2d81013 100644
--- a/src/runtime/defs_linux.go
+++ b/src/runtime/defs_linux.go
@@ -47,7 +47,10 @@
MAP_PRIVATE = C.MAP_PRIVATE
MAP_FIXED = C.MAP_FIXED
- MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+ MADV_HUGEPAGE = C.MADV_HUGEPAGE
+ MADV_NOHUGEPAGE = C.MADV_NOHUGEPAGE
SA_RESTART = C.SA_RESTART
SA_ONSTACK = C.SA_ONSTACK
diff --git a/src/runtime/defs_linux_386.go b/src/runtime/defs_linux_386.go
index a7e435f..0ebac17 100644
--- a/src/runtime/defs_linux_386.go
+++ b/src/runtime/defs_linux_386.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_amd64.go b/src/runtime/defs_linux_amd64.go
index e8c6a21..c0a0ef0 100644
--- a/src/runtime/defs_linux_amd64.go
+++ b/src/runtime/defs_linux_amd64.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_arm.go b/src/runtime/defs_linux_arm.go
index 62ec8fa..43946bb 100644
--- a/src/runtime/defs_linux_arm.go
+++ b/src/runtime/defs_linux_arm.go
@@ -16,6 +16,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_arm64.go b/src/runtime/defs_linux_arm64.go
index c295bc0..c2cc281 100644
--- a/src/runtime/defs_linux_arm64.go
+++ b/src/runtime/defs_linux_arm64.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_mips64x.go b/src/runtime/defs_linux_mips64x.go
index df11cb0..9dacd5d 100644
--- a/src/runtime/defs_linux_mips64x.go
+++ b/src/runtime/defs_linux_mips64x.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_mipsx.go b/src/runtime/defs_linux_mipsx.go
index 702fbb5..9532ac5 100644
--- a/src/runtime/defs_linux_mipsx.go
+++ b/src/runtime/defs_linux_mipsx.go
@@ -22,6 +22,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_ppc64.go b/src/runtime/defs_linux_ppc64.go
index 45363d1..5a4326d 100644
--- a/src/runtime/defs_linux_ppc64.go
+++ b/src/runtime/defs_linux_ppc64.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_ppc64le.go b/src/runtime/defs_linux_ppc64le.go
index 45363d1..5a4326d 100644
--- a/src/runtime/defs_linux_ppc64le.go
+++ b/src/runtime/defs_linux_ppc64le.go
@@ -18,6 +18,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/defs_linux_s390x.go b/src/runtime/defs_linux_s390x.go
index ab90723..a6cc9c4 100644
--- a/src/runtime/defs_linux_s390x.go
+++ b/src/runtime/defs_linux_s390x.go
@@ -19,6 +19,7 @@
_MAP_FIXED = 0x10
_MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
_MADV_HUGEPAGE = 0xe
_MADV_NOHUGEPAGE = 0xf
diff --git a/src/runtime/mem_linux.go b/src/runtime/mem_linux.go
index 7aa4817..845f72d 100644
--- a/src/runtime/mem_linux.go
+++ b/src/runtime/mem_linux.go
@@ -5,6 +5,7 @@
package runtime
import (
+ "runtime/internal/atomic"
"runtime/internal/sys"
"unsafe"
)
@@ -34,10 +35,12 @@
return p
}
+var adviseUnused = uint32(_MADV_FREE)
+
func sysUnused(v unsafe.Pointer, n uintptr) {
// By default, Linux's "transparent huge page" support will
// merge pages into a huge page if there's even a single
- // present regular page, undoing the effects of the DONTNEED
+ // present regular page, undoing the effects of madvise(adviseUnused)
// below. On amd64, that means khugepaged can turn a single
// 4KB page to 2MB, bloating the process's RSS by as much as
// 512X. (See issue #8832 and Linux kernel bug
@@ -102,7 +105,13 @@
throw("unaligned sysUnused")
}
- madvise(v, n, _MADV_DONTNEED)
+ advise := atomic.Load(&adviseUnused)
+ if errno := madvise(v, n, int32(advise)); advise == _MADV_FREE && errno != 0 {
+ // MADV_FREE was added in Linux 4.5. Fall back to MADV_DONTNEED if it is
+ // not supported.
+ atomic.Store(&adviseUnused, _MADV_DONTNEED)
+ madvise(v, n, _MADV_DONTNEED)
+ }
}
func sysUsed(v unsafe.Pointer, n uintptr) {
diff --git a/src/runtime/stubs2.go b/src/runtime/stubs2.go
index 02249d0..c14db74 100644
--- a/src/runtime/stubs2.go
+++ b/src/runtime/stubs2.go
@@ -25,7 +25,8 @@
//go:noescape
func open(name *byte, mode, perm int32) int32
-func madvise(addr unsafe.Pointer, n uintptr, flags int32)
+// return value is only set on linux to be used in osinit()
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) int32
// exitThread terminates the current thread, writing *wait = 0 when
// the stack is safe to reclaim.
diff --git a/src/runtime/sys_dragonfly_amd64.s b/src/runtime/sys_dragonfly_amd64.s
index f0eb5f4..b18e967 100644
--- a/src/runtime/sys_dragonfly_amd64.s
+++ b/src/runtime/sys_dragonfly_amd64.s
@@ -260,9 +260,11 @@
MOVL flags+16(FP), DX
MOVQ $75, AX // madvise
SYSCALL
- // ignore failure - maybe pages are locked
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
RET
-
+
TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
MOVQ new+0(FP), DI
MOVQ old+8(FP), SI
diff --git a/src/runtime/sys_freebsd_386.s b/src/runtime/sys_freebsd_386.s
index b8f685a..754689b 100644
--- a/src/runtime/sys_freebsd_386.s
+++ b/src/runtime/sys_freebsd_386.s
@@ -163,7 +163,9 @@
TEXT runtime·madvise(SB),NOSPLIT,$-4
MOVL $75, AX // madvise
INT $0x80
- // ignore failure - maybe pages are locked
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
RET
TEXT runtime·setitimer(SB), NOSPLIT, $-4
diff --git a/src/runtime/sys_freebsd_amd64.s b/src/runtime/sys_freebsd_amd64.s
index be191a0..55959b3 100644
--- a/src/runtime/sys_freebsd_amd64.s
+++ b/src/runtime/sys_freebsd_amd64.s
@@ -337,9 +337,11 @@
MOVL flags+16(FP), DX
MOVQ $75, AX // madvise
SYSCALL
- // ignore failure - maybe pages are locked
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
RET
-
+
TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
MOVQ new+0(FP), DI
MOVQ old+8(FP), SI
diff --git a/src/runtime/sys_freebsd_arm.s b/src/runtime/sys_freebsd_arm.s
index 93bf569..f347b9f 100644
--- a/src/runtime/sys_freebsd_arm.s
+++ b/src/runtime/sys_freebsd_arm.s
@@ -264,14 +264,15 @@
RET
TEXT runtime·madvise(SB),NOSPLIT,$0
- MOVW addr+0(FP), R0 // arg 1 addr
- MOVW n+4(FP), R1 // arg 2 len
- MOVW flags+8(FP), R2 // arg 3 flags
- MOVW $SYS_madvise, R7
- SWI $0
- // ignore failure - maybe pages are locked
+ MOVW addr+0(FP), R0 // arg 1 addr
+ MOVW n+4(FP), R1 // arg 2 len
+ MOVW flags+8(FP), R2 // arg 3 flags
+ MOVW $SYS_madvise, R7
+ SWI $0
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
RET
-
+
TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
MOVW new+0(FP), R0
MOVW old+4(FP), R1
diff --git a/src/runtime/sys_linux_386.s b/src/runtime/sys_linux_386.s
index 4e914f3..40b55a6 100644
--- a/src/runtime/sys_linux_386.s
+++ b/src/runtime/sys_linux_386.s
@@ -427,7 +427,7 @@
MOVL n+4(FP), CX
MOVL flags+8(FP), DX
INVOKE_SYSCALL
- // ignore failure - maybe pages are locked
+ MOVL AX, ret+12(FP)
RET
// int32 futex(int32 *uaddr, int32 op, int32 val,
diff --git a/src/runtime/sys_linux_amd64.s b/src/runtime/sys_linux_amd64.s
index 4492dad..7e84637 100644
--- a/src/runtime/sys_linux_amd64.s
+++ b/src/runtime/sys_linux_amd64.s
@@ -519,7 +519,7 @@
MOVL flags+16(FP), DX
MOVQ $SYS_madvise, AX
SYSCALL
- // ignore failure - maybe pages are locked
+ MOVL AX, ret+24(FP)
RET
// int64 futex(int32 *uaddr, int32 op, int32 val,
diff --git a/src/runtime/sys_linux_arm.s b/src/runtime/sys_linux_arm.s
index a709c4c..43a5833 100644
--- a/src/runtime/sys_linux_arm.s
+++ b/src/runtime/sys_linux_arm.s
@@ -195,7 +195,7 @@
MOVW flags+8(FP), R2
MOVW $SYS_madvise, R7
SWI $0
- // ignore failure - maybe pages are locked
+ MOVW R0, ret+12(FP)
RET
TEXT runtime·setitimer(SB),NOSPLIT,$0
diff --git a/src/runtime/sys_linux_arm64.s b/src/runtime/sys_linux_arm64.s
index 086c8dd..8b344be 100644
--- a/src/runtime/sys_linux_arm64.s
+++ b/src/runtime/sys_linux_arm64.s
@@ -401,7 +401,7 @@
MOVW flags+16(FP), R2
MOVD $SYS_madvise, R8
SVC
- // ignore failure - maybe pages are locked
+ MOVW R0, ret+24(FP)
RET
// int64 futex(int32 *uaddr, int32 op, int32 val,
diff --git a/src/runtime/sys_linux_mips64x.s b/src/runtime/sys_linux_mips64x.s
index 337299b..c45703d 100644
--- a/src/runtime/sys_linux_mips64x.s
+++ b/src/runtime/sys_linux_mips64x.s
@@ -291,7 +291,7 @@
MOVW flags+16(FP), R6
MOVV $SYS_madvise, R2
SYSCALL
- // ignore failure - maybe pages are locked
+ MOVW R2, ret+24(FP)
RET
// int64 futex(int32 *uaddr, int32 op, int32 val,
diff --git a/src/runtime/sys_linux_mipsx.s b/src/runtime/sys_linux_mipsx.s
index dca5f1e..f362b0f 100644
--- a/src/runtime/sys_linux_mipsx.s
+++ b/src/runtime/sys_linux_mipsx.s
@@ -302,13 +302,13 @@
UNDEF // crash
RET
-TEXT runtime·madvise(SB),NOSPLIT,$0-12
+TEXT runtime·madvise(SB),NOSPLIT,$0-16
MOVW addr+0(FP), R4
MOVW n+4(FP), R5
MOVW flags+8(FP), R6
MOVW $SYS_madvise, R2
SYSCALL
- // ignore failure - maybe pages are locked
+ MOVW R2, ret+12(FP)
RET
// int32 futex(int32 *uaddr, int32 op, int32 val, struct timespec *timeout, int32 *uaddr2, int32 val2);
diff --git a/src/runtime/sys_linux_ppc64x.s b/src/runtime/sys_linux_ppc64x.s
index 7c2f8ea..ed79b69 100644
--- a/src/runtime/sys_linux_ppc64x.s
+++ b/src/runtime/sys_linux_ppc64x.s
@@ -454,7 +454,7 @@
MOVD n+8(FP), R4
MOVW flags+16(FP), R5
SYSCALL $SYS_madvise
- // ignore failure - maybe pages are locked
+ MOVW R3, ret+24(FP)
RET
// int64 futex(int32 *uaddr, int32 op, int32 val,
diff --git a/src/runtime/sys_linux_s390x.s b/src/runtime/sys_linux_s390x.s
index 95401af..c79ceea 100644
--- a/src/runtime/sys_linux_s390x.s
+++ b/src/runtime/sys_linux_s390x.s
@@ -290,7 +290,7 @@
MOVW flags+16(FP), R4
MOVW $SYS_madvise, R1
SYSCALL
- // ignore failure - maybe pages are locked
+ MOVW R2, ret+24(FP)
RET
// int64 futex(int32 *uaddr, int32 op, int32 val,
diff --git a/src/runtime/sys_netbsd_386.s b/src/runtime/sys_netbsd_386.s
index 4042ab4..66f4620 100644
--- a/src/runtime/sys_netbsd_386.s
+++ b/src/runtime/sys_netbsd_386.s
@@ -135,7 +135,9 @@
TEXT runtime·madvise(SB),NOSPLIT,$-4
MOVL $75, AX // sys_madvise
INT $0x80
- // ignore failure - maybe pages are locked
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
RET
TEXT runtime·setitimer(SB),NOSPLIT,$-4
diff --git a/src/runtime/sys_netbsd_amd64.s b/src/runtime/sys_netbsd_amd64.s
index 11b9c1b..5523659 100644
--- a/src/runtime/sys_netbsd_amd64.s
+++ b/src/runtime/sys_netbsd_amd64.s
@@ -319,7 +319,9 @@
MOVL flags+16(FP), DX // arg 3 - behav
MOVQ $75, AX // sys_madvise
SYSCALL
- // ignore failure - maybe pages are locked
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
RET
TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
diff --git a/src/runtime/sys_netbsd_arm.s b/src/runtime/sys_netbsd_arm.s
index 6b2c5a8..304075f 100644
--- a/src/runtime/sys_netbsd_arm.s
+++ b/src/runtime/sys_netbsd_arm.s
@@ -284,11 +284,12 @@
RET
TEXT runtime·madvise(SB),NOSPLIT,$0
- MOVW addr+0(FP), R0 // arg 1 - addr
- MOVW n+4(FP), R1 // arg 2 - len
- MOVW flags+8(FP), R2 // arg 3 - behav
- SWI $0xa0004b // sys_madvise
- // ignore failure - maybe pages are locked
+ MOVW addr+0(FP), R0 // arg 1 - addr
+ MOVW n+4(FP), R1 // arg 2 - len
+ MOVW flags+8(FP), R2 // arg 3 - behav
+ SWI $0xa0004b // sys_madvise
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
RET
TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
diff --git a/src/runtime/sys_openbsd_386.s b/src/runtime/sys_openbsd_386.s
index 21f13c8..8e34ab4 100644
--- a/src/runtime/sys_openbsd_386.s
+++ b/src/runtime/sys_openbsd_386.s
@@ -136,7 +136,8 @@
MOVL $75, AX // sys_madvise
INT $0x80
JAE 2(PC)
- MOVL $0xf1, 0xf1 // crash
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
RET
TEXT runtime·setitimer(SB),NOSPLIT,$-4
diff --git a/src/runtime/sys_openbsd_amd64.s b/src/runtime/sys_openbsd_amd64.s
index 38ac38d..227e818 100644
--- a/src/runtime/sys_openbsd_amd64.s
+++ b/src/runtime/sys_openbsd_amd64.s
@@ -305,7 +305,9 @@
MOVL flags+16(FP), DX // arg 3 - behav
MOVQ $75, AX // sys_madvise
SYSCALL
- // ignore failure - maybe pages are locked
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
RET
TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
diff --git a/src/runtime/sys_openbsd_arm.s b/src/runtime/sys_openbsd_arm.s
index ff1c1da..52d3638 100644
--- a/src/runtime/sys_openbsd_arm.s
+++ b/src/runtime/sys_openbsd_arm.s
@@ -143,8 +143,8 @@
MOVW flags+8(FP), R2 // arg 2 - flags
MOVW $75, R12 // sys_madvise
SWI $0
- MOVW.CS $0, R8 // crash on syscall failure
- MOVW.CS R8, (R8)
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
RET
TEXT runtime·setitimer(SB),NOSPLIT,$0
To view, visit change 135395.
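The fallback described in the commit message (prefer MADV_FREE, use MADV_DONTNEED when the kernel rejects it) can be sketched in plain Go as below. This is a simplified illustration, not the runtime's code: the `adviser` function type, the `releaser` struct, and `errInval` are hypothetical stand-ins for the madvise syscall and its EINVAL result, and the probe happens lazily on the first call rather than in runtime.osinit. The advice values are the Linux ones from the defs_linux_*.go hunks.

```go
package main

import (
	"errors"
	"fmt"
)

// Advice values as defined for Linux in this change
// (_MADV_DONTNEED = 0x4, _MADV_FREE = 0x8).
const (
	madvDontNeed = 0x4
	madvFree     = 0x8
)

// errInval is a hypothetical stand-in for the EINVAL a kernel
// without MADV_FREE support would return.
var errInval = errors.New("EINVAL")

// adviser is a hypothetical stand-in for the madvise syscall on a
// fixed memory range; it returns nil on success.
type adviser func(advice int) error

// releaser remembers which advice value works, so that later calls
// do not repeat the failed MADV_FREE attempt.
type releaser struct {
	advice int
}

func newReleaser() *releaser { return &releaser{advice: madvFree} }

// release tries the current advice value. If MADV_FREE fails, it
// switches permanently to MADV_DONTNEED and retries once.
func (r *releaser) release(madvise adviser) error {
	err := madvise(r.advice)
	if err != nil && r.advice == madvFree {
		r.advice = madvDontNeed
		err = madvise(r.advice)
	}
	return err
}

func main() {
	// Simulate a kernel that does not know MADV_FREE.
	oldKernel := func(advice int) error {
		if advice == madvFree {
			return errInval
		}
		return nil
	}
	r := newReleaser()
	fmt.Println("error:", r.release(oldKernel), "advice now:", r.advice)
}
```

Passing the syscall in as a function keeps the decision logic testable without touching real memory mappings; this is also why the assembly stubs above have to store the syscall result in `ret` instead of discarding it.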
Patch Set 8:
This certainly seems like the right change, but I think it has a bunch of unintended consequences because the implementation of MADV_FREE just isn't very good right now. This post summarizes the problems pretty well: https://lwn.net/Articles/713037/
Michael (CC'd) just ran into this in his work to reduce process RSS when there's large object fragmentation. When we MADV_FREE pages instead of MADV_DONTNEEDing them, the process's RSS doesn't go down. This is really confusing, and Linux is surprisingly stingy about actually freeing these pages. If swap is off, which is becoming pretty common, it won't free them at all (which seems like an obvious bug, but is the current state of things). Even if swap is on, Linux will prefer swapping out file pages before it drops MADV_FREE pages. I haven't confirmed this, but I suspect that given all of these issues, Go running in a container may just get OOM-killed instead of ever freeing memory.
Given all of this, I'm afraid we may want to disable MADV_FREE for now, until Linux can fix these implementation issues.
I did much more investigation on this issue (using https://gist.github.com/aclements/528629d7ff304c2981974c7d00f5d8d8).
It seems Linux has fixed the most egregious issues of the original implementation. It now works with swap off and will release MADV_FREE pages before swapping anything else out. I tested this in a memory-limited container as well.
It still doesn't reduce RSS until the pages are actually freed. I'm worried that this may confuse users and possibly confuse systems monitoring RSS.
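For monitoring, one possible way around the RSS confusion is to subtract lazily freed pages from the reported figure. The sketch below assumes the per-mapping `Rss:` and `LazyFree:` fields that newer Linux kernels expose in /proc/<pid>/smaps; the helper names (`sumSmapsField`, `effectiveRSS`) and the sample text with its numbers are made up for illustration, and a real tool would read the file instead of a constant.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumSmapsField adds up every "<field>: <n> kB" line found in
// smaps-formatted text and returns the total in kB.
func sumSmapsField(smaps, field string) (kb int64) {
	sc := bufio.NewScanner(strings.NewReader(smaps))
	for sc.Scan() {
		parts := strings.Fields(sc.Text())
		if len(parts) == 3 && parts[0] == field+":" && parts[2] == "kB" {
			if n, err := strconv.ParseInt(parts[1], 10, 64); err == nil {
				kb += n
			}
		}
	}
	return kb
}

// effectiveRSS estimates resident memory that the kernel could not
// reclaim for free: Rss minus the pages marked with MADV_FREE.
func effectiveRSS(smaps string) int64 {
	return sumSmapsField(smaps, "Rss") - sumSmapsField(smaps, "LazyFree")
}

// sample is hypothetical smaps output for two anonymous mappings.
const sample = `
00c000000000-00c000400000 rw-p 00000000 00:00 0
Rss:                4096 kB
LazyFree:           3072 kB
7f0000000000-7f0000200000 rw-p 00000000 00:00 0
Rss:                2048 kB
LazyFree:              0 kB
`

func main() {
	fmt.Println("Rss kB:", sumSmapsField(sample, "Rss"))
	fmt.Println("LazyFree kB:", sumSmapsField(sample, "LazyFree"))
	fmt.Println("effective kB:", effectiveRSS(sample))
}
```

This only helps tools that can be changed to read smaps; anything that looks at plain RSS would still see the higher number until the kernel reclaims the pages.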
Given the possibility for problems, I'm wondering if we should provide a GODEBUG escape hatch, at least for now. Then, if people have problems, we have a low-overhead way to verify the problem and a workaround for users, and we'll get a sense of where (and if) there are still sharp edges to this.
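A GODEBUG escape hatch of this kind could look roughly like the following sketch. The key name `madvdontneed` and the helpers `debugFlag` and `useMadvFree` are assumptions for illustration, not something this thread settles on; the real runtime parses GODEBUG without the `strings` and `os` packages, so this only shows the intended behavior.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// debugFlag looks up key in a GODEBUG-style "k1=v1,k2=v2" string
// and returns the value of the first matching entry.
func debugFlag(godebug, key string) (string, bool) {
	for _, field := range strings.Split(godebug, ",") {
		i := strings.IndexByte(field, '=')
		if i < 0 {
			continue // not a key=value pair
		}
		if field[:i] == key {
			return field[i+1:], true
		}
	}
	return "", false
}

// useMadvFree reports whether MADV_FREE should be used. The
// hypothetical setting madvdontneed=1 forces MADV_DONTNEED.
func useMadvFree(godebug string) bool {
	v, ok := debugFlag(godebug, "madvdontneed")
	return !(ok && v == "1")
}

func main() {
	fmt.Println("use MADV_FREE:", useMadvFree(os.Getenv("GODEBUG")))
}
```

With a switch like this, a user who sees unexpectedly high RSS could rerun with the setting enabled and compare, which would give a cheap way to confirm whether MADV_FREE is the cause.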
RELNOTE=On Linux, the Go runtime now releases memory only when the OS is under memory pressure. This is more efficient, but means a process's RSS (resident set size) won't decrease unless the OS is running out of memory.