[go] runtime: add vector implementation of memclrNoHeapPointers for riscv64

14 views
Skip to first unread message

Joel Sing (Gerrit)

unread,
Feb 12, 2025, 7:43:08 AMFeb 12
to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Joel Sing has uploaded the change for review

Commit message

runtime: add vector implementation of memclrNoHeapPointers for riscv64
Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1

Change diff

diff --git a/src/runtime/cpuflags.go b/src/runtime/cpuflags.go
index e81e50f..35dce45 100644
--- a/src/runtime/cpuflags.go
+++ b/src/runtime/cpuflags.go
@@ -11,16 +11,18 @@

// Offsets into internal/cpu records for use in assembly.
const (
- offsetX86HasAVX = unsafe.Offsetof(cpu.X86.HasAVX)
- offsetX86HasAVX2 = unsafe.Offsetof(cpu.X86.HasAVX2)
- offsetX86HasERMS = unsafe.Offsetof(cpu.X86.HasERMS)
- offsetX86HasRDTSCP = unsafe.Offsetof(cpu.X86.HasRDTSCP)
-
offsetARMHasIDIVA = unsafe.Offsetof(cpu.ARM.HasIDIVA)

offsetMIPS64XHasMSA = unsafe.Offsetof(cpu.MIPS64X.HasMSA)

offsetLOONG64HasLSX = unsafe.Offsetof(cpu.Loong64.HasLSX)
+
+ offsetRISCV64HasV = unsafe.Offsetof(cpu.RISCV64.HasV)
+
+ offsetX86HasAVX = unsafe.Offsetof(cpu.X86.HasAVX)
+ offsetX86HasAVX2 = unsafe.Offsetof(cpu.X86.HasAVX2)
+ offsetX86HasERMS = unsafe.Offsetof(cpu.X86.HasERMS)
+ offsetX86HasRDTSCP = unsafe.Offsetof(cpu.X86.HasRDTSCP)
)

var (
diff --git a/src/runtime/memclr_riscv64.s b/src/runtime/memclr_riscv64.s
index 16c511c..ead0fe0 100644
--- a/src/runtime/memclr_riscv64.s
+++ b/src/runtime/memclr_riscv64.s
@@ -2,6 +2,8 @@
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

+#include "asm_riscv64.h"
+#include "go_asm.h"
#include "textflag.h"

// See memclrNoHeapPointers Go doc for important implementation constraints.
@@ -15,6 +17,30 @@
MOV $8, X9
BLT X11, X9, check4

+#ifndef hasV
+ MOVB internal∕cpu·RISCV64+const_offsetRISCV64HasV(SB), X5
+ BEQZ X5, memclr_scalar
+#endif
+
+ // Use vector if not 8 byte aligned.
+ AND $7, X10, X5
+ BNEZ X5, vector_loop
+
+ // Use scalar if 8 byte aligned and <= 64 bytes.
+ SUB $64, X11, X6
+ BLEZ X6, aligned
+
+ PCALIGN $16
+vector_loop:
+ VSETVLI X11, E8, M8, TA, MA, X5
+ VMVVI $0, V8
+ VSE8V V8, (X10)
+ ADD X5, X10
+ SUB X5, X11
+ BNEZ X11, vector_loop
+ RET
+
+memclr_scalar:
// Check alignment
AND $7, X10, X5
BEQZ X5, aligned
@@ -37,6 +63,8 @@
BLT X11, X9, zero16
MOV $64, X9
BLT X11, X9, zero32
+
+ PCALIGN $16
loop64:
MOV ZERO, 0(X10)
MOV ZERO, 8(X10)

Change information

Files:
  • M src/runtime/cpuflags.go
  • M src/runtime/memclr_riscv64.s
Change size: S
Delta: 2 files changed, 35 insertions(+), 5 deletions(-)
Open in Gerrit

Related details

Attention set is empty
Submit Requirements:
  • requirement is not satisfiedCode-Review
  • requirement satisfiedNo-Unresolved-Comments
  • requirement is not satisfiedReview-Enforcement
  • requirement is not satisfiedTryBots-Pass
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: newchange
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1
Gerrit-Change-Number: 648857
Gerrit-PatchSet: 1
Gerrit-Owner: Joel Sing <jo...@sing.id.au>
unsatisfied_requirement
satisfied_requirement
open
diffy

Mark Ryan (Gerrit)

unread,
Feb 12, 2025, 8:06:20 AMFeb 12
to Joel Sing, goph...@pubsubhelper.golang.org, Meng Zhuo, Pengcheng Wang, golang-co...@googlegroups.com
Attention needed from Joel Sing and Meng Zhuo

Mark Ryan added 1 comment

File src/runtime/memclr_riscv64.s
Line 36, Patchset 1 (Latest): VMVVI $0, V8
Mark Ryan . unresolved

The VMVI can be moved out of the loop.

Open in Gerrit

Related details

Attention is currently required from:
  • Joel Sing
  • Meng Zhuo
Submit Requirements:
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    • requirement is not satisfiedTryBots-Pass
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1
    Gerrit-Change-Number: 648857
    Gerrit-PatchSet: 1
    Gerrit-Owner: Joel Sing <jo...@sing.id.au>
    Gerrit-Reviewer: Mark Ryan <mark...@rivosinc.com>
    Gerrit-Reviewer: Meng Zhuo <mengzh...@gmail.com>
    Gerrit-CC: Pengcheng Wang <wangpeng...@bytedance.com>
    Gerrit-Attention: Joel Sing <jo...@sing.id.au>
    Gerrit-Attention: Meng Zhuo <mengzh...@gmail.com>
    Gerrit-Comment-Date: Wed, 12 Feb 2025 13:06:13 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    unsatisfied_requirement
    open
    diffy

    Joel Sing (Gerrit)

    unread,
    Feb 12, 2025, 9:13:52 AMFeb 12
    to goph...@pubsubhelper.golang.org, Mark Ryan, Meng Zhuo, Pengcheng Wang, golang-co...@googlegroups.com
    Attention needed from Mark Ryan and Meng Zhuo

    Joel Sing added 1 comment

    File src/runtime/memclr_riscv64.s
    Mark Ryan . unresolved

    The VMVI can be moved out of the loop.

    Joel Sing

    Maybe... the catch is we would need to make an equivalent `VSETVLI` call as the M8 is effectively splatting $0 into V8..V15 (with the number of elements also controlled by the current X11 value) - I also wanted to confirm what the required behaviour is if we VMVVI with one X11 value, then use it with a different value (not sure if you know the answer to this). At least for now it seemed easier/safer to call VMMVI in the loop (and it's not going to be the slow part of the code).

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Mark Ryan
    • Meng Zhuo
    Submit Requirements:
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    • requirement is not satisfiedTryBots-Pass
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1
    Gerrit-Change-Number: 648857
    Gerrit-PatchSet: 1
    Gerrit-Owner: Joel Sing <jo...@sing.id.au>
    Gerrit-Reviewer: Mark Ryan <mark...@rivosinc.com>
    Gerrit-Reviewer: Meng Zhuo <mengzh...@gmail.com>
    Gerrit-CC: Pengcheng Wang <wangpeng...@bytedance.com>
    Gerrit-Attention: Mark Ryan <mark...@rivosinc.com>
    Gerrit-Attention: Meng Zhuo <mengzh...@gmail.com>
    Gerrit-Comment-Date: Wed, 12 Feb 2025 14:13:44 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Mark Ryan <mark...@rivosinc.com>
    unsatisfied_requirement
    open
    diffy

    Mark Ryan (Gerrit)

    unread,
    Feb 12, 2025, 9:53:47 AMFeb 12
    to Joel Sing, goph...@pubsubhelper.golang.org, Meng Zhuo, Pengcheng Wang, golang-co...@googlegroups.com
    Attention needed from Joel Sing and Meng Zhuo

    Mark Ryan voted and added 2 comments

    Votes added by Mark Ryan

    Code-Review+1

    2 comments

    Patchset-level comments
    File src/runtime/memclr_riscv64.s
    Mark Ryan . resolved

    The VMVI can be moved out of the loop.

    Joel Sing

    Maybe... the catch is we would need to make an equivalent `VSETVLI` call as the M8 is effectively splatting $0 into V8..V15 (with the number of elements also controlled by the current X11 value) - I also wanted to confirm what the required behaviour is if we VMVVI with one X11 value, then use it with a different value (not sure if you know the answer to this). At least for now it seemed easier/safer to call VMMVI in the loop (and it's not going to be the slow part of the code).

    Mark Ryan

    Yes, sorry, ignore my comment. Still trying to get my head around vector.

    I also wanted to confirm what the required behaviour is if we VMVVI with one X11 value, then use it with a different value (not sure if you know the answer to this)

    I'd say it should work, but I agree, not worth it. One thing we could consider trying/benchmarking in the future is whole vector loads and stores. This would allow us to just set v8 once outside the loop, but we'd need CSR support for this (to get the vlenb), and possibly to manually unroll and we'd also need one call to vsetvli to process the tail, so possibly not worth it either.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Joel Sing
    • Meng Zhuo
    Submit Requirements:
      • requirement is not satisfiedCode-Review
      • requirement satisfiedNo-Unresolved-Comments
      • requirement is not satisfiedReview-Enforcement
      • requirement is not satisfiedTryBots-Pass
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1
      Gerrit-Change-Number: 648857
      Gerrit-PatchSet: 1
      Gerrit-Owner: Joel Sing <jo...@sing.id.au>
      Gerrit-Reviewer: Mark Ryan <mark...@rivosinc.com>
      Gerrit-Reviewer: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-CC: Pengcheng Wang <wangpeng...@bytedance.com>
      Gerrit-Attention: Joel Sing <jo...@sing.id.au>
      Gerrit-Attention: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-Comment-Date: Wed, 12 Feb 2025 14:53:40 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: Yes
      Comment-In-Reply-To: Mark Ryan <mark...@rivosinc.com>
      Comment-In-Reply-To: Joel Sing <jo...@sing.id.au>
      unsatisfied_requirement
      satisfied_requirement
      open
      diffy

      Mark Ryan (Gerrit)

      unread,
      Feb 28, 2025, 12:17:26 PMFeb 28
      to Joel Sing, goph...@pubsubhelper.golang.org, Meng Zhuo, Pengcheng Wang, golang-co...@googlegroups.com
      Attention needed from Joel Sing and Meng Zhuo

      Mark Ryan voted and added 1 comment

      Votes added by Mark Ryan

      Code-Review+1

      1 comment

      Patchset-level comments
      File-level comment, Patchset 5 (Latest):
      Mark Ryan . resolved

      Tested on a Banana Pi. Seeing a geomean improvement of -10.32% on the Memclr tests.

      Open in Gerrit

      Related details

      Attention is currently required from:
      • Joel Sing
      • Meng Zhuo
      Submit Requirements:
      • requirement is not satisfiedCode-Review
      • requirement satisfiedNo-Unresolved-Comments
      • requirement is not satisfiedReview-Enforcement
      • requirement is not satisfiedTryBots-Pass
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1
      Gerrit-Change-Number: 648857
      Gerrit-PatchSet: 5
      Gerrit-Owner: Joel Sing <jo...@sing.id.au>
      Gerrit-Reviewer: Mark Ryan <mark...@rivosinc.com>
      Gerrit-Reviewer: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-CC: Pengcheng Wang <wangpeng...@bytedance.com>
      Gerrit-Attention: Joel Sing <jo...@sing.id.au>
      Gerrit-Attention: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-Comment-Date: Fri, 28 Feb 2025 17:17:19 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: Yes
      unsatisfied_requirement
      satisfied_requirement
      open
      diffy

      Joel Sing (Gerrit)

      unread,
      Mar 29, 2025, 9:09:25 AMMar 29
      to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com
      Attention needed from Joel Sing and Meng Zhuo

      Joel Sing uploaded new patchset

      Joel Sing uploaded patch set #6 to this change.
      Open in Gerrit

      Related details

      Attention is currently required from:
      • Joel Sing
      • Meng Zhuo
      Submit Requirements:
      • requirement is not satisfiedCode-Review
      • requirement satisfiedNo-Unresolved-Comments
      • requirement is not satisfiedReview-Enforcement
      • requirement is not satisfiedTryBots-Pass
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: newpatchset
      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1
      Gerrit-Change-Number: 648857
      Gerrit-PatchSet: 6
      unsatisfied_requirement
      satisfied_requirement
      open
      diffy

      Meng Zhuo (Gerrit)

      unread,
      Oct 24, 2025, 4:31:53 AMOct 24
      to Joel Sing, goph...@pubsubhelper.golang.org, Meng Zhuo, Mark Ryan, Pengcheng Wang, golang-co...@googlegroups.com
      Attention needed from Joel Sing

      Meng Zhuo voted

      Code-Review+2
      Commit-Queue+1
      Open in Gerrit

      Related details

      Attention is currently required from:
      • Joel Sing
      Submit Requirements:
      • requirement satisfiedCode-Review
      • requirement satisfiedNo-Unresolved-Comments
      • requirement is not satisfiedReview-Enforcement
      • requirement is not satisfiedTryBots-Pass
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1
      Gerrit-Change-Number: 648857
      Gerrit-PatchSet: 10
      Gerrit-Owner: Joel Sing <jo...@sing.id.au>
      Gerrit-Reviewer: Mark Ryan <mark...@rivosinc.com>
      Gerrit-Reviewer: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-CC: Pengcheng Wang <wangpeng...@bytedance.com>
      Gerrit-Attention: Joel Sing <jo...@sing.id.au>
      Gerrit-Comment-Date: Fri, 24 Oct 2025 08:31:43 +0000
      Gerrit-HasComments: No
      Gerrit-Has-Labels: Yes
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Joel Sing (Gerrit)

      unread,
      Nov 26, 2025, 9:38:11 AMNov 26
      to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com
      Attention needed from Joel Sing and Meng Zhuo

      Joel Sing uploaded new patchset

      Joel Sing uploaded patch set #11 to this change.
      Following approvals got outdated and were removed:
      • TryBots-Pass: LUCI-TryBot-Result-1 by Go LUCI
      Open in Gerrit

      Related details

      Attention is currently required from:
      • Joel Sing
      • Meng Zhuo
      Submit Requirements:
      • requirement satisfiedCode-Review
      • requirement satisfiedNo-Unresolved-Comments
      • requirement is not satisfiedReview-Enforcement
      • requirement is not satisfiedTryBots-Pass
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: newpatchset
      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1
      Gerrit-Change-Number: 648857
      Gerrit-PatchSet: 11
      Gerrit-Owner: Joel Sing <jo...@sing.id.au>
      Gerrit-Reviewer: Mark Ryan <mark...@rivosinc.com>
      Gerrit-Reviewer: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-CC: Pengcheng Wang <wangpeng...@bytedance.com>
      Gerrit-Attention: Joel Sing <jo...@sing.id.au>
      Gerrit-Attention: Meng Zhuo <mengzh...@gmail.com>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Boyao Wang (Gerrit)

      unread,
      Dec 11, 2025, 1:57:00 AM (6 days ago) Dec 11
      to Joel Sing, goph...@pubsubhelper.golang.org, Go LUCI, Meng Zhuo, Mark Ryan, Pengcheng Wang, golang-co...@googlegroups.com
      Attention needed from Joel Sing and Meng Zhuo

      Boyao Wang added 1 comment

      File src/runtime/memclr_riscv64.s

      VSETVLI X11, E8, M8, TA, MA, X5
      VMVVI $0, V8
      VSE8V V8, (X10)
      ADD X5, X10
      SUB X5, X11
      BNEZ X11, vector_loop
      RET
      Boyao Wang . unresolved
      There is no need to issue VMVI $0, V8 in every iteration. Perhaps we can initialize V8 once before entering the main vector loop.
      ```suggestion
      vector_loop:

      VSETVLI X11, E8, M8, TA, MA, X5
      	VMVVI	$0, V8
      vector_loop_body:
      VSE8V V8, (X10)
      ADD X5, X10
      SUB X5, X11

      VSETVLI X11, E8, M8, TA, MA, X5
      	BNEZ	X11, vector_loop_body
      RET
      ```
      Open in Gerrit

      Related details

      Attention is currently required from:
      • Joel Sing
      • Meng Zhuo
      Submit Requirements:
      • requirement satisfiedCode-Review
      • requirement is not satisfiedNo-Unresolved-Comments
      • requirement is not satisfiedReview-Enforcement
      • requirement is not satisfiedTryBots-Pass
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1
      Gerrit-Change-Number: 648857
      Gerrit-PatchSet: 11
      Gerrit-Owner: Joel Sing <jo...@sing.id.au>
      Gerrit-Reviewer: Mark Ryan <mark...@rivosinc.com>
      Gerrit-Reviewer: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-CC: Boyao Wang <wang...@bytedance.com>
      Gerrit-CC: Pengcheng Wang <wangpeng...@bytedance.com>
      Gerrit-Attention: Joel Sing <jo...@sing.id.au>
      Gerrit-Attention: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-Comment-Date: Thu, 11 Dec 2025 06:56:52 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Mark Ryan (Gerrit)

      unread,
      Dec 11, 2025, 3:15:51 AM (5 days ago) Dec 11
      to Joel Sing, goph...@pubsubhelper.golang.org, Boyao Wang, Go LUCI, Meng Zhuo, Pengcheng Wang, golang-co...@googlegroups.com
      Attention needed from Boyao Wang, Joel Sing and Meng Zhuo

      Mark Ryan voted and added 1 comment

      Votes added by Mark Ryan

      Code-Review+1

      1 comment

      File src/runtime/memclr_riscv64.s
      Line 34, Patchset 11 (Latest):vector_loop:
      VSETVLI X11, E8, M8, TA, MA, X5
      VMVVI $0, V8
      VSE8V V8, (X10)
      ADD X5, X10
      SUB X5, X11
      BNEZ X11, vector_loop
      RET
      Boyao Wang . unresolved
      There is no need to issue VMVI $0, V8 in every iteration. Perhaps we can initialize V8 once before entering the main vector loop.
      ```suggestion
      vector_loop:
      VSETVLI X11, E8, M8, TA, MA, X5
      VMVVI $0, V8
      vector_loop_body:
      VSE8V V8, (X10)
      ADD X5, X10
      SUB X5, X11
      VSETVLI X11, E8, M8, TA, MA, X5
      BNEZ X11, vector_loop_body
      RET
      ```
      Mark Ryan

      @wang...@bytedance.com do you know whether these changes measurably boost the performance of the code? We did discuss doing something like this in the comments above but weren't sure how much of a difference it would make.

      Open in Gerrit

      Related details

      Attention is currently required from:
      • Boyao Wang
      • Joel Sing
      • Meng Zhuo
      Submit Requirements:
      • requirement satisfiedCode-Review
      • requirement is not satisfiedNo-Unresolved-Comments
      • requirement is not satisfiedReview-Enforcement
      • requirement is not satisfiedTryBots-Pass
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1
      Gerrit-Change-Number: 648857
      Gerrit-PatchSet: 11
      Gerrit-Owner: Joel Sing <jo...@sing.id.au>
      Gerrit-Reviewer: Mark Ryan <mark...@rivosinc.com>
      Gerrit-Reviewer: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-CC: Boyao Wang <wang...@bytedance.com>
      Gerrit-CC: Pengcheng Wang <wangpeng...@bytedance.com>
      Gerrit-Attention: Joel Sing <jo...@sing.id.au>
      Gerrit-Attention: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-Attention: Boyao Wang <wang...@bytedance.com>
      Gerrit-Comment-Date: Thu, 11 Dec 2025 08:15:42 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: Yes
      Comment-In-Reply-To: Boyao Wang <wang...@bytedance.com>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Boyao Wang (Gerrit)

      unread,
      Dec 11, 2025, 3:31:36 AM (5 days ago) Dec 11
      to Joel Sing, goph...@pubsubhelper.golang.org, Go LUCI, Meng Zhuo, Mark Ryan, Pengcheng Wang, golang-co...@googlegroups.com
      Attention needed from Joel Sing and Meng Zhuo

      Boyao Wang added 1 comment

      File src/runtime/memclr_riscv64.s
      Line 34, Patchset 11 (Latest):vector_loop:
      VSETVLI X11, E8, M8, TA, MA, X5
      VMVVI $0, V8
      VSE8V V8, (X10)
      ADD X5, X10
      SUB X5, X11
      BNEZ X11, vector_loop
      RET
      Boyao Wang . unresolved
      There is no need to issue VMVI $0, V8 in every iteration. Perhaps we can initialize V8 once before entering the main vector loop.
      ```suggestion
      vector_loop:
      VSETVLI X11, E8, M8, TA, MA, X5
      VMVVI $0, V8
      vector_loop_body:
      VSE8V V8, (X10)
      ADD X5, X10
      SUB X5, X11
      VSETVLI X11, E8, M8, TA, MA, X5
      BNEZ X11, vector_loop_body
      RET
      ```
      Mark Ryan

      @wang...@bytedance.com do you know whether these changes measurably boost the performance of the code? We did discuss doing something like this in the comments above but weren't sure how much of a difference it would make.

      Boyao Wang

      Sorry, I didn’t look closely at the comments that were already marked as resolved. I used a similar approach in glibc’s memset implementation, and I can confirm that moving VMVVI out of the loop in this way is functionally correct. Perhaps the gains observed in the glibc memset-zero benchtest can be used as a reference? I can share the results later.

      Open in Gerrit

      Related details

      Attention is currently required from:
      • Joel Sing
      • Meng Zhuo
      Submit Requirements:
      • requirement satisfiedCode-Review
      • requirement is not satisfiedNo-Unresolved-Comments
      • requirement is not satisfiedReview-Enforcement
      • requirement is not satisfiedTryBots-Pass
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: I95fba5f963d3972f16f09c1c167461d79bcecae1
      Gerrit-Change-Number: 648857
      Gerrit-PatchSet: 11
      Gerrit-Owner: Joel Sing <jo...@sing.id.au>
      Gerrit-Reviewer: Mark Ryan <mark...@rivosinc.com>
      Gerrit-Reviewer: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-CC: Boyao Wang <wang...@bytedance.com>
      Gerrit-CC: Pengcheng Wang <wangpeng...@bytedance.com>
      Gerrit-Attention: Joel Sing <jo...@sing.id.au>
      Gerrit-Attention: Meng Zhuo <mengzh...@gmail.com>
      Gerrit-Comment-Date: Thu, 11 Dec 2025 08:31:28 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Comment-In-Reply-To: Boyao Wang <wang...@bytedance.com>
      Comment-In-Reply-To: Mark Ryan <mark...@rivosinc.com>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Pengcheng Wang (Gerrit)

      unread,
      Dec 11, 2025, 3:33:51 AM (5 days ago) Dec 11
      to Joel Sing, goph...@pubsubhelper.golang.org, Boyao Wang, Go LUCI, Meng Zhuo, Mark Ryan, golang-co...@googlegroups.com
      Attention needed from Joel Sing and Meng Zhuo

      Pengcheng Wang added 1 comment

      File src/runtime/memclr_riscv64.s
      Line 34, Patchset 11 (Latest):vector_loop:
      VSETVLI X11, E8, M8, TA, MA, X5
      VMVVI $0, V8
      VSE8V V8, (X10)
      ADD X5, X10
      SUB X5, X11
      BNEZ X11, vector_loop
      RET
      Boyao Wang . unresolved
      There is no need to issue VMVI $0, V8 in every iteration. Perhaps we can initialize V8 once before entering the main vector loop.
      ```suggestion
      vector_loop:
      VSETVLI X11, E8, M8, TA, MA, X5
      VMVVI $0, V8
      vector_loop_body:
      VSE8V V8, (X10)
      ADD X5, X10
      SUB X5, X11
      VSETVLI X11, E8, M8, TA, MA, X5
      BNEZ X11, vector_loop_body
      RET
      ```
      Mark Ryan

      @wang...@bytedance.com do you know whether these changes measurably boost the performance of the code? We did discuss doing something like this in the comments above but weren't sure how much of a difference it would make.

      Pengcheng Wang

      We can save one instruction by hoisting loop invariants out of the loop. This is what LLVM/GCC generate: https://godbolt.org/z/55Ej73ac9.

      Gerrit-Comment-Date: Thu, 11 Dec 2025 08:33:42 +0000
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Boyao Wang (Gerrit)

      unread,
      Dec 14, 2025, 10:23:21 PM (2 days ago) Dec 14
      to Joel Sing, goph...@pubsubhelper.golang.org, Go LUCI, Meng Zhuo, Mark Ryan, Pengcheng Wang, golang-co...@googlegroups.com
      Attention needed from Joel Sing and Meng Zhuo

      Boyao Wang added 1 comment

      File src/runtime/memclr_riscv64.s
      Boyao Wang

      @mark...@rivosinc.com Based on testing BenchmarkMemclr on SpacemiT X60, moving VMVVI out of the loop provides a geomean performance improvement of around 0.2%.

      Gerrit-Comment-Date: Mon, 15 Dec 2025 03:23:13 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Comment-In-Reply-To: Pengcheng Wang <wangpeng...@bytedance.com>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Mark Ryan (Gerrit)

      unread,
      5:31 AM (9 hours ago) 5:31 AM
      to Joel Sing, goph...@pubsubhelper.golang.org, Boyao Wang, Go LUCI, Meng Zhuo, Pengcheng Wang, golang-co...@googlegroups.com
      Attention needed from Joel Sing and Meng Zhuo

      Mark Ryan added 1 comment

      File src/runtime/memclr_riscv64.s
      Mark Ryan

      Thanks for checking this, and thanks also Pengcheng for the llvm and gcc links. Looks like this is a viable optimisation.

      Gerrit-Comment-Date: Tue, 16 Dec 2025 10:31:44 +0000
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy
      Reply all
      Reply to author
      Forward
      0 new messages