[go] unicode/utf8.Valid: use AVX2 on amd64

Thomas Pelletier (Gerrit)

Oct 16, 2023, 8:40:31 PM
to goph...@pubsubhelper.golang.org, Achille Roussel, golang-co...@googlegroups.com

Thomas Pelletier has uploaded this change for review.

unicode/utf8.Valid: use AVX2 on amd64

This change improves the performance of unicode/utf8.Valid and
ValidString by using AVX2 vector instructions on amd64 processors that
support them.

It implements the LOOKUP algorithm described in Keiser, J. and Lemire, D., 2021.
Validating UTF8 in less than one instruction per byte. Software: Practice and
Experience, 51(5), pp.950-964. https://arxiv.org/pdf/2010.03090.pdf

At a high level, this algorithm looks up each input byte in a virtual table of
potential UTF8 errors. This table is expressed as masks stored in vector
registers, allowing multiple input bytes to be processed with a few
instructions. Though there is an initial cost to prepare the vector registers,
the AVX2 implementation outpaces the scalar version from around 8 bytes of
input:

goos: linux
goarch: amd64
pkg: unicode/utf8
cpu: AMD EPYC 9R14
                  scalar         avx2
                  B/s            B/s             vs base
Valid/small8-64   1.244Gi ± 0%    1.444Gi ± 1%    +16.13% (p=0.000 n=10)
Valid/small16-64  1.339Gi ± 0%    2.892Gi ± 1%   +115.98% (p=0.000 n=10)
Valid/small24-64  1.352Gi ± 0%    4.331Gi ± 1%   +220.42% (p=0.000 n=10)
Valid/small32-64  1.367Gi ± 0%    7.321Gi ± 0%   +435.63% (p=0.000 n=10)
Valid/small40-64  1.356Gi ± 0%    5.979Gi ± 0%   +341.01% (p=0.000 n=10)
Valid/small48-64  1.358Gi ± 0%    7.175Gi ± 0%   +428.38% (p=0.000 n=10)
Valid/small56-64  1.354Gi ± 0%    8.368Gi ± 0%   +518.19% (p=0.000 n=10)
Valid/small64-64  1.345Gi ± 0%   10.459Gi ± 0%   +677.60% (p=0.000 n=10)
Valid/small72-64  1.354Gi ± 1%    8.670Gi ± 0%   +540.46% (p=0.000 n=10)
Valid/small80-64  1.354Gi ± 0%    9.646Gi ± 0%   +612.54% (p=0.000 n=10)
Valid/small88-64  1.351Gi ± 0%   10.606Gi ± 0%   +685.26% (p=0.000 n=10)
Valid/small96-64  1.344Gi ± 0%   12.160Gi ± 1%   +804.70% (p=0.000 n=10)
geomean           1.343Gi         6.468Gi        +381.77%

As the input size grows, the speed-up from vectorization increases:

                  scalar         avx2
                  B/s            B/s             vs base
Valid/1kValid-64  1.399Gi ± 0%   16.168Gi ± 0%   +1056.06% (p=0.000 n=10)
Valid/1MValid-64  1.399Gi ± 0%   16.309Gi ± 0%   +1065.48% (p=0.000 n=10)
Valid/1kASCII-64  12.03Gi ± 0%    47.49Gi ± 0%    +294.59% (p=0.000 n=10)
Valid/1MASCII-64  13.54Gi ± 0%    53.96Gi ± 0%    +298.59% (p=0.000 n=10)
geomean           4.226Gi         28.67Gi         +578.49%

This implementation validates the whole input, even if it finds an invalid byte
early on. Doing so reduces branching in the hot loop, which is beneficial under
the assumption that the input is valid more often than not.

A modified version of Avo generates the actual assembly code. It contains two
patches to support pre-processor macros and jumps outside a function. These
features are necessary to dynamically execute the Go fallback implementation and
skip the AVX2 availability check when compiling with GOAMD64>=v3. We have
submitted pull requests upstream.

When compiling with -tags=purego or running on an amd64 CPU that does not
support AVX2, there is little overhead:

                  base          patch
                  sec/op        sec/op        vs base
Valid/small0-64   1.899n ± 0%   2.171n ± 0%   +14.32% (p=0.000 n=10)
Valid/small8-64   5.991n ± 0%   5.792n ± 0%    -3.32% (p=0.000 n=10)
Valid/small16-64  11.13n ± 0%   11.10n ± 0%         ~ (p=0.221 n=10)
Valid/small24-64  16.54n ± 0%   16.47n ± 0%    -0.39% (p=0.000 n=10)
Valid/small32-64  21.80n ± 0%   21.89n ± 0%    +0.41% (p=0.000 n=10)
Valid/small40-64  27.48n ± 0%   26.99n ± 0%    -1.78% (p=0.000 n=10)
Valid/small48-64  32.92n ± 0%   32.92n ± 0%         ~ (p=0.905 n=10)
Valid/small56-64  38.52n ± 0%   38.26n ± 0%    -0.69% (p=0.000 n=10)
Valid/small64-64  44.31n ± 0%   43.79n ± 0%    -1.20% (p=0.000 n=10)
Valid/small72-64  49.54n ± 1%   49.73n ± 0%    +0.39% (p=0.041 n=10)
Valid/small80-64  55.03n ± 0%   54.94n ± 0%    -0.17% (p=0.000 n=10)
Valid/small88-64  60.68n ± 0%   60.87n ± 0%    +0.30% (p=0.000 n=10)
Valid/small96-64  66.52n ± 0%   66.33n ± 0%    -0.29% (p=0.000 n=10)
geomean           23.78n        23.89n         +0.49%

On non-amd64 architectures, the patch does not introduce significant performance
changes. For example, on arm64:

goos: darwin
goarch: arm64
pkg: unicode/utf8
                 base           patch
                 B/s            B/s            vs base
Valid/small8-8   1.228Gi ± 0%   1.226Gi ± 0%   -0.18% (p=0.002 n=10)
Valid/small16-8  1.296Gi ± 1%   1.301Gi ± 0%   +0.43% (p=0.000 n=10)
Valid/small24-8  1.267Gi ± 0%   1.266Gi ± 0%   -0.09% (p=0.043 n=10)
Valid/small32-8  1.251Gi ± 0%   1.252Gi ± 0%        ~ (p=0.197 n=10)
Valid/small40-8  1.238Gi ± 0%   1.239Gi ± 0%   +0.06% (p=0.014 n=10)
Valid/small48-8  1.234Gi ± 0%   1.235Gi ± 0%   +0.08% (p=0.000 n=10)
Valid/small56-8  1.228Gi ± 0%   1.228Gi ± 0%        ~ (p=0.255 n=10)
Valid/small64-8  1.224Gi ± 0%   1.223Gi ± 0%   -0.13% (p=0.000 n=10)
Valid/small72-8  1.222Gi ± 0%   1.222Gi ± 0%        ~ (p=0.190 n=10)
Valid/small80-8  1.219Gi ± 0%   1.218Gi ± 0%        ~ (p=0.101 n=10)
Valid/small88-8  1.216Gi ± 0%   1.217Gi ± 0%        ~ (p=0.123 n=10)
Valid/small96-8  1.214Gi ± 0%   1.215Gi ± 0%   +0.06% (p=0.005 n=10)
geomean          1.236Gi        1.237Gi        +0.03%

As pointed out in the original paper, this approach is not specific to AVX2: it
applies to any CPU that supports vector operations. Supporting more
architectures such as ARM64/NEON should be straightforward, as both the
structure of the generator program and the definition of the lookup masks can
be reused.

Fixes #63347

Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
---
M src/go/build/deps_test.go
A src/unicode/utf8/_asm/LICENSE
A src/unicode/utf8/_asm/go.mod
A src/unicode/utf8/_asm/go.sum
A src/unicode/utf8/_asm/valid_asm.go
A src/unicode/utf8/export_test.go
M src/unicode/utf8/utf8.go
M src/unicode/utf8/utf8_test.go
A src/unicode/utf8/valid.go
A src/unicode/utf8/valid_amd64.go
A src/unicode/utf8/valid_amd64.s
A src/unicode/utf8/valid_linux_test.go
A src/unicode/utf8/valid_noasm.go
A src/unicode/utf8/valid_test.go
14 files changed, 1,419 insertions(+), 135 deletions(-)

diff --git a/src/go/build/deps_test.go b/src/go/build/deps_test.go
index fcd5e93..db03111 100644
--- a/src/go/build/deps_test.go
+++ b/src/go/build/deps_test.go
@@ -46,9 +46,12 @@
internal/goexperiment, internal/goos,
internal/goversion, internal/nettrace, internal/platform,
log/internal,
- unicode/utf8, unicode/utf16, unicode,
+ unicode/utf16, unicode,
unsafe;

+ # unicode/utf8 uses AVX2 acceleration.
+ internal/cpu < unicode/utf8;
+
# These packages depend only on internal/goarch and unsafe.
internal/goarch, unsafe
< internal/abi;
diff --git a/src/unicode/utf8/_asm/LICENSE b/src/unicode/utf8/_asm/LICENSE
new file mode 100644
index 0000000..715bcd5
--- /dev/null
+++ b/src/unicode/utf8/_asm/LICENSE
@@ -0,0 +1,23 @@
+Original implementation at https://github.com/segmentio/asm/tree/d7e16ffc289d92df6e2c3df0ab55c67460825ac6/build/utf8
+
+MIT License
+
+Copyright (c) 2021 Segment
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/src/unicode/utf8/_asm/go.mod b/src/unicode/utf8/_asm/go.mod
new file mode 100644
index 0000000..fda1075
--- /dev/null
+++ b/src/unicode/utf8/_asm/go.mod
@@ -0,0 +1,16 @@
+module std/unicode/utf8/_asm
+
+go 1.21.0
+
+require github.com/mmcloughlin/avo v0.5.0
+
+require (
+ golang.org/x/mod v0.12.0 // indirect
+ golang.org/x/sys v0.12.0 // indirect
+ golang.org/x/tools v0.13.0 // indirect
+)
+
+// Use a patched version of Avo that adds support for preprocessor macros,
+// custom includes, and jumps to non-label locations. Should be removed when
+// the patch is merged and released.
+replace github.com/mmcloughlin/avo => github.com/pelletier/avo v0.0.0-20231012145902-e86a5cccf71f
diff --git a/src/unicode/utf8/_asm/go.sum b/src/unicode/utf8/_asm/go.sum
new file mode 100644
index 0000000..2446a8a
--- /dev/null
+++ b/src/unicode/utf8/_asm/go.sum
@@ -0,0 +1,10 @@
+github.com/pelletier/avo v0.0.0-20231012145902-e86a5cccf71f h1:ihPDNMsA2RplaKN+6sITuzxeRt22pXB30a5mP5G0HBk=
+github.com/pelletier/avo v0.0.0-20231012145902-e86a5cccf71f/go.mod h1:y/rwlCobjsY4vwU7Nq/sCfdeziAYBcwP7xaDyAvwQgc=
+golang.org/x/mod v0.12.0 h1:rmsUpXtvNzj340zd98LZ4KntptpfRHwpFOHG188oHXc=
+golang.org/x/mod v0.12.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
+golang.org/x/sync v0.3.0 h1:ftCYgMx6zT/asHUrPw8BLLscYtGznsLAnjq5RH9P66E=
+golang.org/x/sync v0.3.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y=
+golang.org/x/sys v0.12.0 h1:CM0HF96J0hcLAwsHPJZjfdNzs0gftsLfgKt57wWHJ0o=
+golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/tools v0.13.0 h1:Iey4qkscZuv0VvIt8E0neZjtPVQFSc870HQ448QgEmQ=
+golang.org/x/tools v0.13.0/go.mod h1:HvlwmtVNQAhOuCjW7xxvovg8wbNq7LwfXh/k7wXUl58=
diff --git a/src/unicode/utf8/_asm/valid_asm.go b/src/unicode/utf8/_asm/valid_asm.go
new file mode 100644
index 0000000..f093302
--- /dev/null
+++ b/src/unicode/utf8/_asm/valid_asm.go
@@ -0,0 +1,545 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This program generates the AMD64 AVX2 implementation of utf8.Valid following
+// the LOOKUP algorithm described in Keiser, J. and Lemire, D., 2021.
+// Validating UTF‐8 in less than one instruction per byte. Software: Practice
+// and Experience, 51(5), pp.950-964. https://arxiv.org/pdf/2010.03090.pdf
+package main
+
+import (
+ "bytes"
+ "encoding/binary"
+ "fmt"
+
+ . "github.com/mmcloughlin/avo/build"
+ . "github.com/mmcloughlin/avo/operand"
+ . "github.com/mmcloughlin/avo/reg"
+)
+
+//go:generate go run . -out ../valid_amd64.s -pkg utf8
+
+func publicFunc(name, param, t, fallback string, d, n, o Register) {
+ TEXT(name, NOSPLIT, fmt.Sprintf("func(%s %s) bool", param, t))
+ Load(Param(param).Base(), d)
+ Load(Param(param).Len(), n)
+ ret, err := ReturnIndex(0).Resolve()
+ if err != nil {
+ panic(err)
+ }
+ LEAQ(ret.Addr, o)
+ Preprocessor("ifndef hasAVX2")
+ CMPB(NewDataAddr(Symbol{Name: "·hasAVX2"}, 0), U8(1))
+ JEQ(LabelRef("has_avx"))
+ JMP(NewDataAddr(Symbol{Name: fallback}, 0))
+ Preprocessor("endif")
+ Label("has_avx")
+ JMP(NewDataAddr(Symbol{Name: "·validBody"}, 0))
+}
+
+func main() {
+ ConstraintExpr("amd64,!purego")
+
+ Include("asm_amd64.h") // provides hasAVX2
+
+ // Those registers are used to call validBody from the public functions.
+ d := RSI // Pointer to bytes
+ n := RCX // Input length
+ o := RBX // Result address
+
+ // Visible functions that call into validBody.
+ publicFunc("ValidString", "s", "string", "·validStringDefault", d, n, o)
+ publicFunc("Valid", "p", "[]byte", "·validDefault", d, n, o)
+
+ // Private main processing function. See above for expected registers.
+ TEXT("validBody", NOSPLIT, "func()")
+
+ Comment("Prepare the lookup masks")
+
+ incompleteMask := ConstBytes("incomplete_mask", incompleteMaskData())
+ incompleteMaskY := YMM()
+ VMOVDQU(incompleteMask, incompleteMaskY)
+
+ continuation4Bytes := ConstBytes("cont4_vec", continuationMaskData(0b11110000))
+ continuation4BytesY := YMM()
+ VMOVDQU(continuation4Bytes, continuation4BytesY)
+
+ continuation3Bytes := ConstBytes("cont3_vec", continuationMaskData(0b11100000))
+ continuation3BytesY := YMM()
+ VMOVDQU(continuation3Bytes, continuation3BytesY)
+
+ nib1Data, nib2Data, nib3Data := nibbleMasksData()
+
+ nibble1Errors := ConstBytes("nibble1_errors", nib1Data)
+ nibble1Y := YMM()
+ VMOVDQU(nibble1Errors, nibble1Y)
+
+ nibble2Errors := ConstBytes("nibble2_errors", nib2Data)
+ nibble2Y := YMM()
+ VMOVDQU(nibble2Errors, nibble2Y)
+
+ nibble3Errors := ConstBytes("nibble3_errors", nib3Data)
+ nibble3Y := YMM()
+ VMOVDQU(nibble3Errors, nibble3Y)
+
+ lowerNibbleMask := constArray64("nibble_mask",
+ 0x0F0F0F0F0F0F0F0F,
+ 0x0F0F0F0F0F0F0F0F,
+ 0x0F0F0F0F0F0F0F0F,
+ 0x0F0F0F0F0F0F0F0F,
+ )
+
+ nibbleMaskY := YMM()
+ VMOVDQU(lowerNibbleMask, nibbleMaskY)
+
+ msbMask := constArray64("msb_mask",
+ 0x8080808080808080,
+ 0x8080808080808080,
+ 0x8080808080808080,
+ 0x8080808080808080,
+ )
+
+ msbMaskY := YMM()
+ VMOVDQU(msbMask, msbMaskY)
+
+ Comment("For the first pass, set the previous block as zero.")
+ previousBlockY := YMM()
+ zeroOutVector(previousBlockY)
+
+ Comment("Sticky error vector starts empty.")
+ errorY := YMM()
+ zeroOutVector(errorY)
+
+ Comment(`Zeroes the "previous block was incomplete" vector.`)
+ incompletePreviousBlockY := YMM()
+ zeroOutVector(incompletePreviousBlockY)
+
+ currentBlockY := YMM()
+
+ Comment("Top of the loop.")
+ Label("check_input")
+
+ Comment("If bytes left >= 32")
+ CMPQ(n, U8(32))
+ JL(LabelRef("tail_load"))
+
+ Comment("Process one 32B block of data")
+ Label("process")
+
+ Comment("Load the next block of bytes")
+ VMOVDQU(Mem{Base: d}, currentBlockY)
+ SUBQ(U8(32), n)
+ ADDQ(U8(32), d)
+
+ Label("loaded")
+
+ Comment("Fast check to see if ASCII")
+ tmp := GP32()
+ VPMOVMSKB(currentBlockY, tmp)
+ TESTL(tmp.As32(), tmp.As32())
+ JNZ(LabelRef("non_ascii"))
+
+ Comment("If this whole block is ASCII, there is nothing to do, and it is an error if any of the previous code point was incomplete.")
+ VPOR(errorY, incompletePreviousBlockY, errorY)
+ JMP(LabelRef("check_input"))
+
+ Label("non_ascii")
+
+ Comment("Prepare intermediate vector for push operations")
+ vp := YMM()
+ VPERM2I128(Imm(3), previousBlockY, currentBlockY, vp)
+
+ Comment("Check errors on the high nibble of the previous byte")
+ previousY := YMM()
+ VPALIGNR(Imm(15), vp, currentBlockY, previousY)
+
+ highPrev := highNibbles(previousY, nibbleMaskY)
+ VPSHUFB(highPrev, nibble1Y, highPrev)
+
+ Comment("Check errors on the low nibble of the previous byte")
+ lowPrev := lowNibbles(previousY, nibbleMaskY)
+ VPSHUFB(lowPrev, nibble2Y, lowPrev)
+ VPAND(lowPrev, highPrev, highPrev)
+
+ Comment("Check errors on the high nibble on the current byte")
+ highCurr := highNibbles(currentBlockY, nibbleMaskY)
+ VPSHUFB(highCurr, nibble3Y, highCurr)
+ VPAND(highCurr, highPrev, highPrev)
+
+ Comment("Find 3 bytes continuations")
+ off2 := YMM()
+ VPALIGNR(Imm(14), vp, currentBlockY, off2)
+ VPSUBUSB(continuation3BytesY, off2, off2)
+
+ Comment("Find 4 bytes continuations")
+ off3 := YMM()
+ VPALIGNR(Imm(13), vp, currentBlockY, off3)
+
+ VPSUBUSB(continuation4BytesY, off3, off3)
+
+ Comment("Combine them to have all continuations")
+ continuationBitsY := YMM()
+ VPOR(off2, off3, continuationBitsY)
+
+ Comment("Perform a byte-sized signed comparison with zero to turn any non-zero bytes into 0xFF.")
+ tmpY := zeroOutVector(YMM())
+ VPCMPGTB(tmpY, continuationBitsY, continuationBitsY)
+
+ Comment("Find bytes that are continuations by looking at their most significant bit.")
+ VPAND(msbMaskY, continuationBitsY, continuationBitsY)
+
+ Comment("Find mismatches between expected and actual continuation bytes")
+ VPXOR(continuationBitsY, highPrev, continuationBitsY)
+
+ Comment("Store result in sticky error")
+ VPOR(errorY, continuationBitsY, errorY)
+
+ Comment("Prepare for next iteration")
+ VPSUBUSB(incompleteMaskY, currentBlockY, incompletePreviousBlockY)
+ VMOVDQU(currentBlockY, previousBlockY)
+
+ Comment("End of loop")
+ JMP(LabelRef("check_input"))
+
+ Label("tail_load")
+ Comment("If < 32 bytes left")
+
+ Comment("Fast exit if done")
+ CMPQ(n, U8(0))
+ JE(LabelRef("end"))
+
+ // When loading the trailing block with a size of less than 32 Bytes, we
+ // need to be careful to not incur a page fault by touching unmapped or
+ // protected memory. There are 3 different situations:
+ //
+ // 1. No page boundary between the end of the input and the end of
+ // a 32B read.
+ // 2. There is a page boundary, and the remaining input is 16 bytes or
+ // less.
+ // 3. There is a page boundary, and the remaining input is more than
+ // 16 bytes.
+ //
+ // Situation 1 is the easiest: simply perform a 32B load blended with the
+ // zero vector.
+ //
+ // Situations 2 and 3 are conceptually the same. To avoid a potential page
+ // fault, we need to perform a 32B read that ends where the input ends, and
+ // shift bytes to have them start so that the current block begins where
+ // the previous iteration ended, padding the rest with zeroes. The
+ // situations differ because AVX2 32 bytes vectors are really made of two
+ // 16 bytes vectors (two "lanes"). A byte-wise shift operation across lanes
+ // requires a combination of shuffle instructions.
+
+ Comment("If 0 < bytes left < 32")
+
+ zeroes := YMM()
+ zeroOutVector(zeroes)
+
+ Comment("Check if there is a page boundary between end of input and the next 32 bytes")
+
+ // On systems with larger pages, we would unnecessarily take the slow path,
+ // but that's good enough for now.
+ const page = 4096
+
+ end := GP64()
+ LEAQ(Mem{Base: d, Index: n, Scale: 1, Disp: -1}, end) // p+n-1
+ ANDL(U32(page-1), end.As32()) // % page
+ ADDL(U32(32), end.As32()) // + 32
+ SUBQ(n, end) // -n
+ CMPL(end.As32(), U32(page))
+ JAE(LabelRef("page_boundary"))
+
+ // Situation 1
+ Comment("No page boundary, no problem")
+
+ {
+ halfBlendMaskBytes := make([]byte, 2*32)
+ for i := byte(0); i < 32; i++ {
+ halfBlendMaskBytes[i] = 0xFF
+ halfBlendMaskBytes[i+32] = 0x00
+ }
+ halfBlendMask := ConstBytes("half_blend_mask", halfBlendMaskBytes)
+
+ halfBlendMaskStartPtr := GP64()
+ LEAQ(halfBlendMask.Offset(32), halfBlendMaskStartPtr)
+ SUBQ(n, halfBlendMaskStartPtr)
+
+ blend := YMM()
+ VMOVDQU(Mem{Base: halfBlendMaskStartPtr}, blend)
+ VPBLENDVB(blend, Mem{Base: d}, zeroes, currentBlockY)
+ }
+
+ XORQ(n, n)
+ JMP(LabelRef("loaded"))
+
+ // Situation 2 or 3
+ Label("page_boundary")
+
+ offset := GP64()
+ shuffleMaskPtr := GP64()
+ shuffle := YMM()
+ tmp1 := YMM()
+
+ MOVQ(U64(32), offset)
+ SUBQ(n, offset)
+ SUBQ(offset, d)
+
+ VMOVDQU(Mem{Base: d}, currentBlockY)
+
+ CMPQ(n, U8(16))
+ JA(LabelRef("tail_load_large"))
+
+ // Situation 2
+ Comment("Shift when remaining bytes <= 16, safe next to a page boundary")
+
+ VPERM2I128(Imm(3), currentBlockY, zeroes, currentBlockY)
+
+ shuffleClearMaskBytes := make([]byte, 3*16)
+ for i := byte(0); i < 16; i++ {
+ shuffleClearMaskBytes[i] = i
+ shuffleClearMaskBytes[i+16] = 0xFF
+ shuffleClearMaskBytes[i+32] = 0xFF
+ }
+ shuffleClearMask := ConstBytes("shuffle_clear_mask", shuffleClearMaskBytes)
+
+ LEAQ(shuffleClearMask.Offset(16), shuffleMaskPtr)
+ LEAQ(Mem{Base: offset, Index: n, Scale: 2, Disp: -32}, offset) // offset += 2*n - 32
+ SUBQ(offset, shuffleMaskPtr)
+ VMOVDQU(Mem{Base: shuffleMaskPtr}, shuffle)
+
+ VPSHUFB(shuffle, currentBlockY, currentBlockY)
+
+ XORQ(n, n)
+ JMP(LabelRef("loaded"))
+
+ // Situation 3
+ Comment("Shift when remaining bytes > 16, safe next to a page boundary")
+ Label("tail_load_large")
+
+ shuffleMaskBytes := make([]byte, 3*16)
+ for i := byte(0); i < 16; i++ {
+ shuffleMaskBytes[i] = i
+ shuffleMaskBytes[i+16] = i
+ shuffleMaskBytes[i+32] = i
+ }
+ shuffleMask := ConstBytes("shuffle_mask", shuffleMaskBytes)
+
+ LEAQ(Mem{Base: offset, Index: n, Scale: 2, Disp: -48}, offset) // offset += 2*n - 48
+
+ LEAQ(shuffleMask.Offset(16), shuffleMaskPtr)
+ SUBQ(offset, shuffleMaskPtr)
+ VMOVDQU(Mem{Base: shuffleMaskPtr}, shuffle)
+
+ VPSHUFB(shuffle, currentBlockY, tmp1)
+
+ tmp2 := YMM()
+ VPERM2I128(Imm(3), currentBlockY, zeroes, tmp2)
+
+ VPSHUFB(shuffle, tmp2, tmp2)
+
+ blendMaskBytes := make([]byte, 3*16)
+ for i := byte(0); i < 16; i++ {
+ blendMaskBytes[i] = 0xFF
+ blendMaskBytes[i+16] = 0x00
+ blendMaskBytes[i+32] = 0xFF
+ }
+ blendMask := ConstBytes("blend_mask", blendMaskBytes)
+
+ blendMaskStartPtr := GP64()
+ LEAQ(blendMask.Offset(16), blendMaskStartPtr)
+ SUBQ(offset, blendMaskStartPtr)
+
+ blend := YMM()
+ VBROADCASTF128(Mem{Base: blendMaskStartPtr}, blend)
+ VPBLENDVB(blend, tmp1, tmp2, currentBlockY)
+
+ XORQ(n, n)
+ JMP(LabelRef("loaded"))
+
+ Label("end")
+
+ Comment("If the previous block was incomplete, this is an error.")
+ VPOR(incompletePreviousBlockY, errorY, errorY)
+
+ Comment("Return whether any error bit was set")
+ VPTEST(errorY, errorY)
+ SETEQ(Mem{Base: o})
+ VZEROUPPER()
+ RET()
+
+ Generate()
+}
+
+func incompleteMaskData() []byte {
+ // The incomplete mask is used on every block to flag the bytes that are
+ // incomplete if this is the last block (for example a byte that starts
+ // a 4 byte character only 3 bytes before the end).
+ any := byte(0xFF)
+ needs4 := byte(0b11110000) - 1
+ needs3 := byte(0b11100000) - 1
+ needs2 := byte(0b11000000) - 1
+ b := [32]byte{
+ any, any, any, any, any, any, any, any,
+ any, any, any, any, any, any, any, any,
+ any, any, any, any, any, any, any, any,
+ any, any, any, any, any, needs4, needs3, needs2,
+ }
+ return b[:]
+}
+
+func continuationMaskData(pattern byte) []byte {
+ // Pattern is something like 0b11100000 to accept all bytes of the form
+ // 111xxxxx.
+ v := pattern - 1
+ return bytes.Repeat([]byte{v}, 32)
+}
+
+func nibbleMasksData() (nib1, nib2, nib3 []byte) {
+ const (
+ TooShort = 1 << 0
+ TooLong = 1 << 1
+ Overlong3 = 1 << 2
+ Surrogate = 1 << 4
+ Overlong2 = 1 << 5
+ TwoConts = 1 << 7
+ TooLarge = 1 << 3
+ TooLarge1000 = 1 << 6
+ Overlong4 = 1 << 6
+ Carry = TooShort | TooLong | TwoConts
+ )
+
+ fullMask := func(b [16]byte) []byte {
+ m := make([]byte, 32)
+ copy(m, b[:])
+ copy(m[16:], b[:])
+ return m
+ }
+
+ nib1 = fullMask([16]byte{
+ // 0_______ ________ <ASCII in byte 1>
+ TooLong, TooLong, TooLong, TooLong,
+ TooLong, TooLong, TooLong, TooLong,
+ // 10______ ________ <continuation in byte 1>
+ TwoConts, TwoConts, TwoConts, TwoConts,
+ // 1100____ ________ <two byte lead in byte 1>
+ TooShort | Overlong2,
+ // 1101____ ________ <two byte lead in byte 1>
+ TooShort,
+ // 1110____ ________ <three byte lead in byte 1>
+ TooShort | Overlong3 | Surrogate,
+ // 1111____ ________ <four+ byte lead in byte 1>
+ TooShort | TooLarge | TooLarge1000 | Overlong4,
+ })
+
+ nib2 = fullMask([16]byte{
+ // ____0000 ________
+ Carry | Overlong3 | Overlong2 | Overlong4,
+ // ____0001 ________
+ Carry | Overlong2,
+ // ____001_ ________
+ Carry,
+ Carry,
+
+ // ____0100 ________
+ Carry | TooLarge,
+ // ____0101 ________
+ Carry | TooLarge | TooLarge1000,
+ // ____011_ ________
+ Carry | TooLarge | TooLarge1000,
+ Carry | TooLarge | TooLarge1000,
+
+ // ____1___ ________
+ Carry | TooLarge | TooLarge1000,
+ Carry | TooLarge | TooLarge1000,
+ Carry | TooLarge | TooLarge1000,
+ Carry | TooLarge | TooLarge1000,
+ Carry | TooLarge | TooLarge1000,
+ // ____1101 ________
+ Carry | TooLarge | TooLarge1000 | Surrogate,
+ Carry | TooLarge | TooLarge1000,
+ Carry | TooLarge | TooLarge1000,
+ })
+
+ nib3 = fullMask([16]byte{
+ // ________ 0_______ <ASCII in byte 2>
+ TooShort, TooShort, TooShort, TooShort,
+ TooShort, TooShort, TooShort, TooShort,
+
+ // ________ 1000____
+ TooLong | Overlong2 | TwoConts | Overlong3 | TooLarge1000 | Overlong4,
+ // ________ 1001____
+ TooLong | Overlong2 | TwoConts | Overlong3 | TooLarge,
+ // ________ 101_____
+ TooLong | Overlong2 | TwoConts | Surrogate | TooLarge,
+ TooLong | Overlong2 | TwoConts | Surrogate | TooLarge,
+
+ // ________ 11______
+ TooShort, TooShort, TooShort, TooShort,
+ })
+
+ return
+}
+
+func lowNibbles(a VecVirtual, nibbleMask VecVirtual) VecVirtual {
+ out := YMM()
+ VPAND(a, nibbleMask, out)
+ return out
+}
+
+func highNibbles(a VecVirtual, nibbleMask VecVirtual) VecVirtual {
+ out := YMM()
+ VPSRLW(Imm(4), a, out)
+ VPAND(out, nibbleMask, out)
+ return out
+}
+
+func zeroOutVector(y VecVirtual) VecVirtual {
+ VPXOR(y, y, y)
+ return y
+}
+
+func ConstBytes(name string, data []byte) Mem {
+ m := GLOBL(name, RODATA|NOPTR)
+
+ switch {
+ case len(data)%8 == 0:
+ constBytes8(0, data)
+
+ case len(data)%4 == 0:
+ constBytes4(0, data)
+
+ default:
+ i := (len(data) / 8) * 8
+ constBytes8(0, data[:i])
+ constBytes1(i, data[i:])
+ }
+
+ return m
+}
+
+func constArray64(name string, elems ...uint64) Mem {
+ data := make([]byte, 8*len(elems))
+ for i, elem := range elems {
+ binary.LittleEndian.PutUint64(data[i*8:], elem)
+ }
+ return ConstBytes(name, data)
+}
+
+func constBytes8(offset int, data []byte) {
+ for i := 0; i < len(data); i += 8 {
+ DATA(offset+i, U64(binary.LittleEndian.Uint64(data[i:i+8])))
+ }
+}
+
+func constBytes4(offset int, data []byte) {
+ for i := 0; i < len(data); i += 4 {
+ DATA(offset+i, U32(binary.LittleEndian.Uint32(data[i:i+4])))
+ }
+}
+
+func constBytes1(offset int, data []byte) {
+ for i, b := range data {
+ DATA(offset+i, U8(b))
+ }
+}
diff --git a/src/unicode/utf8/export_test.go b/src/unicode/utf8/export_test.go
new file mode 100644
index 0000000..049fa57
--- /dev/null
+++ b/src/unicode/utf8/export_test.go
@@ -0,0 +1,11 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package utf8
+
+// Exported version of validDefault to do side-by-side comparison with the AVX2
+// implementation.
+func ValidDefault(b []byte) bool {
+ return validDefault(b)
+}
diff --git a/src/unicode/utf8/utf8.go b/src/unicode/utf8/utf8.go
index 1e9f666..24b3e52 100644
--- a/src/unicode/utf8/utf8.go
+++ b/src/unicode/utf8/utf8.go
@@ -473,103 +473,6 @@
// bits set to 10.
func RuneStart(b byte) bool { return b&0xC0 != 0x80 }

-// Valid reports whether p consists entirely of valid UTF-8-encoded runes.
-func Valid(p []byte) bool {
- // This optimization avoids the need to recompute the capacity
- // when generating code for p[8:], bringing it to parity with
- // ValidString, which was 20% faster on long ASCII strings.
- p = p[:len(p):len(p)]
-
- // Fast path. Check for and skip 8 bytes of ASCII characters per iteration.
- for len(p) >= 8 {
- // Combining two 32 bit loads allows the same code to be used
- // for 32 and 64 bit platforms.
- // The compiler can generate a 32bit load for first32 and second32
- // on many platforms. See test/codegen/memcombine.go.
- first32 := uint32(p[0]) | uint32(p[1])<<8 | uint32(p[2])<<16 | uint32(p[3])<<24
- second32 := uint32(p[4]) | uint32(p[5])<<8 | uint32(p[6])<<16 | uint32(p[7])<<24
- if (first32|second32)&0x80808080 != 0 {
- // Found a non ASCII byte (>= RuneSelf).
- break
- }
- p = p[8:]
- }
- n := len(p)
- for i := 0; i < n; {
- pi := p[i]
- if pi < RuneSelf {
- i++
- continue
- }
- x := first[pi]
- if x == xx {
- return false // Illegal starter byte.
- }
- size := int(x & 7)
- if i+size > n {
- return false // Short or invalid.
- }
- accept := acceptRanges[x>>4]
- if c := p[i+1]; c < accept.lo || accept.hi < c {
- return false
- } else if size == 2 {
- } else if c := p[i+2]; c < locb || hicb < c {
- return false
- } else if size == 3 {
- } else if c := p[i+3]; c < locb || hicb < c {
- return false
- }
- i += size
- }
- return true
-}
-
-// ValidString reports whether s consists entirely of valid UTF-8-encoded runes.
-func ValidString(s string) bool {
- // Fast path. Check for and skip 8 bytes of ASCII characters per iteration.
- for len(s) >= 8 {
- // Combining two 32 bit loads allows the same code to be used
- // for 32 and 64 bit platforms.
- // The compiler can generate a 32bit load for first32 and second32
- // on many platforms. See test/codegen/memcombine.go.
- first32 := uint32(s[0]) | uint32(s[1])<<8 | uint32(s[2])<<16 | uint32(s[3])<<24
- second32 := uint32(s[4]) | uint32(s[5])<<8 | uint32(s[6])<<16 | uint32(s[7])<<24
- if (first32|second32)&0x80808080 != 0 {
- // Found a non ASCII byte (>= RuneSelf).
- break
- }
- s = s[8:]
- }
- n := len(s)
- for i := 0; i < n; {
- si := s[i]
- if si < RuneSelf {
- i++
- continue
- }
- x := first[si]
- if x == xx {
- return false // Illegal starter byte.
- }
- size := int(x & 7)
- if i+size > n {
- return false // Short or invalid.
- }
- accept := acceptRanges[x>>4]
- if c := s[i+1]; c < accept.lo || accept.hi < c {
- return false
- } else if size == 2 {
- } else if c := s[i+2]; c < locb || hicb < c {
- return false
- } else if size == 3 {
- } else if c := s[i+3]; c < locb || hicb < c {
- return false
- }
- i += size
- }
- return true
-}
-
// ValidRune reports whether r can be legally encoded as UTF-8.
// Code points that are out of range or a surrogate half are illegal.
func ValidRune(r rune) bool {
diff --git a/src/unicode/utf8/utf8_test.go b/src/unicode/utf8/utf8_test.go
index 19a04dc..440502e 100644
--- a/src/unicode/utf8/utf8_test.go
+++ b/src/unicode/utf8/utf8_test.go
@@ -464,43 +464,6 @@
}
}

-type ValidTest struct {
- in string
- out bool
-}
-
-var validTests = []ValidTest{
- {"", true},
- {"a", true},
- {"abc", true},
- {"Ж", true},
- {"ЖЖ", true},
- {"брэд-ЛГТМ", true},
- {"☺☻☹", true},
- {"aa\xe2", false},
- {string([]byte{66, 250}), false},
- {string([]byte{66, 250, 67}), false},
- {"a\uFFFDb", true},
- {string("\xF4\x8F\xBF\xBF"), true}, // U+10FFFF
- {string("\xF4\x90\x80\x80"), false}, // U+10FFFF+1; out of range
- {string("\xF7\xBF\xBF\xBF"), false}, // 0x1FFFFF; out of range
- {string("\xFB\xBF\xBF\xBF\xBF"), false}, // 0x3FFFFFF; out of range
- {string("\xc0\x80"), false}, // U+0000 encoded in two bytes: incorrect
- {string("\xed\xa0\x80"), false}, // U+D800 high surrogate (sic)
- {string("\xed\xbf\xbf"), false}, // U+DFFF low surrogate (sic)
-}
-
-func TestValid(t *testing.T) {
- for _, tt := range validTests {
- if Valid([]byte(tt.in)) != tt.out {
- t.Errorf("Valid(%q) = %v; want %v", tt.in, !tt.out, tt.out)
- }
- if ValidString(tt.in) != tt.out {
- t.Errorf("ValidString(%q) = %v; want %v", tt.in, !tt.out, tt.out)
- }
- }
-}
-
type ValidRuneTest struct {
r rune
ok bool
diff --git a/src/unicode/utf8/valid.go b/src/unicode/utf8/valid.go
new file mode 100644
index 0000000..e6d51ca
--- /dev/null
+++ b/src/unicode/utf8/valid.go
@@ -0,0 +1,102 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package utf8
+
+// Scalar implementation of Valid. Used for non-amd64 targets, and as a
+// fallback when AVX2 is not supported.
+func validDefault(p []byte) bool {
+ // This optimization avoids the need to recompute the capacity
+ // when generating code for p[8:], bringing it to parity with
+ // ValidString, which was 20% faster on long ASCII strings.
+ p = p[:len(p):len(p)]
+
+ // Fast path. Check for and skip 8 bytes of ASCII characters per iteration.
+ for len(p) >= 8 {
+ // Combining two 32 bit loads allows the same code to be used
+ // for 32 and 64 bit platforms.
+ // The compiler can generate a 32bit load for first32 and second32
+ // on many platforms. See test/codegen/memcombine.go.
+ first32 := uint32(p[0]) | uint32(p[1])<<8 | uint32(p[2])<<16 | uint32(p[3])<<24
+ second32 := uint32(p[4]) | uint32(p[5])<<8 | uint32(p[6])<<16 | uint32(p[7])<<24
+ if (first32|second32)&0x80808080 != 0 {
+ // Found a non ASCII byte (>= RuneSelf).
+ break
+ }
+ p = p[8:]
+ }
+ n := len(p)
+ for i := 0; i < n; {
+ pi := p[i]
+ if pi < RuneSelf {
+ i++
+ continue
+ }
+ x := first[pi]
+ if x == xx {
+ return false // Illegal starter byte.
+ }
+ size := int(x & 7)
+ if i+size > n {
+ return false // Short or invalid.
+ }
+ accept := acceptRanges[x>>4]
+ if c := p[i+1]; c < accept.lo || accept.hi < c {
+ return false
+ } else if size == 2 {
+ } else if c := p[i+2]; c < locb || hicb < c {
+ return false
+ } else if size == 3 {
+ } else if c := p[i+3]; c < locb || hicb < c {
+ return false
+ }
+ i += size
+ }
+ return true
+}
+
+func validStringDefault(s string) bool {
+ // Fast path. Check for and skip 8 bytes of ASCII characters per iteration.
+ for len(s) >= 8 {
+ // Combining two 32 bit loads allows the same code to be used
+ // for 32 and 64 bit platforms.
+ // The compiler can generate a 32bit load for first32 and second32
+ // on many platforms. See test/codegen/memcombine.go.
+ first32 := uint32(s[0]) | uint32(s[1])<<8 | uint32(s[2])<<16 | uint32(s[3])<<24
+ second32 := uint32(s[4]) | uint32(s[5])<<8 | uint32(s[6])<<16 | uint32(s[7])<<24
+ if (first32|second32)&0x80808080 != 0 {
+ // Found a non ASCII byte (>= RuneSelf).
+ break
+ }
+ s = s[8:]
+ }
+ n := len(s)
+ for i := 0; i < n; {
+ si := s[i]
+ if si < RuneSelf {
+ i++
+ continue
+ }
+ x := first[si]
+ if x == xx {
+ return false // Illegal starter byte.
+ }
+ size := int(x & 7)
+ if i+size > n {
+ return false // Short or invalid.
+ }
+ accept := acceptRanges[x>>4]
+ if c := s[i+1]; c < accept.lo || accept.hi < c {
+ return false
+ } else if size == 2 {
+ } else if c := s[i+2]; c < locb || hicb < c {
+ return false
+ } else if size == 3 {
+ } else if c := s[i+3]; c < locb || hicb < c {
+ return false
+ }
+ i += size
+ }
+ return true
+}
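
Note on the fast path above: both scalar functions rely on the fact that a byte is non-ASCII (>= RuneSelf) exactly when its top bit is set, so OR-ing eight bytes into two 32-bit words and masking with 0x80808080 tests all of them at once. A minimal standalone sketch of that check follows; the helper name allASCII8 is hypothetical and not part of this change.

```go
package main

// allASCII8 reports whether the first 8 bytes of p are all ASCII.
// Hypothetical helper for illustration only. The caller must ensure
// len(p) >= 8, as the `for len(p) >= 8` loop in the patch does.
func allASCII8(p []byte) bool {
	// Two little-endian 32-bit loads, written the same way as in the patch.
	first32 := uint32(p[0]) | uint32(p[1])<<8 | uint32(p[2])<<16 | uint32(p[3])<<24
	second32 := uint32(p[4]) | uint32(p[5])<<8 | uint32(p[6])<<16 | uint32(p[7])<<24
	// Any byte >= 0x80 leaves its top bit set in the OR of the two words.
	return (first32|second32)&0x80808080 == 0
}
```

For example, allASCII8([]byte("01234567")) is true, while a slice containing a single 0x80 byte among its first eight bytes yields false.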
diff --git a/src/unicode/utf8/valid_amd64.go b/src/unicode/utf8/valid_amd64.go
new file mode 100644
index 0000000..4cd0f3d
--- /dev/null
+++ b/src/unicode/utf8/valid_amd64.go
@@ -0,0 +1,21 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build amd64 && !purego
+
+package utf8
+
+import (
+ "internal/cpu"
+)
+
+var hasAVX2 = cpu.X86.HasAVX2
+
+// Valid reports whether p consists entirely of valid UTF-8-encoded runes.
+func Valid(p []byte) bool
+
+// ValidString reports whether s consists entirely of valid UTF-8-encoded runes.
+func ValidString(s string) bool
+
+func validBody()
diff --git a/src/unicode/utf8/valid_amd64.s b/src/unicode/utf8/valid_amd64.s
new file mode 100644
index 0000000..fb0b608
--- /dev/null
+++ b/src/unicode/utf8/valid_amd64.s
@@ -0,0 +1,281 @@
+// Code generated by command: go run valid_asm.go -out ../valid_amd64.s -pkg utf8. DO NOT EDIT.
+
+//go:build amd64 && !purego
+
+#include "asm_amd64.h"
+#include "textflag.h"
+
+// func ValidString(s string) bool
+TEXT ·ValidString(SB), NOSPLIT, $0-17
+ MOVQ s_base+0(FP), SI
+ MOVQ s_len+8(FP), CX
+ LEAQ ret+16(FP), BX
+#ifndef hasAVX2
+ CMPB ·hasAVX2+0(SB), $0x01
+ JEQ has_avx
+ JMP ·validStringDefault+0(SB)
+#endif
+
+has_avx:
+ JMP ·validBody+0(SB)
+
+// func Valid(p []byte) bool
+TEXT ·Valid(SB), NOSPLIT, $0-25
+ MOVQ p_base+0(FP), SI
+ MOVQ p_len+8(FP), CX
+ LEAQ ret+24(FP), BX
+#ifndef hasAVX2
+ CMPB ·hasAVX2+0(SB), $0x01
+ JEQ has_avx
+ JMP ·validDefault+0(SB)
+#endif
+
+has_avx:
+ JMP ·validBody+0(SB)
+
+// func validBody()
+// Requires: AVX, AVX2
+TEXT ·validBody(SB), NOSPLIT, $0
+ // Prepare the lookup masks
+ VMOVDQU incomplete_mask<>+0(SB), Y0
+ VMOVDQU cont4_vec<>+0(SB), Y1
+ VMOVDQU cont3_vec<>+0(SB), Y2
+ VMOVDQU nibble1_errors<>+0(SB), Y3
+ VMOVDQU nibble2_errors<>+0(SB), Y4
+ VMOVDQU nibble3_errors<>+0(SB), Y5
+ VMOVDQU nibble_mask<>+0(SB), Y6
+ VMOVDQU msb_mask<>+0(SB), Y7
+
+ // For the first pass, set the previous block as zero.
+ VPXOR Y8, Y8, Y8
+
+ // Sticky error vector starts empty.
+ VPXOR Y9, Y9, Y9
+
+ // Zeroes the "previous block was incomplete" vector.
+ VPXOR Y10, Y10, Y10
+
+ // Top of the loop.
+check_input:
+ // If bytes left >= 32
+ CMPQ CX, $0x20
+ JL tail_load
+
+ // Process one 32B block of data
+ // Load the next block of bytes
+ VMOVDQU (SI), Y11
+ SUBQ $0x20, CX
+ ADDQ $0x20, SI
+
+loaded:
+ // Fast check to see if ASCII
+ VPMOVMSKB Y11, AX
+ TESTL AX, AX
+ JNZ non_ascii
+
+ // If this whole block is ASCII, there is nothing more to check; it is only an error if the previous block ended with an incomplete code point.
+ VPOR Y9, Y10, Y9
+ JMP check_input
+
+non_ascii:
+ // Prepare intermediate vector for push operations
+ VPERM2I128 $0x03, Y8, Y11, Y8
+
+ // Check errors on the high nibble of the previous byte
+ VPALIGNR $0x0f, Y8, Y11, Y10
+ VPSRLW $0x04, Y10, Y12
+ VPAND Y12, Y6, Y12
+ VPSHUFB Y12, Y3, Y12
+
+ // Check errors on the low nibble of the previous byte
+ VPAND Y10, Y6, Y10
+ VPSHUFB Y10, Y4, Y10
+ VPAND Y10, Y12, Y12
+
+ // Check errors on the high nibble of the current byte
+ VPSRLW $0x04, Y11, Y10
+ VPAND Y10, Y6, Y10
+ VPSHUFB Y10, Y5, Y10
+ VPAND Y10, Y12, Y12
+
+ // Find bytes that must be continuations because a 3- or 4-byte lead (>= 0xE0) sits two bytes back
+ VPALIGNR $0x0e, Y8, Y11, Y10
+ VPSUBUSB Y2, Y10, Y10
+
+ // Find bytes that must be continuations because a 4-byte lead (>= 0xF0) sits three bytes back
+ VPALIGNR $0x0d, Y8, Y11, Y8
+ VPSUBUSB Y1, Y8, Y8
+
+ // Combine them to have all continuations
+ VPOR Y10, Y8, Y8
+
+ // Perform a byte-sized signed comparison with zero to turn any non-zero bytes into 0xFF.
+ VPXOR Y10, Y10, Y10
+ VPCMPGTB Y10, Y8, Y8
+
+ // Find bytes that are continuations by looking at their most significant bit.
+ VPAND Y7, Y8, Y8
+
+ // Find mismatches between expected and actual continuation bytes
+ VPXOR Y8, Y12, Y8
+
+ // Store result in sticky error
+ VPOR Y9, Y8, Y9
+
+ // Prepare for next iteration
+ VPSUBUSB Y0, Y11, Y10
+ VMOVDQU Y11, Y8
+
+ // End of loop
+ JMP check_input
+
+tail_load:
+ // If < 32 bytes left
+ // Fast exit if done
+ CMPQ CX, $0x00
+ JE end
+
+ // If 0 < bytes left < 32
+ VPXOR Y12, Y12, Y12
+
+ // Check if there is a page boundary between end of input and the next 32 bytes
+ LEAQ -1(SI)(CX*1), AX
+ ANDL $0x00000fff, AX
+ ADDL $0x00000020, AX
+ SUBQ CX, AX
+ CMPL AX, $0x00001000
+ JAE page_boundary
+
+ // No page boundary, no problem
+ LEAQ half_blend_mask<>+32(SB), AX
+ SUBQ CX, AX
+ VMOVDQU (AX), Y11
+ VPBLENDVB Y11, (SI), Y12, Y11
+ XORQ CX, CX
+ JMP loaded
+
+page_boundary:
+ MOVQ $0x0000000000000020, AX
+ SUBQ CX, AX
+ SUBQ AX, SI
+ VMOVDQU (SI), Y11
+ CMPQ CX, $0x10
+ JA tail_load_large
+
+ // Shift when remaining bytes <= 16, safe next to a page boundary
+ VPERM2I128 $0x03, Y11, Y12, Y11
+ LEAQ shuffle_clear_mask<>+16(SB), DX
+ LEAQ -32(AX)(CX*2), AX
+ SUBQ AX, DX
+ VMOVDQU (DX), Y13
+ VPSHUFB Y13, Y11, Y11
+ XORQ CX, CX
+ JMP loaded
+
+ // Shift when remaining bytes > 16, safe next to a page boundary
+tail_load_large:
+ LEAQ -48(AX)(CX*2), AX
+ LEAQ shuffle_mask<>+16(SB), DX
+ SUBQ AX, DX
+ VMOVDQU (DX), Y13
+ VPSHUFB Y13, Y11, Y14
+ VPERM2I128 $0x03, Y11, Y12, Y11
+ VPSHUFB Y13, Y11, Y11
+ LEAQ blend_mask<>+16(SB), CX
+ SUBQ AX, CX
+ VBROADCASTF128 (CX), Y12
+ VPBLENDVB Y12, Y14, Y11, Y11
+ XORQ CX, CX
+ JMP loaded
+
+end:
+ // If the previous block was incomplete, this is an error.
+ VPOR Y10, Y9, Y9
+
+ // Return whether any error bit was set
+ VPTEST Y9, Y9
+ SETEQ (BX)
+ VZEROUPPER
+ RET
+
+DATA incomplete_mask<>+0(SB)/8, $0xffffffffffffffff
+DATA incomplete_mask<>+8(SB)/8, $0xffffffffffffffff
+DATA incomplete_mask<>+16(SB)/8, $0xffffffffffffffff
+DATA incomplete_mask<>+24(SB)/8, $0xbfdfefffffffffff
+GLOBL incomplete_mask<>(SB), RODATA|NOPTR, $32
+
+DATA cont4_vec<>+0(SB)/8, $0xefefefefefefefef
+DATA cont4_vec<>+8(SB)/8, $0xefefefefefefefef
+DATA cont4_vec<>+16(SB)/8, $0xefefefefefefefef
+DATA cont4_vec<>+24(SB)/8, $0xefefefefefefefef
+GLOBL cont4_vec<>(SB), RODATA|NOPTR, $32
+
+DATA cont3_vec<>+0(SB)/8, $0xdfdfdfdfdfdfdfdf
+DATA cont3_vec<>+8(SB)/8, $0xdfdfdfdfdfdfdfdf
+DATA cont3_vec<>+16(SB)/8, $0xdfdfdfdfdfdfdfdf
+DATA cont3_vec<>+24(SB)/8, $0xdfdfdfdfdfdfdfdf
+GLOBL cont3_vec<>(SB), RODATA|NOPTR, $32
+
+DATA nibble1_errors<>+0(SB)/8, $0x0202020202020202
+DATA nibble1_errors<>+8(SB)/8, $0x4915012180808080
+DATA nibble1_errors<>+16(SB)/8, $0x0202020202020202
+DATA nibble1_errors<>+24(SB)/8, $0x4915012180808080
+GLOBL nibble1_errors<>(SB), RODATA|NOPTR, $32
+
+DATA nibble2_errors<>+0(SB)/8, $0xcbcbcb8b8383a3e7
+DATA nibble2_errors<>+8(SB)/8, $0xcbcbdbcbcbcbcbcb
+DATA nibble2_errors<>+16(SB)/8, $0xcbcbcb8b8383a3e7
+DATA nibble2_errors<>+24(SB)/8, $0xcbcbdbcbcbcbcbcb
+GLOBL nibble2_errors<>(SB), RODATA|NOPTR, $32
+
+DATA nibble3_errors<>+0(SB)/8, $0x0101010101010101
+DATA nibble3_errors<>+8(SB)/8, $0x01010101babaaee6
+DATA nibble3_errors<>+16(SB)/8, $0x0101010101010101
+DATA nibble3_errors<>+24(SB)/8, $0x01010101babaaee6
+GLOBL nibble3_errors<>(SB), RODATA|NOPTR, $32
+
+DATA nibble_mask<>+0(SB)/8, $0x0f0f0f0f0f0f0f0f
+DATA nibble_mask<>+8(SB)/8, $0x0f0f0f0f0f0f0f0f
+DATA nibble_mask<>+16(SB)/8, $0x0f0f0f0f0f0f0f0f
+DATA nibble_mask<>+24(SB)/8, $0x0f0f0f0f0f0f0f0f
+GLOBL nibble_mask<>(SB), RODATA|NOPTR, $32
+
+DATA msb_mask<>+0(SB)/8, $0x8080808080808080
+DATA msb_mask<>+8(SB)/8, $0x8080808080808080
+DATA msb_mask<>+16(SB)/8, $0x8080808080808080
+DATA msb_mask<>+24(SB)/8, $0x8080808080808080
+GLOBL msb_mask<>(SB), RODATA|NOPTR, $32
+
+DATA half_blend_mask<>+0(SB)/8, $0xffffffffffffffff
+DATA half_blend_mask<>+8(SB)/8, $0xffffffffffffffff
+DATA half_blend_mask<>+16(SB)/8, $0xffffffffffffffff
+DATA half_blend_mask<>+24(SB)/8, $0xffffffffffffffff
+DATA half_blend_mask<>+32(SB)/8, $0x0000000000000000
+DATA half_blend_mask<>+40(SB)/8, $0x0000000000000000
+DATA half_blend_mask<>+48(SB)/8, $0x0000000000000000
+DATA half_blend_mask<>+56(SB)/8, $0x0000000000000000
+GLOBL half_blend_mask<>(SB), RODATA|NOPTR, $64
+
+DATA shuffle_clear_mask<>+0(SB)/8, $0x0706050403020100
+DATA shuffle_clear_mask<>+8(SB)/8, $0x0f0e0d0c0b0a0908
+DATA shuffle_clear_mask<>+16(SB)/8, $0xffffffffffffffff
+DATA shuffle_clear_mask<>+24(SB)/8, $0xffffffffffffffff
+DATA shuffle_clear_mask<>+32(SB)/8, $0xffffffffffffffff
+DATA shuffle_clear_mask<>+40(SB)/8, $0xffffffffffffffff
+GLOBL shuffle_clear_mask<>(SB), RODATA|NOPTR, $48
+
+DATA shuffle_mask<>+0(SB)/8, $0x0706050403020100
+DATA shuffle_mask<>+8(SB)/8, $0x0f0e0d0c0b0a0908
+DATA shuffle_mask<>+16(SB)/8, $0x0706050403020100
+DATA shuffle_mask<>+24(SB)/8, $0x0f0e0d0c0b0a0908
+DATA shuffle_mask<>+32(SB)/8, $0x0706050403020100
+DATA shuffle_mask<>+40(SB)/8, $0x0f0e0d0c0b0a0908
+GLOBL shuffle_mask<>(SB), RODATA|NOPTR, $48
+
+DATA blend_mask<>+0(SB)/8, $0xffffffffffffffff
+DATA blend_mask<>+8(SB)/8, $0xffffffffffffffff
+DATA blend_mask<>+16(SB)/8, $0x0000000000000000
+DATA blend_mask<>+24(SB)/8, $0x0000000000000000
+DATA blend_mask<>+32(SB)/8, $0xffffffffffffffff
+DATA blend_mask<>+40(SB)/8, $0xffffffffffffffff
+GLOBL blend_mask<>(SB), RODATA|NOPTR, $48
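
Note on the assembly above: the vector loop can be hard to follow, so here is a scalar, byte-at-a-time model of the same lookup in Go. It is a sketch for illustration only and not part of this change. The names validLookupModel, nibble1, nibble2 and nibble3 are made up, and the tables assume that the first 16 bytes of nibble1_errors, nibble2_errors and nibble3_errors are read in little-endian order. The thresholds 0xE0, 0xF0 and 0xC0 are derived from cont3_vec, cont4_vec and the last three bytes of incomplete_mask.

```go
package main

// Error tables copied from the low 16 bytes of the DATA symbols above
// (little-endian: the least significant byte of each quadword comes first).
var (
	// Indexed by the high nibble of the previous byte.
	nibble1 = [16]byte{
		0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02,
		0x80, 0x80, 0x80, 0x80, 0x21, 0x01, 0x15, 0x49,
	}
	// Indexed by the low nibble of the previous byte.
	nibble2 = [16]byte{
		0xe7, 0xa3, 0x83, 0x83, 0x8b, 0xcb, 0xcb, 0xcb,
		0xcb, 0xcb, 0xcb, 0xcb, 0xcb, 0xdb, 0xcb, 0xcb,
	}
	// Indexed by the high nibble of the current byte.
	nibble3 = [16]byte{
		0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
		0xe6, 0xae, 0xba, 0xba, 0x01, 0x01, 0x01, 0x01,
	}
)

// validLookupModel is a hypothetical scalar model of the AVX2 loop.
func validLookupModel(p []byte) bool {
	// The one, two and three bytes before cur. They start at zero, like
	// the zeroed "previous block" register on the first pass.
	var prev1, prev2, prev3 byte
	for _, cur := range p {
		// An error bit survives only if all three lookups agree on it.
		errs := nibble1[prev1>>4] & nibble2[prev1&0x0f] & nibble3[cur>>4]

		// A lead byte >= 0xE0 two back, or >= 0xF0 three back, means
		// cur must be a continuation byte. Bit 0x80 of errs is set when
		// both prev1 and cur are continuations, so the two must match.
		var must byte
		if prev2 >= 0xe0 || prev3 >= 0xf0 {
			must = 0x80
		}
		if errs^must != 0 {
			return false
		}
		prev3, prev2, prev1 = prev2, prev1, cur
	}
	// The input must not end in the middle of a multi-byte sequence.
	return prev1 < 0xc0 && prev2 < 0xe0 && prev3 < 0xf0
}
```

The assembly does the same work for 32 bytes at a time: VPSHUFB performs the three table lookups, VPALIGNR produces the shifted "previous byte" views, and errors are accumulated into a sticky vector instead of returning early.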
diff --git a/src/unicode/utf8/valid_linux_test.go b/src/unicode/utf8/valid_linux_test.go
new file mode 100644
index 0000000..d0d837e
--- /dev/null
+++ b/src/unicode/utf8/valid_linux_test.go
@@ -0,0 +1,63 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux
+
+package utf8_test
+
+import (
+ "bytes"
+ "fmt"
+ "os"
+ "runtime/debug"
+ "syscall"
+ "testing"
+)
+
+// TestValidPageBoundary checks that Valid does not touch unmapped or
+// protected memory when operating close to a page boundary. It runs only on
+// Linux because it relies on mmap and mprotect; the code under test is not
+// OS-specific.
+func TestValidPageBoundary(t *testing.T) {
+ pg := os.Getpagesize()
+
+ // Map 3 pages laid out as protected | accessible | protected, to test inputs
+ // that start or end right next to a protected page.
+ data, err := syscall.Mmap(-1, 0, 3*pg, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
+ if err != nil {
+ t.Fatalf("mmap: %v", err)
+ }
+ defer syscall.Munmap(data)
+
+ if err := syscall.Mprotect(data[0:pg], syscall.PROT_NONE); err != nil {
+ t.Fatalf("mprotect page 0: %v", err)
+ }
+
+ if err := syscall.Mprotect(data[2*pg:3*pg], syscall.PROT_NONE); err != nil {
+ t.Fatalf("mprotect page 2: %v", err)
+ }
+
+ page := data[pg : 2*pg]
+ example := bytes.Repeat(someutf8, 64/len(someutf8))
+
+ input := page[:64]
+ copy(input, example)
+ for i := 0; i <= 64; i++ {
+ t.Run(fmt.Sprintf("start-%d", i), func(t *testing.T) {
+ debug.SetPanicOnFault(true)
+ defer debug.SetPanicOnFault(false)
+ check(t, input[:i])
+ })
+ }
+
+ input = page[pg-64:]
+ copy(input, example)
+ for i := 0; i <= 64; i++ {
+ t.Run(fmt.Sprintf("end-%d", i), func(t *testing.T) {
+ debug.SetPanicOnFault(true)
+ defer debug.SetPanicOnFault(false)
+ check(t, input[i:])
+ })
+ }
+}
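
Note on the test above: it guards the property that the tail load in valid_amd64.s never reads into a page the input does not touch. The check at the tail_load label can be modeled in Go as below. This is a sketch that assumes 4096-byte pages (the constant 0x1000 in the assembly); the name loadCrossesPage is hypothetical and not part of this change.

```go
package main

const (
	pageSize  = 0x1000 // page size assumed by the assembly
	blockSize = 32     // bytes read by one VMOVDQU into a Y register
)

// loadCrossesPage reports whether a blockSize-byte load starting at addr
// would read past the page that holds the last of the n remaining input
// bytes (0 < n < blockSize). It mirrors the LEAQ/ANDL/ADDL/SUBQ/CMPL
// sequence at tail_load.
func loadCrossesPage(addr uintptr, n int) bool {
	// Offset within its page of the last input byte.
	lastOff := (addr + uintptr(n) - 1) & (pageSize - 1)
	// The load extends blockSize-n bytes beyond the last input byte.
	return lastOff+blockSize-uintptr(n) >= pageSize
}
```

When this returns false the assembly loads straight from the input pointer and blends the extra bytes away; otherwise it loads the 32 bytes that end at the end of the input and shifts them into place.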
diff --git a/src/unicode/utf8/valid_noasm.go b/src/unicode/utf8/valid_noasm.go
new file mode 100644
index 0000000..99340ec
--- /dev/null
+++ b/src/unicode/utf8/valid_noasm.go
@@ -0,0 +1,17 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !amd64 || purego
+
+package utf8
+
+// Valid reports whether p consists entirely of valid UTF-8-encoded runes.
+func Valid(p []byte) bool {
+ return validDefault(p)
+}
+
+// ValidString reports whether s consists entirely of valid UTF-8-encoded runes.
+func ValidString(s string) bool {
+ return validStringDefault(s)
+}
diff --git a/src/unicode/utf8/valid_test.go b/src/unicode/utf8/valid_test.go
new file mode 100644
index 0000000..cab5563
--- /dev/null
+++ b/src/unicode/utf8/valid_test.go
@@ -0,0 +1,326 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package utf8_test
+
+import (
+ "bytes"
+ "fmt"
+ "strings"
+ "testing"
+ "unicode/utf8"
+)
+
+var baseExamples = []string{
+ "",
+ "a",
+ "abc",
+ "Ж",
+ "ЖЖ",
+ "брэд-ЛГТМ",
+ "☺☻☹",
+
+ // overlong
+ "\xE0\x80",
+ // unfinished continuation
+ "aa\xE2",
+
+ string([]byte{66, 250}),
+
+ string([]byte{66, 250, 67}),
+
+ "a\uFFFDb",
+
+ "\xF4\x8F\xBF\xBF", // U+10FFFF
+
+ "\xF4\x90\x80\x80", // U+10FFFF+1; out of range
+ "\xF7\xBF\xBF\xBF", // 0x1FFFFF; out of range
+
+ "\xFB\xBF\xBF\xBF\xBF", // 0x3FFFFFF; out of range
+
+ "\xc0\x80", // U+0000 encoded in two bytes: incorrect
+ "\xed\xa0\x80", // U+D800 high surrogate (sic)
+ "\xed\xbf\xbf", // U+DFFF low surrogate (sic)
+
+ // valid at boundary
+ strings.Repeat("a", 28) + "☺☻☹",
+ strings.Repeat("a", 29) + "☺☻☹",
+ strings.Repeat("a", 30) + "☺☻☹",
+ strings.Repeat("a", 31) + "☺☻☹",
+ strings.Repeat("a", 32+28) + "☺☻☹",
+ strings.Repeat("a", 32+29) + "☺☻☹",
+ strings.Repeat("a", 32+30) + "☺☻☹",
+ strings.Repeat("a", 32+31) + "☺☻☹",
+ // invalid at boundary
+ strings.Repeat("a", 31) + "\xE2a",
+ strings.Repeat("a", 32+31) + "\xE2a",
+
+ // sequences from fastvalidate-utf-8
+ "a",
+ "\xc3\xb1",
+ "\xe2\x82\xa1",
+ "\xf0\x90\x8c\xbc",
+ "안녕하세요, 세상",
+ "\xc2\x80",
+ "\xf0\x90\x80\x80",
+ "\xee\x80\x80",
+ "\xc3\x28",
+ "\xa0\xa1",
+ "\xe2\x28\xa1",
+ "\xe2\x82\x28",
+ "\xf0\x28\x8c\xbc",
+ "\xf0\x90\x28\xbc",
+ "\xf0\x28\x8c\x28",
+ "\xc0\x9f",
+ "\xf5\xff\xff\xff",
+ "\xed\xa0\x81",
+ "\xf8\x90\x80\x80\x80",
+ "123456789012345\xed",
+ "123456789012345\xf1",
+ "123456789012345\xc2",
+ "\xC2\x7F",
+ "\xce",
+ "\xce\xba\xe1",
+ "\xce\xba\xe1\xbd",
+ "\xce\xba\xe1\xbd\xb9\xcf",
+ "\xce\xba\xe1\xbd\xb9\xcf\x83\xce",
+ "\xce\xba\xe1\xbd\xb9\xcf\x83\xce\xbc\xce",
+ "\xdf",
+ "\xef\xbf",
+
+ // same inputs as benchmarks
+ "0123456789",
+ "日本語日本語日本語日",
+ "\xF4\x8F\xBF\xBF",
+
+ // bugs found with fuzzing
+ "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\xc60",
+ "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\xc300",
+ "߀0000000000000000000000000000訨",
+ "0000000000000000000000000000000˂00000000000000000000000000000000",
+}
+
+type byteRange struct {
+ Low byte
+ High byte
+}
+
+func one(b byte) byteRange {
+ return byteRange{b, b}
+}
+
+func genExamples(current string, ranges []byteRange) []string {
+ if len(ranges) == 0 {
+ return []string{string(current)}
+ }
+ r := ranges[0]
+ var all []string
+
+ elements := []byte{r.Low, r.High}
+
+ mid := (r.High + r.Low) / 2
+ if mid != r.Low && mid != r.High {
+ elements = append(elements, mid)
+ }
+
+ for _, x := range elements {
+ s := current + string(x)
+ all = append(all, genExamples(s, ranges[1:])...)
+ if x == r.High {
+ break
+ }
+ }
+ return all
+}
+
+func TestValid(t *testing.T) {
+ examples := baseExamples[:len(baseExamples):len(baseExamples)]
+
+ any := byteRange{0, 0xFF}
+ ascii := byteRange{0, 0x7F}
+ cont := byteRange{0x80, 0xBF}
+
+ rangesToTest := [][]byteRange{
+ {one(0x20), ascii, ascii, ascii},
+
+ // 2-byte sequences
+ {one(0xC2)},
+ {one(0xC2), ascii},
+ {one(0xC2), cont},
+ {one(0xC2), {0xC0, 0xFF}},
+ {one(0xC2), cont, cont},
+ {one(0xC2), cont, cont, cont},
+
+ // 3-byte sequences
+ {one(0xE1)},
+ {one(0xE1), cont},
+ {one(0xE1), cont, cont},
+ {one(0xE1), cont, cont, ascii},
+ {one(0xE1), cont, ascii},
+ {one(0xE1), cont, cont, cont},
+
+ // 4-byte sequences
+ {one(0xF1)},
+ {one(0xF1), cont},
+ {one(0xF1), cont, cont},
+ {one(0xF1), cont, cont, cont},
+ {one(0xF1), cont, cont, ascii},
+ {one(0xF1), cont, cont, cont, ascii},
+
+ // overlong
+ {{0xC0, 0xC1}, any},
+ {{0xC0, 0xC1}, any, any},
+ {{0xC0, 0xC1}, any, any, any},
+ {one(0xE0), {0x0, 0x9F}, cont},
+ {one(0xE0), {0xA0, 0xBF}, cont},
+ }
+
+ for _, r := range rangesToTest {
+ examples = append(examples, genExamples("", r)...)
+ }
+
+ for _, i := range []int{300, 316} {
+ d := bytes.Repeat(someutf8, i/len(someutf8))
+ examples = append(examples, string(d))
+ }
+
+ for _, tt := range examples {
+ t.Run(tt, func(t *testing.T) {
+ check(t, []byte(tt))
+ })
+
+ // Generate variations of the input to exercise errors at the
+ // boundary, using the vector implementation on 32-sized input,
+ // and on non-32-sized inputs.
+ //
+ // Large examples don't go through those variations because they
+ // are likely specific tests.
+
+ if len(tt) >= 32 {
+ continue
+ }
+
+ t.Run("boundary-"+tt, func(t *testing.T) {
+ size := 32 - len(tt)
+ prefix := strings.Repeat("a", size)
+ b := []byte(prefix + tt)
+ check(t, b)
+ })
+ t.Run("vec-padded-"+tt, func(t *testing.T) {
+ prefix := strings.Repeat("a", 32)
+ padding := strings.Repeat("b", 32-(len(tt)%32))
+ input := prefix + padding + tt
+ b := []byte(input)
+ if len(b)%32 != 0 {
+ panic("test should generate block of 32")
+ }
+ check(t, b)
+ })
+ t.Run("vec-"+tt, func(t *testing.T) {
+ prefix := strings.Repeat("a", 32)
+ input := prefix + tt
+ if len(tt)%32 == 0 {
+ input += "x"
+ }
+ b := []byte(input)
+ if len(b)%32 == 0 {
+ panic("test should not generate block of 32")
+ }
+ check(t, b)
+ })
+ }
+}
+
+func check(t *testing.T, b []byte) {
+ t.Helper()
+
+ expected := utf8.ValidDefault(b)
+ if utf8.Valid(b) != expected {
+ t.Errorf("Valid(%q) = %v; want %v", string(b), !expected, expected)
+ }
+}
+
+var valid1k = bytes.Repeat([]byte("0123456789日本語日本語日本語日abcdefghijklmnopqrstuvwx"), 16)
+var valid1M = bytes.Repeat(valid1k, 1024)
+var someutf8 = []byte("\xF4\x8F\xBF\xBF")
+
+type input struct {
+ name string
+ data []byte
+}
+
+func benchmarkInputs() []input {
+ inputs := []input{
+ {"1kValid", valid1k},
+ {"1MValid", valid1M},
+ {"10ASCII", []byte("0123456789")},
+ {"1kASCII", bytes.Repeat([]byte{'A'}, 1024)},
+ {"1MASCII", bytes.Repeat([]byte{'A'}, 1024*1024)},
+ {"1kInvalid", append([]byte{'\xF4'}, bytes.Repeat([]byte{'A'}, 1023)...)},
+ {"10Japan", []byte("日本語日本語日本語日")},
+ }
+
+ const KiB = 1024
+ const MiB = 1024 * 1024
+
+ for i := 0; i <= 400/(2*len(someutf8)); i++ {
+ d := bytes.Repeat(someutf8, i*2)
+ inputs = append(inputs, input{
+ name: fmt.Sprintf("small%d", len(d)),
+ data: d,
+ })
+ }
+
+ for _, i := range []int{1 * KiB, 8 * KiB, 16 * KiB, 64 * KiB, 1 * MiB, 8 * MiB, 32 * MiB, 64 * MiB} {
+ d := bytes.Repeat(someutf8, i/len(someutf8))
+ inputs = append(inputs, input{
+ name: fmt.Sprintf("%d", len(d)),
+ data: d,
+ })
+ }
+
+ for _, i := range []int{300, 316} {
+ d := bytes.Repeat(someutf8, i/len(someutf8))
+ inputs = append(inputs, input{
+ name: fmt.Sprintf("tail%d", len(d)),
+ data: d,
+ })
+ }
+ return inputs
+}
+
+func BenchmarkValid(b *testing.B) {
+ for _, input := range benchmarkInputs() {
+ b.Run(input.name, func(b *testing.B) {
+ b.SetBytes(int64(len(input.data)))
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ utf8.Valid(input.data)
+ }
+ })
+ }
+}
+func BenchmarkValidString(b *testing.B) {
+ for _, input := range benchmarkInputs() {
+ s := string(input.data)
+ b.Run(input.name, func(b *testing.B) {
+ b.SetBytes(int64(len(input.data)))
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ utf8.ValidString(s)
+ }
+ })
+ }
+}
+
+func FuzzValid(f *testing.F) {
+ f.Add(valid1k)
+ f.Add(valid1M)
+ f.Add(someutf8)
+ for _, e := range baseExamples {
+ f.Add([]byte(e))
+ }
+
+ f.Fuzz(check)
+}

To view, visit change 535838. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-MessageType: newchange
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
Gerrit-Change-Number: 535838
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Pelletier <pelletie...@gmail.com>
Gerrit-CC: Achille Roussel <achille...@gmail.com>

Gopher Robot (Gerrit)

Oct 16, 2023, 8:42:45 PM10/16/23
to Thomas Pelletier, goph...@pubsubhelper.golang.org, triciu...@appspot.gserviceaccount.com, Achille Roussel, golang-co...@googlegroups.com

Attention is currently required from: Thomas Pelletier.

Congratulations on opening your first change. Thank you for your contribution!

Next steps:
A maintainer will review your change and provide feedback. See
https://go.dev/doc/contribute#review for more info and tips to get your
patch through code review.

Most changes in the Go project go through a few rounds of revision. This can be
surprising to people new to the project. The careful, iterative review process
is our way of helping mentor contributors and ensuring that their contributions
have a lasting impact.

During May-July and Nov-Jan the Go project is in a code freeze, during which
little code gets reviewed or merged. If a reviewer responds with a comment like
R=go1.11 or adds a tag like "wait-release", it means that this CL will be
reviewed as part of the next development cycle. See https://go.dev/s/release
for more details.

    Gerrit-MessageType: comment
    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
    Gerrit-Change-Number: 535838
    Gerrit-PatchSet: 1
    Gerrit-Owner: Thomas Pelletier <pelletie...@gmail.com>
    Gerrit-CC: Achille Roussel <achille...@gmail.com>
    Gerrit-CC: Gopher Robot <go...@golang.org>
    Gerrit-Attention: Thomas Pelletier <pelletie...@gmail.com>
    Gerrit-Comment-Date: Tue, 17 Oct 2023 00:42:41 +0000
    Gerrit-HasComments: No
    Gerrit-Has-Labels: No

    Thomas Pelletier (Gerrit)

    Oct 16, 2023, 8:53:19 PM10/16/23
    to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

    Attention is currently required from: Thomas Pelletier.

    Thomas Pelletier uploaded patch set #2 to this change.

    Gerrit-MessageType: newpatchset
    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
    Gerrit-Change-Number: 535838
    Gerrit-PatchSet: 2

    Thomas Pelletier (Gerrit)

    Oct 16, 2023, 8:55:08 PM10/16/23
    to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

    Attention is currently required from: Thomas Pelletier.

    Thomas Pelletier uploaded patch set #3 to this change.

    unicode/utf8: make Valid use AVX2 on amd64
    Gerrit-PatchSet: 3

    Roland Shoemaker (Gerrit)

    Oct 17, 2023, 1:25:26 PM10/17/23
    to Thomas Pelletier, goph...@pubsubhelper.golang.org, Gopher Robot, triciu...@appspot.gserviceaccount.com, Achille Roussel, golang-co...@googlegroups.com

    Attention is currently required from: Thomas Pelletier.

    1 comment:

    Gerrit-MessageType: comment
    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
    Gerrit-Change-Number: 535838
    Gerrit-PatchSet: 3
    Gerrit-Owner: Thomas Pelletier <pelletie...@gmail.com>
    Gerrit-CC: Achille Roussel <achille...@gmail.com>
    Gerrit-CC: Gopher Robot <go...@golang.org>
    Gerrit-CC: Roland Shoemaker <rol...@golang.org>
    Gerrit-Comment-Date: Tue, 17 Oct 2023 17:25:21 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No

    Achille Roussel (Gerrit)

    Oct 17, 2023, 2:32:24 PM10/17/23
    to Thomas Pelletier, goph...@pubsubhelper.golang.org, Roland Shoemaker, Gopher Robot, triciu...@appspot.gserviceaccount.com, golang-co...@googlegroups.com

    Attention is currently required from: Roland Shoemaker, Thomas Pelletier.

    1 comment:

    • File src/unicode/utf8/_asm/LICENSE:

      • Technically, this code will be released under the Go license, but the MIT license of the original work requires that we provide a copy of it.

        Do you know if there are precedents for incorporating MIT licensed code in the standard library?

    Gerrit-MessageType: comment
    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
    Gerrit-Change-Number: 535838
    Gerrit-PatchSet: 3
    Gerrit-Owner: Thomas Pelletier <pelletie...@gmail.com>
    Gerrit-CC: Achille Roussel <achille...@gmail.com>
    Gerrit-CC: Gopher Robot <go...@golang.org>
    Gerrit-CC: Roland Shoemaker <rol...@golang.org>
    Gerrit-Attention: Thomas Pelletier <pelletie...@gmail.com>
    Gerrit-Attention: Roland Shoemaker <rol...@golang.org>
    Gerrit-Comment-Date: Tue, 17 Oct 2023 18:32:17 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Roland Shoemaker <rol...@golang.org>

    Ian Lance Taylor (Gerrit)

    Oct 17, 2023, 2:47:17 PM10/17/23
    to Thomas Pelletier, goph...@pubsubhelper.golang.org, Ian Lance Taylor, Roland Shoemaker, Gopher Robot, triciu...@appspot.gserviceaccount.com, Achille Roussel, golang-co...@googlegroups.com

    Attention is currently required from: Achille Roussel, Roland Shoemaker, Thomas Pelletier.

    1 comment:

    • File src/unicode/utf8/_asm/LICENSE:

      • Technically, this code will be released under the Go license, but the MIT license of the original wo […]

        The standard library should not add any additional code that has the requirement "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software." That is because it forces all people who distribute a Go program in binary form to include the new license with the new copyright statement "Copyright (c) 2021 Segment". We do not want to impose that requirement on all users of Go. Thanks.

    Gerrit-MessageType: comment
    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
    Gerrit-Change-Number: 535838
    Gerrit-PatchSet: 3
    Gerrit-Owner: Thomas Pelletier <pelletie...@gmail.com>
    Gerrit-CC: Achille Roussel <achille...@gmail.com>
    Gerrit-CC: Gopher Robot <go...@golang.org>
    Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
    Gerrit-CC: Roland Shoemaker <rol...@golang.org>
    Gerrit-Attention: Thomas Pelletier <pelletie...@gmail.com>
    Gerrit-Attention: Achille Roussel <achille...@gmail.com>
    Gerrit-Attention: Roland Shoemaker <rol...@golang.org>
    Gerrit-Comment-Date: Tue, 17 Oct 2023 18:47:12 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Achille Roussel <achille...@gmail.com>
    Comment-In-Reply-To: Roland Shoemaker <rol...@golang.org>

    Thomas Pelletier (Gerrit)

    Nov 12, 2023, 9:09:59 AM11/12/23
    to Thomas Pelletier, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

    Attention is currently required from: Achille Roussel, Roland Shoemaker, Thomas Pelletier.

    Thomas Pelletier uploaded patch set #4 to this change.

    A src/unicode/utf8/_asm/go.mod
    A src/unicode/utf8/_asm/go.sum
    A src/unicode/utf8/_asm/valid_asm.go
    A src/unicode/utf8/export_test.go
    M src/unicode/utf8/utf8.go
    M src/unicode/utf8/utf8_test.go
    A src/unicode/utf8/valid.go
    A src/unicode/utf8/valid_amd64.go
    A src/unicode/utf8/valid_amd64.s
    A src/unicode/utf8/valid_linux_test.go
    A src/unicode/utf8/valid_noasm.go
    A src/unicode/utf8/valid_test.go
    13 files changed, 1,399 insertions(+), 135 deletions(-)

    Gerrit-MessageType: newpatchset
    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
    Gerrit-Change-Number: 535838
    Gerrit-PatchSet: 4

    Thomas Pelletier (Gerrit)

    Nov 12, 2023, 9:16:30 AM11/12/23
    to Thomas Pelletier, goph...@pubsubhelper.golang.org, Ian Lance Taylor, Roland Shoemaker, Gopher Robot, triciu...@appspot.gserviceaccount.com, Achille Roussel, golang-co...@googlegroups.com

    Attention is currently required from: Achille Roussel, Ian Lance Taylor, Roland Shoemaker.

    1 comment:

    • File src/unicode/utf8/_asm/LICENSE:

      • The standard library should not add any additional code that has the requirement "The above copyrigh […]

        Segment/Twilio has graciously re-licensed the original code under MIT-0, removing the attribution requirement (https://github.com/segmentio/asm/pull/86). I've submitted patchset 4, removing this license file and adding a nod to the provenance of the code in the main generator program.

    Gerrit-MessageType: comment
    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
    Gerrit-Change-Number: 535838
    Gerrit-PatchSet: 4
    Gerrit-Owner: Thomas Pelletier <pelletie...@gmail.com>
    Gerrit-CC: Achille Roussel <achille...@gmail.com>
    Gerrit-CC: Gopher Robot <go...@golang.org>
    Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
    Gerrit-CC: Roland Shoemaker <rol...@golang.org>
    Gerrit-Attention: Achille Roussel <achille...@gmail.com>
    Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
    Gerrit-Attention: Roland Shoemaker <rol...@golang.org>
    Gerrit-Comment-Date: Sun, 12 Nov 2023 14:16:25 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Achille Roussel <achille...@gmail.com>
    Comment-In-Reply-To: Ian Lance Taylor <ia...@golang.org>
    Comment-In-Reply-To: Roland Shoemaker <rol...@golang.org>

    Achille Roussel (Gerrit)

    Nov 15, 2023, 12:47:35 PM11/15/23
    to Thomas Pelletier, goph...@pubsubhelper.golang.org, Ian Lance Taylor, Roland Shoemaker, Gopher Robot, triciu...@appspot.gserviceaccount.com, golang-co...@googlegroups.com

    Attention is currently required from: Ian Lance Taylor, Roland Shoemaker, Thomas Pelletier.

    1 comment:

    • File src/unicode/utf8/_asm/LICENSE:

      • Segment/Twilio has graciously re-licensed the original code under MIT-0, removing the attribution re […]

        Done

    Gerrit-MessageType: comment
    Gerrit-Project: go
    Gerrit-Branch: master
    Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
    Gerrit-Change-Number: 535838
    Gerrit-PatchSet: 4
    Gerrit-Owner: Thomas Pelletier <pelletie...@gmail.com>
    Gerrit-CC: Achille Roussel <achille...@gmail.com>
    Gerrit-CC: Gopher Robot <go...@golang.org>
    Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
    Gerrit-CC: Roland Shoemaker <rol...@golang.org>
    Gerrit-Attention: Thomas Pelletier <pelletie...@gmail.com>
    Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
    Gerrit-Attention: Roland Shoemaker <rol...@golang.org>
    Gerrit-Comment-Date: Wed, 15 Nov 2023 17:47:31 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Thomas Pelletier <pelletie...@gmail.com>

    qiulaidongfeng (Gerrit)

    Nov 16, 2023, 7:33:15 AM
    to Thomas Pelletier, goph...@pubsubhelper.golang.org, Ian Lance Taylor, Roland Shoemaker, Gopher Robot, triciu...@appspot.gserviceaccount.com, Achille Roussel, golang-co...@googlegroups.com

    Attention is currently required from: Ian Lance Taylor, Roland Shoemaker, Thomas Pelletier.

    Patch set 4: Run-TryBot +1


      To view, visit change 535838. To unsubscribe, or for help writing mail filters, visit settings.

      Gerrit-MessageType: comment
      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
      Gerrit-Change-Number: 535838
      Gerrit-PatchSet: 4
      Gerrit-Owner: Thomas Pelletier <pelletie...@gmail.com>
      Gerrit-Reviewer: qiulaidongfeng <26454...@qq.com>
      Gerrit-CC: Achille Roussel <achille...@gmail.com>
      Gerrit-CC: Gopher Robot <go...@golang.org>
      Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
      Gerrit-CC: Roland Shoemaker <rol...@golang.org>
      Gerrit-Attention: Thomas Pelletier <pelletie...@gmail.com>
      Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
      Gerrit-Attention: Roland Shoemaker <rol...@golang.org>
      Gerrit-Comment-Date: Thu, 16 Nov 2023 12:33:09 +0000
      Gerrit-HasComments: No
      Gerrit-Has-Labels: Yes

      Thomas Pelletier (Gerrit)

      Nov 16, 2023, 9:43:42 AM
      to Thomas Pelletier, goph...@pubsubhelper.golang.org, Gopher Robot, qiulaidongfeng, Ian Lance Taylor, Roland Shoemaker, triciu...@appspot.gserviceaccount.com, Achille Roussel, golang-co...@googlegroups.com

      Attention is currently required from: Ian Lance Taylor, Roland Shoemaker, qiulaidongfeng.


      1 comment:

      • Patchset:

        • Patch Set #4:

          1 of 46 TryBots failed. […]

          It looks like this test failed because the trybot uses the latest commit of x/tools, which depends on the go/version package; that package was committed after this CL was opened (https://go-review.googlesource.com/c/go/+/538895). I rebased the change onto the tip of master, which should fix the issue.

      To view, visit change 535838. To unsubscribe, or for help writing mail filters, visit settings.

      Gerrit-MessageType: comment
      Gerrit-Project: go
      Gerrit-Branch: master
      Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
      Gerrit-Change-Number: 535838
      Gerrit-PatchSet: 5
      Gerrit-Owner: Thomas Pelletier <pelletie...@gmail.com>
      Gerrit-Reviewer: Gopher Robot <go...@golang.org>
      Gerrit-Reviewer: qiulaidongfeng <26454...@qq.com>
      Gerrit-CC: Achille Roussel <achille...@gmail.com>
      Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
      Gerrit-CC: Roland Shoemaker <rol...@golang.org>
      Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
      Gerrit-Attention: qiulaidongfeng <26454...@qq.com>
      Gerrit-Attention: Roland Shoemaker <rol...@golang.org>
      Gerrit-Comment-Date: Thu, 16 Nov 2023 14:43:38 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Comment-In-Reply-To: Gopher Robot <go...@golang.org>

      qiulaidongfeng (Gerrit)

      Nov 16, 2023, 5:32:41 PM
      to Thomas Pelletier, goph...@pubsubhelper.golang.org, Gopher Robot, Ian Lance Taylor, Roland Shoemaker, triciu...@appspot.gserviceaccount.com, Achille Roussel, golang-co...@googlegroups.com

      Attention is currently required from: Ian Lance Taylor, Roland Shoemaker, Thomas Pelletier.

      Patch set 5: Run-TryBot +1


        To view, visit change 535838. To unsubscribe, or for help writing mail filters, visit settings.

        Gerrit-MessageType: comment
        Gerrit-Project: go
        Gerrit-Branch: master
        Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
        Gerrit-Change-Number: 535838
        Gerrit-PatchSet: 5
        Gerrit-Owner: Thomas Pelletier <pelletie...@gmail.com>
        Gerrit-Reviewer: Gopher Robot <go...@golang.org>
        Gerrit-Reviewer: qiulaidongfeng <26454...@qq.com>
        Gerrit-CC: Achille Roussel <achille...@gmail.com>
        Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
        Gerrit-CC: Roland Shoemaker <rol...@golang.org>
        Gerrit-Attention: Thomas Pelletier <pelletie...@gmail.com>
        Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
        Gerrit-Attention: Roland Shoemaker <rol...@golang.org>
        Gerrit-Comment-Date: Thu, 16 Nov 2023 22:32:34 +0000
        Gerrit-HasComments: No
        Gerrit-Has-Labels: Yes

        Thomas Pelletier (Gerrit)

        Apr 12, 2024, 9:51:07 AM
        to Thomas Pelletier, goph...@pubsubhelper.golang.org, Gopher Robot, qiu laidongfeng2, Ian Lance Taylor, Roland Shoemaker, triciu...@appspot.gserviceaccount.com, Achille Roussel, golang-co...@googlegroups.com
        Attention needed from Ian Lance Taylor and Roland Shoemaker

        Thomas Pelletier added 1 comment

        Patchset-level comments
        File-level comment, Patchset 5 (Latest):
        Thomas Pelletier . resolved

        Hello, I'm wondering if anyone has input on this CL. Let me know if further discussion is needed on whether this is too large or too risky a change.

        I can try to upstream the required Avo changes, but it would be good to know if that's a blocker for merging this change.


        Related details

        Attention is currently required from:
        • Ian Lance Taylor
        • Roland Shoemaker
        Submit Requirements:
        • Code-Review: requirement is not satisfied
        • Legacy-TryBots-Pass: requirement satisfied
        • No-Unresolved-Comments: requirement satisfied
        • Review-Enforcement: requirement is not satisfied
        • TryBots-Pass: requirement is not satisfied
        Gerrit-MessageType: comment
        Gerrit-Project: go
        Gerrit-Branch: master
        Gerrit-Change-Id: I31b2807aa3de40f8c5fc1594e0fba6141a5ad1bd
        Gerrit-Change-Number: 535838
        Gerrit-PatchSet: 5
        Gerrit-Owner: Thomas Pelletier <pelletie...@gmail.com>
        Gerrit-Reviewer: Gopher Robot <go...@golang.org>
        Gerrit-Reviewer: qiu laidongfeng2 <26454...@qq.com>
        Gerrit-CC: Achille Roussel <achille...@gmail.com>
        Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
        Gerrit-CC: Roland Shoemaker <rol...@golang.org>
        Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
        Gerrit-Attention: Roland Shoemaker <rol...@golang.org>
        Gerrit-Comment-Date: Fri, 12 Apr 2024 13:51:01 +0000
        Gerrit-HasComments: Yes
        Gerrit-Has-Labels: No