[text] encoding/simplifiedchinese: Fixes € encoding in GB18030


Gerrit Bot (Gerrit)

Oct 2, 2021, 9:23:28 PM
to goph...@pubsubhelper.golang.org, Alexander Yastrebov, golang-co...@googlegroups.com

Gerrit Bot has uploaded this change for review.

encoding/simplifiedchinese: Fixes € encoding in GB18030

The euro sign is an exception which is given a single byte code of 0x80
in Microsoft's later versions of CP936/GBK and a two byte code of A2 E3
in GB18030. https://en.wikipedia.org/wiki/GB_18030#cite_note-4

Fixes golang/go#48691

Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
GitHub-Last-Rev: ea0bf9b4285972d5c7383cb55986b4ff72e5c840
GitHub-Pull-Request: golang/text#26
---
M encoding/simplifiedchinese/gbk.go
M encoding/simplifiedchinese/all_test.go
2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/encoding/simplifiedchinese/all_test.go b/encoding/simplifiedchinese/all_test.go
index a556c94..01d609f 100644
--- a/encoding/simplifiedchinese/all_test.go
+++ b/encoding/simplifiedchinese/all_test.go
@@ -40,7 +40,6 @@
{enc, HZGB2312, "a갂", "a"},
{enc, HZGB2312, "\u6cf5갂", "~{1C~}"},

- {dec, GB18030, "\x80", "€"},
{dec, GB18030, "\x81", "\ufffd"},
{dec, GB18030, "\x81\x20", "\ufffd "},
{dec, GB18030, "\xfe\xfe", "\ufffd"},
@@ -125,6 +124,14 @@
encPrefix: "~{",
encoded: ";(<dR;:x>F#,6@WCN^O`GW!#",
utf8: "花间一壶酒,独酌无相亲。",
+ }, {
+ e: GBK,
+ encoded: "\x80",
+ utf8: "€",
+ }, {
+ e: GB18030,
+ encoded: "\xa2\xe3",
+ utf8: "€",
}}

for _, tc := range testCases {
diff --git a/encoding/simplifiedchinese/gbk.go b/encoding/simplifiedchinese/gbk.go
index b89c45b..dc684e7 100644
--- a/encoding/simplifiedchinese/gbk.go
+++ b/encoding/simplifiedchinese/gbk.go
@@ -55,7 +55,7 @@
// Microsoft's Code Page 936 extends GBK 1.0 to encode the euro sign U+20AC
// as 0x80. The HTML5 specification at http://encoding.spec.whatwg.org/#gbk
// says to treat "gbk" as Code Page 936.
- case c0 == 0x80:
+ case !d.gb18030 && c0 == 0x80:
r, size = '€', 1

case c0 < 0xff:
@@ -180,7 +180,7 @@
// Microsoft's Code Page 936 extends GBK 1.0 to encode the euro sign U+20AC
// as 0x80. The HTML5 specification at http://encoding.spec.whatwg.org/#gbk
// says to treat "gbk" as Code Page 936.
- if r == '€' {
+ if !e.gb18030 && r == '€' {
r = 0x80
goto write1
}
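
The encoder-side rule described in the commit message can be illustrated with a minimal, self-contained sketch. `encodeEuro` is a hypothetical helper written for this illustration, not a function in golang.org/x/text; it hard-codes only the two byte sequences the commit message names and does not cover any other character.

```go
package main

import "fmt"

// encodeEuro is a hypothetical helper (not part of golang.org/x/text).
// It returns the bytes for U+20AC under the rule stated in the commit
// message: 0x80 for CP936/GBK, A2 E3 for GB18030.
func encodeEuro(gb18030 bool) []byte {
	if !gb18030 {
		// Microsoft's Code Page 936 extension: a single byte.
		return []byte{0x80}
	}
	// GB18030 uses the two-byte code instead.
	return []byte{0xa2, 0xe3}
}

func main() {
	fmt.Printf("GBK:     % x\n", encodeEuro(false))
	fmt.Printf("GB18030: % x\n", encodeEuro(true))
}
```

The boolean parameter mirrors the `gb18030` field that the diff checks; in the real encoder the GB18030 bytes would presumably come from the normal table lookup rather than a hard-coded pair.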

To view, visit change 353712. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: text
Gerrit-Branch: master
Gerrit-Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
Gerrit-Change-Number: 353712
Gerrit-PatchSet: 1
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-CC: Alexander Yastrebov <yastreb...@gmail.com>
Gerrit-MessageType: newchange

Gerrit Bot (Gerrit)

Oct 2, 2021, 11:13:27 PM
to Alexander Yastrebov, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Attention is currently required from: Marcel van Lohuizen.

Gerrit Bot uploaded patch set #2 to this change.

encoding/simplifiedchinese: Fixes € encoding in GB18030

The euro sign is an exception which is given a single byte code of 0x80
in Microsoft's later versions of CP936/GBK and a two byte code of A2 E3
in GB18030. https://en.wikipedia.org/wiki/GB_18030#cite_note-4

Fixes golang/go#48691

Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
GitHub-Last-Rev: acbbc50f20d663452f8da77cf2a66d8d893bec1d

GitHub-Pull-Request: golang/text#26
---
M encoding/simplifiedchinese/gbk.go
M encoding/simplifiedchinese/all_test.go
2 files changed, 32 insertions(+), 1 deletion(-)

Gerrit-Project: text
Gerrit-Branch: master
Gerrit-Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
Gerrit-Change-Number: 353712
Gerrit-PatchSet: 2
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Marcel van Lohuizen <mp...@golang.org>
Gerrit-CC: Alexander Yastrebov <yastreb...@gmail.com>
Gerrit-CC: Go Bot <go...@golang.org>
Gerrit-Attention: Marcel van Lohuizen <mp...@golang.org>
Gerrit-MessageType: newpatchset

Nigel Tao (Gerrit)

Oct 4, 2021, 8:26:58 AM
to Alexander Yastrebov, Gerrit Bot, goph...@pubsubhelper.golang.org, Go Bot, golang-co...@googlegroups.com

Patch set 2:Code-Review +2

    Gerrit-Project: text
    Gerrit-Branch: master
    Gerrit-Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
    Gerrit-Change-Number: 353712
    Gerrit-PatchSet: 2
    Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
    Gerrit-Reviewer: Nigel Tao <nige...@golang.org>
    Gerrit-CC: Alexander Yastrebov <yastreb...@gmail.com>
    Gerrit-CC: Go Bot <go...@golang.org>
    Gerrit-Comment-Date: Mon, 04 Oct 2021 12:26:51 +0000
    Gerrit-HasComments: No
    Gerrit-Has-Labels: Yes
    Gerrit-MessageType: comment

    Nigel Tao (Gerrit)

    Oct 4, 2021, 8:27:13 AM
    to Alexander Yastrebov, Gerrit Bot, goph...@pubsubhelper.golang.org, Go Bot, golang-co...@googlegroups.com

    Patch set 2:Trust +1

      Gerrit-Project: text
      Gerrit-Branch: master
      Gerrit-Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
      Gerrit-Change-Number: 353712
      Gerrit-PatchSet: 2
      Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
      Gerrit-Reviewer: Nigel Tao <nige...@golang.org>
      Gerrit-CC: Alexander Yastrebov <yastreb...@gmail.com>
      Gerrit-CC: Go Bot <go...@golang.org>
      Gerrit-Comment-Date: Mon, 04 Oct 2021 12:27:07 +0000

      Nigel Tao (Gerrit)

      Oct 4, 2021, 8:27:46 AM
      to Alexander Yastrebov, Gerrit Bot, goph...@pubsubhelper.golang.org, Dmitri Shuralyov, Go Bot, golang-co...@googlegroups.com

      Attention is currently required from: Dmitri Shuralyov.

      1 comment:

      Gerrit-Project: text
      Gerrit-Branch: master
      Gerrit-Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
      Gerrit-Change-Number: 353712
      Gerrit-PatchSet: 2
      Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
      Gerrit-Reviewer: Dmitri Shuralyov <dmit...@golang.org>
      Gerrit-Reviewer: Nigel Tao <nige...@golang.org>
      Gerrit-CC: Alexander Yastrebov <yastreb...@gmail.com>
      Gerrit-CC: Go Bot <go...@golang.org>
      Gerrit-Attention: Dmitri Shuralyov <dmit...@golang.org>
      Gerrit-Comment-Date: Mon, 04 Oct 2021 12:27:39 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Gerrit-MessageType: comment

      Alberto Donizetti (Gerrit)

      Oct 4, 2021, 8:30:19 AM
      to Alexander Yastrebov, Gerrit Bot, goph...@pubsubhelper.golang.org, Dmitri Shuralyov, Nigel Tao, Go Bot, golang-co...@googlegroups.com

      Attention is currently required from: Dmitri Shuralyov.

      Patch set 2:Trust +1

        Gerrit-Project: text
        Gerrit-Branch: master
        Gerrit-Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
        Gerrit-Change-Number: 353712
        Gerrit-PatchSet: 2
        Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
        Gerrit-Reviewer: Alberto Donizetti <alb.do...@gmail.com>
        Gerrit-Reviewer: Dmitri Shuralyov <dmit...@golang.org>
        Gerrit-Reviewer: Nigel Tao <nige...@golang.org>
        Gerrit-CC: Alexander Yastrebov <yastreb...@gmail.com>
        Gerrit-CC: Go Bot <go...@golang.org>
        Gerrit-Attention: Dmitri Shuralyov <dmit...@golang.org>
        Gerrit-Comment-Date: Mon, 04 Oct 2021 12:30:11 +0000

        Nigel Tao (Gerrit)

        Oct 4, 2021, 8:54:16 AM
        to Alexander Yastrebov, Gerrit Bot, goph...@pubsubhelper.golang.org, Alberto Donizetti, Dmitri Shuralyov, Go Bot, golang-co...@googlegroups.com

        Attention is currently required from: Dmitri Shuralyov.

        Patch set 2:Run-TryBot +1

          Gerrit-Project: text
          Gerrit-Branch: master
          Gerrit-Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
          Gerrit-Change-Number: 353712
          Gerrit-PatchSet: 2
          Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
          Gerrit-Reviewer: Alberto Donizetti <alb.do...@gmail.com>
          Gerrit-Reviewer: Dmitri Shuralyov <dmit...@golang.org>
          Gerrit-Reviewer: Nigel Tao <nige...@golang.org>
          Gerrit-CC: Alexander Yastrebov <yastreb...@gmail.com>
          Gerrit-CC: Go Bot <go...@golang.org>
          Gerrit-Attention: Dmitri Shuralyov <dmit...@golang.org>
          Gerrit-Comment-Date: Mon, 04 Oct 2021 12:54:09 +0000

          Nigel Tao (Gerrit)

          Oct 4, 2021, 8:54:23 AM
          to Alexander Yastrebov, Gerrit Bot, goph...@pubsubhelper.golang.org, Dmitri Shuralyov, Alberto Donizetti, Go Bot, golang-co...@googlegroups.com

          Nigel Tao removed Dmitri Shuralyov from this change.

          Gerrit-Project: text
          Gerrit-Branch: master
          Gerrit-Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
          Gerrit-Change-Number: 353712
          Gerrit-PatchSet: 2
          Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
          Gerrit-Reviewer: Alberto Donizetti <alb.do...@gmail.com>
          Gerrit-Reviewer: Nigel Tao <nige...@golang.org>
          Gerrit-CC: Alexander Yastrebov <yastreb...@gmail.com>
          Gerrit-CC: Go Bot <go...@golang.org>
          Gerrit-MessageType: deleteReviewer

          Nigel Tao (Gerrit)

          Oct 4, 2021, 8:59:53 AM
          to Alexander Yastrebov, Gerrit Bot, goph...@pubsubhelper.golang.org, golang-...@googlegroups.com, Go Bot, Alberto Donizetti, golang-co...@googlegroups.com

          Nigel Tao submitted this change.

          Approvals:
            Nigel Tao: Looks good to me, approved; Trusted; Run TryBots
            Alberto Donizetti: Trusted
            Go Bot: TryBots succeeded

          encoding/simplifiedchinese: Fixes € encoding in GB18030

          The euro sign is an exception which is given a single byte code of 0x80
          in Microsoft's later versions of CP936/GBK and a two byte code of A2 E3
          in GB18030. https://en.wikipedia.org/wiki/GB_18030#cite_note-4

          Fixes golang/go#48691

          Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
          GitHub-Last-Rev: acbbc50f20d663452f8da77cf2a66d8d893bec1d
          GitHub-Pull-Request: golang/text#26
          Reviewed-on: https://go-review.googlesource.com/c/text/+/353712
          Reviewed-by: Nigel Tao <nige...@golang.org>
          Trust: Nigel Tao <nige...@golang.org>
          Trust: Alberto Donizetti <alb.do...@gmail.com>
          Run-TryBot: Nigel Tao <nige...@golang.org>
          TryBot-Result: Go Bot <go...@golang.org>

          ---
          M encoding/simplifiedchinese/gbk.go
          M encoding/simplifiedchinese/all_test.go
          2 files changed, 38 insertions(+), 1 deletion(-)

          diff --git a/encoding/simplifiedchinese/all_test.go b/encoding/simplifiedchinese/all_test.go
          index a556c94..fbb623c 100644
          --- a/encoding/simplifiedchinese/all_test.go
          +++ b/encoding/simplifiedchinese/all_test.go
          @@ -40,7 +40,9 @@
          {enc, HZGB2312, "a갂", "a"},
          {enc, HZGB2312, "\u6cf5갂", "~{1C~}"},

          + {dec, GBK, "\xa2\xe3", "€"},
          {dec, GB18030, "\x80", "€"},
          +
          {dec, GB18030, "\x81", "\ufffd"},
          {dec, GB18030, "\x81\x20", "\ufffd "},
          {dec, GB18030, "\xfe\xfe", "\ufffd"},
          @@ -125,6 +127,14 @@
          encPrefix: "~{",
          encoded: ";(<dR;:x>F#,6@WCN^O`GW!#",
          utf8: "花间一壶酒,独酌无相亲。",
          + }, {
          + e: GBK,
          + encoded: "\x80",
          + utf8: "€",
          + }, {
          + e: GB18030,
          + encoded: "\xa2\xe3",
          + utf8: "€",
          }}

          for _, tc := range testCases {
          diff --git a/encoding/simplifiedchinese/gbk.go b/encoding/simplifiedchinese/gbk.go
          index b89c45b..0e0fabf 100644
          --- a/encoding/simplifiedchinese/gbk.go
          +++ b/encoding/simplifiedchinese/gbk.go
          @@ -55,6 +55,8 @@
          // Microsoft's Code Page 936 extends GBK 1.0 to encode the euro sign U+20AC
          // as 0x80. The HTML5 specification at http://encoding.spec.whatwg.org/#gbk
          // says to treat "gbk" as Code Page 936.
          + // GBK’s decoder is gb18030’s decoder. https://encoding.spec.whatwg.org/#gbk-decoder
          + // If byte is 0x80, return code point U+20AC. https://encoding.spec.whatwg.org/#gb18030-decoder
          case c0 == 0x80:
          r, size = '€', 1

          @@ -180,7 +182,9 @@
          // Microsoft's Code Page 936 extends GBK 1.0 to encode the euro sign U+20AC
          // as 0x80. The HTML5 specification at http://encoding.spec.whatwg.org/#gbk
          // says to treat "gbk" as Code Page 936.
          - if r == '€' {
          + // GBK’s encoder is gb18030’s encoder with its _is GBK_ set to true. https://encoding.spec.whatwg.org/#gbk-encoder
          + // If _is GBK_ is true and code point is U+20AC, return byte 0x80. https://encoding.spec.whatwg.org/#gb18030-encoder
          + if !e.gb18030 && r == '€' {
          r = 0x80
          goto write1
          }
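
In the merged patch set the decoder keeps `case c0 == 0x80:` unconditional, so only the encoder distinguishes GBK from GB18030. A rough sketch of that decode-side behaviour follows, assuming a hypothetical `decodeEuro` helper that recognises only the euro sign; the real decoder in gbk.go handles the full code space and is not reproduced here.

```go
package main

import "fmt"

// decodeEuro is a hypothetical helper (not part of golang.org/x/text).
// It recognises only the two euro-sign byte sequences mentioned in the
// change and reports the decoded rune, the number of bytes consumed,
// and whether the input started with either sequence.
func decodeEuro(src []byte) (r rune, size int, ok bool) {
	switch {
	case len(src) >= 1 && src[0] == 0x80:
		// Single-byte form, accepted regardless of GBK vs GB18030.
		return '€', 1, true
	case len(src) >= 2 && src[0] == 0xa2 && src[1] == 0xe3:
		// Two-byte form A2 E3.
		return '€', 2, true
	}
	return 0, 0, false
}

func main() {
	for _, in := range [][]byte{{0x80}, {0xa2, 0xe3}, {0x41}} {
		r, size, ok := decodeEuro(in)
		fmt.Printf("% x -> %q size=%d ok=%v\n", in, r, size, ok)
	}
}
```

Accepting both forms on decode while emitting only one form per encoding on encode matches the added test rows: `{dec, GBK, "\xa2\xe3", "€"}` alongside the retained `{dec, GB18030, "\x80", "€"}`.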

          Gerrit-Project: text
          Gerrit-Branch: master
          Gerrit-Change-Id: I6a4460274d4313ad1d03bcd8070373af674691eb
          Gerrit-Change-Number: 353712
          Gerrit-PatchSet: 3
          Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
          Gerrit-Reviewer: Alberto Donizetti <alb.do...@gmail.com>
          Gerrit-Reviewer: Go Bot <go...@golang.org>
          Gerrit-Reviewer: Nigel Tao <nige...@golang.org>
          Gerrit-CC: Alexander Yastrebov <yastreb...@gmail.com>
          Gerrit-MessageType: merged