[go] rchive/zip: use utf8 encoding for non ASCII filename

257 views
Skip to first unread message

Shushan Chai (Gerrit)

unread,
Apr 25, 2015, 11:36:32 PM4/25/15
to Ian Lance Taylor, golang-co...@googlegroups.com
Shushan Chai uploaded a change:
https://go-review.googlesource.com/9381

rchive/zip: use utf8 encoding for non ASCII filename

If FileHeader.Flags Bit11 is not set, filename use different local
encoding on different platform.
For example, in chinese Win7 use GBK, in chinese OSX use utf8.

This change will force the non ASCII filename use utf8 encoding.

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

See Zip Spec 4.4.4 and APPENDIX D.

Change-Id: I27d2c3b15b01e844edab9c3875742633ee8a38ad
---
M src/archive/zip/writer.go
1 file changed, 6 insertions(+), 0 deletions(-)



diff --git a/src/archive/zip/writer.go b/src/archive/zip/writer.go
index 87ac694..957b110 100644
--- a/src/archive/zip/writer.go
+++ b/src/archive/zip/writer.go
@@ -11,6 +11,7 @@
"hash"
"hash/crc32"
"io"
+ "unicode/utf8"
)

// TODO(adg): support zip file comments
@@ -204,6 +205,11 @@
}
}

+ // if filename contains non ASCII character, force use utf8 encoding
+ if utf8.RuneCount([]byte(fh.Name)) != len(fh.Name) {
+ fh.Flags |= 1 << 11
+ }
+
fh.Flags |= 0x8 // we will write a data descriptor

fh.CreatorVersion = fh.CreatorVersion&0xff00 | zipVersion20 // preserve
compatibility byte

--
https://go-review.googlesource.com/9381

Minux Ma (Gerrit)

unread,
Apr 26, 2015, 12:38:03 AM4/26/15
to Shushan Chai, Minux Ma, golang-co...@googlegroups.com
Minux Ma has posted comments on this change.

rchive/zip: use utf8 encoding for non ASCII filename

Patch Set 1: Code-Review+1

(2 comments)

I think the change is fine (after addressing the comments),
but someone else need to approve it.

R=bradfitz

https://go-review.googlesource.com/#/c/9381/1//COMMIT_MSG
Commit Message:

Line 7: rchive/zip: use utf8 encoding for non ASCII filename
archive/zip


https://go-review.googlesource.com/#/c/9381/1/src/archive/zip/writer.go
File src/archive/zip/writer.go:

Line 210: fh.Flags |= 1 << 11
if we follow the tradition of assuming Go strings are always
encoded in utf-8, then we can always set the flag.
(Pure ASCII text is also valid UTF-8.)

Please also update the commit message after this change.

The only problem is whether there exists popular zip
implementation that doesn't recognize this flag and refuse
to work with files with this flag set.


--
https://go-review.googlesource.com/9381
Gerrit-Reviewer: Minux Ma <mi...@golang.org>
Gerrit-HasComments: Yes

Brad Fitzpatrick (Gerrit)

unread,
Apr 26, 2015, 8:48:02 AM4/26/15
to Shushan Chai, Minux Ma, Brad Fitzpatrick, golang-co...@googlegroups.com
Brad Fitzpatrick has posted comments on this change.

rchive/zip: use utf8 encoding for non ASCII filename

Patch Set 1:

(2 comments)

https://go-review.googlesource.com/#/c/9381/1/src/archive/zip/writer.go
File src/archive/zip/writer.go:

Line 209: RuneCount
if utf8.ValidString(fh.Name) {
....


Line 210: fh.Flags |= 1 << 11
> if we follow the tradition of assuming Go strings are always
add a new constant next to the other constants and use that here instead of
1<<11


--
https://go-review.googlesource.com/9381
Gerrit-Reviewer: Brad Fitzpatrick <brad...@golang.org>

Shushan Chai (Gerrit)

unread,
Apr 26, 2015, 5:53:50 PM4/26/15
to Minux Ma, Brad Fitzpatrick, golang-co...@googlegroups.com
Shushan Chai has posted comments on this change.

rchive/zip: use utf8 encoding for non ASCII filename

Patch Set 1:

(2 comments)

https://go-review.googlesource.com/#/c/9381/1/src/archive/zip/writer.go
File src/archive/zip/writer.go:

Line 209: RuneCount
> if utf8.ValidString(fh.Name) {
I think the pure ASCII text donot need use the utf8 encoding.
This will change the behavior for pure ascii text.


Line 210: fh.Flags |= 1 << 11
> if we follow the tradition of assuming Go strings are always
Pure ASCII text set UTF8 will change the behavior.
As you said, we donot know how the exists popular zip will recognize the
utf8 flag.


--
https://go-review.googlesource.com/9381
Gerrit-Reviewer: Brad Fitzpatrick <brad...@golang.org>
Gerrit-Reviewer: Minux Ma <mi...@golang.org>
Gerrit-Reviewer: Shushan Chai <chais...@gmail.com>
Gerrit-HasComments: Yes

Brad Fitzpatrick (Gerrit)

unread,
May 6, 2015, 2:42:56 PM5/6/15
to Shushan Chai, Brad Fitzpatrick, golang-co...@googlegroups.com
Brad Fitzpatrick has posted comments on this change.

rchive/zip: use utf8 encoding for non ASCII filename

Patch Set 1:

Shushan, are you going to keep working on this?

If not, please press the "Abandon" button.

--
https://go-review.googlesource.com/9381
Gerrit-Reviewer: Brad Fitzpatrick <brad...@golang.org>
Gerrit-Reviewer: Shushan Chai <chais...@gmail.com>
Gerrit-HasComments: No

Shushan Chai (Gerrit)

unread,
May 6, 2015, 7:24:43 PM5/6/15
to Brad Fitzpatrick, golang-co...@googlegroups.com
Shushan Chai has posted comments on this change.

rchive/zip: use utf8 encoding for non ASCII filename

Patch Set 1:

> Shushan, are you going to keep working on this?
>
> If not, please press the "Abandon" button.

Yes, i hope to fix this issue for chnese user.
But i need you review for this question:

Do we need change the pure ASCII text to utf8 encodeing?
I choose no change encoding for pure ASCII text.

I just hope valid utf8string(rune > 127) use utf8 encoding.

I need the answer.

Thanks.

Brad Fitzpatrick (Gerrit)

unread,
May 6, 2015, 7:28:46 PM5/6/15
to Shushan Chai, Brad Fitzpatrick, golang-co...@googlegroups.com
Brad Fitzpatrick has posted comments on this change.

rchive/zip: use utf8 encoding for non ASCII filename

Patch Set 1:

> > Shushan, are you going to keep working on this?
> >
> > If not, please press the "Abandon" button.
>
> Yes, i hope to fix this issue for chnese user.
> But i need you review for this question:
>
> Do we need change the pure ASCII text to utf8 encodeing?
> I choose no change encoding for pure ASCII text.
>
> I just hope valid utf8string(rune > 127) use utf8 encoding.
>
> I need the answer.
>
> Thanks.

Let's only set the bit if it's both not only ASCI (there is at least one
byte >= utf8.RuneSelf) and it's also valid utf8 (utf8.ValidString).

But make the bit be a constant next to the other package private constants,
not hard-coded.

Shushan Chai (Gerrit)

unread,
May 6, 2015, 11:07:31 PM5/6/15
to Brad Fitzpatrick, golang-co...@googlegroups.com
Shushan Chai uploaded a new patch set:
https://go-review.googlesource.com/9381

archive/zip: use utf8 encoding for non ASCII filename

If FileHeader.Flags Bit11 is not set, filename use different local
encoding on different platform.
For example, in chinese Win7 use GBK, in chinese OSX use utf8.

This change will force the non ASCII filename use utf8 encoding.

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

See Zip Spec 4.4.4 and APPENDIX D.

Change-Id: I27d2c3b15b01e844edab9c3875742633ee8a38ad
---
M src/archive/zip/struct.go
M src/archive/zip/writer.go
2 files changed, 9 insertions(+), 0 deletions(-)

Brad Fitzpatrick (Gerrit)

unread,
May 6, 2015, 11:09:06 PM5/6/15
to Shushan Chai, Brad Fitzpatrick, golang-co...@googlegroups.com
Brad Fitzpatrick has posted comments on this change.

archive/zip: use utf8 encoding for non ASCII filename

Patch Set 2:

(1 comment)

This needs tests.

https://go-review.googlesource.com/#/c/9381/2/src/archive/zip/writer.go
File src/archive/zip/writer.go:

Line 209: if utf8.RuneCount([]byte(fh.Name)) != len(fh.Name) &&
utf8.ValidString(fh.Name) {
this allocates. Just create a small function to do this instead. It'll also
be more readable.

func isASCII(s string) bool {
for _, c := range s {
if c >= utf8.RuneSelf { return false }
}
return true
}


--
https://go-review.googlesource.com/9381
Gerrit-Reviewer: Brad Fitzpatrick <brad...@golang.org>
Gerrit-Reviewer: Shushan Chai <chais...@gmail.com>
Gerrit-HasComments: Yes

Shushan Chai (Gerrit)

unread,
May 7, 2015, 2:22:34 AM5/7/15
to Brad Fitzpatrick, golang-co...@googlegroups.com
Shushan Chai uploaded a new patch set:
https://go-review.googlesource.com/9381

archive/zip: use utf8 encoding for non ASCII filename

If FileHeader.Flags Bit11 is not set, filename use different local
encoding on different platform.
For example, in chinese Win7 use GBK, in chinese OSX use utf8.

This change will force the non ASCII filename use utf8 encoding.

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

See Zip Spec 4.4.4 and APPENDIX D.

Change-Id: I27d2c3b15b01e844edab9c3875742633ee8a38ad
---
M src/archive/zip/struct.go
M src/archive/zip/writer.go
M src/archive/zip/writer_test.go
3 files changed, 30 insertions(+), 0 deletions(-)

Shushan Chai (Gerrit)

unread,
May 7, 2015, 3:23:02 AM5/7/15
to Brad Fitzpatrick, golang-co...@googlegroups.com
Shushan Chai has abandoned this change.

Change subject: archive/zip: use utf8 encoding for non ASCII filename
......................................................................


Abandoned

If this CL accept, we can't create a gbk local encoding for chinese user.
We need new api for local encoding.
I create a new issue:
https://github.com/golang/go/issues/10741
Reply all
Reply to author
Forward
0 new messages