Joseph Tsai has uploaded this change for review.
encoding/json: zstd compress the testdata
There is a non-public zstd decoder in the stdlib (CL 473356).
Using zstd v1.4.4, golang_source.json can be recompressed
with `zstd -19` to become ~91.502KiB large instead of the gzip'd
size of ~117.609KiB. This is a reduction of 22%.
This commit will increase the Git history by ~92KiB,
but reduce the final Go release by ~26KiB.
Presumably there are orders and orders of magnitude more downloads
of the final release than there are checkouts of the Git repo.
This commit will set the precedence for using zstd for other
testdata in the stdlib. For example, the largest testdata directory
is internal/trace/testdata at ~2.218MiB. With gzip compression
using zopfli, we can reduce this to ~1.212MiB.
However, with zstd, we can reduce this to ~1.101MiB.
Another compression format with better ratios is XZ,
which results in a size of ~0.951MiB.
However, there is no native XZ decompressor in the stdlib,
the LZMA algorithm is not well specified, and
LZMA decompression is really slow.
Zstandard probably represents the best balance of tradeoffs
for the next decade to come. It is well-specified,
fast to decompress, and an implementation exists in stdlib.
Change-Id: I6da2df27bd260befc0a44c6bc0255365be0a5b0f
---
M src/encoding/json/bench_test.go
D src/encoding/json/testdata/code.json.gz
A src/encoding/json/testdata/code.json.zst
3 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/src/encoding/json/bench_test.go b/src/encoding/json/bench_test.go
index f7bcf80..a93ecde 100644
--- a/src/encoding/json/bench_test.go
+++ b/src/encoding/json/bench_test.go
@@ -12,9 +12,9 @@
import (
"bytes"
- "compress/gzip"
"fmt"
"internal/testenv"
+ "internal/zstd"
"io"
"os"
"reflect"
@@ -44,15 +44,12 @@
var codeStruct codeResponse
func codeInit() {
- f, err := os.Open("testdata/code.json.gz")
+ f, err := os.Open("testdata/code.json.zst")
if err != nil {
panic(err)
}
defer f.Close()
- gz, err := gzip.NewReader(f)
- if err != nil {
- panic(err)
- }
+ gz := zstd.NewReader(f)
data, err := io.ReadAll(gz)
if err != nil {
panic(err)
diff --git a/src/encoding/json/testdata/code.json.gz b/src/encoding/json/testdata/code.json.gz
deleted file mode 100644
index 1572a92..0000000
--- a/src/encoding/json/testdata/code.json.gz
+++ /dev/null
Binary files differ
diff --git a/src/encoding/json/testdata/code.json.zst b/src/encoding/json/testdata/code.json.zst
new file mode 100644
index 0000000..f8ec690
--- /dev/null
+++ b/src/encoding/json/testdata/code.json.zst
Binary files differ
To view, visit change 525516. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Daniel Martí, Ian Lance Taylor.
Patch set 1:Run-TryBot +1
Attention is currently required from: Daniel Martí, Ian Lance Taylor, Joseph Tsai.
1 comment:
Patchset:
Failure looks real:
```
panic: zstd decompression error at 23310: offset past window
```
will look into it later.
To view, visit change 525516. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Ian Lance Taylor, Joseph Tsai.
1 comment:
Patchset:
Could we document what we used to generate the zstd archive? In particular, what zstd version and what compression level, so that other files can be generated in a similar way as well. I assume we want a higher compression level, assuming that doesn't significantly increase the decompression cpu cost or memory usage.
Generally I'm in favor of this change for the reasons you outline. I'd be against it if zstd was very new and a better compression mechanism might come soon, to avoid noise, but at this point zstd has been well known as a good middle ground for nearly a decade. It's unlikely we'll do this again in at least another ten years.
To view, visit change 525516. To unsubscribe, or for help writing mail filters, visit settings.
Patchset:
Could we document what we used to generate the zstd archive? In particular, what zstd version and wh […]
Err, you did specify `zstd -19` at the top of the commit message. So just the version would be missing, for clarity.
To view, visit change 525516. To unsubscribe, or for help writing mail filters, visit settings.
Patchset:
Failure looks real: […]
Could be that the built-in zstd decoder can't handle high compression levels, since those might use larger windows or more complex parts of the algorithm/encoding.
To view, visit change 525516. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Daniel Martí, Joseph Tsai.
1 comment:
Patchset:
Could be that the built-in zstd decoder can't handle high compression levels, since those might use […]
If the problem were a large window, I would expect a consistent failure, not one that only occurs with the race detector.
Unfortunately, I haven't been able to recreate the problem.
There aren't any global variables in internal/zstd, so it's hard to understand how there could be a race with a simple call to zstd.NewReader.
To view, visit change 525516. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Ian Lance Taylor, Joseph Tsai.
1 comment:
Patchset:
If the problem were a large window, I would expect a consistent failure, not one that only occurs wi […]
To view, visit change 525516. To unsubscribe, or for help writing mail filters, visit settings.
| Commit-Queue | +1 |
| Run-TryBot | +1 |
Rebasing as it seems that internal/zstd has since been fixed.
I think we should still pursue this as it will set the precedence for future testdata that we may add for the json/v2 work.
See CL 665796.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Updated code.json.zst to use the exact same binary as what @dneil is using in CL 665796.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +2 |
Thanks. (The cmd/go.TestScript/list_empty_importpath failure looks unrelated to the changes here.)
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Auto-Submit | +1 |
| TryBot-Bypass | +1 |
Damien NeilTryBots beginning. Status page: https://farmer.golang.org/try?commit=3b80926d
Acknowledged
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
encoding/json: use zstd compressed testdata
There is a non-public zstd decoder in the stdlib (CL 473356) and
also zstd compressed testdata already present.
Delete testdata/code.json.gz and
instead use internal/jsontest/testdata/golang_source.json.zst,
which has exactly the same content:
$ cat internal/jsontest/testdata/golang_source.json.zst | zstd -d | sha1sum
3f70b6fd429f4aba3e8e1c3e5a294c8f2e219a6e -
$ cat testdata/code.json.gz | zstd -d | sha1sum
3f70b6fd429f4aba3e8e1c3e5a294c8f2e219a6e -
This will reduce the size of the final Go release by 118KB.
Updates #71845
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |