gzip: compression ratio

bsr

Oct 8, 2013, 6:56:03 AM
to golan...@googlegroups.com
Hello,

I am trying to measure the compression ratio of a JSON string. Is this program correct, and does gzipping the JSON string really yield a 94% reduction in size?


package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"strconv"
)

const (
	_           = iota
	KB ByteSize = 1 << (10 * iota)
	MB
)

type ByteSize float64

func (b ByteSize) String() string {
	switch {
	case b >= MB:
		return fmt.Sprintf("%.2fMB", b/MB)
	case b >= KB:
		return fmt.Sprintf("%.2fKB", b/KB)
	}
	return fmt.Sprintf("%.2fB", b)
}

type Type struct {
	I int64
	N string
	B bool
	M int
}

type Cont struct {
	I int64
	A int64
	B int64
	C Type
	D string
	E int

	F string
	G int64
	H int
	J int
}

func main() {

	var (
		c  Cont
		tp Type
	)

	cmap := make(map[string]Cont)
	for i := 0; i < 1000; i++ {

		tp = Type{
			I: int64(8888888888888 + i),
			N: "ASASASDFSFSFFFSFS",
			B: false,
			M: 25,
		}
		c = Cont{
			I: int64(9999999999999 + i),
			A: int64(8888888888888),
			B: int64(9999999999999 + i),
			C: tp,
			D: "A",
			E: 1,
			F: "ASASDASASASASASA",
			G: int64(8888888888888),
			H: 10,
			J: 1,
		}

		cmap[strconv.FormatInt(c.I, 10)] = c
	}

	b, err := json.Marshal(&cmap)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Json string : %s \n", ByteSize(len(string(b))))

	buf := new(bytes.Buffer)
	gz := gzip.NewWriter(buf)

	_, err = gz.Write(b)
	if err != nil {
		panic(err)
	}

	err = gz.Close()
	if err != nil {
		panic(err)
	}

	fmt.Printf("Buf : %s \n", ByteSize(buf.Len()))
	fmt.Printf("Savings: %.2f %% \n", 100*(1-(float64(buf.Len())/float64(len(string(b))))))

}

chris dollin

Oct 8, 2013, 7:02:04 AM
to bsr, golang-nuts
On 8 October 2013 11:56, bsr <bsr...@gmail.com> wrote:

Hello,

I am trying to measure the compression ratio of a JSON string. Is this program correct, and does gzipping the JSON string really yield a 94% reduction in size?



I don't know whether the program is right, but that's an awful lot of
repetition in the string you're zipping up; it doesn't seem unreasonable
that it can compress by a factor of 20. This will of course not be a
/typical/ compression ratio.
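
To relate a "percent savings" figure to a compression factor, the arithmetic can be sketched as below. This is only an illustration; the helper name compressionFactor is hypothetical and does not appear in the original program.

```go
package main

import "fmt"

// compressionFactor converts a fractional size reduction (0.94 means
// "94% smaller") into a factor: original size / compressed size.
// Hypothetical helper, for illustration only.
func compressionFactor(savings float64) float64 {
	return 1 / (1 - savings)
}

func main() {
	// 0.94 is the savings reported in the question; 0.95 is the
	// savings that corresponds to "a factor of 20".
	for _, s := range []float64{0.94, 0.95} {
		fmt.Printf("%.0f%% savings is about %.1fx smaller\n", s*100, compressionFactor(s))
	}
}
```

So a 94% reduction works out to a factor of roughly 17, in the same ballpark as the factor of 20 mentioned above.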

Chris

--
Chris "allusive" Dollin

bsr

Oct 8, 2013, 7:34:21 AM
to golan...@googlegroups.com, bsr, ehog....@googlemail.com
Thanks Chris. I was trying to add randomness, and made this change. (There is still repetition in the JSON keys.)

The output is very different between my local machine and the play server. Does this have anything to do with recent changes in the std lib?


play server:
Json string : 201.17KB 
Buf : 17.18KB 
Savings: 91.46 % 

local:
Json string : 201.17KB 
Buf : 47.03KB 
Savings: 76.62 % 

go version
go version go1.1.2 darwin/amd64

Coda Hale

Oct 8, 2013, 11:59:45 AM
to bsr, golan...@googlegroups.com
play.golang.org is sandboxed, which means crypto/rand (which usually reads from /dev/urandom) returns all zeros:


This adds dramatically less entropy to the input, allowing for higher compression ratios.
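
A rough way to see the effect locally is to gzip the same number of bytes from an all-zero buffer and from crypto/rand. This is a sketch under assumptions: the zero buffer only stands in for a random source that returns zeros (it is not the playground's actual implementation), and gzipSize is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/rand"
	"fmt"
)

// gzipSize returns how many bytes p occupies after gzip compression
// at the default level. Hypothetical helper, for illustration only.
func gzipSize(p []byte) (int, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	if _, err := gz.Write(p); err != nil {
		return 0, err
	}
	// Close flushes pending data and writes the gzip trailer.
	if err := gz.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	const n = 64 * 1024

	// Assumption: stands in for a random source that returns all zeros.
	zeros := make([]byte, n)

	// High-entropy input, as a local crypto/rand would normally provide.
	random := make([]byte, n)
	if _, err := rand.Read(random); err != nil {
		panic(err)
	}

	inputs := []struct {
		name string
		data []byte
	}{
		{"zeros", zeros},
		{"random", random},
	}
	for _, in := range inputs {
		size, err := gzipSize(in.data)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-6s %d -> %d bytes\n", in.name, len(in.data), size)
	}
}
```

The zero-filled input should shrink to a tiny fraction of its size, while the random input should barely shrink at all (it may even grow slightly), which is the same direction as the play server versus local difference reported above.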


--
Coda Hale
http://codahale.com

bsr

Oct 8, 2013, 12:06:24 PM
to golan...@googlegroups.com, bsr
Great explanation. Thank you very much.