Base64 performance

Kiswono Prayogo

Apr 12, 2015, 9:36:14 PM
to golan...@googlegroups.com
Why does Go perform so badly compared to the other implementations in this benchmark (https://github.com/kostya/benchmarks/tree/master/base64)?
The code:
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
	"time"
)

func main() {
	STR_SIZE := 10000000
	TRIES := 100
	str := strings.Repeat("a", STR_SIZE)
	str2 := ""
	bytes := []byte(str)
	coder := base64.StdEncoding

	// Encode the 10 MB input TRIES times, summing the output lengths.
	t := time.Now()
	s := uint64(0)
	for i := 0; i < TRIES; i++ {
		str2 = coder.EncodeToString(bytes)
		s += uint64(len(str2))
	}
	fmt.Printf("encode: %d, %.4f\n", s, float32(time.Since(t).Seconds()))

	// Decode the last encoded string TRIES times (the error is ignored).
	t = time.Now()
	s = 0
	for i := 0; i < TRIES; i++ {
		str3, _ := coder.DecodeString(str2)
		s += uint64(len(str3))
	}
	fmt.Printf("decode: %d, %.4f\n", s, float32(time.Since(t).Seconds()))
}

The result:
Language          Time, s   Memory, MB
D Gdc                2.48         44.3
C                    2.70         32.3
Ruby                 2.73        125.3
D Ldc                3.27         44.1
Crystal              3.35         82.4
Nim                  4.13         52.4
Ruby Rbx             4.29         30.7
C++ Openssl          5.45         65.2
D                    6.18         89.1
Python               7.62         52.6
Rust                 7.40         42.9
Javascript Node      7.93        777.1
Python Pypy          8.22        114.6
Julia                8.91        378.2
Ruby JRuby          16.76        496.6
Ruby JRuby9k        17.72        417.1
Go                  21.24         94.2
Scala               35.06        301.2
Scala35.06301.2

Haddock

Apr 13, 2015, 3:21:02 AM
to golan...@googlegroups.com

At first glance I wouldn't say that Go is doing badly. In most benchmarks Go is in the same performance league as Java: sometimes a bit faster, sometimes a bit slower. In this benchmark Go is about 1.5x faster than Scala (which runs on the JVM). The performance of Scala seems a bit lacking; it would be interesting to see how well Java does. The other languages listed either call C libraries (Ruby, Python) or are close to the metal (Rust, D, Nim), so it's no wonder they are fast.

Konstantin Khomoutov

Apr 13, 2015, 4:07:05 AM
to Kiswono Prayogo, golan...@googlegroups.com
On Sun, 12 Apr 2015 18:36:14 -0700 (PDT)
Kiswono Prayogo <kis...@gmail.com> wrote:

> Why Go perform quite bad compared to another implementation on this
> benchmark <https://github.com/kostya/benchmarks/tree/master/base64>.

Please read [1] for a start.

The second thing to notice is that Go currently has at least two mature
implementations (the "standard" one, which initially came out of Google
and is dubbed "gc", and another one, which is a front end to GCC, dubbed
"gccgo"), and they have different performance characteristics, mainly
because gc uses its own compiler (and assembler) while gccgo, like
everything in GCC, first compiles to an intermediate representation
which then gets heavily optimized by the GCC engine before being
compiled down to machine code. With this, you lose gc's blazing
compilation speed but can gain computation speed.
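Because gc and gccgo can produce different timings, it helps to record which toolchain produced a given number. A minimal sketch: `runtime.Compiler` reports the toolchain that built the binary ("gc" or "gccgo"). Building the same file once with `go build` and once with `go build -compiler gccgo` (which assumes gccgo is installed) lets you compare the two; the helper name `toolchain` is illustrative only.

```go
package main

import (
	"fmt"
	"runtime"
)

// toolchain returns the name of the compiler that built this binary:
// "gc" for the standard toolchain, "gccgo" for the GCC front end.
func toolchain() string {
	return runtime.Compiler
}

func main() {
	// Print the toolchain alongside the Go version and target so that
	// benchmark numbers can be labeled with how they were produced.
	fmt.Printf("compiler=%s version=%s arch=%s\n", toolchain(), runtime.Version(), runtime.GOARCH)
}
```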

If you feel adventurous, research the profiling tools available for
gc and gccgo, and profile that benchmark code; try to understand where
Go is slow, and why. Searching this list's archive for the word
"profile" will yield lots of interesting posts on this matter.

All in all, please try imagining you are a Go developer: you read the
question "why is Go slower than language Foo in some random benchmark?";
what would *your* answer be? Please don't answer right away; think first.

1. http://tip.golang.org/doc/faq#Why_does_Go_perform_badly_on_benchmark_x

Egon

Apr 13, 2015, 4:24:29 AM
to golan...@googlegroups.com
It simply looks like the base64 code for Go has not been optimized. With some trivial changes https://gist.github.com/egonelbre/dbe66ea24edd4db6dac5 it went from:

BenchmarkEncodeToString    20000             74759 ns/op         109.58 MB/s
to:
BenchmarkEncodeToString    20000             64458 ns/op         127.09 MB/s

Of course I'm guessing there's still room for improvement.
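Output in this format comes from a `testing` benchmark that calls `b.SetBytes`. The figures above imply 8192 bytes per operation (74,759 ns × 109.58 MB/s ≈ 8,192 bytes), so the sketch below assumes an 8 KiB input; it is not the code from the gist. `testing.Benchmark` lets the same function run outside `go test`, and the `sink` variable is only there to keep the result live.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"testing"
)

// input is 8192 bytes, the size implied by the ns/op and MB/s figures
// above; its contents (all zero bytes) are an arbitrary choice.
var input = make([]byte, 8192)

// sink keeps the encoded result reachable so the call is not optimized away.
var sink string

// benchEncodeToString has the shape of a BenchmarkEncodeToString
// function. b.SetBytes makes the framework report throughput in MB/s.
func benchEncodeToString(b *testing.B) {
	b.SetBytes(int64(len(input)))
	for i := 0; i < b.N; i++ {
		sink = base64.StdEncoding.EncodeToString(input)
	}
}

func main() {
	// testing.Benchmark grows b.N until the measurement is long enough.
	r := testing.Benchmark(benchEncodeToString)
	fmt.Println("BenchmarkEncodeToString", r.String())
}
```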

+ Egon

Haddock

Apr 13, 2015, 4:43:45 AM
to golan...@googlegroups.com
I quickly ported the Scala code to Java (Java code at the end of the post). On my machine (Core i7-4600M, 2.90 GHz, quad-core, Windows 7) I get these results:

Go 1.4.1:

encode: 1333333600, 3.4110
decode: 1000000000, 15.3430

Scala (JDK8):

encode: 1368421200, 13.783206397
decode: 1000000000, 26.121356501

Java (JDK8):

encode: 1368421200, 31.008432298
decode: 1000000000, 58.633304231


So Go performs remarkably better than Scala. It is quite astonishing that Java is so much slower than Scala. To me Go is doing really well compared to other languages that also don't call C (admittedly, I don't know exactly what the Java classes BASE64Encoder and BASE64Decoder are doing).

-- H.

Java code:

import sun.misc.BASE64Decoder;
import sun.misc.BASE64Encoder;

import java.io.IOException;

/**
 * Created by plohmann on 13.04.2015.
 */
public class Base64 {

    public static void main(String[] args) throws IOException {

        BASE64Encoder enc = new sun.misc.BASE64Encoder();
        BASE64Decoder dec = new sun.misc.BASE64Decoder();

        int STR_SIZE = 10000000;
        int TRIES = 100;

        StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < STR_SIZE; i++) {
            buffer.append("a");
        }
        String str = buffer.toString();
        String str2 = "";
        long t = System.nanoTime();
        long s = 0;

        for (int i = 0; i < TRIES; i++) {
            str2 = enc.encode(str.getBytes());
            s += str2.length();
        }

        System.out.println("encode: " + s + ", " + (System.nanoTime() - t)/1e9);
        s = 0;
        // t is not reset here, so the printed decode time also includes the encode time.

        for (int i = 0; i < TRIES; i++) {
            byte[] str3 = dec.decodeBuffer(str2);
            s += str3.length;
        }

        System.out.println("decode: " + s + ", " + (System.nanoTime() - t)/1e9);
    }
}

Damian Gryski

Apr 13, 2015, 5:23:55 AM
to golan...@googlegroups.com


On Monday, April 13, 2015 at 10:24:29 AM UTC+2, Egon wrote:
It simply looks like the base64 code for Go has not been optimized. With some trivial changes https://gist.github.com/egonelbre/dbe66ea24edd4db6dac5 it went from:

BenchmarkEncodeToString    20000             74759 ns/op         109.58 MB/s
to:
BenchmarkEncodeToString    20000             64458 ns/op         127.09 MB/s

Of course I'm guessing there's still room for improvement.


Will you submit your changes for 1.5?

Damian

Egon

Apr 13, 2015, 5:45:59 AM
to golan...@googlegroups.com
Sure, I can.
 


Jedy

Apr 13, 2015, 5:48:19 AM
to Damian Gryski, golan...@googlegroups.com
pprof shows that "s = strings.Map(removeNewlinesMapper, s)" in base64.DecodeString takes 50% of the run time. Simply changing it to "s = strings.Replace(strings.Replace(s, "\n", "", -1), "\r", "", -1)" makes a huge difference. So the base64 package is far from optimized.
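To see why the change helps, here is a sketch of both approaches. The mapper below is a hypothetical stand-in modeled on the helper named in the pprof output, not a copy of the standard library source. `strings.Map` invokes its callback for every rune, whereas `strings.Replace` first counts occurrences of the substring and, when there are none (as with this benchmark's newline-free input), returns the original string unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// removeNewlinesMapper drops '\r' and '\n' (returning -1 removes a rune
// in strings.Map) and keeps every other rune unchanged.
func removeNewlinesMapper(r rune) rune {
	if r == '\r' || r == '\n' {
		return -1
	}
	return r
}

// stripWithMap calls the mapper once per rune of s.
func stripWithMap(s string) string {
	return strings.Map(removeNewlinesMapper, s)
}

// stripWithReplace uses two substring replacements, as suggested above.
func stripWithReplace(s string) string {
	return strings.Replace(strings.Replace(s, "\n", "", -1), "\r", "", -1)
}

func main() {
	in := "YWJj\r\nZGVm\nZ2hp"
	fmt.Println(stripWithMap(in) == stripWithReplace(in)) // true
	fmt.Println(stripWithReplace(in))                     // YWJjZGVmZ2hp
}
```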


Egon

Apr 13, 2015, 5:52:19 AM
to golan...@googlegroups.com, dgr...@gmail.com
On Monday, 13 April 2015 12:48:19 UTC+3, Jedy Wu wrote:
pprof shows "s = strings.Map(removeNewlinesMapper, s)" in base64.DecodeString takes 50% time to run. Simply changing it to "s = strings.Replace(strings.Replace(s, "\n", "", -1), "\r", "", -1)" will make a huge difference. So the base64 module is far from optimized.

I'll incorporate that as well.

Egon

Apr 13, 2015, 6:30:39 AM
to golan...@googlegroups.com, dgr...@gmail.com

Konstantin Shaposhnikov

Apr 13, 2015, 12:42:27 PM
to golan...@googlegroups.com
Just want to make a few notes on the Java/Scala benchmark, mostly to demonstrate that benchmarks like this are not that useful or accurate.

1. The Java code that you provided is not an exact equivalent of the original Scala code: the str variable, which is declared as a val in Scala, should be declared final in Java (so Java will optimize str.getBytes() to be called only once):

  final String str = ""

2. When micro-benchmarking Java, you should generally warm up the HotSpot compiler by calling the method you are benchmarking a few times. Otherwise you are benchmarking the unoptimized, interpreted version of the method. Even better, use a benchmarking framework (e.g. JMH).

3. There is a bug in both the Scala and the Java benchmark code: t is not reset before the decode loop.

4. Finally, the C version and the Java/Scala versions are not exactly equivalent, as the Java/Scala versions also convert the base64 result to a String.
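Point 4 has a Go counterpart: `EncodeToString` and `DecodeString`, used in the Go program at the top of the thread, allocate new buffers on every call, and `EncodeToString` also converts its result to a string. A sketch that works on byte slices and reuses one preallocated buffer per direction, assuming the caller does not need a string result; the helper names are illustrative.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// encodeReusing encodes src tries times into a single buffer allocated
// up front, so no new slice or string is created per iteration.
func encodeReusing(src []byte, tries int) (int, []byte) {
	enc := base64.StdEncoding
	dst := make([]byte, enc.EncodedLen(len(src)))
	total := 0
	for i := 0; i < tries; i++ {
		enc.Encode(dst, src)
		total += len(dst)
	}
	return total, dst
}

// decodeReusing does the same for decoding. DecodedLen is an upper
// bound, so the byte count n returned by Decode is what gets summed.
func decodeReusing(src []byte, tries int) (int, error) {
	enc := base64.StdEncoding
	dst := make([]byte, enc.DecodedLen(len(src)))
	total := 0
	for i := 0; i < tries; i++ {
		n, err := enc.Decode(dst, src)
		if err != nil {
			return total, err
		}
		total += n
	}
	return total, nil
}

func main() {
	src := []byte(strings.Repeat("a", 10000000))
	encTotal, encoded := encodeReusing(src, 10)
	decTotal, err := decodeReusing(encoded, 10)
	fmt.Println(encTotal, decTotal, err) // 133333360 100000000 <nil>
}
```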

The final benchmark, which takes all of the above points into consideration, completes in 2/4.5 seconds for encode/decode instead of the original 27/16 seconds on my laptop:

import java.io.IOException;
import java.util.Base64;

public class Base64Perf {

    public static void main(String[] args) throws IOException {
        long s = 0;
        
        Base64.Encoder enc = Base64.getEncoder();
        // warm up
        for (int i = 0; i < 10000; i++) {
            s += enc.encode(new byte[] { 1 }).length;
        }
        
        Base64.Decoder dec = Base64.getDecoder();
        for (int i = 0; i < 10000; i++) {
            s += dec.decode("MQ==".getBytes()).length;
        }

        int STR_SIZE = 10000000;
        int TRIES = 100;

        byte[] bytes = new byte[STR_SIZE];
        for (int i = 0; i < STR_SIZE; i++) {
            bytes[i] = 'a';
        }
        byte[] bytes2 = null;

        long t = System.nanoTime();
        s = 0;
        for (int i = 0; i < TRIES; i++) {
            bytes2 = enc.encode(bytes);
            s += bytes2.length;
        }
        
        System.out.println("encode: " + s + ", " + (System.nanoTime() - t)/1e9);
        
        t = System.nanoTime();
        s = 0;

        for (int i = 0; i < TRIES; i++) {
            byte[] bytes3 = dec.decode(bytes2);
            s += bytes3.length;
        }

        System.out.println("decode: " + s + ", " + (System.nanoTime() - t)/1e9);
    }
}

Haddock

Apr 13, 2015, 4:21:17 PM
to golan...@googlegroups.com
I gave the Java version some more memory (VM options -server -Xss15500k) and all of a sudden it had about the same performance as Scala. Looks like the Scala for loop creates less garbage.

With your Java code, more memory, and Go 1.4, I now get on my other machine:

Go:

encode: 1333333600, 8.0265
decode: 1000000000, 28.5536


Java:

encode: 1333333600, 3.185907024
decode: 1000000000, 7.416037981

So things changed again ...

Dave Cheney

Apr 13, 2015, 4:40:52 PM
to golan...@googlegroups.com
I don't think allocating 15.5 GB of heap to win at a benchmarking game is reasonable. To put it another way, does Go win if you export GOGC=off?
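For reference, the in-process way to get the effect of `GOGC=off` is `debug.SetGCPercent(-1)` from `runtime/debug`. A sketch that disables the collector only for the encode loop and restores the previous setting afterwards; with collection off, every iteration's allocations stay on the heap, so the try count (10) and the helper name `encodeWithoutGC` are assumptions chosen for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"runtime/debug"
	"strings"
	"time"
)

// encodeWithoutGC runs the encode loop with the garbage collector
// disabled and restores the previous GC percentage when it returns.
func encodeWithoutGC(src []byte, tries int) int {
	// A negative percentage disables collection, like GOGC=off.
	old := debug.SetGCPercent(-1)
	defer debug.SetGCPercent(old)

	total := 0
	for i := 0; i < tries; i++ {
		total += len(base64.StdEncoding.EncodeToString(src))
	}
	return total
}

func main() {
	src := []byte(strings.Repeat("a", 10000000))
	start := time.Now()
	total := encodeWithoutGC(src, 10)
	fmt.Printf("encode: %d, %.4f\n", total, time.Since(start).Seconds())
}
```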

Konstantin Shaposhnikov

Apr 13, 2015, 9:10:54 PM
to golan...@googlegroups.com


On Tuesday, 14 April 2015 04:21:17 UTC+8, Haddock wrote:
I gave the Java version some more memory (VM options -server -Xss15500k) and all of a sudden it had about the same performance as Scala. Looks like the Scala for loop creates less garbage.

By specifying -Xss15500k you told the JVM to use a stack size of about 15 MB (not a 15 GB heap). I doubt that GC makes the program much slower.

I assume that you were comparing the original Scala and Java versions. In this case the Scala for loop does indeed create less garbage, because

val str: String = ...

in Scala is equivalent of 

final String str = ... 

in Java (as I mentioned in my first email).

When a variable is declared final, the reference cannot change, and Java takes the str.getBytes() call out of the loop.

My point was that the programs in this benchmark are not equivalent, and some of them do not use the language they are written in optimally, so comparing their speed is meaningless. I am sure that if you implement test.c in Go (or Java), it will be almost as fast as the C version. Or, if you implement test.c in Python, it will be much slower than the base64 module, which is implemented in C.

Haddock

Apr 14, 2015, 10:45:55 AM
to golan...@googlegroups.com
Running Konstantin's optimized Java code without increasing the heap space, I get this:


Go:

encode: 1333333600, 8.0265
decode: 1000000000, 28.5536

Java:

encode: 1333333600, 2.090984729
decode: 1000000000, 3.912914506

The measurement for Java includes the time for the warm up.

Kevin D

Apr 15, 2015, 3:59:39 PM
to golan...@googlegroups.com
I decided to do a straight port of their C code to Go: http://play.golang.org/p/sVb6kv96if

These are the results on my main machine:

encode: 1333333600, 2.7952
decode: 1000000000, 14.1758
encode2: 1333333600, 1.9941
decode2: 1000000000, 3.0482

Egon

Apr 16, 2015, 3:16:50 AM
to golan...@googlegroups.com
With tip + the patch, I'm getting:

encode: 1333333600, 2.9762
decode: 1000000000, 11.1806
encode2: 1333333600, 3.1002
decode2: 1000000000, 4.1232

There are significant improvements that could still be made in the decode part. The main "problem" is that decode in the standard library does a lot of error handling, which complicates things.

+ Egon

Haddock

Apr 16, 2015, 4:53:51 AM
to golan...@googlegroups.com
@Kevin and Egon: Nice work! With encode2/decode2 memory consumption has dropped by roughly 60% as well.

-- H.

ju...@pokko.net

Jun 22, 2015, 10:20:03 AM
to golan...@googlegroups.com
Hi all,

I needed a fast base64 encoder for a project I'm working on, and ended up writing one in assembler, using SSE instructions. On my laptop (Core i7-5600U), it's about 10x faster than the stdlib (1.4.2) implementation.

Can be found from here: https://github.com/issuj/gofaster