We have some legacy systems that use 3DES extensively, and we have recently been rewriting parts of them in Go. In an initial experiment, we found that the Go system is about 2x to 3x slower overall than the original C version. Although the culprit cannot be singled out completely yet, the Go pprof tool shows that about half of the execution time is spent in the DES operations (specifically, in the feistel function).
> Old
BenchmarkEncrypt-4 2000000 852 ns/op 9.38 MB/s
BenchmarkDecrypt-4 2000000 896 ns/op 8.92 MB/s
> New
BenchmarkEncrypt-4 3000000 536 ns/op 14.90 MB/s
BenchmarkDecrypt-4 3000000 539 ns/op 14.82 MB/s
Besides unrolling the loop in the feistel function and adding bounds-check hints, I rearranged the sbox matrix to eliminate the redundant row and column calculations in the feistel function, which fortunately does not make the code uglier. The idea is to move cost from the hot spot to the run-once initialization function whenever possible.
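To illustrate the sbox rearrangement, here is a minimal sketch using only the first DES S-box. It is not the code in this change: the names `sbox1`, `flat`, `lookupSlow`, and `lookupFast` are hypothetical, and the real feistel function handles all eight S-boxes plus the permutation. It only shows how the row/column arithmetic can be done once at initialization, so the hot path becomes a single table index.

```go
package main

import "fmt"

// sbox1 is the standard DES S-box S1, laid out as 4 rows by 16 columns.
var sbox1 = [4][16]uint8{
	{14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7},
	{0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8},
	{4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0},
	{15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13},
}

// flat holds the same values as sbox1, but indexed directly by the
// 6-bit S-box input (hypothetical name, for illustration only).
var flat [64]uint8

// init pays the row/column cost once, instead of on every lookup.
func init() {
	for row := 0; row < 4; row++ {
		for col := 0; col < 16; col++ {
			// Rebuild the 6-bit input: the row's high bit goes to bit 5,
			// the row's low bit to bit 0, and the column to bits 4..1.
			idx := (row&2)<<4 | col<<1 | row&1
			flat[idx] = sbox1[row][col]
		}
	}
}

// lookupSlow derives the row and column from the input on every call.
func lookupSlow(b uint8) uint8 {
	row := (b&0x20)>>4 | b&1
	col := (b >> 1) & 0xf
	return sbox1[row][col]
}

// lookupFast is a single index into the precomputed table.
// Masking with 0x3f also lets the compiler drop the bounds check.
func lookupFast(b uint8) uint8 {
	return flat[b&0x3f]
}

func main() {
	// Check that both lookups agree for every 6-bit input.
	for b := 0; b < 64; b++ {
		if lookupSlow(uint8(b)) != lookupFast(uint8(b)) {
			fmt.Println("mismatch at", b)
			return
		}
	}
	fmt.Println("all 64 inputs agree")
}
```

The same idea extends to folding the output permutation into the table, at the price of wider table entries.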
Hopefully there are other places that can be sped up without messing up the code.