fast RC4 and MD5


Russ Cox

Feb 7, 2013, 2:50:15 PM
to golang-dev, minux ma, Dave Cheney, Adam Langley, John Graham-Cumming
These implementations are apparently best in class and also public domain. Anyone interested in running the assembly through tr a-z A-Z and so on and sending a CL?


minux

Feb 7, 2013, 3:27:25 PM
to Russ Cox, golang-dev, Dave Cheney, Adam Langley, John Graham-Cumming
i think speeding up crypto/rc4 is of practical use (for example, TLS), does crypto/md5 need that speed?
i suspect speeding up crypto/sha* is more important than speeding up md5.

btw, do we need to support sha3 in the standard library?

ps: i'm thinking about using this to facilitate assembly translation:
1. adjust the GNU as source file to use our function calling convention.
2. test that using .syso mechanism.
3. use our cmd/objdump to translate the object file directly into Plan 9 syntax
4. manually polish the result and test

Adam Langley

Feb 7, 2013, 3:31:20 PM
to minux, Russ Cox, golang-dev, Dave Cheney, John Graham-Cumming
On Thu, Feb 7, 2013 at 3:27 PM, minux <minu...@gmail.com> wrote:
> i think speeding up crypto/rc4 is of practical use (for example, TLS), does
> crypto/md5 need that speed?

Fast MD5 is nice to have, but crypto/tls doesn't care much. Only the
handshake hash uses MD5 and that's a low volume use.

> i suspect speeding up crypto/sha* is more important than speeding up md5.

I believe so too.

> btw, do we need to support sha3 in the standard library?

I hope not. I'm rather hoping that SHA-3 is never used.

> ps: i'm thinking about using this to facilitate assembly translation:
> 1. adjust the GNU as source file to use our function calling convention.
> 2. test that using .syso mechanism.
> 3. use our cmd/objdump to translate the object file directly into Plan 9
> syntax
> 4. manually polish the result and test

I have used almost the reverse path to translate ASM into GAS syntax
for gccgo, which worked fine. One thing that has bitten me in the past
is that 6a, at least, will change MOV $0, reg into XOR reg, reg and
the latter affects the flags. Most code probably doesn't depend on
that however.


Cheers

AGL

Patrick Mylund Nielsen

Feb 7, 2013, 4:25:39 PM
to Adam Langley, minux, Russ Cox, golang-dev, Dave Cheney, John Graham-Cumming
> I hope not. I'm rather hoping that SHA-3 is never used.

Seconded. Keccak was an odd choice for mainstream use. Very slow in software compared to the other contestants. Supporting it will probably cause a lot of people to pick it over SHA-2 just because of the name, and not do a whole lot of good.

Agree about SHA vs. MD5 in terms of usefulness. RC4 > SHA1 > SHA256 > SHA512 ~= MD5



--

---
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



minux

Feb 7, 2013, 4:48:32 PM
to golang-dev, Adam Langley, Russ Cox, Dave Cheney, John Graham-Cumming, Patrick Mylund Nielsen
can we adopt assembly implementations from http://www.cryptopp.com/?

Brad Fitzpatrick

Feb 7, 2013, 5:05:51 PM
to minux, golang-dev, Adam Langley, Russ Cox, Dave Cheney, John Graham-Cumming, Patrick Mylund Nielsen
On Thu, Feb 7, 2013 at 1:48 PM, minux <minu...@gmail.com> wrote:
> can we adopt assembly implementations from http://www.cryptopp.com/?

IANAL, but it looks like we can.  The individual files are Public Domain.

Russ Cox

Feb 7, 2013, 5:14:02 PM
to minux, golang-dev, Dave Cheney, Adam Langley, John Graham-Cumming

> 1. adjust the GNU as source file to use our function calling convention.
> 2. test that using .syso mechanism.
> 3. use our cmd/objdump to translate the object file directly into Plan 9 syntax
> 4. manually polish the result and test

that seems like a lot of work.

it's probably easier to copy my shell script gen.sh from https://codereview.appspot.com/6259043/#ps2001 and adapt as needed.

russ

Russ Cox

Feb 7, 2013, 5:21:05 PM
to minux, golang-dev, Adam Langley, Dave Cheney, John Graham-Cumming, Patrick Mylund Nielsen
On Thu, Feb 7, 2013 at 1:48 PM, minux <minu...@gmail.com> wrote:
> can we adopt assembly implementations from http://www.cryptopp.com/?

please don't, at least not yet. that's the worst code i've seen all year, regardless of license.

russ

Adam Langley

Feb 7, 2013, 9:34:20 PM
to John Crockett, golang-dev, minux, Russ Cox, Dave Cheney, John Graham-Cumming
On Thu, Feb 7, 2013 at 6:27 PM, John Crockett <jscroc...@gmail.com> wrote:
> Could you give a quick comment on why you dislike SHA-3? I'm an observer
> (not an expert) and would appreciate your two cents. For what it's worth, I
> come from an FPGA / hardware background and have heard a little about
> hardware vs. software SHA-3 discussions.

When the SHA-3 process was started, there was a real fear that SHA-2
would fall. A series of attacks on MD5 and SHA-1 shook the confidence
of the community.

However, over the years the sky didn't fall and SHA-2 is still looking good.

SHA-3 is a nice algorithm for hardware (it came from NXP folks after
all), but it's unimpressive in software. What we could really do with
is a fast, software hash (in my opinion). It appears that it's not too
hard to make a slow, secure hash function, what's valuable is a fast,
secure hash function.

So my fear is that SHA-3 gets used because of the name, for no real
benefit and at a fair cost in terms of complexity.

I think that BLAKE2 is a much more interesting hash.


Cheers

AGL

Patrick Mylund Nielsen

Feb 7, 2013, 11:20:15 PM
to Adam Langley, John Crockett, golang-dev, minux, Russ Cox, Dave Cheney, John Graham-Cumming
Maybe that's why? NIST were able to pick a function that's worse than SHA-2 in software because SHA-2 doesn't need a replacement. That's definitely not what an average user will take away when they see "golang.org/pkg/crypto/sha256" vs "golang.org/pkg/crypto/sha3" though, understandably. 3 should be better than 2. Sponge vs. Merkle-Damgård is nice--no length extension, iteration problems, etc. and you can use it for everything from a MAC to a stream cipher--but it's hardly black and white, and hardly worth the performance loss.

BLAKE2 is awesome. Skein is also very fast in software. Hopefully TLS clients, OpenSSL, FDE and friends will support not just SHA-3, but something that's fast in software. IMO Go could help by including either before Keccak, and by having some kind of instructive summary.





John Graham-Cumming

Feb 8, 2013, 7:53:35 AM
to minux, Russ Cox, golang-dev, Dave Cheney, Adam Langley
On Thu, Feb 7, 2013 at 8:27 PM, minux <minu...@gmail.com> wrote:
> does crypto/md5 need that speed?

For TLS it doesn't, but I think it makes sense to push the Go standard packages to be 'best of breed' so that there are no reasons to not use Go.

John.

Joel Sing

Feb 8, 2013, 10:28:52 AM
to Russ Cox, golang-dev, minux ma, Dave Cheney, Adam Langley, John Graham-Cumming
I've reworked the RC4 implementation here:

https://codereview.appspot.com/7311062/

It results in ~30MB/sec speed up, which is still a fair way from the
reported 300MB/sec numbers:

Before:

BenchmarkRC4_128 5000000 638 ns/op 200.42 MB/s
BenchmarkRC4_1K 500000 5040 ns/op 203.17 MB/s
BenchmarkRC4_8K 50000 39874 ns/op 203.04 MB/s

After:

BenchmarkRC4_128 5000000 549 ns/op 233.00 MB/s
BenchmarkRC4_1K 500000 4334 ns/op 236.25 MB/s
BenchmarkRC4_8K 50000 34183 ns/op 236.84 MB/s

I'll have a further play and see if we are able to gain some
additional performance. It is also interesting to note that Marc's
rc4speed only achieves ~187MB/sec on the same machine (Dual Xeon X5650
@ 2.67GHz), whereas OpenSSL speed reports around 433MB/sec for 1KB
blocks.

Adam Langley

Feb 8, 2013, 10:40:19 AM
to Joel Sing, Russ Cox, golang-dev, minux ma, Dave Cheney, John Graham-Cumming
On Fri, Feb 8, 2013 at 10:28 AM, Joel Sing <js...@google.com> wrote:
> It results in ~30MB/sec speed up, which is still a fair way from the
> reported 300MB/sec numbers:

I'm afraid that we're into the realm of optimising for different
revisions of amd64 here.

On my machine:

Before:

PASS
BenchmarkRC4_128 5000000 410 ns/op 311.67 MB/s
BenchmarkRC4_1K 1000000 2954 ns/op 346.56 MB/s
BenchmarkRC4_8K 100000 23304 ns/op 347.40 MB/s

With your change:

PASS
BenchmarkRC4_128 5000000 404 ns/op 316.82 MB/s
BenchmarkRC4_1K 500000 3066 ns/op 333.91 MB/s
BenchmarkRC4_8K 100000 24111 ns/op 335.77 MB/s

TurboBoost and hyperthreading are disabled and results are stable to
within 1 MB/s.

However, since you're seeing a fair speedup on older(?) CPUs, then
perhaps it's worthwhile.

(CPU in my benchmark machine:
http://ark.intel.com/products/64596/Intel-Xeon-Processor-E5-2690-20M-Cache-2_90-GHz-8_00-GTs-Intel-QPI)


Cheers

AGL

Dmitry Chestnykh

Feb 8, 2013, 12:28:47 PM
to golan...@googlegroups.com, Joel Sing, Russ Cox, minux ma, Dave Cheney, John Graham-Cumming
Here's my old CPU, Core 2 Duo 2.26 GHz.
Your version at https://codereview.appspot.com/7311062/ is a little bit faster:

benchmark           old ns/op    new ns/op    delta
BenchmarkRC4_128          820          768   -6.34%
BenchmarkRC4_1K          6420         6073   -5.40%
BenchmarkRC4_8K         50373        47909   -4.89%

benchmark            old MB/s     new MB/s  speedup
BenchmarkRC4_128       156.00       166.54    1.07x
BenchmarkRC4_1K        159.49       168.61    1.06x
BenchmarkRC4_8K        160.72       168.99    1.05x

For comparison, the fastest version in unsafe Go I could do on this CPU:

benchmark           old ns/op    new ns/op    delta
BenchmarkRC4_128          820          604  -26.34%
BenchmarkRC4_1K          6420         4444  -30.78%
BenchmarkRC4_8K         50373        35335  -29.85%
 
benchmark            old MB/s     new MB/s  speedup
BenchmarkRC4_128       156.00       211.80    1.36x
BenchmarkRC4_1K        159.49       230.40    1.44x
BenchmarkRC4_8K        160.72       229.12    1.43x

Which is closer to OpenSSL:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             239179.70k   263650.55k   270890.45k   271541.59k   272842.80k

and I'm not really sure why :)

However this version is slower on Core i7 than the current rc4_amd64.s.
I guess RC4 really needs CPU-specific, not architecture-specific optimizations to shine.

-Dmitry

gogila...@gmail.com

Feb 6, 2014, 10:41:27 PM
to golan...@googlegroups.com, minux ma, Dave Cheney, Adam Langley, John Graham-Cumming
I wrote an online sha3 calculator. No download needed.

Michael Jones

Feb 7, 2014, 8:44:08 AM
to gogila...@gmail.com, golan...@googlegroups.com, minux ma, Dave Cheney, Adam Langley, John Graham-Cumming
A while back, Intel found a 40% speedup for OpenSSL by using software pipelining and careful instruction scheduling...



On Thu, Feb 6, 2014 at 7:41 PM, <gogila...@gmail.com> wrote:
> I wrote an online sha3 calculator. No download needed.




--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765