I am experiencing very slow data transfers over TLS connections on an
ARM-based Beagle board (1 GHz). Here is what I'm doing:
The board receives unencrypted data from the network, base64-encodes it,
and sends it back to the network over a TLS-encrypted websocket
connection.
Data rates are around 20 KB/s. Doing the same with unencrypted websocket
connections gives rates of about 1 MB/s. From time to time even the TLS
connections work correctly, giving data rates of about 700 KB/s.
When the 700 KB/s rate is reached, the process takes about 67% CPU
load; the 20 KB/s case needs - you guessed it - 100%.
I measured the data rate by taking the current time with
time.Nanoseconds() right before the websocket.Write() and once again
right after it, so it shouldn't be interrupted by a parallel goroutine[1].
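For illustration, a minimal sketch of that kind of measurement, using today's
time package API and a plain io.Writer in place of the real websocket
connection (the function name, the io.Discard destination and the 64 KB
payload are only illustrative, not from the original program):

package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"os"
	"time"
)

// writeTimed base64-encodes payload, writes it to w, and reports the
// throughput of the Write call itself. The original program uses
// websocket.Write and time.Nanoseconds instead.
func writeTimed(w io.Writer, payload []byte) error {
	buf := make([]byte, base64.StdEncoding.EncodedLen(len(payload)))
	base64.StdEncoding.Encode(buf, payload)

	start := time.Now()
	_, err := w.Write(buf)
	elapsed := time.Since(start)

	fmt.Printf("%d bytes in %v (%.1f KB/s)\n",
		len(buf), elapsed, float64(len(buf))/1024/elapsed.Seconds())
	return err
}

func main() {
	// Write 64 KB of zeros to a throwaway destination as a demo.
	// (On releases before io.Discard existed, ioutil.Discard works too.)
	if err := writeTimed(io.Discard, make([]byte, 64*1024)); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}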
It's not that the board is too weak for TLS encryption - as I said,
sometimes it works as expected, and scp-ing a 50 MB file always runs at
5 MB/s.
Unfortunately, I can't show a simple example program; I modified the
websocket server and client examples to send large amounts of data, and
that always runs at 580 KB/s.
Now I have no idea how to trace this any further. I'm pretty sure it's
not the rest of the program that causes this; maybe some of you have
advice?
[1] It seems that Go doesn't like switching between goroutines unless
it's really necessary. But that's another topic..
bye
Norbert
Uh, what is this magic? It really does seem to help. Interesting,
because the actual work is done in just one goroutine, and even more
interesting, the Beagle board only has one processor...
Norbert
What kernel and distribution are you using? GOMAXPROCS tells the Go
runtime to split goroutines across multiple threads. If this really
does help, it would seem to imply that the kernel is doing better
at scheduling goroutines than the Go runtime.
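For reference, a one-liner sketch of how that knob is set from code (it can
also be set via the GOMAXPROCS environment variable):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(n) sets the limit and returns the previous value;
	// GOMAXPROCS(0) only queries it without changing anything.
	prev := runtime.GOMAXPROCS(2)
	fmt.Println("previous:", prev, "now:", runtime.GOMAXPROCS(0))
}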
Bobby
> What kernel and distribution are you using?
Debian 6.0.1, Kernel 2.6.39.1-x1
> GOMAXPROCS tells the go
> runtime to split goroutines across multiple threads.
The documentation says this is the maximum number of CPUs, not threads..
> If this really
> seems to help, it would seem to imply that the kernel is doing better
> at scheduling goroutines than the go runtime.
The strange thing in my case is that although there are some other
goroutines, they have nothing to do and shouldn't get scheduled at all.
Norbert
Ah! Yes, that makes more sense to me. (Sorry for the double reply.)
> ... or that goroutines execute blocking operations (including page faults)
> behind runtime's back.
I disabled swapping (there is no swap space at all). And blocking
operations should delay the data transfer only from time to time, but I
see constant data transfer rates of either 670 KB/s or 20 KB/s - and no
switching between the rates until the program gets restarted.
Norbert
I have seen this kind of behaviour when there are goroutines doing
a significant amount of computation running concurrently with goroutines
that do a significant amount of communication.
My assumption is that when GOMAXPROCS=1 the computing
goroutines get the lion's share of the cycles while allowing the
communicating goroutines to progress only very slowly.
I've seen this happen even when the computation goroutine
is communicating with the others.
I think this is because the Go scheduler is very simplistic.
By raising GOMAXPROCS above 1, you are allowing the OS
to time slice and let communicating goroutines progress even
when there's significant computational load.
I think this behaviour can be observed without anything happening
"behind runtime's back".
is there something wrong with GOMAXPROCS > 1 ?
letting the OS do the time slicing seems like a reasonable
solution to me, assuming there aren't too many compute-bound
goroutines. perhaps it doesn't work in general - i've only
observed external behaviour in a limited number of cases.
> Uh, what is this magic? It really does seem to help.
Update: No, it didn't help. On the day of a presentation, it fell back
to the slow transfer again.
I'm afraid I have to trace it somewhere in the Go sources...
Norbert
how annoying.
what value of GOMAXPROCS were you using, out of interest?
Any chance you could post your code, or a testcase that exhibits the
same behavior?
> what value of GOMAXPROCS were you using, out of interest?
First I took 4 (chosen absolutely at random), and now I have tried it
with 2 again..
Norbert
> Any chance you could post your code, or a testcase that exhibits the
> same behavior?
The original code is too large, and the simple test case I created
didn't show this behaviour, unfortunately.
Well, I'm going to do some research and will post any interesting
results here!
Norbert
> I am experiencing very slow data transfers over TLS connections on an
> ARM based Beagle board (1GHz). Here is what i'm doing:
I found the reasons; actually there are two:
The first is that the TLS connection uses different ciphers from time
to time. I don't know why this is the case, as the TLS handshake
negotiates TLS_ECDHE_RSA_WITH_RC4_128_SHA in both cases.
The high data rates run with an XOR encryption as a Stream. The slow
data rates run with 3DES in BlockMode (see func (hc *halfConn)
encrypt(b *block) in crypto/tls/conn.go).
The second - and main - reason is that the 3DES encryption runs very
slowly, at about 20 KB/s. It seems that the implementation of (3)DES
is inefficient, as other programs are much faster.
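For readers not familiar with the crypto/tls internals: the two code paths
correspond roughly to crypto/cipher's Stream and BlockMode interfaces. A
sketch of both, with throwaway keys and an all-zero IV purely for
illustration:

package main

import (
	"crypto/cipher"
	"crypto/des"
	"crypto/rc4"
	"fmt"
)

func main() {
	data := make([]byte, 64) // a small dummy record

	// Stream path (what the RC4 suites use): XOR the keystream over the data.
	stream, err := rc4.NewCipher(make([]byte, 16))
	if err != nil {
		fmt.Println(err)
		return
	}
	stream.XORKeyStream(data, data)

	// BlockMode path (what the 3DES suite uses): CBC over full blocks.
	block, err := des.NewTripleDESCipher([]byte("123456789012345678901234"))
	if err != nil {
		fmt.Println(err)
		return
	}
	iv := make([]byte, block.BlockSize())
	mode := cipher.NewCBCEncrypter(block, iv)
	mode.CryptBlocks(data, data) // length must be a multiple of BlockSize

	fmt.Println("encrypted", len(data), "bytes both ways")
}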
Here are the test results when encrypting 10MB:
System program:
# dd if=/dev/zero of=unenc bs=1000 count=10000
# time mcrypt -k 123456789012345678901234 -a tripledes unenc
Warning: It is insecure to specify keywords in the command line
File unenc was encrypted.
real 0m6.044s
user 0m5.820s
sys 0m0.211s
Go program:
# time ./des
real 7m39.010s
user 7m38.922s
sys 0m0.008s
(similar results on the PC: 0m1.389s vs. 0m37.987s)
Source code of "des":
package main

import (
	"crypto/des"
	"fmt"
)

func main() {
	key := []byte("123456789012345678901234")
	c, e := des.NewTripleDESCipher(key)
	if e != nil {
		fmt.Printf("%v\n", e)
		return
	}
	// Encrypt 10 MB, one 8-byte block at a time, to measure raw 3DES speed.
	in := make([]byte, c.BlockSize())
	out := make([]byte, c.BlockSize())
	cnt := 10000000 / c.BlockSize()
	for i := 0; i < cnt; i++ {
		c.Encrypt(out, in)
	}
}
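The same measurement could also be written as a Go benchmark, so that ns/op
and MB/s are reported directly; a sketch, assuming it lives in a hypothetical
des_test.go next to the program above and is run with "go test -bench .":

package main

import (
	"crypto/des"
	"testing"
)

// BenchmarkTripleDESEncrypt measures single-block 3DES encryption,
// the same operation as the loop in the program above.
func BenchmarkTripleDESEncrypt(b *testing.B) {
	key := []byte("123456789012345678901234")
	c, err := des.NewTripleDESCipher(key)
	if err != nil {
		b.Fatal(err)
	}
	in := make([]byte, c.BlockSize())
	out := make([]byte, c.BlockSize())
	b.SetBytes(int64(c.BlockSize())) // makes the tool print MB/s as well
	for i := 0; i < b.N; i++ {
		c.Encrypt(out, in)
	}
}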
It would be nice if the DES implementation could be reworked to become
usable on ARM, too..
Norbert
Untested theory: TLS handshaking may be relying on the iteration order
of a map that maps names to ciphers. That order used to be
deterministic, more or less, as long as you didn't mutate the map, but
it has recently been changed to always be non-deterministic.
Like I say, just a theory.
Cheers
Dave
Sent from my iPad
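A tiny illustration of what that theory means in practice (the map contents
here are made up, not the real crypto/tls tables):

package main

import "fmt"

// Iteration order over a Go map is unspecified and varies between runs,
// so code that takes "the first" entry of a map of cipher names can end
// up with a different cipher each time.
func main() {
	ciphers := map[string]string{
		"TLS_ECDHE_RSA_WITH_RC4_128_SHA":     "RC4 stream cipher",
		"TLS_RSA_WITH_3DES_EDE_CBC_SHA":      "3DES in CBC mode",
		"TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA": "AES-128 in CBC mode",
	}
	for name := range ciphers {
		fmt.Println("first in iteration order this run:", name)
		break
	}
}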
Hm, it's still strange that the server selects an RC4 cipher (according
to Wireshark, in the ServerHello message) while the actual encryption is
done with DES. But it does work somehow, even though too slow.
Norbert
I've fixed this in http://code.google.com/p/go/source/detail?r=532f5e24c3c1
However, this would only affect the TLS client. The TLS server accepts
the client's preferences in cipher suite selection.
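For completeness, a client that wants to avoid the slow suite can also
restrict what it offers in the first place. A sketch, assuming a Go release
where tls.Config exposes a CipherSuites field, with a placeholder host name;
note that RC4 is considered weak by today's standards:

package main

import (
	"crypto/tls"
	"log"
)

func main() {
	// Offer only the RC4 suite so the handshake cannot end up on 3DES.
	// (This mirrors the 2011 setup; modern deployments should not use RC4.)
	conf := &tls.Config{
		CipherSuites: []uint16{tls.TLS_ECDHE_RSA_WITH_RC4_128_SHA},
	}
	conn, err := tls.Dial("tcp", "example.com:443", conf)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Printf("negotiated suite: %#04x", conn.ConnectionState().CipherSuite)
}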
Cheers
AGL