Hi,
Aravind and I paired on this today morning and were able to find the cause of this slowness.
tl;dr:
Use /dev/urandom instead of /dev/random for the random number generation.
Run Go server with the option: -Djava.security.egd=file:/dev/./urandom. With this the dashboard came up quite quickly.
Longer version:
Go server uses JRuby which was trying to generate some uuids using Java's
SecureRandom and was blocked due to starved Linux entropy pool. Part of stacktrace below:
"180936571@qtp-1983968298-15" prio=10 tid=0x00007f2a6819c800 nid=0x38ae in Object.wait() [0x00007f2a274ee000]
java.lang.Thread.State: RUNNABLE
at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:203)
- locked <0x00000000fb5fbc28> (a sun.security.provider.SecureRandom)
at java.security.SecureRandom.nextBytes(SecureRandom.java:455)
- locked <0x00000000fb5fbf40> (a java.security.SecureRandom)
at org.jruby.ext.securerandom.SecureRandomLibrary.nextBytes(SecureRandomLibrary.java:49)
at org.jruby.ext.securerandom.SecureRandomLibrary.uuid(SecureRandomLibrary.java:36)
at org.jruby.ext.securerandom.SecureRandomLibrary$INVOKER$s$0$0$uuid.call(SecureRandomLibrary$INVOKER$s$0$0$uuid.gen)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:134)
......
The Digital Ocean box Hugh shared with me had an available entropy count of approximately 200 on it. On the Linux box I had, the entropy count was 3210. Vast difference!
The below command can be used to check the amount of available entropy on the Linux box:
# cat /proc/sys/kernel/random/entropy_avail
From a tutorial page on Digital Ocean: Linux already gets very good quality random data from the aforementioned hardware sources, but since a headless machine usually has no keyboard or mouse, there is much less entropy generated. Disk and network I/O represent the majority of entropy generation sources for these machines, and these produce very sparse amounts of entropy.
To test how much time it took to get random data using /dev/random and /dev/urandom, we ran the following commands and observed that reading even 100 bytes from /dev/random was very slow when compared to /dev/urandom.
For /dev/urandom:
root@go-cd:~# dd if=/dev/urandom count=1 bs=100 >/dev/null
1+0 records in
1+0 records out
100 bytes (100 B) copied, 0.00030182 s, 331 kB/s
root@go-cd:~# dd if=/dev/urandom count=1 bs=100 >/dev/null
1+0 records in
1+0 records out
100 bytes (100 B) copied, 9.3816e-05 s, 1.1 MB/s
root@go-cd:~# dd if=/dev/urandom count=1 bs=100 >/dev/null
1+0 records in
1+0 records out
100 bytes (100 B) copied, 0.000887821 s, 113 kB/s
For /dev/random (Note it does not even give 100 bytes):
root@go-cd:~# dd if=/dev/random count=1 bs=100 >/dev/null
0+1 records in
0+1 records out
16 bytes (16 B) copied, 0.000168808 s, 94.8 kB/s
root@go-cd:~# dd if=/dev/random count=1 bs=100 >/dev/null
0+1 records in
0+1 records out
6 bytes (6 B) copied, 117.944 s, 0.0 kB/s
root@go-cd:~# dd if=/dev/random count=1 bs=100 >/dev/null
0+1 records in
0+1 records out
6 bytes (6 B) copied, 56.5452 s, 0.0 kB/s
The amount of available entropy could have been low on the Digital Ocean box due it being a headless VM. When the server was starting this amount of available entropy went down (we saw it going down to 0) and due to this the server was waiting for the entropy pool to get populated again, thus increasing the startup time for the Go server.
By default Java/JRuby uses cryptographically secure entropy from /dev/random. This pool usually gets starved and threads waiting on it might get blocked for a long time. This default can be changed to PRNG (Pseudo Random Number Generator) /dev/urandom with the following property:
-Djava.security.egd=file:/dev/./urandom (note the extra '/./'. It's required to workaround
this defect).
I added this property in /etc/default/go-server: GO_SERVER_SYSTEM_PROPERTIES=-Djava.security.egd=file:/dev/./urandom, and observed the server started quite quickly.
Thanks to Hugh for providing access to his Digital Ocean box.
Some references:
Regards,
Ali and Aravind pairing