Java 7


Thomas Degris

Jul 27, 2012, 11:42:53 AM
to github...@googlegroups.com
Hello,

Java 7 has been out for a while now (since last summer). Today, I did some quick benchmarking on my laptop: 

For 448 demons with 20160 active features out of 50000, I get the following performance (over 5 minutes of running time; the files I used for the benchmark are attached):
* Java 6: 7,535 ticks per second (+/- 53 std)
* Java 7: 8,412 ticks per second (+/- 47 std), an increase of about 12% on average
* Java 7 + GPU: 23 ticks per second (+/- 0 std)
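
As a rough illustration of how "+/- std" figures like the ones above can be computed from repeated measurements, here is a minimal Java sketch. The class name TickRateStats, the sample values, and the choice of the sample (n - 1) estimator are assumptions made for illustration; they are not taken from the attached benchmark files.

```java
/** Hypothetical helper: summarizes ticks-per-second samples as mean and standard deviation. */
final class TickRateStats {

    /** Arithmetic mean of the samples. */
    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (n - 1 denominator); the estimator is an assumption. */
    static double std(double[] samples) {
        double m = mean(samples);
        double squaredError = 0.0;
        for (double s : samples) {
            squaredError += (s - m) * (s - m);
        }
        return Math.sqrt(squaredError / (samples.length - 1));
    }

    public static void main(String[] args) {
        // Illustrative values only, not measurements from this thread.
        double[] ticksPerSecond = {10.0, 12.0, 14.0};
        System.out.println(mean(ticksPerSecond) + " ticks per second (+/- "
                + std(ticksPerSecond) + " std)");
    }
}
```

Each sample would be the tick rate measured over one time window of the run, so the spread reflects variation between windows.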

I used only 5000 features for the GPU because 50000 did not fit in my video card's memory. I do not know why the GPU performance is so poor. If I have time, I will do some profiling to check this. Maybe my hardware is just not appropriate (not much memory, same number of cores as my CPU). My hardware is a 2.53 GHz Intel Core 2 Duo, 4 GB of 1067 MHz DDR3 memory, and an NVIDIA GeForce 9400M with 256 MB. For all evaluations, the JVM options were: "-Xmx1024m -server".

Additional advantages of Java 7 are:
- some nice features for dispatching computation across multiple cores
- some language simplifications
See http://openjdk.java.net/projects/jdk7/features/ for more information.
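
One of the multi-core features in Java 7 is the fork/join framework (java.util.concurrent.ForkJoinPool and RecursiveAction). The sketch below is only an illustration of that API, splitting an element-wise update of an array across cores. The class ParallelScale and its threshold are hypothetical and are not part of Zephyr or RLPark. The underscore in the numeric literal is one of the small Java 7 language additions.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

/** Hypothetical task: multiplies values[from..to) by a factor, splitting the range recursively. */
final class ParallelScale extends RecursiveAction {
    // Below this size the range is processed sequentially (arbitrary value for the sketch).
    private static final int THRESHOLD = 1_000;

    private final double[] values;
    private final double factor;
    private final int from;
    private final int to;

    ParallelScale(double[] values, double factor, int from, int to) {
        this.values = values;
        this.factor = factor;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            for (int i = from; i < to; i++) {
                values[i] *= factor;
            }
            return;
        }
        int mid = (from + to) >>> 1;
        // Run both halves, possibly on different worker threads, and wait for both.
        invokeAll(new ParallelScale(values, factor, from, mid),
                  new ParallelScale(values, factor, mid, to));
    }

    /** Scales the whole array in place using a temporary fork/join pool. */
    static void scale(double[] values, double factor) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new ParallelScale(values, factor, 0, values.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] values = new double[50000];
        Arrays.fill(values, 2.0);
        scale(values, 0.5);
        System.out.println(values[0] + " " + values[values.length - 1]);
    }
}
```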

Finally, installing Java 7 was easy, and both Zephyr and RLPark compile without modification. Java 7 can be downloaded at http://www.oracle.com/technetwork/java/javase/downloads/index.html
On Mac OS X, do not forget to set it as the default by running the "Java Preferences" application.

Note that new versions of Zephyr and RLPark may not be compatible with Java 6. 

Thomas

PS: Java 8 is not expected before September 2013.

Clement

Jul 28, 2012, 10:25:50 AM
to github...@googlegroups.com
This reply is not so much about Java 7, but I can imagine a few problems with the GPU version.

First of all, in the current implementation, each demon is a thread on the GPU. GPUs need a very large number of threads to hide memory access latency and use the hardware fully. I wouldn't be surprised if (memory permitting) you could increase the number of demons with little increase in cost. It is best to have at least 1000 threads.

Secondly, the current GPU implementation does not make use of sparse vectors, which I assume your CPU version does.

Thirdly, your hardware and drivers do not support many features that add a lot of speed to the GPU implementation (i.e. hardware-accelerated vector operations, out-of-order queues).

The first two points are significant weaknesses in the GPU implementation, and I am currently working on them.

Thomas Degris

Jul 30, 2012, 8:09:47 PM
to github...@googlegroups.com
Last time, I reported a weird difference between CPU and GPU performance. As I realized just a few minutes after sending my email, and as Clement noticed as well, there was a bug in the benchmark: there were no active features in the agent state. Needless to say, the sparse representation had a clear advantage.

So, new benchmark: 896 demons with 410 active features (in practice only 379 were active because of collisions) out of 5000:
* CPU, SVector: 27.83 ticks per second (+/- 0.57 std)
* GPU: 13.85 ticks per second (+/- 0.01 std)
* CPU, PVector: 12.03 ticks per second (+/- 0.1 std)
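
To illustrate why a sparse representation helps when few features are active, here is a minimal sketch comparing a dense dot product with one that only visits the active indices. It assumes binary features and uses plain arrays rather than RLPark's SVector and PVector classes, whose APIs are not shown in this thread; the class and method names are hypothetical.

```java
/** Hypothetical comparison of dense and sparse dot products for a binary feature vector. */
final class SparseDot {

    /** Dense version: touches every component, active or not. */
    static double denseDot(double[] weights, double[] features) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    /** Sparse version: only sums the weights of the active (value 1) features. */
    static double sparseBinaryDot(double[] weights, int[] activeIndices) {
        double sum = 0.0;
        for (int index : activeIndices) {
            sum += weights[index];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] weights = {0.5, 1.0, -2.0, 4.0};
        double[] dense = {0.0, 1.0, 0.0, 1.0};
        int[] active = {1, 3};
        System.out.println(denseDot(weights, dense) + " " + sparseBinaryDot(weights, active));
    }
}
```

With 379 active features out of 5000, the sparse loop visits under 8% of the components that the dense loop does, and with no active features at all it does no work, which is consistent with the bug described above.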

Conclusion: I am thinking of merging Clement's code into RLPark in some way. The idea would be to be able to switch back and forth between Horde implemented on the CPU and on the GPU. I think it would also be a great feature to be able to update the agent state (e.g. a network of LTUs) on the CPU while the demons are being updated on the GPU (or to have twice the number of demons: one half updated on the GPU, the other half on the CPU). Please let me know if you have any ideas on this topic.
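
One possible shape for the "half on the CPU, half on the GPU" idea is sketched below. This is only a design sketch under stated assumptions: DemonBackend, AddObservationBackend and SplitHorde are hypothetical names, not existing RLPark classes, and the stand-in backend does a trivial update where a real one would call the CPU or the OpenCL implementation.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical abstraction: one time-step update for the demons in [from, to). */
interface DemonBackend {
    void update(double[] predictions, int from, int to, double observation);
}

/** Stand-in backend for the sketch; a real one would run the CPU or the OpenCL code. */
final class AddObservationBackend implements DemonBackend {
    @Override
    public void update(double[] predictions, int from, int to, double observation) {
        for (int i = from; i < to; i++) {
            predictions[i] += observation;
        }
    }
}

/** Updates the first half of the demons on one backend and the second half on another. */
final class SplitHorde {
    private final DemonBackend first;
    private final DemonBackend second;
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    SplitHorde(DemonBackend first, DemonBackend second) {
        this.first = first;
        this.second = second;
    }

    /** Runs both halves concurrently and returns once both have finished. */
    void step(final double[] predictions, final double observation) {
        final int mid = predictions.length / 2;
        Future<?> a = executor.submit(new Runnable() {
            @Override
            public void run() {
                first.update(predictions, 0, mid, observation);
            }
        });
        Future<?> b = executor.submit(new Runnable() {
            @Override
            public void run() {
                second.update(predictions, mid, predictions.length, observation);
            }
        });
        try {
            a.get();
            b.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        SplitHorde horde = new SplitHorde(new AddObservationBackend(), new AddObservationBackend());
        double[] predictions = new double[896];
        horde.step(predictions, 1.5);
        horde.shutdown();
        System.out.println(predictions[0] + " " + predictions[895]);
    }
}
```

Because both backends sit behind the same interface, switching between a CPU-only, a GPU-only, or a mixed configuration would only change which objects are passed to the constructor.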

Two important notes on Java 7:
- on this benchmark, Java 7 was actually slightly slower than Java 6
- I have switched Zephyr and RLPark to Java 7 and noticed a JVM crash (SIGBUS) when running the Zephyr and RLPark JUnit tests. I simply have no time to investigate this.

Conclusion: Java 7 will wait. I made branches in Zephyr and RLPark for those who want to try. 

Thomas



Clement

Jul 30, 2012, 9:34:51 PM
to github...@googlegroups.com
Here are some of my results:

Using 896 demons with 5000 features on my AMD HD5870 graphics card:
GPU (current version): ~37 ticks per second
GPU (new low demon version): ~217 ticks per second

So, as I suspected, there are not enough threads for the current version to fully use the GPU. I am currently working on a version that uses several threads per demon and gives significantly better results on any task with fewer than 4000 demons. On tasks with more than 25k demons, the current version is slightly faster (~10-15%) than the 'low demon' version.

Also, I would like to say that the Apple OpenCL drivers are horrible... use Linux or Windows!

Clement

Thomas Degris

Jul 31, 2012, 3:24:45 AM
to github...@googlegroups.com
I forgot to send my benchmark files. Here they are.

Thomas

CLBenchmark.java
CritterbotDemonsPredictionOffPolicy.java

Joseph Modayil

Jul 31, 2012, 3:40:17 PM
to github...@googlegroups.com
On our Linux box gremlin2 (with an NVIDIA GeForce 310), I get
36.35 ticks per second, with 383.4 active features on average.

On my MacBook Pro (GeForce 9400M), I see
21.8 ticks per second, with 375.3 active features on average.
 
Joseph
