Cluster with 19 * gtx260 client results


Imre Andor

unread,
Feb 9, 2011, 5:59:06 AM2/9/11
to py...@googlegroups.com
Hi!


First of all, my English is not very good.
Secondly, I borrowed my school's computer room. The room has 24 student PCs and 1 teacher PC. Unfortunately 5 PCs had gone bad, so I had 20 PCs in total.
Configuration (AIDA benchmark and CPU info):
http://img218.imageshack.us/img218/7439/cpuidu.png
http://img40.imageshack.us/img40/2107/cachememir.png
- Core2 Q6600 - 4 cores, each running at 2.4 GHz (65 nm, G0 stepping)
- Nvidia GeForce GTX260 - 896 MB (192 CUDA cores at 648 MHz, 1296 MHz shader clock, 1000 MHz memory on a 448-bit interface)
- 1 Gbit Ethernet cards, but the switch supports only 100 Mbit
Server and clients have a similar configuration, but there are some differences:
- clients run BackTrack 4 R2 (kernel 2.6.35.8) from a LIVE DVD (using the "graphical mode from RAM" option)
Configuration:
  • run 4 script files:
  1. apt-get update && apt-get upgrade, apt-get install nvidia-driver --> drop to the GUI (startx) and back to the console
  2. apt-get install cuda-toolkit
  3. apt-get install cuda-sdk (sometimes this doesn't work, so you can download it from the Nvidia homepage: http://developer.download.nvidia.com/compute/cuda/3_1/sdk/gpucomputingsdk_3.1_linux.run - my mistake, version 3.2 came out at the end of January 2011)
  4. apt-get clean (to free up memory), apt-get install cpyrit-cuda, ufw disable, and pyrit list_cores of course
  • on every client, modify pyrit's config: set rpc_server = true and rpc_knownclients to the server's IP
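The per-client config change described above might look like this (a sketch; the key in Pyrit's config file is spelled rpc_knownclients, and the server IP below is only a placeholder):

```
# ~/.pyrit/config on each client (sketch; 192.168.0.1 is a placeholder IP)
rpc_server = true
rpc_knownclients = 192.168.0.1
```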

The server's pyrit configuration is shown in snapshot2.png:
http://img26.imageshack.us/img26/9833/snapshot1fy.png
http://img545.imageshack.us/img545/1989/snapshot2u.png
- server run  backtrack4 r2 (kernel 2.6.35.8) on usb2 hdd (hitachi 250gb - only 5400 rpm)


So the results: http://img828.imageshack.us/img828/8274/snapshot3g.png
At first I got only around 80,000 PMK/s. Then I got 105,042 PMK/s (snapshot3).
The highest result was 126,603 PMK/s.

In my opinion, the bottleneck is the Ethernet network. I ran xnetload on eth0 and the highest incoming rate was "only" 5.4 MB/s. (I reached this result with 13 clients.)
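Reading the 5.4 "Mb" as megabytes per second, the bottleneck claim is at least plausible: each PMK is a 32-byte PBKDF2 result, so the raw result stream back to the server is a few MB/s on its own, while a 100 Mbit link tops out at about 12.5 MB/s before any overhead. A minimal back-of-the-envelope check (the PMK rate is the best figure from this post; protocol overhead is ignored):

```python
# Back-of-the-envelope: is 100 Mbit Ethernet the bottleneck?
PMK_BYTES = 32                 # a PMK is a 256-bit (32-byte) PBKDF2 output
pmk_rate = 126_603             # best observed PMK/s from this post

result_stream = PMK_BYTES * pmk_rate / 1e6   # MB/s of raw PMKs flowing back
link_capacity = 100e6 / 8 / 1e6              # 100 Mbit/s expressed in MB/s

print(f"raw PMK traffic:  {result_stream:.1f} MB/s")
print(f"100 Mbit ceiling: {link_capacity:.1f} MB/s")
```

The observed 5.4 MB/s sits between the raw PMK payload and the link ceiling, which is consistent with the network saturating once password/ESSID traffic and protocol overhead are added on top.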



I will run a new test which includes:
- Ubuntu x64, a faster server, a PostgreSQL server, CAL++, and 21 machines :)



Best regards,
Andor Imre - University of Óbuda - John von Neumann Faculty of Informatics


lascia perdere

unread,
Feb 9, 2011, 1:13:43 PM2/9/11
to Pyrit
> In my opinion, the bottleneck is the Ethernet network. I ran xnetload on eth0
> and the highest incoming rate was "only" 5.4 MB/s. (I reached this result
> with 13 clients.)

Hi there.
I hope you will do this big job, because it will give a very
interesting dataset (all the PCs are identical, so the data are
coherent).
My suggestion is: run 1 PC and write down the PMK/s. Add a second PC,
rerun the test, and write down the PMK/s. And so on: every time, add a
new PC and redo the test. At the end there will be an interesting
graph describing the growth of PMK/s relative to the number of nodes.
Every time you add a node, you should in theory get a x2, x3, x4, etc.
increment, but you will not. You will get xA, xB, xC, etc. It will be
*very* interesting to know A, B, C, etc.
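The analysis described above could be sketched like this; the rates below are made-up placeholders for the measurements the test would produce, not real numbers:

```python
# Sketch of the suggested analysis: measure PMK/s after each added node,
# then compute how far each step falls short of ideal linear scaling.
# The rates below are hypothetical placeholders, not real measurements.
rates = [6600, 13100, 19400, 25500]        # hypothetical PMK/s with 1..4 nodes

single = rates[0]
for n, rate in enumerate(rates, start=1):
    speedup = rate / single                # the "xA, xB, xC" factors
    efficiency = speedup / n               # 1.0 would be perfect scaling
    print(f"{n} nodes: x{speedup:.2f} (efficiency {efficiency:.0%})")
```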

The last test is to split the database into 20 sub-databases and
distribute them among the 20 nodes, so each of them can run
independently. The speedup should then be near x20; the interesting
part will be developing the right set of scripts to drive the 20 nodes.
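The splitting step could be sketched as a simple round-robin deal of the password list into one shard per node, so each node runs pyrit on its own shard with no network coordination (shard count and the toy word list are illustrative only):

```python
# Sketch: deal a password list round-robin into one shard per node.
NODES = 20

def split_wordlist(words, nodes=NODES):
    """Return one list of passwords per node, dealt round-robin."""
    shards = [[] for _ in range(nodes)]
    for i, word in enumerate(words):
        shards[i % nodes].append(word)
    return shards

words = [f"password{i}" for i in range(100)]   # toy stand-in for the real list
shards = split_wordlist(words)
print(len(shards), len(shards[0]))             # 20 shards of 5 words each
```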

Jimmy

unread,
Mar 3, 2011, 1:37:07 PM3/3/11
to Pyrit
The scaling is close to 95% for each client.
Looking back, I made some mistakes:
- didn't use a ramdisk, SSD, or a much faster drive for the server
- the CPU cores on the server were slow - only 2.4 GHz - and the
synchronization really matters because it runs on only ONE CPU core
- plain LAN network...
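As a rough illustration of what ~95% per-client scaling means at this size (the per-node rate below is a hypothetical figure, not a measurement):

```python
# Rough illustration of what ~95% per-client scaling costs at 19 nodes.
per_node = 7000            # hypothetical standalone PMK/s of one GTX260 box
nodes = 19
efficiency = 0.95          # per-client scaling reported in the thread

ideal = per_node * nodes
actual = ideal * efficiency
print(f"ideal {ideal} PMK/s, at 95% efficiency ~{actual:.0f} PMK/s")
```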

Summarizing: this is NOT a good way. Why? Because an AMD ATI HD 5970
makes 140,000 PMK/s, is cheaper than 20 personal computers, and needs
less energy. I don't recommend building a cluster, because it easily
becomes outdated. Facts are stubborn things.

Free, The All Powerful

unread,
Mar 29, 2011, 6:55:13 PM3/29/11
to Pyrit
Would taking the GPUs out of the PCs and piling them into a single
node be better? If you had a room of 20 PCs with 5870s vs. a room with
10 PCs with 2x 5870s, which do you think would win? Assuming, of
course, you disable the additional CPUs with limit_ncpus to make it
fair. :P

I would be very interested to test this as well. My campus has a load
of iMacs with ATI cards. Is there any way to install CAL++ on Mac OS,
or would I need to live-boot BackTrack?