"Cluster GPU Instances"

Showing 1-16 of 16 messages
"Cluster GPU Instances" Utunga 11/15/10 7:06 PM

How about NVIDIA M2050 GPUs?

News from today makes that a more interesting GPU to support, but I think it already would be fine, yeah?

http://aws.amazon.com/about-aws/whats-new/2010/11/15/announcing-cluster-gpu-instances-for-amazon-ec2/

As of today, Amazon EC2 is providing what they call "Cluster GPU Instances":  An instance in the Amazon cloud that provides you with the power of two NVIDIA Tesla “Fermi” M2050 GPUs. The exact specifications look like this:

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge


--
Miles Thompson

Re: [theano-users] "Cluster GPU Instances" Josh Bleecher Snyder 11/16/10 5:32 AM
> How about NVidia M2050 GPUs?
>
> News from today makes that a more interesting GPU to support, but I think it
> already would be fine yeah?

In theory, it should already work.


> http://aws.amazon.com/about-aws/whats-new/2010/11/15/announcing-cluster-gpu-instances-for-amazon-ec2/
>
> As of today, Amazon EC2 is providing what they call "Cluster GPU Instances":

For some folks, it's possible those might make sense. However, at
$2.10 an hour -- working out to nearly $20k per year -- it is
extraordinarily expensive.
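
For the curious, a quick sketch of the arithmetic behind that ballpark (the $2.10/hour figure is the on-demand price mentioned above):

```python
hourly = 2.10                # cg1.4xlarge on-demand price, USD/hour
yearly = hourly * 24 * 365   # one instance running around the clock
print(round(yearly))         # about $18.4k/year, i.e. "nearly $20k"
```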


I wanted to try a benchmark on the machine. Alas, right now, the only
AMI available is running CentOS (with which I am unfamiliar), and
after mucking with it for an hour, I gave up on trying to get all the
theano dependencies installed, so I have no numbers to share yet. I'll
try again later when there are more OS options.

It would be great to build a public theano AMI, so that someone
curious about theano could boot it up and have a functioning machine
to play with immediately. (It also makes for great installation
instructions: boot machine, use.) Again, if more OS options show up at
some point in the future, I'll put one together.

In the meantime, I'd be curious to hear performance numbers if someone
else gets it up and running.


-josh

Re: [theano-users] "Cluster GPU Instances" Nicolas Pinto 11/16/10 5:46 AM
Josh,

> I wanted to try a benchmark on the machine. Alas, right now, the only
> AMI available is running CentOS (with which I am unfamiliar), and
> after mucking with it for an hour, I gave up on trying to get all the
> theano dependencies installed, so I have no numbers to share yet. I'll
> try again later when there are more OS options.

Other AMIs are available, some with Ubuntu and atlas-enabled numpy,
scipy, etc. For example: starcluster [1] provides all that, in
addition to automatically configuring MPI, SGE, having load-balancing
support to add/remove nodes to the queue, and plans to include
PyCUDA+PyOpenCL [2], ipython's parallel processing, hadoop, etc.

> It would be great to build a public theano AMI, so that someone
> curious about theano could boot it up and have a functioning machine
> to play with immediately. (It also makes for great installation
> instructions: boot machine, use.) Again, if more OS options show up at
> some point in the future, I'll put one together.

This would be awesome! Maybe forking a specific starcluster AMI would
make sense, and we could ask Justin Riley to include it by default
or as a plug-in [3,4].

HTH

Best,

N

[1] https://github.com/jtriley/StarCluster

[2] http://aws.typepad.com/aws/2010/11/new-ec2-instance-type-the-cluster-gpu-instance.html?cid=6a00d8341c534853ef013488ff64a9970c#comment-6a00d8341c534853ef013488ff64a9970c

[3] http://web.mit.edu/stardev/cluster/docs/plugins.html

[4] https://github.com/jtriley/StarClusterPlugins

Re: [theano-users] "Cluster GPU Instances" Josh Bleecher Snyder 11/16/10 9:46 AM
>> I wanted to try a benchmark on the machine. Alas, right now, the only
>> AMI available is running CentOS (with which I am unfamiliar), and
>> after mucking with it for an hour, I gave up on trying to get all the
>> theano dependencies installed, so I have no numbers to share yet. I'll
>> try again later when there are more OS options.
>
> Other AMIs are available, some with Ubuntu and atlas-enabled numpy,
> scipy, etc. For example: starcluster [1] provides all that, in
> addition to automatically configuring MPI, SGE, having load-balancing
> support to add/remove nodes to the queue, and plans to include
> PyCUDA+PyOpenCL [2], ipython's parallel processing, hadoop, etc.

As of right now, starcluster's amis don't appear to allow you to
launch a gpu instance (I tried ami-0af31963, which is their ami for
ubuntu 10.04). I believe it takes extra work to make an ami compatible
with a new instance type, and it just hasn't been done yet...and I'm
not the right person to take it on. :)


>> It would be great to build a public theano AMI, so that someone
>> curious about theano could boot it up and have a functioning machine
>> to play with immediately. (It also makes for great installation
>> instructions: boot machine, use.) Again, if more OS options show up at
>> some point in the future, I'll put one together.
>
> This would be awesome! Maybe forking a specific starcluster AMI would
> make sense or and we could ask Justin Riley to include it by default
> or as a plug-in [3,4].

Starcluster looks like an excellent base ami to build a theano ami on
top of. And from a very cursory inspection, theano looks like it would
fit excellently as a starcluster plugin. Thanks for pointing these
out.

I'll plan on checking back on starcluster occasionally to find out
when their amis are gpu-ready; created
https://github.com/jtriley/StarCluster/issues#issue/9 to request it.

-josh

Re: "Cluster GPU Instances" jtriley 12/20/10 6:57 PM
StarCluster now has a GPU/Cluster Compute release candidate AMI
available for testing:

ami-12b6477b

This AMI contains the following GPU software in addition to the usual
StarCluster stack:

    * NVIDIA Driver 260.19.21
    * NVIDIA Cuda Toolkit 3.2 (cublas, cufft, curand)
    * PyCuda and PyOpenCL (recent git checkouts)
    * MAGMA 1.0-rc2

This AMI is currently not compatible with StarCluster 0.91.2; however,
if you just want to play around with the new GPU instances you're
probably better off launching a single instance from the AWS
management console. If you need a GPU cluster, the latest GitHub code
does work with this new AMI and both instance types (cg1.4xlarge and
cc1.4xlarge), if you're interested in testing.

A few notes:

   1. CUDA is installed in /usr/local/cuda
   2. MAGMA library is installed in /usr/local/magma
   3. Custom python2.6 installation in /usr/lib64/python2.6/site-packages
   4. NumPy/SciPy/PyCuda/OpenCL/etc are installed in the custom python2.6 installation
   5. All software sources used are in /usr/local/src (look here for PyCuda/PyOpenCL/MAGMA examples, etc)

Let me know if you have issues...

~Justin

Re: [theano-users] Re: "Cluster GPU Instances" Josh Bleecher Snyder 12/21/10 3:47 PM
> StarCluster now has a GPU/Cluster Compute release candidate AMI
> available for testing:
>
> ami-12b6477b

This is awesome. It is now incredibly easy to get theano up and
running atop this AMI. Here's all it took:

pip install Theano
git clone https://github.com/lisa-lab/DeepLearningTutorials.git
cd DeepLearningTutorials/data/
./download.sh
cd ../code/
THEANO_FLAGS=device=gpu0,floatX=float32 python2.6 logistic_sgd.py
THEANO_FLAGS=device=gpu1,floatX=float32 python2.6 convolutional_mlp.py


The performance, however, was less awesome. It definitely used the GPU
(a Tesla M2050), but not to great effect.

logistic_sgd.py took 113.7s to complete, compared with 6.8s on my
machine (GTX480/i7). convolutional_mlp.py took 61.54m to complete,
compared with 31.44m on my machine.


As for why it is slower, here is the top opwise output from
ProfileMode for logistic_sgd.py on the AWS instance:

   14.5%   14.5%  14.303s  14.303s  8.73e-04s   16384  6 HostFromGpu
   12.9%   27.5%  12.719s  27.022s  6.81e-04s   18672  3 GpuFromHost
   12.0%   39.5%  11.821s  38.843s  1.90e-03s *  6224  1 GpuCrossentropySoftmaxArgmax1HotWithBias
    9.3%   48.8%  9.175s  48.018s  1.12e-03s *  8192  3 GpuDot22
    5.9%   54.7%  5.825s  53.843s  9.36e-04s *  6224  1 GpuAlloc
    5.7%   60.4%  5.629s  59.472s  9.04e-04s *  6224  1 GpuCrossentropySoftmax1HotWithBiasDx

    Spent 0.472s(0.480%) in cpu Op, 70.936s(72.068%) in gpu Op and 27.022s(27.453%) transfert Op


And on my machine:

   29.1%   29.1%  1.851s  1.851s  2.26e-04s *  8192  3 GpuDot22
   20.1%   49.2%  1.276s  3.127s  2.05e-04s *  6224  1 GpuGemm{inplace}
    8.6%   57.7%  0.544s  3.670s  8.74e-05s *  6224  1 GpuCrossentropySoftmaxArgmax1HotWithBias
    6.1%   63.9%  0.389s  4.059s  2.37e-05s   16384  6 GpuSubtensor{int64:int64:}
    6.0%   69.8%  0.379s  4.439s  2.32e-05s   16384  6 HostFromGpu

    Spent 0.302s(4.751%) in cpu Op, 5.326s(83.783%) in gpu Op and 0.729s(11.466%) transfert Op

So it looks like GPU bandwidth is much slower, and individual GPU
operations are a little slower. Nothing obviously broken/fixable that
I see; it just looks plain slower.

-josh

Re: [theano-users] Re: "Cluster GPU Instances" nouiz 12/22/10 9:28 AM
Hi,

Which mode are the GPUs on that machine in? Is ECC memory enabled?
Are they in exclusive mode, to be sure nothing else runs on them?

Everything there is slower!

HostFromGpu goes from 0.379s to 14.303s (a 37x slowdown!)
GpuDot22 goes from 1.851s to 9.175s (a 5x slowdown!)
GpuCrossentropySoftmaxArgmax1HotWithBias goes from 0.544s to 11.821s (a 22x slowdown!)

Such slowdowns mean it is worthless to use them with theano if we don't
fix this problem.

Are you sure your code on the GTX480 is clean? The profiles you show
don't show the same nodes.

Fred
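
For anyone skimming, Fred's factors can be recomputed directly from the two ProfileMode dumps quoted above; a quick sketch (per-op totals in seconds, copied from the profiles):

```python
# Per-op times (seconds) from the two ProfileMode outputs quoted earlier:
# EC2 cg1.4xlarge (Tesla M2050) vs. Josh's local GTX480/i7 box.
ec2 = {
    "HostFromGpu": 14.303,
    "GpuDot22": 9.175,
    "GpuCrossentropySoftmaxArgmax1HotWithBias": 11.821,
}
gtx480 = {
    "HostFromGpu": 0.379,
    "GpuDot22": 1.851,
    "GpuCrossentropySoftmaxArgmax1HotWithBias": 0.544,
}
for op in ec2:
    factor = ec2[op] / gtx480[op]
    print(f"{op}: {factor:.1f}x slower on EC2")
# HostFromGpu comes out at ~37.7x, GpuDot22 at ~5.0x, and the
# crossentropy op at ~21.7x -- matching the factors cited above.
```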

Re: [theano-users] Re: "Cluster GPU Instances" Josh Bleecher Snyder 12/22/10 9:46 AM
> Which mode are the GPUs on that machine in? Is ECC memory enabled?
> Are they in exclusive mode, to be sure nothing else runs on them?

I used the cluster instance as I found it; not sure what the default
config is. If there was something else running on those GPUs, it
wasn't me that started it. I guess it is possible that Amazon is
sharing one physical GPU across multiple instances.


> HostFromGpu goes from 0.379s to 14.303s (a 37x slowdown!)

I'd guess this has to do with their virtualization layer.


> GpuDot22 goes from 1.851s to 9.175s (a 5x slowdown!)
> GpuCrossentropySoftmaxArgmax1HotWithBias goes from 0.544s to 11.821s (a 22x slowdown!)
>
> Such slowdowns mean it is worthless to use them with theano if we don't
> fix this problem.

Or if they don't, depending on the root cause. :) Unfortunately, at
>$2 per hour, it strikes me as possibly a bit too expensive to leave a
machine running just to try to optimize code for it.


> Are you sure your code on the GTX480 is clean? The profiles you show
> don't show the same nodes.

I'm pretty sure it was clean. One difference is that the cluster
instance was running the 0.3.0 release, whereas I run theano tip on my
machine.


-josh

Re: [theano-users] Re: "Cluster GPU Instances" James 12/22/10 10:06 AM
Have you done any benchmarking of other GPU code on the AMI? If other
libs are running at full speed then it might be something about what
Theano is doing.

--
http://www-etud.iro.umontreal.ca/~bergstrj
Re: [theano-users] Re: "Cluster GPU Instances" Josh Bleecher Snyder 12/22/10 10:19 AM
> Have you done any benchmarking of other gpu code on the AMI ?  If other libs
> are running at full speed then it might be something about what Theano is
> doing.

I didn't; do you have suggestions for other good benchmarks to run?
All my other GPU code is too entangled with theano to make it
worthwhile...

-josh

Re: [theano-users] Re: "Cluster GPU Instances" James 12/22/10 10:40 AM
No I don't have anything else on hand.

I guess that's the beauty of theano - usually you don't have to worry
about what you'd do when it's not there to help :)

--
http://www-etud.iro.umontreal.ca/~bergstrj
Re: [theano-users] Re: "Cluster GPU Instances" nouiz 12/22/10 10:41 AM
Could we benchmark with PyCUDA? Did you try it? Or maybe NVIDIA's own
SDK samples, like the one that checks memory bandwidth?

Fred
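
For what it's worth, a minimal host-to-device bandwidth check along those lines might look like this. This is only a sketch: the `gbps` helper is mine, the PyCUDA calls require a working CUDA setup, and it has not been run on the cg1.4xlarge AMI:

```python
import time


def gbps(nbytes, seconds):
    """Effective bandwidth in GB/s for nbytes transferred in `seconds`."""
    return nbytes / seconds / 1e9


def measure_h2d(nbytes=64 * 1024 * 1024, reps=10):
    """Time host->device copies with PyCUDA (requires a CUDA GPU)."""
    import numpy as np
    import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
    import pycuda.driver as cuda

    host = cuda.pagelocked_empty(nbytes, dtype=np.uint8)  # pinned memory
    dev = cuda.mem_alloc(nbytes)
    start = time.time()
    for _ in range(reps):
        cuda.memcpy_htod(dev, host)
    cuda.Context.synchronize()
    return gbps(nbytes * reps, time.time() - start)


if __name__ == "__main__":
    try:
        print(f"host->device: {measure_h2d():.2f} GB/s")
    except Exception:
        # No PyCUDA / no GPU available on this machine.
        print("PyCUDA or a CUDA GPU is not available; skipping measurement")
```

Comparing the number it prints against NVIDIA's bandwidthTest SDK sample run natively would tell us whether the virtualization layer is eating transfer bandwidth.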

Re: "Cluster GPU Instances" Sam Goldman 1/9/12 11:33 AM
Reviving an old thread here...

I would be willing to spend some of my time running benchmarks on a
Cluster GPU instance, if it would be of any help. Maybe there is some
configuration that helps Theano perform on these machines that we can
figure out and document.

Since the billing gets rounded up to the closest hour, it would be
efficient if we made a list of benchmarks/tests to try before
provisioning.

From this thread:
1. NVIDIA CUDA SDK bandwidth test program
2. PyCUDA benchmark programs
3. logistic_sgd.py from DeepLearningTutorials
4. convolutional_mlp.py from DeepLearningTutorials

Would anyone be interested in further benchmarks or configurations?

Sam


Re: [theano-users] Re: "Cluster GPU Instances" James 1/9/12 12:13 PM
Why do you bring this up again now? This thread was related to diagnosing a problem which I think has disappeared. I think EC2 support for GPUs has been improved within the last few months, so that GPU performance is similar to what you'd get natively.

- James

2012/1/9 Sam Goldman <samwg...@gmail.com>

Re: [theano-users] Re: "Cluster GPU Instances" Sam Goldman 1/9/12 12:22 PM
I ran the logistic_sgd.py program on a Cluster GPU instance yesterday
and I got performance worse than Josh Bleecher Snyder's "6.8s on [his]
machine (GTX480/i7)". I don't remember exactly, but it was around
60-70 seconds. Better than his earlier result of 113.7s on the EC2
unit, but might warrant further investigation.

Unfortunately, I don't currently have a GPU card to test against,
which is why I am doing experiments using these machines.

If the expectation is that EC2 Cluster GPU instances should be as fast
as native hardware, then maybe I should just run the benchmarks listed
and report back.

2012/1/9 James Bergstra <james.b...@gmail.com>:

Re: [theano-users] Re: "Cluster GPU Instances" James 1/9/12 12:35 PM
Interesting - the program that gave me hope was bound by GPU convolutions rather than anything else, such as host-device transfers.  I should back off and say that *some* improvement was evident, but it's still certainly worth benchmarking various aspects of the EC2 GPU platform.

- James