I managed to get Caffe running on a GPU instance. I'm running it under Docker because it makes it easy to spin up instances without having to wrangle with installing all of the dependencies (they're baked into the Docker image, along with a pre-compiled Caffe).
Here is a blog post that describes how to get up and running:
Running Caffe on AWS GPU Instance via Docker
and it has links to the Docker image and Dockerfile.
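For anyone who hasn't tried this, the general shape of launching such a container looks something like the sketch below. The image name here is a placeholder (use the one linked from the post), and the explicit --device flags reflect that this predates nvidia-docker, so the GPU device nodes have to be passed through manually:

```shell
# Placeholder image name -- substitute the image linked from the blog post.
# GPU device nodes are passed through explicitly, since this setup
# predates nvidia-docker and the modern `docker run --gpus` flag.
sudo docker run -ti \
    --device /dev/nvidia0 \
    --device /dev/nvidiactl \
    --device /dev/nvidia-uvm \
    caffe-gpu-image /bin/bash
```

The exact device nodes present can vary by driver version; `ls /dev/nvidia*` on the host shows what needs to be mapped in.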
Using a g2.2xlarge AWS instance (without cuDNN), here's how long it took to train the MNIST LeNet example:
real 3m51.072s
user 2m57.399s
sys 0m58.011s
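For reference, that timing comes from running the stock MNIST LeNet example shipped with Caffe, roughly like this (the /opt/caffe path is an assumption about where the image puts the source tree; adjust to wherever Caffe lives in your container):

```shell
# Inside the container, from the Caffe source root
# (/opt/caffe is an assumed path -- it depends on the image).
cd /opt/caffe

# Download the MNIST data and convert it to LMDB.
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh

# Train LeNet; `time` produces the real/user/sys figures above.
time ./examples/mnist/train_lenet.sh
```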
It's faster than on my laptop using the CPU, but I'd be curious to hear how it compares for people running it on their workstations in non-cuDNN GPU mode.