TensorFlow. CPU-enabled instance on AWS. Step by step


Evgeny Shaliov

Nov 25, 2015, 7:20:59 AM
to Discuss



  1. Create an instance on AWS


  1. Create the AWS instance from ami-d93622b8 *


2. I chose c3.2xlarge with 16 GB of SSD storage.



3. Configure the security group and generate (or reuse) a key pair for access to the instance.


* When I tried to install TensorFlow on the “Amazon ECS-Optimized Amazon Linux AMI”, I got the error “tensorflow-0.5.0-cp27-none-linux_x86_64.whl is not a supported wheel on this platform”. I also tried the “CentOS 7 (x86_64) with Updates HVM” and “amzn-ami-hvm-2013.09.2.x86_64-s3” AMIs, but those failed too.
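
The error in the note above is what pip reports when a wheel's compatibility tags do not match the running interpreter or platform. As a rough illustration (not pip's actual matching logic, which compares against a full list of supported tags), the sketch below splits the wheel filename into its tags and prints the CPython-style tag of the interpreter running it. The helper names `parse_wheel_filename` and `interpreter_tag` are hypothetical, and the tag format assumes CPython.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    # Layout: name-version[-build]-pythontag-abitag-platformtag.whl
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %s" % filename)
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def interpreter_tag(version_info=None):
    """Return a CPython-style tag such as 'cp27' (assumes CPython)."""
    if version_info is None:
        version_info = sys.version_info
    return "cp%d%d" % (version_info[0], version_info[1])


wheel = parse_wheel_filename("tensorflow-0.5.0-cp27-none-linux_x86_64.whl")
print("wheel tags: %s %s %s" % (wheel["python"], wheel["abi"], wheel["platform"]))
print("interpreter tag: %s" % interpreter_tag())
if wheel["python"] != interpreter_tag():
    print("Python tag mismatch: pip would likely reject this wheel here.")
```

If the interpreter tag printed on the instance is not cp27, or the platform is not 64-bit Linux, that would be one plausible explanation for the rejection; an outdated pip is another possibility this sketch does not check.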

  2. Configure environment


1. Log in to the remote instance using SSH (default username: ec2-user).


2. Check python and pip availability.


python --version

pip --version


3. Install the gcc compiler


sudo yum install gcc

4. Install TensorFlow


sudo pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl


5. Check that the environment is configured correctly. Open a Python terminal:


$ python

>>> import tensorflow as tf

>>> hello = tf.constant('Hello, TensorFlow!')

>>> sess = tf.Session()

>>> print sess.run(hello)

Hello, TensorFlow!

>>> a = tf.constant(10)

>>> b = tf.constant(32)

>>> print sess.run(a+b)

42

>>>


  3. Run a sample


1. Install Git


sudo yum install git -y


2. Clone the project


git clone --recurse-submodules https://github.com/tensorflow/tensorflow


3. Run the TensorFlow neural net model


python tensorflow/tensorflow/models/image/mnist/convolutional.py


Initialized!

Epoch 0.00

Minibatch loss: 12.054, learning rate: 0.010000

Minibatch error: 90.6%

Validation error: 84.6%

Epoch 0.12

Minibatch loss: 3.285, learning rate: 0.010000

Minibatch error: 6.2%

Validation error: 7.0%

Epoch 0.23

Minibatch loss: 3.473, learning rate: 0.010000

Minibatch error: 10.9%

Validation error: 3.7%

Epoch 0.35

Minibatch loss: 3.221, learning rate: 0.010000

Minibatch error: 4.7%

Validation error: 3.2%

Epoch 0.47

Minibatch loss: 3.193, learning rate: 0.010000

Minibatch error: 4.7%

Validation error: 2.7%


….


  4. Run TensorBoard


1. Check that /usr/local/bin is in PATH


echo $PATH


2. If /usr/local/bin is not in PATH, add the following line to .bashrc


PATH=$PATH:$HOME/bin:/usr/local/bin


3. Update the AWS security group: add a new inbound rule for TCP port 6006.



4. Run TensorBoard on the server:


tensorboard --logdir /var/log


5. Open TensorBoard in a browser, using port 6006 on the instance's public IP or public DNS name.
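
Steps 1 and 2 above check PATH by eye. A minimal Python sketch of the same check follows; the function name `path_contains` is hypothetical, and it only compares directory names textually (it does not verify that the `tensorboard` script actually exists there).

```python
import os


def path_contains(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normpath(directory)
    # Skip empty entries, which appear when PATH has a doubled separator.
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


if path_contains("/usr/local/bin"):
    print("/usr/local/bin is in PATH")
else:
    print("/usr/local/bin is missing; add it in .bashrc as in step 2")
```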



mebe...@nvidia.com

Feb 10, 2016, 5:28:29 PM
to Discuss
I got TensorBoard running on a GPU instance in AWS, but when I go to the Graph tab, I get the following JS error: `Uncaught TypeError: Polymer.dom(...).unobserveNodes is not a function`. The Events tab seems to work fine. Has anyone else seen this?

nsd...@gmail.com

Oct 20, 2017, 2:36:04 AM
to Discuss
In step 4 of "Run TensorBoard", the EC2 instance tells me to navigate to a specific URL (http://172.31.30.35:6006/). However, I am not able to access that URL even after changing the inbound security settings of the EC2 instance. Help please!

Toby Boyd

Oct 20, 2017, 12:09:28 PM
to nsd...@gmail.com, Discuss
This is likely a better question for Stack Overflow. I might be making a bad guess, but when you start TensorBoard it is going to start on the private IP of the AWS instance, as the instance has zero (or near zero) knowledge of its public IP or public DNS, which EC2 also provides. If you go to the EC2 web interface you will see the public IP for the machine. Make sure port 6006 is open, which it looks like you did, and then navigate to it using the public IP or public DNS.

Good luck
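
To illustrate the private versus public address point, here is a small sketch that can be run on any machine with Python 3 (it uses the standard `ipaddress` module, which the Python 2.7 setup in this thread does not ship). The function names are hypothetical, and `<public-ip>` is a placeholder for whatever the EC2 console shows, not a real address.

```python
import ipaddress


def is_private_address(address):
    """Return True if the address is in a private (non-public) range."""
    return ipaddress.ip_address(address).is_private


def tensorboard_url(host, port=6006):
    """Build the browser URL for a TensorBoard server on the given host."""
    return "http://%s:%d/" % (host, port)


reported = "172.31.30.35"  # address printed by TensorBoard in the question above
if is_private_address(reported):
    print("%s is a private address; it is not reachable from outside AWS" % reported)
    print("Try %s instead" % tensorboard_url("<public-ip>"))
else:
    print("Try %s" % tensorboard_url(reported))
```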


Toby Boyd

Oct 20, 2017, 12:09:58 PM
to nsd...@gmail.com, Discuss
Ah, oh my, TensorFlow 0.5.0 might pre-date my time on the team.