Can't run master on Ubuntu - cudnn-6_0 runs but can't get MNIST accuracy above 78%


Bobby Harris

Nov 24, 2017, 11:39:19 AM
to clojure-cortex

Hello all.

I'm trying to start playing with Cortex and running into some issues. I'm running Java 9 (Java 8 results were similar) on Ubuntu 17.10.

I couldn't get it running by following the instructions on the Cortex GitHub page, mostly, I believe, because Ubuntu's nvidia-cuda-toolkit package is only version 7.5.

I followed the TensorFlow install instructions, and TensorFlow runs fine on the GPU.

Now, "lein test" on master gives:

--  lein test :only cortex.compute.cuda-driver-test/indexed-copy-f
--  
--  ERROR in (indexed-copy-f) (Compiler.java:6748)
--  Uncaught exception, not in assertion.
--  expected: nil
--    actual: clojure.lang.Compiler$CompilerException: java.lang.UnsatisfiedLinkError: no jnicudnn in java.library.path, compiling:(cortex/compute/cuda/driver.clj:234:5)
--  

"lein test" runs fine on cudnn-6_0 without changes and "lein run" will work on the mnist example if I change project.clj to have:
 
  [org.bytedeco.javacpp-presets/cuda-platform "8.0-6.0-1.3"]

  (which I got from the base project.clj; I also had to add --add-modules java.xml.bind for Java 9 -- a sketch of the relevant project.clj bits is below)
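
In case it helps, this is roughly what the relevant parts of my project.clj ended up looking like. Treat it as a sketch: the defproject name and the Clojure version below are placeholders, the example's existing cortex/experiment dependencies are omitted, and only the cuda-platform line and :jvm-opts are the actual changes described above.

  ;; sketch of examples/mnist-classification/project.clj (relevant bits only)
  (defproject mnist-classification "0.1.0-SNAPSHOT"
    :dependencies [[org.clojure/clojure "1.8.0"]
                   ;; taken from the base project.clj (CUDA 8.0 / cuDNN 6.0 presets):
                   [org.bytedeco.javacpp-presets/cuda-platform "8.0-6.0-1.3"]]
    ;; needed on Java 9 so the javax.xml.bind classes resolve:
    :jvm-opts ["--add-modules" "java.xml.bind"])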

However, accuracy never gets above 78%.

If I make the same change to the base project.clj in master and run "lein test" I get:

-- lein test :only cortex.compute.cuda.tensor-test/convolution-operator-f
--
-- ERROR in (convolution-operator-f) (tensor_math.clj:1093)
-- Uncaught exception, not in assertion.
-- expected: nil
--   actual: java.lang.Exception: Cudnn error: CUDNN_STATUS_BAD_PARAM
--

Thanks in advance if you have the time!






Bobby Harris

Nov 25, 2017, 12:56:03 AM
to clojure-cortex
Okay, I didn't realize that I needed to merge the cudnn-6_0 branch into master. That resolves all the issues in master. However, my MNIST accuracy still oscillates around 0.77 using my locally built snapshot.

I checked to see if this is the same issue reported by Carin Meier; it doesn't seem to be. I checked out 260c00de7f291d726b5f7c9e8225db149306766c ("Xor example (#195)"), merged cudnn-6_0, and built local snapshots, but I still get the same oscillating results.
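
For reference, the steps were roughly the following (run in a shell from the cortex checkout; exactly where lein install needs to be run is a guess on my part -- I ran it wherever a project.clj produces a jar the example depends on):

  git checkout master    # or: git checkout 260c00de7f291d726b5f7c9e8225db149306766c
  git merge cudnn-6_0
  lein install           # build and install the local snapshot jars into ~/.m2

and then "lein run" again in the mnist example against the locally built snapshot version.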

Thanks in advance, again.

Bobby Harris

Nov 26, 2017, 11:06:43 AM
to clojure-cortex
Okay, learning in public here. :) Removing the dropout layer got me to 97%.

Carin Meier

Nov 27, 2017, 8:24:52 AM
to clojure-cortex
I was noticing this the other day. 

Right now the network description in the example is:

[(layers/input input-w input-h 1 :id :data)
   (layers/convolutional 5 0 1 20)
   (layers/max-pooling 2 0 2)
   (layers/dropout 0.9)
   (layers/relu)
   (layers/convolutional 5 0 1 50)
   (layers/max-pooling 2 0 2)
   (layers/batch-normalization)
   (layers/linear 1000)
   (layers/relu :center-loss {:label-indexes {:stream :labels}
                              :label-inverse-counts {:stream :labels}
                              :labels {:stream :labels}
                              :alpha 0.9
                              :lambda 1e-4})
   (layers/dropout 0.5)
   (layers/linear num-classes)
   (layers/softmax :id :labels)]

It seems like we should get rid of the initial (layers/dropout 0.9) in the example? Is there any reason I'm missing that it should be in there?
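
Concretely, removing just that one line would leave the description as (everything else unchanged; the later (layers/dropout 0.5) stays):

[(layers/input input-w input-h 1 :id :data)
   (layers/convolutional 5 0 1 20)
   (layers/max-pooling 2 0 2)
   (layers/relu)
   (layers/convolutional 5 0 1 50)
   (layers/max-pooling 2 0 2)
   (layers/batch-normalization)
   (layers/linear 1000)
   (layers/relu :center-loss {:label-indexes {:stream :labels}
                              :label-inverse-counts {:stream :labels}
                              :labels {:stream :labels}
                              :alpha 0.9
                              :lambda 1e-4})
   (layers/dropout 0.5)
   (layers/linear num-classes)
   (layers/softmax :id :labels)]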

- Carin

Harold

Nov 27, 2017, 11:55:05 AM
to clojure-cortex
Carin, Bobby,

Thanks for looking into this.

If removing the dropout layer gives an accuracy bump here, then removing it might make us look a little smarter to people carefully investigating the test output.

Right now, (I believe) this is the only test that actually trains through dropout. So, if we do remove it, we should perhaps add another separate test that is part of the cortex verification suite so that we don't lose any test coverage. Bonus points if the new test actually benefits in generalization accuracy by employing dropout.

I'm not personally inclined to make any changes in this area since the model accuracy in this unit test doesn't matter as much to me as actually training through the dropout code in CI with each build. PR definitely welcome as always, though.

hth,
-Harold

Carin Meier

Nov 27, 2017, 5:40:07 PM
to clojure-cortex
Thanks for the response Harold.

I wasn't proposing changing the network in any of the tests, just in "examples/mnist-classification/src/mnist_classification/core.clj".
The reason is that people new to the project who are trying it out would get better accuracy when they run the examples. At least on my computer it looks better with results at 98% rather than in the 70s. :)

I can certainly put a PR together for it with the differences that I see before and after.

Best,
Carin

Carin Meier

Nov 27, 2017, 6:01:06 PM
to clojure-cortex
Here is the PR with my results from my local computer https://github.com/thinktopic/cortex/pull/249 :)

Bobby Harris

Nov 27, 2017, 8:48:53 PM
to clojure-cortex
Thanks Carin and Harold for looking at this.

PS: Carin, you stole my first chance to commit! :)



Carin Meier

Nov 28, 2017, 9:09:58 AM
to clojure-cortex
Oh no! Bobby, I'll definitely be happy to help you get involved in contributing. Feel free to reach out to me on #cortex in the Clojurians Slack; I'm happy to chat about issues, lend a helping hand, or collaborate. :)

- Carin