Something goes wrong when running make runtest

neuk...@gmail.com

Oct 9, 2014, 10:20:41 PM
to caffe...@googlegroups.com
My computer is a late-2013 MacBook Pro, so I installed Caffe in CPU-only mode.
But I got an error report after I finished the make all and make test steps.

[  FAILED  ] 3 tests, listed below:
[  FAILED  ] PowerLayerTest/0.TestPowerGradientShiftZero, where TypeParam = caffe::FloatCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradient, where TypeParam = caffe::DoubleCPU

 3 FAILED TESTS
make: *** [runtest] Error 1

And the errors in detail were:

#######################################################################################################################
[ RUN      ] PowerLayerTest/0.TestPowerGradientShiftZero
./include/caffe/test/test_gradient_check_util.hpp:166: Failure
The difference between computed_gradient and estimated_gradient is 0.16171693801879883, which exceeds threshold_ * scale, where
computed_gradient evaluates to 6.6543664932250977,
estimated_gradient evaluates to 6.8160834312438965, and
threshold_ * scale evaluates to 0.068160831928253174.
debug: (top_id, top_data_id, blob_id, feat_id)=0,65,0,65; feat = 0.027440188452601433; objective+ = 0.55363553762435913; objective- = 0.41731387376785278
[  FAILED  ] PowerLayerTest/0.TestPowerGradientShiftZero, where TypeParam = caffe::FloatCPU (3 ms)

#######################################################################################################################
[ RUN      ] PowerLayerTest/1.TestPowerGradientShiftZero
./include/caffe/test/test_gradient_check_util.hpp:166: Failure
The difference between computed_gradient and estimated_gradient is 0.66645549482483268, which exceeds threshold_ * scale, where
computed_gradient evaluates to 9.0545713301684909,
estimated_gradient evaluates to 9.7210268249933236, and
threshold_ * scale evaluates to 0.097210268249933243.
debug: (top_id, top_data_id, blob_id, feat_id)=0,66,0,66; feat = 0.016829367669263143; objective+ = 0.48941214282974049; objective- = 0.29499160632987403
./include/caffe/test/test_gradient_check_util.hpp:166: Failure
The difference between computed_gradient and estimated_gradient is 0.48462038369962279, which exceeds threshold_ * scale, where
computed_gradient evaluates to 8.4754232528829139,
estimated_gradient evaluates to 8.9600436365825367, and
threshold_ * scale evaluates to 0.089600436365825362.
debug: (top_id, top_data_id, blob_id, feat_id)=0,71,0,71; feat = 0.01869104873835549; objective+ = 0.50171265479916738; objective- = 0.32251178206751663
./include/caffe/test/test_gradient_check_util.hpp:166: Failure
The difference between computed_gradient and estimated_gradient is 0.24489777273781588, which exceeds threshold_ * scale, where
computed_gradient evaluates to 7.3061654292184715,
estimated_gradient evaluates to 7.5510632019562873, and
threshold_ * scale evaluates to 0.075510632019562873.
debug: (top_id, top_data_id, blob_id, feat_id)=0,99,0,99; feat = 0.023657563288965969; objective+ = 0.53224239845290788; objective- = 0.38122113441378214
[  FAILED  ] PowerLayerTest/1.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleCPU (4 ms)

#######################################################################################################################
[ RUN      ] PowerLayerTest/1.TestPowerGradient
./include/caffe/test/test_gradient_check_util.hpp:166: Failure
The difference between computed_gradient and estimated_gradient is 1.206900511134485, which exceeds threshold_ * scale, where
computed_gradient evaluates to 10.15816285551514,
estimated_gradient evaluates to 11.365063366649625, and
threshold_ * scale evaluates to 0.11365063366649626.
debug: (top_id, top_data_id, blob_id, feat_id)=0,57,0,57; feat = 2.9055876775560447; objective+ = 0.46979585546340097; objective- = 0.24249458813040844
[  FAILED  ] PowerLayerTest/1.TestPowerGradient, where TypeParam = caffe::DoubleCPU (3 ms)

As the log shows, every failure reports that the difference between computed_gradient and estimated_gradient exceeds the threshold, so something seems to be wrong with the algorithms. I have no idea whether something is wrong in my configuration.
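For readers who want to see what the check in the log is comparing, here is a minimal sketch that redoes the arithmetic for the first failure. The numbers are copied from the log above. The step size of 0.01, the relative threshold of 0.01, and the use of the larger gradient magnitude as the scale are assumptions inferred from those numbers, not values read from Caffe's source.

```python
# Values copied from the first failure in the log (FloatCPU case).
computed_gradient = 6.6543664932250977
estimated_gradient = 6.8160834312438965
objective_plus = 0.55363553762435913
objective_minus = 0.41731387376785278

# Assumptions inferred from the logged numbers, not taken from Caffe's code.
STEP = 0.01        # finite-difference step size
THRESHOLD = 0.01   # relative tolerance


def centered_difference(obj_plus, obj_minus, step):
    """Numerical gradient estimate from two perturbed objective values."""
    return (obj_plus - obj_minus) / (2.0 * step)


def allowed_difference(computed, estimated, threshold):
    """Tolerance scaled by the larger of the two gradient magnitudes."""
    scale = max(abs(computed), abs(estimated))
    return threshold * scale


reconstructed = centered_difference(objective_plus, objective_minus, STEP)
tolerance = allowed_difference(computed_gradient, estimated_gradient, THRESHOLD)
difference = abs(computed_gradient - estimated_gradient)
check_passes = difference <= tolerance

print("estimate rebuilt from objective+/-:", reconstructed)
print("difference:", difference, "tolerance:", tolerance)
print("check passes:", check_passes)
```

Under these assumptions the rebuilt estimate agrees with the logged estimated_gradient, and the test fails because the analytic and numerical gradients differ by more than about 1% of their magnitude.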

Any ideas? I hope to hear back soon.

Thanks, 
Neuk

Evan Shelhamer

Oct 10, 2014, 1:33:01 AM
to neuk...@gmail.com, caffe...@googlegroups.com
What OS and version? What is your BLAS set to? I have not run into this with OS X 10.9 or 10.8. It seems that, for whatever reason, these operations are numerically unstable in your installation.

Evan Shelhamer

--
You received this message because you are subscribed to the Google Groups "Caffe Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to caffe-users...@googlegroups.com.
To post to this group, send email to caffe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/caffe-users/71cf96af-ded9-4535-944c-359c70a4fd46%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

neuk...@gmail.com

Oct 10, 2014, 1:47:17 AM
to caffe...@googlegroups.com, neuk...@gmail.com, shel...@eecs.berkeley.edu
Thanks a lot. My OS is OS X 10.9.4, and BLAS is set to atlas. The installation page says that on OS X, BLAS is already installed as the Accelerate / vecLib Framework, so I simply skipped that step. Could that be the cause in my case?

Also, how can I get more information? This is frustrating because I have been stuck on it for a couple of days.

Thanks again.

On Friday, October 10, 2014 at 1:33:01 PM UTC+8, Evan Shelhamer wrote:

Sanjeev J

Oct 10, 2014, 2:00:01 AM
to caffe...@googlegroups.com, neuk...@gmail.com, shel...@eecs.berkeley.edu
I have the same issue. The same three tests fail for me. I am running on Mac OS X in CPU-only mode.
Please let me know what the fix is. I have been stuck on this for a while.

Thanks
Sanjeev

Sanjeev J

Oct 10, 2014, 2:05:01 AM
to caffe...@googlegroups.com, neuk...@gmail.com, shel...@eecs.berkeley.edu
I am running Mac OS X 10.9.5 in CPU-only mode, and this is the last bit of my log:

[  FAILED  ] 3 tests, listed below:
[  FAILED  ] PowerLayerTest/0.TestPowerGradientShiftZero, where TypeParam = caffe::FloatCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradient, where TypeParam = caffe::DoubleCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleCPU

 3 FAILED TESTS
make: *** [runtest] Error 1

Evan Shelhamer

Oct 10, 2014, 2:12:38 AM
to Sanjeev J, caffe...@googlegroups.com, neuk...@gmail.com
This seems to be a fault in Accelerate / vecLib or our integration of it on OS X. I have reproduced this with `BLAS := atlas`. The workaround for now is to install OpenBLAS or MKL and set the `BLAS` flag accordingly.

Evan Shelhamer

Sanjeev J

Oct 10, 2014, 3:28:25 AM
to caffe...@googlegroups.com, sanjeev.ja...@gmail.com, neuk...@gmail.com, shel...@eecs.berkeley.edu
Hi Evan Shelhamer, 

I tried switching to OpenBLAS, but the issue remains. The log is below. Please help. Out of curiosity, what do these errors mean? Are the gradients being computed incorrectly?

I am grateful for all your responses.

[  PASSED  ] 454 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] PowerLayerTest/0.TestPowerGradientShiftZero, where TypeParam = caffe::FloatCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradient, where TypeParam = caffe::DoubleCPU

Regards

Sanjeev

neuk...@gmail.com

Oct 10, 2014, 8:41:43 AM
to caffe...@googlegroups.com, sanjeev.ja...@gmail.com, shel...@eecs.berkeley.edu
Hi Evan Shelhamer,

I followed your advice and switched to OpenBLAS, but the errors still appear and the log is unchanged.

Also, I notice that all three failures are raised at ./include/caffe/test/test_gradient_check_util.hpp:166. Could that be a clue?

Please help, and thanks for your reply.

On Friday, October 10, 2014 at 2:12:38 PM UTC+8, Evan Shelhamer wrote:

Yangqing Jia

Oct 10, 2014, 12:57:22 PM
to neuk...@gmail.com, Jeff Donahue, caffe...@googlegroups.com, sanjeev.ja...@gmail.com, Evan Shelhamer
I was reading the PowerLayerTest and found the bottom blob initialization to be:

GaussianFiller<Dtype> filler(filler_param);

Is this the intended behavior? If we provide a non-integer power parameter, the output for a negative input may not be a real number (it will be complex). This will write EDOM to errno, but the program will continue to run, which makes it a little tricky.
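As a small illustration of the point about negative inputs, here is a sketch in Python rather than in Caffe's C++. The base -0.5 and the exponent 0.37 are hypothetical values chosen only for the demonstration. Python reports the domain error as an exception, whereas a C program that ignores errno simply keeps running with the result it was given.

```python
import math

base = -0.5    # hypothetical negative input, as a Gaussian draw can produce
power = 0.37   # hypothetical non-integer exponent

# Python's ** operator leaves the real numbers and returns a complex value.
complex_result = base ** power


def real_pow_or_none(x, p):
    """Real-valued power; returns None where the result is not a real number."""
    try:
        return math.pow(x, p)
    except ValueError:
        # math.pow raises ValueError ("math domain error") in this case.
        return None


print("complex result:", complex_result)
print("real result for negative base:", real_pow_or_none(base, power))
print("real result for positive base:", real_pow_or_none(-base, power))
```

An integer exponent such as 2.0 stays real for a negative base, so only the combination of a negative input and a non-integer power is affected.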


I am looping in Jeff who may have a clearer idea on the intended usage of power layers.

Yangqing

neuk...@gmail.com

Oct 11, 2014, 12:17:35 AM
to caffe...@googlegroups.com, neuk...@gmail.com, jeff.d...@gmail.com, sanjeev.ja...@gmail.com, shel...@eecs.berkeley.edu
Thanks, Yangqing.

Does that mean my installation itself is fine? I'm worried about whether my Caffe build will work correctly.

I'm also curious how you traced the failure back to its source.

Thanks again, and please let me know if there are any results.

On Saturday, October 11, 2014 at 12:57:22 AM UTC+8, Yangqing Jia wrote:

Sanjeev J

Oct 12, 2014, 3:31:13 PM
to caffe...@googlegroups.com, neuk...@gmail.com, jeff.d...@gmail.com, sanjeev.ja...@gmail.com, shel...@eecs.berkeley.edu
Please let me know if there is a solution for the three failing power tests. I am still trying to fix this issue.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] PowerLayerTest/0.TestPowerGradientShiftZero, where TypeParam = caffe::FloatCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradient, where TypeParam = caffe::DoubleCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleCPU

Evan Shelhamer

Oct 12, 2014, 3:39:08 PM
to neuk...@gmail.com, caffe...@googlegroups.com, Sanjeev J
Did you remember to set your BLAS flag and check that you are actually linked to OpenBLAS and not vecLib / Accelerate?
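One way to check the linkage, sketched under assumptions: on OS X, `otool -L` lists the dynamic libraries a binary is linked against, and the listing can be scanned for BLAS library names. The marker substrings below are guesses at typical library file names and are not taken from Caffe's Makefile.

```python
import subprocess

# Substrings to look for in the linker listing. These are assumptions about
# typical library file names, not values taken from Caffe's build files.
BLAS_MARKERS = {
    "Accelerate / vecLib": ("Accelerate.framework", "vecLib.framework"),
    "OpenBLAS": ("libopenblas",),
    "MKL": ("libmkl",),
    "ATLAS": ("libatlas",),
}


def classify_linkage(listing):
    """Return the sorted names of BLAS libraries mentioned in a listing."""
    found = set()
    for name, markers in BLAS_MARKERS.items():
        if any(marker in listing for marker in markers):
            found.add(name)
    return sorted(found)


def linked_blas(binary_path):
    """Run `otool -L` (OS X only) on a binary and classify its BLAS linkage."""
    result = subprocess.run(
        ["otool", "-L", binary_path],
        capture_output=True, text=True, check=True,
    )
    return classify_linkage(result.stdout)
```

Calling `linked_blas(...)` with the path to the test binary that make test produced (the exact path depends on your build) should then show whether Accelerate / vecLib still appears after switching the BLAS flag.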

Evan Shelhamer

On Fri, Oct 10, 2014 at 5:41 AM, <neuk...@gmail.com> wrote:

Sanjeev J

Oct 12, 2014, 3:50:00 PM
to caffe...@googlegroups.com, neuk...@gmail.com, sanjeev.ja...@gmail.com, shel...@eecs.berkeley.edu
Yes, I did. I switched to OpenBLAS and set the flag accordingly, but it did not fix the issue.

The original poster tried the same fix, and it did not work for him either.

Daniel Seita

Oct 13, 2014, 1:47:57 PM
to caffe...@googlegroups.com
I am also running into the same problem on OS X 10.9.5, with both BLAS := atlas and := open in the Makefile.config. I first ran into the problem with atlas, and then with open after I installed OpenBLAS, modified the Makefile.config, and did 'make clean', 'make all', 'make test', and 'make runtest'. From reading the Makefile, it looks like setting BLAS := open should be all I need to do to force it to link to OpenBLAS. And the  'make all' and 'make test' results proceeded normally (indicating that OpenBLAS seems to be okay -- I just installed it with 'brew install OpenBLAS').

Evan Shelhamer

Oct 13, 2014, 5:12:59 PM
to Daniel Seita, caffe...@googlegroups.com
Since this has been reported for both atlas and openblas, the fault must lie in the non-MKL `vs*` substitutes defined in mkl_alternate.hpp. I don't have time to investigate this now, but I can confirm that OS X + MKL does not have this issue, and I suggest that anyone encountering this problem either

(1) diagnose the numerical instability of using `Powx` in the vsPowx macro in mkl_alternate.hpp and submit a fix, or
(2) switch to MKL as a workaround.
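As a possible starting point for option (1), the sketch below compares the analytic gradient of an elementwise power with a centered finite-difference estimate at several inputs. It can help separate a problem in the power computation itself from error in the numerical estimate it is checked against. The exponent 0.37 and the step 0.01 are hypothetical values chosen for illustration, and the functions are plain Python stand-ins, not Caffe's code.

```python
def power_forward(x, power):
    """Elementwise power on a scalar, standing in for a Powx-style call."""
    return x ** power


def analytic_gradient(x, power):
    """d/dx of x**power."""
    return power * x ** (power - 1.0)


def centered_difference(x, step, power):
    """Finite-difference estimate of the same derivative."""
    plus = power_forward(x + step, power)
    minus = power_forward(x - step, power)
    return (plus - minus) / (2.0 * step)


def relative_error(x, step, power):
    """Gap between the two gradients, relative to the larger magnitude."""
    exact = analytic_gradient(x, power)
    estimate = centered_difference(x, step, power)
    return abs(exact - estimate) / max(abs(exact), abs(estimate))


POWER = 0.37  # hypothetical non-integer exponent
STEP = 0.01   # hypothetical finite-difference step

for x in (0.02, 0.1, 1.0):
    print("x =", x, "relative error =", relative_error(x, STEP, POWER))
```

With these values the relative error grows as the input approaches zero, because the step is no longer small compared with the input. If the failing features in a log sit close to zero, the estimate may be the less accurate side of the comparison, which is worth ruling out before changing the macro.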

Evan Shelhamer

P Raz

Oct 18, 2014, 11:23:12 PM
to caffe...@googlegroups.com, takesh...@gmail.com, shel...@eecs.berkeley.edu
I tried both atlas and openblas on mac OS X 10.10 (CPU mode) and everything passed except the 3 PowerLayerTests. However, when I downgraded my boost 1.56 to 1.55, all the tests passed and everything seems to be working. I'm new to Caffe so I'm not quite sure about the inner workings of it yet, but it seems quite odd to me that boost 1.56 is causing numerical instability.

I suggest that others try downgrading to boost 1.55 to see what happens:

cd /usr/local
git checkout a252214 /usr/local/Library/Formula/boost.rb
brew edit boost-python

 and change the file so that it uses boost 1.55, basically:
 class BoostPython < Formula
   homepage "http://www.boost.org"
-  sha1 "f94bb008900ed5ba1994a1072140590784b9b5df"
+  sha1 'cef9a0cc7084b1d639e06cd3bc34e4251524c840'
+  revision 2

Then run make clean and make all for Caffe, and hopefully all the tests will pass.

Michele Pratusevich

Nov 13, 2014, 4:50:01 PM
to caffe...@googlegroups.com, takesh...@gmail.com, shel...@eecs.berkeley.edu
I am having similar problems with the exact same three errors. OSX 10.9, no GPU.

Using openBLAS and downgrading / recompiling both boost and boost-python with the libstdc++ compiler from scratch did not resolve the issues. Does anyone have any suggestions for how to get this working?

Michele Pratusevich

Nov 14, 2014, 3:02:33 PM
to caffe...@googlegroups.com, takesh...@gmail.com, shel...@eecs.berkeley.edu
Confirmed that using boost 1.55 and MKL solves the issue. I do not get the errors any more.

Steph van Schalkwyk

Jan 23, 2015, 8:06:35 PM
to caffe...@googlegroups.com, takesh...@gmail.com, shel...@eecs.berkeley.edu
Still getting it.
This one is with boost_1_55_0 and Atlas on Ubuntu 14.04, CUDA 6.5, g++ 4.8.
I have tried all compilers (g++, gcc, Intel 15).
All BLAS (MKL, Atlas).
Any ideas, folks?
[  FAILED  ] 6 tests, listed below:
[  FAILED  ] PowerLayerTest/0.TestPowerGradientShiftZero, where TypeParam = caffe::FloatCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradient, where TypeParam = caffe::DoubleCPU
[  FAILED  ] PowerLayerTest/1.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleCPU
[  FAILED  ] PowerLayerTest/2.TestPowerGradientShiftZero, where TypeParam = caffe::FloatGPU
[  FAILED  ] PowerLayerTest/3.TestPowerGradient, where TypeParam = caffe::DoubleGPU
[  FAILED  ] PowerLayerTest/3.TestPowerGradientShiftZero, where TypeParam = caffe::DoubleGPU
 

Daniel Golden

Feb 6, 2015, 11:36:29 AM
to caffe...@googlegroups.com, takesh...@gmail.com, shel...@eecs.berkeley.edu
Evan authored a PR that fixes the boost > 1.55 problem: https://github.com/BVLC/caffe/pull/1840

Thanks, Evan!