pycaffe + MKL segfault on 'import caffe' on OS X 10.11.3


Andrew Hundt

Feb 27, 2016, 10:54:13 PM
to Caffe Users
I'm trying to run pycaffe and I'm getting a segfault on `import caffe`.

Initially I was getting an error because libmkl_rt.dylib could not be found.
I fixed that by making sure my Makefile.config is set to mkl and the paths point to the MKL install location. I also added:
export DYLD_LIBRARY_PATH=/opt/intel/compilers_and_libraries/mac/mkl/lib
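
A quick way to confirm that the directory above really contains the library is to walk the path by hand. This is a minimal sketch, not part of Caffe: `find_on_path` is a hypothetical helper, and it only checks that the file exists in one of the listed directories; it does not reproduce dyld's full lookup rules.

```python
import os


def find_on_path(lib_name, search_path):
    """Return the first existing <dir>/<lib_name> in a path list, else None."""
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, lib_name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Uses whatever DYLD_LIBRARY_PATH is set in the current shell.
    path = os.environ.get("DYLD_LIBRARY_PATH", "")
    print(find_on_path("libmkl_rt.dylib", path))
```

If this prints `None`, the variable either is not set in the process that runs the script or does not list the directory holding the library.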

Stepping through the code with PyDev (and also running from the command line), I found that it segfaults on the following pycaffe.py line, which is reached by stepping into the call to `import caffe`:

from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
        RMSPropSolver, AdaDeltaSolver, AdamSolver



I don't believe that the following is the cause of the error because it is just the loading stage. It is also worth noting that my build uses a modified version of Caffe needed to run the code for the paper Fully Convolutional Networks for Semantic Segmentation.

Here is the version I'm running including a merge of master to ensure it includes the latest fixes:

I'm on OS X 10.11.3 using a 2014 MacBook Pro with an NVIDIA GeForce GT 750M (2048 MB).

Thanks for any help!

Simon Bächler

Feb 28, 2016, 9:32:36 AM
to Caffe Users
Try rebuilding Caffe using the newest makefile from trunk:

They have fixed some issues with El Capitan and its System Integrity Protection.

You have to rebuild Caffe and PyCaffe.

If you are using Boost 1.60.0, you will run into another issue and have to apply this fix (and rebuild PyCaffe):

Regards

Simon

Andrew Hundt

Feb 28, 2016, 2:39:24 PM
to Simon Bächler, Caffe Users

Thanks for the tips! I reply below.

On Sun, Feb 28, 2016 at 9:32 AM, Simon Bächler <stbae...@gmail.com> wrote:
Try rebuilding Caffe using the newest makefile from trunk:

They have fixed some issues with El Capitan and its System Integrity Protection.

I merged from master again. While I believe I had already incorporated the changes you mentioned, now that I’m on today’s master it definitely incorporates the SIP fixes.
 

You have to rebuild Caffe and PyCaffe.

If you are using Boost 1.60.0, you will run into another issue and have to apply this fix (and rebuild PyCaffe):

I’ve merged the fixes for boost 1.60.0 which is in fact the version of boost I am using. I’ve also cleaned and completely recompiled caffe.


However, I’m still crashing at the same place. I’ve now got a stack trace and it does seem to occur when passing through the boost python initialization phase. Does boost itself perhaps need a patch for this issue?

Here is the relevant portion of the report and stack trace:

Process:               Python [33677]
Path:                  /usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Identifier:            Python
Version:               2.7.11 (2.7.11)
Code Type:             X86-64 (Native)
Parent Process:        eclipse [5802]
Responsible:           Python [33677]
User ID:               502

Date/Time:             2016-02-28 14:29:38.999 -0500
OS Version:            Mac OS X 10.11.3 (15D21)
Report Version:        11
Anonymous UUID:        49B92278-12A6-2394-C846-F2CF84AE1011

Sleep/Wake UUID:       0F960AD0-0267-4B4A-A200-D53551C36866

Time Awake Since Boot: 310000 seconds
Time Since Wake:       2500 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

VM Regions Near 0:
--> 
    __TEXT                 000000010feae000-000000010feb0000 [    8K] r-x/rwx SM=COW  /usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   ???                           000000000000000000 0 + 0
1   org.python.python             0x000000011af1b1ad PyEval_GetGlobals + 23
2   org.python.python             0x000000011af2a8e1 PyImport_Import + 137
3   org.python.python             0x000000011af29001 PyImport_ImportModule + 31
4   _caffe.so                     0x0000000111b1b916 caffe::init_module__caffe() + 5718
5   libboost_python.dylib         0x000000011ae3d351 boost::python::handle_exception_impl(boost::function0<void>) + 81
6   libboost_python.dylib         0x000000011ae3e3b9 boost::python::detail::init_module(char const*, void (*)()) + 121
7   org.python.python             0x000000010ff5898b _PyImport_LoadDynamicModule + 140
8   org.python.python             0x000000010ff57689 import_submodule + 267
9   org.python.python             0x000000010ff5724f load_next + 284
10  org.python.python             0x000000010ff56433 PyImport_ImportModuleLevel + 1139
11  org.python.python             0x000000010ff3770a builtin___import__ + 135
… snip …
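
One detail in a trace like this can be checked mechanically: frames 1 to 3 and frames 7 to 11 are all attributed to org.python.python, yet their addresses sit in two distant ranges. That pattern can mean two copies of the Python library are loaded into one process, although nothing in this thread confirms that this was the cause here. A rough sketch, assuming a hypothetical 64 MiB threshold for "too far apart to be one image"; the function names are illustrative only:

```python
import re
from collections import defaultdict

# Matches crash-report frames such as:
#   "1   org.python.python   0x000000011af1b1ad PyEval_GetGlobals + 23"
FRAME_RE = re.compile(r"^\s*\d+\s+(\S+)\s+(0x[0-9a-fA-F]+)\s")

TRACE = """\
1   org.python.python             0x000000011af1b1ad PyEval_GetGlobals + 23
2   org.python.python             0x000000011af2a8e1 PyImport_Import + 137
4   _caffe.so                     0x0000000111b1b916 caffe::init_module__caffe() + 5718
5   libboost_python.dylib         0x000000011ae3d351 boost::python::handle_exception_impl(boost::function0<void>) + 81
6   libboost_python.dylib         0x000000011ae3e3b9 boost::python::detail::init_module(char const*, void (*)()) + 121
7   org.python.python             0x000000010ff5898b _PyImport_LoadDynamicModule + 140
"""


def image_address_spans(trace_text):
    """Map each image name to the (lowest, highest) frame address seen."""
    addresses = defaultdict(list)
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            addresses[match.group(1)].append(int(match.group(2), 16))
    return {name: (min(a), max(a)) for name, a in addresses.items()}


def suspicious_images(trace_text, max_span=64 * 1024 * 1024):
    """Names whose frames are spread wider than max_span bytes (heuristic)."""
    spans = image_address_spans(trace_text)
    return sorted(n for n, (low, high) in spans.items() if high - low > max_span)


if __name__ == "__main__":
    print(suspicious_images(TRACE))
```

A flagged name is only a hint; listing the libraries that `_caffe.so` and `libboost_python.dylib` link against would be the next step to verify it.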


Thanks again.

Cheers!
Andrew Hundt

Evan Shelhamer

Feb 28, 2016, 3:20:15 PM
to Andrew Hundt, Simon Bächler, Caffe Users
Hi Andrew,

I authored the El Capitan fixes because I ran into trouble myself. However I haven't encountered any issues with the latest boost (1.60) in running Caffe + pycaffe. Did you build from source with python as directed in the OS X install guide? Another common issue is to build/link against one Python (say homebrew) but then try to import caffe into another Python (like the included system Python). I'm suspicious since eclipse is the parent process in your trace... have you tried importing from python/ipython itself?
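
To compare the interpreter that imports caffe with the one used at build time, it can help to print a few facts from inside each. A minimal sketch using only the standard library; `interpreter_summary` and `same_installation` are hypothetical helper names, and the prefix comparison is a heuristic, not proof that the same libpython is linked:

```python
import os
import sys
import sysconfig


def interpreter_summary():
    """Collect facts that identify the Python installation running this code."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "include": sysconfig.get_paths()["include"],
    }


def same_installation(summary, build_prefix):
    """True when the running prefix resolves to the prefix used for the build."""
    return os.path.realpath(summary["prefix"]) == os.path.realpath(build_prefix)


if __name__ == "__main__":
    info = interpreter_summary()
    for key in sorted(info):
        print("%-10s %s" % (key, info[key]))
```

Running the same script from the terminal and from the IDE, then comparing the output with what `python2-config --prefix` reports, shows whether both launch the same installation.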

In particular I have not applied the patch in #3575 -- I would like to understand why some seem to need it.

Good luck,

Evan Shelhamer






Simon Bächler

Feb 28, 2016, 5:11:05 PM
to Caffe Users, stbae...@gmail.com
Hi Andrew

I had Python crashing on my system after I applied those fixes. I don't know if it is the same issue as yours.

It turned out to be this one here: https://github.com/rbgirshick/py-faster-rcnn/issues/2. Most likely the GPU architecture I compiled Caffe against did not match my installed GPU.
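
A mismatch of that kind can be spotted by comparing the `-gencode` flags in the CUDA_ARCH setting of Makefile.config with the compute capability of the installed card. The sketch below is a simplification under stated assumptions: it treats a device as covered when its exact `sm_XX` target is present or when PTX (`code=compute_XX`) for an equal or lower capability is included, the flag string is a made-up example, and the value 30 assumes the GT 750M is a compute capability 3.0 part (check NVIDIA's table for your card).

```python
import re


def sm_targets(cuda_arch):
    """Binary targets, e.g. 'code=sm_35' -> 35."""
    return sorted(set(int(x) for x in re.findall(r"code=sm_(\d+)", cuda_arch)))


def ptx_targets(cuda_arch):
    """PTX targets, e.g. 'code=compute_50' -> 50."""
    return sorted(set(int(x) for x in re.findall(r"code=compute_(\d+)", cuda_arch)))


def device_is_covered(cuda_arch, capability):
    """Simplified check: exact binary match, or PTX the driver could JIT-compile."""
    if capability in sm_targets(cuda_arch):
        return True
    return any(ptx <= capability for ptx in ptx_targets(cuda_arch))


if __name__ == "__main__":
    # Hypothetical CUDA_ARCH value that omits sm_30.
    flags = ("-gencode arch=compute_35,code=sm_35 "
             "-gencode arch=compute_50,code=sm_50 "
             "-gencode arch=compute_50,code=compute_50")
    print(device_is_covered(flags, 30))
```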


Regards
Simon

Andrew Hundt

Feb 28, 2016, 6:08:54 PM
to Evan Shelhamer, Simon Bächler, Caffe Users
On Sun, Feb 28, 2016 at 3:20 PM, Evan Shelhamer <evan.sh...@gmail.com> wrote:
Hi Andrew,

I authored the El Capitan fixes because I ran into trouble myself. However I haven't encountered any issues with the latest boost (1.60) in running Caffe + pycaffe. Did you build from source with python as directed in the OS X install guide?

While I did go through those instructions, unfortunately they are missing lots of steps that are commonly encountered. The instructions that got me through many of the other additional steps I needed are at the following link, even though they are slightly out of date:
 
Another common issue is to build/link against one Python (say homebrew) but then try to import caffe into another Python (like the included system Python). I'm suspicious since eclipse is the parent process in your trace... have you tried importing from python/ipython itself?

I’m familiar with that sort of issue; I had to deal with it 6 months ago when I was first setting up Python to run with OpenCV. I’ve verified that the same eval.py is able to import numpy and cv2 successfully.
 
In particular I have not applied the patch in #3575 -- I would like to understand why some seem to need it.

Unfortunately my problem seems to be separate from this one, since I’m continuing to experience crashes.

I’ve had to deal with the build/link problems in the past when I was getting things to run with opencv, I know eclipse is configured correctly (I’m using pydev), plus I run into the same problem when I simply run “python eval.py”.

However, I did have at least one mistake. I ran the following:
± python2-config --include
-I/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/include/python2.7

I then updated my Makefile.config to the following python related paths:

PYTHON_INCLUDE := $(shell python2-config --prefix)/include
PYTHON_LIB := $(shell python2-config --prefix)/lib

# Homebrew installs numpy in a non standard path (keg only)
PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
PYTHON_LIB += $(shell brew --prefix numpy)/lib

I’m used to using CMake and some things seem to differ. Is changing the Makefile.config and then running:

± make clean; make -j16 all pycaffe

sufficient to rebuild with the new configuration?


Assuming it is, I re-ran the full build as described above, then executed python eval.py supplied with the fcn-32s gist. Unfortunately, I continue to experience essentially the same crash as before, reproduced below.

Any possible additional ideas or ways I can try to narrow the issue down?

Cheers!
Andrew Hundt 

Process:               Python [69693]
Path:                  /usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Identifier:            Python
Version:               2.7.11 (2.7.11)
Code Type:             X86-64 (Native)
Parent Process:        zsh [27884]
Responsible:           Python [69693]
User ID:               502

Date/Time:             2016-02-28 17:55:28.744 -0500
OS Version:            Mac OS X 10.11.3 (15D21)
Report Version:        11
Anonymous UUID:        49B92278-12A6-2394-C846-F2CF84AE1011

Sleep/Wake UUID:       0F960AD0-0267-4B4A-A200-D53551C36866

Time Awake Since Boot: 330000 seconds
Time Since Wake:       14000 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

VM Regions Near 0:
--> 
    __TEXT                 000000010121c000-000000010121e000 [    8K] r-x/rwx SM=COW  /usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   ???                           000000000000000000 0 + 0
1   org.python.python             0x000000010b7e71ad PyEval_GetGlobals + 23
2   org.python.python             0x000000010b7f68e1 PyImport_Import + 137
3   org.python.python             0x000000010b7f5001 PyImport_ImportModule + 31
4   _caffe.so                     0x00000001023e7916 caffe::init_module__caffe() + 5718
5   libboost_python.dylib         0x000000010b709351 boost::python::handle_exception_impl(boost::function0<void>) + 81
6   libboost_python.dylib         0x000000010b70a3b9 boost::python::detail::init_module(char const*, void (*)()) + 121
7   org.python.python             0x00000001012c098b _PyImport_LoadDynamicModule + 140
8   org.python.python             0x00000001012bf689 import_submodule + 267
9   org.python.python             0x00000001012bf24f load_next + 284
10  org.python.python             0x00000001012be433 PyImport_ImportModuleLevel + 1139
11  org.python.python             0x000000010129f70a builtin___import__ + 135
12  org.python.python             0x000000010122aef0 PyObject_Call + 99

… snip …


Graphics: NVIDIA GeForce GT 750M, NVIDIA GeForce GT 750M, PCIe, 2048 MB

… snip …

Evan Shelhamer

Feb 28, 2016, 7:51:24 PM
to Andrew Hundt, Simon Bächler, Caffe Users
I’m used to using CMake and some things seem to differ

I see. I'm only familiar with the Makefile myself, so I'm sorry that I can't speak to the CMake config in detail.
However, if you edit your Makefile.config and then run `make clean && make && make pycaffe` (with whatever parallelization you like, such as `-j16`), that will compile with the Make build.

Evan Shelhamer




Andrew Hundt

Feb 28, 2016, 7:53:55 PM
to Simon Bächler, Caffe Users
I’ve been able to get substantially further! “make runtest” has an issue that should be straightforward, and I’m able to partially run eval.py for fcn-8s-py. However, in both cases I run into the problems detailed below:

Here is the updated Makefile.config for others who run into this problem on a Mac with Homebrew and Intel MKL (which can be acquired for free if you’re at an educational institution):


Specifically, one of my Python include paths wasn’t correct in the previous one. I needed:

PYTHON_INCLUDE := $(shell python2-config --prefix)/include/python2.7/
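
The difference between the two settings is whether the directory actually contains Python.h. A small sketch that asks the running interpreter for its header directory and checks for that file; `has_python_header`, `python_include_dir` and `numpy_include_dir` are hypothetical helper names, and the numpy lookup returns `None` when numpy is not installed:

```python
import os
import sysconfig


def has_python_header(include_dir):
    """True if include_dir directly contains Python.h."""
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


def python_include_dir():
    """Header directory reported by the interpreter running this script."""
    return sysconfig.get_paths()["include"]


def numpy_include_dir():
    """Header directory reported by numpy, or None if numpy is missing."""
    try:
        import numpy
    except ImportError:
        return None
    return numpy.get_include()


if __name__ == "__main__":
    inc = python_include_dir()
    print("python headers:", inc, "(Python.h found: %s)" % has_python_header(inc))
    print("numpy headers: ", numpy_include_dir())
```

Run it with the same interpreter that will later import caffe, so the reported directories match the build you intend to link against.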

When I try make runtest it still seems to fail:

± make runtest -j16
.build_release/tools/caffe
dyld: Library not loaded: libmkl_rt.dylib
  Referenced from: /Users/athundt/source/git/caffe/.build_release/tools/caffe
  Reason: image not found
make: *** [runtest] Trace/BPT trap: 5

I tried the DYLD_LIBRARY_PATH settings one should always try first, to no effect:

export DYLD_FALLBACK_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/lib:/usr/lib:/usr/local/Cellar/hdf5/:/usr/local/Cellar/:/opt/intel/compilers_and_libraries/mac/mkl/lib/:$DYLD_FALLBACK_LIBRARY_PATH
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/lib:/usr/lib:/usr/local/Cellar/hdf5/:/usr/local/Cellar/:/opt/intel/compilers_and_libraries/mac/mkl/lib/:$DYLD_LIBRARY_PATH
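
One possible explanation, not confirmed anywhere in this thread: with System Integrity Protection enabled, macOS removes DYLD_* variables from the environment when it launches protected system binaries such as /bin/sh, so a value exported in the terminal may never reach a program that make starts through the shell. A hedged sketch for checking that on a given machine; `survives_child_shell` is a hypothetical helper and `CAFFE_DEMO_PLAIN_VAR` is an arbitrary control variable:

```python
import os
import subprocess


def survives_child_shell(name, value, shell="/bin/sh"):
    """Set name=value, start a child shell, and report whether it still sees it."""
    env = dict(os.environ)
    env[name] = value
    script = 'printf %s "${' + name + '}"'
    out = subprocess.check_output([shell, "-c", script], env=env)
    return out.decode("utf-8") == value


if __name__ == "__main__":
    mkl_dir = "/opt/intel/compilers_and_libraries/mac/mkl/lib"
    for var in ("CAFFE_DEMO_PLAIN_VAR", "DYLD_LIBRARY_PATH"):
        print(var, survives_child_shell(var, mkl_dir))
```

If the plain variable survives but the DYLD_ one does not, the export lines above cannot help targets that make launches through the shell.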


When I try running fcn-8s

my eval.py:
However I’m getting a failure:


I0228 19:44:08.119972 2066743296 net.cpp:387] relu1_1 -> conv1_1 (in-place)
F0228 19:44:08.277551 2066743296 cudnn_relu_layer.cpp:13] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0)  CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***



Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib         0x00007fff8df7f002 __pthread_kill + 10
1   libsystem_pthread.dylib       0x00007fff971fa5c5 pthread_kill + 90
2   libsystem_c.dylib             0x00007fff908f86e7 abort + 129
3   libglog.0.dylib               0x000000011cc5107f google::logging_fail() + 9
4   libglog.0.dylib               0x000000011cc51076 google::LogMessage::Fail() + 10
5   libglog.0.dylib               0x000000011cc50757 google::LogMessage::SendToLog() + 1389
6   libglog.0.dylib               0x000000011cc50cc5 google::LogMessage::Flush() + 189
7   libglog.0.dylib               0x000000011cc54015 google::LogMessageFatal::~LogMessageFatal() + 15
8   libglog.0.dylib               0x000000011cc51363 google::LogMessageFatal::~LogMessageFatal() + 9
9   libcaffe.so.1.0.0-rc3         0x0000000117eb0bcd caffe::CuDNNReLULayer<float>::LayerSetUp(std::__1::vector<caffe::Blob<float>*, std::__1::allocator<caffe::Blob<float>*> > const&, std::__1::vector<caffe::Blob<float>*, std::__1::allocator<caffe::Blob<float>*> > const&) + 557

Andrew Hundt

Feb 28, 2016, 10:47:13 PM
to Simon Bächler, Caffe Users
On Sun, Feb 28, 2016 at 7:53 PM, Andrew Hundt <ath...@gmail.com> wrote:

When I try running fcn-8s

my eval.py:
However I’m getting a failure:


I0228 19:44:08.119972 2066743296 net.cpp:387] relu1_1 -> conv1_1 (in-place)

F0228 19:44:08.277551 2066743296 cudnn_relu_layer.cpp:13] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0)  CUDNN_STATUS_INTERNAL_ERROR


This error is due to insufficient video memory on my GPU; closing all other applications fixed it. I still need to figure out why MKL can’t be found when running make runtest.
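
For a rough sense of how quickly a 2048 MB card fills up, the size of a single float32 blob is the product of its dimensions times four bytes. The shape below is hypothetical and chosen only for illustration; it is not taken from the network in this thread, and `blob_bytes` is an illustrative helper, not a Caffe function:

```python
def blob_bytes(num, channels, height, width, bytes_per_value=4):
    """Bytes needed for one dense N x C x H x W array of float32 values."""
    return num * channels * height * width * bytes_per_value


if __name__ == "__main__":
    # Hypothetical shape: one image, 64 channels, 698 x 698 feature map.
    size = blob_bytes(1, 64, 698, 698)
    print("%d bytes = %.1f MiB" % (size, size / float(2 ** 20)))
```

A network holds many such arrays at once, and other applications that use the GPU take their own share, which is consistent with closing them freeing enough memory.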
