I have a heavily multithreaded application that uses Caffe. I normally run it on a machine with a GPU, so Caffe is compiled with CUDA support. Recently I did some testing on a machine without a GPU and compiled Caffe with CPU_ONLY for it. On that machine (Ubuntu 16.04, gcc 5.4, Caffe in the CPU_ONLY configuration) I hit a segfault that I had never seen before.
To troubleshoot the segfault I enabled AddressSanitizer (ASAN) and recompiled my application. A simple test run produced the following report (I have removed a couple of my own code paths from the trace):
=================================================================
==32585==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6030000a1fa0 at pc 0x7fa7bef65b25 bp 0x7fa787804fc0 sp 0x7fa787804fb0
READ of size 4 at 0x6030000a1fa0 thread T7
#0 0x7fa7bef65b24 in caffe::Caffe::mode() /opt/bl/include/caffe/common.hpp:142
#1 0x7fa7bef65b24 in int (omitted)/CaffeYOLOv2Model.h:182
#2 0x7fa7beea02c4 in (omitted)/CaffeModelController.cc:276
#3 0x7fa7be5c94ed (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xd04ed)
#4 0x7fa7beb616b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
#5 0x7fa7be01e41c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
0x6030000a1fa0 is located 0 bytes to the right of 32-byte region [0x6030000a1f80,0x6030000a1fa0)
allocated by thread T7 here:
#0 0x7fa7bf2cc592 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x99592)
#1 0x7fa7bd53ebb3 in caffe::Caffe::Get() (/opt/bl/lib/libcaffe.so.1.0.0+0x1b1bb3)
Thread T7 created by T0 here:
#0 0x7fa7bf2691e3 in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x361e3)
#1 0x7fa7be5c95f3 in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xd05f3)
SUMMARY: AddressSanitizer: heap-buffer-overflow /opt/bl/include/caffe/common.hpp:142 caffe::Caffe::mode()
Shadow bytes around the buggy address:
0x0c068000c3a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c068000c3b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c068000c3c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c068000c3d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c068000c3e0: fa fa fa fa fa fa fa fa fa fa 00 00 00 fa fa fa
=>0x0c068000c3f0: 00 00 00 00[fa]fa fd fd fd fd fa fa fd fd fd fd
0x0c068000c400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c068000c410: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c068000c420: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c068000c430: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c068000c440: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
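As a sanity check on the dump above, the position of the marked shadow byte can be recomputed from the faulting address. This is a minimal sketch, assuming the default x86_64 Linux shadow mapping (shift right by 3, then add 0x7fff8000). The helper names are made up for illustration and are not part of ASAN.

```cpp
#include <cstdint>

// Default ASAN shadow offset on x86_64 Linux (assumption: no custom mapping).
constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;

// One shadow byte covers 8 application bytes, hence the shift by 3.
constexpr std::uint64_t shadow_address(std::uint64_t app_addr) {
    return (app_addr >> 3) + kShadowOffset;
}

// Start of the 16-byte row that the report prints for a shadow address.
constexpr std::uint64_t shadow_row(std::uint64_t shadow_addr) {
    return shadow_addr & ~std::uint64_t{0xf};
}

// Column (0..15) of a shadow byte within its printed row.
constexpr unsigned shadow_column(std::uint64_t shadow_addr) {
    return static_cast<unsigned>(shadow_addr & 0xf);
}

// Addresses copied from the report: the faulting read and the start of the
// 32-byte region it lies just past.
constexpr std::uint64_t kFault = 0x6030000a1fa0ULL;
constexpr std::uint64_t kRegionStart = 0x6030000a1f80ULL;

// The faulting address lands in the row marked with "=>", at column 4.
static_assert(shadow_row(shadow_address(kFault)) == 0x0c068000c3f0ULL,
              "row marked with => in the dump");
static_assert(shadow_column(shadow_address(kFault)) == 4,
              "fifth byte of the row, printed as [fa]");

// The region starts at column 0 of the same row, so it spans 4 shadow bytes.
static_assert(shadow_address(kFault) - shadow_address(kRegionStart) == 4,
              "32 application bytes -> 4 shadow bytes");
```

Under that assumption the faulting address maps to shadow byte 0x0c068000c3f4, which is the [fa] entry in the marked row. The four 00 bytes before it cover exactly the 32-byte region allocated inside caffe::Caffe::Get(), so the 4-byte read starts at the first byte past the end of that object.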
I then annotated the Get() function, which is called from common.hpp line 142, as follows:
Based on this annotation, the heap overflow appears to occur only on the second thread's Caffe object. If I move the second thread's call to caffe::Caffe::mode() elsewhere in my code, the issue follows it and always fails in the same way.
Does anyone have any idea what's going on? I cannot reproduce this issue on the other machine with the CUDA-compiled version of Caffe, but I can reproduce it whenever I use the CPU_ONLY version. I suppose this could be an ASAN bug, but I can't find any mention of anything similar.
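One hypothesis that would fit the numbers in the report, offered as an assumption to verify and not as a confirmed diagnosis: the object is allocated inside libcaffe.so, while mode() is inlined from common.hpp into the application. If the application sees the header without CPU_ONLY defined while the library was built with it, the two sides could disagree about the class layout. The sketch below uses hypothetical stand-in structs (not the real Caffe declarations) and assumes a 64-bit ABI with 8-byte pointers.

```cpp
#include <cstddef>

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");

// Hypothetical layout as the library would allocate it (CPU_ONLY defined).
struct LibraryView {
    void* rng_ptr;        // stand-in for the two words of a shared_ptr
    void* rng_ctrl;
    int   mode;
    int   solver_count;
    int   solver_rank;
    bool  multiprocess;
};

// Hypothetical layout as the application would see it if the same header
// were parsed without CPU_ONLY: two extra handle members come first.
struct ApplicationView {
    void* gpu_handle_a;   // assumed GPU-only members, names invented
    void* gpu_handle_b;
    void* rng_ptr;
    void* rng_ctrl;
    int   mode;
    int   solver_count;
    int   solver_rank;
    bool  multiprocess;
};

// The library would allocate 32 bytes for the object...
static_assert(sizeof(LibraryView) == 32, "29 bytes of members padded to 32");

// ...while the inlined accessor would read `mode` at offset 32, which is
// 0 bytes to the right of that 32-byte region.
static_assert(offsetof(ApplicationView, mode) == 32, "two extra pointers shift mode");
static_assert(offsetof(LibraryView, mode) == 16, "offset the library expects");
```

If this is what is happening, compiling the application with the same CPU_ONLY definition that was used for the library should make the two layouts agree. It would also be consistent with the CUDA build not showing the problem, since there both sides would see the GPU members.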
Any assistance would be appreciated.