Possible heap overflow with CPU_ONLY mode

David Sorber

Aug 7, 2019, 3:05:15 PM
to Caffe Users
Hi,

I have a heavily multithreaded application that uses Caffe.  I mostly run it on a machine with a GPU, so Caffe is normally compiled with CUDA support.  Recently I did some testing on a machine without a GPU (Ubuntu 16.04, gcc 5.4) and compiled Caffe with CPU_ONLY there.  On that machine I hit a segfault that I had never seen before.

To help troubleshoot the original segfault issue I enabled the Address Sanitizer (ASAN) and recompiled my application.  When I ran a simple test I got this (I have removed a couple of my code paths from the trace):

=================================================================
==32585==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6030000a1fa0 at pc 0x7fa7bef65b25 bp 0x7fa787804fc0 sp 0x7fa787804fb0
READ of size 4 at 0x6030000a1fa0 thread T7
    #0 0x7fa7bef65b24 in caffe::Caffe::mode() /opt/bl/include/caffe/common.hpp:142
    #1 0x7fa7bef65b24 in int (omitted)/CaffeYOLOv2Model.h:182
    #2 0x7fa7beea02c4 in (omitted)/CaffeModelController.cc:276
    #3 0x7fa7be5c94ed  (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xd04ed)
    #4 0x7fa7beb616b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
    #5 0x7fa7be01e41c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)

0x6030000a1fa0 is located 0 bytes to the right of 32-byte region [0x6030000a1f80,0x6030000a1fa0)
allocated by thread T7 here:
    #0 0x7fa7bf2cc592 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x99592)
    #1 0x7fa7bd53ebb3 in caffe::Caffe::Get() (/opt/bl/lib/libcaffe.so.1.0.0+0x1b1bb3)

Thread T7 created by T0 here:
    #0 0x7fa7bf2691e3 in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x361e3)
    #1 0x7fa7be5c95f3 in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xd05f3)

SUMMARY: AddressSanitizer: heap-buffer-overflow /opt/bl/include/caffe/common.hpp:142 caffe::Caffe::mode()
Shadow bytes around the buggy address:
  0x0c068000c3a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c068000c3b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c068000c3c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c068000c3d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c068000c3e0: fa fa fa fa fa fa fa fa fa fa 00 00 00 fa fa fa
=>0x0c068000c3f0: 00 00 00 00[fa]fa fd fd fd fd fa fa fd fd fd fd
  0x0c068000c400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c068000c410: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c068000c420: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c068000c430: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c068000c440: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
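
For readers matching the reported address to the shadow dump: the sketch below assumes the default ASAN mapping on x86_64 Linux, where the shadow address is (address >> 3) + 0x7fff8000. The function names are made up for illustration and are not part of any ASAN API.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: default ASAN shadow mapping on x86_64 Linux.
constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;
constexpr unsigned kShadowScale = 3;  // 2^3 = 8 application bytes per shadow byte

// Shadow byte address that describes the 8-byte granule containing app_addr.
constexpr std::uint64_t ShadowAddress(std::uint64_t app_addr) {
  return (app_addr >> kShadowScale) + kShadowOffset;
}

// Start address of the 16-byte row in which the shadow dump prints that byte.
constexpr std::uint64_t ShadowRow(std::uint64_t app_addr) {
  return ShadowAddress(app_addr) & ~0xfULL;
}

// Zero-based column of that byte within its row.
constexpr unsigned ShadowColumn(std::uint64_t app_addr) {
  return static_cast<unsigned>(ShadowAddress(app_addr) & 0xfULL);
}
```

Under that assumption, the faulting address 0x6030000a1fa0 maps to row 0x0c068000c3f0, column 4, which is the bracketed [fa] byte. The 32-byte region starting at 0x6030000a1f80 maps to columns 0 through 3 of the same row, the four 00 bytes immediately before it.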

I then annotated the Get() function, which is called from common.hpp line 142 (inside mode()), as follows:

Caffe& Caffe::Get() {
  std::cerr << "Inside Get(); tid: " << std::this_thread::get_id() << std::endl;
  if (!thread_instance_.get()) {
    std::cerr << "    Get(): reset ptr" << std::endl;
    thread_instance_.reset(new Caffe());
  }
  std::cerr << "    Get(): return ptr" << std::endl;
  return *(thread_instance_.get());
}

When I run my simple test with the annotated version I get:

Inside Get(); tid: 140358461213248
    Get(): reset ptr
    Get(): return ptr
Inside Get(); tid: 140358461213248
    Get(): return ptr
Inside Get(); tid: 140358461213248
    Get(): return ptr
(previous two lines repeated many times...)
Inside Get(); tid: 140357509609216
    Get(): reset ptr
    Get(): return ptr

Followed by the same ASAN trace as above.  

Based on this output, the heap overflow appears to affect only the second thread's Caffe object.  If I move the second thread's call to caffe::Caffe::mode() elsewhere in my code, the failure moves with it and always shows the same behavior.
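
The per-thread behavior in the log can be sketched as follows. This is a minimal stand-in, not Caffe's code: the Context class and Brew enum are hypothetical, and it uses C++11 thread_local where the real thread_instance_ may use a different thread-specific pointer type.

```cpp
#include <cassert>
#include <memory>
#include <thread>

enum class Brew { CPU, GPU };

// Hypothetical stand-in for a per-thread singleton such as caffe::Caffe.
class Context {
 public:
  // Lazily creates one instance per thread, mirroring the "reset ptr" log line.
  static Context& Get() {
    thread_local std::unique_ptr<Context> instance;
    if (!instance) {
      instance.reset(new Context());
    }
    return *instance;
  }

  static Brew mode() { return Get().mode_; }

 private:
  Context() : mode_(Brew::CPU) {}
  Brew mode_;
};

// Returns true if a newly started thread receives an instance distinct from
// the calling thread's instance. The comparison runs inside the worker, while
// both objects are still alive.
bool SecondThreadGetsOwnInstance() {
  const Context* mine = &Context::Get();
  bool different = false;
  std::thread worker([mine, &different] {
    different = (&Context::Get() != mine);
  });
  worker.join();
  return different;
}
```

With this pattern the first call to Get() on every new thread allocates a fresh object, which matches the single "reset ptr" line printed for each thread ID above. Build with thread support enabled (for example -pthread with gcc).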

Does anyone have any idea what's going on?  I cannot replicate this issue on the other machine with the CUDA-compiled version of Caffe but I can replicate it if I use the CPU_ONLY version.  I suppose this could be an ASAN bug, but I can't find any mention of something similar.  
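
One hypothesis that fits the numbers in the trace, and may be worth ruling out before suspecting ASAN, is that the application and libcaffe were compiled with different CPU_ONLY settings. Frame #0 shows mode() being compiled from the header into the application, while the 32-byte object is allocated inside libcaffe.so by Get(). If the class declares extra CUDA handle members only when CPU_ONLY is not defined, the two sides would disagree on the offset of the mode field. The structs below are hypothetical stand-ins with assumed member lists, not Caffe's actual definition, and the sizes assume a 64-bit (LP64) platform.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout seen by code compiled WITHOUT CPU_ONLY: two pointer-sized
// CUDA handles come first (hypothetical member names).
struct LayoutWithCudaHandles {
  void* cublas_handle;
  void* curand_generator;
  void* rng_object;   // these two pointers stand in for a 16-byte shared_ptr
  void* rng_control;
  int mode;
  int solver_count;
  int solver_rank;
  bool multiprocess;
};

// Assumed layout seen by code compiled WITH CPU_ONLY: no CUDA handles.
struct LayoutCpuOnly {
  void* rng_object;
  void* rng_control;
  int mode;
  int solver_count;
  int solver_rank;
  bool multiprocess;
};

// Offset at which a caller using the larger layout would read `mode`.
constexpr std::size_t kReadOffset = offsetof(LayoutWithCudaHandles, mode);

// Size of the object actually allocated under the smaller layout.
constexpr std::size_t kAllocatedSize = sizeof(LayoutCpuOnly);

// True when that 4-byte read would start exactly at the end of the allocation.
constexpr bool ReadsJustPastAllocation() {
  return kReadOffset == kAllocatedSize;
}
```

Under these assumptions both values are 32, which would correspond to a 4-byte read located 0 bytes to the right of a 32-byte region. A way to check this would be to compare the preprocessor definitions used to build the application against those used to build libcaffe.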

Any assistance would be appreciated.