Deadlock in 0.9.5.6


gcc.lua

Apr 23, 2012, 11:03:26 PM
to Hypertable Development
The user thread logic is as follows:
TableScannerPtr aScanner = tbSourcelist->create_scanner( specbuilder.get(), 5000 );
while( aScanner->next( gotCell ) )
{
  .....
}

Deadlock between the user thread and the scanner thread:

1. User thread (TableScanner)

TableScannerAsync::~TableScannerAsync() {
  try {
    cancel();
    wait_for_completion();
  }
  catch (Exception &e) {
    HT_ERROR_OUT << e << HT_END;
  }
  if (m_use_index) {
    delete m_cb; // <== deadlock entry
    m_cb = 0;
  }
}
/////////////////////////////////////////
virtual ~IndexScannerCallback() {
  ScopedLock lock(m_mutex); // <== user thread acquires IndexScannerCallback::m_mutex
  if (m_mutator)
    delete m_mutator;

  foreach (TableScannerAsync *s, m_scanners)
    delete s; // deadlock 1 <== user thread waits for TableScannerAsync::m_mutex


2. Scanner thread

void TableScannerAsync::handle_result(int scanner_id, EventPtr &event, bool is_create) {

  bool cancelled = is_cancelled();
  ScopedLock lock(m_mutex); // <== scanner thread acquires TableScannerAsync::m_mutex
  ScanCellsPtr cells;

  . . . . . .
  maybe_callback_ok(); // <== calls m_cb->scan_ok(this, cells);

}
//////////////////////////////
class IndexScannerCallback : public ResultCallback {

  virtual void scan_ok(TableScannerAsync *scanner, ScanCellsPtr &scancells) {
    bool is_eos = scancells->get_eos();
    String table_name = scanner->get_table_name();

    ScopedLock lock(m_mutex); // deadlock 2 <== scanner thread waits for IndexScannerCallback::m_mutex
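
The two paths above take the same pair of mutexes in opposite order: the user thread holds the callback mutex and wants the scanner mutex, while the scanner thread holds the scanner mutex and wants the callback mutex. One common way to break such a cycle is to never hold the callback mutex while destroying scanners. The following is only a minimal sketch of that idea in standard C++; `Scanner`, `Callback`, `release_scanners()` and `pending()` are hypothetical stand-ins, not Hypertable classes or methods, and this is not the fix that was committed. It also addresses lock ordering only: it does not by itself stop a worker thread from calling into a callback that is already being destroyed.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for TableScannerAsync.
struct Scanner {
  std::mutex mutex;  // plays the role of TableScannerAsync::m_mutex
  ~Scanner() {
    // Teardown needs the scanner mutex, much like cancel() and
    // wait_for_completion() would in the real destructor.
    std::lock_guard<std::mutex> lock(mutex);
  }
};

// Hypothetical stand-in for IndexScannerCallback.
class Callback {
public:
  void add(std::unique_ptr<Scanner> s) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_scanners.push_back(std::move(s));
  }

  // Detach the scanner list while holding the callback mutex, then destroy
  // the scanners after the mutex has been released. The callback mutex is
  // therefore never held while waiting for a scanner mutex.
  std::size_t release_scanners() {
    std::vector<std::unique_ptr<Scanner>> doomed;
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      doomed.swap(m_scanners);
    }
    std::size_t n = doomed.size();
    doomed.clear();  // scanner destructors run here, callback mutex not held
    return n;
  }

  std::size_t pending() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_scanners.size();
  }

  ~Callback() { release_scanners(); }

private:
  std::mutex m_mutex;  // plays the role of IndexScannerCallback::m_mutex
  std::vector<std::unique_ptr<Scanner>> m_scanners;
};
```

With this shape, a thread that holds a scanner mutex and then asks for the callback mutex only has to wait for the short swap, not for the whole scanner teardown.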

Doug Judd

Apr 24, 2012, 1:33:04 PM
to hyperta...@googlegroups.com
Thanks for posting this.  I've filed issue 827 to track it.  We should be able to turn around a fix fairly quickly.  If you'd like us to build you a package with the fix, let us know and we're happy to do that for you.

- Doug


--
You received this message because you are subscribed to the Google Groups "Hypertable Development" group.
To post to this group, send email to hyperta...@googlegroups.com.
To unsubscribe from this group, send email to hypertable-de...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/hypertable-dev?hl=en.


Christoph Rupp

Apr 25, 2012, 12:56:02 PM
to hyperta...@googlegroups.com
Hi,

Thanks for the great bug report.

I am not able to reproduce this issue, but I think I have come up with a fix. If you want to check out the sources, you can get them here:
https://github.com/cruppstahl/hypertable branch "v0.9.5"

This is the commit:
commit 2572b5dcb524e1c36dc23307c37784fd34c1bdde
Author: Christoph Rupp <ch...@crupp.de>
Date:   Wed Apr 25 18:54:11 2012 +0200

    issue 827: fixed deadlock when scanning secondary indices

And here's the diff:

diff --git a/src/cc/Hypertable/Lib/IndexScannerCallback.h b/src/cc/Hypertable/Li
index 70ffda7..1b37127 100644
--- a/src/cc/Hypertable/Lib/IndexScannerCallback.h
+++ b/src/cc/Hypertable/Lib/IndexScannerCallback.h
@@ -118,13 +118,12 @@ static String last;
     }
 
     virtual ~IndexScannerCallback() {
-      ScopedLock lock(m_mutex);
-      if (m_mutator)
-        delete m_mutator;

       foreach (TableScannerAsync *s, m_scanners)
         delete s;
       m_scanners.clear();
       sspecs_clear();
+      if (m_mutator)
+        delete m_mutator;

Can you please give it a try and see if this helps?

Thanks
Christoph


gcc.lua

Apr 26, 2012, 5:31:18 AM
to Hypertable Development
Hi,

Thanks for the quick reply. The commit just removes the m_mutex lock inside
~IndexScannerCallback(). I tried it, and a new problem occurs (see the end of
this report). Here is some additional information on how to reproduce the
original issue, as it was before your commit:
void run()
{
  TableScannerPtr aScanner = tbSourcelist->create_scanner( specbuilder.get(), 5000 );

  while( aScanner->next( gotCell ) )
  {
    ....
    if(condition)
      break; // more results are still pending, so the internal scanner thread is still running
    ....
  }
  return; // triggers the TableScanner destructor; see my first post for what happens next
}

//////////////////////////////////////////////////////////////////////////////////////


pure virtual method called
terminate called without an active exception

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe6ff5700 (LWP 23887)]
0x00007ffff48db1b5 in raise () from /lib/libc.so.6


(gdb) where
#0 0x00007ffff48db1b5 in raise () from /lib/libc.so.6
#1 0x00007ffff48ddfc0 in abort () from /lib/libc.so.6
#2 0x00007ffff516fdc5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3 0x00007ffff516e166 in ?? () from /usr/lib/libstdc++.so.6
#4 0x00007ffff516e193 in std::terminate() () from /usr/lib/libstdc++.so.6
#5 0x00007ffff516ea6f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#6 0x00000000005c43c6 in Hypertable::TableScannerAsync::maybe_callback_ok (this=0x7fffb432ecd0, scanner_id=19373, next=true, do_callback=true, cells=...)
    at /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerAsync.cc:520
#7 0x00000000005c393f in Hypertable::TableScannerAsync::handle_result (this=0x7fffb432ecd0, scanner_id=19373, event=..., is_create=true)
    at /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerAsync.cc:464
#8 0x00000000005fdc5e in Hypertable::TableScannerHandler::run (this=0x7fff99915850)
    at /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerHandler.cc:40
#9 0x000000000045f2c5 in Hypertable::ApplicationQueue::Worker::operator() (this=0xaaa120)
    at /root/qiao/Project/hypertable-0.9.5.6/src/cc/AsyncComm/ApplicationQueue.h:173
#10 0x0000000000470f04 in boost::detail::thread_data<Hypertable::ApplicationQueue::Worker>::run (this=0xaa9ff0)
    at /usr/include/boost/thread/detail/thread.hpp:56
#11 0x00007ffff77b5200 in thread_proxy () from /usr/lib/libboost_thread.so.1.42.0
#12 0x00007ffff79c58ca in start_thread () from /lib/libpthread.so.0
#13 0x00007ffff497892d in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()
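
For context on the abort above: libstdc++ prints "pure virtual method called" when a pure virtual function is invoked on an object whose dynamic type is, at that moment, the abstract base class. That happens while the object is being constructed or destroyed. One plausible reading of the trace, which is an assumption and not something confirmed in this thread, is that the worker thread called `scan_ok()` on a callback whose derived destructor had already run. The sketch below shows the underlying language rule with a non-pure virtual so that it does not abort. `ResultCallbackBase`, `IndexCallback` and `describe_during_destruction()` are hypothetical names, not Hypertable code.

```cpp
#include <string>

// Hypothetical base class standing in for an abstract callback interface.
struct ResultCallbackBase {
  virtual ~ResultCallbackBase() {
    // While the base destructor runs, the derived part is already destroyed,
    // so this virtual call resolves to the base version.
    last_seen() = describe();
  }
  virtual std::string describe() const { return "base"; }

  // Records the most recent result of describe() for inspection.
  static std::string &last_seen() {
    static std::string value;
    return value;
  }
};

// Hypothetical derived callback.
struct IndexCallback : ResultCallbackBase {
  std::string describe() const override { return "derived"; }
};

// Returns what describe() resolved to while the object was being destroyed.
std::string describe_during_destruction() {
  {
    IndexCallback cb;
    ResultCallbackBase::last_seen() = cb.describe();  // "derived" while alive
  }  // ~IndexCallback runs first, then ~ResultCallbackBase overwrites the value
  return ResultCallbackBase::last_seen();
}
```

If `describe()` were declared pure virtual in the base, the same call path would end in `__cxa_pure_virtual`, as in frame #5 of the backtrace.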


Christoph Rupp

May 9, 2012, 6:21:10 AM
to hyperta...@googlegroups.com
Sorry for the delay. I was finally able to reproduce it, and I have also fixed it.

The commit is a bit larger than my first try.

https://github.com/cruppstahl/hypertable/commits/v0.9.5

commit b45ba15b701373c3a1f689f8997f31bde8ff5165

Author: Christoph Rupp <ch...@crupp.de>
Date:   Wed Apr 25 18:54:11 2012 +0200

    issue 827: fixed deadlock when scanning secondary indices

Thanks again for your great help!

Best regards
Christoph


BigQiao

Aug 8, 2012, 9:24:04 PM
to hyperta...@googlegroups.com, ch...@hypertable.com
This deadlock still exists in 0.9.6.0 when deleting a TableScanner.

The TableScanner destructor locks IndexScannerCallback, then TableScannerAsync.
A database worker thread locks TableScannerAsync, then IndexScannerCallback.

Thread 14 (Thread 0x7fffee266700 (LWP 10936)):      //Database Working Thread
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007ffff79c8179 in _L_lock_953 () from /lib/libpthread.so.0
#2  0x00007ffff79c7f9b in __pthread_mutex_lock (mutex=0xc08630) at pthread_mutex_lock.c:61
#3  0x0000000000477886 in boost::mutex::lock (this=0xc08630) at /usr/include/boost/thread/pthread/mutex.hpp:50
#4  0x000000000047e790 in boost::unique_lock<boost::mutex>::lock (this=0x7fffee2638e0) at /usr/include/boost/thread/locks.hpp:349
#5  0x000000000047d51d in unique_lock (this=0x7fffee2638e0, m_=...) at /usr/include/boost/thread/locks.hpp:227
#6  0x00000000005f0a07 in Hypertable::IndexScannerCallback::scan_ok(Hypertable::TableScannerAsync*, boost::intrusive_ptr<Hypertable::ScanCells>&) ()
#7  0x00000000005ed180 in Hypertable::TableScannerAsync::maybe_callback_ok (this=0x10e7b50, scanner_id=1, next=true, do_callback=true, cells=...)
    at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:522
#8  0x00000000005ec5cc in Hypertable::TableScannerAsync::handle_result (this=0x10e7b50, scanner_id=1, event=..., is_create=true)
    at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:459
#9  0x00000000006286d2 in Hypertable::TableScannerHandler::run (this=0x7fffe8049e30) at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerHandler.cc:40
#10 0x000000000047b625 in Hypertable::ApplicationQueue::Worker::operator()() ()
#11 0x000000000048dbd2 in boost::detail::thread_data<Hypertable::ApplicationQueue::Worker>::run() ()
#12 0x00007ffff77b5200 in thread_proxy () from /usr/lib/libboost_thread.so.1.42.0
#13 0x00007ffff79c58ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#14 0x00007ffff4978b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#15 0x0000000000000000 in ?? ()


Thread 27 (Thread 0x7fffe33ee700 (LWP 10949)):         //TableScanner Destructor Thread
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x000000000047d87f in boost::condition_variable_any::wait<boost::unique_lock<boost::mutex> > (this=0x10e7c18, m=...)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:84
#2  0x00000000005ed224 in Hypertable::TableScannerAsync::wait_for_completion (this=0x10e7b50)
    at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:535
#3  0x00000000005eb370 in ~TableScannerAsync (this=0x10e7b50, __in_chrg=<value optimized out>)
    at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:318
#4  0x00000000005f01d3 in Hypertable::IndexScannerCallback::~IndexScannerCallback() ()
#5  0x00000000005eb579 in ~TableScannerAsync (this=0xc04ca0, __in_chrg=<value optimized out>)
    at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:324
#6  0x0000000000444847 in Hypertable::intrusive_ptr_release (rc=0xc04ca0) at /opt/hypertable/0.9.6.0/include/Common/ReferenceCount.h:73
#7  0x00000000005e70e3 in boost::intrusive_ptr<Hypertable::TableScannerAsync>::~intrusive_ptr() ()
#8  0x00000000005e6cf3 in Hypertable::TableScanner::~TableScanner() ()
#9  0x000000000043c943 in DBRecycled::run (this=0xa95c60) at /home/qiao/Project/Bingo/DistributedSpider/DBRecycled.cpp:48
#10 0x000000000046eed7 in thread_proc (param=0x7fffe805ae00) at /home/qiao/Project/Bingo/DistributedSpider/shared/Threading/ThreadPool.cpp:331
#11 0x00007ffff79c58ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#12 0x00007ffff4978b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#13 0x0000000000000000 in ?? ()
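
Reading the two traces together: thread 27 is sleeping in `wait_for_completion()`, reached from `~IndexScannerCallback`, while thread 14 is blocked in `scan_ok()` on a mutex. If the destroying thread still holds the callback mutex while it waits, the worker can never finish its callback and signal completion. That is an inference from the traces, not something verified against the source. Below is a minimal sketch of a teardown that cancels first and waits while holding only the scanner's own mutex. `AsyncScanner` and `cancel_and_drain()` are hypothetical, and the results are fed inline rather than from a worker thread to keep the example deterministic.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for an asynchronous scanner; not Hypertable code.
class AsyncScanner {
public:
  explicit AsyncScanner(int outstanding) : m_outstanding(outstanding) {}

  // Mark the scan as cancelled so that later results are dropped.
  void cancel() {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_cancelled = true;
  }

  // Called once per result. Returns true if the result would have been
  // handed to the user callback.
  bool handle_result() {
    bool deliver;
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      deliver = !m_cancelled;
    }
    // A real implementation would invoke the user callback here when
    // `deliver` is true, with m_mutex released.
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      if (--m_outstanding == 0)
        m_cond.notify_all();
    }
    return deliver;
  }

  // Block until every outstanding result has been accounted for. The wait
  // releases m_mutex while sleeping, and no other lock is held here.
  void wait_for_completion() {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this] { return m_outstanding == 0; });
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  int m_outstanding;
  bool m_cancelled = false;
};

// Cancels first, then feeds `results` results inline (a worker thread would
// do this in real use) and returns how many would have reached the callback.
int cancel_and_drain(int results) {
  AsyncScanner scanner(results);
  scanner.cancel();
  int delivered = 0;
  for (int i = 0; i < results; ++i)
    if (scanner.handle_result())
      ++delivered;
  scanner.wait_for_completion();
  return delivered;
}
```

The point of the design is that the thread tearing things down never sleeps while holding a lock that the result-handling path needs.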

Doug Judd

Aug 9, 2012, 2:55:26 PM
to hyperta...@googlegroups.com, ch...@hypertable.com
We'll try to track it down.  I've re-opened issue 827 to track it.

- Doug

To view this discussion on the web visit https://groups.google.com/d/msg/hypertable-dev/-/sERE6hok0i0J.




--
Doug Judd
CEO, Hypertable Inc.

Christoph Rupp

Aug 10, 2012, 6:56:38 AM
to hyperta...@googlegroups.com
Hi,

I am looking into this.

As far as I remember, the original problem from last April was caused by a scanner that was scanning a large dataset and was then deleted while results were still outstanding. Is this still the same problem?

Thanks
Christoph


Christoph Rupp

Aug 10, 2012, 10:07:27 AM
to hyperta...@googlegroups.com
OK, I can reproduce it. I will work on a fix and expect to have it by next Tuesday or Wednesday.

Thanks
Christoph


Christoph Rupp

Aug 13, 2012, 4:12:11 PM
to hyperta...@googlegroups.com
Fix is available (it includes a regression test):

https://github.com/cruppstahl/hypertable/commit/614a7d5e34c254ffa77f8d29b456866e8d71bbec

Thanks
Christoph
