Possible start-up race condition for C driver leading to segfault


Robin Mahony

29 Apr 2015, 18:10:34
– cpp-dri...@lists.datastax.com
Hi guys,

We recently got a crash using the driver. The line below that calls std::string::empty() crashes; I suspect it is dereferencing something that doesn't exist.

We were attempting to connect to Cassandra using the driver while the Cassandra node was bootstrapping (after a fresh install). I suspect the cause is a race where the system tables don't exist yet when the driver tries to read them. This seems like something that should be handled more gracefully than with a segfault.

We have a potential workaround, but still think this should be investigated and fixed on the driver end. I have attached our system.log and the core dump is pasted in below.

Cheers,

Robin


void DCAwarePolicy::init(const SharedRefPtr<Host>& connected_host, const HostMap& hosts) {
  if (local_dc_.empty() && !connected_host->dc().empty()) {
    LOG_INFO("Using '%s' for the local data center "
             "(if this is incorrect, please provide the correct data center)",
             connected_host->dc().c_str());
    local_dc_ = connected_host->dc();
  }

  for (HostMap::const_iterator i = hosts.begin(),
       end = hosts.end(); i != end; ++i) {
    on_add(i->second);
  }
}


Driver Version: 1.0.1
Cassandra version: ReleaseVersion: 2.0.12.274

#0 0x00007f0b21ecab55 in raise () from /lib64/libc.so.6
#1 0x00007f0b21ecc131 in abort () from /lib64/libc.so.6
#2 0x0000000000b30bed in OSL_Debug_Halt () at libs/osl/OSL_Debug.cc:92
#3 0x000000000080833f in Signal_FatalHandler (SigNum=<optimized out>) at modules/SignalHandler_Module/SignalHandler_Module.cc:203
#4 <signal handler called>
#5 0x00007f0b2275ca40 in std::string::empty() const () from /usr/lib64/libstdc++.so.6
#6 0x00007f0b232c2535 in cass::DCAwarePolicy::init(cass::SharedRefPtr<cass::Host> const&, std::map<cass::Address, cass::SharedRefPtr<cass::Host>, std::less<cass::Address>, std::allocator<std::pair<cass::Address const, cass::SharedRefPtr<cass::Host> > > > const&) () from /usr/lib64/libcassandra.so.1
#7 0x00007f0b2326130b in cass::ChainedLoadBalancingPolicy::init(cass::SharedRefPtr<cass::Host> const&, std::map<cass::Address, cass::SharedRefPtr<cass::Host>, std::less<cass::Address>, std::allocator<std::pair<cass::Address const, cass::SharedRefPtr<cass::Host> > > > const&) () from /usr/lib64/libcassandra.so.1
#8 0x00007f0b23284239 in cass::Session::on_control_connection_ready() () from /usr/lib64/libcassandra.so.1
#9 0x00007f0b2328ccb8 in cass::ControlConnection::on_query_meta_all(std::vector<cass::Response*, std::allocator<cass::Response*> > const&) () from /usr/lib64/libcassandra.so.1
#10 0x00007f0b23296236 in boost::_mfi::mf1<void, cass::ControlConnection, std::vector<cass::Response*, std::allocator<cass::Response*> > const&>::operator()(cass::ControlConnection*, std::vector<cass::Response*, std::allocator<cass::Response*> > const&) const () from /usr/lib64/libcassandra.so.1
#11 0x00007f0b23294fec in void boost::_bi::list2<boost::_bi::value<cass::ControlConnection*>, boost::arg<1> >::operator()<boost::_mfi::mf1<void, cass::ControlConnection, std::vector<cass::Response*, std::allocator<cass::Response*> > const&>, boost::_bi::list1<std::vector<cass::Response*, std::allocator<cass::Response*> > const&> >(boost::_bi::type<void>, boost::_mfi::mf1<void, cass::ControlConnection, std::vector<cass::Response*, std::allocator<cass::Response*> > const&>&, boost::_bi::list1<std::vector<cass::Response*, std::allocator<cass::Response*> > const&>&, int) () from /usr/lib64/libcassandra.so.1
#12 0x00007f0b232943e8 in void boost::_bi::bind_t<void, boost::_mfi::mf1<void, cass::ControlConnection, std::vector<cass::Response*, std::allocator<cass::Response*> > const&>, boost::_bi::list2<boost::_bi::value<cass::ControlConnection*>, boost::arg<1> > >::operator()<std::vector<cass::Response*, std::allocator<cass::Response*> > >(std::vector<cass::Response*, std::allocator<cass::Response*> > const&) () from /usr/lib64/libcassandra.so.1
#13 0x00007f0b2329366f in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, cass::ControlConnection, std::vector<cass::Response*, std::allocator<cass::Response*> > const&>, boost::_bi::list2<boost::_bi::value<cass::ControlConnection*>, boost::arg<1> > >, void, std::vector<cass::Response*, std::allocator<cass::Response*> > const&>::invoke(boost::detail::function::function_buffer&, std::vector<cass::Response*, std::allocator<cass::Response*> > const&) () from /usr/lib64/libcassandra.so.1
#14 0x00007f0b23291995 in boost::function1<void, std::vector<cass::Response*, std::allocator<cass::Response*> > const&>::operator()(std::vector<cass::Response*, std::allocator<cass::Response*> > const&) const () from /usr/lib64/libcassandra.so.1
#15 0x00007f0b2328f79c in cass::ControlConnection::ControlMultipleRequestHandler::on_set(std::vector<cass::Response*, std::allocator<cass::Response*> > const&) () from /usr/lib64/libcassandra.so.1
#16 0x00007f0b23264163 in cass::MultipleRequestHandler::InternalHandler::on_set(cass::ResponseMessage*) () from /usr/lib64/libcassandra.so.1
#17 0x00007f0b232c8dcf in cass::Connection::consume(char*, unsigned long) () from /usr/lib64/libcassandra.so.1
#18 0x00007f0b232c9666 in cass::Connection::on_read(uv_stream_s*, long, uv_buf_t const*) () from /usr/lib64/libcassandra.so.1
#19 0x00007f0b1fe518f1 in ?? () from /usr/lib64/libuv.so.1
#20 0x00007f0b1fe51ea8 in ?? () from /usr/lib64/libuv.so.1
#21 0x00007f0b1fe5609a in ?? () from /usr/lib64/libuv.so.1
#22 0x00007f0b1fe49865 in uv_run () from /usr/lib64/libuv.so.1
#23 0x00007f0b23285dc6 in cass::LoopThread::on_run_internal(void*) () from /usr/lib64/libcassandra.so.1
#24 0x00007f0b1fe52ee0 in ?? () from /usr/lib64/libuv.so.1
#25 0x00007f0b257ce7b6 in start_thread () from /lib64/libpthread.so.0
#26 0x00007f0b21f71d6d in clone () from /lib64/libc.so.6
system.log

Michael Penick

29 Apr 2015, 18:41:35
– cpp-dri...@lists.datastax.com
Thanks for the report. I'll investigate how this happened and report back what I find. 

Could you share a small snippet of code that reproduces the issue? How often does the issue happen?

Mike


Michael Penick

29 Apr 2015, 19:05:11
– cpp-dri...@lists.datastax.com
You're right: if the "system.local" and "system.peers" tables are empty when connecting, this reproduces the issue, because the connected host is null.
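
To illustrate the failure mode described above, here is a minimal sketch of a null guard in an init-style function. It is not the patch attached to the ticket and does not use the driver's real classes: Host, HostPtr, HostMap and DcAwarePolicySketch are hypothetical stand-ins, std::shared_ptr replaces the driver's SharedRefPtr, the map is keyed by a plain string instead of cass::Address, and returning a bool to signal "no connected host" is an assumption made for the example. It also assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the driver's Host class (not cass::Host).
struct Host {
  explicit Host(std::string dc) : dc_(std::move(dc)) {}
  const std::string& dc() const { return dc_; }

 private:
  std::string dc_;
};

using HostPtr = std::shared_ptr<Host>;
using HostMap = std::map<std::string, HostPtr>;  // keyed by address string (assumption)

class DcAwarePolicySketch {
 public:
  // Returns false instead of dereferencing when no connected host is known,
  // e.g. because the system tables were still empty at connect time.
  bool init(const HostPtr& connected_host, const HostMap& hosts) {
    if (!connected_host) {
      return false;  // nothing to read the data center from
    }
    if (local_dc_.empty() && !connected_host->dc().empty()) {
      local_dc_ = connected_host->dc();
    }
    for (const auto& entry : hosts) {
      if (entry.second) {  // skip null entries defensively
        on_add(entry.second);
      }
    }
    return true;
  }

  const std::string& local_dc() const { return local_dc_; }
  std::size_t host_count() const { return host_count_; }

 private:
  // The real policy tracks hosts per data center; this sketch only counts them.
  void on_add(const HostPtr&) { ++host_count_; }

  std::string local_dc_;
  std::size_t host_count_ = 0;
};
```

With this guard, calling init with a null connected host leaves the policy untouched and reports failure to the caller, which can then retry or surface an error rather than crash.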

What version of Cassandra are you using?

Mike

Robin Mahony

29 Apr 2015, 19:13:07
– cpp-dri...@lists.datastax.com
Cassandra version 2.0.12.274

Michael Penick

30 Apr 2015, 11:56:01
– cpp-dri...@lists.datastax.com
Thanks for the information. I created a ticket (w/ patch) to track this: https://datastax-oss.atlassian.net/browse/CPP-257

Mike