It seems I have found a bug in the driver.
When my test program tries to 'connect' a large number of sessions from different threads, the driver logs "ERROR Connection pool was unable to connect to host A.B.C.D because of the following error: Connect error 'operation not permitted'" and then crashes with a SEGFAULT.
It does not crash when I switch my program to single-threaded mode.
Unfortunately, the core files are not informative, but the crash is 100% repeatable.
The corresponding logs and source code are below. Please take a look.
Thanks.
-Stan
Environment:
Cassandra server = 2.1.12
Cassandra CPP driver = 2.4
GCC = 4.8.1
OS = CentOS 7.2
CPU = Intel Xeon X5550
MULTITHREADED MODE LOG:
[user1@myhost Debug]$ ./sess1000
1467820504.803 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820504.803 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820504.803 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820504.803 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
...
...
1467820507.837 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820507.837 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820507.840 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820507.842 [ERROR] (src/control_connection.cpp:250:virtual void cass::ControlConnection::on_close(cass::Connection*)): Unable to establish a control connection to host A.B.C.D because of the following error: Connect error 'operation not permitted'
Aborted (core dumped)
SINGLETHREAD MODE LOG:
[user1@myhost Debug]$ ./sess1000
Creating cassandra cluster...
Setting contact points...
Running : 1000 working threads which create Cassandra sessions...
1467818527.859 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467818528.486 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467818528.667 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
...
...
1467818580.925 [ERROR] (src/pool.cpp:340:virtual void cass::Pool::on_close(cass::Connection*)): Connection pool was unable to connect to host A.B.C.D because of the following error: Connect error 'operation not permitted'
1467818580.925 [ERROR] (src/pool.cpp:340:virtual void cass::Pool::on_close(cass::Connection*)): Connection pool was unable to connect to host A.B.C.D because of the following error: Connect error 'operation not permitted'
1467818580.925 [ERROR] (src/pool.cpp:340:virtual void cass::Pool::on_close(cass::Connection*)): Connection pool was unable to connect to host A.B.C.D because of the following error: Connect error 'operation not permitted'
Session 'connect' error: Error initializing session
Session 'connect' error: Error initializing session
Session 'connect' error: Error initializing session
...
...
Session 'connect' error: Error initializing session
Session 'connect' error: Error initializing session
Session 'connect' error: Error initializing session
Sessions have been created, this text never appears...
Deleting of the created connect futures...
Deleting of the created sessions...
[user1@myhost Debug]$./sess1000 2>&1 | grep 'Error initializing session' -c
809
You are right: the application reaches the maximum number of open files.
I am using the DC-aware load balancing policy, and there are 32 Cassandra instances in the data center. My guess is that every driver session has at least one TCP connection to each Cassandra instance.
I use a design where each worker thread owns its own session for Cassandra requests (one request at a time per thread). Sometimes the production load reaches 1000 requests per second or more, which easily exceeds the default Linux limit (8192).
So now I have to either redesign the application to share sessions between threads or simply increase the number of open files. Would you give me your opinion on this?
BTW, I did some tests and found that sharing sessions among multiple threads also causes a driver SEGFAULT!
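To put rough numbers on the open-files question, here is a minimal sketch. It assumes the guess above holds (one TCP connection per session per host) and ignores control connections and any extra pooled connections. The helper names `estimate_fds` and `raise_nofile_soft_limit` are hypothetical and are not part of the driver; only `getrlimit` and `setrlimit` are real POSIX calls.

```cpp
#include <sys/resource.h>

// Hypothetical helper: rough descriptor estimate, assuming each session keeps
// `conns_per_host` TCP connections to every host (control connections ignored).
unsigned long estimate_fds(unsigned long sessions, unsigned long hosts,
                           unsigned long conns_per_host) {
    return sessions * hosts * conns_per_host;
}

// Hypothetical helper: raise the soft RLIMIT_NOFILE up to the hard limit.
// Returns the soft limit in effect afterwards, or 0 if it cannot be read.
unsigned long long raise_nofile_soft_limit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;
    if (lim.rlim_cur < lim.rlim_max) {
        struct rlimit wanted = lim;
        wanted.rlim_cur = lim.rlim_max;  // an unprivileged process may go this far
        if (setrlimit(RLIMIT_NOFILE, &wanted) == 0) lim = wanted;
    }
    return static_cast<unsigned long long>(lim.rlim_cur);
}
```

With 1000 sessions and 32 hosts, `estimate_fds(1000, 32, 1)` gives 32000 descriptors, well above the 8192 quoted above, so under these assumptions raising the soft limit alone may not be enough. The hard limit itself has to be raised outside the process (for example with `ulimit -n` or the system limits configuration).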
Thanks.
-stan
I have created a test program that starts 1000 threads which query Cassandra through 10 sessions. The source is attached.
It crashes after about a minute of running. Exception stack trace:
_ZNK4cass14ResultResponse4rowsEv, FP=7fb52e633bd0
_ZN4cass14ResultIteratorC2EPKNS_14ResultResponseE, FP=7fb52e633c00
cass_iterator_from_result, FP=7fb52e633c30
_ZZ4mainENKUlvE_clEv, FP=7fb52e633e80
_ZNSt12_Bind_simpleIFZ4mainEUlvE_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE, FP=7fb52e633ea0
_ZNSt12_Bind_simpleIFZ4mainEUlvE_vEEclEv, FP=7fb52e633ed0
_ZNSt6thread5_ImplISt12_Bind_simpleIFZ4mainEUlvE_vEEE6_M_runEv, FP=7fb52e633ef0
execute_native_thread_routine, FP=7fb52e633f10
start_thread, FP=7fb52e633fb0
__clone, FP=7fb52e633fb8
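The frames above are mangled C++ symbol names. As a sketch of how to make them readable, the helper below (the name `demangle` is hypothetical) wraps `abi::__cxa_demangle` from `<cxxabi.h>`, which is available with GCC and Clang. It assumes the symbols follow the Itanium C++ ABI, as they do on Linux with GCC.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: demangle an Itanium C++ ABI symbol name.
// Returns the input unchanged when it cannot be demangled.
std::string demangle(const std::string& symbol) {
    int status = 0;
    char* out = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return symbol;
    std::string result(out);
    std::free(out);  // the buffer is allocated with malloc by __cxa_demangle
    return result;
}
```

For the top frame, `demangle("_ZNK4cass14ResultResponse4rowsEv")` yields `cass::ResultResponse::rows() const`, so the crash happens while the iterator constructor reads the rows of the result passed to `cass_iterator_from_result`. That would be consistent with an invalid or null result pointer, although the trace alone does not prove it.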
Is there any news regarding my report about the crash in cass_iterator_from_result when sharing sessions among a number of threads?
Were you able to reproduce it?
Thanks
-stan
for (size_t i2 = 0; i2 < num_of_req_per_thr or num_of_req_per_thr == 0; ) {
  stringstream err_stream;
  CassFuture* result_future = cass_session_execute(*it_session, statement);
  if (result_future) {
    if (cass_future_error_code(result_future) != CASS_OK) {
      err_stream << "Unable to run query: " << get_future_error_text(result_future);
      cass_future_free(result_future); // Release the failed future before skipping it.
      continue; // <-- Add this.
    }
#ifdef NO_THREADS
    else {
      cout << "Request processed OK" << "\n";
    }
#endif //NO_THREADS
    // Only reached when the request succeeded; a failed future has no result to iterate.
    const CassResult* result = cass_future_get_result(result_future);
    CassIterator* rows = cass_iterator_from_result(result);
Mike