Creating a large number of sessions from different threads causes a segfault on CentOS 7

Stanislav Podlesny

Jul 6, 2016, 12:09:31 PM
to DataStax C++ Driver for Apache Cassandra User Mailing List

Hi,

It seems I found a bug in the driver.

When my test program tries to connect a large number of sessions from different threads, the driver logs "ERROR Connection pool was unable to connect to host A.B.C.D because of the following error: Connect error 'operation not permitted'" and then segfaults.

It does not crash when I switch my program to single-threaded mode.

Unfortunately the core files are not informative, but the crash is 100% repeatable.
The corresponding logs and source code are below. Please take a look.

Thanks.
-Stan

Environment:
Cassandra server = 2.1.12
Cassandra CPP driver = 2.4
GCC = 4.8.1
OS = CentOS 7.2
CPU = Intel Xeon X5550

MULTITHREADED MODE LOG:

[user1@myhost Debug]$ ./sess1000
1467820504.803 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820504.803 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820504.803 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820504.803 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
...
...
1467820507.837 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820507.837 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820507.840 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467820507.842 [ERROR] (src/control_connection.cpp:250:virtual void cass::ControlConnection::on_close(cass::Connection*)): Unable to establish a control connection to host A.B.C.D because of the following error: Connect error 'operation not permitted'
Aborted (core dumped)


SINGLE-THREADED MODE LOG:

[user1@myhost Debug]$ ./sess1000
Creating cassandra cluster...
Setting contact points...
Running : 1000 working threads which create Cassandra sessions...
1467818527.859 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467818528.486 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
1467818528.667 [WARN] (src/control_connection.cpp:229:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host A.B.C.D does not support protocol version 4. Trying protocol version 3...
...
...
1467818580.925 [ERROR] (src/pool.cpp:340:virtual void cass::Pool::on_close(cass::Connection*)): Connection pool was unable to connect to host A.B.C.D because of the following error: Connect error 'operation not permitted'
1467818580.925 [ERROR] (src/pool.cpp:340:virtual void cass::Pool::on_close(cass::Connection*)): Connection pool was unable to connect to host A.B.C.D because of the following error: Connect error 'operation not permitted'
1467818580.925 [ERROR] (src/pool.cpp:340:virtual void cass::Pool::on_close(cass::Connection*)): Connection pool was unable to connect to host A.B.C.D because of the following error: Connect error 'operation not permitted'
Session 'connect' error: Error initializing session
Session 'connect' error: Error initializing session
Session 'connect' error: Error initializing session
...
...
Session 'connect' error: Error initializing session
Session 'connect' error: Error initializing session
Session 'connect' error: Error initializing session
Sessions have been created, this text never appears...
Deleting of the created connect futures...
Deleting of the created sessions...

[user1@myhost Debug]$./sess1000 2>&1 | grep 'Error initializing session' -c
809

sess1000.cpp

Michael Penick

Jul 6, 2016, 1:38:15 PM
to cpp-dri...@lists.datastax.com
Thanks for the report. From the error 'operation not permitted', it looks like you're probably running out of file descriptors and need to raise the limit. The driver shouldn't segfault when that happens, though. I'm going to try to reproduce the issue later today and I'll let you know what I find.

As a stopgap, you can probably increase your file descriptor limit to avoid the crash. Please let me know if this isn't the case.
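
As a rough illustration of this stopgap, a process can inspect and raise its own soft file descriptor limit at startup with the POSIX `getrlimit`/`setrlimit` calls. The helper name `raise_fd_soft_limit` is hypothetical and not part of the driver. This is only a sketch: if the hard limit itself is too low, it still has to be raised outside the program (for example via `ulimit -n` or `/etc/security/limits.conf`).

```cpp
#include <sys/resource.h>

// Hypothetical helper (not a driver API): tries to raise the soft limit on
// open file descriptors up to the hard limit.
// Returns the soft limit in effect afterwards, or 0 if the limit could not be read.
rlim_t raise_fd_soft_limit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return 0;  // Could not query the current limits.
    }
    const rlim_t old_soft = lim.rlim_cur;
    if (lim.rlim_cur < lim.rlim_max) {
        lim.rlim_cur = lim.rlim_max;  // An unprivileged process may go up to the hard limit.
        if (setrlimit(RLIMIT_NOFILE, &lim) != 0) {
            return old_soft;  // The raise was rejected; the old soft limit still applies.
        }
    }
    return lim.rlim_cur;
}
```

Calling `raise_fd_soft_limit()` once at the top of `main`, before any sessions are connected, would be one way to use it.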

Mike



Stanislav Podlesny

Jul 7, 2016, 11:55:48 AM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Thank you, Mike, for the reply.

You are right: the application reaches the maximum number of open files.

I am using the DC-aware load balancing policy and there are 32 Cassandra instances in the data center. My guess is that every driver session has at least one TCP connection to each Cassandra instance.

I use a design in which each worker thread owns its own session for Cassandra requests (one request at a time per thread). Sometimes the production load reaches 1000 requests per second or more, which easily exceeds the default Linux limit (8192).
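
A back-of-the-envelope check of the numbers above, assuming (as guessed in this post) one pooled connection per host per session, plus one control connection per session. The function names are hypothetical, and the real count depends on driver settings such as the number of I/O threads and connections per host.

```cpp
#include <cstddef>

// Hypothetical estimate: each session opens `conns_per_host` sockets to every
// host, plus one control connection (an assumption for this sketch).
std::size_t estimate_driver_sockets(std::size_t sessions,
                                    std::size_t hosts,
                                    std::size_t conns_per_host) {
    return sessions * (hosts * conns_per_host + 1);
}

// True if the estimated socket count stays within the file descriptor limit.
// Other descriptors held by the process (logs, pipes, stdio) are ignored here.
bool fits_in_fd_limit(std::size_t needed, std::size_t fd_limit) {
    return needed <= fd_limit;
}
```

With 1000 sessions and 32 hosts this gives 33000 sockets, well above 8192, while 10 shared sessions would need only 330.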

So now I have to either redesign the application to share sessions between threads or simply increase the limit on open files. Would you give me your opinion on this?

BTW, I did some tests and found that sharing sessions from multiple threads also causes driver SEGFAULT!

Thanks.
-stan

Michael Penick

Jul 7, 2016, 3:41:32 PM
to cpp-dri...@lists.datastax.com
I would definitely recommend using a single session (or a small number of sessions) shared among multiple threads.

"BTW, I did some tests and found that sharing sessions from multiple threads also causes driver SEGFAULT!"  

I haven't run into any issues executing queries from multiple threads with a single session. The driver is specifically designed to do this.

Do you have a reproducible example?

Mike


Stanislav Podlesny

Jul 8, 2016, 12:31:21 PM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Thanks for your advice.
Regarding the segfault:

I have created a test program that starts 1000 threads, which query Cassandra using 10 sessions. The source is attached.
It crashes after about a minute of running. Exception stack trace:

_ZNK4cass14ResultResponse4rowsEv, FP=7fb52e633bd0
_ZN4cass14ResultIteratorC2EPKNS_14ResultResponseE, FP=7fb52e633c00
cass_iterator_from_result, FP=7fb52e633c30
_ZZ4mainENKUlvE_clEv, FP=7fb52e633e80
_ZNSt12_Bind_simpleIFZ4mainEUlvE_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE, FP=7fb52e633ea0
_ZNSt12_Bind_simpleIFZ4mainEUlvE_vEEclEv, FP=7fb52e633ed0
_ZNSt6thread5_ImplISt12_Bind_simpleIFZ4mainEUlvE_vEEE6_M_runEv, FP=7fb52e633ef0
execute_native_thread_routine, FP=7fb52e633f10
start_thread, FP=7fb52e633fb0
__clone, FP=7fb52e633fb8

fsmt.cpp

Michael Penick

Jul 8, 2016, 3:10:10 PM
to cpp-dri...@lists.datastax.com
Thanks for the example. I'll work on reproducing this crash.

I've been able to reproduce the crash that occurs when connecting sessions from multiple threads. The function `uv_signal_start()` has issues when called from multiple threads (even though it should be thread-safe, as it locks using pipes). I'm going to avoid the libuv signal mechanism altogether, and I'm working on a fix now.

Mike

Michael Penick

Jul 8, 2016, 7:56:52 PM
to cpp-dri...@lists.datastax.com
I was able to reproduce the crash when connecting from multiple threads with this example: https://gist.github.com/mpenick/9276a25e7795f26470d9bd8030ab7871

Here's the somewhat tested fix: https://github.com/datastax/cpp-driver/pull/304

Mike



Stanislav Podlesny

Jul 19, 2016, 12:22:17 PM
to DataStax C++ Driver for Apache Cassandra User Mailing List
Hi Mike,

Is there any news regarding my report about the crash in cass_iterator_from_result when sharing sessions among multiple threads?
Were you able to reproduce it?

Thanks
-stan

Michael Penick

Jul 20, 2016, 6:27:03 PM
to cpp-dri...@lists.datastax.com
There's a bug in your code where you're calling "CassIterator* rows = cass_iterator_from_result(result);" with a NULL result object.

You can fix that error by continuing the loop when an error happens:

    for (size_t i2 = 0; i2 < num_of_req_per_thr or num_of_req_per_thr == 0; ) {

        stringstream err_stream;

        CassFuture* result_future = cass_session_execute(*it_session, statement);
        if (result_future) {
            if (cass_future_error_code(result_future) != CASS_OK) {
                err_stream << "Unable to run query: " << get_future_error_text(result_future);
                cass_future_free(result_future); // Release the failed future before skipping to the next iteration.
                continue; // <-- Add this.
            }
#ifdef NO_THREADS
            else {
                cout << "Request processed OK" << "\n";
            }
#endif //NO_THREADS

            // Safe now: the future completed without error, so the result is non-NULL.
            const CassResult* result = cass_future_get_result(result_future);
            CassIterator* rows = cass_iterator_from_result(result);


Mike
