mongocxx r3.2.0-rc1 and mongo v3.4.2


Peter Saunderson

unread,
Jan 3, 2018, 4:43:22 PM1/3/18
to mongodb-user
I am using mongo on a number of machines and get different results on each machine and at different times.  Please help determine if this is a bug or a coding problem.

One of the failures is an exception thrown in collection::bulk_write(const class bulk_write& bulk_write) during the deletion of an entry.  The exception is thrown at throw_exception<bulk_write_exception>(reply.steal(), error); the message reported by gdb is "Unknown option to delete command: dfletes: generic server error"

Now the strange thing is that the exact same test code run on the same or a different machine with the same mongo server does not cause a crash.  My code is as follows:

    std::unique_ptr<mongocxx::client, std::function<void (mongocxx::client*)> > conn = pool->acquire();
    std::string uuid = mongo_uri_to_uuid(dataUri);
    std::string collection = mongo_uri_to_collectionid(dataUri);
    mongocxx::collection dbCollection = (*conn)[dbName][collection];
    auto rtn = dbCollection.delete_many(filter.view());

It seems that this might be a temporary failure, in which case I can simply wrap my code in a try {} block and retry.  However, it would be good to understand why the failure occurs.

The annoying thing is that sometimes when I run my code under gdb on one system I don't get any failures, but if I run it with release builds of the mongo-c / mongo-cxx drivers then it fails, which makes it rather difficult to determine what all the failures are caused by.

Notice that this is just one of the failures that I am getting.  I have wrapped some read and write operations in try {} blocks and simply retry three times before giving up, but this does not seem like a good solution in general; it would be better to fix the underlying problem.

Thanks in advance,

Peter.

My setup is as follows:
database v3.4.2 - reported by the mongo client's db.version()... however my administrator informs me that we are using mongo v3.2.11...
mongo-c driver v 1.9.0
mongo-cxx driver git checkout @ fc9a44325e just after r3.2.0-rc1

However, I have used various matching versions of mongo-c and mongo-cxx, still with the same inconsistent / unreliable results.



Andrew Morrow

unread,
Jan 5, 2018, 11:24:43 AM1/5/18
to mongod...@googlegroups.com
On Wed, Jan 3, 2018 at 8:02 AM, Peter Saunderson <pet...@gmail.com> wrote:
I am using mongo on a number of machines and get different results on each machine and at different times.  Please help determine if this is a bug or a coding problem.

One of the failures is an exception thrown in collection::bulk_write(const class bulk_write& bulk_write) during the deletion of an entry.  The exception is thrown at throw_exception<bulk_write_exception>(reply.steal(), error); the message reported by gdb is "Unknown option to delete command: dfletes: generic server error"

Is 'dfletes' a typo, or is that the actual error you got back? Because that looks pretty weird.

 

Now the strange thing is that the exact same test code run on the same or a different machine with the same mongo server does not cause a crash.  My code is as follows:

    std::unique_ptr<mongocxx::client, std::function<void (mongocxx::client*)> > conn = pool->acquire();
    std::string uuid = mongo_uri_to_uuid(dataUri);
    std::string collection = mongo_uri_to_collectionid(dataUri);
    mongocxx::collection dbCollection = (*conn)[dbName][collection];
    auto rtn = dbCollection.delete_many(filter.view());


Can you show a more complete example? Ideally, one showing the whole lifetime of the objects, including the creation of 'filter'?


 
It seems that this might be a temporary failure, in which case I can simply wrap my code in a try {} block and retry.  However, it would be good to understand why the failure occurs.

The annoying thing is that sometimes when I run my code under gdb on one system I don't get any failures, but if I run it with release builds of the mongo-c / mongo-cxx drivers then it fails, which makes it rather difficult to determine what all the failures are caused by.

Notice that this is just one of the failures that I am getting.  I have wrapped some read and write operations in try {} blocks and simply retry three times before giving up, but this does not seem like a good solution in general; it would be better to fix the underlying problem.

Thanks in advance,

Peter.

My setup is as follows:
database v3.4.2 - reported by the mongo client's db.version()... however my administrator informs me that we are using mongo v3.2.11...
mongo-c driver v 1.9.0
mongo-cxx driver git checkout @ fc9a44325e just after r3.2.0-rc1

However, I have used various matching versions of mongo-c and mongo-cxx, still with the same inconsistent / unreliable results.



Thanks,
Andrew

Peter Saunderson

unread,
Jan 6, 2018, 5:27:23 PM1/6/18
to mongodb-user
Thanks for your interest... yes, I cut and pasted the exception message, so "dfletes" really was in the message.  Actually the filter is nothing special or complicated.  I hold the pool in a shared pointer, so that object persists after the delete call... also don't forget that under the right conditions I can complete the test with no problems or exceptions, and in that situation it succeeds many times without failure.  I also previously had single read / write / delete operations working without the pool; however, my test now runs several read / write / delete cycles (only about 20 or so) over several minutes, so I use the pool to manage the connections.

Anyway I have something like:

    bsoncxx::builder::stream::document filter;
    filter << "entry" << bsoncxx::types::b_utf8{"simpletest"};

    std::unique_ptr<mongocxx::client, std::function<void (mongocxx::client*)> > conn = pool->acquire();
    std::string uuid = mongo_uri_to_uuid(dataUri);
    std::string collection = mongo_uri_to_collectionid(dataUri);
    mongocxx::collection dbCollection = (*conn)[dbName][collection];
    auto rtn = dbCollection.delete_many(filter.view());


On another occasion that I managed to investigate, I got a segmentation fault from a call to collection find:

    mongocxx::cursor cursor = collection.find(filter.view());

I was using gdb and got the following stack:

(gdb) info stack
#0  0xffffffffffffffff in ?? ()
#1  0x00007ffff7bbd21b in std::unique_ptr<unsigned char [], void (*)(unsigned char*)>::~unique_ptr (this=0x7fffffffd438, __in_chrg=<optimized out>) at /usr/include/c++/5/bits/unique_ptr.h:484
#2  0x00007ffff4ad3734 in bsoncxx::v_noabi::document::value::~value (this=0x7fffffffd438, __in_chrg=<optimized out>) at /root/build_usr_lib/mongo-cxx-driver/src/bsoncxx/document/value.hpp:33
#3  0x00007ffff4ad3a24 in core::v1::impl::storage<bsoncxx::v_noabi::document::value, false>::~storage (this=0x7fffffffd438, __in_chrg=<optimized out>) at /root/build_usr_lib/buildmcxx/src/bsoncxx/third_party/EP_mnmlstc_core-prefix/src/EP_mnmlstc_core/include/core/optional.hpp:81
#4  0x00007ffff4ad369a in core::v1::optional<bsoncxx::v_noabi::document::value>::~optional (this=0x7fffffffd438, __in_chrg=<optimized out>) at /root/build_usr_lib/buildmcxx/src/bsoncxx/third_party/EP_mnmlstc_core-prefix/src/EP_mnmlstc_core/include/core/optional.hpp:194
#5  0x00007ffff4ad36b6 in bsoncxx::v_noabi::view_or_value<bsoncxx::v_noabi::document::view, bsoncxx::v_noabi::document::value>::~view_or_value (this=0x7fffffffd438, __in_chrg=<optimized out>) at /root/build_usr_lib/mongo-cxx-driver/src/bsoncxx/view_or_value.hpp:30
#6  0x00007ffff4ae1706 in core::v1::impl::storage<bsoncxx::v_noabi::view_or_value<bsoncxx::v_noabi::document::view, bsoncxx::v_noabi::document::value>, false>::~storage (this=0x7fffffffd438, __in_chrg=<optimized out>) at /root/build_usr_lib/buildmcxx/src/bsoncxx/third_party/EP_mnmlstc_core-prefix/src/EP_mnmlstc_core/include/core/optional.hpp:81
#7  0x00007ffff4ae0004 in core::v1::optional<bsoncxx::v_noabi::view_or_value<bsoncxx::v_noabi::document::view, bsoncxx::v_noabi::document::value> >::~optional (this=0x7fffffffd438, __in_chrg=<optimized out>) at /root/build_usr_lib/buildmcxx/src/bsoncxx/third_party/EP_mnmlstc_core-prefix/src/EP_mnmlstc_core/include/core/optional.hpp:194
#8  0x00007ffff4ae009a in mongocxx::v_noabi::options::find::~find (this=0x7fffffffd1d0, __in_chrg=<optimized out>) at /root/build_usr_lib/mongo-cxx-driver/src/mongocxx/options/find.hpp:36
#9  0x00007ffff4ada7a5 in mongocxx::v_noabi::collection::find (this=0x7fffffffd560, filter=..., options=...) at /root/build_usr_lib/mongo-cxx-driver/src/mongocxx/collection.cpp:352
#10 0x00007ffff6c3dc12 in MongoFetch (pool=std::shared_ptr (count 2, weak 1) 0x6af3e0, record="simpletest") at src/MongoFetch.cpp:54
#11 0x0000000000422605 in main (argc=18, argv=0x7fffffffe4b8) at src/main.cpp:208

/root/build_usr_lib/mongo-c-driver/src/libbson/src/bson/bson.c:1977 bson_init_static(): precondition failed: data

Thread 1 "mongotest" received signal SIGABRT, Aborted.

(gdb) list
1972    {
1973       bson_impl_alloc_t *impl = (bson_impl_alloc_t *) bson;
1974       uint32_t len_le;
1975    
1976       BSON_ASSERT (bson);
1977       BSON_ASSERT (data);
1978    
1979       if ((length < 5) || (length > INT_MAX)) {
1980          return false;
1981       }
(gdb) info args
bson = 0x7fffffffcc00
data = 0x0
length = 0

Is there a test case that I can run from the mongo-cxx driver that will confirm that the driver is OK?  For now my retry-three-times quick fix seems to catch some of the failures, and the retry then succeeds.

So far I have tried mongodb 3.2.11 and mongodb 3.4.2 with mongo-c driver 1.9.0 and 1.6.3, and mongo-cxx 3.1.1, 3.1.3, and 3.2.0-rc1, but have not yet found a stable combination.

Andrew Morrow

unread,
Jan 6, 2018, 8:12:12 PM1/6/18
to mongod...@googlegroups.com

These errors feel to me like memory corruption. Can you try running your program under valgrind?

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/4711847d-4355-4617-9c4e-eccc5e6dc238%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Peter Saunderson

unread,
Jan 7, 2018, 11:51:57 AM1/7/18
to mongodb-user
Thank you for the tip... running under valgrind has at least given me some more areas to look at.  I did a quick run of one of my tests that previously crashed with:
*** Error in `local/bin/mvect_test': double free or corruption (!prev): 0x00000000015139c0 ***

Under valgrind I do not get a crash; the test runs to completion, and the valgrind output shows several failures that may be false positives:

==5829== Invalid read of size 8
==5829==    at 0x6AF041D: mongoc_counter_clients_active_add (mongoc-counters.defs:44)
==5829==    by 0x6AF0431: mongoc_counter_clients_active_inc (mongoc-counters.defs:44)
==5829==    by 0x6AF1A74: _mongoc_client_new_from_uri (mongoc-client.c:771)
==5829==    by 0x6AF4FAC: mongoc_client_pool_pop (mongoc-client-pool.c:212)
==5829==    by 0x5144C0F: mongocxx::v_noabi::pool::acquire() (pool.cpp:62)
...
==5829==  Address 0x9e570d0 is 5,232 bytes inside a block of size 5,952 free'd
==5829==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5829==    by 0x6D8BB33: bson_free (bson-memory.c:218)
==5829==    by 0x6B00F02: _mongoc_counters_cleanup (mongoc-counters.c:129)

==5829== Invalid read of size 8
==5829==    at 0x6B2A6DD: mongoc_counter_dns_success_add (mongoc-counters.defs:67)
==5829==    by 0x6B2A6F1: mongoc_counter_dns_success_inc (mongoc-counters.defs:67)
==5829==    by 0x6B2B431: mongoc_topology_scanner_node_connect_tcp (mongoc-topology-scanner.c:480)
==5829==    by 0x6B2B83F: mongoc_topology_scanner_node_setup (mongoc-topology-scanner.c:608)
...
==5829==  Address 0x9e572b8 is 5,720 bytes inside a block of size 5,952 free'd
==5829==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5829==    by 0x6D8BB33: bson_free (bson-memory.c:218)
==5829==    by 0x6B00F02: _mongoc_counters_cleanup (mongoc-counters.c:129)
==5829==    by 0x6B0CB8E: _mongoc_do_cleanup (mongoc-init.c:169)

==5829== Invalid read of size 1
==5829==    at 0x4C30F62: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5829==    by 0x6B11735: _mongoc_handshake_build_doc_with_application (mongoc-handshake.c:413)
==5829==    by 0x6B2A7C2: _build_ismaster_with_handshake (mongoc-topology-scanner.c:88)
==5829==    by 0x6B2A959: _mongoc_topology_scanner_get_ismaster (mongoc-topology-scanner.c:115)
==5829==    by 0x6B2AA06: _begin_ismaster_cmd (mongoc-topology-scanner.c:140)
==5829==    by 0x6B2BA69: mongoc_topology_scanner_start (mongoc-topology-scanner.c:689)
....
==5829==  Address 0x9e58e40 is 0 bytes inside a block of size 32 free'd
==5829==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5829==    by 0x6D8BB33: bson_free (bson-memory.c:218)
==5829==    by 0x6B113A4: _free_driver_info (mongoc-handshake.c:300)
==5829==    by 0x6B1153B: _mongoc_handshake_cleanup (mongoc-handshake.c:360)
==5829==    by 0x6B0CB93: _mongoc_do_cleanup (mongoc-init.c:171)

==5829== Invalid read of size 8
==5829==    at 0x6AF567B: mongoc_counter_op_ingress_total_add (mongoc-counters.defs:19)
==5829==    by 0x6AF568F: mongoc_counter_op_ingress_total_inc (mongoc-counters.defs:19)
==5829==    by 0x6AF5B99: _mongoc_cluster_inc_ingress_rpc (mongoc-cluster.c:166)
==5829==    by 0x6AF6A36: mongoc_cluster_run_command_internal (mongoc-cluster.c:511)
==5829==    by 0x6AF6DAC: mongoc_cluster_run_command_private (mongoc-cluster.c:642)
....
==5829==  Address 0x9e56ea8 is 4,680 bytes inside a block of size 5,952 free'd
==5829==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5829==    by 0x6D8BB33: bson_free (bson-memory.c:218)
==5829==    by 0x6B00F02: _mongoc_counters_cleanup (mongoc-counters.c:129)
==5829==    by 0x6B0CB8E: _mongoc_do_cleanup (mongoc-init.c:169)
==5829==    by 0x64E3A98: __pthread_once_slow (pthread_once.c:116)

==5829== 80 errors in context 39 of 39:
==5829== Invalid read of size 1
==5829==    at 0x4C30F74: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5829==    by 0x6B11591: _append_platform_field (mongoc-handshake.c:387)
==5829==    by 0x6B119C7: _mongoc_handshake_build_doc_with_application (mongoc-handshake.c:444)
==5829==    by 0x6B2A7C2: _build_ismaster_with_handshake (mongoc-topology-scanner.c:88)
==5829==    by 0x6B2A959: _mongoc_topology_scanner_get_ismaster (mongoc-topology-scanner.c:115)
==5829==    by 0x6B2AA06: _begin_ismaster_cmd (mongoc-topology-scanner.c:140)
....
==5829==  Address 0x9e58d21 is 1 bytes inside a block of size 128 free'd
==5829==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5829==    by 0x6D8BB33: bson_free (bson-memory.c:218)
==5829==    by 0x6B114E3: _free_platform_string (mongoc-handshake.c:343)
==5829==    by 0x6B11548: _mongoc_handshake_cleanup (mongoc-handshake.c:361)
==5829==    by 0x6B0CB93: _mongoc_do_cleanup (mongoc-init.c:171)
==5829==    by 0x64E3A98: __pthread_once_slow (pthread_once.c:116)
==5829==    by 0x6B0CBAD: mongoc_cleanup (mongoc-init.c:180)
==5829==    by 0x512D30F: mongocxx::v_noabi::instance::impl::~impl() (instance.cpp:115)

==5829== ERROR SUMMARY: 512 errors from 39 contexts (suppressed: 0 from 0)

I will try this out on a number of my systems that have problems and see if there is a common issue that I can look at / fix.  One area I will look at is the counters, because if a counter has been freed and is then incremented, it might explain problems like "dfletes", where "e" -> "f", i.e. add one!

Peter.

Andrew Morrow

unread,
Jan 7, 2018, 6:13:04 PM1/7/18
to mongod...@googlegroups.com

I find this line in the valgrind output highly suspicious:

mongocxx::v_noabi::instance::impl::~impl

That means that your mongocxx::instance object has been destroyed, after which it is definitely illegal to make any further calls into the driver. Can you set a breakpoint inside the instance dtor and see if you hit it during your application execution? That would definitely indicate a logic error.

Thanks,
Andrew



Peter Saunderson

unread,
Jan 8, 2018, 7:20:40 AM1/8/18
to mongodb-user
Thanks for your help.  I have fixed one of my problems: when I created my factory I forgot to keep the instance persistent as well as the pool.  I have fixed this by wrapping the pool and the instance in a separate object that I create in a factory.  My simple test now works, and I will apply the same changes to my application.
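The wrapper amounts to something like this (a simplified sketch of the pattern, with illustrative names like MongoAccess; only the member order and the shared ownership matter):

```cpp
#include <memory>
#include <string>

#include <mongocxx/instance.hpp>
#include <mongocxx/pool.hpp>
#include <mongocxx/uri.hpp>

// Owns the driver instance and the pool together.  Members are declared
// instance-first, so the instance is constructed before the pool and
// destroyed after it: nothing acquired from the pool can outlive the
// instance.
class MongoAccess {
public:
    explicit MongoAccess(const std::string& uri_string)
        : _instance{}, _pool{mongocxx::uri{uri_string}} {}

    mongocxx::pool& pool() { return _pool; }

private:
    mongocxx::instance _instance;  // must be created exactly once per process
    mongocxx::pool _pool;
};

// Factory: every user of the driver shares this one object, so the
// instance stays alive as long as anyone still holds the pointer.
std::shared_ptr<MongoAccess> make_mongo_access(const std::string& uri_string) {
    return std::make_shared<MongoAccess>(uri_string);
}
```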

Thanks again for your help.

Peter.

Andrew Morrow

unread,
Jan 8, 2018, 9:11:22 AM1/8/18
to mongod...@googlegroups.com

Happy to hear you got it sorted out. FYI if you want to see one idea about how to keep the pool and instance associated, see:

