kj/table exception "inserted row already exists in table"

100 views
Skip to first unread message

cenkogu...@gmail.com

unread,
Oct 1, 2019, 12:10:37 PM10/1/19
to Cap'n Proto
Thanks for this amazing software.

We are using v0.7.0. I would like to ask help debugging the following exception which we rarely but consistently get:

terminate called after throwing an instance of 'kj::ExceptionImpl'
  what():  kj/table.c++:44: failed: inserted row already exists in table
stack: 7f6f7f0697 7f6f7f1ee3 7f6f802623 7f6f5dc9ab 7f6f5b4823 7f77fdd5bf 557c6f2957 557c63df2b 7f78a30e13 7f78b0f087

Our backtrace shows that we were trying to read from a proto, then the following two functions in capnp were called:
  • kj::Table<kj::HashMap<unsigned int, kj::Own<capnp::_::SegmentReader> >::Entry, kj::HashIndex<kj::HashMap<unsigned int, kj::Own<capnp::_::SegmentReader> >::Callbacks> >::insert(kj::HashMap<unsigned int, kj::Own<capnp::_::SegmentReader> >::Entry&&)
  • capnp::_::ReaderArena::tryGetSegment(kj::Id<unsigned int, capnp::_::Segment>)
Why would reading from proto trigger an insert call?

How can I make use of the "stack: 7f6f7f0697 7f6f7f1ee3..." to debug this further?

If the way we read proto is OK, can this perhaps be caused by how we populate the proto contents? Perhaps some missing or corrupt members?

Thank you very much for help in advance,
Oguz

Kenton Varda

unread,
Oct 1, 2019, 12:30:42 PM10/1/19
to cenkogu...@gmail.com, Cap'n Proto
Hi Oguz,

You can get better stack traces by compiling in debug mode (both Cap'n Proto itself, and your project). You should then see a symbolic trace instead of a bunch of addresses.

This is a strange error, though. Looking at the code for ReaderArena::tryGetSegment(), the insert() call only happens after a find() call looking for the same key has failed. How could the inserted row already exist, then?

Moreover, the whole sequence is performed under a mutex lock, seemingly ruling out any race conditions.

I'm not sure what to say here. If you can come up with a self-contained test case that reproduces the issue, I'd be happy to debug.

-Kenton

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/d6512e2e-b990-47c6-9892-a252c0c629c3%40googlegroups.com.

Cenk Oguz Saglam

unread,
Oct 1, 2019, 1:39:16 PM10/1/19
to Cap'n Proto
Thanks for the quick response Kenton.

I was also suspecting a race condition. Thanks for checking the mutex. It is very likely that the issue is due to our usage. I'll share what I find as I debug this further.
To unsubscribe from this group and stop receiving emails from it, send an email to capn...@googlegroups.com.

cenkogu...@gmail.com

unread,
Oct 5, 2019, 5:35:11 PM10/5/19
to Cap'n Proto
As an update, I've tried to place the following messages to c++/src/capnp/arena.c++:

   SegmentMap* segments = nullptr;
   KJ_IF_MAYBE(s, *lock) {
     KJ_IF_MAYBE(segment, s->find(id.value)) {
       return *segment;
     }
     segments = s;
+  } else {
+    size_t this_id = std::hash<std::thread::id>{}(std::this_thread::get_id());
+    KJ_DBG("map doesn't exist", this_id, this);
   }

It looks like (just before the crash) multiple threads print "map doesn't exist" for the same 'this' value. It's as if lock did not work for some reason. I could not reproduce the issue in a pure capnp test yet.

For context, we have the same type of message with 2 segments printed in a high frequency. We have a stack of them being read by multiple readers. Apart from the mentioned exception being thrown, we often have segfaults in the insert() function.

Kenton Varda

unread,
Oct 5, 2019, 7:16:32 PM10/5/19
to cenkogu...@gmail.com, Cap'n Proto
Hmm, that's strange!

What operating system is this?

If it happens to be Linux, could you try compiling everything with -DKJ_USE_FUTEX=0 (or remove the `#define KJ_USE_FUTEX 1` from the top of c++/src/kj/mutex.h), and see if that changes anything? This change will make KJ use a completely different mutex implementation. (That said, the futex-based implementation has seen very heavy use with no problems in the past, so it would be surprising if it were broken somehow.)

-Kenton

To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/874ea12b-865c-4acb-9b11-74cd0154ee63%40googlegroups.com.

cenkogu...@gmail.com

unread,
Oct 7, 2019, 2:58:40 PM10/7/19
to Cap'n Proto
Thanks for helping me debug this.
  1. With KJ_USE_FUTEX=0 I'm hitting what initially seems to be a deadlock. I'll debug this further now.
  2. I don't have any crash if I put a static std::mutex in c++/src/capnp/arena.c++ and lock it using std::unique_lock right after moreSegments.lockExclusive();
  3. I still observe the issue if (instead of #2) I make std::unique_ptr<std::mutex> a member variable of ReaderArena and lock it before or after moreSegments.lockExclusive();
    • I have printed KJ_DBG(my_unique_lock.owns_lock(), my_thread_id, this); and interestingly, two threads claim they own the lock:
      • capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = 12814521112356468066; this = 7fd7f800a478
      • capnp/arena.c++:150: debug: l.owns_lock() = true; this_id = 2054456097081703779; this = 7fd7f800a478
    • This is really weird :)
I'm on Ubuntu 18.04.3 LTS.

Kenton Varda

unread,
Oct 7, 2019, 3:16:42 PM10/7/19
to cenkogu...@gmail.com, Cap'n Proto
Could this be use-after-free? Maybe the MessageReader (and hence the mutex) is already deleted?

-Kenton

To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/bbe9f2bd-24c9-4508-a640-bb92824a093f%40googlegroups.com.

cenkogu...@gmail.com

unread,
Oct 7, 2019, 8:38:09 PM10/7/19
to Cap'n Proto
great suggestion. I found that just before the crash, ReaderArena gets constructed twice at the same memory location. That would explain why locks don't work as expected. Looking at the capnp code, we were wondering whether allocatedArena variable of MessageReader needs to be protected for multi-threading.

Kenton Varda

unread,
Oct 8, 2019, 12:05:19 PM10/8/19
to cenkogu...@gmail.com, Cap'n Proto
Ah, hmm. So you are accessing the MesasgeReader itself (not just the objects in it) from multiple threads? It has been a long time since I wrote this code, but I suspect that wasn't intended to be supported. Normally thread-safe methods are marked `const`, and none of MessageReader's methods appear to be `const`. In contrast, the getter/setter methods on individual Reader objects are marked `const`, hence all the mutex-locking to make sure that changes they might trigger are thread-safe. But to get a Reader object initially, you have to call getRoot(), which initializes the arena and sets `allocatedArena = true`.

I admit this is not very intuitive. MessageReader *feels* like a read-only type and hence like it ought to be thread-safe. I think probably I should refactor MessageReader so that it loads the whole segment table upfront and doesn't do any silly mutex locking, though I think subclasses may need to be changed to support that, bleh.

As a simple work-around to fix your problem, I think you can do:

    messageReader.getRoot<capnp::AnyPointer>();

immediately after allocating the MessageReader, before it has been made visible to other threads.

A more-correct solution is to call getRoot() with the actual root type and then only share the returned struct Reader with other threads; don't share the MessageReader.

-Kenton

To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/245fd14b-0061-4346-af50-3804baef15f5%40googlegroups.com.

cenkogu...@gmail.com

unread,
Oct 8, 2019, 1:06:31 PM10/8/19
to Cap'n Proto
Makes sense. Thank you very much for helping me debug Kenton :)
Reply all
Reply to author
Forward
0 new messages