TSAN faiure

48 views
Skip to first unread message

Vitali Lovich

unread,
Oct 9, 2020, 9:37:26 AM10/9/20
to Cap'n Proto
I'm trying to do a basic in-process connection of the servers & running that under TSAN but I feel like this is some nuance of the APIs I'm missing or using the API totally incorrectly outright, so I would appreciate some feedback if possible. I'm sure the bug must be in my code rather than cap'n'proto. I don't have a helpful standalone piece of code to demonstrate this issue at the moment (but if it's really critical I can work on providing that). I've provided brief snippets of what the threads do (the threads themselves are really that simple, it's just the schema & the threads using various helper internal libraries that make it harder to post a working standalone example).

Full TSAN (to avoid spamming the list): https://pastebin.com/hSkVDgsU

For context, the main thread does 
auto ioContext = kj::setupAsyncIo();
auto serverThread = ioContext.provider->newPipeThread(...);
auto serverPtr = <wait for CV state from T1>;
capnp::TwoPartyClient rpcClient(*serverThread.pipe);
auto client = rpcClient.bootstrap().castAs<MyServer>();
auto connectedResponse = client.connectRequest().send().wait(ioContext.waitScope); <-- this is the problematic TSAN line 

T1 (taking AsyncIoProvider&, AsyncIoStream& stream, kj::WaitScope& waitScope):
capnp::TwoPartyVatNetwork network(stream, capnp::rpc::twoparty::Side::SERVER);
auto myServer = kj::heap<MyServerImpl>();
auto myServerPtr = myServer.get()
auto rpcServer = capnp::makeRpcServer(network, kj::mv( myServer));
<publish myServerPtr to main thread using CV>  
network.onDisconnect().wait(waitScope); <- problematic TSAN line

SUMMARY: ThreadSanitizer: data race capnproto/c++/src/capnp/layout.c++:2188:5 in capnp::_::WireHelpers::readCapabilityPointer(capnp::_::SegmentReader*, capnp::_::CapTableReader*, capnp::_::WirePointer const*, int)

  Read of size 8 at 0x0000016673e0 by main thread:
    #0 capnp::_::WireHelpers::readCapabilityPointer(capnp::_::SegmentReader*, capnp::_::CapTableReader*, capnp::_::WirePointer const*, int) capnproto/c++/src/capnp/layout.c++:2188:5 (ld-2.26.so+0x809e60)
    #1 capnp::_::PointerReader::getCapability() const capnproto/c++/src/capnp/layout.c++:2705 (ld-2.26.so+0x809e60)
    #2 capnp::AnyPointer::Reader::getPipelinedCap(kj::ArrayPtr<capnp::PipelineOp const>) const capnproto/c++/src/capnp/any.c++:53:18 (ld-2.26.so+0x7ff4a7)
    #3 capnp::_::(anonymous 

  Previous atomic write of size 8 at 0x0000016673e0 by thread T1:
    #0 __tsan_atomic64_store <null> (ld-2.26.so+0x6911ee)
    #1 capnp::_::setGlobalBrokenCapFactoryForLayoutCpp(capnp::_::BrokenCapFactory&) capnproto/c++/src/capnp/layout.c++:45:3 (ld-2.26.so+0x800d12)
    #2 capnp::ReaderCapabilityTable::ReaderCapabilityTable(kj::Array<kj::Maybe<kj::Own<capnp::ClientHook> > >) capnproto/c++/src/capnp/capability.c++:951:3 (ld-2.26.so+0x6f32cc)
    #3 capnp::_::(anonymous namespace)::RpcConnectionState::RpcCallContext::RpcCallContext(capnp::_::(anonymous namespace)::RpcConnectionState&, unsigned int, kj::Own<capnp::IncomingRpcMessage>&&, kj::Array<kj::Maybe<kj::Own<capnp::ClientHook> > >, capnp::AnyPointer::Reader const&, bool, kj::Own<kj::PromiseFulfiller<void> >&&, unsigned long, unsigned short) capnproto/c++/src/capnp/rpc.c++:2144:11 (ld-2.26.so+0x754bea)

Kenton Varda

unread,
Oct 9, 2020, 12:48:06 PM10/9/20
to Vitali Lovich, Cap'n Proto
I think that ThreadSanitizer is having trouble recognizing that the initialization of `brokenCapFactory` is thread-safe, due to the awkward way in which it is initialized. It may end up being initialized by multiple threads, but all threads will initialize it to the same value, hence no atomics are necessary when reading it later.

Maybe we should use atomic reads anyway, to make ThreadSanitizer happy. Doing so should be free, at least on x86.

(The reason for the weird initialization is that the factory is implemented in libcapnp-rpc.so, yet needs to be callable from libcapnp.so -- but only if RPC is in use.)

-Kenton

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/19bac31b-6f11-43d1-886e-93d6bc32557an%40googlegroups.com.

Vitali Lovich

unread,
Oct 9, 2020, 12:52:55 PM10/9/20
to Kenton Varda, Cap'n Proto
So you're saying just change the read to atomic read?

Is it possible there's a legitimate read-before-write race condition & that's what TSAN is complaining about? It doesn't seem to be complaining about concurrent writes but I also haven't investigated this codepath to understand yet as I assumed the problem was in my code.

Kenton Varda

unread,
Oct 9, 2020, 1:00:09 PM10/9/20
to Vitali Lovich, Cap'n Proto
It looks like the writes are done atomically, but the reads aren't. TSAN is right to be suspicious of this. In practice there is no bug here, but showing that requires higher-level reasoning that TSAN wouldn't be expected to figure out.

-Kenton

Vitali Lovich

unread,
Oct 9, 2020, 5:23:03 PM10/9/20
to Kenton Varda, Cap'n Proto
Ok. That seems to have made TSAN happy (I still don't understand the exact reason it's safe but I'll trust you're higher level reasoning about the internals of capnp :D). 

Should this atomic read happen only when running under TSAN or just bite the bullet & do it regardless? It's unclear to me if this is a hotpath that's carefully tuned

You received this message because you are subscribed to a topic in the Google Groups "Cap'n Proto" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/capnproto/634juhn5ap0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to capnproto+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/CAJouXQkT-3dHBk3iqE_fsfngaFZ447yJK9APof1VEKh-9RTHVw%40mail.gmail.com.

Kenton Varda

unread,
Oct 10, 2020, 3:14:32 PM10/10/20
to Vitali Lovich, Cap'n Proto
An atomic read with "relaxed" ordering should be free. Even "acquire" ordering (if TSAN insists on it) is free on x86. So I'd just have it always be atomic.

-Kenton

Vitali Lovich

unread,
Oct 10, 2020, 3:21:11 PM10/10/20
to Kenton Varda, Cap'n Proto
Sounds good. Relaxed is what I went with and it seemed to make TSAN happy. I uploaded a pull request for this (and the UBSAN issue).

Thanks for your guidance on this.

Erin Shepherd

unread,
Oct 13, 2020, 6:15:26 AM10/13/20
to Cap'n Proto
> I think that ThreadSanitizer is having trouble recognizing that the initialization of `brokenCapFactory` is thread-safe, due to the awkward way in which it is initialized. It may end up being initialized by multiple threads, but all threads will initialize it to the same value, hence no atomics are necessary when reading it later.

This doesn't hold unless every thread reading it also once wrote it. If it might be written by T1 and read (but never initialised) by T2, then appropriate write/read barriers need to be in place (e.g. acquire/release).

Erin

Erin Shepherd

unread,
Oct 13, 2020, 6:23:46 AM10/13/20
to Cap'n Proto
One option I see which would sidestep this entirely is to default initialise the brokenCapFactory to something that performs a read with a stronger load. You'd still need to use explicit atomic loads to not trigger UB, of course, but you could have something like


class UninitializedBrokenCapFactoryImpl: public _::BrokenCapFactory {

public:

kj::Own<ClientHook> newBrokenCap(kj::StringPtr description) override {

  auto realFactory = atomic_load(brokenCapFactory, acquire);
  KJ_REQUIRE(realFactory != this, "must initialise...");
  return realFactory->newBrokenCap(description);
}
...
}

static UninitializedBrokenCapFactoryImpl uninitializedBrokenCapFactory;
static BrokenCapFactory* brokenCapFactory = &uninitializedBrokenCapFactory;

Kenton Varda

unread,
Oct 13, 2020, 10:28:56 AM10/13/20
to Erin Shepherd, Cap'n Proto
On Tue, Oct 13, 2020 at 5:15 AM Erin Shepherd <erin.s...@e43.eu> wrote:
> I think that ThreadSanitizer is having trouble recognizing that the initialization of `brokenCapFactory` is thread-safe, due to the awkward way in which it is initialized. It may end up being initialized by multiple threads, but all threads will initialize it to the same value, hence no atomics are necessary when reading it later.

This doesn't hold unless every thread reading it also once wrote it.

That is the case here.

-Kenton
Reply all
Reply to author
Forward
0 new messages