Re: [chromium-dev] Using C++11 thread_local storage class

1,306 views
Skip to first unread message

Nico Weber

unread,
Aug 14, 2017, 1:25:57 PM8/14/17
to Ken Rockot, cxx
(moving to cxx@ per http://chromium-cpp.appspot.com/)

On Mon, Aug 14, 2017 at 1:11 PM, Ken Rockot <roc...@chromium.org> wrote:
I see there is currently no usage of this in the tree, and I would like to introduce some. To my surprise I can find no evidence of prior discussion.

What's the deal? Can we use it? If not, is there a suitable alternative?

Specifically of interest is a need to ensure that the lifetime of my thread-local thing is actually bounded by each thread's lifetime, and our //base thread-local support does not appear to be sufficient for such things.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/415f44d6-88ed-476c-a6e1-cacccfe2cc85%40chromium.org.

Jeremy Roman

unread,
Aug 14, 2017, 1:53:24 PM8/14/17
to Nico Weber, Ken Rockot, cxx
On Mon, Aug 14, 2017 at 1:25 PM, Nico Weber <tha...@chromium.org> wrote:
(moving to cxx@ per http://chromium-cpp.appspot.com/)

On Mon, Aug 14, 2017 at 1:11 PM, Ken Rockot <roc...@chromium.org> wrote:
I see there is currently no usage of this in the tree, and I would like to introduce some. To my surprise I can find no evidence of prior discussion.

What's the deal? Can we use it? If not, is there a suitable alternative?

I tried to introduce a use before, but its implementation across platforms varies significantly.

Notably, on Mac I had an issue where the TLS lookup symbol stops resolving (read: segfaults) if you access a thread_local variable past a certain point (which the pthreads thread-specific stuff does not do). This is particularly problematic when you just want to access some plain old data in TLS as part of destroying a TLS object.

(On Linux it can work wonderfully and compile down to an offset-from-register. But alas not everything is Linux.)
 
Specifically of interest is a need to ensure that the lifetime of my thread-local thing is actually bounded by each thread's lifetime, and our //base thread-local support does not appear to be sufficient for such things.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/415f44d6-88ed-476c-a6e1-cacccfe2cc85%40chromium.org.

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To post to this group, send email to c...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAMGbLiFojdFNd5UcF%2BqiedimL6ATGg6MDox4gPk0LWGO1yx8Eg%40mail.gmail.com.

François Doray

unread,
Aug 15, 2017, 10:21:50 AM8/15/17
to tha...@chromium.org, Ken Rockot, cxx
Have you considered using sequence-local storage (base/threading/sequence_local_storage_slot.h)? Values stored in sequence-local storage are destroyed when the sequence is destroyed. More information about sequences can be found at https://chromium.googlesource.com/chromium/src/+/lkcr/docs/threading_and_tasks.md

Ken Rockot

unread,
Aug 15, 2017, 10:27:08 AM8/15/17
to Francois Pierre Doray, cxx, Nico Weber
The intended usage is for an internal TLS cache to supplement base::RandBytes on POSIX, which currently has an unnecessarily high degree of per-call overhead. I don't think we want random number generation to have an internal dependency on sequences.

David Benjamin

unread,
Aug 15, 2017, 12:29:53 PM8/15/17
to Ken Rockot, Francois Pierre Doray, a...@chromium.org, cxx, Nico Weber
+agl

Using a thread-local cache for something like base::RandBytes is tricky because it is not safe against fork(), which Chromium does do by way of the zygote. It's probably better off for stuff with crypto needs[1] to use BoringSSL's implementation which manages thread-locals internally. Also on machines with getrandom(), RAND_bytes will seed with that which avoids some problems /dev/urandom has at very very early boot. We'd need to sort out allowing //base to depend on //third_party/boringssl, but there's no circular dependency, and BoringSSL is static-linker friendly[2], so you wouldn't pull in what you don't need. This would also let us remove //base/sha1.cc which is an unnecessary duplicate SHA-1 implementation.

That said, BoringSSL RAND_bytes will also hit an entropy source on each call because we need to be fork-safe, so this won't immediately solve your problem. On machines with RDRAND, RAND_bytes will use that which should avoid the overhead. We do have a "fork-unsafe" mode for machines without RDRAND, but Chromium does actually fork.

Can you say a bit more about your needs here? It's not very common to need so much randomness that this overhead matters.

[1] Although there are separate base::RandBytes and crypto::RandBytes, they are the same thing and a quick search through Chromium suggests base::RandBytes is more often than not used for cryptographically-secure random numbers.
[2] Due to silly reasons, some of the static-linker friendliness now needs -ffunction-sections or equivalent, but that can be easily fixed if it's a problem.

You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To post to this group, send email to c...@chromium.org.

Ken Rockot

unread,
Aug 15, 2017, 1:08:06 PM8/15/17
to David Benjamin, Francois Pierre Doray, a...@chromium.org, cxx, Nico Weber, Justin Schuh
On Tue, Aug 15, 2017 at 9:29 AM, David Benjamin <davi...@chromium.org> wrote:
+agl

Using a thread-local cache for something like base::RandBytes is tricky because it is not safe against fork(), which Chromium does do by way of the zygote. It's probably better off for stuff with crypto

Yeah, the plan there was going to be to force-flush the cache immediately before or after zygote fork, which is a bit of a hack. We also have multiprocess tests which fork, and they would need to be fixed as well.

needs[1] to use BoringSSL's implementation which manages thread-locals internally. Also on machines with getrandom(), RAND_bytes will seed with that which avoids some problems /dev/urandom has at very very early boot. We'd need to sort out allowing //base to depend on //third_party/boringssl, but there's no circular dependency, and BoringSSL is static-linker friendly[2], so you wouldn't pull in what you don't need. This would also let us remove //base/sha1.cc which is an unnecessary duplicate SHA-1 implementation.

That said, BoringSSL RAND_bytes will also hit an entropy source on each call because we need to be fork-safe, so this won't immediately solve your problem. On machines with RDRAND, RAND_bytes will use that which should avoid the overhead. We do have a "fork-unsafe" mode for machines without RDRAND, but Chromium does actually fork.

Can you say a bit more about your needs here? It's not very common to need so much randomness that this overhead matters.

We do call RandBytes pretty frequently. The specific motivation here was that creating a new Mojo message pipe means two separate calls to RandBytes, and in microbenchmarks which end up creating lots of pipes we've observed quite a lot of time being spent within RandBytes. A simple page-sized cache reduces that time by ~20-30%.

Of course optimizing for microbenchmarks is not necessarily valuable, and we've already addressed the for Mojo specifically anyway.

It was suggested to me that I should attempt to land a more generalized solution, but that's starting to sound like more trouble than it's worth.

Jeremy Roman

unread,
Aug 15, 2017, 7:28:31 PM8/15/17
to mos...@chromium.org, Brett Wilson, Ken Rockot, Chromium-dev, cxx
On Tue, Aug 15, 2017 at 7:11 PM, Leonard Mosescu <mos...@chromium.org> wrote:
MSVC places thread local variables(*) in a special section (.tls) . A single TLS slot points to the base of the thread-specific copy of this section so accessing a thread_local variable is .tls_base[offset]. I'll let Linux/macOS experts chime in about how things are done on these platforms although it seems very close to the PE/COFF solution.

(*) MSVC implements thread_local pretty much by repackaging of the old declspec(thread) with a few rough corners (around initialization) fixed.

As I mentioned in the cxx@ fork of this thread, on Mac implementation has sharp edges.
 
On Tue, Aug 15, 2017 at 3:56 PM, Brett Wilson <bre...@chromium.org> wrote:
On Tue, Aug 15, 2017 at 3:47 PM, Leonard Mosescu <mos...@chromium.org> wrote:

On Tue, Aug 15, 2017 at 3:04 PM, Brett Wilson <bre...@chromium.org> wrote:
I don't think we would want to use anything but the base implementation of TLS.

On Android, there are very few TLS slots (I found a reference to 60 somewhere) and were hitting the maximum. The base implementation maps a 256-entry TLS array into a single OS TLS slot to avoid this problem. I would assume that an STL implementation of TLS would just call the native OS APIs and if people start using them we would be running out of slots again.

There may be some valid concerns about thread_local but this should not be one of them (it is not technically feasible to map thread_local variables to individual TLS slots provided by the host OS).
 
We should probably put this on the explicitly banned list.

What's the rule of thumb here? I'm new to Chrome but I'd expect that all thing being equal we'd favor standard language features over hand crafted alternatives. Thread storage class is a first class, fundamental part of C++ since C++11, with well defined semantics (and one would expect first class, solid implementations in all the major toolchains)

If the implementation is good and does what we want on all platforms for all compilers we support and there are no specific concerns, then we should use it. I am not so optimistic as you about uniformly good implementations though...

Brett

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.

Leonard Mosescu

unread,
Aug 15, 2017, 7:43:26 PM8/15/17
to Jeremy Roman, Brett Wilson, Ken Rockot, Chromium-dev, cxx
@Thomas: good question. Short answer : you should be fine even if you don't use std::thread. Long answer: The C++ standard defines a threading model (1.10) and also provides std::thread as a way to control threads, but as far as I remember it's underspecified how things behave when other platform facilities are used to create threads. In practice, if you use the CRT you need to use a CRT facility to create thread (ex. _beginthreadex() instead of calling CreateThread() directly on Windows). This may sound a bit scary, although if you use the CRT and you don't use the CRT thread functions you have worse things to worry about than thread_local.

@Jeremy: do you happen to remember the details, or perhaps have a repro? The thread_local variables can be tricky since they have all the problems associated with global variables plus additional nastiness related to thread lifetimes.

I don't want to suggest that thread_local is cheap or safe (so I definitely don't advocate sprinkling it around). I was just curious if there are other technical reasons to outright ban the language feature and/or prefer the base TLS support.

Gabriel Charette

unread,
Aug 16, 2017, 9:09:14 AM8/16/17
to mos...@chromium.org, Jeremy Roman, Brett Wilson, Ken Rockot, Chromium-dev, cxx

Also worth mentioning in this thread: base now provides SequenceLocalStorageSlot as a sequence-friendly version of ThreadLocalStorage::Slot. As we move more and more towards sequences, I suspect most TLS usage outside of base internals should migrate to SLS.


Brett Wilson

unread,
Aug 16, 2017, 12:10:17 PM8/16/17
to Gabriel Charette, Leonard Mosescu, Jeremy Roman, Ken Rockot, Chromium-dev, cxx
Here is a proposed addition to the style guide that disallows this:

I'm not 100% confident about this advice as I don't have an intuitive feeling for how bad the issues on Mac are. It would be nice to use if we can.

Brett

Lucas Gadani

unread,
Nov 2, 2017, 6:11:33 PM11/2/17
to cxx, g...@chromium.org, mos...@chromium.org, jbr...@chromium.org, roc...@chromium.org, chromi...@chromium.org
I'd like to revisit this and propose that we allow the thread_local usage.

Here's a CL that shows that it works on all platforms: https://chromium-review.googlesource.com/c/chromium/src/+/743696

The caveat on Mac is that we cannot access TLS objects during the destructors of other TLS objects, which seems fairly reasonable, and I believe is the issue that Jeremy was hitting when experimenting, and why I had to make a change in ThreadSpecific::Destroy in the CL above.

I've done some performance comparisons between using thread_local and using WTF::ThreadSpecific, and there's a 2.5x performance improvement[1] on both Windows and Mac. The generated code is also significantly smaller, which is a nice benefit as well.

[1] The testcase I used for performance comparison was:
  DEFINE_THREAD_SAFE_STATIC_LOCAL(WTF::ThreadSpecific<bool>, test_bool, ());
  for (int i = 0; i < 1000000000; i++)
    *test_bool = !*test_bool;

  static thread_local volatile bool test_bool;
  for (int i = 0; i < 1000000000; i++)
    test_bool = !test_bool;

Leonard Mosescu

unread,
Nov 2, 2017, 6:44:03 PM11/2/17
to Lucas Gadani, cxx, g...@chromium.org, Jeremy Roman, Ken Rockot, Chromium-dev
The caveat on Mac is that we cannot access TLS objects during the destructors of other TLS objects, which seems fairly reasonable

It's worth noting that this is by design and not an implementation issue. It's pretty much the same constraint that applies to global objects.

Jeremy Roman

unread,
Nov 2, 2017, 8:06:34 PM11/2/17
to Leonard Mosescu, Lucas Gadani, cxx, Gabriel Charette, Ken Rockot, Chromium-dev
On Thu, Nov 2, 2017 at 6:44 PM, Leonard Mosescu <mos...@chromium.org> wrote:
The caveat on Mac is that we cannot access TLS objects during the destructors of other TLS objects, which seems fairly reasonable

It's worth noting that this is by design and not an implementation issue. It's pretty much the same constraint that applies to global objects.

Note: at the time I tested, this included primitive types (can't access a bool, even). IIRC, the issue I encountered was roughly:
- thread_local stuff torn down
- pthread destructors run
-- WTFThreadData (inc. AtomicStringTable torn down)
-- other thing uses AtomicString; accesses ThreadSpecific<WTFThreadData>
-- ThreadSpecific::operator T* (new object case) accesses checks if main thread
-- main thread check accesses thread_local bool (mac crashed here)
- pthread destructors run again, re-destroy WTFThreadData

Are we sure this cannot still happen? It looks to me like it could (though the mayNotBeMainThread stuff might be hiding it in non-edge cases).

The order of destruction being undefined is reasonably analogous to globals; what I encountered was a primitive (which doesn't need destruction at all) going away, which AFAIK doesn't happen with globals.

Lucas Gadani

unread,
Nov 7, 2017, 11:00:53 AM11/7/17
to Jeremy Roman, Leonard Mosescu, cxx, Gabriel Charette, Ken Rockot, Chromium-dev
What I found is that on Mac, having a TLS object will automatically generate and accessor function, which dynamically allocates memory for the object on first access on that thread. They get destroyed/deallocated on thread destruction, which means we shouldn't access even POD TLS data on the destructors.

I still think this is a reasonable limitation given the benefits of improved performance and better generated code.

As for the specific issue you mentioned with AtomicStrings, I haven't seen that happening, it's possible that it got fixed. In particular, this would also be bad even without thread_local, since accessing an AtomicString after the destruction on the WTFThreadData would lead to leaked memory, since the WTFThreadData would be re-allocated and nothing will free it afterwards. I'm fine if that gets changed to a crash, since it should be fixed anyway.

Jeremy Roman

unread,
Nov 8, 2017, 7:47:44 PM11/8/17
to Lucas Gadani, Leonard Mosescu, cxx, Gabriel Charette, Ken Rockot, Chromium-dev
On Tue, Nov 7, 2017 at 11:00 AM, Lucas Gadani <l...@chromium.org> wrote:
What I found is that on Mac, having a TLS object will automatically generate and accessor function, which dynamically allocates memory for the object on first access on that thread. They get destroyed/deallocated on thread destruction, which means we shouldn't access even POD TLS data on the destructors.

I still think this is a reasonable limitation given the benefits of improved performance and better generated code.

As for the specific issue you mentioned with AtomicStrings, I haven't seen that happening, it's possible that it got fixed. In particular, this would also be bad even without thread_local, since accessing an AtomicString after the destruction on the WTFThreadData would lead to leaked memory, since the WTFThreadData would be re-allocated and nothing will free it afterwards. I'm fine if that gets changed to a crash, since it should be fixed anyway.

Good to hear it hasn't been an issue in practice anymore.

However, note that this is not quite what happened (at least on POSIX). pthreads is clever enough to try to run destructors more than once if TLS keys are created during destruction, at least PTHREAD_DESTRUCTOR_ITERATIONS (in practice, four) times. This does allow for WTFThreadData being re-freed in a subsequent pass (and definitely did happen in practice, when I last looked).

It's kinda hard to make this an absolute restriction (though I concede "eh, four ought to do it" is hardly elegant) unless we can control the order of thread destructors, because on the surface it seems reasonable for an object's destructor to access the AtomicString table (e.g. a thread-specific HashMap<AtomicString, T> would need to to remove strings from the table and unref them if they would otherwise leak).

Lucas Gadani

unread,
Nov 17, 2017, 5:34:26 PM11/17/17
to Jeremy Roman, Leonard Mosescu, cxx, Gabriel Charette, Ken Rockot, Chromium-dev
I thoroughly investigated this today, here's what I found about the behavior on Mac:

- thread_local variables are destroyed before WTF::ThreadSpecific (which uses pthreads TLS).
- If a thread_local is used after it was destroyed, it'll be re-created. This means that, for example, if you have POD it'll be reinitialized according to its default initialization.
- The exact same thing happens to WTF::ThreadSpecific, it'll be re-created if it's used after it has been destroyed. It is also re-constructed according to its default initializers.
- If you have a circular dependence between creating TLS inside the destructors, it'll eventually give up.

So, basically, thread_local on Mac behaves identically to WTF::ThreadSpecific, with the exception that it is always destroyed first.

Here's the test code I used to test this behavior:

static void TLSThreadTest();

static void TLSThreadTest2();


class TLSHelperB {

 public:

  TLSHelperB() {

    LOG(ERROR) << this << " " << __PRETTY_FUNCTION__;

  }

  ~TLSHelperB() {

    LOG(ERROR) << this << " " << __PRETTY_FUNCTION__;

    TLSThreadTest();

  }

};


class WTFTLSHelperB {

 public:

  WTFTLSHelperB() {

    LOG(ERROR) << this << " " << __PRETTY_FUNCTION__;

  }

  ~WTFTLSHelperB() {

    LOG(ERROR) << this << " " << __PRETTY_FUNCTION__;

    TLSThreadTest2();

  }

};


class TLSHelperA {

 public:

  TLSHelperA() {

    static thread_local TLSHelperB tls_helper;

    DEFINE_THREAD_SAFE_STATIC_LOCAL(WTF::ThreadSpecific<WTFTLSHelperB>, tls_helper_wtf, ());

    *tls_helper_wtf;

    LOG(ERROR) << this << " " << __PRETTY_FUNCTION__;

  }

  ~TLSHelperA() {

    LOG(ERROR) << this << " " << __PRETTY_FUNCTION__;

  }

};

class WTFTLSHelperA {

 public:

  WTFTLSHelperA() {

    static thread_local TLSHelperB tls_helper;

    DEFINE_THREAD_SAFE_STATIC_LOCAL(WTF::ThreadSpecific<WTFTLSHelperB>, tls_helper_wtf, ());

    *tls_helper_wtf;

    LOG(ERROR) << this << " " << __PRETTY_FUNCTION__;

  }

  ~WTFTLSHelperA() {

    LOG(ERROR) << this << " " << __PRETTY_FUNCTION__;

  }

};


static void TLSThreadTest() {

  LOG(ERROR) << "Start " << __PRETTY_FUNCTION__;

  static thread_local TLSHelperA tls_helper_a;

  DEFINE_THREAD_SAFE_STATIC_LOCAL(WTF::ThreadSpecific<WTFTLSHelperA>, tls_helper_wtf, ());

  *tls_helper_wtf;

  LOG(ERROR) << "End " << __PRETTY_FUNCTION__;

}


static void TLSThreadTest2() {

  LOG(ERROR) << "Start " << __PRETTY_FUNCTION__;

  static thread_local TLSHelperA tls_helper_a;

  DEFINE_THREAD_SAFE_STATIC_LOCAL(WTF::ThreadSpecific<WTFTLSHelperA>, tls_helper_wtf, ());

  *tls_helper_wtf;

  LOG(ERROR) << "End " << __PRETTY_FUNCTION__;

}


TEST(..., TLS) {

  Thread a("Test");

  EXPECT_TRUE(a.Start());

  a.task_runner()->PostTask(

      FROM_HERE,

      base::BindOnce(&TLSThreadTest));

  a.task_runner()->PostTask(

      FROM_HERE,

      base::BindOnce(&TLSThreadTest2));

  a.Stop();

}


And here's the output:

Start void WTF::TLSThreadTest()

0x7ffef6e08582 WTF::TLSHelperB::TLSHelperB()

0x27fae18038 WTF::WTFTLSHelperB::WTFTLSHelperB()

0x7ffef6e08580 WTF::TLSHelperA::TLSHelperA()

0x7ffef6e08590 WTF::TLSHelperB::TLSHelperB()

0x27fae18088 WTF::WTFTLSHelperB::WTFTLSHelperB()

0x27fae18060 WTF::WTFTLSHelperA::WTFTLSHelperA()

End void WTF::TLSThreadTest()

Start void WTF::TLSThreadTest2()

0x7ffef6e085a0 WTF::TLSHelperA::TLSHelperA()

0x27fae180b0 WTF::WTFTLSHelperA::WTFTLSHelperA()

End void WTF::TLSThreadTest2()

0x7ffef6e085a0 WTF::TLSHelperA::~TLSHelperA()

0x7ffef6e08590 WTF::TLSHelperB::~TLSHelperB()

Start void WTF::TLSThreadTest()

End void WTF::TLSThreadTest()

0x7ffef6e08580 WTF::TLSHelperA::~TLSHelperA()

0x7ffef6e08582 WTF::TLSHelperB::~TLSHelperB()

Start void WTF::TLSThreadTest()

End void WTF::TLSThreadTest()

0x27fae18038 WTF::WTFTLSHelperB::~WTFTLSHelperB()

Start void WTF::TLSThreadTest2()

0x7ffef6e07d62 WTF::TLSHelperB::TLSHelperB()

0x7ffef6e07d80 WTF::TLSHelperA::TLSHelperA()

End void WTF::TLSThreadTest2()

0x27fae18060 WTF::WTFTLSHelperA::~WTFTLSHelperA()

0x27fae18088 WTF::WTFTLSHelperB::~WTFTLSHelperB()

Start void WTF::TLSThreadTest2()

End void WTF::TLSThreadTest2()

0x27fae180b0 WTF::WTFTLSHelperA::~WTFTLSHelperA()

0x7ffef6e07d80 WTF::TLSHelperA::~TLSHelperA()

0x7ffef6e07d62 WTF::TLSHelperB::~TLSHelperB()

Start void WTF::TLSThreadTest()

0x27fae180b0 WTF::WTFTLSHelperB::WTFTLSHelperB()

0x7ffef6e07d60 WTF::TLSHelperA::TLSHelperA()

0x7ffef6e07d70 WTF::TLSHelperB::TLSHelperB()

0x27fae18060 WTF::WTFTLSHelperB::WTFTLSHelperB()

0x27fae18088 WTF::WTFTLSHelperA::WTFTLSHelperA()

End void WTF::TLSThreadTest()

0x27fae180b0 WTF::WTFTLSHelperB::~WTFTLSHelperB()

Start void WTF::TLSThreadTest2()

0x7ffef6e07d62 WTF::TLSHelperB::TLSHelperB()

0x7ffef6e07d80 WTF::TLSHelperA::TLSHelperA()

0x7ffef6e07d70 WTF::TLSHelperB::TLSHelperB()

0x27fae18038 WTF::WTFTLSHelperA::WTFTLSHelperA()

End void WTF::TLSThreadTest2()

0x27fae18088 WTF::WTFTLSHelperA::~WTFTLSHelperA()

0x27fae18060 WTF::WTFTLSHelperB::~WTFTLSHelperB()

Start void WTF::TLSThreadTest2()

End void WTF::TLSThreadTest2()

0x27fae18038 WTF::WTFTLSHelperA::~WTFTLSHelperA()

0x7ffef6e07d70 WTF::TLSHelperB::~TLSHelperB()

Start void WTF::TLSThreadTest()

0x27fae18038 WTF::WTFTLSHelperB::WTFTLSHelperB()

0x7ffef6e07d60 WTF::TLSHelperA::TLSHelperA()

0x27fae18088 WTF::WTFTLSHelperB::WTFTLSHelperB()

0x27fae18060 WTF::WTFTLSHelperA::WTFTLSHelperA()

End void WTF::TLSThreadTest()

0x7ffef6e07d80 WTF::TLSHelperA::~TLSHelperA()

0x7ffef6e07d62 WTF::TLSHelperB::~TLSHelperB()

Start void WTF::TLSThreadTest()

End void WTF::TLSThreadTest()

0x7ffef6e07d70 WTF::TLSHelperB::~TLSHelperB()

Start void WTF::TLSThreadTest()

End void WTF::TLSThreadTest()

0x7ffef6e07d60 WTF::TLSHelperA::~TLSHelperA()

0x27fae18038 WTF::WTFTLSHelperB::~WTFTLSHelperB()

Start void WTF::TLSThreadTest2()

0x7ffef6e07d62 WTF::TLSHelperB::TLSHelperB()

0x7ffef6e07d80 WTF::TLSHelperA::TLSHelperA()

0x7ffef6e07d70 WTF::TLSHelperB::TLSHelperB()

0x27fae180b0 WTF::WTFTLSHelperA::WTFTLSHelperA()

End void WTF::TLSThreadTest2()

0x27fae18060 WTF::WTFTLSHelperA::~WTFTLSHelperA()

0x27fae18088 WTF::WTFTLSHelperB::~WTFTLSHelperB()

Start void WTF::TLSThreadTest2()

End void WTF::TLSThreadTest2()

0x27fae180b0 WTF::WTFTLSHelperA::~WTFTLSHelperA()

0x7ffef6e07d70 WTF::TLSHelperB::~TLSHelperB()

Start void WTF::TLSThreadTest()

0x27fae180b0 WTF::WTFTLSHelperB::WTFTLSHelperB()

0x7ffef6e07d60 WTF::TLSHelperA::TLSHelperA()

0x27fae18060 WTF::WTFTLSHelperB::WTFTLSHelperB()

0x27fae18088 WTF::WTFTLSHelperA::WTFTLSHelperA()

End void WTF::TLSThreadTest()

0x7ffef6e07d80 WTF::TLSHelperA::~TLSHelperA()

0x7ffef6e07d62 WTF::TLSHelperB::~TLSHelperB()

Start void WTF::TLSThreadTest()

End void WTF::TLSThreadTest()

0x7ffef6e07d60 WTF::TLSHelperA::~TLSHelperA()

0x27fae180b0 WTF::WTFTLSHelperB::~WTFTLSHelperB()

Start void WTF::TLSThreadTest2()

0x7ffef6e07e82 WTF::TLSHelperB::TLSHelperB()

0x7ffef6e07ea0 WTF::TLSHelperA::TLSHelperA()

0x7ffef6e07e90 WTF::TLSHelperB::TLSHelperB()

0x27fae18038 WTF::WTFTLSHelperA::WTFTLSHelperA()

End void WTF::TLSThreadTest2()

0x27fae18088 WTF::WTFTLSHelperA::~WTFTLSHelperA()

0x27fae18060 WTF::WTFTLSHelperB::~WTFTLSHelperB()

Start void WTF::TLSThreadTest2()

End void WTF::TLSThreadTest2()

0x27fae18038 WTF::WTFTLSHelperA::~WTFTLSHelperA()





You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To post to this group, send email to c...@chromium.org.

Jeremy Roman

unread,
Nov 28, 2017, 5:08:20 PM11/28/17
to Lucas Gadani, Leonard Mosescu, cxx, Gabriel Charette, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
OK. I'm a little frustrated that the C++ spec doesn't seem to explicitly spell out what happens in cases like this, but it sounds like this is safe at least for a simple POD like is_main_thread, and we can see what issues we run into in practice from there. If thakis or danakj agrees, I'm happy to approve a CL which moves it to allowed with use by special dispensation and adds a use for a simple such case. I'm reluctant to give carte-blanche for broader use until we have more experience with how this interacts with the various other TLS in Chromium (because it tickles so many weird platform/compiler-specific things), but I would like to see us ultimately use the language-level thread_local if we can, because it's pretty nice. :)

Thanks for the detailed investigation.

David Turner

unread,
Nov 29, 2017, 5:05:37 AM11/29/17
to jbr...@chromium.org, Lucas Gadani, Leonard Mosescu, cxx, Gabriel Charette, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
On Tue, Nov 28, 2017 at 11:08 PM, Jeremy Roman <jbr...@chromium.org> wrote:
OK. I'm a little frustrated that the C++ spec doesn't seem to explicitly spell out what happens in cases like this, but it sounds like this is safe at least for a simple POD like is_main_thread, and we can see what issues we run into in practice from there. If thakis or danakj agrees, I'm happy to approve a CL which moves it to allowed with use by special dispensation and adds a use for a simple such case. I'm reluctant to give carte-blanche for broader use until we have more experience with how this interacts with the various other TLS in Chromium (because it tickles so many weird platform/compiler-specific things), but I would like to see us ultimately use the language-level thread_local if we can, because it's pretty nice. :)

For the record, I beliieve that thread_local was broken Android before Android M :-( [1]. It is possible to work around this by using a more recent version of the Android NDK and /or libc++abi sources though.

- Digit

[1] https://github.com/android-ndk/ndk/issues/156 and more specifically this comment:

"""We can support trivial cases of thread_local, but thread_local with non-trivial destructors requires run time support that wasn't available until M (and I think the NDK's current libc++abi doesn't have the support either). There was actually a patch recently made to libc++abi adding support for this on platforms that don't have native support, so once that lands and we pick it up we could actually support that on older platforms."""
 
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/CACuR13d0FnS0Nga7EEEu%2BwdWciB1%2BRrUjzYQXTGvYdK88GZkSg%40mail.gmail.com.

David Turner

unread,
Nov 29, 2017, 9:15:51 AM11/29/17
to Jeremy Roman, Lucas Gadani, Leonard Mosescu, cxx, Gabriel Charette, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
On Wed, Nov 29, 2017 at 11:05 AM, David Turner <di...@chromium.org> wrote:


On Tue, Nov 28, 2017 at 11:08 PM, Jeremy Roman <jbr...@chromium.org> wrote:
OK. I'm a little frustrated that the C++ spec doesn't seem to explicitly spell out what happens in cases like this, but it sounds like this is safe at least for a simple POD like is_main_thread, and we can see what issues we run into in practice from there. If thakis or danakj agrees, I'm happy to approve a CL which moves it to allowed with use by special dispensation and adds a use for a simple such case. I'm reluctant to give carte-blanche for broader use until we have more experience with how this interacts with the various other TLS in Chromium (because it tickles so many weird platform/compiler-specific things), but I would like to see us ultimately use the language-level thread_local if we can, because it's pretty nice. :)

For the record, I beliieve that thread_local was broken Android before Android M :-( [1]. It is possible to work around this by using a more recent version of the Android NDK and /or libc++abi sources though.

Good news!

I have briefly looked into this, and it looks like that we're already using a recent version of the Android NDK (r16 according to the git log), even though third_party/android_tools/ndk/README.chromium claimes it is only r12b (and lists a number of patches that may not apply anymore).

This means it should be ok to use thread_local on any Android version now. I'll look at the best way to update the README.chromium, but that's a separate issue.

Sorry for the mis-warning.

Dmitry Skiba

unread,
Nov 29, 2017, 12:04:59 PM11/29/17
to David Turner, Jeremy Roman, Lucas Gadani, Leonard Mosescu, cxx, Gabriel Charette, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens, Brian Sheedy
Hmm, I think r16 hasn't landed yet, see crbug.com/771171. +Brian who works on it.

Brian Sheedy

unread,
Nov 29, 2017, 12:46:50 PM11/29/17
to Dmitry Skiba, David Turner, Jeremy Roman, Lucas Gadani, Leonard Mosescu, cxx, Gabriel Charette, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
We're still on r12b. A Skia change went in recently that had a fix we needed for Chromium to compile with r16 (that's probably what you saw). The Chromium-side changes seem to all work, so barring any unexpected blockers, r16 will probably be in shortly after branch point (I would guess early next week?).

David Turner

unread,
Nov 29, 2017, 1:58:51 PM11/29/17
to Brian Sheedy, Dmitry Skiba, Jeremy Roman, Lucas Gadani, Leonard Mosescu, cxx, Gabriel Charette, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
On Wed, Nov 29, 2017 at 6:46 PM, Brian Sheedy <bsh...@google.com> wrote:
We're still on r12b. A Skia change went in recently that had a fix we needed for Chromium to compile with r16 (that's probably what you saw). The Chromium-side changes seem to all work, so barring any unexpected blockers, r16 will probably be in shortly after branch point (I would guess early next week?).


Thanks for the correction, my bad, I misread the git history for //third_party/, it is r12b at the moment indeed, though agrieve@ has an r16 branch on the chromium repository in preparation.
Brian, your patch looks promising, thanks a lot, can't wait for this to land.

zer...@gmail.com

unread,
Jan 8, 2018, 6:57:05 PM1/8/18
to cxx, l...@chromium.org, mos...@chromium.org, g...@chromium.org, roc...@chromium.org, chromi...@chromium.org, tha...@chromium.org, dan...@chromium.org
On Tuesday, November 28, 2017 at 2:08:20 PM UTC-8, Jeremy Roman wrote:
OK. I'm a little frustrated that the C++ spec doesn't seem to explicitly spell out what happens in cases like this, but it sounds like this is safe at least for a simple POD like is_main_thread, and we can see what issues we run into in practice from there. If thakis or danakj agrees, I'm happy to approve a CL which moves it to allowed with use by special dispensation and adds a use for a simple such case. I'm reluctant to give carte-blanche for broader use until we have more experience with how this interacts with the various other TLS in Chromium (because it tickles so many weird platform/compiler-specific things), but I would like to see us ultimately use the language-level thread_local if we can, because it's pretty nice. :)

thakis/danakj, want to comment on the above?  It seems like this thread is a bit in limbo right now, I'd like to see it move to a resolved place.

PK

Peter Kasting

unread,
Jan 8, 2018, 6:59:18 PM1/8/18
to cxx, l...@chromium.org, Leonard Mosescu, Gabriel Charette, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
On Mon, Jan 8, 2018 at 3:57 PM, <zer...@gmail.com> wrote:
thakis/danakj, want to comment on the above?  It seems like this thread is a bit in limbo right now, I'd like to see it move to a resolved place.

PK

In case it wasn't clear, that was me...

PK

Jeremy Roman

unread,
Jan 8, 2018, 8:25:31 PM1/8/18
to Peter Kasting, cxx, Lucas Gadani, Leonard Mosescu, Gabriel Charette, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
I believe Lucas has a patch to add a single use for IsMainThread, but AFAIK it hasn't landed yet. If that succeeds, we'll look at allowing more uses.

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To post to this group, send email to c...@chromium.org.

Lucas Gadani

unread,
Jan 9, 2018, 4:47:22 PM1/9/18
to Jeremy Roman, Peter Kasting, cxx, Leonard Mosescu, Gabriel Charette, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
Yes, that has been in the backburner for a while, I'll try to get it rebased and reviewed this week.

vtsyrk...@google.com

unread,
Jan 28, 2019, 12:40:26 PM1/28/19
to cxx, jbr...@chromium.org, pkas...@chromium.org, mos...@chromium.org, g...@chromium.org, roc...@chromium.org, chromi...@chromium.org, tha...@chromium.org, dan...@chromium.org
The underlying issue [1] blocking the CL noted below is fixed. I'm not sure if lfg@ is still interested in landing this patch, but if anyone else has a cross-platform use for thread_local it could be a good opportunity to give it a test before landing the style guide change.

Dave Tapuska

unread,
Jan 28, 2019, 1:55:21 PM1/28/19
to vtsyrk...@google.com, cxx, Jeremy Roman, Peter Kasting, mos...@chromium.org, g...@chromium.org, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
Regarding Lucas's patch... I think the whole CurrentThread should actually be rewritten in terms of base::PlatformThread::CurrentThreadId(). Blink may want to have a thread_local around that call but I think it can be more lightweight.

dave.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/54654263-36c5-46a9-9d73-788185208794%40chromium.org.

Vlad Tsyrklevich

unread,
Jan 28, 2019, 2:04:11 PM1/28/19
to Dave Tapuska, cxx, Jeremy Roman, Peter Kasting, mos...@chromium.org, g...@chromium.org, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
I have no opinion how Blink should implement it, but I'll just note that the Linux implementation of PlatformThread::CurrentThreadID() already uses thread_local: https://cs.chromium.org/chromium/src/base/threading/platform_thread_posix.cc?type=cs&q=thread_local&g=0&l=149

Dave Tapuska

unread,
Jan 28, 2019, 2:10:24 PM1/28/19
to Vlad Tsyrklevich, cxx, Jeremy Roman, Peter Kasting, mos...@chromium.org, g...@chromium.org, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
Yes I wrote that code... We needed to fix a perf problem with the swi becoming more expensive in a newer version of glibc. The swi code is duplicated in blink and the base primitive should just be used. If I get time I'll put together a pull request.

The guide should be updated separately if this is now fixed and is available for broad adoption.

dave.

Lucas Gadani

unread,
Jan 28, 2019, 2:11:19 PM1/28/19
to Vlad Tsyrklevich, Dave Tapuska, cxx, Jeremy Roman, Peter Kasting, mos...@chromium.org, g...@chromium.org, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens
I'll try to get my patch rebased and reviewed again now that the Mac blocker is fixed.

Regarding rewriting WTF::CurrentThread, it can be done, but it should be orthogonal to my patch, which only moved the caching of the current thread to a thread_local instead of going through WTF::ThreadSpecific (that was just a perf improvement, since it avoids a jump when checking if we are on the main thread).

Jeremy Roman

unread,
Jan 28, 2019, 2:49:13 PM1/28/19
to Lucas Gadani, Vlad Tsyrklevich, Dave Tapuska, cxx, Peter Kasting, Leonard Mosescu, Gabriel Charette, Ken Rockot, Chromium-dev, Nico Weber, Dana Jansens

I got a little sidetracked trying to actually make the @plt stubs disappear (which I couldn't manage), because this seems like something that should be inlined (and with LTO, that should be rather easy to do), but it may be a win to just land it anyway.

Corentin Wallez

unread,
Oct 1, 2020, 6:53:12 AM10/1/20
to Chromium-dev, Jeremy Roman, vtsyrk...@google.com, Dave Tapuska, cxx, Peter Kasting, Leonard Mosescu, Gabriel Charette, Ken Rockot, Chromium-dev, Nico Weber, danakj, Lucas Gadani

Has there been any updates on the experiments to use thread_local in Chromium? We're interested in using it in a third-party dependency of Chromium (Dawn), maybe that can be the experiment.

Cheers,

Corentin

Jeremy Roman

unread,
Oct 1, 2020, 9:56:54 AM10/1/20
to Corentin Wallez, Egor Pasko, Chromium-dev, vtsyrk...@google.com, Dave Tapuska, cxx, Peter Kasting, Leonard Mosescu, Gabriel Charette, Ken Rockot, Nico Weber, danakj, Lucas Gadani
It appears that uses have snuck in so it appears that at least for simple uses to work. I'd be curious to see what the codegen actually looks like on our various build configs (and whether we need to tweak -ftls-model).

On an internal thread, +Egor Pasko and others looked into this in the context of some memory safety efforts in August; I'm not sure what the status of that is.

Benoit Lize

unread,
Oct 14, 2020, 5:32:13 AM10/14/20
to jbr...@chromium.org, Corentin Wallez, Egor Pasko, Chromium-dev, vtsyrk...@google.com, Dave Tapuska, cxx, Peter Kasting, Leonard Mosescu, Gabriel Charette, Ken Rockot, Nico Weber, danakj, Lucas Gadani
Hi Corentin,

I've tried using it, faced issues, but did not dig much further afterwards to see where they were coming from:
- For simple objects, it seems to be working fine (as Jeremy pointed out, it is already used in the codebase in a couple places), and the codegen is what you expect, at least on desktop, that is very good (static offset from a register)
- For my use case, which had a destructor etc., I encountered weird failures on Windows 7 and Lollipop bots.

The issues may be down to me holding it wrong, but at least I think that for simple object (e.g. a POD one) it should work fine. I haven't checked the generated code on Android though, but Egor may have.

Cheers,

Benoit


To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/CACuR13cbtMOmOGFe0pg%2B-vOLRHnjhr2FqiJ-PDTN6_B87J9JVQ%40mail.gmail.com.

Corentin Wallez

unread,
Oct 14, 2020, 5:57:44 AM10/14/20
to Benoit Lize, jbr...@chromium.org, Egor Pasko, Chromium-dev, vtsyrk...@google.com, Dave Tapuska, cxx, Peter Kasting, Leonard Mosescu, Gabriel Charette, Ken Rockot, Nico Weber, danakj, Lucas Gadani
Hey Benoit, thanks for doing another test! In our case it is a POD so it's probably fine. It's quite good that it's just a static offset from a register and that we don't have to walk the task structure!

Bin Wu

unread,
Jun 14, 2021, 4:27:21 PM6/14/21
to Chromium-dev, Corentin Wallez, Jeremy Roman, Egor Pasko, Chromium-dev, vtsyrk...@google.com, Dave Tapuska, cxx, Peter Kasting, Leonard Mosescu, Gabriel Charette, Ken Rockot, Nico Weber, danakj, Lucas Gadani, Benoit Lize
Any update on this? I'd like to add a thread local raw pointer in net/third_party/quiche.

To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.

Peter Kasting

unread,
Jan 2, 2023, 3:20:35 PM1/2/23
to cxx
I'm resurrecting this thread.

Based on the previous posts here, linked CLs, bugs, etc., `thread_local` should be safe at least when used with PODs or `CONSTINIT`. I propose we allow it for those cases, which probably covers most real-world use cases anyway. In practice, we've  been using `thread_local` in several places for some time, so this is just matching actual practice.

I'll give this thread a week for people to weigh in. If there is no response, then in a week I'll make this change.

PK

Daniel Cheng

unread,
Jan 2, 2023, 7:13:49 PM1/2/23
to Peter Kasting, cxx
I don't remember the context at this point, since email retention policy has claimed it. What were the original problems with thread_local?

What's the current Chrome alternative? base::SequenceLocalStorageMap/base::SequenceLocalStorageSlot and base::ThreadLocal{Boolean,Pointer} and base::ThreadLocalStorage::StaticSlot/Slot?

Can we give some guidelines for when to use the various alternatives here?

Daniel

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Nico Weber

unread,
Jan 3, 2023, 9:04:08 AM1/3/23
to cxx, Peter Kasting
I know that clang produces broken codegen with thread_local on macOS for thread_locals exported by one dylib and used by another: https://github.com/llvm/llvm-project/issues/51045

That's a fairly simple thing that's broken, which to me possibly suggests that `thread_local` isn't terribly widely used, and there might be other issues.

Victor Vasiliev

unread,
Jan 3, 2023, 9:22:37 AM1/3/23
to Nico Weber, cxx, Peter Kasting
We ran into an issue before where clang assumed incorrect alignment for thread-locals on Windows 7, producing SSE instructions that crashed: https://quiche.git.corp.google.com/quiche/+/cf0f085217c842fb5c04dde1025a3ef8f19fdbbc.

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Peter Kasting

unread,
Jan 3, 2023, 1:19:10 PM1/3/23
to Victor Vasiliev, cxx
On Tue, Jan 3, 2023 at 6:22 AM Victor Vasiliev <vas...@google.com> wrote:
We ran into an issue before where clang assumed incorrect alignment for thread-locals on Windows 7, producing SSE instructions that crashed: https://quiche.git.corp.google.com/quiche/+/cf0f085217c842fb5c04dde1025a3ef8f19fdbbc.

Do you know if that was Win 7-specific for some reason, or if it applied to newer Windows versions also? I ask since Chromium has now dropped support for Win 7, so if there's some reason thread_local is broken there but not afterwards that wouldn't be a blocker; but OTOH I'm not sure how a codegen would be specific for an OS version, as opposed to a CPU arch/revision or something.

Was there a Clang bug for the issue here? I wasn't abv to find any kind of bug link in the CL description linked.

PK

Avi Drissman

unread,
Jan 3, 2023, 1:21:45 PM1/3/23
to Victor Vasiliev, Nico Weber, cxx, Peter Kasting
Note that Windows 7-8 support will end with Chrome 109, and starting with Chrome 110 we can assume at least Windows 10. Given that 110 branched, this should no longer be a problem.

Avi

Peter Kasting

unread,
Jan 3, 2023, 1:26:59 PM1/3/23
to Nico Weber, cxx
On Tue, Jan 3, 2023 at 6:04 AM Nico Weber <tha...@chromium.org> wrote:
I know that clang produces broken codegen with thread_local on macOS for thread_locals exported by one dylib and used by another: https://github.com/llvm/llvm-project/issues/51045

Interesting. The majority of the problems cited on the thread before were about bugs on Mac, so I wonder if Mac is in worse shape.

Regarding that specific bug: It looks like on your https://reviews.llvm.org/D109112, rjmccall commented back in July about the desired model the compiler should have. Any chance you can resummarize/reassign the bug/patch here so that it has the right description/owner and doesn't appear to be in limbo? I don't really understand any of what y'all are talking about so I can't do this myself :)

That's a fairly simple thing that's broken, which to me possibly suggests that `thread_local` isn't terribly widely used, and there might be other issues.

Definitely willing to believe this. At the very least, my hope would be to get the set of issues fleshed out a bit as LLVM bugs and maybe get Lexan team to devote time to them if necessary; right now `thread_local` has just been stalled for years, so if we don't push on it it's not going to move forward.

AFAIK, there is really no alternative today for someone who wants a `constinit thread_local`; the base:: abstractions around this stuff are not constexpr and do not look like they can be made such. I'm trying to use such a variable (debug only and not exported) in my coroutines port, so it would be good to either have this or an alternative.

PK

Peter Kasting

unread,
Jan 3, 2023, 1:47:34 PM1/3/23
to Daniel Cheng, cxx
On Mon, Jan 2, 2023 at 4:13 PM Daniel Cheng <dch...@chromium.org> wrote:
I don't remember the context at this point, since email retention policy has claimed it. What were the original problems with thread_local?

jbroman@ 2017: "on Mac I had an issue where the TLS lookup symbol stops resolving (read: segfaults) if you access a thread_local variable past a certain point (which the pthreads thread-specific stuff does not do)"

My take: This may be due to the bug Nico referenced. Possible workaround: disallow thread_local for exported variables.

brettw@ 2017: "On Android, there are very few TLS slots (I found a reference to 60 somewhere) and were hitting the maximum. The base implementation maps a 256-entry TLS array into a single OS TLS slot to avoid this problem. I would assume that an STL implementation of TLS would just call the native OS APIs and if people start using them we would be running out of slots again."
mosescu@ 2017: "There may be some valid concerns about thread_local but this should not be one of them (it is not technically feasible to map thread_local variables to individual TLS slots provided by the host OS). MSVC places thread local variables(*) in a special section (.tls) . A single TLS slot points to the base of the thread-specific copy of this section so accessing a thread_local variable is .tls_base[offset]. I'll let Linux/macOS experts chime in about how things are done on these platforms although it seems very close to the PE/COFF solution."

My take: Not a concern.

<back and forth between jbroman@ and lfg@ about destruction order/safety/reentrance/etc.>
lfg@ 2017: "- thread_local variables are destroyed before WTF::ThreadSpecific (which uses pthreads TLS).
- If a thread_local is used after it was destroyed, it'll be re-created. This means that, for example, if you have POD it'll be reinitialized according to its default initialization.
- The exact same thing happens to WTF::ThreadSpecific, it'll be re-created if it's used after it has been destroyed. It is also re-constructed according to its default initializers.
- If you have a circular dependence between creating TLS inside the destructors, it'll eventually give up.

So, basically, thread_local on Mac behaves identically to WTF::ThreadSpecific, with the exception that it is always destroyed first."

My take: Not a concern.

digit@ 2017: " I beliieve that thread_local was broken [on] Android before Android M"

My take: We're now well past this, not a concern.

lizeb@ 2020: "For my use case, which had a destructor etc., I encountered weird failures on Windows 7 and Lollipop bots."

My take: This may be due to the bug Victor referenced. Potentially not a concern. Hard to know without more use.

What's the current Chrome alternative? base::SequenceLocalStorageMap/base::SequenceLocalStorageSlot and base::ThreadLocal{Boolean,Pointer} and base::ThreadLocalStorage::StaticSlot/Slot?

Correct.

Can we give some guidelines for when to use the various alternatives here?

_If_ thread_local works correctly, I think it could replace all of the Thread* stuff (but not the Sequence* stuff). If it only works for POD/constinit types, then presumably we'd nuke ThreadLocal{Boolean,Pointer} (since those are PODs) and leave ThreadLocalStorage around for non-PODs.

PK

Lei Zhang

unread,
Jan 3, 2023, 1:51:47 PM1/3/23
to Daniel Cheng, Peter Kasting, cxx
FWIW, the "To view this discussion on the web" link in the email
footer has the entire thread.

On Mon, Jan 2, 2023 at 4:13 PM Daniel Cheng <dch...@chromium.org> wrote:
>
> To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAF3XrKrsFr-J686uiWLFVmmMYEEmkyRLwegd%3D2RYmSt0cyYyFw%40mail.gmail.com.

dan...@chromium.org

unread,
Jan 3, 2023, 1:57:19 PM1/3/23
to Peter Kasting, Daniel Cheng, cxx
Thanks for the archaeology.

_If_ it is the case that PODs work but others don't, I don't think we should put ourselves in this position _unless_ our compiler plugin will tell you when you did it wrong and what to do instead with a Fixit. If we block on that, then it's fine IMO.
 

PK

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Egor Pasko

unread,
Jan 3, 2023, 2:51:10 PM1/3/23
to Peter Kasting, Daniel Cheng, cxx
On Tue, Jan 3, 2023 at 7:47 PM Peter Kasting <pkas...@chromium.org> wrote:
On Mon, Jan 2, 2023 at 4:13 PM Daniel Cheng <dch...@chromium.org> wrote:
I don't remember the context at this point, since email retention policy has claimed it. What were the original problems with thread_local?

jbroman@ 2017: "on Mac I had an issue where the TLS lookup symbol stops resolving (read: segfaults) if you access a thread_local variable past a certain point (which the pthreads thread-specific stuff does not do)"

My take: This may be due to the bug Nico referenced. Possible workaround: disallow thread_local for exported variables.

2c: Exporting variables across shared libraries (even non-threadlocal) should generally be discouraged due to difficulties versioning them, and various surprises (like copy relocations). Though I think the use case of component builds on Android/Linux is narrow enough so that it is not a problem on these systems.
 
brettw@ 2017: "On Android, there are very few TLS slots (I found a reference to 60 somewhere) and were hitting the maximum. The base implementation maps a 256-entry TLS array into a single OS TLS slot to avoid this problem. I would assume that an STL implementation of TLS would just call the native OS APIs and if people start using them we would be running out of slots again."

This is no longer an issue. We switched to emutls, which has no limits for the number of TLS variables.
 
mosescu@ 2017: "There may be some valid concerns about thread_local but this should not be one of them (it is not technically feasible to map thread_local variables to individual TLS slots provided by the host OS). MSVC places thread local variables(*) in a special section (.tls) . A single TLS slot points to the base of the thread-specific copy of this section so accessing a thread_local variable is .tls_base[offset]. I'll let Linux/macOS experts chime in about how things are done on these platforms although it seems very close to the PE/COFF solution."

My take: Not a concern.

<back and forth between jbroman@ and lfg@ about destruction order/safety/reentrance/etc.>
lfg@ 2017: "- thread_local variables are destroyed before WTF::ThreadSpecific (which uses pthreads TLS).
- If a thread_local is used after it was destroyed, it'll be re-created. This means that, for example, if you have POD it'll be reinitialized according to its default initialization.
- The exact same thing happens to WTF::ThreadSpecific, it'll be re-created if it's used after it has been destroyed. It is also re-constructed according to its default initializers.
- If you have a circular dependence between creating TLS inside the destructors, it'll eventually give up.

So, basically, thread_local on Mac behaves identically to WTF::ThreadSpecific, with the exception that it is always destroyed first."

My take: Not a concern.

digit@ 2017: " I beliieve that thread_local was broken [on] Android before Android M"

My take: We're now well past this, not a concern.

lizeb@ 2020: "For my use case, which had a destructor etc., I encountered weird failures on Windows 7 and Lollipop bots."
 
FYI: Lollipop support has been removed.

My take: This may be due to the bug Victor referenced. Potentially not a concern. Hard to know without more use.

What's the current Chrome alternative? base::SequenceLocalStorageMap/base::SequenceLocalStorageSlot and base::ThreadLocal{Boolean,Pointer} and base::ThreadLocalStorage::StaticSlot/Slot?

Correct.

Can we give some guidelines for when to use the various alternatives here?

_If_ thread_local works correctly, I think it could replace all of the Thread* stuff (but not the Sequence* stuff). If it only works for POD/constinit types, then presumably we'd nuke ThreadLocal{Boolean,Pointer} (since those are PODs) and leave ThreadLocalStorage around for non-PODs.

PK

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Joe Mason

unread,
Jan 3, 2023, 7:04:16 PM1/3/23
to Egor Pasko, Peter Kasting, Daniel Cheng, cxx
Probably not a concern, but I'll mention it for the sake of completeness: on some platforms the thread_local implementation will malloc, on others it won't. So using thread_local variables from inside a malloc hook can lead to reentrancy on some platforms. Details at https://source.chromium.org/chromium/chromium/src/+/main:base/sampling_heap_profiler/poisson_allocation_sampler.cc;l=85;drc=abde1af1042ed21df1fd5ed8803ca84942065c4e. There could also be problems using them from an OOM handler or other context that shouldn't allocate, which might not be obvious to developers who are on platforms where thread_local doesn't allocate.

I'd expect anyone writing such constrained code would be aware of the pitfalls so this probably doesn't affect the decision.

Gabriel Charette

unread,
Jan 4, 2023, 1:03:07 PM1/4/23
to Joe Mason, Egor Pasko, Peter Kasting, Daniel Cheng, cxx
Has anyone tested the performance characteristics of replacing base::ThreadLocal variables with thread_local variables? Or is this strictly about aligning base:: with the C++ spec?

In general, base::SequenceLocalStorage is more aligned with CHromium's sequence-affine model, but there are still valid and performance-critical use cases for TLS (not that I'm aware of performance issues caused by base::ThreadLocal).

Valid use cases for TLS: //base context getters "for current thread" (set in the scope of each RunTask), thread/sequence checkers, thread-agnostic state aggregation (e.g. tracing: too many sequences, better to aggregate per thread).
Anything else that might be used from a sequence-affine context should store its state in sequence-local-storage (e.g. mojo).

Joe Mason

unread,
Jan 4, 2023, 1:22:40 PM1/4/23
to Gabriel Charette, Egor Pasko, Peter Kasting, Daniel Cheng, cxx
Is there any worry that devs will use thread_local (or base::ThreadLocal) when they really want sequence-local, leading to hard to find bugs? In my experience many beginners don't well understand the difference between Thread and Sequence (often assuming Sequence is just the Chrome word for Thread or that it's a subtype of thread).

I had been assuming that TLS was an advanced enough topic that people without a good grounding in Chrome's threading model wouldn't attempt it, but I have a vague intuition that `thread_local` is easy enough to drop in and standardized enough that people might use it without thinking hard.

Peter Kasting

unread,
Jan 4, 2023, 1:25:25 PM1/4/23
to Gabriel Charette, Joe Mason, Egor Pasko, Daniel Cheng, cxx
On Wed, Jan 4, 2023 at 10:03 AM Gabriel Charette <g...@chromium.org> wrote:
Has anyone tested the performance characteristics of replacing base::ThreadLocal variables with thread_local variables? Or is this strictly about aligning base:: with the C++ spec?

In my case, it's about facilitating currently-impossible use cases (`constinit thread_local`). I have not tried to compare perf on cases that could be served by either mechanism.

PK

dan...@chromium.org

unread,
Jan 4, 2023, 1:25:50 PM1/4/23
to Joe Mason, Gabriel Charette, Egor Pasko, Peter Kasting, Daniel Cheng, cxx
On Wed, Jan 4, 2023 at 1:22 PM 'Joe Mason' via cxx <c...@chromium.org> wrote:
Is there any worry that devs will use thread_local (or base::ThreadLocal) when they really want sequence-local, leading to hard to find bugs? In my experience many beginners don't well understand the difference between Thread and Sequence (often assuming Sequence is just the Chrome word for Thread or that it's a subtype of thread).

I had been assuming that TLS was an advanced enough topic that people without a good grounding in Chrome's threading model wouldn't attempt it, but I have a vague intuition that `thread_local` is easy enough to drop in and standardized enough that people might use it without thinking hard.

Hm.. I don't think we should fix that by avoiding a well known thing, if it's an issue. And it hasn't been raised as one until now with ThreadLocal. Unless this line of question brings out examples of risk and a means to address them, I am not sure that this could (or should?) be actionable, personally.
 

Peter Kasting

unread,
Jan 4, 2023, 1:41:34 PM1/4/23
to cxx
Summary of feedback thus far:
  • Known bug on Mac with wrong codegen for exported thread_locals. We should discourage exported variables in general.
  • General uncertainty whether there are other bugs.
  • thread_local might malloc, and thus cannot safely be used inside a malloc hook, OOM handler, etc. (Is this also true of ThreadLocal*?)
  • Thread-affine variables are generally less appropriate than sequence-affine ones. We should make sure there is clear guidance.
  • Perf versus ThreadLocal* is unknown, and probably important, since `thread_local` variables are often used in perf-critical systems.
Most of this could probably be dealt with via documentation and possibly PRESUBMITs, but the perf question probably needs addressing before approving this, and it would be nice if we had some general robustness testing (esp. for non-PODs) to see if there are other obvious codegen bugs.

Is anyone sufficiently qualified that they could try to do either of those things? I don't feel like I am :(

PK

Egor Pasko

unread,
Jan 4, 2023, 2:08:04 PM1/4/23
to Peter Kasting, cxx
On Wed, Jan 4, 2023 at 7:41 PM Peter Kasting <pkas...@chromium.org> wrote:
Summary of feedback thus far:
  • Known bug on Mac with wrong codegen for exported thread_locals. We should discourage exported variables in general.
  • General uncertainty whether there are other bugs.
  • thread_local might malloc, and thus cannot safely be used inside a malloc hook, OOM handler, etc. (Is this also true of ThreadLocal*?)
  • Thread-affine variables are generally less appropriate than sequence-affine ones. We should make sure there is clear guidance.
  • Perf versus ThreadLocal* is unknown, and probably important, since `thread_local` variables are often used in perf-critical systems.

On POSIX the ThreadLocal* classes sit on top of pthread_getspecific() and friends. The thread_local would fall back to it only in the worst case, if all the opportunities to rewrite the TLS model in the compiler and the linker fail. Hence at least on Android/Linux/MacOS thread_local has equivalent-or-faster performance.

On Windows someone should recheck, I think all Chrome's thread_local variables should live at constant offset from GS (on x64), nothing is faster than that.
 
Most of this could probably be dealt with via documentation and possibly PRESUBMITs, but the perf question probably needs addressing before approving this, and it would be nice if we had some general robustness testing (esp. for non-PODs) to see if there are other obvious codegen bugs.

Is anyone sufficiently qualified that they could try to do either of those things? I don't feel like I am :(

PK

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Joe Mason

unread,
Jan 4, 2023, 2:43:28 PM1/4/23
to dan...@chromium.org, Gabriel Charette, Egor Pasko, Peter Kasting, Daniel Cheng, cxx
On Wed, Jan 4, 2023 at 1:25 PM <dan...@chromium.org> wrote:
On Wed, Jan 4, 2023 at 1:22 PM 'Joe Mason' via cxx <c...@chromium.org> wrote:
Is there any worry that devs will use thread_local (or base::ThreadLocal) when they really want sequence-local, leading to hard to find bugs? In my experience many beginners don't well understand the difference between Thread and Sequence (often assuming Sequence is just the Chrome word for Thread or that it's a subtype of thread).

I had been assuming that TLS was an advanced enough topic that people without a good grounding in Chrome's threading model wouldn't attempt it, but I have a vague intuition that `thread_local` is easy enough to drop in and standardized enough that people might use it without thinking hard.

Hm.. I don't think we should fix that by avoiding a well known thing, if it's an issue. And it hasn't been raised as one until now with ThreadLocal. Unless this line of question brings out examples of risk and a means to address them, I am not sure that this could (or should?) be actionable, personally.

I agree. I'm bringing this up as a potential risk that already exists, which I hadn't considered before Gab's comment. If anyone thinks the risk is big enough to worry about, we should address it by making sure the overall guidance around threading, sequences and locals is more clear.

Joe

Joe Mason

unread,
Jan 4, 2023, 2:52:50 PM1/4/23
to Egor Pasko, Peter Kasting, cxx
On Wed, Jan 4, 2023 at 2:08 PM Egor Pasko <pa...@chromium.org> wrote:


On Wed, Jan 4, 2023 at 7:41 PM Peter Kasting <pkas...@chromium.org> wrote:
Summary of feedback thus far:
  • Known bug on Mac with wrong codegen for exported thread_locals. We should discourage exported variables in general.
  • General uncertainty whether there are other bugs.
  • thread_local might malloc, and thus cannot safely be used inside a malloc hook, OOM handler, etc. (Is this also true of ThreadLocal*?)
Egor's response below also answers this. 
  • Thread-affine variables are generally less appropriate than sequence-affine ones. We should make sure there is clear guidance.
  • Perf versus ThreadLocal* is unknown, and probably important, since `thread_local` variables are often used in perf-critical systems.

On POSIX the ThreadLocal* classes sit on top of pthread_getspecific() and friends. The thread_local would fall back to it only in the worst case, if all the opportunities to rewrite the TLS model in the compiler and the linker fail. Hence at least on Android/Linux/MacOS thread_local has equivalent-or-faster performance.

On Windows someone should recheck, I think all Chrome's thread_local variables should live at constant offset from GS (on x64), nothing is faster than that.

This would explain why thread_local can malloc on POSIX platforms - it falls back to pthread_getspecific() in the worst case - but not on Windows. And ThreadLocal* also always mallocs.

My concern is, since a Chrome-specific base::ThreadLocal function isn't as familiar to developers as a standard function, people writing memory-constrained code would naturally investigate it fully to see if it mallocs (and find out that it does), while they might assume that they know the semantics of thread_local since it's standard but not know the nuances of it on POSIX. I'm not strongly concerned about this, since such code is rare, but it would be good to document it clearly somewhere.

Roland McGrath

unread,
Jan 4, 2023, 3:00:24 PM1/4/23
to Joe Mason, Egor Pasko, Peter Kasting, cxx
On some implementations of ELF TLS, which underlies thread_local, the implicit runtime code can malloc as well.  That happens for the first access to a given module's TLS segment in each thread, depending on the way things are built and some other factors. It's possible to arrange to avoid this being possible at all when dynamic loading after startup isn't needed, and it's also possible (though clearly error-prone) to try to make sure that every thread accesses each module's TLS block once when it starts so that the malloc was already done and won't need to be reentered from a signal handler later on that thread.

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Benoit Lize

unread,
Jan 5, 2023, 5:52:30 AM1/5/23
to cxx, Roland McGrath, Egor Pasko, Peter Kasting, cxx, Joe Mason
I think there are separate issues for different use cases:
  • For very low-level code (i.e. some of //base, the memory allocator, etc.), there are many, many dragons on the general TLS area. We already use thread_local in several instances there on some platforms, because it is significantly faster (very platform-specific, but on Linux/CrOS where Chrome is typically a single executable, the difference is real), guaranteed to never allocate (on these platforms), etc.
  • Regular code that "just" wants fast and idiomatic thread-local variables that can be constinit (for instance as Peter mentions)

I would argue that for the first case, thread_local is fine, though it's your responsibility to make sure it doesn't explode. For the second one, there are cases where we (used) to do ugly things in the name of performance (for instance, looking at the stack pointer, compare it with the main thread stack pointer, and return that we "may" be on the main thread, in blink). It would likely be better to have a more centralized way to handle the details than at each call site (found a couple places already that handle the details, in WTF, oilpan, //base).

All of that to say: I would lean towards "allow thread_local, with a big disclaimer that it should only be used when you really need it, and have read the documentation". So perhaps something that should have a presubmit attached to it, and keep the guidance to use //base's primitives when you can?



Egor Pasko

unread,
Jan 5, 2023, 10:05:39 AM1/5/23
to Benoit Lize, cxx, Roland McGrath, Peter Kasting, Joe Mason
+1 to special treatment of thread_local for all code running below allocator shim. The reentrancy and its platform-specific nature adds complexity to the allocator shim, but not too much in comparison to the rest of it. 

IMO we should allow thread_local for the rest of code due to its speed benefits. The constraints look compact to describe:
1. restrict thread_local only for POD/constinit
2. never export thread_local vars outside executables/DSOs, even for component build

Can we check for these conditions in CQ?

For (2) I can imagine these:
* exported thread-local would likely not even link on Android/emutls :)
* on Linux (and maybe later Android/ELFTLS) objdump and look for TLS relocations
* on OS_APPLE - something similar?

Can we remove ThreadLocal{Boolean,Pointer} then?

PS: I'm somewhat sad with the fact that currently thread_local is disproportionally slower on Android than on other systems. Though we would probably be able to make it better later once this becomes a problem of too many thread_locals. The current speedup of thread_local is quite nice, so let's not block it?

dan...@chromium.org

unread,
Jan 5, 2023, 10:13:41 AM1/5/23
to Egor Pasko, Benoit Lize, cxx, Roland McGrath, Peter Kasting, Joe Mason
POD doesn't exist anymore in C++20 (https://en.cppreference.com/w/cpp/named_req/PODType), so could we be a bit more precise about what the requirements are without referring to POD?
One or more of:
- Default constructible?
- Constexpr constructible?
- Trivial? Scalar? Or StandardLayout?

If someone has a spec reference to explain why we're limiting to POD (pre C++20) that would help and I could propose something.

Egor Pasko

unread,
Jan 5, 2023, 11:22:37 AM1/5/23
to dan...@chromium.org, Benoit Lize, cxx, Roland McGrath, Peter Kasting, Joe Mason
On Thu, Jan 5, 2023 at 4:13 PM <dan...@chromium.org> wrote:
POD doesn't exist anymore in C++20 (https://en.cppreference.com/w/cpp/named_req/PODType), so could we be a bit more precise about what the requirements are without referring to POD?
One or more of:
- Default constructible?
- Constexpr constructible?
- Trivial? Scalar? Or StandardLayout?

If someone has a spec reference to explain why we're limiting to POD (pre C++20) that would help and I could propose something.

Something without destructors? :)

dan...@chromium.org

unread,
Jan 5, 2023, 11:23:38 AM1/5/23
to Egor Pasko, Benoit Lize, cxx, Roland McGrath, Peter Kasting, Joe Mason
On Thu, Jan 5, 2023 at 11:22 AM Egor Pasko <pa...@chromium.org> wrote:


On Thu, Jan 5, 2023 at 4:13 PM <dan...@chromium.org> wrote:
POD doesn't exist anymore in C++20 (https://en.cppreference.com/w/cpp/named_req/PODType), so could we be a bit more precise about what the requirements are without referring to POD?
One or more of:
- Default constructible?
- Constexpr constructible?
- Trivial? Scalar? Or StandardLayout?

If someone has a spec reference to explain why we're limiting to POD (pre C++20) that would help and I could propose something.

Something without destructors? :)

std::is_trivially_destructible then?

Egor Pasko

unread,
Jan 5, 2023, 11:39:41 AM1/5/23
to dan...@chromium.org, Benoit Lize, cxx, Roland McGrath, Peter Kasting, Joe Mason
On Thu, Jan 5, 2023 at 5:23 PM <dan...@chromium.org> wrote:
On Thu, Jan 5, 2023 at 11:22 AM Egor Pasko <pa...@chromium.org> wrote:


On Thu, Jan 5, 2023 at 4:13 PM <dan...@chromium.org> wrote:
POD doesn't exist anymore in C++20 (https://en.cppreference.com/w/cpp/named_req/PODType), so could we be a bit more precise about what the requirements are without referring to POD?
One or more of:
- Default constructible?
- Constexpr constructible?
- Trivial? Scalar? Or StandardLayout?

If someone has a spec reference to explain why we're limiting to POD (pre C++20) that would help and I could propose something.

Something without destructors? :)

std::is_trivially_destructible then?

SGTM

Peter Kasting

unread,
Jan 5, 2023, 12:12:55 PM1/5/23
to Egor Pasko, Dana Jansens, Benoit Lize, cxx, Roland McGrath, Joe Mason
And why are we limiting to things that are trivially destructible?

PK

dan...@chromium.org

unread,
Jan 5, 2023, 12:28:20 PM1/5/23
to Peter Kasting, Egor Pasko, Benoit Lize, cxx, Roland McGrath, Joe Mason
I am also wondering. There was some references up thread about it needing to be POD, but I didn't follow the root cause of that claim either. I'd love a reference to the spec to understand more.

Daniel Cheng

unread,
Jan 5, 2023, 12:31:46 PM1/5/23
to dan...@chromium.org, Peter Kasting, Egor Pasko, Benoit Lize, cxx, Roland McGrath, Joe Mason
Isn't TLS essentially static, and the style guide forbids non-PODs with static storage duration?

Daniel

K. Moon

unread,
Jan 5, 2023, 12:53:15 PM1/5/23
to Daniel Cheng, dan...@chromium.org, Peter Kasting, Egor Pasko, Benoit Lize, cxx, Roland McGrath, Joe Mason
https://en.cppreference.com/w/cpp/language/storage_duration#Storage_duration says thread_local's duration is tied to the lifetime of the thread. I can see where that would get a bit complicated.

Egor Pasko

unread,
Jan 5, 2023, 12:56:21 PM1/5/23
to Daniel Cheng, dan...@chromium.org, Peter Kasting, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 6:31 PM Daniel Cheng <dch...@chromium.org> wrote:
Isn't TLS essentially static, and the style guide forbids non-PODs with static storage duration?
 
+1 (essentially the same concerns as with static storage: folks like me would be confused by initialization/destruction order issues).

Possible other causes I can remember:

1. In previous discussions we encountered cryptic destruction problems on Win7. 

2. Google styleguide requires ABSL_CONST_INIT for thread_local. However, it then follows with "thread_local variable instances are not destroyed before their thread terminates, so they do not have the destruction-order issues of static variables". Thanks, dear styleguide, they do not have destruction-order issues, but I am still not following why it is so. Disallow?

Peter Kasting

unread,
Jan 5, 2023, 1:15:25 PM1/5/23
to Egor Pasko, Daniel Cheng, Dana Jansens, Benoit Lize, cxx, Roland McGrath, Joe Mason
Maybe it's saying that the destructors of thread_local variables never run (i.e. they're all implicitly No destructor), so it's ok to use any constinit, even not trivially destructible ones?

PK

Peter Boström

unread,
Jan 5, 2023, 1:23:34 PM1/5/23
to Peter Kasting, Egor Pasko, Daniel Cheng, Dana Jansens, Benoit Lize, cxx, Roland McGrath, Joe Mason
Implicit NoDestructor seems scary or at least icky to the peanut gallery.

K. Moon

unread,
Jan 5, 2023, 1:24:51 PM1/5/23
to Peter Boström, Peter Kasting, Egor Pasko, Daniel Cheng, Dana Jansens, Benoit Lize, cxx, Roland McGrath, Joe Mason
I think this is referring to this statement in https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables:
Moreover, when a program starts threads that are not joined at exit, those threads may attempt to access objects after their lifetime has ended if their destructor has already run.

Thread-local variables won't be accessed by other threads (under normal circumstances, anyway), so you avoid the risk of having a thread that accesses them after destruction (which can occur with program-wide destruction). (If you pass a pointer to a thread-local variable to another thread, I assume all bets are off again.) I think any other drawbacks still apply.

Threads can terminate (both before and when a program exits), so it's not the same as no destructor.

Peter Kasting

unread,
Jan 5, 2023, 1:27:13 PM1/5/23
to Egor Pasko, Daniel Cheng, Dana Jansens, Benoit Lize, cxx, Roland McGrath, Joe Mason
Ah I figured it out. The static variables section of the guide talks about non-joined threads still running after static destruction, and this potentially UAFing static variables. That particular risk doesn't happen with thread_local objects.

I think we should allow thread_locals as long as they are not exported.

PK

Egor Pasko

unread,
Jan 5, 2023, 1:54:07 PM1/5/23
to Peter Kasting, Daniel Cheng, Dana Jansens, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 7:27 PM Peter Kasting <pkas...@chromium.org> wrote:
Ah I figured it out. The static variables section of the guide talks about non-joined threads still running after static destruction, and this potentially UAFing static variables. That particular risk doesn't happen with thread_local objects.

Ah, thanks, Peter. This makes sense.
 
I think we should allow thread_locals as long as they are not exported.
 
For different thread_local variables do we know the order of their destructors when the thread joins? If it is unspecified, then the convenience of running destructors may be overshadowed by this nondeterminism?

K. Moon

unread,
Jan 5, 2023, 1:56:23 PM1/5/23
to Egor Pasko, Peter Kasting, Daniel Cheng, Dana Jansens, Benoit Lize, cxx, Roland McGrath, Joe Mason
I believe thread_local variables will destruct in reverse-initialization order for each thread. Since this shouldn't be observable from other threads, this shouldn't cause a problem other than order-dependencies within a single thread.

K. Moon

unread,
Jan 5, 2023, 2:03:18 PM1/5/23
to Egor Pasko, Peter Kasting, Daniel Cheng, Dana Jansens, Benoit Lize, cxx, Roland McGrath, Joe Mason
Although the code gen for the non-trivial case is considerably more complicated: https://godbolt.org/z/PnonxEGKe

Mike Wittman

unread,
Jan 5, 2023, 4:35:57 PM1/5/23
to K. Moon, Egor Pasko, Peter Kasting, Daniel Cheng, Dana Jansens, Benoit Lize, cxx, Roland McGrath, Joe Mason
+1 to enforcing the distinction that Benoit proposed, having encountered TLS dragons working with the sampling profiler. It's exceedingly easy to make invalid assumptions about core Chrome services being available (or use APIs that make those assumptions) in non-trivial TLS object destructors. For long-lived threads TLS destruction happens pretty late in Chrome shutdown so use of thread_local objects with non-trivial destructors risks introducing hard-to-debug shutdown crashes.

-Mike

dan...@chromium.org

unread,
Jan 5, 2023, 4:41:12 PM1/5/23
to Mike Wittman, K. Moon, Egor Pasko, Peter Kasting, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 4:35 PM Mike Wittman <wit...@chromium.org> wrote:
+1 to enforcing the distinction that Benoit proposed, having encountered TLS dragons working with the sampling profiler. It's exceedingly easy to make invalid assumptions about core Chrome services being available (or use APIs that make those assumptions) in non-trivial TLS object destructors. For long-lived threads TLS destruction happens pretty late in Chrome shutdown so use of thread_local objects with non-trivial destructors risks introducing hard-to-debug shutdown crashes.

I'm trying to fully grok Benoit's proposal. Is it that we
- allow thread_local but recognize it's dangerous (like atomics)
- provide base primitives built on thread_local that require the thread_local's initial value to be constexpr evaluated?

Mike Wittman

unread,
Jan 5, 2023, 6:01:54 PM1/5/23
to dan...@chromium.org, K. Moon, Egor Pasko, Peter Kasting, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
For context I think the risk falls in three tiers:
  1. std::is_trivially_destructible thread_local objects: safe
  2. non-std::is_trivially_destructible thread_local objects on owned threads with clearly bounded lifetimes: reasonably safe assuming the thread gets shut down/joined along with the owning code at a well-defined point in Chrome shutdown
  3. non-std::is_trivially_destructible thread_local objects on general threads: risky since it's hard to know what dependencies will be available at destruction time. You really shouldn't do this unless you Know What You're Doing™.
We could allow 1 only as a conservative approach with 2 and 3 on an exceptional basis. I'd expect most instances of use case 2 would be satisfied by SequenceLocalStorage; it probably should be preferred there since the path to object destruction is entirely controlled by Chrome code so it's easier to reason about destruction ordering.

I'm unsure of the best technical approach to enforce though. :)

Gabriel Charette

unread,
Jan 5, 2023, 6:55:26 PM1/5/23
to Mike Wittman, dan...@chromium.org, K. Moon, Egor Pasko, Peter Kasting, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
Reason to prefer std::is_trivial_destructible: any non-trivial destructor may use a thread_local variable, causing headaches in TLS destruction.

base/threading/thread_local_storage.cc has some docs on how we mitigate this in //base (multi-pass destruction), not sure if thread_local supports that.

Matt Denton

unread,
Jan 5, 2023, 6:57:04 PM1/5/23
to Mike Wittman, dan...@chromium.org, K. Moon, Egor Pasko, Peter Kasting, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 3:01 PM Mike Wittman <wit...@chromium.org> wrote:
For context I think the risk falls in three tiers:
  1. std::is_trivially_destructible thread_local objects: safe

Just *accessing* a TLS variable can cause an allocation when using glibc, which is unexpected and therefore unsafe in some contexts, e.g. allocators and allocation hooks (sampling profiler, GWP-ASAN). One example is the crashes we saw when calling base::PlatformThread::CurrentId() after forking the browser process: the thread id was cached in a TLS variable, but accessing the TLS variable caused an allocation, which is forbidden after forking a multithreaded process. (see CL)
 

Peter Kasting

unread,
Jan 5, 2023, 6:59:59 PM1/5/23
to Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
Since thread_local variables are destroyed in reverse init order, the only way one should attempt to access another that's already been destroyed is if you init A, then B, then hand &B to A. There's nothing really specific to thread locals about this and I don't think it should influence our decision about what kinds of thread locals to allow.

PK

Gabriel Charette

unread,
Jan 5, 2023, 7:18:20 PM1/5/23
to Peter Kasting, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 6:59 PM Peter Kasting <pkas...@chromium.org> wrote:
Since thread_local variables are destroyed in reverse init order, the only way one should attempt to access another that's already been destroyed is if you init A, then B, then hand &B to A. There's nothing really specific to thread locals about this and I don't think it should influence our decision about what kinds of thread locals to allow.

Agreed that it's not that big of a deal, but we also don't know if thread_local supports this. Just mentioning why std::is_trivially_destructible might be a desired requirement as there were questions up-thread about why it would even matter.

Peter Kasting

unread,
Jan 5, 2023, 8:53:55 PM1/5/23
to Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
What "this" are you concerned thread_local may not support, that's distinct from what non-thread_local variables support? They can't safely refer to already-destroyed variables either.

PK

Matt Denton

unread,
Jan 5, 2023, 9:08:44 PM1/5/23
to Peter Kasting, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 5:53 PM Peter Kasting <pkas...@chromium.org> wrote:
What "this" are you concerned thread_local may not support, that's distinct from what non-thread_local variables support? They can't safely refer to already-destroyed variables either.
 
I'm not 100% sure but I wonder if it goes something like this:
thread_local int a;
struct B { ~B() { a = 5; } };
thread_local B b;

This feels fine since we are not accessing any pointers, except that "a" is actually allocated (even though it looks like a global), and the compiler handles the implicit dereference for you.
 

Mike Wittman

unread,
Jan 5, 2023, 9:09:32 PM1/5/23
to Matt Denton, dan...@chromium.org, K. Moon, Egor Pasko, Peter Kasting, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 3:57 PM Matt Denton <mpde...@google.com> wrote:
On Thu, Jan 5, 2023 at 3:01 PM Mike Wittman <wit...@chromium.org> wrote:
For context I think the risk falls in three tiers:
  1. std::is_trivially_destructible thread_local objects: safe

Just *accessing* a TLS variable can cause an allocation when using glibc, which is unexpected and therefore unsafe in some contexts, e.g. allocators and allocation hooks (sampling profiler, GWP-ASAN). One example is the crashes we saw when calling base::PlatformThread::CurrentId() after forking the browser process: the thread id was cached in a TLS variable, but accessing the TLS variable caused an allocation, which is forbidden after forking a multithreaded process. (see CL)

Right, but those contexts are obvious (or should be, via code documentation), very rare, and reasonably can be considered expert-only. Having written the allocation-free code for the sampling profiler, I'd argue that we shouldn't be making any assumptions about allocations within standard library code (or any other dependencies) in those contexts without manually verifying that's the case, and also that's likely to continue being the case with future library updates.

Since most Chrome engineers will never touch those contexts I think it's reasonable to consider thread_local with trivially-destructible objects safe for general usage.

Matt Denton

unread,
Jan 5, 2023, 9:14:55 PM1/5/23
to Mike Wittman, dan...@chromium.org, K. Moon, Egor Pasko, Peter Kasting, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 6:09 PM Mike Wittman <wit...@chromium.org> wrote:
On Thu, Jan 5, 2023 at 3:57 PM Matt Denton <mpde...@google.com> wrote:
On Thu, Jan 5, 2023 at 3:01 PM Mike Wittman <wit...@chromium.org> wrote:
For context I think the risk falls in three tiers:
  1. std::is_trivially_destructible thread_local objects: safe

Just *accessing* a TLS variable can cause an allocation when using glibc, which is unexpected and therefore unsafe in some contexts, e.g. allocators and allocation hooks (sampling profiler, GWP-ASAN). One example is the crashes we saw when calling base::PlatformThread::CurrentId() after forking the browser process: the thread id was cached in a TLS variable, but accessing the TLS variable caused an allocation, which is forbidden after forking a multithreaded process. (see CL)

Right, but those contexts are obvious (or should be, via code documentation), very rare, and reasonably can be considered expert-only. Having written the allocation-free code for the sampling profiler, I'd argue that we shouldn't be making any assumptions about allocations within standard library code (or any other dependencies) in those contexts without manually verifying that's the case, and also that's likely to continue being the case with future library updates.

Since most Chrome engineers will never touch those contexts I think it's reasonable to consider thread_local with trivially-destructible objects safe for general usage.

The contexts being: allocators/allocation hooks and between fork() and exec()? I think that's true, though I also think these are important footnotes since it's very non-obvious that accessing thread_local allocates--this causes bugs over and over in low-level code.

Peter Kasting

unread,
Jan 5, 2023, 9:31:42 PM1/5/23
to Matt Denton, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
The code you write is guaranteed safe, though. b is destroyed before a, not after it.

PK

Peter Kasting

unread,
Jan 5, 2023, 9:32:10 PM1/5/23
to Mike Wittman, Matt Denton, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
I still don't see the benefit of restricting to trivially-destructible objects.

PK

Matt Denton

unread,
Jan 5, 2023, 9:38:18 PM1/5/23
to Peter Kasting, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 6:31 PM Peter Kasting <pkas...@chromium.org> wrote:
The code you write is guaranteed safe, though. b is destroyed before a, not after it.

Are there no situations in which a is destroyed before b, while b references a in its destructor?

Peter Kasting

unread,
Jan 5, 2023, 9:43:40 PM1/5/23
to Matt Denton, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
There are, but they're not specific to thread_local in any way, as far as I know. They're run of the mill dangling pointer/reference bugs that you can get from passing in a pointer/ref from the allocated-later object to the allocated-earlier one.

PK

Matt Denton

unread,
Jan 5, 2023, 9:48:30 PM1/5/23
to Peter Kasting, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
void FuncA() {
  thread_local int a;
  a = 5;
}
struct B { ~B() { FuncA(); } };
thread_local B b;
FuncA();

There's no obvious pointer deref in this one, but IIUC this is UB.
I suppose this particular situation comes up in statics, but don't we have lots of special rules for status? base::NoDestructor etc?

Peter Kasting

unread,
Jan 5, 2023, 9:56:40 PM1/5/23
to Matt Denton, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
I do not think that's ub either, especially since a is not static.

PK

Matt Denton

unread,
Jan 5, 2023, 10:02:45 PM1/5/23
to Peter Kasting, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 6:56 PM Peter Kasting <pkas...@chromium.org> wrote:
I do not think that's ub either, especially since a is not static.

Peter Kasting

unread,
Jan 5, 2023, 10:06:51 PM1/5/23
to Matt Denton, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
No, you're correct and I was wrong. Your example was indeed ub. I misunderstood how function-scoped thread_local lifetimes worked.

PK

Peter Kasting

unread,
Jan 5, 2023, 10:07:35 PM1/5/23
to Matt Denton, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
(note, though, that trivial destructors don't save you here; this is ub regardless of type or what the function does with the variable.)

PK

Matt Denton

unread,
Jan 5, 2023, 10:08:45 PM1/5/23
to Peter Kasting, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 7:07 PM Peter Kasting <pkas...@chromium.org> wrote:
(note, though, that trivial destructors don't save you here; this is ub regardless of type or what the function does with the variable.)

If B doesn't use its own destructor, isn't there no UB?

Peter Kasting

unread,
Jan 5, 2023, 10:15:31 PM1/5/23
to Matt Denton, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
Yes; I was thinking only that a trivially destructible a does not save you. But you're right that if _all_ thread locals and all static/globals are forced to be trivially destructible, then I think no one can call the function (or anything else) once the thread reaches thread_local-destruction phase.

I probably shouldn't respond to these emails while I'm also trying to parent a five year old :P

PK

Matt Denton

unread,
Jan 5, 2023, 10:20:23 PM1/5/23
to Peter Kasting, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, Benoit Lize, cxx, Roland McGrath, Joe Mason
On Thu, Jan 5, 2023 at 7:15 PM Peter Kasting <pkas...@chromium.org> wrote:
Yes; I was thinking only that a trivially destructible a does not save you. But you're right that if _all_ thread locals and all static/globals are forced to be trivially destructible, then I think no one can call the function (or anything else) once the thread reaches thread_local-destruction phase.

My only point is that accessing other thread_locals in a destructor is a non-obvious pointer deref and can unexpectedly be UB. Whether or not that means we should outlaw all destructors isn't something I have a strong opinion about.
 
I probably shouldn't respond to these emails while I'm also trying to parent a five year old :P

I have a hard enough time responding to emails while entertaining my dog, let alone a five year old human. :P

Benoit Lize

unread,
Jan 6, 2023, 5:45:49 AM1/6/23
to Matt Denton, Peter Kasting, Gabriel Charette, Mike Wittman, Dana Jansens, K. Moon, Egor Pasko, Daniel Cheng, cxx, Roland McGrath, Joe Mason
Since we've seen several issues with destructor ordering of TLS variables with spooky action at a distance (one example in PartitionAlloc on Windows, https://bugs.chromium.org/p/chromium/issues/detail?id=1159411), I would tend to prefer forbidding types with non-trivial destructors in thread_local, since (as mentioned above):
  1. It's consistent with our policy regarding static destructors
  2. We've had several nasty issues there


It is loading more messages.
0 new messages