The global locale on highly parallel systems


g.a....@gmail.com

Sep 29, 2016, 6:51:07 AM
to ISO C++ Standard - Discussion
I write C++ applications that run on servers with a lot of cores. We make use of std::locale objects, but we don't change them at runtime (we only ever set the global locale once).

We suffer from the fact that accessing the global locale requires an atomic reference count in the implementation. This shows up whenever a developer calls any kind of streaming operation, or one of a large number of Boost string algorithms, in code that is executed by a large number of threads. It's easy to reproduce the problem in an artificial test, even on a system with just a few cores: for example, spawn a lot of threads that call boost::iequals repeatedly. The contention on the atomic reference count dwarfs the time spent doing any real work.
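
A minimal sketch of such an artificial test, assuming nothing beyond the standard library. To avoid a Boost dependency it default-constructs std::locale (which copies the global locale) and calls std::isalpha in place of the Boost algorithm; the names `RunResult` and `hammer_global_locale` are made up for illustration. How much contention actually shows up depends on the standard library implementation, and may depend on whether the global locale has been changed from the default.

```cpp
#include <chrono>
#include <cstddef>
#include <locale>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch.
struct RunResult {
    double milliseconds;     // wall-clock time for the whole run
    std::size_t alpha_hits;  // number of std::isalpha calls that returned true
};

// Hypothetical helper: every thread repeatedly copies the global locale
// and performs one trivial locale-dependent operation with the copy.
RunResult hammer_global_locale(unsigned thread_count, std::size_t iterations) {
    std::vector<std::size_t> hits(thread_count, 0);
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < thread_count; ++t) {
        pool.emplace_back([&hits, t, iterations] {
            std::size_t local = 0;
            for (std::size_t i = 0; i < iterations; ++i) {
                std::locale loc;  // copy of the global locale (shared state is touched here)
                if (std::isalpha('a', loc)) ++local;
            }
            hits[t] = local;  // each thread writes only its own slot
        });
    }
    for (std::thread& th : pool) th.join();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    std::size_t total = 0;
    for (std::size_t h : hits) total += h;
    return {elapsed.count(), total};
}
```

Comparing the timing for increasing thread counts against a variant that hoists `std::locale loc;` out of the inner loop should give a rough idea of how much of the time goes into the shared locale state on a given platform.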

When we're lucky we discover these problems during QA. Often we discover them only when a customer reports a mysterious slowdown. Typically things run smoothly on our 32-core development machines, but our customers run on 128 or more cores.

I looked at the standard and at gcc's implementation of locales, and given the standard's definition I don't see an obvious solution to the problem. It's unfortunate, because the locale-based functionality would be ideal for our systems, but using it entails a significant risk for us.

This is now the third company/project I've worked on where this is a problem, and where we do not need the ability to change the global locale.  We set the locale once on system startup, and it doesn't change for the lifetime of the program.  Otherwise we have to solve the problems as they crop up. 

Since these problems occur in libraries where it's not always obvious to us that locales are being accessed, I can't think of a global solution that wouldn't involve changing the standard. In a world where the number of cores keeps increasing, and where I keep running into this problem, I imagine a standard solution would be of general interest. The solution that occurs to me is a mechanism to inform the compiler that the global locale can be considered constant. It seems to me this would necessitate std::locale::global throwing an exception if called on a const_locale, as I can't think of a way to determine this at compile time. That would be acceptable to me, and I suppose to others in my situation. But maybe others see better solutions?
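
To make the "set once, then throw" behaviour concrete, here is a user-level sketch. It is only a convention enforced in application code: `set_global_locale_once` and `g_locale_frozen` are hypothetical names, nothing like this exists in the standard, and it does not remove any reference counting inside the library. It only illustrates the semantics the idea would need.

```cpp
#include <atomic>
#include <locale>
#include <stdexcept>

// Hypothetical flag: becomes true after the first (and only) permitted change.
std::atomic<bool> g_locale_frozen{false};

// Hypothetical wrapper around std::locale::global. The first call installs
// the locale; any later call throws instead of changing the global locale.
void set_global_locale_once(const std::locale& loc) {
    if (g_locale_frozen.exchange(true)) {
        throw std::logic_error("global locale is frozen");
    }
    std::locale::global(loc);
}
```

A real solution would have to live inside the standard library, so that code which never goes through such a wrapper (streams, Boost algorithms) could also benefit from knowing the global locale no longer changes.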

Does anyone have a solution to this problem that doesn't involve changing the standard?  Would it be worth trying to craft a proposal?

Andrey Semashev

Sep 29, 2016, 7:18:19 AM
to std-dis...@isocpp.org
On 09/29/16 13:51, g.a....@gmail.com wrote:
> I write C++ applications which run on servers with a lot of cores. We
> make use of std::locale's, but we don't change them during runtime (at
> least we only ever change global system locale once).
>
> We suffer from the fact that accessing the global locale requires an
> atomic reference count implementation. This pops up when a developer
> calls any kind of streaming operation or one of a large number of boost
> string algorithms in an area that gets called by a large number of
> threads. It's easy to construct the problem in an artificial test, even
> on a system with just a few cores -- for example spawn a lot of threads
> that call boost::iequals repeatedly. The contention for the atomic
> reference count dwarfs the time spent doing any work.

Did you try using thread_local locale objects? As you say, you only set
the global locale once in your program startup, you could then
initialize thread-specific copies from the global locale when you start
worker threads and then use the copies in all your algorithms. Although
that would not help if you're constructing a lot of streams - the stream
would still default-construct a locale on its construction.
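
A minimal sketch of the thread_local approach, assuming the global locale has already been set before any worker thread first uses it. The helper names `thread_locale` and `iequals_tl` are made up for illustration, and the comparison is a plain standard-library stand-in for the Boost algorithm.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: one locale copy per thread, created the first time
// the calling thread uses it, so the global locale is copied once per thread.
const std::locale& thread_locale() {
    thread_local const std::locale loc;  // default constructor copies the global locale
    return loc;
}

// Hypothetical case-insensitive comparison that uses the per-thread copy
// rather than constructing a fresh locale on every call.
bool iequals_tl(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    const std::ctype<char>& ct = std::use_facet<std::ctype<char>>(thread_locale());
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (ct.tolower(a[i]) != ct.tolower(b[i])) return false;
    }
    return true;
}
```

The per-thread copy is taken lazily, so a thread that first calls `thread_locale()` before the global locale is set at startup would keep the old locale for its lifetime.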

g.a....@gmail.com

Sep 29, 2016, 7:34:13 AM
to ISO C++ Standard - Discussion


> Did you try using thread_local locale objects? As you say, you only set
> the global locale once in your program startup, you could then
> initialize thread-specific copies from the global locale when you start
> worker threads and then use the copies in all your algorithms. Although
> that would not help if you're constructing a lot of streams - the stream
> would still default-construct a locale on its construction.


That's the usual workaround. For example, for boost::iequals I created a wrapper function that stores a thread-local copy of the locale and passes it to boost::iequals. Problem solved, until the next case. Another time it happened, I had to create a thread-local stringstream object and always clear it before use.
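
A sketch of the second workaround, reusing one stream per thread so that the stream (and the locale it holds) is constructed only once per thread. The function name `format_int` is hypothetical, and whether this avoids all shared locale state is an assumption that depends on the standard library in use.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats an int using a stream that lives for the
// lifetime of the calling thread instead of being constructed per call.
std::string format_int(int value) {
    thread_local std::ostringstream out;  // constructed once per thread
    out.str(std::string());               // discard the previous contents
    out.clear();                          // reset any error flags
    out << value;
    return out.str();
}
```

One caveat of this pattern is that formatting state such as `std::hex` or a changed precision persists between calls unless it is reset explicitly.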

I'm not really happy with that situation, though. It would be nice if our developers could use the standard and Boost libraries (like stringstream) as they are, without worrying about the possibility of a contention problem.

In 90% of the cases, the contention isn't a problem because the concurrent access is at a sane level.  If we always create thread local instances, or write wrappers, that smacks of premature optimization.  On the other hand it's often difficult to know in advance if the code you just wrote will wind up being used in such a way.  Sometimes we make a release and things run smoothly until the customer changes a configuration variable, and suddenly contention becomes an issue.

At least in my job experience, setting the global locale exactly once has been the norm. I've only once had a system where changing the global locale object was part of the internationalization design, and even then we could easily have handled it another way.

g.a....@gmail.com

Sep 29, 2016, 7:41:40 AM
to ISO C++ Standard - Discussion, g.a....@gmail.com

I think it's worth mentioning that both here and at my second-to-last project we had code that was a few years old and made use of stringstreams and so on. Things worked fine for years, but the world is changing. It used to be that our customers' servers had 16 hardware threads and my dev machine had one or two. Now the servers have 128-256 hardware threads, and my computer at home has 8. Often we discover a problem when a customer buys a new server.

Thus I think it would be worthwhile to reconsider how C++ locale access works. I'd be happy if someone can come up with a better solution than my idea, but I hope I'm not alone in seeing this as a problem.

Andrey Semashev

Sep 29, 2016, 7:55:52 AM
to std-dis...@isocpp.org
Well, according to [locale]/9, the implementation is already permitted
(and encouraged) to have per-thread global locales, so this is a
question of QoI.

What I think is lacking in the standard is the ability to specify a
locale at stream construction. I think that could be a useful proposal.
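
For illustration, this is roughly what is available today: the locale can only be replaced after the stream exists, so the default construction from the global locale still happens first. The function name `format_with` is hypothetical.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats a double using a caller-supplied locale.
std::string format_with(const std::locale& loc, double value) {
    std::ostringstream out;  // the stream is first constructed with a copy of the global locale
    out.imbue(loc);          // only afterwards can the caller's locale be installed
    out << value;
    return out.str();
}
```

A constructor taking a locale would let callers pass a thread-local copy and skip the initial access to the global locale altogether.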

Demi Obenour

Sep 29, 2016, 8:38:28 AM
to 'Edward Catmur' via ISO C++ Standard - Discussion

I think the solution is to mandate that locales be per-thread, rather than merely allowing it.


Tony V E

Sep 29, 2016, 4:33:24 PM
to std-discussion
On Thu, Sep 29, 2016 at 6:51 AM, <g.a....@gmail.com> wrote:
> I write C++ applications which run on servers with a lot of cores. We make use of std::locale's, but we don't change them during runtime (at least we only ever change global system locale once).
>
> We suffer from the fact that accessing the global locale requires an atomic reference count implementation.

What kind of hardware?  I think this could be minimized to an acquire-read, which on x86 is basically free.
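
A sketch of what an acquire-read could look like, under the assumption that the shared data is published once and never freed while readers exist. `LocaleData`, `g_current`, `publish` and `current` are hypothetical stand-ins and do not describe how any std::locale implementation works.

```cpp
#include <atomic>
#include <string>

// Hypothetical stand-in for whatever immutable data a locale refers to.
struct LocaleData {
    std::string name;
};

// Hypothetical global pointer to the currently published data.
std::atomic<const LocaleData*> g_current{nullptr};

// Publish the data once at startup. The pointed-to object must outlive
// every reader, because no reference count keeps it alive.
void publish(const LocaleData* data) {
    g_current.store(data, std::memory_order_release);
}

// Readers only load the pointer; unlike a reference count increment,
// this does not write to memory shared between threads.
const LocaleData* current() {
    return g_current.load(std::memory_order_acquire);
}
```

The trade-off is lifetime management: the reference count exists precisely so that a replaced locale can be destroyed safely, which this sketch sidesteps by never destroying anything.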
 

--
Be seeing you,
Tony

Thiago Macieira

Sep 29, 2016, 5:00:58 PM
to std-dis...@isocpp.org
On Thursday, September 29, 2016 14:55:48 PDT Andrey Semashev wrote:
> Well, according to [locale]/9, the implementation is already permitted
> (and encouraged) to have per-thread global locales, so this is a
> question of QoI.

Also QoI: there could be a way for you to specify that locales are never
changed thread-unsafely (or at all), so the runtime can simply forgo the
locking and atomic reference counting.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
