absl::Mutex in Chromium?

Alexei Svitkine

unread,

Oct 12, 2021, 5:42:59 PM10/12/21

to cxx

Hello,

As part of my investigation on crbug.com/1123627, I'm interested in using a Read/Write lock in base::StatisticsRecorder.

It looks like absl::Mutex fits this case - which is not supported by base::Lock. Amusingly, the Chromium site page has sample code on how to "roll your own" R/W lock:

https://www.chromium.org/developers/lock-and-condition-variable

I assume we can all agree that rolling your own r/w lock doesn't seem ideal. What would be the process for getting approved to use absl::Mutex in this specific case?

As an aside, C++17 will introduce std::shared_mutex, which also handles this case:

https://en.cppreference.com/w/cpp/thread/shared_mutex

But it looks like we're not on C++17 yet?

(Apologies if there's a better place to discuss absl than cxx@ - let me know.)

-Alexei

K. Moon

unread,

Oct 12, 2021, 6:33:19 PM10/12/21

to Alexei Svitkine, cxx

Yep, this is the designated place to discuss Abseil. I recall mutex implementations and reader-writer locks in particular are controversial, though, so this probably will be a long discussion...

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAKFU2SAJg%2By4FUU-qTWkhq1XAeCWB2GKo%2BjSUbkMFdVK%2Bakxcg%40mail.gmail.com.

K. Moon

unread,

Oct 12, 2021, 6:38:22 PM10/12/21

to Alexei Svitkine, cxx

As an example, internal TotW #197 cautions against using the reader lock mechanism in performance-sensitive code. Your example seems like the situation the TotW is warning against.

Peter Kasting

unread,

Oct 12, 2021, 9:48:19 PM10/12/21

to K. Moon, Alexei Svitkine, cxx

IMO, better to have a dangerous tool than no tool.

PK

Peter Boström

unread,

Oct 12, 2021, 10:34:40 PM10/12/21

to Peter Kasting, Alexei Svitkine, K. Moon, cxx

Would the "no tool" here would just be replacing rw locks with regular mutexes or everyone rolls their own rw locks?

On Tue, Oct 12, 2021 at 6:48 PM 'Peter Kasting' via cxx <c...@chromium.org> wrote:

IMO, better to have a dangerous tool than no tool.

PK

--

You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAAHOzFDX6Kj9wjSThGh1U3rHjAL_HDhicS2mh3YpUYwVUCnWpw%40mail.gmail.com.

Peter Kasting

unread,

Oct 13, 2021, 7:23:25 AM10/13/21

to Peter Boström, Alexei Svitkine, K. Moon, cxx

On Tue, Oct 12, 2021 at 7:34 PM Peter Boström <pb...@chromium.org> wrote:

Would the "no tool" here would just be replacing rw locks with regular mutexes or everyone rolls their own rw locks?

Either way, I think?

PK

Alexei Svitkine

unread,

Oct 13, 2021, 9:14:49 AM10/13/21

to Peter Kasting, Peter Boström, K. Moon, cxx

Thanks for the pointer to the TOTW, I haven't seen it.

To summarize, the argument against R/W lock is that the fixed cost of acquiring/releasing it is higher than a regular lock, so it may actually hurt performance - and provides some general guidance to avoid it for short-lived lock operations.

That's a fair point, though in the case where I'm interested in using it, it's actually not "performance sensitive code". This probably sounds surprising - since why would I even care about changing the lock if it's not performance critical? Let me explain.

1. The lock in question is the one used to look up Histogram objects in a global map. This is needed when a histogram is logged (thousands of call sites in Chromium), BUT not all callsites are equally affected. In particular, histograms logged via histogram_functions.h APIs do the look up (and thus acquire the lock) each time, whereas call sites using histogram_macros.h actually only look up the histogram once, at the first call, and cache in a function-scoped static - so only interact with the lock once. And we recommend the histogram macros for performance critical code (but recommend against them for other call sites - since there are disadvantages such as increased binary size and being more error prone, if someone passes in a dynamic name).

So to that effect, we're actually more focused on non-performance-sensitive call sites. The small overhead of R/W lock doesn't seem like an issue here. (BTW, in the comments on that TOTW, the perf difference between the two was measured to be around 1.5% which is quite negligible - though I'd imagine it may vary by platform. OTOH, on the bug I'm looking at, we noticed that base::Lock actually uses an OS RW lock behind the scenes on Windows.)

2. So why do we care if it's not performance critical?

Well, the bug in question is a priority inversion issue. That is, histogram APIs can and are used from different threads in Chromium. So, the observed situation is that a low-priority thread is descheduled while holding the lock, while the main thread is trying to acquire it, causing jank. So we don't care about the fixed cost overhead of these locks, but rather a situation where a low priority thread may starve a high priority thread.

In the code in question, we've observed that most uses of the lock just need to do a look up (and not an insertion - which is only when a given histogram is first added), so if most callers can use a read lock, then it would reduce greatly the probability of this happening since readers won't compete with each other.

One other thing worth pointing out is the fact that we do have some slow reports about this occurrence, so we should be able to see the effect of such a change by the occurrence rate of such reports.

....

So based on all the above, I still think a R/W lock is a good solution here. Hopefully the above explanation sheds some light on my thinking for this problem.

So my question is how can we use it?

-Alexei

K. Moon

unread,

Oct 13, 2021, 9:55:27 AM10/13/21

to Alexei Svitkine, Peter Kasting, Peter Boström, cxx

My main takeaway from the TotW is that techniques like RCU are preferable to any kind of locking in the first place. I think that would also solve the priority inversion issue, but I'm not as familiar with the problem as you are, and maybe RCU or some other lock-free solution isn't an appropriate solution in this case.

Perhaps attractive nuisance features like a RW lock could at least be behind an allow list, so each use can be carefully vetted, rather than blanket permission like other Abseil features?

Gabriel Charette

unread,

Oct 13, 2021, 10:04:35 AM10/13/21

to Alexei Svitkine, Peter Kasting, Peter Boström, K. Moon, cxx

AFAICT in crbug.com/1123627, the lock is held by Histogram::Factory::Build() doing either StatisticsRecorder::RegisterOrDeleteDuplicateRanges() or StatisticsRecorder::RegisterOrDeleteDuplicate() (can't tell which from the report). Both of these operations look like writes so an RW lock wouldn't help?

--

You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAKFU2SBwWrNiUC2WNQqi7Vo42M%3Dfaa6qhnxMMJshxT7m9u3hEA%40mail.gmail.com.

Alexei Svitkine

unread,

Oct 13, 2021, 10:26:02 AM10/13/21

to Gabriel Charette, Peter Kasting, Peter Boström, K. Moon, cxx

Just to clarify, there's a bunch of different reports linked from the bug, and not all of them are the same issue.

But we definitely see instances of this where it's two FindHistogram() calls contending, like the one with call stack screenshots linked here:

https://bugs.chromium.org/p/chromium/issues/detail?id=1123627#c6

Alexei Svitkine

unread,

Oct 13, 2021, 10:35:01 AM10/13/21

to Gabriel Charette, Peter Kasting, Peter Boström, K. Moon, cxx

And just to clarify, Histogram::Factory:Build() does look up the histogram using FindHistogram() to see if one exists already first, and if it doesn't it will create it (registering both its bucket ranges and the histogram).

In the current world, this still requires a lock operation for find and if not found, another lock operation for insert. The proposal calls for replacing the first operation with a reader lock, which should satisfy most cases. Anyway, this thread is about what would be needed to use such a lock in Chromium.

For the specific problem being solved, we can discuss more on the bug. But I'd like to find a way to get unblocked on trying the r/w lock approach here, so wondering what's the path forward?

(To answer the question about RCU - we can certainly explore other approaches like that, but the R/W lock seems like a straight forward thing to try here - and RCU sounds like it may require a bunch more work - but happy to learn more - e.g. are there RCU primitives in Chromium that could be used here?)

Gabriel Charette

unread,

Oct 13, 2021, 10:41:50 AM10/13/21

to Alexei Svitkine, Gabriel Charette, Peter Kasting, Peter Boström, K. Moon, cxx

For an experiment, how about using a platform specific RW lock (IIUC the issue is particularly visible on Windows)? If it proves to perform well, we can use that to justify the adoption of a cross-platform RW lock (absl's) in Chromium (at which point we can re-run a cross-platform experiment and gather number for all platforms as well as confirm it performs as expected on the platform we already tested).

Alexei Svitkine

unread,

Oct 13, 2021, 11:09:21 AM10/13/21

to Gabriel Charette, Peter Kasting, Peter Boström, K. Moon, cxx

That seems like it would be adding quite a bit of throw-away code. The absl::Mutex and absl::Lock API are pretty similar, so it's a pretty straight forward conversion:

https://chromium-review.googlesource.com/c/chromium/src/+/3216667

I'd rather not write a bunch of throw-away platform-specific code here. I don't think we'd want to maintain a bunch of ifdefed code, even for an experiment - it's a bunch of complexity that would be error prone.

In my CL above, I did update the top-level DEPS file. I'm wondering, could we have a DEPS file at a lower level that overrides it - if so, base/metrics could have one specifically for this case. Assuming that works, that would solve the "mechanical" part of how to allow it in a narrow case, though I'd still like to get clarity on the approval side. What's the process there?

Gabriel Charette

unread,

Oct 13, 2021, 11:33:15 AM10/13/21

to Alexei Svitkine, Gabriel Charette, Peter Kasting, Peter Boström, K. Moon, cxx

On Wed, Oct 13, 2021 at 11:09 AM Alexei Svitkine <asvi...@chromium.org> wrote:

That seems like it would be adding quite a bit of throw-away code. The absl::Mutex and absl::Lock API are pretty similar, so it's a pretty straight forward conversion:
https://chromium-review.googlesource.com/c/chromium/src/+/3216667

This doesn't put the change behind an experiment. To put it behind an experiment we need to define a file-local MyLock class that is doing something fancier under the experiment and using base::Lock without. The fancy part can be an RW-lock on Windows.

With "using MyAutoLock = internal::BasicAutoLock<MyLock>;", the changes are localized to the definition of "lock_" and the file-local class defining MyLock.

I don't think that's too invasive.

K. Moon

unread,

Oct 13, 2021, 11:44:41 AM10/13/21

to Gabriel Charette, Alexei Svitkine, Peter Kasting, Peter Boström, cxx

To answer the process question, the procedure to allow an Abseil feature is the same as allowing a C++ standard feature, and is outlined here:

https://chromium.googlesource.com/chromium/src/+/main/styleguide/c++/c++11.md#modern-c_use-in-chromium

Basically, we have a discussion on cxx@ (so this thread), and if the conclusion is positive, it gets added to the list of allowed features.

I would be very surprised if RCU isn't used already in places in the Chromium code base, as it's a widely-applicable technique for reducing contention in many concurrent data structures, but I don't know if there's an existing library (or if that would even be useful). Typically, you would use a cheap atomic to implement the "read" part of the pattern; in this case, I suspect you'd still want a lock for the "copy" and "update" parts, so a better analogy might be double-checked locking.

Łukasz Anforowicz

unread,

Oct 13, 2021, 11:56:14 AM10/13/21

to K. Moon, Gabriel Charette, Alexei Svitkine, Peter Kasting, Peter Boström, cxx

On Wed, Oct 13, 2021 at 8:44 AM K. Moon <km...@chromium.org> wrote:

To answer the process question, the procedure to allow an Abseil feature is the same as allowing a C++ standard feature, and is outlined here:
https://chromium.googlesource.com/chromium/src/+/main/styleguide/c++/c++11.md#modern-c_use-in-chromium

Basically, we have a discussion on cxx@ (so this thread), and if the conclusion is positive, it gets added to the list of allowed features.

I would be very surprised if RCU isn't used already in places in the Chromium code base, as it's a widely-applicable technique for reducing contention in many concurrent data structures, but I don't know if there's an existing library (or if that would even be useful). Typically, you would use a cheap atomic to implement the "read" part of the pattern; in this case, I suspect you'd still want a lock for the "copy" and "update" parts, so a better analogy might be double-checked locking.

RE: RCU

1. My search for \brcu\b under //third_party/abseil-cpp/ didn't find anything.

2. go/fast/41 says "RCU may also be impractical or infeasible when managing large data as RCU requires a full copy of the data on updates". Here it seems that we are synchronizing access to a map of all Chrome histograms - this seems like a rather large map.

To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CACwGi-4nMMazP2o4ksijU%3DTb%2B3UKDNne2sBMYxrVwxpgXjqiAQ%40mail.gmail.com.

K. Moon

unread,

Oct 13, 2021, 12:12:08 PM10/13/21

to Łukasz Anforowicz, Gabriel Charette, Alexei Svitkine, Peter Kasting, Peter Boström, cxx

It looks like the data structure in question is HistogramMap, which is defined as std::unordered_map<StringPiece, HistogramBase*, StringPieceHash>. It would depend on how many entries there are, but copying this might be relatively cheap, since the entries are all references. (A similar cost would get triggered during table size changes.) The cost only would need to be paid when adding a new entry, which I'm guessing is rare overall? I think there were some suggestions on the bug to shard the data structure as well, which might be a good idea anyway. (Concurrent map data structures exist, but I think they're overly complicated for most use cases.)

Anyway, I agree that alternatives need careful thought, but for an API as frequently and widely used as a metric collection API, it's often worth the extra effort. It also might be helpful to introduce something like the RCU helper type mentioned in go/fast/41 to base, but probably would need an implementation outside of base to prove the case first.

K. Moon

unread,

Oct 13, 2021, 12:23:58 PM10/13/21

to Łukasz Anforowicz, Gabriel Charette, Alexei Svitkine, Peter Kasting, Peter Boström, cxx

Ignoring the R/W locks for a moment, I think allowing absl::Mutex would also need to overcome the objection here:

https://chromium.googlesource.com/chromium/src/+/main/styleguide/c++/c++11.md#synchronization

base::Lock, absl::Mutex, and std::mutex would all exist (but the last two are banned, and after this change, the last one would be banned). Would there be any effort to harmonize this?

Alexei Svitkine

unread,

Oct 13, 2021, 1:22:36 PM10/13/21

to K. Moon, Łukasz Anforowicz, Gabriel Charette, Peter Kasting, Peter Boström, cxx

In this specific case, I was not planning to run this behind an experiment, for a couple reasons:

1. This code gets run before FieldTrials are available, so would require custom set up to do, as well as the extra code complexity.

2. I was thinking given we can observe instances of the bad behavior through slow reports, we can just monitor the occurrence rate of these for versions past the change. I've had success with something like this for other cases that we've only had visibility through UMA (e.g. record corruption), which went away after a specific fix. (Although I'm less familiar w/ slow reports and I guess the question is whether we have a good way to query them that match this scenario - I assume we do, but also we can discuss that outside this list - e.g. on the bug.)

So given that, Gab, your suggested approach would be adding more complexity than a TOT change (e.g. my linked CL).

RCU is interesting, but I believe the main win for it over RW lock is optimizing the read lock case (e.g. read becomes even cheaper, write is still expensive), but as I mentioned up thread, that's not the case we're worried about (the overhead of the read lock is not so important, given it mainly affects non-performance critical call sites).

Joe Mason

unread,

Oct 13, 2021, 3:05:48 PM10/13/21

to Alexei Svitkine, K. Moon, Łukasz Anforowicz, Gabriel Charette, Peter Kasting, Peter Boström, cxx

Another concern around RW locks that hasn't been mentioned is that they're very easy to misuse. It's up to the user to verify that code that has the shared lock only reads and never writes. It's easy to accidentally modify a map while under the lock (such as by using the [] operator and accidentally inserting a default value). Given that I don't think we should allow RW locks in general.

But the metrics code is very well-audited and optimized, so I trust that team to use an RW lock appropriately. I think allowing a one-off use here makes the most sense because they've justified their use case, and we want anyone else who wants to use absl::Mutex to make the same case.

To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAKFU2SC%2BcHszzuYgP1d03Mip%3D-23NWnNp6JWaJ6iWq-rPCP35Q%40mail.gmail.com.

K. Moon

unread,

Oct 13, 2021, 3:47:08 PM10/13/21

to Joe Mason, Alexei Svitkine, Łukasz Anforowicz, Gabriel Charette, Peter Kasting, Peter Boström, cxx

I can get behind a one-off exception (assuming this is toolable).

Alexei Svitkine

unread,

Oct 14, 2021, 11:17:44 AM10/14/21

to K. Moon, Joe Mason, Łukasz Anforowicz, Gabriel Charette, Peter Kasting, Peter Boström, cxx, dani...@chromium.org, kwi...@chromium.org, mbon...@chromium.org

Thanks!

Do we need additional people to agree or could I proceed with sending my CL for review?

I verified that I can have a DEPS inside base/metrics that permits this case:

https://chromium-review.googlesource.com/c/chromium/src/+/3216667

So we do have an enforcement mechanism - I believe the DEPS change needs to be approved by OWNERS of third_party/abseil-cpp/ (so it's not the case that anyone can just add a DEPS themselves like this), although I'm not sure whether those owners are on this list. Adding them explicitly for visibility.

+dani...@chromium.org

+kwi...@chromium.org

+mbon...@chromium.org

K. Moon

unread,

Oct 14, 2021, 11:20:49 AM10/14/21

to Alexei Svitkine, Joe Mason, Łukasz Anforowicz, Gabriel Charette, Peter Kasting, Peter Boström, cxx, dani...@chromium.org, kwi...@chromium.org, mbon...@chromium.org

The rule is that you have to reach consensus on cxx@, so unless someone else wants to weigh in, I would think it's settled. (Maybe wait a day to give any objectors a chance to chime in.)

Gabriel Charette

unread,

Oct 14, 2021, 1:28:38 PM10/14/21

to Alexei Svitkine, K. Moon, Łukasz Anforowicz, Gabriel Charette, Peter Kasting, Peter Boström, cxx

On Wed, Oct 13, 2021 at 1:22 PM Alexei Svitkine <asvi...@chromium.org> wrote:

In this specific case, I was not planning to run this behind an experiment, for a couple reasons:
1. This code gets run before FieldTrials are available, so would require custom set up to do, as well as the extra code complexity.
2. I was thinking given we can observe instances of the bad behavior through slow reports, we can just monitor the occurrence rate of these for versions past the change. I've had success with something like this for other cases that we've only had visibility through UMA (e.g. record corruption), which went away after a specific fix. (Although I'm less familiar w/ slow reports and I guess the question is whether we have a good way to query them that match this scenario - I assume we do, but also we can discuss that outside this list - e.g. on the bug.)

We can but that won't tell us if R/W has impact on other metrics.

There's precedent, at least on Windows, for using the registry to configure synthetic field trials (read and applied in the next session).

Another option might be to no-op our custom Lock() calls before FeatureList is created as Chrome has to be single-threaded until that point anyways.

Discussing here because the results of this as an experiment would be useful to vouch for R/W lock on the overall codebase?

Alexei Svitkine

unread,

Oct 14, 2021, 3:06:59 PM10/14/21

to Gabriel Charette, K. Moon, Łukasz Anforowicz, Peter Kasting, Peter Boström, cxx

On Thu, Oct 14, 2021 at 1:28 PM Gabriel Charette <g...@chromium.org> wrote:

On Wed, Oct 13, 2021 at 1:22 PM Alexei Svitkine <asvi...@chromium.org> wrote:
In this specific case, I was not planning to run this behind an experiment, for a couple reasons:
1. This code gets run before FieldTrials are available, so would require custom set up to do, as well as the extra code complexity.
2. I was thinking given we can observe instances of the bad behavior through slow reports, we can just monitor the occurrence rate of these for versions past the change. I've had success with something like this for other cases that we've only had visibility through UMA (e.g. record corruption), which went away after a specific fix. (Although I'm less familiar w/ slow reports and I guess the question is whether we have a good way to query them that match this scenario - I assume we do, but also we can discuss that outside this list - e.g. on the bug.)

We can but that won't tell us if R/W has impact on other metrics.

There's precedent, at least on Windows, for using the registry to configure synthetic field trials (read and applied in the next session).
Another option might be to no-op our custom Lock() calls before FeatureList is created as Chrome has to be single-threaded until that point anyways.

Discussing here because the results of this as an experiment would be useful to vouch for R/W lock on the overall codebase?

Right, we could do it, but it would be quite a bit of complexity. I agree it would be interesting to validate the overall effect (though I honestly expect not to see any significant movement).

I do think it's worth checking the impact on slow reports in question to confirm it actually has the desired effect, before investing more into a complex trial set up for it. Perhaps we can do this as a first step and then re-visit?

For example, if it doesn't help, then we can just revert it and avoid the overhead of the complicated field trial set up that can't use the normal flow given the early start up case.

Łukasz Anforowicz

unread,

Oct 14, 2021, 3:39:42 PM10/14/21

to Alexei Svitkine, Gabriel Charette, K. Moon, Peter Kasting, Peter Boström, cxx

On Thu, Oct 14, 2021 at 12:07 PM Alexei Svitkine <asvi...@chromium.org> wrote:

On Thu, Oct 14, 2021 at 1:28 PM Gabriel Charette <g...@chromium.org> wrote:

On Wed, Oct 13, 2021 at 1:22 PM Alexei Svitkine <asvi...@chromium.org> wrote:
In this specific case, I was not planning to run this behind an experiment, for a couple reasons:
1. This code gets run before FieldTrials are available, so would require custom set up to do, as well as the extra code complexity.
2. I was thinking given we can observe instances of the bad behavior through slow reports, we can just monitor the occurrence rate of these for versions past the change. I've had success with something like this for other cases that we've only had visibility through UMA (e.g. record corruption), which went away after a specific fix. (Although I'm less familiar w/ slow reports and I guess the question is whether we have a good way to query them that match this scenario - I assume we do, but also we can discuss that outside this list - e.g. on the bug.)

We can but that won't tell us if R/W has impact on other metrics.

There's precedent, at least on Windows, for using the registry to configure synthetic field trials (read and applied in the next session).
Another option might be to no-op our custom Lock() calls before FeatureList is created as Chrome has to be single-threaded until that point anyways.

Discussing here because the results of this as an experiment would be useful to vouch for R/W lock on the overall codebase?

Right, we could do it, but it would be quite a bit of complexity. I agree it would be interesting to validate the overall effect (though I honestly expect not to see any significant movement).
I do think it's worth checking the impact on slow reports in question to confirm it actually has the desired effect, before investing more into a complex trial set up for it. Perhaps we can do this as a first step and then re-visit?
For example, if it doesn't help, then we can just revert it and avoid the overhead of the complicated field trial set up that can't use the normal flow given the early start up case.

Let me silently drop the link to go/tyranny-of-metrics here and then quickly hide back into my cave:

From "Tips for ICs": When a manager, PM, etc, asks for a metric, it's OK to ask "why do you want to know?".
I also remember that when I was in Microsoft we had training and presentation that was discouraging gathering new metrics unless they were clearly actionable. (And from the sidelines it seems that just landing Alexei's CL without A/B experimentation would be sufficient to verify if the original perf/priority-inversion bug was fixed.)

Okay... back to hiding... :-)

Gabriel Charette

unread,

Oct 14, 2021, 7:17:06 PM10/14/21

to Łukasz Anforowicz, Etienne Bergeron, Alexei Svitkine, Gabriel Charette, K. Moon, Peter Kasting, Peter Boström, cxx

Okay, SG, I'm all for the most expedient way to acquire data. At the same time I prefer not cutting corners when there's an opportunity to gather more data that'll influence cross-cutting decisions (so someone doesn't have to do this again in the future).

I understand that in this case it's worthwhile to first figure out if it helps the priority inversion seen by slow reports.

LGTM for usage of RW locks in base/metrics.

+Etienne Bergeron FYI

Gabriel Charette

unread,

Oct 14, 2021, 7:22:53 PM10/14/21

to Gabriel Charette, Robert Liao, Łukasz Anforowicz, Etienne Bergeron, Alexei Svitkine, K. Moon, Peter Kasting, Peter Boström, cxx

Le jeu. 14 oct. 2021, 19 h 16, Gabriel Charette <g...@chromium.org> a écrit :

Okay, SG, I'm all for the most expedient way to acquire data. At the same time I prefer not cutting corners when there's an opportunity to gather more data that'll influence cross-cutting decisions (so someone doesn't have to do this again in the future).

I understand that in this case it's worthwhile to first figure out if it helps the priority inversion seen by slow reports.

LGTM for usage of RW locks in base/metrics.

+Robert Liao who last looked at Windows lock usage too.

Gabriel Charette

unread,

Oct 14, 2021, 7:26:53 PM10/14/21

to Gabriel Charette, Robert Liao, Łukasz Anforowicz, Etienne Bergeron, Alexei Svitkine, K. Moon, Peter Kasting, Peter Boström, cxx

https://groups.google.com/a/chromium.org/g/chromium-dev/c/cPZKCTpVw4U/m/3e4t0OcsBgAJ?pli=1

Is interesting. robliao@ removed all RW locks from base a few years ago... IIUC, the Windows impl was the problem, let's confirm absl doesn't use it?

Alexei Svitkine

unread,

Oct 15, 2021, 10:33:23 AM10/15/21

to Gabriel Charette, Robert Liao, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Kasting, Peter Boström, cxx

Thanks for the pointer. The results show a 17% performance penalty for a shared (read) lock vs. exclusive one on Windows. That's indeed higher than the result posted on the internal totw (presumably from Linux).

I don't think it changes the equation for this targeted use, in my mind. As we're interested in fixing the priority inversion stemming from exclusive use of the lock by background threads, and the a small penalty to the overhead of the lock is not a concern here - since performance-critical code would be using histogram macros anyway, and not acquiring this lock except on the first call.

Alexei Svitkine

unread,

Oct 15, 2021, 10:39:22 AM10/15/21

to Gabriel Charette, Robert Liao, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Kasting, Peter Boström, cxx

Also, looking at ABSL mutex.cc, it doesn't appear to use those Windows APIs.

In fact, it looks like the implementation is cross-platform, relying on std::atomic<intptr_t> for bookkeeping.

https://source.chromium.org/chromium/chromium/src/+/main:third_party/abseil-cpp/absl/synchronization/mutex.cc;drc=07cb7b184515ad207d30f00d0b00b8ce96d0a750;l=1488

So probably the numbers from the TOTW are more accurate (but this can vary on the cpu architecture, I'd imagine).

Robert Liao

unread,

Oct 15, 2021, 2:19:40 PM10/15/21

to Alexei Svitkine, Gabriel Charette, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Kasting, Peter Boström, cxx

The main objection to RW locks at the time was not only perf but rare usage (hence questioning if it should even be in base).

The usage at the time could be replaced with a Lock+CV, which undermined the usage argument even more.

Alexei Svitkine

unread,

Sep 8, 2023, 3:51:04 PM9/8/23

to Robert Liao, Luc Nguyen, Gabriel Charette, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Kasting, Peter Boström, cxx

+Luc Nguyen

Revisiting this topic, let me recap where we're at.

So the change to use absl's R/W lock for StatisticsRecorder had several landings and reverts, which were mostly related to hangs detected on iOS (where starvation was happening contrary to the guarantees the API documents).

As part of our investigation, we found and got absl team to fix one cause of this (cl/540547885). However, the issue was still not fully fixed, so we ended up reverting to the previous behavior on iOS. (So now we have a platform-specific ifdef.)

Now that it's landed on stable on other platforms, we're seeing some additional crash reports related to it, most prominently crbug.com/1475019 which appears to be a CHECK failure in a low-level allocator absl's R/W lock uses.

So with that and the iOS issues, it seems absl's R/W impl isn't really robust enough to support our needs.

However, we still think a R/W is the right solution here. It turns out, C++17 introduces std::shared_mutex that supports R/W semantics. However, the C++17 threading primitives are currently banned in Chromium:

https://chromium.googlesource.com/chromium/src/+/1865f277bb4b77b6d12ccfa566d67a187d5d00e6/styleguide/c++/c++11.md#thread-library-banned

So we'd like to ask for an exemption from the above, so we can try the std::shared_mutex imp impl for a R/W lock for StatisticsRecorder. (All the same arguments from upthread apply as to why we think a R/W lock makes sense here.)

We're hoping it won't have the issues we've seen w/ absl's R/W lock impl.

Peter Kasting

unread,

Sep 8, 2023, 9:26:10 PM9/8/23

to Alexei Svitkine, Robert Liao, Luc Nguyen, Gabriel Charette, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Boström, cxx

I read back through the thread, but I didn't see if you actually benchmarked e.g. std::shared_mutex or absl::Mutex against base:: primitives to confirm that despite the low-level increase in overhead, the suitability-for-use-case factor at a high level makes this a win. Did you?

If it is, I think that's an argument that we should build a (frowned-on, should be rarely used) r/w capability in the base:: classes, because there's an actual win with it. If you don't know that it is, though, I'm leery. (And how did we land usage of the absl:: types into stable, when they're also banned?)

PK

Alexei Svitkine

unread,

Sep 11, 2023, 10:08:53 AM9/11/23

to Peter Kasting, Robert Liao, Luc Nguyen, Gabriel Charette, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Boström, cxx

On Fri, Sep 8, 2023 at 9:26 PM Peter Kasting <pkas...@google.com> wrote:

I read back through the thread, but I didn't see if you actually benchmarked e.g. std::shared_mutex or absl::Mutex against base:: primitives to confirm that despite the low-level increase in overhead, the suitability-for-use-case factor at a high level makes this a win. Did you?

No, it hasn't been done. We can't really benchmark it synthetically because it depends on the actual usage patterns in the wild. OTOH, the effect of a R/W lock is a large chunk of the lock ops will now be non-blocking and there are many of these on the main thread. So I do expect there to be an effect (on things like jank). But agreed that it would be best to verify this.

I will have to check with the slow reports team to see if there's a good way to do this, since this isn't using a base::Feature to roll out (since this code needs to run before the base::Features framework is initialized.)

Though, we could do our own randomization (and timing measurement) and report the results as a synthetic trial if there are really concerns about this.

Note: If we're switching the impl, I'd rather do the measurements on the new impl since they likely have different overheads.

If it is, I think that's an argument that we should build a (frowned-on, should be rarely used) r/w capability in the base:: classes, because there's an actual win with it. If you don't know that it is, though, I'm leery. (And how did we land usage of the absl:: types into stable, when they're also banned?)

The use of the absl R/W lock was approved as an exception via this thread (e.g. see discussion above such as: "The rule is that you have to reach consensus on cxx@, so unless someone else wants to weigh in, I would think it's settled."). The DEPS change to allowlist for base/metrics is documented here and was reviewed by the absl owners:

https://source.chromium.org/chromium/chromium/src/+/main:base/metrics/DEPS;drc=fb9148ad86e2d770fbe4349d2aed3d19212b3914;l=3

Alexei Svitkine

unread,

Sep 11, 2023, 11:43:25 AM9/11/23

to Peter Kasting, Robert Liao, Luc Nguyen, Gabriel Charette, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Boström, cxx

I decided to go ahead and bite the bullet to implement a synthetic trial and a metric that tracks the total time spent acquiring this lock, which we can use to measure the benefits of the R/W semantics against the normal base::Lock.

WIP CL here: https://chromium-review.googlesource.com/c/chromium/src/+/4851484

Alexei Svitkine

unread,

Sep 11, 2023, 11:54:37 AM9/11/23

to Robert Liao, Peter Kasting, Luc Nguyen, Gabriel Charette, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Boström, cxx

That seems reasonable. To clarify, right now I am just asking for an exception to try the C++17 R/W lock class for this specific use case (same rationale as was for the absl impl), not yet asking for it to be generally approved.

We can use the results of the A/B synthetic trial above to evaluate the next steps after.

On Mon, Sep 11, 2023 at 11:49 AM Robert Liao <rob...@google.com> wrote:

As a reference, I've pushed back in the past against shared mode locks as the use cases were rare and you potentially run the risk of actually slowing down your code (base::Lock is based on SRWLock on Windows).

If we were to allow RW locks in base, it certainly seems like a good candidate for base::subtle.

Gabriel Charette

unread,

Sep 11, 2023, 1:15:04 PM9/11/23

to Alexei Svitkine, Robert Liao, Peter Kasting, Luc Nguyen, Gabriel Charette, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Boström, cxx

LGTM and +1 to setting it up as a synthetic field trial so we can learn what magnitude and side effects it has beyond fixing the known jank signature (that we can measure on Catan's Congestion Timelines already)

Robert Liao

unread,

Sep 13, 2023, 1:55:31 PM9/13/23

to Alexei Svitkine, Peter Kasting, Luc Nguyen, Gabriel Charette, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Boström, cxx

As a reference, I've pushed back in the past against shared mode locks as the use cases were rare and you potentially run the risk of actually slowing down your code (base::Lock is based on SRWLock on Windows).

If we were to allow RW locks in base, it certainly seems like a good candidate for base::subtle.

Alexei Svitkine

unread,

Nov 20, 2023, 4:48:53 PM11/20/23

to Gabriel Charette, Robert Liao, Peter Kasting, Luc Nguyen, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Boström, cxx

Now that we've had a chance to try it, let me share our findings for how std::shared_mutex ended up performing for us for StatisticsRecorder.

1. We implemented a 50/50 trial client-side for which lock impl to use (either a base::Lock vs. std::shared_mutex, where the latter would be locked in read or write mode depending on the call site).

2. We used a metric that summed up the total time spent waiting to acquire the lock (across all threads and not distinguishing the R/W call sites). This metric would be reported to UMA with every log, reporting the sum of the time used during that period.

3. On Windows, we saw a very significant improvement at P99 and to the overall sum, with some small regressions at smaller percentiles.

4. On other platforms, we mostly just saw regressions, likely from the extra overhead of the R/W impl.

There's a bunch of details/investigation on the bug (https://bugs.chromium.org/p/chromium/issues/detail?id=1123627), but basically our understanding was that this only helped on Windows because of the fact that Windows' underlying locks can suffer from priority inversion, whereas other platforms have kernel support for detecting and preventing priority inversion (i.e. boosting the low-priority thread if a high-priority thread is waiting for the lock). So there, we were just experiencing the overhead without the benefits. We turned off the trial on non-Windows platforms.

But also in parallel, we investigated the underlying problem (not related to the R/W lock) more and found several improvements to how that lock was used that ended up solving the priority inversion problems even with the regular lock. Once those improvements landed, we no longer saw the R/W lock moving anything for Windows (unlike earlier results). So given that, I've turned off the experiment and reverted the code to the normal base::Lock.

Peter Kasting

unread,

Nov 20, 2023, 5:08:29 PM11/20/23

to Alexei Svitkine, Gabriel Charette, Robert Liao, Luc Nguyen, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Boström, cxx

There is some fascinating work in this bug.

I would love to hear a brews and bites talk about the odyssey some time.

PK

Gabriel Charette

unread,

Nov 22, 2023, 5:17:46 PM11/22/23

to Peter Kasting, Alexei Svitkine, Gabriel Charette, Robert Liao, Luc Nguyen, Łukasz Anforowicz, Etienne Bergeron, K. Moon, Peter Boström, cxx

On Mon, Nov 20, 2023 at 5:08 PM Peter Kasting <pkas...@google.com> wrote:

There is some fascinating work in this bug.

I would love to hear a brews and bites talk about the odyssey some time.

+1 :)!

Reply all

Reply to author

Forward