Re: Proposal to upstream static analysis optimizations for ThreadSanitizer


Dmitry Vyukov

Aug 20, 2025, 9:37:26 PM
to Alexey Paznikov, Marco Elver, Dmitry Vyukov, Umang Mathur, thread-sanitizer
On Sun, 17 Aug 2025 at 19:42, Alexey Paznikov <apaz...@gmail.com> wrote:

Hi Dmitry, Marco,

We have developed a set of sound static analyses that significantly reduce ThreadSanitizer's overhead by eliminating unnecessary instrumentation ahead of time.

Our benchmarks on the Chromium codebase are very promising:

  • A 30% improvement on the overall Speedometer 3 score (up to 2x speedups on individual sub-tests like TodoMVC-React-Redux).

  • Significant gains (up to 2.2x) in other Blink performance tests.

We have also seen substantial speedups on other major applications:

  • SQLite: Up to 4.7x.

  • FFmpeg: Up to 2.27x.

  • Redis: Up to 1.72x.

  • MySQL: Up to 1.11x.


+thread-sanitizer mailing list

Hi Alexey,

These results look great! And, yes, it would be good to see this work merged upstream.
 

These results are achieved by a framework of five analyses: Escape Analysis (finds thread-local objects), Lock Ownership (finds lock-protected globals), Single-Threaded Context (detects pre-threading code), SWMR Pattern Detection (finds read-mostly globals), and Dominance-Based Elimination (prunes redundant checks).

While the full framework is extensive, we propose two analyses as excellent candidates for initial upstreaming:

  1. Intra-procedural Escape Analysis: We suggest starting with an intra-procedural version. It is simpler to integrate yet highly effective. It is sound by construction, operating on provably thread-local data.

  2. Dominance-Based Elimination: This is extremely powerful in hot loops. We acknowledge this affects report granularity (the race is reported on the dominating access), but it guarantees detection, which we believe is a valuable trade-off for the performance gain.


What do you mean by "affects reporting granularity"?
I assume the dominating access also races in this case, so reporting the race on it is not wrong.


Our approach preserves TSan's zero-false-positive guarantee and adds no runtime overhead. We have a working implementation integrated with the TSan LLVM pass and are prepared to do the work to adapt it for upstreaming.

Would you be open to discussing this further and guiding us on the contribution process?


How exactly can we help?
There should be some LLVM docs on the contribution process. But basically it's a GitHub PR nowadays. Existing tests need to pass, and new tests should be added.


Alexey Paznikov

Aug 25, 2025, 5:47:59 AM
to Dmitry Vyukov, Marco Elver, Dmitry Vyukov, Umang Mathur, thread-sanitizer

Hi Dmitry,


Thank you very much for your prompt and encouraging reply! We're very glad you find our results promising, and we are excited to upstream this work under your guidance.


Let me first answer your questions below:


1. What do you mean by "affects reporting granularity"?


Indeed, the dominance-based analysis is sound and complete in the following sense: TSan without the dominance analysis reports a race if and only if TSan with the dominance analysis reports a race. However, as with some optimizations that TSan already implements, the number of race reports after the dominance-based optimization may be lower, because we report only the “dominating” instruction as racing and avoid instrumenting (and therefore reporting) the instructions it dominates.


To illustrate this, let's consider an example:


// BB1: Dominator block
__tsan_write(&x); // TSan instrumentation call
x = 1;            // Dominating access I₁ (instrumented)

if (condition) {
  // BB3: Dominated block
  x = 2;          // Dominated access I₂ (uninstrumented)
}


In this code, the access x = 1 dominates x = 2. Our analysis removes the instrumentation for x = 2. If another thread writes to x concurrently, creating a race with the access at x = 2, then TSan will still detect this race, but the report will point to the line with x = 1.


What this means for a developer in practice is that a race report on the dominating access `I₁` is a strong signal that the entire region of code it dominates is vulnerable.
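
For concreteness, here is a simplified C++ sketch of the pruning step (not our actual implementation: pruneDominatedAccesses is a hypothetical name, and a real pass must also check access sizes/atomicity and that no synchronization can occur between the two accesses):

#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Illustrative helper: Accesses holds the loads and stores the TSan pass has
// collected; drop every access dominated by an already-kept access to the
// same pointer. Size, atomicity, and intervening-synchronization checks are
// omitted here for brevity.
static void pruneDominatedAccesses(SmallVectorImpl<Instruction *> &Accesses,
                                   const DominatorTree &DT) {
  SmallVector<Instruction *, 16> Kept;
  for (Instruction *I : Accesses) {
    bool Dominated = false;
    for (Instruction *K : Kept) {
      // A dominating write covers any dominated access to the same address;
      // a dominating read covers only reads (a write additionally races with
      // concurrent readers, which a read would not detect).
      bool Covers = isa<StoreInst>(K) || isa<LoadInst>(I);
      if (Covers &&
          getLoadStorePointerOperand(K) == getLoadStorePointerOperand(I) &&
          DT.dominates(K, I)) {
        Dominated = true;
        break;
      }
    }
    if (!Dominated)
      Kept.push_back(I);
  }
  Accesses.swap(Kept);
}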


2. How exactly can we help?


Thanks for clarifying the process – a GitHub PR that passes existing tests and adds new ones sounds like a clear plan.


We have a working and debugged implementation as a set of LLVM passes integrated with ThreadSanitizerPass. To simplify the review and integration process, we could start with one of the two analyses proposed in the initial email (Intra-procedural Escape Analysis or Dominance-Based Elimination), as this would allow us to focus on a single set of changes.


Your guidance on the following points would be invaluable:

  • Could you recommend a recent commit or PR that adds similar functionality to the sanitizer infrastructure? It would serve as a great reference for us.
  • Regarding architecture: our current implementation consists of separate LLVM passes. Do you think it would be best to integrate them as tightly as possible with ThreadSanitizerPass, or would it be better to implement them as more general, standalone analyses (e.g., in the lib/Analysis/ directory)?
  • Besides the standard check-tsan target, are there other TSan test suites we should be aware of? Of course, we plan to add new regression tests for our analyses and instrumentation logic.
  • Would it be helpful if we shared a branch with our current implementation or a brief design doc for preliminary feedback before we open a formal PR?

We are ready to get started and look forward to your guidance.


On Thu, 21 Aug 2025 at 09:37, Dmitry Vyukov <dvy...@google.com> wrote:


--
Best regards,
Paznikov Alexey
Senior Research Fellow, School of Computing
National University of Singapore

Dmitry Vyukov

Aug 25, 2025, 2:01:44 PM
to Alexey Paznikov, Vitaly Buka, Florian Mayer, Thurston Dang, Marco Elver, Dmitry Vyukov, Umang Mathur, thread-sanitizer
On Mon, 25 Aug 2025 at 02:47, Alexey Paznikov <apaz...@gmail.com> wrote:
>
> Hi Dmitry,
>
>
> Thank you very much for your prompt and encouraging reply! We're very glad you find our results promising, and we are excited to upstream this work under your guidance.
>
>
> Let me first answer your questions below:
>
>
> 1. What do you mean by "affects reporting granularity"?
>
>
> Indeed, the dominance-based analysis is sound and complete in the following sense: TSan without the dominance analysis reports a race if and only if TSan with the dominance analysis reports a race. However, as with some optimizations that TSan already implements, the number of race reports after the dominance-based optimization may be lower, because we report only the “dominating” instruction as racing and avoid instrumenting (and therefore reporting) the instructions it dominates.
>
>
> To illustrate this, let's consider an example:
>
>
> // BB1: Dominator block
> __tsan_write(&x); // TSan instrumentation call
> x = 1;            // Dominating access I₁ (instrumented)
>
> if (condition) {
>   // BB3: Dominated block
>   x = 2;          // Dominated access I₂ (uninstrumented)
> }
>
>
> In this code, the access x = 1 dominates x = 2. Our analysis removes the instrumentation for x = 2. If another thread writes to x concurrently, creating a race with the access at x = 2, then TSan will still detect this race, but the report will point to the line with x = 1.
>
>
> What this means for a developer in practice is that a race report on the dominating access `I₁` is a strong signal that the entire region of code it dominates is vulnerable.

This looks fine.
I don't think "creating a race with the access at x = 2" is even a
thing. The effect does not look user-observable.

> 2. How exactly can we help?
>
>
> Thanks for clarifying the process – a GitHub PR that passes existing tests and adds new ones sounds like a clear plan.
>
>
> We have a working and debugged implementation as a set of LLVM passes integrated with ThreadSanitizerPass. To simplify the review and integration process, we could start with one of the two analyses proposed in the initial email (Intra-procedural Escape Analysis or Dominance-Based Elimination), as this would allow us to focus on a single set of changes.

Yes, please start with just 1 change.

> Your guidance on the following points would be invaluable:
>
> Could you recommend a recent commit or PR that adds similar functionality to the sanitizer infrastructure? It would serve as a great reference for us.

+Vitaly, Florian, Thurston, can you recommend some recent "canonical"
PR for llvm part of sanitizers?

> Regarding architecture: our current implementation consists of separate LLVM passes. Do you think it would be best to integrate them as tightly as possible with ThreadSanitizerPass, or would it be better to implement them as more general, standalone analyses (e.g., in the lib/Analysis/ directory)?

I can't answer this. I have not worked closely on other parts of LLVM,
so I'm not sure what the policy is.
Is it possible to factor this logic into separate helper classes (not
passes), and use the classes within the TSan pass? That looks like a
middle ground: it should be easy to move the logic to a separate pass
later if necessary.
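
For illustration, that middle ground might look like the following minimal sketch (InstrumentationFilter and shouldInstrument are hypothetical names; sanitizeFunction and instrumentLoadOrStore are the existing entry points in ThreadSanitizer.cpp):

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Hypothetical helper class (not a pass): computes, once per function, the
// set of accesses whose instrumentation can be dropped. The TSan pass would
// construct it with the analyses it already has and query it when deciding
// what to instrument.
class InstrumentationFilter {
  SmallPtrSet<const Instruction *, 32> Skippable;

public:
  InstrumentationFilter(Function &F, const DominatorTree &DT) {
    // Run the escape / dominance analyses here and fill Skippable.
  }

  bool shouldInstrument(const Instruction *I) const {
    return !Skippable.contains(I);
  }
};

// Inside ThreadSanitizer::sanitizeFunction (illustrative usage):
//
//   InstrumentationFilter Filter(F, DT);
//   for (Instruction *I : AllLoadsAndStores)
//     if (Filter.shouldInstrument(I))
//       instrumentLoadOrStore(...);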


> Besides the standard check-tsan target, are there other TSan test suites we should be aware of? Of course, we plan to add new regression tests for our analyses and instrumentation logic.

Does check-tsan run tests for the LLVM instrumentation part of TSan? Or
is it only the compiler-rt part?

> Would it be helpful if we shared a branch with our current implementation or a brief design doc for a preliminary feedback before we open a formal PR?

Sharing the existing branch won't harm.

Alexey Paznikov

Oct 9, 2025, 2:56:22 AM
to thread-sanitizer

Hello Dmitry and all,

I’m implementing getUnderlyingObjectsThroughLoads — a stronger variant of getUnderlyingObjects that augments ValueTracking (VT) with MemorySSA clobber-chasing and, when safe, follows stores to recover the pointer value stored in memory. The function is used by our Escape Analysis (EA) to enumerate all potential underlying objects for a pointer read (so EA can prove objects are thread-local / non-escaping). 
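
Roughly, the core of the idea looks like this (a simplified sketch; real code must also check that both accesses are simple and that value sizes/types match before forwarding a store, and the depth cap and names are illustrative):

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Simplified sketch: collect underlying objects; when a base turns out to be
// a load, chase the MemorySSA clobber chain to the store that produced the
// loaded pointer and recurse on the stored value. Returns false whenever the
// walk cannot be completed, so the caller can bail out soundly.
static bool getUnderlyingObjectsThroughLoads(
    const Value *V, MemorySSA &MSSA, AAResults &AA,
    SmallPtrSetImpl<const Value *> &Objects, unsigned Depth = 0) {
  if (Depth > 8) // illustrative cap: the cutoff is reported, never silent
    return false;
  SmallVector<const Value *, 4> Bases;
  getUnderlyingObjects(V, Bases);
  for (const Value *Base : Bases) {
    const auto *LI = dyn_cast<LoadInst>(Base);
    if (!LI) {
      Objects.insert(Base);
      continue;
    }
    MemoryAccess *Clobber =
        MSSA.getWalker()->getClobberingMemoryAccess(MSSA.getMemoryAccess(LI));
    const auto *Def = dyn_cast<MemoryDef>(Clobber);
    const auto *SI =
        Def ? dyn_cast_or_null<StoreInst>(Def->getMemoryInst()) : nullptr;
    // Only follow a store that provably writes the exact location the load
    // reads; live-on-entry, calls, MemoryPhis, and may-alias stores all make
    // the result incomplete.
    if (!SI || AA.alias(MemoryLocation::get(SI), MemoryLocation::get(LI)) !=
                   AliasResult::MustAlias)
      return false;
    if (!getUnderlyingObjectsThroughLoads(SI->getValueOperand(), MSSA, AA,
                                          Objects, Depth + 1))
      return false;
  }
  return true;
}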

The core problem I ran into is practical and important for soundness:

  • getUnderlyingObjects(Value *V, ..., MaxLookup) accepts a MaxLookup parameter. In our tree, calling with MaxLookup = 0 behaves as “unbounded”, so we call VT with 0 to try to get a full expansion.

  • But the VT API does not report whether it completed exploration or silently stopped because of an internal cutoff (cycle protection, internal step limits, etc.). That means callers cannot distinguish “VT explored everything” vs “VT stopped early”. If a client (EA) treats a truncated VT result as complete, it may incorrectly assume that no other underlying objects exist — producing unsound decisions.

Because EA must be sound (it should conservatively bail out when the analysis is incomplete), this ambiguity is a real problem.

Options I’m considering (with tradeoffs):
  1. Keep calling VT with MaxLookup = 0 (unbounded) and trust VT.

    • Pro: simplest, no API changes.

    • Con: implicit semantics, possible hangs/explosion in pathological cases; no way to detect truncated results.

  2. Clone / reimplement relevant parts of ValueTracking inside my analysis.

    • Pro: full control; I can detect internal cutoffs and return an Incomplete flag.

    • Con: code duplication; maintenance burden; risk of divergence as VT evolves.

  3. Propose an upstream API change to VT — for example (a caller-side usage sketch follows this list):

    void getUnderlyingObjects(const Value *V, SmallVectorImpl<const Value*> &Bases, LoopInfo *LI, unsigned MaxLookup, bool *Incomplete = nullptr);

    or return a struct:

    struct UnderlyingResult {
      SmallVector<const Value *> Bases;
      bool Complete;
    };
    UnderlyingResult getUnderlyingObjectsEx(const Value *V, LoopInfo *LI, unsigned MaxLookup);
    • Pro: clean; callers know whether VT was complete.

    • Con: API change; requires PR, tests, and review.
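
To make option 3 concrete from the caller's side, here is a sketch of how EA would consume the flag (the bool *Incomplete parameter is the proposed extension, not the current ValueTracking API; mayEscape is a hypothetical EA predicate):

#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/ValueTracking.h"

using namespace llvm;

bool mayEscape(const Value *Base); // hypothetical EA predicate

// Hypothetical EA client of the proposed option-3 signature.
bool isProvablyThreadLocal(const Value *Ptr) {
  bool Incomplete = false;
  SmallVector<const Value *, 8> Bases;
  getUnderlyingObjects(Ptr, Bases, /*LI=*/nullptr, /*MaxLookup=*/0,
                       &Incomplete);
  if (Incomplete)
    return false; // truncated walk proves nothing: bail out conservatively
  for (const Value *Base : Bases)
    if (mayEscape(Base))
      return false;
  return true;
}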

Questions:
  1. Which of these solutions would you consider most reasonable or maintainable?

  2. Is the current non-signalling behavior of getUnderlyingObjects intentional? Has this been discussed previously? Any hidden reasons VT intentionally hides completeness (internal complexity, historical reasons, etc.)?

  3. If an API change is acceptable, which form is preferred:
    an out-parameter (bool *Incomplete),
    a new return struct,
    or a new overload (e.g. getUnderlyingObjectsEx)?  

  4. Are there codebase precedents for adding “completion” flags to analyses, and is there a preferred pattern to follow? Are there performance or style considerations I should keep in mind if I prepare a PR (e.g., preferring a new symbol vs. changing the existing signature)?


On Tuesday, 26 August 2025 at 02:01:44 UTC+8, dvy...@google.com wrote:

Dmitry Vyukov

Oct 10, 2025, 6:36:02 AM
to Alexey Paznikov, thread-sanitizer
On Thu, 9 Oct 2025 at 08:56, Alexey Paznikov <apaz...@gmail.com> wrote:
>
> Hello Dmitry and all,
>
> I’m implementing getUnderlyingObjectsThroughLoads — a stronger variant of getUnderlyingObjects that augments ValueTracking (VT) with MemorySSA clobber-chasing and, when safe, follows stores to recover the pointer value stored in memory. The function is used by our Escape Analysis (EA) to enumerate all potential underlying objects for a pointer read (so EA can prove objects are thread-local / non-escaping).
>
> The core problem I ran into is practical and important for soundness:
>
> getUnderlyingObjects(Value *V, ..., MaxLookup) accepts a MaxLookup parameter. In our tree calling MaxLookup = 0 behaves as “unbounded”, so we call VT with 0 to try to get a full expansion.
>
> But the VT API does not report whether it completed exploration or silently stopped because of an internal cutoff (cycle protection, internal step limits, etc.). That means callers cannot distinguish “VT explored everything” vs “VT stopped early”. If a client (EA) treats a truncated VT result as complete, it may incorrectly assume that no other underlying objects exist — producing unsound decisions.
>
> Because EA must be sound (it should conservatively bail out when the analysis is incomplete), this ambiguity is a real problem.
>
> Options I’m considering (with tradeoffs)
>
> Keep calling VT with MaxLookup = 0 (unbounded) and trust VT.
>
> Pro: simplest, no API changes.
>
> Con: implicit semantics, possible hangs/explosion in pathological cases; no way to detect truncated results.
>
> Clone / reimplement relevant parts of ValueTracking inside my analysis.
>
> Pro: full control; I can detect internal cutoffs and return an Incomplete flag.
>
> Con: code duplication; maintenance burden; risk of divergence as VT evolves.
>
> Propose an upstream API change to VT — for example:
>
> void getUnderlyingObjects(const Value *V, SmallVectorImpl<const Value*> &Bases, LoopInfo *LI, unsigned MaxLookup, bool *Incomplete = nullptr);
>
> or return a struct:
>
> struct UnderlyingResult { SmallVector<const Value*> Bases; bool Complete; }; UnderlyingResult getUnderlyingObjectsEx(const Value *V, LoopInfo *LI, unsigned MaxLookup);
>
> Pro: clean; callers know whether VT was complete.
>
> Con: API change; requires PR, tests, and review.
>
> Questions:
>
> Which of these solutions would you consider most reasonable or maintainable?

3 looks most reasonable to me (and has the highest chance of being accepted upstream).

> Is the current non-signalling behavior of getUnderlyingObjects intentional? Has this been discussed previously? Any hidden reasons VT intentionally hides completeness (internal complexity, historical reasons, etc.)?

I am not sure we have people with enough expertise to answer this on
this list. The LLVM communication channels may be more helpful for
this.

> If an API change is acceptable, which form is preferred:
> an out-parameter (bool *Incomplete),
> a new return struct,
> or a new overload (e.g. getUnderlyingObjectsEx)?
>
> Are there codebase precedents for adding “completion” flags to analyses, and is there a preferred pattern to follow? Any performance/style concerns I should follow if I prepare a PR (e.g., prefer new symbol vs changing the existing signature)?

I also don't know.
I would grep headers to find precedents of using one or the other approach.