Hi Dmitry, Marco,
We have developed a set of sound static analyses that significantly reduce ThreadSanitizer's overhead by eliminating unnecessary instrumentation ahead-of-time.
Our benchmarks on the Chromium codebase are very promising:
- A 30% improvement on the overall Speedometer 3 score (up to 2x speedups on individual sub-tests like TodoMVC-React-Redux).
- Significant gains (up to 2.2x) in other Blink performance tests.
We have also seen substantial speedups on other major applications:
- SQLite: Up to 4.7x.
- FFmpeg: Up to 2.27x.
- Redis: Up to 1.72x.
- MySQL: Up to 1.11x.
These results are achieved by a framework of five analyses: Escape Analysis (finds thread-local objects), Lock Ownership (finds lock-protected globals), Single-Threaded Context (detects pre-threading code), SWMR Pattern Detection (finds read-mostly globals), and Dominance-Based Elimination (prunes redundant checks).
While the full framework is extensive, we propose two analyses as excellent candidates for initial upstreaming:
Intra-procedural Escape Analysis: We suggest starting with an intra-procedural version. It is simpler to integrate yet highly effective. It is sound by construction, operating on provably thread-local data.
Dominance-Based Elimination: This is particularly effective in hot loops. We acknowledge that it affects report granularity (a race is reported on the dominating access rather than on each access it dominates), but detection of the race is still guaranteed, which we believe is a worthwhile trade-off for the performance gain.
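To make the first candidate concrete, here is a small C illustration of the distinction an intra-procedural escape analysis draws. It is only a sketch: the function names and the global `g_published` are hypothetical and do not come from our implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global, used only to demonstrate an escaping pointer. */
static int *g_published = NULL;

/* The address of `buf` is never stored, returned, or passed to a callee,
 * so no other thread can reach it. Accesses to `buf` are candidates for
 * having their TSan instrumentation removed. */
int sum_squares_local(int n) {
    int buf[16];
    assert(n >= 0 && n <= 16);
    for (int i = 0; i < n; ++i)
        buf[i] = i * i;          /* provably thread-local write */
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += buf[i];         /* provably thread-local read */
    return total;
}

/* `slot` refers to caller-owned memory and is also stored in a global,
 * so it may be visible to other threads. The write through `slot`
 * must stay instrumented. */
void publish_and_set(int *slot, int value) {
    g_published = slot;          /* pointer escapes here */
    *slot = value;               /* potentially shared write */
}
```

Only the accesses in `sum_squares_local` would qualify for elimination under this reading; anything reachable through `slot` is conservatively left alone.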
Our approach preserves TSan's zero-false-positive guarantee and adds no runtime overhead. We have a working implementation integrated with the TSan LLVM pass and are prepared to do the work to adapt it for upstreaming.
Would you be open to discussing this further and guiding us on the contribution process?
Hi Dmitry,
Thank you very much for your prompt and encouraging reply! We're very glad you find our results promising, and we are excited to upstream this work under your guidance.
Let me first answer your questions below:
1. What do you mean by "affects reporting granularity"?
Dominance-based elimination is sound and complete in the following sense: TSan with the dominance analysis reports a race if and only if TSan without it reports a race. However, as with some optimizations that TSan already implements, the number of race reports may be lower after the dominance-based optimization, because only the “dominating” instruction is instrumented and reported as racing; the instructions it dominates are neither instrumented nor reported.
To illustrate this, let's consider an example:
// BB1: Dominator block
__tsan_write(&x); // TSan instrumentation call
x = 1; // Dominating access I₁ (instrumented)
if (condition) {
    // BB3: Dominated block
    x = 2; // Dominated access I₂ (uninstrumented)
}
In this code, the access x = 1 dominates x = 2. Our analysis removes the instrumentation for x = 2. If another thread writes to x concurrently, creating a race with the access at x = 2, then TSan will still detect this race, but the report will point to the line with x = 1.
In practice, a developer should read a race report on the dominating access `I₁` as a strong signal that the accesses to the same location in the region it dominates may be racy as well.
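The elimination decision in the example above can be sketched as a toy model in C. This is not our LLVM pass: the `Cfg` and `Access` types and all function names are hypothetical, and the sketch assumes that no synchronization or call sits between the two accesses, a condition a real pass would presumably also have to check. It also assumes that a read must not stand in for a write.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Toy CFG: preds[b] lists the predecessors of block b; block 0 is the entry. */
typedef struct {
    int num_blocks;
    int num_preds[MAX_BLOCKS];
    int preds[MAX_BLOCKS][MAX_BLOCKS];
} Cfg;

/* One memory access: its block, its position in the block, the variable id. */
typedef struct {
    int block;
    int index;
    int var;
    bool is_write;
} Access;

/* Iterative data-flow: dom[b] is a bitmask of the blocks that dominate b. */
void compute_dominators(const Cfg *g, unsigned dom[MAX_BLOCKS]) {
    assert(g->num_blocks > 0 && g->num_blocks <= MAX_BLOCKS);
    unsigned all = (1u << g->num_blocks) - 1u;
    dom[0] = 1u;
    for (int b = 1; b < g->num_blocks; ++b)
        dom[b] = all;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 1; b < g->num_blocks; ++b) {
            unsigned meet = all;
            for (int i = 0; i < g->num_preds[b]; ++i)
                meet &= dom[g->preds[b][i]];
            unsigned next = meet | (1u << b);
            if (next != dom[b]) {
                dom[b] = next;
                changed = true;
            }
        }
    }
}

/* True if the check on `a` may stand in for a check on `b`. */
bool covers(const Access *a, const Access *b, const unsigned dom[MAX_BLOCKS]) {
    if (a->var != b->var)
        return false;
    if (!a->is_write && b->is_write)
        return false;                 /* assumption: a read cannot cover a write */
    if (a->block == b->block)
        return a->index < b->index;   /* earlier access in the same block */
    return ((dom[b->block] >> a->block) & 1u) != 0;
}

/* CFG of the example: block 0 (x = 1), block 1 (x = 2), block 2 (join). */
Cfg example_cfg(void) {
    Cfg g = {0};
    g.num_blocks = 3;
    g.num_preds[1] = 1; g.preds[1][0] = 0;
    g.num_preds[2] = 2; g.preds[2][0] = 0; g.preds[2][1] = 1;
    return g;
}
```

With this model, the write in block 0 covers the write in block 1, so only the former keeps its instrumentation. The reverse does not hold, because block 1 does not dominate block 0.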
2. How exactly can we help?
Thanks for clarifying the process – a GitHub PR that passes existing tests and adds new ones sounds like a clear plan.
We have a working and debugged implementation as a set of LLVM passes integrated with ThreadSanitizerPass. To simplify the review and integration process, we could start with one of the two analyses proposed in the initial email (Intra-procedural Escape Analysis or Dominance-Based Elimination), as this would allow us to focus on a single set of changes.
Your guidance on the following points would be invaluable:
We are ready to get started and look forward to your guidance.