PartitionAlloc and shared libraries

86 views
Skip to first unread message

Daniel Cheng

unread,
May 2, 2025, 1:21:00 AMMay 2
to memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
I recently tried to delete the non-Rust JSON parsing path: https://chromium-review.googlesource.com/c/chromium/src/+/6492579

Much to my surprise, this CL was reverted due to hitting a `PA_NOTREACHED()`: https://ci.chromium.org/ui/p/chromium/builders/ci/linux-cast-x64-dbg/5720/overview

After a bunch of debugging and help from tasak@, the problem is due to a shared library. Specifically, cast_audio_backend_unittests depends on libcast_graphics_1.0, and libcast_graphics_1.0 depends on //base. This means we have two PartitionAllocs running around, and if we end up trying to free an allocation from one instance of PartitionAlloc from the other instance of PartitionAlloc, bad things happen.

If a .so and the main binary both provide definitions for the same symbols, is there a way to reliably prevent them from stomping on each other's symbols? Or is it pretty much a hopeless task? I know there were some discussions about PartitionAlloc-Everywhere and component builds in the past, but as far as I know, we never shipped PA-E on component builds.

FWIW, I think it's possible to drop the //base dependency from libcast_graphics_1.0, and I'll be pursuing that as a short-term fix, but it would be nice if we could detect/prevent this situation from happening at all.

Daniel

Anton Bikineev

unread,
May 2, 2025, 6:04:36 AMMay 2
to Daniel Cheng, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
I'd assume ASAN with ASAN_OPTIONS=detect_odr_violation=1 should detect such issues for global variables. It doesn't work for functions though, but since we have global vars in PA (like the singleton PartitionAddressSpace), I guess it would work. Not sure also if the ASAN check works when an odr violation happens across shared objects.

--
You received this message because you are subscribed to the Google Groups "memory-safety-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memory-safety-...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/memory-safety-dev/CAF3XrKqQVZA4V3Nm%3DeOucr3ExGppC3pDVnM8pO9WciruNGr-ow%40mail.gmail.com.

Egor Pasko

unread,
May 5, 2025, 6:40:53 AMMay 5
to Daniel Cheng, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
Besides PartitionAlloc there are probably plenty of assumptions relying on a single instance of //base per process. Seeing a couple of uses of thread_local in task/threading/sampling_heap_profiler, plus some of our past crashes makes me believe that creating two instances of //base in one process should not be allowed.

On Fri, May 2, 2025 at 7:21 AM Daniel Cheng <dch...@chromium.org> wrote:
--
You received this message because you are subscribed to the Google Groups "rust-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rust-dev+u...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/rust-dev/CAF3XrKqQVZA4V3Nm%3DeOucr3ExGppC3pDVnM8pO9WciruNGr-ow%40mail.gmail.com.

Greg Thompson

unread,
May 5, 2025, 7:06:46 AMMay 5
to Egor Pasko, Daniel Cheng, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
Not to disagree with you, but non-component builds on Windows always have one copy of //base in chrome.exe and another in chrome.dll. We do consider each module to be its own allocation domain, so we try to never alloc in one and free in another. The component build makes this intractable. (Another reason to dislike the component build...)

You received this message because you are subscribed to the Google Groups "memory-safety-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memory-safety-...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/memory-safety-dev/CAH3q7_nK%2BJd5MHG5JGp1%3DLHQXOQp4QPwnKZ%2BYCVp1VwsZQQk3w%40mail.gmail.com.

Egor Pasko

unread,
May 7, 2025, 9:05:43 AMMay 7
to Daniel Cheng, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
One idea that just came to me. On Linux/Android by using dl_iterate_phdr(3) we can walk all exported symbols from all loaded libraries. We can CHECK(more than one library exports some PA symbol). This check would be somewhat slow, but I think within allowance for debug builds.

On Fri, May 2, 2025 at 7:21 AM Daniel Cheng <dch...@chromium.org> wrote:
--

Nico Weber

unread,
May 7, 2025, 1:24:54 PMMay 7
to Greg Thompson, Egor Pasko, Daniel Cheng, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
On Mon, May 5, 2025 at 7:06 AM Greg Thompson <g...@chromium.org> wrote:
Not to disagree with you, but non-component builds on Windows always have one copy of //base in chrome.exe and another in chrome.dll. We do consider each module to be its own allocation domain, so we try to never alloc in one and free in another. The component build makes this intractable.

base is a shared library in component builds, no? The problem here is in non-component builds as far as I understand (?)
 
On Mon, May 5, 2025 at 12:40 PM Egor Pasko <pa...@chromium.org> wrote:
Besides PartitionAlloc there are probably plenty of assumptions relying on a single instance of //base per process. Seeing a couple of uses of thread_local in task/threading/sampling_heap_profiler, plus some of our past crashes makes me believe that creating two instances of //base in one process should not be allowed.

On Fri, May 2, 2025 at 7:21 AM Daniel Cheng <dch...@chromium.org> wrote:
I recently tried to delete the non-Rust JSON parsing path: https://chromium-review.googlesource.com/c/chromium/src/+/6492579

Much to my surprise, this CL was reverted due to hitting a `PA_NOTREACHED()`: https://ci.chromium.org/ui/p/chromium/builders/ci/linux-cast-x64-dbg/5720/overview

After a bunch of debugging and help from tasak@, the problem is due to a shared library. Specifically, cast_audio_backend_unittests depends on libcast_graphics_1.0, and libcast_graphics_1.0 depends on //base. This means we have two PartitionAllocs running around, and if we end up trying to free an allocation from one instance of PartitionAlloc from the other instance of PartitionAlloc, bad things happen.

If a .so and the main binary both provide definitions for the same symbols, is there a way to reliably prevent them from stomping on each other's symbols? Or is it pretty much a hopeless task? I know there were some discussions about PartitionAlloc-Everywhere and component builds in the past, but as far as I know, we never shipped PA-E on component builds.

FWIW, I think it's possible to drop the //base dependency from libcast_graphics_1.0, and I'll be pursuing that as a short-term fix, but it would be nice if we could detect/prevent this situation from happening at all.

Daniel

--
You received this message because you are subscribed to the Google Groups "rust-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rust-dev+u...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/rust-dev/CAF3XrKqQVZA4V3Nm%3DeOucr3ExGppC3pDVnM8pO9WciruNGr-ow%40mail.gmail.com.

--
You received this message because you are subscribed to the Google Groups "memory-safety-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memory-safety-...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/memory-safety-dev/CAH3q7_nK%2BJd5MHG5JGp1%3DLHQXOQp4QPwnKZ%2BYCVp1VwsZQQk3w%40mail.gmail.com.

--
You received this message because you are subscribed to the Google Groups "memory-safety-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to memory-safety-...@chromium.org.

Greg Thompson

unread,
May 8, 2025, 1:53:12 AMMay 8
to Nico Weber, Egor Pasko, Daniel Cheng, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
Thanks, Nico. I had some crossed wires...

On Windows, we have historically been careful to consider each module to have its own allocation domain and never alloc in one and free in the other. This is why we have a strict C api at the boundaries of chrome_elf.dll, chrome.exe, and chrome.dll -- it's easier to reason about.

Does the API exposed by libcast_graphics_1.0 require handing out allocations that are freed by the caller (or vice-versa)? If so, can we change its API so that it doesn't? Would this address the "two P-A" problem?

Daniel Cheng

unread,
May 8, 2025, 2:40:01 AMMay 8
to Greg Thompson, Nico Weber, Egor Pasko, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
As I understand, the problem isn't that libcast_graphics_1.0 is allocating something that gets freed outside the .so (or vice versa). The problem is that simply loading the .so is enough to interpose the .so's implementation of the symbols. If this happens after something has already used PartitionAlloc to allocate something, we then get a second instance of PartitionAlloc running around.

When I asked about this internally, I got pointed at -Bsymbolic and friends, but I haven't had much time to research those in depth. They did note that even if this prevents the immediate problem with symbol interposition, there may very well be other issues with multiple copies of //base, et cetera in process.

Daniel

Greg Thompson

unread,
May 8, 2025, 4:57:01 AMMay 8
to Daniel Cheng, Nico Weber, Egor Pasko, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
I'm not sure how this would be handled on POSIX platforms. On Windows, the various modules that link to //base in a static build do not export any P-A symbols, so there can't be any collisions.

Reid Kleckner

unread,
May 8, 2025, 4:35:08 PMMay 8
to Greg Thompson, Daniel Cheng, Nico Weber, Egor Pasko, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
Maybe the solution here is for //base & PartitionAlloc to be built with `-fvisibility=hidden`? That would align the way things happen in Posix with the way Windows DLLs work.

Maybe Chrome should add checks that all DSOs it loads don't export any C++ symbols, so, nothing in .dynstr should start with `_*Z`.

Łukasz Anforowicz

unread,
May 8, 2025, 4:54:07 PMMay 8
to Reid Kleckner, Greg Thompson, Daniel Cheng, Nico Weber, Egor Pasko, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
On Thu, May 8, 2025 at 1:35 PM 'Reid Kleckner' via memory-safety-dev <memory-s...@chromium.org> wrote:
Maybe the solution here is for //base & PartitionAlloc to be built with `-fvisibility=hidden`? That would align the way things happen in Posix with the way Windows DLLs work.

Maybe Chrome should add checks that all DSOs it loads don't export any C++ symbols, so, nothing in .dynstr should start with `_*Z`.

Is that the problem here though?  I thought that the problem here is that if foo.dll uses //base+PartitionAlloc **internally** (without exploring anything to users of foo.dll), then pointers allocated by foo.dll cannot be freed by a **separate** (also **internal**) copy of //base+PartitionAlloc present in bar.dll?

So maybe we want kind of an opposite approach where instead of hiding symbols, we make **all** PartitionAlloc symbols publicly visible + weak (weak = I have no idea what I am talking about, but I hope that the runtime/dynamic linker can dedupe 2 separate definitions from foo.dll and bar.dll?).  Just brainstorming.


-Lukasz

Nico Weber

unread,
May 8, 2025, 5:55:25 PMMay 8
to Reid Kleckner, Greg Thompson, Daniel Cheng, Egor Pasko, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
On Thu, May 8, 2025 at 4:35 PM Reid Kleckner <r...@google.com> wrote:
Maybe the solution here is for //base & PartitionAlloc to be built with `-fvisibility=hidden`? That would align the way things happen in Posix with the way Windows DLLs work.

Maybe Chrome should add checks that all DSOs it loads don't export any C++ symbols, so, nothing in .dynstr should start with `_*Z`.

I believe that's already the case, at least in some build configurations. https://source.chromium.org/chromium/chromium/src/+/main:chrome/app/framework.order?q=framework.order is the file that lists all exported symbols in the mac build.

Maybe partitionalloc defines operator new and friends as public visibility on linux so symbol overriding can work, or something else is special about partitionalloc.

Lukasza: Shouldn't that only be a problem if pointers are passed from one binary to the other? (I.e. the scenario grt says we're careful not to do on Windows)

Feels like we don't fully understand the problem here yet :)

Daniel Cheng

unread,
May 8, 2025, 6:22:03 PMMay 8
to Nico Weber, Reid Kleckner, Greg Thompson, Egor Pasko, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
It's true—I don't exactly understand what's going wrong. I was able to reproduce this quite deterministically using the GN args from the cast bot and running cast_audio_backend_unittests with --gtest_filter=StreamMixerDeathTest.BadJsonCrashes. I'd be happy to try to debug it further if people can give me concrete pointers on what to look for.

lukasza@, it might ultimately be an issue of "allocated by one place and freed from another", but the question remains "why is that happening?" Because this was memory allocated and deallocated without ever returning the event loop: it was allocated as part of returning an error string from the Rust portion of the JSON parser, and then deallocated when returning up the stack.

Daniel

Greg Thompson

unread,
May 9, 2025, 3:15:36 AMMay 9
to Daniel Cheng, Nico Weber, Reid Kleckner, Egor Pasko, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
I'm not sure the event loop has anything to do with it. In a static build, we generally can't have an allocation from one module get free'd in another. This is why, for example, the Windows SDK has these awkward things like SysFreeString -- various APIs that return strings allocated from within a system dll need to be free'd by that same system dll. To be clear, it is crashtastic to have a module that exports a function like this:

std::string DoSomeThingAndReturnAString();

Because when you innocently call it from another module in the same process:

{
  std::string result = DoSomeThingAndReturnAString();
}  // kablewie when `result` is destroyed.

Is that what's happening in StreamMixerDeathTest.BadJsonCrashes?

Daniel Cheng

unread,
May 9, 2025, 11:57:39 AMMay 9
to Greg Thompson, Nico Weber, Reid Kleckner, Egor Pasko, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
On Fri, 9 May 2025 at 00:15, Greg Thompson <g...@chromium.org> wrote:
I'm not sure the event loop has anything to do with it. In a static build, we generally can't have an allocation from one module get free'd in another. This is why, for example, the Windows SDK has these awkward things like SysFreeString -- various APIs that return strings allocated from within a system dll need to be free'd by that same system dll. To be clear, it is crashtastic to have a module that exports a function like this:

I understand this. My point is either the .so should be loaded, or it should not. It shouldn't randomly load halfway through JSON decoding and interpose symbols.
 

std::string DoSomeThingAndReturnAString();

Because when you innocently call it from another module in the same process:

{
  std::string result = DoSomeThingAndReturnAString();
}  // kablewie when `result` is destroyed.

Is that what's happening in StreamMixerDeathTest.BadJsonCrashes?

It could be, but it's certainly not intentional if it is—the unit test binary is statically linked, and certainly doesn't mean to use some of the .so's //base and some of its own //base.

Daniel

Wez

unread,
May 12, 2025, 11:19:11 AMMay 12
to Daniel Cheng, Greg Thompson, Nico Weber, Reid Kleckner, Egor Pasko, memory-s...@chromium.org, Stephen Nusko, rust...@chromium.org, ta...@chromium.org
This specific .so has a couple of Foo::Create() APIs that return new Foo's that the calling code stashes into a unique_ptr<Foo>:

OzonePlatform* CreateOzonePlatformCast() {
  const std::vector<std::string>& argv =
      base::CommandLine::ForCurrentProcess()->argv();
  std::unique_ptr<chromecast::CastEglPlatform> platform(
      chromecast::CastEglPlatformShlib::Create(argv));
  return new OzonePlatformCast(std::move(platform));
}

However, I can't see where that specifical example would actually be released.  That certainly violates the same-allocator rule, though.

Re the mismatched symbols: If a dynamic library is loaded with RTLD_LAZY then symbols are lazily resolved when first referenced, which would mean that a program that had (somehow) managed only to allocate, never release, memory by the time it loaded a .so could conceivably resolve the free against the newly loaded .so. Given the normal symbol lookup ordering, I don't know how one could provoke that, though.




Reply all
Reply to author
Forward
0 new messages