External memory issues with v8 13.2 and 13.3


Dan Lapid

Jan 13, 2025, 8:48:43 AM
to v8-...@googlegroups.com, erik...@chromium.org, Kenton Varda
Hi,
In V8 13.2 and 13.3 we sometimes see the external memory usage of Wasm isolates blowing up (up to gigabytes). Under V8 13.1 the same code would never use more than 80-100 MB.
The issue doesn't happen every time for the same Wasm bytecode, and it doesn't even reproduce locally, but it does happen a significant percentage of the time.
This only started happening in 13.2. What are we missing? Should we be enabling or disabling some flags?
It also seems that 13.3 is significantly worse in terms of error rate.
The problem happens under "--liftoff-only".
We use pointer compression but not the sandbox.
We've tried enabling --turboshaft-wasm in 13.1 and the problem did not reproduce.
Has anything changed that we need to adapt to?
Would really appreciate your help!

Jakob Kummerow

Jan 13, 2025, 9:25:29 AM
to v8-...@googlegroups.com, erik...@chromium.org, Kenton Varda
Sounds like a bug, but without more details (or a repro) I don't have a more specific guess than that.

If you're desperate, you could try to bisect it (even with a flaky repro). Or review the ~500 changes between those branches: https://chromium.googlesource.com/v8/v8/+log/branch-heads/13.1..branch-heads/13.2?n=10000



Erik Corry

Jan 13, 2025, 6:31:15 PM
to Jakob Kummerow, v8-...@googlegroups.com, Kenton Varda
It looks like it's related to shared objects between isolates. Is there a newer document than https://docs.google.com/document/d/18lYuaEsDSudzl2TDu-nc-0sVXW7WTGAs14k64GEhnFg/edit?usp=drivesdk that describes how this works today? In particular cross-isolate GCs?

Kenton Varda

Jan 13, 2025, 7:09:41 PM
to Erik Corry, Dan Lapid, Jakob Kummerow, v8-...@googlegroups.com
To add context here:

The problem appears to show up only after running in production for an hour or two. During that time we will have created thousands of isolates to handle millions of requests.

But the problem seems to affect *new* isolates, even when those isolates are loaded with applications that had been loaded into previous isolates without problems. Startup of an application should be 100% deterministic, since we disallow any I/O during startup, but we're seeing that after the host has been running a while, new isolates show much higher "external memory" on startup. (E.g. 400MB external memory, even though we enforce a 128MB limit on the whole isolate.)

We observed that the wasm native module cache causes identical wasm modules to be shared across isolates, and that wasm lazy compilation causes memory usage of a wasm module -- as accounted by all isolates that have loaded it -- to change.

Could it be that there is a memory leak in lazy compilation, such that these shared cached modules are gradually growing over time, to the point where new isolates that try to load these modules are being hit with extremely high "external memory" numbers right off the bat?

-Kenton

Jakob Kummerow

Jan 14, 2025, 6:59:56 AM
to Kenton Varda, Erik Corry, Dan Lapid, v8-...@googlegroups.com
Erik: Shared GC is still only partially implemented and definitely not shipped (or usable), so that document is surely unrelated to whatever is going on here. All existing ways to share data between isolates (such as the NativeModule cache) use other mechanisms.

Kenton: I can't rule out anything. We admittedly don't have much test coverage for thousands-of-isolates scenarios. Perhaps the --trace-wasm-offheap-memory flag can help narrow it down a bit. It's currently only hooked up to the memory measurement API, so you'll either have to use that, or hack some more triggers into convenient places (perhaps isolate shutdown or creation?); see occurrences of v8_flags.print_wasm_offheap_memory_size for inspiration.

A few more ideas:
- from what you describe, perhaps it would be feasible to craft a reproducer. It'd probably have to be a custom V8 embedder that, in a loop, creates many fresh isolates and instantiates/runs the same (or several?) demo Wasm module in them (see the sketch after this list).
- it could make sense to verify (with printfs in their destructors) that both Isolates and NativeModules get destroyed as expected. It's conceivable that the memory growth you're observing is intentional caching (of generated code, or something?) because the WasmEngine thinks that the cached data is still needed/useful.
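
Roughly something like this, perhaps (a sketch, untested; the 8-byte module below is just the minimal empty module, where a real repro would run the actual production Wasm):

```
// Sketch of a standalone reproducer (untested, illustrative only): in a
// loop, create a fresh isolate, instantiate a Wasm module in it, and print
// the isolate's "external memory" after startup.
#include <cstdio>
#include <memory>

#include "include/libplatform/libplatform.h"
#include "include/v8.h"

int main(int argc, char* argv[]) {
  v8::V8::InitializeICUDefaultLocation(argv[0]);
  std::unique_ptr<v8::Platform> platform = v8::platform::NewDefaultPlatform();
  v8::V8::InitializePlatform(platform.get());
  v8::V8::Initialize();

  v8::Isolate::CreateParams params;
  params.array_buffer_allocator =
      v8::ArrayBuffer::Allocator::NewDefaultAllocator();

  for (int i = 0; i < 10000; i++) {
    v8::Isolate* isolate = v8::Isolate::New(params);
    {
      v8::Isolate::Scope isolate_scope(isolate);
      v8::HandleScope handle_scope(isolate);
      v8::Local<v8::Context> context = v8::Context::New(isolate);
      v8::Context::Scope context_scope(context);

      // Compile and instantiate a (trivial) Wasm module via JS; a real
      // repro would use the production bytecode instead.
      const char* src =
          "new WebAssembly.Instance(new WebAssembly.Module("
          "new Uint8Array([0x00, 0x61, 0x73, 0x6d, 1, 0, 0, 0])));";
      v8::Local<v8::Script> script =
          v8::Script::Compile(
              context,
              v8::String::NewFromUtf8(isolate, src).ToLocalChecked())
              .ToLocalChecked();
      script->Run(context).ToLocalChecked();

      // The counter this thread is about, read via the public API.
      v8::HeapStatistics stats;
      isolate->GetHeapStatistics(&stats);
      std::printf("isolate %d: external memory = %zu\n", i,
                  stats.external_memory());
    }
    isolate->Dispose();
  }

  delete params.array_buffer_allocator;
  v8::V8::Dispose();
  v8::V8::DisposePlatform();
  return 0;
}
```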

How/where exactly are you seeing this increased "external memory"? I.e. what reporting system are you using to get memory consumption numbers?

Erik Corry

Jan 14, 2025, 8:00:40 AM
to Jakob Kummerow, Kenton Varda, Dan Lapid, v8-...@googlegroups.com
The external memory is the one the internal heap knows about:

uint64_t Heap::external_memory() const { return external_memory_.total(); }
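
(From the embedder side, the same counter surfaces through the public API; a minimal sketch, assuming only a live isolate pointer:)

```
#include "include/v8.h"

// Sketch: reading the heap's external memory counter via the embedder API.
// Managed<T>::From() grows it through the equivalent of
// isolate->AdjustAmountOfExternalAllocatedMemory(...).
size_t ExternalMemoryOf(v8::Isolate* isolate) {
  v8::HeapStatistics stats;
  isolate->GetHeapStatistics(&stats);
  return stats.external_memory();  // populated from Heap::external_memory()
}
```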

The following code in wasm-engine.cc:1015 attributes external memory to the isolate, in the From() call on the second-to-last line.

Is the native_module likely to be shared between isolates here, and long-lived?

Could it be that it is gradually committing more code space, causing later isolates to get a higher external memory size?

(does this backquoting work in email for fixed formatting?  Probably not).
```
  // Use the given shared {NativeModule}, but increase its reference count by
  // allocating a new {Managed<T>} that the {Script} references.
  size_t code_size_estimate = native_module->committed_code_space();
  size_t memory_estimate =
      code_size_estimate +
      wasm::WasmCodeManager::EstimateNativeModuleMetaDataSize(module);
  DirectHandle<Managed<wasm::NativeModule>> managed_native_module =
      Managed<wasm::NativeModule>::From(isolate, memory_estimate,
                                        std::move(native_module));
```

Jakob Kummerow

Jan 14, 2025, 9:23:10 AM
to Erik Corry, Kenton Varda, Dan Lapid, v8-...@googlegroups.com
On Tue, Jan 14, 2025 at 2:00 PM Erik Corry <erik...@chromium.org> wrote:
The external memory is the one the internal heap knows about:

uint64_t Heap::external_memory() const { return external_memory_.total(); }

The following code in wasm-engine.cc:1015 attributes external memory to the isolate, in the From() call on the second-to-last line.

Is the native_module likely to be shared between isolates here, and long lived?

Yes, NativeModules are shared per process. They are primarily keyed on Wasm wire bytes, so if multiple Isolates instantiate the same Wasm module, they'll share a lot of memory (including generated code, and other engine-internal metadata) via the NativeModule. NativeModules are freed when no Wasm instance is keeping them alive any more.
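
(For illustration, the public API also exposes this sharing explicitly: a CompiledWasmModule handle keeps the shared NativeModule alive and can be re-wrapped in another isolate without recompiling. A sketch; ShareModule is just an illustrative name:)

```
#include "include/v8.h"

// Sketch: explicit cross-isolate sharing of a Wasm module. Both isolates end
// up referencing the same process-wide NativeModule, and thus account for
// the same committed code space. Call while entered into `other_isolate`
// with an active HandleScope.
v8::MaybeLocal<v8::WasmModuleObject> ShareModule(
    v8::Local<v8::WasmModuleObject> module, v8::Isolate* other_isolate) {
  v8::CompiledWasmModule compiled = module->GetCompiledModule();
  return v8::WasmModuleObject::FromCompiledModule(other_isolate, compiled);
}
```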
 
Could it be that it is gradually committing more code space, causing later isolates to get a higher external memory size?

Yes, absolutely: with lazy compilation, committed code space should at first be near-zero, and will grow over time as functions are called (for the first time, triggering lazy compilation) and eventually optimized (when they're sufficiently hot). That should be quite deterministic, and upper-bounded by a module-specific maximum (once everything is optimized and all inlining budgets are exhausted).

Also, none of this has changed recently, as far as I'm aware. I don't know how to explain the regression you're observing.
 
(does this backquoting work in email for fixed formatting?  Probably not).

It does not. But this specific snippet is sufficiently readable either way.

Kenton Varda

Jan 14, 2025, 10:10:33 AM
to Jakob Kummerow, Erik Corry, Dan Lapid, v8-...@googlegroups.com
On Tue, Jan 14, 2025 at 5:59 AM Jakob Kummerow <jkum...@chromium.org> wrote:
- from what you describe, perhaps it would be feasible to craft a reproducer. It'd probably have to be a custom V8 embedder that, in a loop, creates many fresh isolates and instantiates/runs the same (or several?) demo Wasm module in them.

I tried exactly that yesterday, and was able to see that "external memory" was indeed correlated across isolates, but after creating/destroying thousands of isolates it seemed to converge on a reasonable number rather than keep growing forever.

But in prod we see something in external memory growing and growing.

Kenton Varda

Jan 15, 2025, 11:03:47 AM
to Jakob Kummerow, Erik Corry, Dan Lapid, v8-...@googlegroups.com
By bisecting in production we determined that the problem is --flush_liftoff_code, which was enabled by default starting in 13.2. In our environment, this flag seems to leak memory that lives in the code cache and so affects newly-created isolates. I've filed a bug:

https://issues.chromium.org/issues/390075235
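
In the meantime, reverting to the 13.1 behavior should just be a matter of turning the flag back off before initializing V8; a minimal sketch:

```
#include "include/v8.h"

// Sketch: restore the pre-13.2 behavior by disabling Liftoff code flushing.
// Flags must be set before V8::Initialize(), before any isolate exists.
void DisableLiftoffCodeFlushing() {
  v8::V8::SetFlagsFromString("--no-flush-liftoff-code");
}
```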

-Kenton

Erik Corry

Jan 17, 2025, 10:29:21 AM
to Kenton Varda, Jakob Kummerow, Dan Lapid, v8-...@googlegroups.com
I added a test in:

https://chromium-review.googlesource.com/c/v8/v8/+/6182264

With --no-flush-liftoff-code the external memory stays at 15k. With the default --flush-liftoff-code it rises until it hits 300k and the test fails. Hopefully this can help diagnose the issue.

Clemens Backes

Jan 17, 2025, 10:35:19 AM
to v8-...@googlegroups.com, Kenton Varda, Jakob Kummerow, Dan Lapid
Thanks, we'll use that to investigate.

And I think we have all relevant folks on that bug, so let's keep all follow-up discussion in that bug.
