Need help - WASM OOM when downloading 500MB AI models in extension


Hitesh230

Jan 8, 2026, 9:05:59 AM
to Chromium Extensions
Building a local AI extension (Transformers.js + WebGPU).

Models are 500MB. Browser crashes with WASM OOM during download.

What's the recommended pattern for large model files in extensions?

Oliver Dunk

Jan 8, 2026, 9:23:52 AM
to Hitesh230, Chromium Extensions
Hi,

500MB is a reasonable size for a model that you could run locally in a Chrome Extension. Of course, this will only work if there is enough memory on the device - but there is no hard limitation you would be hitting.

If you're unable to get this working, it would be great if you could share a simple reproduction on GitHub and then we can take a closer look.
Oliver Dunk | DevRel, Chrome Extensions | https://developer.chrome.com/ | London, GB



Hitesh230

Jan 8, 2026, 9:31:27 AM
to Oliver Dunk, Chromium Extensions

Hi Oliver,

Thanks for the quick response! Let me clarify the technical issue in detail.

SETUP
- Chrome Extension (Manifest V3) using Transformers.js v3 + ONNX Runtime Web
- Running in an Offscreen Document (required for WASM/WebGPU)
- 3 models total:
  - SmolLM2-360M (Text Generation) - around 220MB
  - GTE-small (Embeddings) - around 45MB
  - Qwen2.5-0.5B (Classifier) - around 200MB
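For reference, this is roughly how the offscreen document gets created from the service worker. A minimal sketch: the page name "offscreen.html" and the justification string are placeholders of mine, and it assumes Chrome 116+ for chrome.runtime.getContexts.

```ts
// background service worker - create the offscreen document once.
async function ensureOffscreenDocument(): Promise<void> {
  // Don't create a second document if one is already running.
  const contexts = await chrome.runtime.getContexts({
    contextTypes: [chrome.runtime.ContextType.OFFSCREEN_DOCUMENT],
  });
  if (contexts.length > 0) return;

  await chrome.offscreen.createDocument({
    url: 'offscreen.html',
    reasons: [chrome.offscreen.Reason.WORKERS],
    justification: 'Run WASM/WebGPU inference for local AI models',
  });
}
```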

THE PROBLEM
When loading models sequentially:
- Model 1 loads successfully
- Model 2 loads successfully
- Model 3 crashes with: RuntimeError: Aborted() from ort-wasm-simd-threaded.jsep.wasm

WHAT I'VE TRIED
- Sequential loading (not parallel) - still crashes
- Calling .dispose() on the previous model before loading the next - still crashes
- The WASM linear memory doesn't seem to be reclaimed after dispose
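To make that concrete, this is roughly the code path that crashes. A minimal sketch: the exact model IDs are illustrative stand-ins for the three models above, and dispose() here is the pipeline-level method I've been calling in Transformers.js.

```ts
import { pipeline } from '@huggingface/transformers';

// Model 1: loads fine (~220MB).
const generator = await pipeline(
  'text-generation',
  'HuggingFaceTB/SmolLM2-360M-Instruct',
  { device: 'webgpu' },
);

// Model 2: also loads fine (~45MB).
const embedder = await pipeline(
  'feature-extraction',
  'Xenova/gte-small',
  { device: 'webgpu' },
);

// Free model 2 before loading model 3 - but the WASM heap never
// seems to shrink, and the next allocation still aborts.
await embedder.dispose();

// Model 3: RuntimeError: Aborted() from ort-wasm-simd-threaded.jsep.wasm
const classifier = await pipeline(
  'text-classification',
  'Qwen/Qwen2.5-0.5B',
  { device: 'webgpu' },
);
```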

THE CORE ISSUE
Each ONNX Runtime session allocates into the same 4GB WASM heap (the wasm32 linear-memory limit). Even after disposing Model 2, that memory isn't freed before Model 3 tries to allocate. The heap fragments and eventually OOMs.

MY QUESTION
For extensions requiring multiple AI models, what's the recommended architecture? Options I'm considering:

- Separate Web Workers per model (each gets its own WASM heap)? Rough sketch below.
- Pre-downloading model files externally using a companion app?
- Some other pattern Chrome recommends for multi-model AI?
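On the first option, this is the kind of structure I have in mind. A rough sketch: "model-worker.js" and the message shape are made up, but the point is that each module worker instantiates its own copy of the ONNX Runtime WASM module, so each model gets a separate linear memory instead of sharing one heap.

```ts
// offscreen script - one dedicated worker per model.
const workers = new Map<string, Worker>();

function spawnModelWorker(modelId: string): Worker {
  // A module worker gets its own JS realm and its own WASM instance,
  // so its allocations can't fragment another model's heap.
  const worker = new Worker(
    new URL('./model-worker.js', import.meta.url),
    { type: 'module' },
  );
  worker.postMessage({ type: 'load', modelId });
  workers.set(modelId, worker);
  return worker;
}

function unloadModel(modelId: string): void {
  // Terminating a worker releases its entire heap - no reliance on
  // dispose() actually returning memory to the allocator.
  workers.get(modelId)?.terminate();
  workers.delete(modelId);
}

// model-worker.js would do something like:
//   import { pipeline } from '@huggingface/transformers';
//   let pipe;
//   self.onmessage = async (e) => {
//     if (e.data.type === 'load') {
//       pipe = await pipeline('text-generation', e.data.modelId,
//                             { device: 'webgpu' });
//     }
//   };
```

The trade-off I can see is paying the WASM instantiation cost once per worker, but unloading a model becomes deterministic.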


Thanks!