The Embedding API is a proposed Web Platform API that allows developers to generate high-dimensional vector representations (embeddings) of content directly on the user's device.
By leveraging Chrome's on-device AI infrastructure and a shared on-device model, this API enables semantic understanding features such as semantic search, Retrieval-Augmented Generation (RAG), and content clustering, without the latency, cost, and privacy trade-offs of cloud services. Compared to do-it-yourself (DIY) client-side approaches, it also benefits two groups. Users save bandwidth and local storage because each site no longer downloads its own large model. Developers no longer have to handle complex model delivery or keep WebAssembly/WebGPU frameworks up to date.
While existing web technologies like WebAssembly and WebGPU provide standardized, high-performance, and privacy-preserving execution environments, deploying an embedding model still forces developers into a difficult trade-off:
WebAssembly/WebGPU (DIY): Leads to significant storage and memory bloat, because every site must download its own model, often several hundred megabytes in size.
Cloud APIs: Introduce network latency, financial costs for developers, and require sending potentially sensitive user text to third-party servers.
By ensuring stateless execution and explicitly not persisting embeddings globally, an on-device API allows the browser to safely share a single, optimized model across all origins, drastically reducing the resource footprint while providing a simple, high-level JavaScript primitive for generalist developers.
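The browser does not persist embeddings on a page's behalf. If each call is also computed independently (an assumption; the explainer does not say), an application that embeds the same text repeatedly would have to manage any reuse itself. The following is a minimal sketch of an app-owned memoizing wrapper. It uses a hypothetical `Embedder` interface whose `embed(text)` method returns a `Float32Array`. These names are illustrative only, and the proposed API's actual shape may differ. The cache lives only in the page's own memory, so nothing is shared across origins.

```typescript
// Hypothetical embedder shape; the proposed API's real names and signature may differ.
interface Embedder {
  embed(text: string): Promise<Float32Array>;
}

// App-owned cache: memoizes embeddings per input string in this page's memory.
// Nothing here is persisted or shared with other origins.
class CachingEmbedder implements Embedder {
  private readonly inner: Embedder;
  private readonly cache = new Map<string, Promise<Float32Array>>();

  constructor(inner: Embedder) {
    this.inner = inner;
  }

  embed(text: string): Promise<Float32Array> {
    let pending = this.cache.get(text);
    if (pending === undefined) {
      // Store the promise itself so concurrent calls for the same text share one request.
      pending = this.inner.embed(text);
      this.cache.set(text, pending);
      // Drop failed requests so a later call can retry.
      pending.catch(() => this.cache.delete(text));
    }
    return pending;
  }
}
```

Caching the promise rather than the resolved vector means two overlapping requests for the same text trigger only one underlying call.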
Key Use Cases
Semantic Search: Enable note-taking or documentation apps to find content based on meaning rather than keywords, entirely offline and private.
On-Device RAG: Power local Q&A bots that retrieve relevant context from a user’s own data.
Real-time Content Intelligence: Provide proactive moderation hints or content categorization as a user types, before content is ever transmitted to a server.
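Semantic search and the retrieval step of on-device RAG both come down to ranking stored embeddings by their similarity to a query embedding. The following is a minimal sketch. It assumes the same hypothetical `Embedder` interface (not defined by the explainer) and uses cosine similarity, a common choice for comparing embeddings. Whether the proposed API returns normalized vectors is not specified here, so the sketch normalizes explicitly. `IndexedDoc`, `topK`, and `semanticSearch` are illustrative names.

```typescript
// Hypothetical embedder shape; the proposed API's real names and signature may differ.
interface Embedder {
  embed(text: string): Promise<Float32Array>;
}

// A document whose embedding the application computed earlier and stored itself.
interface IndexedDoc {
  id: string;
  vector: Float32Array;
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Linear scan over all documents, returning the k most similar.
function topK(
  query: Float32Array,
  docs: IndexedDoc[],
  k: number,
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Embed the query on-device, then rank the stored documents against it.
async function semanticSearch(
  embedder: Embedder,
  query: string,
  docs: IndexedDoc[],
  k = 3,
): Promise<{ id: string; score: number }[]> {
  const q = await embedder.embed(query);
  return topK(q, docs, k);
}
```

For RAG, the top-ranked documents would be passed as context to a language model. For large collections, an approximate nearest-neighbor index would typically replace the linear scan.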
Anticipated Questions
The following are problems we want to discuss with other browser vendors and the Web Machine Learning Community Group (WebML CG) as part of standardization, to ensure interoperability. (Note: the explainer lists more in the "Ensuring an Interoperable API Design" section.)
Model and Space Choices: Exploring requirements for open-weight models and allowing developers to specify or provide their own models, to ensure compatibility with server-side embedding databases.
Content Mediation: Can we design a mediation mechanism for cases where embeddings must be used server-side?
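Embeddings produced by different models occupy different vector spaces. Comparing them is meaningless even when their dimensions happen to match, which is why model choice matters for compatibility with server-side embedding databases. The following is a minimal sketch of a hypothetical tagging scheme, not defined by the explainer, in which each stored vector records which model produced it. Incompatible comparisons then fail loudly instead of returning misleading scores. `TaggedEmbedding` and `similarity` are illustrative names.

```typescript
// Hypothetical tagging scheme; the explainer does not define how models are identified.
interface TaggedEmbedding {
  modelId: string; // identifier of the model that produced the vector
  vector: Float32Array;
}

// Refuse to compare vectors from different models or with different dimensions.
function assertComparable(a: TaggedEmbedding, b: TaggedEmbedding): void {
  if (a.modelId !== b.modelId) {
    throw new Error(`Embeddings come from different models: ${a.modelId} vs ${b.modelId}`);
  }
  if (a.vector.length !== b.vector.length) {
    throw new Error(`Dimension mismatch: ${a.vector.length} vs ${b.vector.length}`);
  }
}

// Dot product; equals cosine similarity only if both vectors are unit-normalized.
function similarity(a: TaggedEmbedding, b: TaggedEmbedding): number {
  assertComparable(a, b);
  let s = 0;
  for (let i = 0; i < a.vector.length; i++) {
    s += a.vector[i] * b.vector[i];
  }
  return s;
}
```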