Hi everyone,
I’m looking for some technical clarifications regarding the differences between the Prompt API and the Summarizer API for Gemini Nano. Specifically, I have three questions:
1. Context Window Size
In the thread “Maximum Token Limits for Gemini Nano APIs”, it was mentioned that:
“For the Prompt API … session can retain the last 4096 tokens … For the Summarizer API: The context window is currently limited to 1024 tokens but we use about 26 of those under the hood. Thanks to your feedback … we are exploring how to expand this feature to 4096 tokens.”
(Source)
Has the Summarizer API’s input/context window been expanded to match the Prompt API’s ~4096-token capacity, or does it remain limited to around 1024 tokens for summarize() operations?
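For reference, here’s roughly how I’ve been inspecting the limits at runtime. This is only a sketch: I’m assuming the current shape of the built-in AI surface (the `LanguageModel` and `Summarizer` globals and their `measureInputUsage()` method), so please correct me if any of this has changed:

```js
// Rough runtime check of both quotas (run from an async context,
// e.g. a module or the DevTools console).
const session = await LanguageModel.create();
const summarizer = await Summarizer.create();

console.log('Prompt API inputQuota:', session.inputQuota);    // ~4096?
console.log('Summarizer inputQuota:', summarizer.inputQuota); // still ~1024?

// measureInputUsage() reports how many tokens a given input would
// consume, so you can tell whether it fits before calling summarize().
const text = 'Some long article text...';
const usage = await summarizer.measureInputUsage(text);
console.log(`This input needs ${usage} of ${summarizer.inputQuota} tokens`);
```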
2. Model Architecture & Download Behavior
From the documentation, it seems that both APIs rely on Gemini Nano, which is downloaded to the device upon first use. I’ve also seen references to fine-tuning or adapter layers (e.g., LoRA) used in task-specific APIs.
If a developer uses both the Prompt API and the Summarizer API in the same environment, does this result in:
- a single shared instance of Gemini Nano being downloaded and used by both APIs, or
- separate model assets (for example, a base Gemini Nano plus a summarization-specific fine-tuned adapter) being downloaded and maintained for the Summarizer API?
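If it matters, I was planning to probe this by watching each API’s download state separately; a sketch, assuming the `availability()` / `monitor()` surface from the explainers:

```js
// If Summarizer still reports 'downloadable' after the Prompt API's
// model is 'available', that would suggest it fetches extra assets
// (e.g. a task-specific adapter) on top of the base model.
console.log('Prompt API:', await LanguageModel.availability());
console.log('Summarizer:', await Summarizer.availability());

const summarizer = await Summarizer.create({
  monitor(m) {
    m.addEventListener('downloadprogress', (e) => {
      // I'm assuming e.loaded is a 0..1 fraction, per current drafts.
      console.log(`Summarizer download: ${Math.round(e.loaded * 100)}%`);
    });
  },
});
```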
3. Performance & Speed Differences
Are there any measurable differences in latency or throughput between the Summarizer API and the Prompt API when performing summarization tasks?
In other words, does the Summarizer API provide any performance benefits beyond the built-in prompt optimization, or is it mainly a simplified interface with a smaller context window?
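For what it’s worth, I intended to measure this myself along these lines (purely illustrative; the prompt wording is my own):

```js
// Naive side-by-side latency check for the same summarization task.
const text = 'Some long article text...';

const summarizer = await Summarizer.create();
let t0 = performance.now();
await summarizer.summarize(text);
console.log(`Summarizer API: ${(performance.now() - t0).toFixed(0)} ms`);

const session = await LanguageModel.create();
t0 = performance.now();
await session.prompt(`Summarize the following text:\n\n${text}`);
console.log(`Prompt API: ${(performance.now() - t0).toFixed(0)} ms`);
```

One caveat I’m aware of: the first inference after `create()` may include warm-up costs, so a throwaway call beforehand would make the comparison fairer.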
Thanks in advance for any clarifications you can provide. At the moment, I’m leaning toward using only the Prompt API.
Hi Thomas,
Thanks a lot for the detailed response. I really appreciate how quickly you replied. I read it the same day but forgot to follow up!
For my use case, I think I’ll stick with the Prompt API since having finer control over prompts and access to a larger context window fits better with what I’m building. Still, your points about the Summarizer API’s stability and future-proofing make total sense, and I’ll definitely keep that in mind moving forward.
Thanks again for taking the time to clarify everything!
You're very welcome :-) Thanks for the kind words! That said, I'll check with engineering whether there's a reason for the limited `inputQuota` for the Summarizer. It does use a system prompt internally and needs to leave space for the contexts, but it seems we may have leeway to reduce the buffer space here.
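In the meantime, note that the shared context and the per-call context both count against the same window, which is part of why we reserve that headroom. Roughly (a sketch using the option names from the explainer):

```js
// Both contexts consume tokens from the same inputQuota,
// hence the reserved buffer.
const summarizer = await Summarizer.create({
  sharedContext: 'Chat transcripts between a support agent and a customer.',
});
console.log('Quota:', summarizer.inputQuota);

const transcript = '...'; // your actual input
const summary = await summarizer.summarize(transcript, {
  context: 'This conversation is about a billing dispute.',
});
```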