Maximum Token Limits for Gemini Nano APIs


Shashank Muthuraj

Nov 7, 2024, 5:18:54 PM
to Chrome Built-in AI Early Preview Program Discussions
Hi, I'm curious about the maximum token limits for the Gemini Nano APIs, especially the Prompt, Summarizer, and Writer APIs. Could someone provide details or point me to the relevant documentation? Thanks!

Thomas Steiner

Nov 7, 2024, 5:27:23 PM
to Shashank Muthuraj, Chrome Built-in AI Early Preview Program Discussions
Hi Shashank,

This information should all be in the various API docs linked to from https://docs.google.com/document/d/18otm-D9xhn_XyObbQrc1v7SI-7lBX3ynZkjEpiS1V04/edit?usp=drivesdk

For example, for the Prompt API:

At the moment, there is a per prompt limit of 1024 tokens, and the session can retain the last 4096 tokens. We are discussing ways to simplify this so that you will be able to use the 4096 tokens as you wish (e.g. in one prompt, or over several prompts without limits on each prompt).
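
For illustration, here is a minimal sketch of how those two limits interact. This assumes the ai.languageModel.create() entry point from the Early Preview docs; the exact entry point may change between Chrome versions:

// Create a session (entry point per the Early Preview docs; may change).
const session = await ai.languageModel.create();

// Each individual call must fit within the 1024-token per-prompt limit.
const reply = await session.prompt(
  "Summarize the plot of Hamlet in two sentences.");
console.log(reply);

// The session as a whole retains roughly the last 4096 tokens of
// conversation; older prompt/response pairs fall out of context first.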

Or for the Summarizer API:

The context window is currently limited to 1024 tokens, but we use about 26 of those under the hood. Thanks to your feedback via our second survey, we are exploring expanding this limit to 4096 tokens, which should meet most developers' needs while maintaining performance and resource usage.
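
As a rough sketch of what that means in practice (assuming the ai.summarizer.create() entry point from the Early Preview docs; names may change):

// Create a summarizer (entry point per the Early Preview docs; may change).
const summarizer = await ai.summarizer.create();

// The input, plus the ~26 tokens of internal overhead mentioned above, must
// fit within the 1024-token context window, so long inputs need chunking.
const articleText = "…"; // placeholder for the text to be summarized
const summary = await summarizer.summarize(articleText);
console.log(summary);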

Cheers,
Tom


Vinayak Nigam

Nov 27, 2024, 3:15:00 AM
to Thomas Steiner, Shashank Muthuraj, Chrome Built-in AI Early Preview Program Discussions
I'm a beginner with this, so I wanted to ask: according to the docs, we can provide system prompts, but does the system prompt get counted against the per-prompt limit (1024 tokens) on every call, or just once against the global context window (of 4096 tokens)?

Thomas Steiner

Nov 27, 2024, 3:57:46 AM
to Vinayak Nigam, Chrome Built-in AI Early Preview Program Discussions, Thomas Steiner, Shashank Muthuraj
Hi Vinayak,

This is mentioned in a section of the explainer:

Tokenization, context window length limits, and overflow

A given language model session will have a maximum number of tokens it can process. Developers can check their current usage and progress toward that limit by using the following properties on the session object:

console.log(`${session.tokensSoFar}/${session.maxTokens} (${session.tokensLeft} left)`);

To know how many tokens a string will consume, without actually processing it, developers can use the countPromptTokens() method:

const numTokens = await session.countPromptTokens(promptString);

Some notes on this API:

  • We do not expose the actual tokenization to developers since that would make it too easy to depend on model-specific details.
  • Implementations must include in their count any control tokens that will be necessary to process the prompt, e.g. ones indicating the start or end of the input.
  • The counting process can be aborted by passing an AbortSignal, i.e. session.countPromptTokens(promptString, { signal }).

It's possible to send a prompt that causes the context window to overflow. That is, consider a case where session.countPromptTokens(promptString) > session.tokensLeft before calling session.prompt(promptString), and then the web developer calls session.prompt(promptString) anyway. In such cases, the initial portions of the conversation with the language model will be removed, one prompt/response pair at a time, until enough tokens are available to process the new prompt. The exception is the system prompt, which is never removed. If it's not possible to remove enough tokens from the conversation history to process the new prompt, then the prompt() or promptStreaming() call will fail with a "QuotaExceededError" DOMException and nothing will be removed.
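
Putting that together, here is a small guard sketch based only on the behavior described above (promptIfItFits is just an illustrative helper name):

async function promptIfItFits(session, promptString) {
  // Check whether the prompt fits before sending it.
  const needed = await session.countPromptTokens(promptString);
  if (needed > session.tokensLeft) {
    // Older prompt/response pairs will be evicted to make room.
    console.warn(`Prompt needs ${needed} tokens; only ${session.tokensLeft} left.`);
  }
  try {
    return await session.prompt(promptString);
  } catch (err) {
    if (err.name === "QuotaExceededError") {
      console.error("Prompt too large even after evicting history.");
      return null;
    }
    throw err;
  }
}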

Such overflows can be detected by listening for the "contextoverflow" event on the session:

session.addEventListener("contextoverflow", () => {
  console.log("Context overflow!");
});
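
To tie this back to your question: the system prompt is counted once against the session's overall window (it shows up in session.tokensSoFar right after creation), not against each individual prompt, and as noted above it is never evicted on overflow. A hedged sketch (the systemPrompt creation option follows the explainer; exact option names may differ):

// Hypothetical session creation with a system prompt, per the explainer.
const session = await ai.languageModel.create({
  systemPrompt: "You are a terse assistant. Answer in one sentence.",
});
// The system prompt's tokens are already reflected here, exactly once.
console.log(`${session.tokensSoFar}/${session.maxTokens} used after creation`);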

Cheers,
Tom

--
Thomas Steiner, PhD, Developer Relations Engineer (blog.tomayac.com, toot.cafe/@tomayac)

Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891

----- BEGIN PGP SIGNATURE -----
Version: GnuPG v2.4.3 (GNU/Linux)

iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck
0fjumBl3DCharaCTersAttH3b0ttom.xKcd.cOm/1181.
----- END PGP SIGNATURE -----

Vinayak Nigam

Nov 27, 2024, 7:56:19 AM
to Thomas Steiner, Shashank Muthuraj, Chrome Built-in AI Early Preview Program Discussions
Thank you so much, Thomas!