Hello,
I have a conceptual question while testing WebMCP.
From my understanding, WebMCP exposes tools based on the current page's context (or current DOM state). My question is: can the agent access or infer tools/components from states or pages that are not currently rendered?
For example, suppose an agent is navigating a food delivery app, and the current page shows a list of shops. Can the agent evaluate and select a shop based on a specific menu item it offers, even if those menu items live on a different route and aren't exposed in the current shop-list view?
I want to make sure I am understanding the WebMCP context scope correctly. If I have any misconceptions about this logic, please point them out.
Thanks,
Jun
Agentic AI Researcher, FCLab@skku
Kathy Hurchla
AI Engineering & Data Science Lead
Thanks for sharing your thoughts!
Yes, our team has actually considered that approach before. However, the main issue we found is that it consumes a significant number of tokens. That's exactly why we wanted to ask the Google team about their recommended approach or best practices for this scenario.
Thanks again for the insights!
Jun
Hi Kathy,
Thank you for the clear explanation!
I understand completely. So the key is decoupling the data from the UI—meaning the agent's tools should directly access the service layer (like an API) to fetch the required data, rather than relying on the DOM of a rendered page.
This makes perfect sense and gives our team a great architectural direction to optimize tool usage without the overhead of unnecessary navigation.
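As a rough sketch of what that decoupling could look like (the shop type, the service-layer call, and the tool name are all hypothetical, not part of any real WebMCP API):

```typescript
// Hypothetical sketch: the tool's logic queries the service layer
// directly instead of scraping the rendered shop list in the DOM.
type Shop = { id: string; name: string; menu: string[] };

// Stand-in for a real service-layer call (e.g. fetch("/api/shops")).
async function fetchShops(): Promise<Shop[]> {
  return [
    { id: "s1", name: "Pizza Place", menu: ["margherita", "pepperoni"] },
    { id: "s2", name: "Sushi Bar", menu: ["nigiri", "maki"] },
  ];
}

// A domain-level tool: find shops offering a given menu item, even if
// that item is only rendered on another route.
async function findShopsByMenuItem(item: string): Promise<Shop[]> {
  const shops = await fetchShops();
  return shops.filter((s) => s.menu.includes(item));
}

findShopsByMenuItem("maki").then((shops) =>
  console.log(shops.map((s) => s.name))
);
```

The point is that the tool's answer no longer depends on which page happens to be mounted.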
Thanks again for your time and insights!
Best,
Jun
I really like the direction of decoupling data from the UI.
To me, the core issue isn't just DOM vs Service layer, but whether the agent is reasoning over rendered surfaces or a stable domain state.
If we expose a shared "world state" (or snapshot) that lives outside of routing and rendering, both the UI and the agent simply become projections of that same underlying model. In this setup, tools would trigger deterministic state transitions instead of relying on imperative callbacks tied to the page lifecycle.
This seems like a great way to reduce navigation overhead and token usage. Instead of traversing UI layers to figure out what it can do, the agent could just evaluate the structured state and its derived affordances.
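A minimal sketch of what I mean, with a made-up domain (the state shape, action names, and the affordance function are all illustrative assumptions):

```typescript
// Hypothetical sketch: a shared "world state" that lives outside
// routing/rendering, with the agent's affordances derived from it
// rather than discovered by traversing UI layers.
type WorldState = { cart: string[]; checkoutOpen: boolean };

// Pure function: which actions are valid given the current state.
function deriveAffordances(state: WorldState): string[] {
  const actions = ["add_item"];
  if (state.cart.length > 0) actions.push("remove_item", "open_checkout");
  if (state.checkoutOpen) actions.push("place_order");
  return actions;
}

// Deterministic state transition, not a page-bound callback.
function addItem(state: WorldState, item: string): WorldState {
  return { ...state, cart: [...state.cart, item] };
}

let state: WorldState = { cart: [], checkoutOpen: false };
console.log(deriveAffordances(state)); // only "add_item" is valid here
state = addItem(state, "margherita");
console.log(deriveAffordances(state)); // cart actions become available
```

Both the UI and the agent would render/act from the same state object, so neither needs to ask the other what is currently possible.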
Curious if the WebMCP team is looking into this kind of domain-level state abstraction, rather than keeping things strictly scoped to page-bound context.
Best,
Jung
Great thread. I ran into the same question and ended up building a demo around it using the Imperative API.
Jun's original question - can an agent access data from states it can't currently see - is really about where the source of truth lives. Kathy nailed it: decouple data from the UI and let tools talk to the service layer directly. Jung took it further: if both the UI and the agent are projections of the same underlying state, tools become deterministic state transitions rather than page-bound callbacks.
WebMCP Blackjack (https://webmcp-blackjack.heejae.dev) is a small implementation of that pattern using the Imperative API. Three agents - a human player, an AI player, and an AI dealer - share one page with a single game state object. Each tool's execute callback reads directly from that state, not from the DOM. get_my_hand returns the AI player's hand during their turn, but the dealer's full hand during the dealer's turn. The app calls clearContext() / registerTool() at each phase transition to swap the tool surface per role.
This is basically an implicit finite state machine: betting → player_turn → ai_turn → dealer_turn → settling, where each state defines which tools are available and to whom. Making that explicit - a formal FSM where each state declares its tool set and valid transitions - could be a useful pattern for more complex multi-agent WebMCP apps.
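The explicit version could look something like this (the phase names follow the demo's description, but the table structure, tool names, and guard function are my own sketch, not the demo's actual code):

```typescript
// Hypothetical sketch of making the FSM explicit: each phase declares
// its tool set and its valid outgoing transitions.
type Phase = "betting" | "player_turn" | "ai_turn" | "dealer_turn" | "settling";

const machine: Record<Phase, { tools: string[]; next: Phase[] }> = {
  betting:     { tools: ["place_bet"],                   next: ["player_turn"] },
  player_turn: { tools: ["hit", "stand", "get_my_hand"], next: ["ai_turn"] },
  ai_turn:     { tools: ["hit", "stand", "get_my_hand"], next: ["dealer_turn"] },
  dealer_turn: { tools: ["get_my_hand"],                 next: ["settling"] },
  settling:    { tools: ["get_result"],                  next: ["betting"] },
};

// Guarded transition: only swap the exposed tool surface along declared edges.
// In a real WebMCP app this is where clearContext()/registerTool() would
// re-register the new phase's tools.
function transition(from: Phase, to: Phase): string[] {
  if (!machine[from].next.includes(to)) {
    throw new Error(`invalid transition ${from} -> ${to}`);
  }
  return machine[to].tools;
}

console.log(transition("betting", "player_turn")); // ["hit", "stand", "get_my_hand"]
```

The nice property is that "which tools exist right now" stops being emergent behavior and becomes data you can inspect and test.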
It's single-page, so it doesn't address the cross-page navigation problem directly. But the underlying pattern - state lives outside the UI, imperative tools project from it, and an FSM governs which tools are exposed when - should apply the same way across routes.
Source: https://github.com/happyhj/webmcp-blackjack

Thanks for sharing your experiences.
Returning the updated tools in the Promise<any> resolution is a very clever workaround for the missing tools/list_changed notification. It essentially builds an implicit state machine where each action yields the next valid context.
However, I wonder if this approach might run into severe synchronization issues in a multi-actor environment. If a human user interacts with the UI (e.g., clicks a tab) or a system event updates the screen, the available tools change. But since the agent didn't initiate an action, it receives no Promise resolution. The agent's context would immediately fall out of sync with the actual UI state.
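To make the failure mode concrete, here is a stripped-down sketch (tool names and the version-stamped surface are my own illustration, not WebMCP API): the agent's action resolves with the updated tool set, but a human-initiated change mutates the surface with no resolution the agent can observe.

```typescript
// Hypothetical sketch of the Promise-resolution workaround and its
// multi-actor failure mode.
type ToolSurface = { version: number; tools: string[] };

let surface: ToolSurface = { version: 0, tools: ["open_tab_a"] };

// Agent path: the action's Promise resolves with the updated surface.
async function agentAction(_name: string): Promise<ToolSurface> {
  surface = { version: surface.version + 1, tools: ["close_tab_a"] };
  return surface;
}

// Human path: a UI click replaces the surface, but nothing is delivered
// to the agent -- its cached copy silently goes stale.
function humanClick(): void {
  surface = { version: surface.version + 1, tools: ["open_tab_b"] };
}

async function demo(): Promise<boolean> {
  const agentView = await agentAction("open_tab_a"); // agent caches this
  humanClick();                                      // page moves on
  return agentView.version === surface.version;      // false: out of sync
}

demo().then((inSync) => console.log("agent in sync:", inSync));
```

Without a push channel (i.e. a working tools/list_changed equivalent), the agent has no way to learn that its cached surface is stale until its next action fails or misfires.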
This limitation really highlights the friction of mapping MCP tools 1:1 with UI components like buttons or forms. When we expose UI controls as tools, we force the LLM to navigate layout logic, which inevitably leads to the tool bloat and context drift you mentioned.
What if we shift the abstraction?
Instead of syncing the agent with the UI elements, we could sync it with the human intents relevant to the current state.
In this setup, tools aren't "page-bound callbacks"—they are deterministic state transition vectors.
For example, rather than registering granular UI-bound tools like fill_title_field, check_priority_box, and click_submit, you only expose a semantic, domain-level intent: create_task(title, priority).
The LLM simply translates natural language into this intent vector. The underlying domain logic executes it, and the UI passively projects the new state.
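A minimal sketch of the contrast (the task domain, the create_task signature, and the projection function are all illustrative assumptions):

```typescript
// Hypothetical sketch: one semantic intent replaces several UI-bound
// tools (fill_title_field, check_priority_box, click_submit).
type Priority = "low" | "high";
type Task = { title: string; priority: Priority };
type DomainState = { tasks: Task[] };

// Deterministic state transition vector: intent in, new state out.
function createTask(
  state: DomainState,
  title: string,
  priority: Priority
): DomainState {
  return { tasks: [...state.tasks, { title, priority }] };
}

// The UI is a passive projection of the domain state.
function renderTaskList(state: DomainState): string[] {
  return state.tasks.map((t) => `[${t.priority}] ${t.title}`);
}

let state: DomainState = { tasks: [] };
state = createTask(state, "Ship demo", "high"); // one call, no UI traversal
console.log(renderTaskList(state)); // ["[high] Ship demo"]
```

The LLM only ever sees create_task; how many form fields the UI happens to render is invisible to it.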
This naturally solves the core issues:
Multi-actor Sync: The UI and the agent are both just projections of a shared Domain State. When the state changes (by human or AI), the new valid intents are computed from that single source of truth.
Focused Context: You only expose high-level semantic intents valid for that exact moment, keeping the tool list incredibly focused.
O(1) Execution: The LLM doesn't have to loop through 5-10 turns of UI navigation. It just fires the intent, drastically reducing token usage and avoiding context drift.
I think moving the tool abstraction down to a shared domain state layer—focusing on what the user wants to do rather than how the UI renders it—is a really powerful path forward. Curious to hear your thoughts on this!