Question about WebMCP logic


정병준

unread,
Feb 20, 2026, 7:55:44 AM (6 days ago) Feb 20
to Chrome Built-in AI Early Preview Program Discussions

Hello,

I have a conceptual question while testing WebMCP.

From my understanding, WebMCP exposes tools based on the current page's context (or current DOM state). My question is: can the agent access or infer tools/components from states or pages that are not currently rendered?

For example, suppose an agent is navigating a food delivery app, and the current page shows a list of shops. Can the agent evaluate and select a shop based on a specific menu item it offers, even if those menu items belong to a different page routing and aren't exposed in the current shop list view?

I want to make sure I am understanding the WebMCP context scope correctly. If I have any misconceptions about this logic, please point them out.

Thanks,
Jun
Agentic AI Researcher, FCLab@skku

Yash Kumar Gupta

unread,
Feb 20, 2026, 12:34:47 PM (6 days ago) Feb 20
to Chrome Built-in AI Early Preview Program Discussions, 정병준
One way I have done this is that the current page exposes all the tools, and for the ones outside the current page's context, those tool handlers know that they need to navigate first and then trigger the required functionality.
In your food delivery example, you would have to ensure the current page context provides a tool to get the required menu information for each shop, which might mean navigating first.
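The navigate-first handler pattern described above could be sketched roughly as below. All names here (the router, the menu data, the tool shape) are illustrative stand-ins, not part of the WebMCP API:

```javascript
// Sketch of a "navigate-first" tool handler, assuming a hypothetical SPA
// router and per-shop menu data. None of these names come from WebMCP.
const router = {
  current: "/shops",
  async navigate(path) { this.current = path; }, // stand-in for real routing
};

// Hypothetical menu data that only a shop's detail route would render.
const menusByShop = {
  "shop-1": ["bibimbap", "kimchi stew"],
  "shop-2": ["margherita pizza"],
};

// Tool exposed on the shop-list page: it knows the data lives on another
// route, so it navigates there first, then returns the menu.
const getShopMenuTool = {
  name: "get_shop_menu",
  async execute({ shopId }) {
    if (router.current !== `/shops/${shopId}`) {
      await router.navigate(`/shops/${shopId}`); // navigate before reading
    }
    return { shopId, menu: menusByShop[shopId] ?? [] };
  },
};

getShopMenuTool.execute({ shopId: "shop-1" })
  .then((result) => console.log(result.menu.join(", ")));
```

The cost of this pattern is exactly the navigation overhead discussed later in the thread: each cross-route lookup burns a round trip (and tokens) before the agent can reason over the data.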

That's my take on it but I will let others chime in as well.

Kathy Hurchla

unread,
Feb 20, 2026, 5:24:30 PM (6 days ago) Feb 20
to Yash Kumar Gupta, Chrome Built-in AI Early Preview Program Discussions, 정병준
My simple answer is no, but there's hope. For the imperative API, once a tool is called, its `execute` callback can access your full application surface at that moment, but the menu items still need to be reachable from JavaScript, so this depends heavily on how you store them.

If your menu items only exist in the DOM after a component renders on another page, you need another way to expose that data to JavaScript.

A direct service layer (API or queryable alternative) would be ideal, e.g., going directly to the same source a component uses. But I know that was just an illustrative example, and the data may be hard-coded or a document or something. You'd need to find a way to get at it and decouple it from pages if you want to access it elsewhere via tools.
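The service-layer idea above could be sketched as follows. The service name, data shape, and tool shape are all illustrative assumptions; the point is only that the tool reads from the same source a component would, not from the DOM:

```javascript
// Sketch: decouple menu data from the rendered page by putting it behind a
// small service layer that both UI components and tools call.
const menuService = {
  _data: {
    "shop-1": [{ item: "bibimbap", price: 9000 }],
    "shop-2": [{ item: "margherita pizza", price: 15000 }],
  },
  async getMenu(shopId) { return this._data[shopId] ?? []; },
  async listShopIds() { return Object.keys(this._data); },
};

// The tool reads straight from the service, so it works even when the
// shop's page has never been rendered.
const findShopsWithItemTool = {
  name: "find_shops_with_item",
  async execute({ item }) {
    const matches = [];
    for (const id of await menuService.listShopIds()) {
      const menu = await menuService.getMenu(id);
      if (menu.some((m) => m.item === item)) matches.push(id);
    }
    return matches;
  },
};

findShopsWithItemTool.execute({ item: "bibimbap" })
  .then((ids) => console.log(ids.join(",")));
```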

Shared state stores would only help if the agent had already visited that other page, and would be too brittle.

Kathy Hurchla

AI Engineering & Data Science Lead




정병준

unread,
Feb 20, 2026, 8:48:24 PM (6 days ago) Feb 20
to Chrome Built-in AI Early Preview Program Discussions, Yash Kumar Gupta, 정병준

Thanks for sharing your thoughts!

Yes, our team has actually considered that approach before. However, the main issue we found is that it consumes a significant amount of tokens. That's exactly why we wanted to ask the Google team about their recommended approach or best practices for this scenario.

Thanks again for the insights!
Jun

On Saturday, February 21, 2026, at 2:34:47 AM UTC+9, Yash Kumar Gupta wrote:

정병준

unread,
Feb 20, 2026, 8:56:33 PM (6 days ago) Feb 20
to Chrome Built-in AI Early Preview Program Discussions, Kathy Hurchla, Chrome Built-in AI Early Preview Program Discussions, 정병준

Hi Kathy,

Thank you for the clear explanation!

I understand completely. So the key is decoupling the data from the UI—meaning the agent's tools should directly access the service layer (like an API) to fetch the required data, rather than relying on the DOM of a rendered page.

This makes perfect sense and gives our team a great architectural direction to optimize tool usage without the overhead of unnecessary navigation.

Thanks again for your time and insights!

Best, 

Jun

On Saturday, February 21, 2026, at 7:24:30 AM UTC+9, Kathy Hurchla wrote:

정성우

unread,
Feb 24, 2026, 1:55:00 AM (3 days ago) Feb 24
to Chrome Built-in AI Early Preview Program Discussions, 정병준, Kathy Hurchla, Chrome Built-in AI Early Preview Program Discussions

I really like the direction of decoupling data from the UI.

To me, the core issue isn't just DOM vs Service layer, but whether the agent is reasoning over rendered surfaces or a stable domain state.

If we expose a shared "world state" (or snapshot) that lives outside of routing and rendering, both the UI and the agent simply become projections of that same underlying model. In this setup, tools would trigger deterministic state transitions instead of relying on imperative callbacks tied to the page lifecycle.
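A minimal sketch of that "world state" idea, under the assumption of a tiny hand-rolled store (the store API and tool shape are mine, not WebMCP's): both the UI and the agent subscribe to one state object, and a tool is just a named, deterministic transition over it.

```javascript
// Sketch of a shared "world state" that UI and agent both project from.
function createStore(initialState) {
  let state = initialState;
  const listeners = [];
  return {
    get: () => state,
    dispatch(transition) {
      state = transition(state);          // deterministic state transition
      listeners.forEach((l) => l(state)); // UI and agent both re-project
    },
    subscribe(l) { listeners.push(l); },
  };
}

const store = createStore({ cart: [] });

// A tool is just a named transition over the domain state, not a callback
// tied to any page's lifecycle.
const addToCartTool = {
  name: "add_to_cart",
  execute({ item }) {
    store.dispatch((s) => ({ ...s, cart: [...s.cart, item] }));
    return store.get().cart;
  },
};

addToCartTool.execute({ item: "bibimbap" });
console.log(store.get().cart.length); // 1
```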

This seems like a great way to reduce navigation overhead and token usage. Instead of traversing UI layers to figure out what it can do, the agent could just evaluate the structured state and its derived affordances.

Curious if the WebMCP team is looking into this kind of domain-level state abstraction, rather than keeping things strictly scoped to page-bound context.

Best,

Jung

Hee Jae Kim

unread,
Feb 24, 2026, 2:52:42 AM (3 days ago) Feb 24
to Chrome Built-in AI Early Preview Program Discussions, 정성우, 정병준, Kathy Hurchla, Chrome Built-in AI Early Preview Program Discussions

Great thread. I ran into the same question and ended up building a demo around it using the Imperative API.

Jun's original question - can an agent access data from states it can't currently see - is really about where the source of truth lives. Kathy nailed it: decouple data from the UI and let tools talk to the service layer directly. Jung took it further: if both the UI and the agent are projections of the same underlying state, tools become deterministic state transitions rather than page-bound callbacks.

WebMCP Blackjack (https://webmcp-blackjack.heejae.dev) is a small implementation of that pattern using the Imperative API. Three agents - a human player, an AI player, and an AI dealer - share one page with a single game state object. Each tool's execute callback reads directly from that state, not from the DOM. get_my_hand returns the AI player's hand during their turn, but the dealer's full hand during the dealer's turn. The app calls clearContext() / registerTool() at each phase transition to swap the tool surface per role.

This is basically an implicit finite state machine: betting → player_turn → ai_turn → dealer_turn → settling, where each state defines which tools are available and to whom. Making that explicit - a formal FSM where each state declares its tool set and valid transitions - could be a useful pattern for more complex multi-agent WebMCP apps.
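Making that FSM explicit might look like the sketch below. `registerTools()` here is a plain stand-in for the `clearContext()` / `registerTool()` calls against the real imperative API, and the state/tool names are borrowed from the blackjack example:

```javascript
// Sketch of an explicit FSM where each state declares its tool set and
// valid transitions. registerTools() is a stand-in for WebMCP's
// clearContext() + registerTool() cycle, not the real API.
const fsm = {
  state: "betting",
  states: {
    betting:     { tools: ["place_bet"],                   next: ["player_turn"] },
    player_turn: { tools: ["hit", "stand", "get_my_hand"], next: ["dealer_turn"] },
    dealer_turn: { tools: ["get_my_hand"],                 next: ["settling"] },
    settling:    { tools: ["new_round"],                   next: ["betting"] },
  },
  transition(to) {
    if (!this.states[this.state].next.includes(to)) {
      throw new Error(`invalid transition ${this.state} -> ${to}`);
    }
    this.state = to;
    this.registerTools(this.states[to].tools); // swap the exposed surface
  },
  registerTools(tools) { this.exposed = [...tools]; },
};

fsm.registerTools(fsm.states.betting.tools);
fsm.transition("player_turn");
console.log(fsm.exposed.join(","));
```

Invalid transitions throw, which is the property that makes tool exposure auditable per state.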

It's single-page, so it doesn't address the cross-page navigation problem directly. But the underlying pattern - state lives outside the UI, imperative tools project from it, and an FSM governs which tools are exposed when - should apply the same way across routes.

Source: https://github.com/happyhj/webmcp-blackjack


(Attachment: Dealers_turn.gif)

Kathy Hurchla

unread,
Feb 24, 2026, 5:01:28 PM (2 days ago) Feb 24
to Hee Jae Kim, Chrome Built-in AI Early Preview Program Discussions, 정성우, 정병준
Yeah, super interesting thread here, thanks all. It's nice not to navigate this stuff alone.

The blackjack demo got me thinking about coordinating clearContext() / registerTool() cycles with the A2UI / AG-UI rendering I'm using to repaint the UI.

The challenge I'm grappling with now is how to efficiently signal a client agent that tools have changed. An agent should be able to offer proactive guidance to the human, informed by the tools available. I can't assume a human will always read the updated UI and prompt their agent to do something, which would cause the agent to check for a suitable tool.

At the start of a session my client-agent runs a `list_tools` or similar discovery tool, which I know is registered at the site, but I don't want to run that repeatedly during a session. As far as I've read, WebMCP does not include a notification spec for signaling to a client that tools have changed, unlike the standard MCP's `tools/list_changed` notification.

I'm testing sending a JSON back to the client-agent in the `Promise<any>` return type when each tool execute callback resolves, and enclosing the currently registered tools there. My goal is for the client to receive an updated tools list context with any tool completion, ready to combine with the user's intent to decide the next tool to call.
Has anyone effectively done this, or hit roadblocks with a similar approach? Since the promise is un-typed (i.e., <any>), it seems possible to use JSON there. I'd like to better understand the browser's role in "arbitrating" this model context between a client agent and a website registering tools to that context. The source I'm referencing for that approach is: https://github.com/webmachinelearning/webmcp/blob/main/docs/proposal.md
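That piggyback approach could be sketched as below. The envelope shape (`result` plus `tools`) is my assumption, not anything in the WebMCP proposal; the registry is a plain `Map` standing in for whatever the site uses to track registered tools:

```javascript
// Sketch: every tool's resolved value carries both its result and the
// currently registered tool list, so the client agent refreshes its tool
// context on each completion. The envelope shape is an assumption.
const registry = new Map(); // tool name -> wrapped execute fn

function registerTool(name, execute) {
  registry.set(name, async (args) => ({
    result: await execute(args),
    tools: [...registry.keys()], // piggyback the live tool list
  }));
}

registerTool("add_item", async ({ item }) => `added ${item}`);
registerTool("checkout", async () => "order placed");

registry.get("add_item")({ item: "pizza" }).then((envelope) => {
  console.log(envelope.result);
  console.log(envelope.tools.join(","));
});
```

As noted later in the thread, the limitation is that the list only refreshes when the agent itself calls a tool; changes triggered by the human or by system events produce no resolution for the agent to observe.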

Couple overall thoughts:
First, too many tools available at once can decrease an LLM's effectiveness and consistency in selecting an appropriate tool, assuming the typical tool calling pattern of passing tools into prompt context is used. This has been my experience, so I don't want to expose every tool capability of a site at any given time to an agent, whether I'm working with internal or external agents. So, I'm leaning toward an approach that can scale with the number of tools.

Also, to be clear, in my earlier reply I wasn't advocating to never navigate. I'm still developing an opinion on where page-level navigation may still have a role, as I test user journey scenarios. I see a combination of approaches in the future web.

Kathy Hurchla

AI Engineering & Data Science Lead


Štefan Balog

unread,
Feb 24, 2026, 5:11:18 PM (2 days ago) Feb 24
to Kathy Hurchla, Hee Jae Kim, Chrome Built-in AI Early Preview Program Discussions, 정성우, 정병준
What are you all discussing here, when Slovakia and its government are an IT Valley mafia?

World Rescue Organization

On Tue, Feb 24, 2026, at 23:01, Kathy Hurchla <kathy....@fantasy.co> wrote:

정성우

unread,
Feb 24, 2026, 7:16:31 PM (2 days ago) Feb 24
to Chrome Built-in AI Early Preview Program Discussions, Štefan Balog, Hee Jae Kim, Chrome Built-in AI Early Preview Program Discussions, 정성우, 정병준, Kathy Hurchla

Thanks for sharing your experiences.

Returning the updated tools in the Promise<any> resolution is a very clever workaround for the missing tools/list_changed notification. It essentially builds an implicit state machine where each action yields the next valid context.

However, I wonder if this approach might run into severe synchronization issues in a multi-actor environment. If a human user interacts with the UI (e.g., clicks a tab) or a system event updates the screen, the available tools change. But since the agent didn't initiate an action, it receives no Promise resolution. The agent's context would immediately fall out of sync with the actual UI state.

This limitation really highlights the friction of mapping MCP tools 1:1 with UI components like buttons or forms. When we expose UI controls as tools, we force the LLM to navigate layout logic, which inevitably leads to the tool bloat and context drift you mentioned.

What if we shift the abstraction? 

Instead of syncing the agent with the UI elements, we could sync it with the human intents relevant to the current state.

In this setup, tools aren't "page-bound callbacks"—they are deterministic state transition vectors.

For example, rather than registering granular UI-bound tools like fill_title_field, check_priority_box, and click_submit, you only expose a semantic, domain-level intent: create_task(title, priority).

The LLM simply translates natural language into this intent vector. The underlying domain logic executes it, and the UI passively projects the new state.
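A minimal sketch of such an intent-level tool, assuming a hypothetical domain store and tool shape (nothing here is WebMCP-specified):

```javascript
// Sketch of one semantic intent tool, create_task(title, priority),
// replacing granular UI-bound tools like fill_title_field / click_submit.
const domainState = { tasks: [] };

const createTaskTool = {
  name: "create_task",
  description: "Create a task with a title and a priority",
  execute({ title, priority }) {
    // One deterministic transition; the UI just re-renders from domainState.
    const task = { id: domainState.tasks.length + 1, title, priority };
    domainState.tasks.push(task);
    return task;
  },
};

const created = createTaskTool.execute({ title: "Ship demo", priority: "high" });
console.log(`${created.id}:${created.title}:${created.priority}`);
```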

This naturally solves the core issues:

  1. Multi-actor Sync: The UI and the agent are both just projections of a shared Domain State. When the state changes (by human or AI), the new valid intents are computed from that single source of truth.

  2. Focused Context: You only expose high-level semantic intents valid for that exact moment, keeping the tool list incredibly focused.

  3. O(1) Execution: The LLM doesn't have to loop through 5-10 turns of UI navigation. It just fires the intent, drastically reducing token usage and avoiding context drift.

I think moving the tool abstraction down to a shared domain state layer—focusing on what the user wants to do rather than how the UI renders it—is a really powerful path forward. Curious to hear your thoughts on this!

Kathy Hurchla

unread,
12:21 PM (3 hours ago)
to 정성우, Chrome Built-in AI Early Preview Program Discussions, Štefan Balog, Hee Jae Kim, 정병준
Hi 정성우,
... only expose a semantic, domain-level intent: create_task(title, priority)

That sounds well-suited for building an application (website) that securely generates code in real time rather than executing pre-defined tools. On the website's side, that one tool's execution could essentially be a wrapper around an AI harness for response generation, using web workers or a container, to build whatever is necessary to meet the user-agent's current intent, if you follow the "LLMs are better at code than JSON" research. And the human's agent (the client) injects the semantic specifics representing intent at tool-call time.

Grouping tools is another potential approach (worth a read: https://github.com/webmachinelearning/webmcp/issues/7). The core Group primitive from MCP isn't in WebMCP now, but might be useful for constantly re-scoping the registered tools context to only relevant tools at the time, such as those within one or more intent-centered groups. Maybe not exactly as proposed in that issue. A search or inference run over the available groups would need to happen somewhere, so efficiency would depend on the implementation.
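The grouping idea could be sketched as re-scoping the registered tool surface to one intent-centered group at a time. The group names and the swap logic below are illustrative; an MCP-style Group primitive is not part of WebMCP today, as noted above:

```javascript
// Sketch of intent-centered tool groups: only the active group's tools
// are exposed at any moment. activateGroup() stands in for a
// clearContext() + registerTool() cycle against the real API.
const groups = {
  browsing: ["search_shops", "get_shop_menu"],
  ordering: ["add_to_cart", "checkout"],
};

let exposedTools = [];
function activateGroup(name) {
  if (!groups[name]) throw new Error(`unknown group: ${name}`);
  exposedTools = [...groups[name]]; // re-scope the registered tool context
}

activateGroup("browsing");
console.log(exposedTools.join(","));
activateGroup("ordering");
console.log(exposedTools.join(","));
```

Something (a search or an inference pass) still has to choose which group to activate, which is where the efficiency question from the linked issue comes in.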

And thank you: 
...(using Promise<any> resolution for tools/list_changed)... If a human user interacts with the UI (e.g., clicks a tab) or a system event updates the screen, the available tools change. But since the agent didn't initiate an action, it receives no Promise resolution. The agent's context would immediately fall out of sync with the actual UI state.
Yes! I clearly didn't think that far ahead! I absolutely want to preserve the human's agency to interact directly if they choose to. I now see there's an open issue proposing this as well, which, not surprisingly, isn't getting traction. As you pointed out, that solution is too narrow given WebMCP's focus on keeping a human present.

Kathy Hurchla

AI Engineering & Data Science Lead

