I wanted to share a proposal that addresses something I believe is a fundamental gap in the current WebMCP ecosystem: tool granularity — and the lack of guidance around how tools should be designed.
The problem:
When you run a typical WebMCP tool generator against a mid-size e-commerce site, you get 400+ tool definitions — one per DOM interaction (click, form submit, navigation link). Each tool maps to how the browser renders the UI, not to what the site actually does. An agent trying to fulfill "find me a waterproof jacket" has to discover and sequence ~6 atomic tools from a pool of 400+, with no orchestration support, no shared context, and no error handling between steps.
The current spec offers no guidance on tool design. The implicit assumption seems to be that tools should mirror the UI — one tool per interactive element. But this approach fundamentally doesn't work for real agent workflows. It forces AI agents to reverse-engineer a site's information architecture from tool names alone, leading to hallucinated sequences, skipped steps, and fragile automations that break with every UI change.
The proposal:
Sites should expose composite, capability-level tools that map to user intents, not UI interactions. Instead of click_search → navigate_category → click_filter → click_apply, you'd register one search_products({ query, filters }) tool whose execute callback handles all the internal navigation and DOM manipulation.
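To make the contrast concrete, here is a sketch of such a composite tool. The `registerTool` shape follows the WebMCP explainer and may change as the spec evolves; the internal helpers (`goToSearchPage`, `submitQuery`, `applyFilters`, `collectResults`) are hypothetical stand-ins for site-specific navigation and DOM code:

```javascript
// Hypothetical site internals: stand-ins for the real navigation and DOM
// work that a capability tool performs behind a single call.
async function goToSearchPage() { /* e.g. render or navigate to /search */ }
async function submitQuery(query) { /* fill and submit the search form */ }
async function applyFilters(filters) { /* toggle the matching filter controls */ }
async function collectResults() {
  // Real code would read results out of the page; the stub returns a fixed list.
  return [{ id: "jkt-1", name: "Storm Shell Jacket", waterproof: true }];
}

// One capability-level tool replacing click_search → navigate_category →
// click_filter → click_apply.
const searchProductsTool = {
  name: "search_products",
  description: "Search the catalog with optional filters and return matches.",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string" },
      filters: { type: "object" },
    },
    required: ["query"],
  },
  async execute({ query, filters = {} }) {
    await goToSearchPage();   // internal orchestration, never exposed as a tool
    await submitQuery(query);
    await applyFilters(filters);
    const products = await collectResults();
    // Structured JSON payload rather than the bare string "Done".
    return { content: [{ type: "text", text: JSON.stringify({ products }) }] };
  },
};

// Guarded registration so the sketch is inert where WebMCP is unavailable.
if (typeof navigator !== "undefined" && navigator.modelContext?.registerTool) {
  navigator.modelContext.registerTool(searchProductsTool);
}
```

The agent sees one intent-shaped entry point; the UI choreography stays the site's private implementation detail and can change freely without breaking callers.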
I'm proposing a three-tier model:
• Tier 1 — Capability tools: One tool per user intent. ~20-30 tools per site instead of 400+. The site handles orchestration internally. This should be the recommended default.
• Tier 2 — Domain tools: Mid-level tools mapping to site sections or features. May require 2-3 calls to complete a journey.
• Tier 3 — Interaction tools: Current click-level tools. Should be reserved as a fallback for edge cases, not the default.
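The payoff of labeling tiers shows up on the agent side. As a sketch (tool names invented, and `annotations.tier` being the field suggested below rather than anything in the current spec), an agent can sort a tool catalog so capability-level tools are tried first and click-level tools last:

```javascript
// Sort tools ascending by tier so capability tools (tier 1) come first.
// Tools without a tier annotation are treated as interaction-level (tier 3).
function byTier(tools) {
  return [...tools].sort(
    (a, b) => (a.annotations?.tier ?? 3) - (b.annotations?.tier ?? 3)
  );
}

// Illustrative catalog with one tool from each tier.
const tools = [
  { name: "click_filter", annotations: { tier: 3 } },
  { name: "filter_catalog", annotations: { tier: 2 } },
  { name: "search_products", annotations: { tier: 1 } },
];

byTier(tools).map((t) => t.name);
// → ["search_products", "filter_catalog", "click_filter"]
```

Treating missing annotations as tier 3 keeps the heuristic backward-compatible with today's generated click-level tools.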
Concrete spec suggestions:
1. Add a "tier" field to ToolAnnotations so agents can understand tool granularity and prioritize higher-level tools.
2. Standardize structured return values (not just the string "Done") so agents can reason about results and chain calls intelligently.
3. Add optional next_actions hints in return values — non-binding wayfinding that tells the agent which tools are logical follow-ups, without adding orchestration to the spec itself.
4. Include a non-normative section with tool design guidelines for the ecosystem: default to capability-level, one tool per user intent, return structured data, and keep interaction tools as fallbacks.
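Suggestions 2 and 3 combined might look like this in a tool's return value. All field names here (`status`, `data`, `next_actions`) are illustrative proposals, not current spec:

```javascript
// Sketch of a structured return value with non-binding next_actions hints,
// as an add_to_cart tool might produce it, instead of the bare string "Done".
function addToCartResult(item) {
  return {
    status: "ok",
    data: {
      cartId: "cart-123", // illustrative values
      itemId: item.id,
      quantity: 1,
      subtotal: item.price,
    },
    // Wayfinding only: the agent is free to ignore these entirely.
    next_actions: [
      { tool: "view_cart", reason: "Review the cart before checkout." },
      { tool: "begin_checkout", reason: "Proceed directly to checkout." },
    ],
  };
}
```

Because the hints are advisory, none of the orchestration logic moves into the spec; the agent still decides what to call next, just with better signposting.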
In testing, this approach reduced a typical mid-size e-commerce site from ~400 interaction-level tools to ~22 capability-level tools (a ~95% reduction), while making each tool self-documenting and resilient to UI redesigns.
I've written up the full analysis with a detailed tier breakdown, concrete tool examples, and proposed spec extensions. Happy to share the link if there's interest.
Would love to hear thoughts from the group, especially from those working on tool generators, framework integrations, or thinking about how agents should navigate tool sets at scale.
Best,
Idan