Hello!
I'd like to raise some concerns, and open a discussion, about keeping the experience coherent between users and agents for flows on a web page.
Relying on web page authors to keep the capabilities of agents aligned with the capabilities of users is a difficult practice. It's error-prone, it can introduce security risks through disparities in APIs and permissions, and it's easy to neglect because it's not trivial for humans to verify agent behavior.
I come from the web accessibility domain, where we see these issues all the time. Websites tend to neglect users with disabilities, even when their teams really want to help, simply because doing so carries a lot of overhead and complex requirements. It's cumbersome for developers and designers to use assistive technology like screen readers, and the UX for them is often harsh.
To avoid some of these pitfalls in the future of web agents, I believe it's critical to try to bind WebMCP to native HTML. Building on a clear spec that is already widely supported would smooth the transition to an AI-first experience, make discoverability simpler, and reduce maintenance costs. Just like with accessibility, when custom JavaScript is used instead of semantic HTML, the DOM and the available flows can be left unattended.
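To make the idea a bit more concrete, here is a rough sketch of what such a binding could look like: a tool description derived mechanically from the same metadata a native form already carries (field names, input types, `required`, `min`/`max`). Everything in it is an assumption for illustration only. `FormDescription`, `ToolDescriptor`, and `describeFormAsTool` are hypothetical names, not part of WebMCP or any existing spec, and the form is modeled as plain data rather than read from the DOM.

```typescript
// Hypothetical shapes: none of these names come from WebMCP or any spec.
interface FormField {
  name: string; // mirrors the input's `name` attribute
  type: "text" | "email" | "number" | "checkbox"; // mirrors `type`
  required?: boolean; // mirrors `required`
  min?: number; // mirrors `min` on number inputs
  max?: number; // mirrors `max` on number inputs
}

interface FormDescription {
  id: string; // stands in for the form's id
  label: string; // stands in for the form's accessible name
  fields: FormField[];
}

interface JsonSchemaProperty {
  type: "string" | "number" | "boolean";
  format?: string;
  minimum?: number;
  maximum?: number;
}

interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, JsonSchemaProperty>;
    required: string[];
  };
}

// Map one form field to a JSON-Schema-style property.
function fieldToProperty(field: FormField): JsonSchemaProperty {
  switch (field.type) {
    case "number": {
      const prop: JsonSchemaProperty = { type: "number" };
      if (field.min !== undefined) prop.minimum = field.min;
      if (field.max !== undefined) prop.maximum = field.max;
      return prop;
    }
    case "checkbox":
      return { type: "boolean" };
    case "email":
      return { type: "string", format: "email" };
    default:
      return { type: "string" };
  }
}

// Derive the agent-facing description from the user-facing form metadata,
// so both are generated from a single source.
function describeFormAsTool(form: FormDescription): ToolDescriptor {
  const properties: Record<string, JsonSchemaProperty> = {};
  const required: string[] = [];
  for (const field of form.fields) {
    properties[field.name] = fieldToProperty(field);
    if (field.required) required.push(field.name);
  }
  return {
    name: form.id,
    description: form.label,
    inputSchema: { type: "object", properties, required },
  };
}

// Example: a newsletter signup form described as data.
const signupTool = describeFormAsTool({
  id: "newsletter-signup",
  label: "Sign up for the newsletter",
  fields: [
    { name: "email", type: "email", required: true },
    { name: "age", type: "number", min: 13 },
    { name: "weekly", type: "checkbox" },
  ],
});
```

The point of the sketch is only that the agent-facing description is generated from the markup's metadata instead of being written by hand, so the two are less likely to diverge.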
What do you think about this direction?