Intent to Prototype: Web Translation API

Domenic Denicola

Apr 25, 2024, 02:30:05
to blink-dev, Fergal Daly, Kenji Baheux

Contact emails

dom...@chromium.org, fer...@chromium.org, kenji...@chromium.org

Explainer

https://github.com/explainers-by-googlers/translation-api/blob/main/README.md

Specification

None yet, although the explainer does contain IDL which could help a bit

Summary

This proposal introduces a new JavaScript API for exposing a browser's existing language translation abilities to web pages.



Blink component

Blink

Motivation

Browsers are increasingly offering language translation to their users. Such translation capabilities can also be useful to web developers. This is especially the case when the browser's built-in translation abilities cannot help, such as:

- translating user input or other interactive features;
- pages with complicated DOMs which trip up browser translation;
- providing in-page UI to start the translation; or
- translating content that is not in the DOM, e.g. spoken content.

To perform translation in such cases, web sites currently have to either call out to cloud APIs, or bring their own translation models and run them using technologies like WebAssembly and WebGPU.



Initial public proposal

https://github.com/WICG/proposals/issues/147

TAG review

https://github.com/w3ctag/design-reviews/issues/948

TAG review status

Pending

Risks



Interoperability and Compatibility

This feature has definite interoperability risks, including which languages are available across different browsers, how they are exposed, the quality of translations, and whether developers need the translations to be on-device or not. We can ameliorate some of these through API design, by making it clear that various methods might fail and that a fallback is required. Others, like translation quality, may end up as quality-of-implementation issues, similar to other machine learning-based APIs like shape detection.



Gecko: No signal (https://github.com/mozilla/standards-positions/issues/1015)

WebKit: No signal (https://github.com/WebKit/standards-positions/issues/339)

Web developers: No signals. We have heard privately of this need from various partners. Publicly, we have a few thumbs-up on the WICG proposal but no substantive comments yet.

Other signals:

Activation

This feature would definitely benefit from having polyfills, backed by any of: cloud services, lazily-loaded on-device models using WebGPU, or the web developer's own server. We anticipate seeing an ecosystem of such polyfills grow as more developers experiment with this API.
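As a rough illustration of the fallback pattern such polyfills imply, the sketch below tries a list of translation backends in order and uses the first one that works. All names here (`TextTranslator`, `TranslatorFactory`, `createTranslatorWithFallback`) are hypothetical and are not taken from the explainer; the real API surface may look quite different.

```typescript
// Hypothetical translator shape, assumed for illustration only.
interface TextTranslator {
  translate(text: string): Promise<string>;
}

// A backend factory resolves to null when it cannot serve the language pair,
// and may also reject (e.g. a model download or network failure).
type TranslatorFactory = (
  sourceLanguage: string,
  targetLanguage: string,
) => Promise<TextTranslator | null>;

// Try each backend in order (e.g. built-in, on-device polyfill, cloud service)
// and return the first translator that is successfully created.
async function createTranslatorWithFallback(
  factories: TranslatorFactory[],
  sourceLanguage: string,
  targetLanguage: string,
): Promise<TextTranslator> {
  for (const factory of factories) {
    try {
      const translator = await factory(sourceLanguage, targetLanguage);
      if (translator !== null) {
        return translator;
      }
    } catch {
      // This backend failed; fall through to the next one.
    }
  }
  throw new Error(
    `No translator available for ${sourceLanguage} -> ${targetLanguage}`,
  );
}
```

A page would typically list the browser-provided backend first and a cloud or server-side backend last, so that the network is only used when nothing local is available.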



WebView application risks

Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications?

None



Debuggability

Basic tooling should be sufficient



Is this feature fully tested by web-platform-tests?

No

We hope to work on web platform tests for this feature, but how much we can guarantee as testable beyond the surface API is unclear. For example, since no specific languages are guaranteed to be supported, it's not clear we can actually test translations. APIs to mock the results might help here.



Flag name on chrome://flags

None yet, although we're working on one

Finch feature name

TranslationAPI

Requires code in //chrome?

True

Tracking bug

https://issues.chromium.org/issues/322229993

Estimated milestones

No milestones specified



Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/5172811302961152

This intent message was generated by Chrome Platform Status.

Reilly Grant

Apr 25, 2024, 17:18:34
to Domenic Denicola, blink-dev, Fergal Daly, Kenji Baheux
The specification could define Pig Latin as a mandatory test language with well-defined translation pairs with English.
Reilly Grant | Software Engineer | rei...@chromium.org | Google Chrome
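
To make the suggestion concrete: a deterministic test language gives tests exact expected outputs, which natural-language translation cannot. The sketch below implements one common Pig Latin variant (move the leading consonant cluster to the end and add "ay"; add "way" to vowel-initial words). The rule set and the function names are assumptions for illustration; nothing in the proposal defines them.

```typescript
// Translate a single lowercase-able word using one common Pig Latin variant.
function wordToPigLatin(word: string): string {
  const lower = word.toLowerCase();
  const firstVowel = lower.search(/[aeiou]/);
  if (firstVowel === -1) {
    return lower + "ay"; // no vowel at all: just append the suffix
  }
  if (firstVowel === 0) {
    return lower + "way"; // vowel-initial word
  }
  // Move the leading consonant cluster to the end, then append "ay".
  return lower.slice(firstVowel) + lower.slice(0, firstVowel) + "ay";
}

// Translate whitespace-separated words; punctuation handling is out of scope.
function toPigLatin(text: string): string {
  return text.split(/\s+/).filter(Boolean).map(wordToPigLatin).join(" ");
}
```

With rules like these fixed, a test could assert an exact output for a given English input regardless of which translation models a browser ships.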



Daniel Vogelheim

Apr 30, 2024, 05:09:10
to Domenic Denicola, blink-dev, Fergal Daly, Kenji Baheux
Hi Domenic, et al.,

This intent came up in the OWP sec review today. We wonder whether there's XSS potential, and how input with plain text interspersed with tags is meant to be handled:

Several of the use cases seem to hint at the input being HTML strings (e.g. "pages with complicated DOM"). If the intended input would indeed be HTML strings, and the output is intended to be parsed & inserted into the DOM, then this basically implements a new XSS factory. In addition to the existing re-parsing risks, it would add new ones based on translation (e.g. "<schrift>" turning into "<script>"). The browser's built-in translation functionality can avoid this by only manipulating text nodes; but this would be difficult to replicate in a string-based API.

Can you clarify what happens with HTML tags in the input string, and whether that is a supported use case? Maybe the API can be reformulated to separate string-based from HTML-based inputs?
It'd be good to add a note to the 'risks' section, so this isn't forgotten when this has taken a more concrete shape.

Thanks,
Daniel



Fergal Daly

Apr 30, 2024, 05:31:00
to Daniel Vogelheim, Domenic Denicola, blink-dev, Kenji Baheux
On Tue, 30 Apr 2024 at 18:08, Daniel Vogelheim <voge...@google.com> wrote:
> Hi Domenic, et al.,
>
> This intent came up in the OWP sec review today. We wonder whether there's XSS potential, and how input with plain text interspersed with tags is meant to be handled:
>
> Several of the use cases seem to hint at the input being HTML strings (e.g. "pages with complicated DOM"). If the intended input would indeed be HTML strings, and the output is intended to be parsed & inserted into the DOM, then this basically implements a new XSS factory. In addition to the existing re-parsing risks, it would add new ones based on translation (e.g. "<schrift>" turning into "<script>"). The browser's built-in translation functionality can avoid this by only manipulating text nodes; but this would be difficult to replicate in a string-based API.

"pages with complicated DOMs which trip up browser translation;" is referring to cases where the DOM is such that pages would rather handle their own translation. I.e. they would translate their own strings and insert them into their DOM. We would not expect pages to send HTML into this API. Anyone doing so is probably going to have a very bad time. We can rephrase that example to avoid giving the wrong impression, e.g. "pages with complicated structure".

In general, I would hope nobody would use the output of an AI API (translate, compose, etc.) in this way, but apart from warning them not to, I don't see how we can stop them, any more than we can stop them `eval()`ing the result of a random `fetch()`.

F
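
For pages that do end up placing translated strings into markup, the safe pattern is to treat the output strictly as text. The sketch below is a minimal illustration under stated assumptions: `fakeTranslate` is a hypothetical stand-in that mimics the "<schrift>" to "<script>" hazard from the security review, and `escapeHtml` and `renderTranslationAsHtml` are illustrative helpers, not part of any proposed API.

```typescript
// Hypothetical stand-in for a translator; NOT a real API. It mimics the
// hazard where translation turns harmless-looking input into markup.
function fakeTranslate(text: string): string {
  return text.replace(/schrift/g, "script");
}

// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape AFTER translating, so anything the translator produced is inert.
function renderTranslationAsHtml(input: string): string {
  return `<p>${escapeHtml(fakeTranslate(input))}</p>`;
}
```

In a browser, assigning the translated string to an element's `textContent` (rather than `innerHTML`) has the same effect without manual escaping.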

Alex Russell

Apr 30, 2024, 17:43:08
to Fergal Daly, Daniel Vogelheim, Domenic Denicola, blink-dev, Kenji Baheux
This effort seems worthwhile, and I would like to see an explainer that discusses the various API options; that might provide some context for the security conversation.

Best,

Alex

Domenic Denicola

May 6, 2024, 23:54:30
to Alex Russell, Fergal Daly, Daniel Vogelheim, Domenic Denicola, blink-dev, Kenji Baheux
On Wed, May 1, 2024 at 6:43 AM Alex Russell <sligh...@chromium.org> wrote:
> This effort seems worthwhile, and I would like to see an explainer that discusses the various API options; that might provide some context for the security conversation.

Did you see the explainer linked in the original post? I'll post it here again: https://github.com/WICG/translation-api/blob/main/README.md

Alex Russell

May 15, 2024, 12:14:13
to Domenic Denicola, Fergal Daly, Daniel Vogelheim, blink-dev, Kenji Baheux
Ah, thanks. I'd missed that.

I don't see any considered alternatives in that doc. The streaming return value seems like it should, at a minimum, cause us to want to update the setHTML and innerHTML/innerText systems to handle stream assignments. Also, do streamed translations ever backtrack? E.g., do systems ever produce partial translations that they then change?

Best,

Alex

Domenic Denicola

May 15, 2024, 23:53:44
to Alex Russell, Domenic Denicola, Fergal Daly, Daniel Vogelheim, blink-dev, Kenji Baheux
On Thu, May 16, 2024 at 1:14 AM Alex Russell <sligh...@chromium.org> wrote:
> Ah, thanks. I'd missed that.
>
> I don't see any considered alternatives in that doc.
>
> The streaming return value seems like it should, at a minimum, cause us to want to update the setHTML and innerHTML/innerText systems to handle stream assignments.

If you check out previous conversations on the subject, this is fairly complicated. Thankfully, it can be pursued orthogonally. Indeed, every time we add a streaming API to the platform, fulfilling this feature request would add even more convenience; but there's no blocking relationship between these two workstreams. (And since it's just convenience, I haven't yet seen a browser vendor prioritize the streaming-into-an-element workstream very highly.)

> Also, do streamed translations ever backtrack? E.g., do systems ever produce partial translations that they then change?

Given our setting, with a single (non-streaming) input, this does not occur with the models we're aware of, although there may plausibly be some models for which it does.

Of course, if the input is allowed to be streaming, then this can indeed occur, which is why we don't support streaming input for now.
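
If streamed output never backtracks, a consumer can simply append each chunk as it arrives. The sketch below assumes the stream is exposed as an `AsyncIterable<string>` of append-only chunks; that shape, and the names `collectTranslation` and `onUpdate`, are assumptions for illustration rather than the API in the explainer.

```typescript
// Accumulate append-only chunks, reporting the text so far after each one.
// This is only correct if earlier chunks are never revised by later ones.
async function collectTranslation(
  chunks: AsyncIterable<string>,
  onUpdate: (textSoFar: string) => void,
): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    text += chunk;
    onUpdate(text); // e.g. assign to an element's textContent in a page
  }
  return text;
}
```

A model that revised earlier output would need a different contract, for example chunks that carry a replacement range, which is part of why the backtracking question matters for the API shape.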

Fergal Daly

Jul 17, 2024, 01:28:08
to blink-dev, Domenic Denicola, Fergal Daly, Kenji Baheux
We are splitting the implementation of this into translation and language detection. So, in addition to the previous status entry, there is now a separate language detection API status entry.

F
