Intent to Prototype: Document Local Dictionary API


Chromestatus

Jul 18, 2025, 6:08:19 AM
to blin...@chromium.org, ji...@igalia.com

Contact emails

ji...@igalia.com

Explainer

https://github.com/Igalia/explainers/tree/main/dictionary-api

Specification

None

Design docs


https://github.com/Igalia/explainers/tree/main/dictionary-api#-proposal

Summary

The proposed APIs let web pages modify the document local dictionary in the browser. A page can add words to, remove words from, and check for words in the document local dictionary, and the browser does not mark words in that dictionary as spelling errors.
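As a rough illustration of the semantics described in the summary, the TypeScript sketch below models a document local dictionary as an in-memory set and filters a spellchecker's flagged words against it. The names `DocumentLocalDictionary` and `visibleMisspellings` are hypothetical and are not the API proposed in the explainer. Exact-match, case-sensitive lookup is also an assumption.

```typescript
// Hypothetical model of the summary's semantics; NOT the API in the explainer.
// Lookup is assumed to be exact and case-sensitive.
class DocumentLocalDictionary {
  private readonly words = new Set<string>();

  add(word: string): void {
    this.words.add(word);
  }

  // Returns true if the word was present and has been removed.
  remove(word: string): boolean {
    return this.words.delete(word);
  }

  has(word: string): boolean {
    return this.words.has(word);
  }
}

// Given words the platform spellchecker flagged, keep only those that are not
// in the page's local dictionary; the rest are not shown as spelling errors.
function visibleMisspellings(
  flagged: readonly string[],
  dict: DocumentLocalDictionary,
): string[] {
  return flagged.filter((word) => !dict.has(word));
}

const dict = new DocumentLocalDictionary();
dict.add("NVDA");
dict.add("Igalia");
const shown = visibleMisspellings(["NVDA", "teh", "Igalia"], dict); // ["teh"]

dict.remove("NVDA");
const shownAfterRemove = visibleMisspellings(["NVDA", "teh", "Igalia"], dict); // ["NVDA", "teh"]
```

Once a word is removed, the spellchecker's verdict for it is shown again, which matches the add/remove/check lifecycle the summary describes.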



Blink component

Blink>DOM

Motivation

Some words need to be added to the document local dictionary so that the browser does not mark them as spelling errors, and removed again once they are no longer needed. Existing features such as the element.spellcheck attribute and the ::spelling-error CSS pseudo-element only work with the words already in the dictionary; neither lets a page change its contents. A new API is therefore needed to manipulate the document local dictionary.



Initial public proposal

None

TAG review

None

TAG review status

Pending

Risks



Interoperability and Compatibility

None



Gecko: No signal

WebKit: No signal

Web developers: No signals

Other signals:

WebView application risks

Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications?

None



Debuggability

None



Is this feature fully tested by web-platform-tests?

Yes

third_party/blink/web_tests/wpt_internal/dom/local-dictionary/* (there is a WIP patch that includes the tests)



Flag name on about://flags

None

Finch feature name

None

Non-finch justification

None

Requires code in //chrome?

False

Tracking bug

https://issues.chromium.org/issues/428005649

Estimated milestones

No milestones specified



Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/6185007701557248?gate=4503614776934400

This intent message was generated by Chrome Platform Status.

Daniel Vogelheim

Jul 22, 2025, 7:37:28 AM
to ji...@igalia.com, sche...@chromium.org, blin...@chromium.org, Chromestatus
Hello,

This intent came up in security review, and I'm mostly confused:

- The explainer mostly seems to assume that these are stored in memory, per document. But it also talks about the absence of cross-origin requests, only to add information about CORS, which only makes sense for cross-origin requests.
- There are multiple references to loading data, but there is no explanation about what kind of network requests are being made when or where.
- The explainer suggests "Persistently store data" as an optimization for having to re-load large dictionaries. Again, no information about which requests are being optimized away.
- In "Data Storage" it is pointed out that CustomDictionaryEngine exists per renderer process. While renderer processes mostly don't have cross-origin data, they sometimes do. And they may hold multiple documents. This seems inconsistent with information being stored per-document.

Non-security feedback:
- Since this is a web-exposed API, I'd have expected some attempt at checking with other browser engines on support.
- I do not understand the "High-level Architecture". It seems to feature a stack of methods that feeds into yes/no decisions which feeds into a storage thing. I have no idea what this is meant to convey.
- Blink>DOM might not be the right component for this.


Could you please update the documentation to be more clear about where data is stored, and about which network requests are being made?


--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/687a1d04.170a0220.2dad83.0168.GAE%40google.com.

Rick Byers

Jul 22, 2025, 10:55:05 AM
to Daniel Vogelheim, ji...@igalia.com, sche...@chromium.org, blin...@chromium.org, Chromestatus
FWIW I was also a little confused reading the explainer, but I think I understand the overall design and I think it's a good one: these dictionaries are transient and document-local, simply a mechanism to let pages selectively suppress spell check violations on their own page.

Presumably the discussion of network fetches in the explainer is just about the app fetching from its server (not fetches in the browser), and all the discussion of "persistent" storage is under the "future work" section, so it's fine with me that there's no detail here (it's out of scope because it's hard). I'm not sure whether it would make sense to extend this design into persistent storage or not, but I'm also not sure it matters (as the explainer says, it's simply an optimization, a problem that may or may not exist in practice, so not worth worrying about today).

Ensuring the data is reliably per-document is definitely a key implementation concern, so I agree with you there, Daniel. And yes, we'll eventually want signals from other browser vendors, but our process has that step only after prototyping is complete (often we learn a lot about the design from prototyping), so it's premature to ask for it now at the I2P phase.
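One way to keep data per-document even when a single engine object lives for the whole renderer process (the `CustomDictionaryEngine` concern raised earlier in the thread) is to key the stored words by document and drop them when the document goes away. The TypeScript sketch below is illustrative only: `PerProcessDictionaryStore`, the `DocumentId` string type and the `documentDetached` hook are hypothetical, and this is not how Blink implements the feature.

```typescript
// Hypothetical sketch: one store per renderer process, but words are keyed by
// document, so two documents sharing a process never see each other's words.
type DocumentId = string;

class PerProcessDictionaryStore {
  private readonly byDocument = new Map<DocumentId, Set<string>>();

  add(doc: DocumentId, word: string): void {
    let words = this.byDocument.get(doc);
    if (words === undefined) {
      words = new Set<string>();
      this.byDocument.set(doc, words);
    }
    words.add(word);
  }

  remove(doc: DocumentId, word: string): boolean {
    return this.byDocument.get(doc)?.delete(word) ?? false;
  }

  has(doc: DocumentId, word: string): boolean {
    return this.byDocument.get(doc)?.has(word) ?? false;
  }

  // Called when a document is destroyed; its words are transient and go with it.
  documentDetached(doc: DocumentId): void {
    this.byDocument.delete(doc);
  }
}

const store = new PerProcessDictionaryStore();
store.add("docA", "NVDA");
const seenByA = store.has("docA", "NVDA"); // true
const seenByB = store.has("docB", "NVDA"); // false: same process, different document
store.documentDetached("docA");
const afterDetach = store.has("docA", "NVDA"); // false
```

Keying by document rather than by origin or process is the design choice that keeps the dictionary transient and document-local, as described above.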

Cheers,
  Rick 

Stephen Chenney

Jul 22, 2025, 11:11:59 AM
to Rick Byers, Daniel Vogelheim, ji...@igalia.com, blin...@chromium.org, Chromestatus
Thanks for the early feedback, and sorry for the lack of clarity on the explainer. We're working on improving the explainer to address the issues raised here and issues raised on github.

We're also considering an entirely different approach whereby a site provides a "spelling server" URL in the HTML header. That would operate more like the existing "send it to Google" spell checking options. We're super early in designing such a thing, but if anyone has early feedback on that approach we would be interested.

Cheers,
Stephen.

Rick Byers

Jul 22, 2025, 12:03:27 PM
to Stephen Chenney, Daniel Vogelheim, ji...@igalia.com, blin...@chromium.org, Chromestatus
Spelling server seems a lot harder to get right to me, obviously more to worry about regarding privacy etc. Can you share anything more about the motivating use cases here? Like how large do these custom dictionaries tend to be? I'd guess that for even dictionaries up to 1MB compressed it's probably faster and simpler to just have the client download the whole thing. RTT latency is generally a bigger performance problem these days than raw throughput. But if it's important to solve scenarios with really large dictionaries then maybe it's worth exploring?
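A back-of-envelope version of the latency argument, in TypeScript. The RTT and bandwidth values are illustrative assumptions, not measurements, and the model ignores TCP slow start, TLS setup and server processing time. `downloadOnceMs` and `perLookupServerMs` are hypothetical helpers.

```typescript
// Back-of-envelope latency model. Every input is an illustrative assumption;
// TCP slow start, TLS setup and server processing time are ignored.

// Fetch the whole dictionary once: one round trip plus transfer time.
function downloadOnceMs(bytes: number, rttMs: number, mbitPerSec: number): number {
  const transferMs = (bytes * 8 * 1000) / (mbitPerSec * 1_000_000);
  return rttMs + transferMs;
}

// Ask a spelling server about each word separately: one round trip per lookup.
function perLookupServerMs(lookups: number, rttMs: number): number {
  return lookups * rttMs;
}

const assumedRttMs = 50;           // assumption
const assumedMbitPerSec = 50;      // assumption
const dictionaryBytes = 1_000_000; // "up to 1MB compressed"

const downloadMs = downloadOnceMs(dictionaryBytes, assumedRttMs, assumedMbitPerSec); // 210
const serverMs = perLookupServerMs(10, assumedRttMs); // 500
console.log({ downloadMs, serverMs });
```

With these assumed inputs, downloading a 1 MB dictionary once costs about 210 ms, while ten unbatched round trips to a spelling server already cost 500 ms; batching lookups would narrow that gap.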

Stephen Chenney

Jul 22, 2025, 3:18:29 PM
to Rick Byers, Daniel Vogelheim, ji...@igalia.com, blin...@chromium.org, Chromestatus
Regarding motivation, our client has financial data, such as stock symbols and company names. There are similar use cases for medical data, fan fiction, or anything else with words that might not appear in hunspell's dictionaries. It's conceivable that the Google internal spelling APIs have these words but clients may be very reluctant to send their strings to Google.

The proposal in this intent is relatively straightforward to implement, and its privacy and security are relatively simple to assess. But for developers there will probably be significant load-time costs, to fetch the site's dictionary and process it to add the words. We have some ideas around that in future work but nothing concrete. I think we'll have to address it before we ship.
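If load time is dominated by fetching and adding words, a bulk add could at least make the adding step a single call. The TypeScript sketch below parses a plain-text word list and hands it over in one call. `BulkDictionary.addWords` is a hypothetical method (the thread only floats the idea of a bulk add API), the one-word-per-line format with `#` comments is an assumption, and `RecordingDictionary` is an in-memory stand-in used only to exercise the sketch.

```typescript
// Hypothetical bulk-add surface; the explainer's API may differ.
interface BulkDictionary {
  addWords(words: readonly string[]): void;
}

// Parse a plain-text word list: one word per line, blank lines and lines
// starting with "#" ignored, duplicates removed (first occurrence wins).
function parseWordList(text: string): string[] {
  const seen = new Set<string>();
  for (const line of text.split(/\r?\n/)) {
    const word = line.trim();
    if (word === "" || word.startsWith("#")) continue;
    seen.add(word);
  }
  return Array.from(seen);
}

// Hand the whole list over in one call instead of one call per word.
function loadWordList(text: string, dict: BulkDictionary): number {
  const words = parseWordList(text);
  dict.addWords(words);
  return words.length;
}

// In-memory stand-in that records how it was called.
class RecordingDictionary implements BulkDictionary {
  readonly words = new Set<string>();
  calls = 0;

  addWords(words: readonly string[]): void {
    this.calls += 1;
    for (const w of words) this.words.add(w);
  }
}

const recorder = new RecordingDictionary();
const added = loadWordList("# tickers\nNVDA\r\n\nAAPL\nNVDA\n", recorder);
```

In a page, the text would come from something like `await (await fetch(url)).text()`, with `url` pointing at the site's own word list, so ordinary HTTP caching could apply to it.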

An HTTP header approach would make the ergonomics easier (assuming the infrastructure for setting up a spelling server is reasonably standard) and fits better into the existing code, but it would not work offline. Maybe the approaches are complementary and we do both.

I'll try to get some idea of the size of typical dictionaries in this space. It is important to know.

Cheers,
Stephen.

Rick Byers

Jul 22, 2025, 5:02:41 PM
to Stephen Chenney, Daniel Vogelheim, ji...@igalia.com, blin...@chromium.org, Chromestatus
On Tue, Jul 22, 2025 at 3:18 PM Stephen Chenney <sche...@chromium.org> wrote:
Regarding motivation, our client has financial data, such as stock symbols and company names. There are similar use cases for medical data, fan fiction, or anything else with words that might not appear in hunspell's dictionaries. It's conceivable that the Google internal spelling APIs have these words but clients may be very reluctant to send their strings to Google.

The proposal in this intent is relatively straightforward to implement, and its privacy and security are relatively simple to assess. But for developers there will probably be significant load-time costs, to fetch the site's dictionary and process it to add the words.

I'd love to see some figures on this. Maybe a bulk add API would be enough? As a quick example, I picked a random website (bloomberg.com) and found it downloaded 3.4MB compressed, including a number of individual scripts, images and JSON blobs which were around 100kB compressed each. In contrast, the entire american-english dictionary on my Linux machine compresses down to 270kB. So as long as we're talking about something that's less than 10% the size of the whole American English dictionary, my hunch is that the transfer cost will be insignificant and lost in the noise. But still, an HTTP approach to at least enable caching would be a good idea with little downside. I could imagine, for example, a <link rel=dictionary> tag or something that would be even simpler than this JS API approach?
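The size comparison above, worked through in TypeScript using only the figures quoted in the message (all compressed sizes, in kB, taking 1 MB as 1000 kB). No new measurements are involved.

```typescript
// Compressed sizes quoted in the message, in kB (1 MB taken as 1000 kB).
const wholeAmericanEnglishKb = 270; // whole american-english dictionary
const samplePageKb = 3400;          // "3.4MB compressed" page load
const typicalResourceKb = 100;      // "around 100kB compressed each"

// The "less than 10% of the whole american english dictionary" threshold.
const thresholdDictKb = wholeAmericanEnglishKb / 10; // 27

const shareOfPage = thresholdDictKb / samplePageKb;          // about 0.0079, under 1%
const shareOfResource = thresholdDictKb / typicalResourceKb; // 0.27

console.log(
  `${thresholdDictKb} kB = ${(shareOfPage * 100).toFixed(2)}% of the page, ` +
    `${(shareOfResource * 100).toFixed(0)}% of one typical resource`,
);
```

A dictionary at the 10% threshold is therefore about 27 kB: under 1% of that page load and roughly a quarter of one of its typical scripts or JSON blobs.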

Anyway this is just random thoughts to try to nudge away from premature optimization, not API owner input or anything :-).