What does screen_ai do?


guest271314

Jan 16, 2024, 4:25:58 PM
to Chromium-discuss
Chromium includes what appears to be some kind of extension, screen_ai. 

There's a file libchromescreenai.so in the folder that is 288 MB alone.

It is not clear at all what the extension does.

Here's the manifest.json

{
  "manifest_version": 2,
  "name": "Chrome Screen AI",
  "version": "122.1"
}
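
To see what the component folder actually contains, a minimal Python sketch like the following reads the manifest and lists file sizes. The function name `describe_component` is hypothetical, and the default path is an assumption based on the Linux location mentioned later in this thread; adjust it for your profile.

```python
import json
from pathlib import Path


def describe_component(component_dir: Path) -> dict:
    """Summarize a component folder: manifest fields plus file sizes in bytes."""
    manifest_path = component_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    sizes = {
        entry.name: entry.stat().st_size
        for entry in sorted(component_dir.iterdir())
        if entry.is_file()
    }
    return {
        "name": manifest.get("name"),
        "version": manifest.get("version"),
        # The manifest quoted above has no "description" key, so this is None.
        "description": manifest.get("description"),
        "sizes": sizes,
    }


if __name__ == "__main__":
    # Assumed location; the version folder differs between releases.
    folder = Path.home() / ".config" / "chromium" / "screen_ai" / "122.1"
    if (folder / "manifest.json").is_file():
        info = describe_component(folder)
        print(info["name"], info["version"], info["description"])
        for name, size in info["sizes"].items():
            print(f"{size / 1_000_000:10.1f} MB  {name}")
```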

What does screen_ai do?

Ramin Halavati

Jan 17, 2024, 11:34:39 AM
to Chromium-discuss, guest271314
It's a library that provides some AI tools, including main content extraction for reading mode and OCR for PDF accessibility.

guest271314

Jan 17, 2024, 8:08:32 PM
to Ramin Halavati, Chromium-discuss
Thanks! 

Some questions:

- Why isn't that description in the manifest.json?
- Why and how is "AI" used for the tasks described?

TBH I ran "> /home/user/.config/chromium/screen_ai/122.1/libchromescreenai.so" in the shell to truncate the file to zero size, because I had no idea what "screen_ai" was supposed to be doing.
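
For reference, the shell redirection used here empties a file in place. A rough Python equivalent, assuming the file already exists (unlike `>`, `os.truncate` does not create a missing file), could look like this; `truncate_to_zero` is a hypothetical helper name.

```python
import os
from pathlib import Path


def truncate_to_zero(path: Path) -> int:
    """Empty an existing file in place and return its previous size in bytes."""
    previous_size = path.stat().st_size
    os.truncate(path, 0)  # keeps the file itself, drops all of its content
    return previous_size
```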

I was rather surprised to find a 288 MB shared object file for a screen_ai extension when Chrome still sends remote network requests for Google voices in SpeechSynthesisUtterance and for all STT in webkitSpeechRecognition (both part of the Web Speech API). I would think TTS and STT would be shipped in the browser before "AI" code is included in the developer release.

Ramin Halavati

Jan 18, 2024, 5:21:57 AM
to guest271314, Chromium-discuss
I will look into how I can update the manifest.json to include a description of the services that this component provides.
The size is a bit large on Linux and much smaller on other platforms; that also needs looking into.

Both tasks are entirely done on device (no network access).
For main content extraction, an AI module tries to find the main part of the page and present it in a sidebar (Reading Mode). That can be used by users who have attention or reading difficulties or for any other reason want to focus on the main text.
For OCR, if a PDF doesn't have accessible text for screen reader users, the images inside the PDF are sent to this module and the extracted text is injected into the PDF.
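
The OCR flow described above can be sketched roughly as follows. This is only an illustration of the described behavior, not Chromium's implementation: the `Page` type, the `add_ocr_text` function, and the `ocr` callback are all hypothetical names, and the real component works on PDF internals rather than on a simple list of pages.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical stand-in for a PDF page: its text layer and embedded images."""
    text: str = ""
    images: List[bytes] = field(default_factory=list)


def add_ocr_text(pages: List[Page], ocr: Callable[[bytes], str]) -> List[Page]:
    """Return pages where image-only pages get a text layer produced by `ocr`.

    Pages that already have text, or that have no images, are left unchanged.
    """
    result = []
    for page in pages:
        if page.text.strip() or not page.images:
            result.append(page)
            continue
        # Run the (on-device) OCR callback on each image and join the results.
        extracted = "\n".join(ocr(image) for image in page.images)
        result.append(Page(text=extracted, images=page.images))
    return result
```

Passing the OCR step in as a callback keeps the sketch independent of any particular OCR engine.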

Ramin Halavati

Senior Software Engineer


guest271314

Jan 21, 2024, 3:37:18 PM
to Ramin Halavati, Chromium-discuss
I don't understand how and why this leapfrogged an on-device Web Speech API implementation.

Web Speech API webkitSpeechRecognition() was broken on Chromium the last time I checked. E.g., see https://github.com/mdn/dom-examples/issues/219, which consistently logs a network error on Chromium 122, because there is no in-browser STT implementation. And if I remember correctly, I had to get Google API keys from net-log or a built-in extension page in order to manually request the remote Google STT services.

When Google voices on Chrome are used a remote request is made to Google servers. I think PATTS is for "Googlers" only.

I just don't get how "A.I." can be shipped before TTS/STT in the browser. I could cynically just cast the decision off as "A.I." being the trending thingamajig to sell, no matter whatever else is left on the back-burner for years.

It would be very helpful, given the idea is for accessibility, to implement TTS and STT in the browser, using WebAssembly, or "A.I." if you prefer.

Slade Watkins

Jan 21, 2024, 5:50:09 PM
to rhal...@google.com, guest271314, Chromium-discuss


On Jan 18, 2024, at 5:21 AM, 'Ramin Halavati' via Chromium-discuss <chromium...@chromium.org> wrote:

The size is a bit large on Linux and it's much smaller on other platforms, that also needs looking into.

Okay, out of curiosity: how much larger? I assume that's because more needs to be built into the Linux variant(s)?

confused,
Slade

Ramin Halavati

Jan 23, 2024, 4:28:29 AM
to Slade Watkins, guest271314, Chromium-discuss
It's around 289 MB for Linux and 50 MB for Mac.
The OCR code includes many modules for training and performance optimization. These modules are dropped from the ChromeOS, Mac, and Windows builds, but the stripping is probably not as effective on Linux and needs more work.

Best,
Ramin