Multimodal inputs


Hemanth H.M

May 24, 2025, 10:03:42 AM
to Chrome Built-in AI Early Preview Program Discussions
I was trying to see if the model can explain XKCD comics, and tried the following:

const session = await LanguageModel.create({
  expectedInputs: [
    { type: "image" }
  ]
});

const referenceImage = await (await fetch(window.location.href)).blob();

const response = await session.prompt([{
  role: "user",
  content: [
    { type: "text", content: "Explain this XKCD cartoon" },
    { type: "image", content: referenceImage }
  ]
}]);

response
'This looks like a JavaScript code snippet where two `[object Object]` values are being returned.  Let\'s break down what that means:\n\n* **`[object Object]`:** This is a JavaScript representation of an **object**.  In JavaScript, objects can hold any kind of data, including other objects, arrays, numbers, strings, booleans, and even functions.\n* **Multiple Values:**  We\'re seeing two `[object Object]` values separated by commas. This indicates that your code is returning an array or an object that contains multiple values of this type.\n\n**Possible Scenarios:**\n\n1. **Array of Objects:**  You might be creating an array where each element is an object, or you could be returning an object where the values are other objects:\n\n   ```javascript\n   const myArray = [\n     { name: "Alice", age: 30 },\n     { city: "New York", state: "NY" }\n   ];\n   ```\n\n2. **Nested Objects:** Your code might be defining a complex object structure where each level is an object. For example, you could have an object representing a user that contains information about the user\'s account, profile, and friends, all of which are also objects:\n\n   ```javascript\n   const user = {\n     id: 123,\n     account: {\n       balance: 1000,\n       type: "Checking"\n     },\n     profile: {\n       name: "John Doe",\n       picture: "profile_image.jpg"\n     },\n     friends: [\n       { name: "Jane Doe", id: 456 }\n     ]\n   };\n   ```\n\n3. **Function Return Values:**  A function might be returning an array or an object where some or all of the values are `[object Object]`.\n\n   ```javascript\n   function getUserDetails() {\n     return {\n       name: "John Doe",\n       age: 30,\n       city: "New York"\n     };\n   }\n   ```\n\n\n**Without More Context:**\n\nIt\'s impossible to give you a definitive answer about what\'s happening without more code or information about your program\'s logic. 
However, the most common scenario is likely an array of objects, especially in cases where data is being stored or manipulated.'

What am I missing? 

window.location.href is https://imgs.xkcd.com/comics/reminders_2x.png 


--
Thank you,

Thomas Steiner

May 24, 2025, 11:41:56 AM
to Hemanth H.M, Chrome Built-in AI Early Preview Program Discussions
Hi Hemanth,

Could this be a CORS issue? Does it work if you navigate directly to the image and then paste your code in the DevTools Console?
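
One rough way to rule out a fetch problem before looking at the prompt itself is to inspect the fetched blob. The sketch below assumes that an opaque or failed response would show up as an empty body or a non-image MIME type; describeImageBlob is a hypothetical helper name, not part of the Prompt API.

```javascript
// Hypothetical diagnostic helper (not part of the Prompt API).
// Returns a short description of why a fetched blob may be unusable as an image.
function describeImageBlob(blob) {
  if (!(blob instanceof Blob)) {
    return "not a Blob";
  }
  if (blob.size === 0) {
    // Assumption: an opaque or failed response yields an empty body.
    return "empty body (possibly an opaque or failed response)";
  }
  if (!blob.type.startsWith("image/")) {
    return `unexpected MIME type: ${blob.type || "(none)"}`;
  }
  return "ok";
}
```

In the DevTools Console, describeImageBlob(referenceImage) returning "ok" would suggest the image was fetched correctly and the problem lies elsewhere.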

Cheers,
Tom

Thomas Steiner, PhD, Developer Relations Engineer (blog.tomayac.com, toot.cafe/@tomayac)

Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891

----- BEGIN PGP SIGNATURE -----
Version: GnuPG v2.4.3 (GNU/Linux)

iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck
0fjumBl3DCharaCTersAttH3b0ttom.xKcd.cOm/1181.
----- END PGP SIGNATURE -----


Thomas Steiner

May 24, 2025, 11:43:53 AM
to Thomas Steiner, Hemanth H.M, Chrome Built-in AI Early Preview Program Discussions
Oh wait, ignore, you say you are directly on the image URL already. Sorry. Let me see what I can find out. 


Thomas Steiner

May 24, 2025, 11:46:45 AM
to Thomas Steiner, Hemanth H.M, Chrome Built-in AI Early Preview Program Discussions
It works if you simplify the prompt:


const response = await session.prompt([{
  role: "user",
  content: [
    "Explain this XKCD cartoon",
    { type: "image", content: referenceImage }
  ]
}]);

Could you file a bug, please? The long-form should work, too.

Thank you!



Claude Georges René Heyman

May 24, 2025, 12:36:07 PM
to Chrome Built-in AI Early Preview Program Discussions, Thomas Steiner, Hemanth H.M
the demo is not working anymore :-(


Error: TypeError: Failed to execute 'promptStreaming' on 'LanguageModel': The provided value is not of type 'LanguageModelMessage'.

Thomas Steiner

May 24, 2025, 12:36:19 PM
to Thomas Steiner, Hemanth H.M, Chrome Built-in AI Early Preview Program Discussions
Alright, found it:

const session = await LanguageModel.create({
  expectedInputs: [
    { type: "image" }
  ]
});

const referenceImage = await (await fetch(window.location.href)).blob();

const response = await session.prompt([{
  role: "user",
  content: [
    { type: "text", value: "Explain this XKCD cartoon" },
    { type: "image", value: referenceImage }
  ]
}]);

This was updated in the IDL, but not in the example above. I'll open a PR.

Cheers,
Tom

Thomas Steiner

May 24, 2025, 12:40:26 PM
to Claude Georges René Heyman, Chrome Built-in AI Early Preview Program Discussions, Thomas Steiner, Hemanth H.M
Yes, this is the same problem: content was renamed to value in LanguageModelMessageContent.
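
For code that still builds messages in the old shape, a small adapter can do the rename in one place. This is only a sketch based on the rename described in this thread: migrateMessage is a hypothetical helper, not part of the Prompt API, and the image entry is a placeholder string standing in for a real Blob.

```javascript
// Hypothetical helper (not part of the Prompt API): renames the legacy
// `content` key to `value` on each typed part of a message. The
// message-level `content` array itself keeps its name.
function migrateMessage(message) {
  if (!Array.isArray(message.content)) {
    return message; // plain-string content has no parts to rename
  }
  return {
    ...message,
    content: message.content.map((part) => {
      const isLegacyPart =
        part !== null &&
        typeof part === "object" &&
        "content" in part &&
        !("value" in part);
      if (!isLegacyPart) {
        return part; // bare strings and already-migrated parts pass through
      }
      const { content, ...rest } = part;
      return { ...rest, value: content };
    }),
  };
}

// Old-style messages, as in the first post of this thread.
const legacyMessages = [{
  role: "user",
  content: [
    { type: "text", content: "Explain this XKCD cartoon" },
    { type: "image", content: "<image blob placeholder>" }
  ]
}];

const migratedMessages = legacyMessages.map(migrateMessage);
```

The result could then be passed to session.prompt(migratedMessages). The helper copies rather than mutates, so the original array stays available for comparison while debugging.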

Thomas Steiner

May 25, 2025, 3:52:02 AM
to Thomas Steiner, Claude Georges René Heyman, Chrome Built-in AI Early Preview Program Discussions, Hemanth H.M
This is the PR for the Explainer. I haven't gotten to the various demos yet. 