an agentic loop involving the JS App to recieve ToolCalls, execute
Please fix this WARNING reported by Spellchecker: "recieve" is a possible misspelling of "receive".
To bypass Spellchecker, add a footer with DISABLE_SPELLCHECKER.
This IDL change ('Tool use' v2) is ready for review.
Thanks!
// When this object was created with LanguageModelToolDeclaration and expectedOutputs has {"tool-call"}, this now...
Same for the comment in promptStreaming().
Thanks for the quick turnaround!
Promise<LanguageModelPromptResult> prompt(
When we change the return type from DOMString to PromptResult, will we break backward compatibility for existing users?
typedef (LanguageModelToolSuccess or LanguageModelToolError) LanguageModelToolResponse;
Are we also expecting to let the model handle tool errors (vs. the application handling them)?
Also, in the current structure, LanguageModelMessageContent contains a LanguageModelMessageValue, which can be a LanguageModelToolResponse, which can contain LanguageModelMessageContent again...
I'm wondering if we could just use a single struct of {string callID, string name, object result} as the LanguageModelToolResponse, and if we want the model to handle an error, the developer can put error info in the "result" field.
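For illustration, a message value using that flattened shape might look like this (a sketch of the proposal only; the error key inside "result" is an assumption):
```
// Sketch of the proposed flattened tool-response shape: one struct,
// with error info reported inside "result" when the model should handle it.
const toolResponse = {
  callID: "1",
  name: "getWeather",
  result: { error: "API rate limit exceeded" },  // or e.g. { tempC: 22 } on success
};
```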
> ...when we change the return type from DOMString to PromptResult, will we break backward compatibility for existing users?...
Oops, never mind, I just found the definition of LanguageModelPromptResult in the other IDL.
Thanks for your comments and discussions.
(1) Tool Error Handling
Yes, we're expecting the application to handle tool errors and communicate them back to the model. The flow is:
// Model requests tool → App executes → App reports result (success or error) → Model reacts
~~~
const response = await session.prompt([
  { role: "user", content: "What's the weather in SF?" },
  { role: "assistant", content: [
    { type: "tool-call", value: { callID: "1", name: "getWeather", arguments: { city: "SF" } } }
  ]},
  { role: "user", content: [
    { type: "tool-response", value: {
      callID: "1",
      name: "getWeather",
      errorMessage: "API rate limit exceeded"  // App reports error.
    }}
  ]}
]);
// Model can now see the error and react (e.g., ask the user to try later)
~~~
Both the app and model have knowledge of the error - the app decides how to report it, and the model can react accordingly (e.g., suggest alternatives, explain to user, etc.).
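Putting the loop together end to end, a minimal sketch (the `tools` creation option name and the declaration fields here are assumptions based on this CL's LanguageModelToolDeclaration, not final API):
```
// End-to-end sketch of the open agentic loop. The "tools" option name and
// declaration fields are assumptions for illustration.
const session = await LanguageModel.create({
  tools: [{
    name: "getWeather",
    description: "Get the current weather for a city",
    inputSchema: { type: "object", properties: { city: { type: "string" } } },
  }],
  expectedOutputs: [{ type: "tool-call" }],
});

const messages = [{ role: "user", content: "What's the weather in SF?" }];
const result = await session.prompt(messages);
// If the result contains a tool-call, the app executes the tool itself,
// appends a tool-response message to `messages`, and prompts again.
```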
(2) LanguageModelMessageContent Recursion
You're right - we should create a non-recursive type for tool results.
'object result' is something I would like to avoid because it is not structured.
I would like to make the IDL change as:
~~~
// Data-only value (no tools)
typedef (
  ImageBitmapSource
  or AudioBuffer
  or HTMLAudioElement
  or BufferSource
  or DOMString
) LanguageModelDataValue;

// Data-only type enum
enum LanguageModelDataType { "text", "image", "audio" };

// Data-only content (no tool-call, no tool-response)
dictionary LanguageModelDataContent {
  required LanguageModelDataType type;
  required LanguageModelDataValue value;
};

// Successful tool execution result.
dictionary LanguageModelToolSuccess {
  required DOMString callID;
  required DOMString name;
  required LanguageModelDataContent result;
};

// Failed tool execution result.
dictionary LanguageModelToolError {
  required DOMString callID;
  required DOMString name;
  required DOMString errorMessage;
};

// The response from executing a tool call - either success or error.
typedef (LanguageModelToolSuccess or LanguageModelToolError) LanguageModelToolResponse;
~~~
JS app usage example for a single tool call:
~~~
const response = await session.prompt([
  { role: "user", content: "What's the weather in Paris?" },
  // Model makes tool call
  { role: "assistant", content: [
    { type: "tool-call", value: {
      callID: "call_1",
      name: "getWeather",
      arguments: { city: "Paris" }
    }}
  ]},
  // App returns tool result
  { role: "user", content: [
    { type: "tool-response", value: {
      callID: "call_1",
      name: "getWeather",
      result: {
        type: "text",
        value: "Sunny, 22°C"
      }
    }}
  ]}
]);
~~~
I actually feel "object" is like a base::Value in C++: it can have arbitrary structure while still being JSON.
With LanguageModelData[Content|Type], how would we represent a map/dict, e.g., {"SFO": "22C", "LAX": false}? Is it always serialized to text? Also, our formatter works with a structured object (base::Value) below the chrome_ml_api layer.
The JS app will have to stringify it, which is what the Tool use v1 spec does.
I would think C++ base::Value cannot be used for "audio" or "image" - the multimodal tool results we would like to address.
So when the client passes a text string as a tool result, we still require it to be serializable to a JSON object, and we throw an error if it isn't, correct?
And if the client passes a multimodal tool result, I suppose we will check that the session was created with the correct "expectedInput" as well?
Do you have any use case for multimodal tool results? I don't think server-side APIs support multimodal tool results either.
It could be just plain text or a serialized JSON string. We do not need to enforce it.
>...Do you have any use case for multimodal tool results?...
It is somewhat similar to [multimodal input sample](https://github.com/webmachinelearning/prompt-api?tab=readme-ov-file#multimodal-inputs).
[Add multimodal tool outputs #149](https://github.com/webmachinelearning/prompt-api/pull/149) has discussion in its commits about getting a video frame with a ToolCall.
>...don't think server side APIs support multimodal tool results ...
Yes, plain text is expected at the moment.
[Tool calling: return types? #138](https://github.com/webmachinelearning/prompt-api/issues/138) discussed multimodal tool results.
>...the correct "expectedInput" as well...
I think so.
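For concreteness, a multimodal tool result under the proposed {type, value} shape might look like this (a sketch; the video-frame tool, session, and video element are hypothetical):
```
// Sketch: a tool returning an image frame as its result, using the proposed
// LanguageModelDataContent {type, value} pair. Tool name is hypothetical.
const frame = await createImageBitmap(videoElement);  // assumes a <video> element
await session.prompt([
  { role: "user", content: [
    { type: "tool-response", value: {
      callID: "call_2",
      name: "getVideoFrame",
      result: { type: "image", value: frame },
    }}
  ]}
]);
```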
Thank you for the pointers to those issues! Now I see why we need the {type, value} pair for multimodal tool results.
Then my only remaining question is about the "DOMString" type for text tool results - our formatter inside chrome_ml_api expects a structured object like base::Value to do various formatting and escaping. If we go with a string in the signature, shall we try to deserialize it into JSON (base::Value) in the renderer?
An alternative would be to also include "object" in LanguageModelDataValue, and then we don't do any special processing on the string; it will be treated by the formatter as a base::Value with type string.
// TODO(crbugs.com/422803232): Implement tool capabilities.
```suggestion
// TODO(crbug.com/422803232): Implement tool capabilities.
```
I have similar questions:
1) Should we add an `object` LanguageModelDataType and let tools yield serializable JS objects (like `inputSchema` itself), as a model-agnostic structured format, instead of forcing tools to serialize those objects into JSON strings themselves?
2) Should tools be able to yield a sequence of LanguageModelDataContent, so they can return mixed modalities?
3) Is there a real advantage to coercing errors into a named `errorMessage` string field on a dedicated error type, versus a shared LanguageModelToolResponse being constructed with `result: 'Error: API rate limit exceeded'`, or possibly also passing structured error objects if we permit objects in (1)? Does the API itself need to provide this distinction to the model, or for session history?
i.e. What if we had:
```
// Data-only value (no tools)
typedef (
  ImageBitmapSource
  or AudioBuffer
  or HTMLAudioElement
  or BufferSource
  or DOMString
  or object
) LanguageModelDataValue;

// Data-only type enum
enum LanguageModelDataType { "text", "image", "audio", "object" };

// Data-only content (no tool-call, no tool-response)
dictionary LanguageModelDataContent {
  required LanguageModelDataType type;
  required LanguageModelDataValue value;
};

// Tool execution response; expresses a result or an error to the model.
dictionary LanguageModelToolResponse {
  required DOMString callID;
  required DOMString name;
```
// TODO(crbugs.com/422803232): Implement tool call handling.
// TODO(crbugs.com/422803232): Implement tool response handling.
Thanks all for the thoughtful discussions here.
Hi Jingyun,
> ... include "object" in LanguageModelDataValue...
Yes, we can use object.
> ...Our formatter inside chrome_ml_api expects a structured object like base::Value...
I investigated the current architecture. In chrome_ml_types.h, we have:
```
using InputPiece =
    std::variant<Token, std::string, SkBitmap, AudioBuffer, bool>;
```
I am not sure about `base::Value` as input to the chrome_ml_api implementation.
Anyway, I plan to add a new ToolResponse variant to InputPiece. This will preserve the tool response structure all the way to the ML API implementation layer, where each implementation (Chrome ML API, Edge ML API, etc.) can format it appropriately for their underlying models.
Hi Mike,
>...1) Should we add an object LanguageModelDataType...
Yes, will do.
>...2) Should tools be able to yield a sequence of LanguageModelDataContent, ...
Yes, agreed. This provides flexibility for multimodal tool results in the future.
>...3) Is there a real advantage of coercing errors into a named errorMessage string field on a dedicated error type, versus a shared LanguageModelToolResponse...
Thank you for the thoughtful question! I believe separate ToolSuccess/ToolError types provide meaningful advantages:
(a) Type Safety & Developer Experience: Both approaches work for JS apps, but separate types enable automatic type discrimination (`'errorMessage' in response`) without requiring developers to inspect content or parse for status fields (see the sketch after this list).
(b) Debugging & Transparency: Explicit LanguageModelToolError makes the error condition immediately visible in logs, DevTools, and debuggers. With merged types, we'd need heuristics to determine if content represents an error.
(c) Web Platform Consistency: This pattern aligns with established Web APIs (e.g., FileReader: onload vs. onerror, Promise: .then() vs. .catch(), Fetch: Response vs. network errors) and C++ conventions.
(d) ML API Implementation Flexibility: The discriminated type allows ML API implementations to apply different prompt formatting strategies (e.g., "Tool succeeded: {...}" vs. "Tool FAILED: ..."), which may improve model understanding of tool execution status.
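To make (a) concrete, a minimal sketch of the discrimination a JS app could do, assuming the LanguageModelToolSuccess/LanguageModelToolError shapes above (the helper name is illustrative):
```
// Sketch: discriminate the LanguageModelToolResponse union by the presence
// of "errorMessage". Helper name is hypothetical.
function describeToolResponse(response) {
  if ("errorMessage" in response) {
    return `Tool ${response.name} failed: ${response.errorMessage}`;
  }
  return `Tool ${response.name} succeeded: ${JSON.stringify(response.result)}`;
}
```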
I'm happy to proceed with separate types unless there are other considerations I haven't addressed. Let me know your thoughts!
// TODO(crbugs.com/422803232): Implement tool capabilities.
```suggestion
// TODO(crbug.com/422803232): Implement tool capabilities.
```
Done
// When this object was created with LanguageModelToolDeclaration and expectedOutputs has {"tool-call"}, this now...
Same for the comment in promptStreaming().
Done
typedef (LanguageModelToolSuccess or LanguageModelToolError) LanguageModelToolResponse;
PTAL with PS 10. Thanks!
typedef (LanguageModelToolSuccess or LanguageModelToolError) LanguageModelToolResponse;
Thanks Frank!
It turns out my understanding of JavaScript is shallow. After discussing with @rei...@chromium.org, I learned that we would need `any` instead of `object` to represent a union of [string, number, boolean, dict, array] - basically anything that's serializable to JSON.
So how about:
```
// Data-only type enum. "object" represents the union of serializable JSON types.
enum LanguageModelDataType { "image", "audio", "object" };

dictionary LanguageModelDataContent {
  required LanguageModelDataType type;
  required any value;
};
```
and we can convert the `any` (`blink::ScriptValue`) into `base::Value` using `content::V8ValueConverter`.
--------
> I am not sure about `base::Value` as input to the chrome_ml_api implementation.
Do you mean you will only add a generic ToolResponse struct without member types into InputPiece? But then how can opt guide construct the InputPiece without the specific type? I was thinking of something like:
```
struct ToolResponse {
  string name;
  string id;
  base::Value results;
  // (and an optional member for error)
};
InputPiece = std::variant<Token, std::string, SkBitmap, AudioBuffer, bool,
                          ToolDeclaration, ToolCall, ToolResponse>;
```
Please let me know if you're thinking of something different!
typedef (LanguageModelToolSuccess or LanguageModelToolError) LanguageModelToolResponse;
Ah. Thank you all for the insight.
>...represent a union of [string, number, boolean, dict, array], basically anything that's serializable to JSON...
I have updated to allow primitive data types in addition to object (including array), as you listed above.
- 'any' would accept null, undefined, functions, DOM objects (all non-serializable).
- Continue to keep LanguageModelDataValue in LanguageModelDataContent so as to allow the multimodal results discussed.
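As a quick illustration of the serializability constraint (a sketch only; the actual validation will live in the Blink layer and its exact errors are not decided here):
```
// Sketch: one way a JS app could pre-check that an "object" result is
// JSON-serializable before sending it. Illustrative only.
function isJsonSerializable(value) {
  try {
    return JSON.stringify(value) !== undefined;  // undefined and functions yield undefined
  } catch {
    return false;  // cyclic structures, BigInt, etc. throw
  }
}

isJsonSerializable({ temperature: 72 });  // true
isJsonSerializable(() => {});             // false
```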
>...struct ToolResponse {string name; string id;...
Yes, I meant the same - to have them defined in on_device_model.mojom.
Use 'any' in LanguageModelDataContent per the offline meeting. We will use the type to perform validation in the Blink layer code.
Thank you! LGTM (couldn't find the +1 button in the UI)
Thank you for the review!
@m...@chromium.org, I will need your approval.
| Code-Review | +1 |
lgtm with minor comments, thank you for this work!
JavaScript apps now receive tool calls as structured messages, execute
nit: note the new runtime enabled feature
// expectedOutputs has {"tool-call"}, this now yields a
nit: (ditto below)
```suggestion
// expectedOutputs includes {type:"tool-call"}, this now yields a
```
if (options_->hasExpectedOutputs()) {
optional nit: reject when input/outputs has tool types without the runtime feature enabled
// Data-only content (for tool results - no tool-call or tool-response).
Please clarify this, e.g. explicitly noting `Data-only type for LanguageModelToolSuccess result.`; maybe renaming to `LanguageModelToolResult` would help?
// Object fitting the JSON input schema of this tool call.
```suggestion
// Object fitting the JSON input schema of the tool's declaration.
```
dictionary LanguageModelToolDeclaration {
+CC'ing script_tools folks. FYI: I think it's okay to prototype with this separate dictionary roughly aligned with ToolRegistrationParams for now, please feel free to comment.
name: "AIPromptAPIToolUse",Do you think it makes sense to stand up some very basic WPTs at this point?
"Linux": "experimental",nit: add `"ChromeOS": "experimental",` to align with other PromptAPI feature platforms
JavaScript apps now receive tool calls as structured messages, execute
nit: note the new runtime enabled feature
Done
// expectedOutputs has {"tool-call"}, this now yields a
nit: (ditto below)
```suggestion
// expectedOutputs includes {type:"tool-call"}, this now yields a
```
Done
optional nit: reject when input/outputs has tool types without the runtime feature enabled
Added TODOs to look into it in upcoming blink CL.
// Data-only content (for tool results - no tool-call or tool-response).
Please clarify this, e.g. explicitly noting `Data-only type for LanguageModelToolSuccess result.`; maybe renaming to `LanguageModelToolResult` would help?
Done
// Object fitting the JSON input schema of this tool call.
```suggestion
// Object fitting the JSON input schema of the tool's declaration.
```
Done
+CC'ing script_tools folks. FYI: I think it's okay to prototype with this separate dictionary roughly aligned with ToolRegistrationParams for now, please feel free to comment.
Acknowledged
Do you think it makes sense to stand up some very basic WPTs at this point?
I will add WPT tests in upcoming blink changes.
nit: add `"ChromeOS": "experimental",` to align with other PromptAPI feature platforms
enum LanguageModelMessageType { "text", "image", "audio", "tool-call", "tool-response" };
Should there be tool-response-success, tool-response-error?
// - "object": value should be JSON-serializable (primitives, objects, arrays)
enum LanguageModelToolResultType { "text", "image", "audio", "object" };
Perhaps this enum should match the input types to the language model; that is, this enum should just be LanguageModelMessageType. With the current definition there is a distinction between text and object, even though both are going to be converted to a string before being passed back to the LLM.
> ...should there be tool-response-success, tool-response-error?...
The success/error distinction is already handled by the LanguageModelToolResponse typedef union, which discriminates between LanguageModelToolSuccess and LanguageModelToolError at the structural level. This keeps the enum focused on content categories (text, image, audio, tool-call, tool-response) rather than outcome variants (success/error).
Having a single tool-response type provides simpler ergonomics when sending responses:
```
const toolResponse = handleToolCall(toolCall);
// toolResponse is either LanguageModelToolSuccess or LanguageModelToolError
await session.prompt([{
  role: "user",
  content: [{
    type: "tool-response",  // Single type for both success and error
    value: toolResponse     // Union typedef handles discrimination
  }]
}]);
```
Developers don't need to inspect the response structure to determine which enum value to use - they can simply use tool-response and let the typedef handle the variant.
By adding tool-response-success, tool-response-error, JS usage would require explicit discrimination:
```
const messageType = "errorMessage" in toolResponse
    ? "tool-response-error"
    : "tool-response-success";
await session.prompt([{
  role: "user",
  content: [{
    type: messageType,  // Must compute the correct type
    value: toolResponse
  }]
}]);
```// - "object": value should be JSON-serializable (primitives, objects, arrays)
enum LanguageModelToolResultType { "text", "image", "audio", "object" };Perhaps this enum should match the input types to the language model that is this enum should just be LanguageModelMessageType. With the current definition there is a distinction between text and object, even though both are going to be converted to a string before being passed back to the LLM
Earlier discussion with @jin...@google.com pointed out that we should not use LanguageModelMessageContent (or derivatives) as the result of the tool response content - it would create confusing recursion semantics, since LanguageModelMessageType includes tool-response and tool-call. Tools returning tool-calls or tool-responses would allow infinite nesting and unclear semantics.
With respect to 'text' vs 'object': while both eventually serialize to strings when passed to the LLM, the earlier discussion concluded that we should not force the JS app to stringify its tool call result. For example:
~~~
// With "object" type - natural JavaScript:
{
type: "object",
value: { temperature: 72, humidity: 65 }
}
// Without "object" type - forced string serialization:
{
type: "text",
value: JSON.stringify({ temperature: 72, humidity: 65 }) // Awkward!
}
~~~
enum LanguageModelMessageRole { "system", "user", "assistant" };
Apologies for going back to this - I just realized we're not expanding "tool-call" and "tool-response" to the roles, but only to the types.
Shall we also expand "tool-call" and "tool-response" to the roles, or do we have a convention/default role for people to use?
No problem.
We discussed that we don't need to add additional LanguageModelMessageRole enum values.
For "tool-response", the JS app can use "user".
For "tool-call", the JS app can use "assistant".
But, in the 10/29 meeting, we have this:
"
Are developers used to having a tool response role from Hugging Face?
Tool call role for responses?
https://huggingface.co/Trelis/openchat_3.5-function-calling-v3
Possible to add roles later
"
I can add them in this CL ...
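For reference, the no-new-roles convention being discussed would look like this in practice (a sketch reusing shapes from the earlier examples):
```
// Sketch of the no-new-roles convention: tool calls ride on "assistant",
// tool responses on "user".
const history = [
  { role: "assistant", content: [
    { type: "tool-call", value: { callID: "1", name: "getWeather", arguments: { city: "SF" } } }
  ]},
  { role: "user", content: [
    { type: "tool-response", value: { callID: "1", name: "getWeather",
      result: { type: "text", value: "Sunny, 22°C" } } }
  ]}
];
```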
Oooh gotcha. Thank you! I don't have a strong opinion (although, maybe add a comment for "user -> tool-response, assistant -> tool-call" if we're not expanding on the roles?), so whichever you prefer!
Hey Frank,
A side question related to this change: Would you also like to update the spec (https://webmachinelearning.github.io/prompt-api/#idl-index) and the explainer (https://github.com/webmachinelearning/prompt-api/tree/main) to reflect the IDL changes? If not, I can update them and send a PR for review.
Thank you!
- Jingyun
Yes, please do it. Thanks!
Adding Daniel for the review. Thanks!
In the future, please make sure I'm in the attention set. If I'm not in the attention set, I probably won't ever see the review.
// TODO(crbug.com/422803232): Implement tool capabilities.
So to be clear, this is just plumbing to set up the model right? The actual tool bits would be sent via `AILanguageModelPromptContent`?
array<AILanguageModelExpected>? expected_outputs;
I'm mostly wondering because we wouldn't ever expect tool response to be an expected output right? It would only be a tool call? Or is it something that might be supported in the future somehow?
If it can't be supported and won't be supported... maybe we need to better structure this to reflect that.
> ...In the future, please make sure I'm in the attention set...
Acknowledged
> ...this is just plumbing to set up the model right? The actual tool bits would be sent via `AILanguageModelPromptContent`?...
These are placeholders for now.
Yes, tool bits could possibly be in AILanguageModelPromptContent.
> ...we wouldn't ever expect tool response to be an expected output right? It would only be a tool call?...
Yes.
The intention is to validate this in the Blink layer when the LanguageModel is created with a tool declaration.
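A sketch of what that Blink-layer validation would mean for JS callers (assumed behavior; not implemented in this CL):
```
// Sketch: expected behavior once validation lands. "tool-response" should be
// rejected in expectedOutputs, since the model never produces tool responses.
try {
  await LanguageModel.create({
    expectedOutputs: [{ type: "tool-response" }],  // invalid
  });
} catch (e) {
  console.log("Rejected as expected:", e);
}
```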
> ...maybe add a comment for "user -> tool-response, assistant -> tool-call" if we're not expanding on the roles?...
ok, I will add them.
Done
+ Sophie and Steven as owners of components/optimization_guide/core/model_execution/substitution.cc
| Code-Review | +1 |
> ...The intention is to validate this in the Blink layer when the LanguageModel is created with a tool declaration...
Can we try to structure the Mojo in a way that it's not possible to send impossible inputs?
That probably means separating the input option type from the output option type instead of using the same one for both.
Is there a reason `languages` can be specified independently for each one? Does the languages option make sense for all possible input types?
>...Can we try to structure the Mojo in a way that it's not possible to send impossible inputs?...
Good point! I would like to do it in a follow-up CL - try to minimize the scope of this CL. WDYT?
```
// The expected input types that can be sent to the model.
// Both tool calls and tool responses can be sent as input:
// - Tool calls: When replaying conversation history in open-loop agentic apps
// - Tool responses: When providing tool execution results
enum AILanguageModelExpectedInputType {
  kText = 0,
  kImage = 1,
  kAudio = 2,
  kToolCall = 3,
  kToolResponse = 4,
};

// The expected output types that can be generated by the model.
// Tool calls can be generated as output (model requests tool execution).
// Tool responses cannot be generated as output (they are user-provided).
enum AILanguageModelExpectedOutputType {
  kText = 0,
  kImage = 1,
  kAudio = 2,
  kToolCall = 3,
};

// The expected input data type and languages specified with session creation.
struct AILanguageModelExpectedInput {
  AILanguageModelExpectedInputType type;
  array<AILanguageCode>? languages;
};

// The expected output data type and languages specified with session creation.
struct AILanguageModelExpectedOutput {
  AILanguageModelExpectedOutputType type;
  array<AILanguageCode>? languages;
};
```
>...Is there a reason `languages` can be specified independently for each one?...
The languages field is specified independently per type because different input types may have different language requirements in the same session. Per the spec example:
```
expectedInputs: [
  { type: "text", languages: ["en"] },
  { type: "image", languages: ["fr"] }  // For OCR'ing French text in images
]
```
>...Does the languages option make sense for all possible input types?...
I agree that the `kToolCall` and `kToolResponse` types don't semantically require a languages option, since they represent structured JSON data rather than natural language content.
Options to enforce this:
Option (a): Runtime validation
```
if ((type == kToolCall || type == kToolResponse) && languages.has_value()) {
  // return Error::kInvalidLanguageForType;
}
```
Option (b): Type-level enforcement with unions
Make `AILanguageModelExpectedInput` a union of different mojom structs, so that tool types such as `AILanguageModelExpectedTool` carry no languages field.
```
struct AILanguageModelExpectedTool {
  AILanguageModelToolType type;  // only kToolCall or kToolResponse
  // No languages field
};
```
If possible, I would like to address the issue in a separate CL.
WDYT?
| Code-Review | +1 |
array<AILanguageModelExpected>? expected_outputs;
| Code-Review | +1 |
> ...Shall we also expand "tool-call" and "tool-response" to the roles?... ok, I will add them... Done
It's not obvious why we'd add TC/TR roles, since user and assistant seem sufficient, but we can always revisit this decision later.
[PromptAPI] Tool use IDL changes (v2 - no execution of tool by blink)
This CL adds IDL definitions for Tool Use v2 with an open agentic loop
for the 'Tool use' scenario. Key changes:
- Add LanguageModelToolDeclaration for declaring available tools at
session creation with name, description, and JSON schema
- Add LanguageModelToolCall for model-->JS tool invocation requests with
callID, tool name, and structured arguments
- Add LanguageModelToolResponse for JS-->model tool execution results
- Update prompt() return type to support dynamic response (string or
structured messages based on tool usage)
- Refactor promise helpers to templates, eliminating code duplication
- This feature is gated behind the AIPromptAPIToolUse runtime
enabled feature.
JavaScript apps now receive tool calls as structured messages, execute
them using their own runtime and send results back to continue the
conversation. Blink acts as a transport layer without executing tool
logic, providing a common 'server-side Function Calling API' style for web
developers.
Prompt API Tool Use Scoping & Design doc:
https://docs.google.com/document/d/1Cyhk8X9jgpU4FFYQZKb8A5RgTQMqT9xIN-JnxlMdEus/edit?resourcekey=0-Date8jy3LWnhpwRzqJ-aDg&tab=t.0#heading=h.o25e9o4ywb3p
NO_IFTTT=It is not added by this CL.