an agentic loop involving the JS App to receive ToolCalls, execute
This IDL change ('Tool use' v2) is ready for review.
Thanks!
// When this object was created with LanguageModelToolDeclaration,
Now expectedOutputs also has {"tool-call"}.
Same for the comment in promptStreaming().
Thanks for the quick turnaround!
Promise<LanguageModelPromptResult> prompt(
When we change the return type from DOMString to PromptResult, will we break backward compatibility for existing users?
typedef (LanguageModelToolSuccess or LanguageModelToolError) LanguageModelToolResponse;
Are we also expecting to let the model handle tool errors (vs. the application handling them)?
Also, in the current structure, LanguageModelMessageContent contains a LanguageModelMessageValue, which can be a LanguageModelToolResponse, which can contain LanguageModelMessageContent again...
I'm wondering if we could just use a single struct {string callID, string name, object result} as the LanguageModelToolResponse; if we want the model to handle an error, the developer can put the error info in the "result" field.
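For illustration, here is a hypothetical sketch of that single-struct shape (the field names follow the proposal above, not the current IDL):

```javascript
// Hypothetical single-struct tool response: success and error share one
// shape, and error info travels inside the generic "result" field.
const toolSuccess = {
  callID: "1",
  name: "getWeather",
  result: { temperature: "22C", condition: "sunny" },
};
const toolFailure = {
  callID: "2",
  name: "getWeather",
  result: { error: "API rate limit exceeded" },
};
```

One consequence is that a consumer has to inspect `result` to tell success from failure.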
> ...when we change the return type from DOMString to PromptResult, will we break backward compatibility for existing users?...

Oops, never mind, I just found the def for LanguageModelPromptResult in the other IDL.
> ...are we also expecting to let model handle tool error (vs. application handles error)?...

> ...I'm thinking if we can just use a single struct for {string callID, string name, object result} as the LanguageModelToolResponse, and if we want model to handle it, developer can put error info in the "result" field...
Thanks for your comments and discussions.
(1) Tool Error Handling
Yes, we're expecting the application to handle tool errors and communicate them back to the model. The flow is:
// Model requests tool → App executes → App reports result (success or error) → Model reacts
~~~
const response = await session.prompt([
  { role: "user", content: "What's the weather in SF?" },
  { role: "assistant", content: [
    { type: "tool-call", value: { callID: "1", name: "getWeather", arguments: { city: "SF" } } }
  ]},
  { role: "user", content: [
    { type: "tool-response", value: {
      callID: "1",
      name: "getWeather",
      errorMessage: "API rate limit exceeded"  // App reports error.
    }}
  ]}
]);
// Model can now see the error and react (e.g., ask user to try later).
~~~
Both the app and model have knowledge of the error - the app decides how to report it, and the model can react accordingly (e.g., suggest alternatives, explain to user, etc.).
(2) LanguageModelMessageContent Recursion
You're right - we should create a non-recursive type for tool results.
'object result' is something I would like to avoid because it is not structured.
I would like to make the IDL change as:
~~~
// Data-only value (no tools)
typedef (
  ImageBitmapSource
  or AudioBuffer
  or HTMLAudioElement
  or BufferSource
  or DOMString
) LanguageModelDataValue;

// Data-only type enum
enum LanguageModelDataType { "text", "image", "audio" };

// Data-only content (no tool-call, no tool-response)
dictionary LanguageModelDataContent {
  required LanguageModelDataType type;
  required LanguageModelDataValue value;
};

// Successful tool execution result.
dictionary LanguageModelToolSuccess {
  required DOMString callID;
  required DOMString name;
  required LanguageModelDataContent result;
};

// Failed tool execution result.
dictionary LanguageModelToolError {
  required DOMString callID;
  required DOMString name;
  required DOMString errorMessage;
};

// The response from executing a tool call - either success or error.
typedef (LanguageModelToolSuccess or LanguageModelToolError) LanguageModelToolResponse;
~~~
JS App usage example for a single Tool Call:
~~~
const response = await session.prompt([
  { role: "user", content: "What's the weather in Paris?" },
  // Model makes tool call
  { role: "assistant", content: [
    { type: "tool-call", value: {
      callID: "call_1",
      name: "getWeather",
      arguments: { city: "Paris" }
    }}
  ]},
  // App returns tool result
  { role: "user", content: [
    { type: "tool-response", value: {
      callID: "call_1",
      name: "getWeather",
      result: {
        type: "text",
        value: "Sunny, 22°C"
      }
    }}
  ]}
]);
~~~
I actually feel "object" is like a base::Value in C++: it can have arbitrary structure and is still JSON.
With LanguageModelData[Content|Type], how would we represent a map/dict, e.g., {"SFO": "22C", "LAX": false}? Is it always serialized to text? Also, our formatter works with a structured object (base::Value) below the chrome_ml_api layer.
The JS App will have to stringify it, which is the Tool use v1 spec behavior.
I would think C++ base::Value cannot be used for "audio" or "image" - the multimodal tool results we would like to address.
So when the client passes a text string as a tool result, do we still require it to be serializable to a JSON object, and throw an error if it isn't?
And if the client passes a multimodal tool result, I suppose we will check that the session was created with the correct "expectedInput" as well?
Do you have any use case for multimodal tool results? I don't think server side APIs support multimodal tool results either.
It could be just plain text or a serialized JSON string. We do not need to enforce it.
>...Do you have any use case for multimodal tool results?...
It is somewhat similar to [multimodal input sample](https://github.com/webmachinelearning/prompt-api?tab=readme-ov-file#multimodal-inputs).
[Add multimodal tool outputs #149](https://github.com/webmachinelearning/prompt-api/pull/149) has discussions in commit on getting a video frame with a ToolCall.
>...don't think server side APIs support multimodal tool results ...
Yes, plain text is expected at the moment.
[Tool calling: return types? #138](https://github.com/webmachinelearning/prompt-api/issues/138) discussed multimodal tool results.
>...the correct "expectedInput" as well...
I think so.
Thank you for the pointers on those issues! Now I see why we need the {type - value} pair for multimodal tool results.
Then my only remaining question is about the "DOMString" type for text tool results - our formatter inside chrome_ml_api expects a structured object like base::Value to do various formatting and escaping. If we go with a string in the signature, shall we try to deserialize it into JSON (base::Value) in the renderer?
An alternative would be to also include "object" in LanguageModelDataValue; then we don't do any special processing to the string, and it will be treated by the formatter as a base::Value with type string.
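To make the map/dict case concrete, here is a hypothetical sketch of a tool-response message if "object" were added to LanguageModelDataValue (the "object" type is only a proposal here, not in the current IDL):

```javascript
// Hypothetical tool-response message passing a structured object result
// directly, without the app stringifying it first.
const toolResponseMessage = {
  role: "user",
  content: [{
    type: "tool-response",
    value: {
      callID: "call_1",
      name: "getWeather",
      result: { type: "object", value: { SFO: "22C", LAX: false } },
    },
  }],
};
```

The formatter below the chrome_ml_api layer could then receive this as a structured value rather than a pre-serialized string.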
// TODO(crbugs.com/422803232): Implement tool capabilities.
```suggestion
// TODO(crbug.com/422803232): Implement tool capabilities.
```
I have similar questions:
1) Should we add an `object` LanguageModelDataType and let tools yield serializable JS objects (like `inputSchema` itself), as a model-agnostic structured format, instead of forcing tools to serialize those objects into JSON strings themselves?
2) Should tools be able to yield a sequence of LanguageModelDataContent, so they can return mixed modalities?
3) Is there a real advantage of coercing errors into a named `errorMessage` string field on a dedicated error type, versus a shared LanguageModelToolResponse being constructed with `result: "Error: API rate limit exceeded"`, or possibly also passing structured error objects if we permit objects in (1)? Does the API itself need to provide this distinction to the model, or for session history?
i.e. What if we had:
```
// Data-only value (no tools)
typedef (
  ImageBitmapSource
  or AudioBuffer
  or HTMLAudioElement
  or BufferSource
  or DOMString
  or object
) LanguageModelDataValue;

// Data-only type enum
enum LanguageModelDataType { "text", "image", "audio", "object" };

// Data-only content (no tool-call, no tool-response)
dictionary LanguageModelDataContent {
  required LanguageModelDataType type;
  required LanguageModelDataValue value;
};

// Tool execution response; expresses a result or an error to the model.
dictionary LanguageModelToolResponse {
  required DOMString callID;
  required DOMString name;
  // ...
};
```
// TODO(crbugs.com/422803232): Implement tool call handling.
// TODO(crbugs.com/422803232): Implement tool response handling.
Thanks all for the thoughtful discussions here.
Hi Jingyun,
> ... include "object" in LanguageModelDataValue...
Yes, we can use object.
>...Our formatter inside chrome_ml_api expects a structured object like base::Value...
I investigated the current architecture. In chrome_ml_types.h, we have:
```
using InputPiece =
    std::variant<Token, std::string, SkBitmap, AudioBuffer, bool>;
```
I am not sure about using `base::Value` as input to the chrome_ml_api implementation.
Anyway, I plan to add a new ToolResponse variant to InputPiece. This will preserve the tool response structure all the way to the ML API implementation layer, where each implementation (Chrome ML API, Edge ML API, etc.) can format it appropriately for their underlying models.
Hi Mike,
>...1) Should we add an object LanguageModelDataType...
Yes, will do.
>...2) Should tools be able to yield a sequence of LanguageModelDataContent, ...
Yes, agreed. This provides flexibility for multimodal tool results in the future.
>...3) Is there a real advantage of coercing errors into a named errorMessage string field on a dedicated error type, versus a shared LanguageModelToolResponse...
Thank you for the thoughtful question! I believe separate ToolSuccess/ToolError types provide meaningful advantages:
(a) Type Safety & Developer Experience: Both approaches work for JS apps, but separate types enable automatic type discrimination ('errorMessage' in response) without requiring developers to inspect content or parse for status fields.
(b) Debugging & Transparency: Explicit LanguageModelToolError makes the error condition immediately visible in logs, DevTools, and debuggers. With merged types, we'd need heuristics to determine if content represents an error.
(c) Web Platform Consistency: This pattern aligns with established Web APIs (e.g., FileReader: onload vs. onerror, Promise: .then() vs. .catch(), Fetch: Response vs. network errors) and C++ conventions.
(d) ML API Implementation Flexibility: The discriminated type allows ML API implementations to apply different prompt formatting strategies (e.g., "Tool succeeded: {...}" vs. "Tool FAILED: ..."), which may improve model understanding of tool execution status.
I'm happy to proceed with separate types unless there are other considerations I haven't addressed. Let me know your thoughts!
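As a sketch of points (a) and (d), the `'errorMessage' in response` discrimination could look like this (assuming the separate ToolSuccess/ToolError shapes proposed above; `formatForModel` is a hypothetical helper, not part of the API):

```javascript
// Discriminate a tool response by the presence of "errorMessage",
// without inspecting content or inventing a status field.
function isToolError(response) {
  return "errorMessage" in response;
}

// Sketch of point (d): an implementation could format success and
// failure differently when prompting the model.
function formatForModel(response) {
  return isToolError(response)
    ? `Tool ${response.name} FAILED: ${response.errorMessage}`
    : `Tool ${response.name} succeeded: ${JSON.stringify(response.result)}`;
}
```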
// TODO(crbugs.com/422803232): Implement tool capabilities.
```suggestion
// TODO(crbug.com/422803232): Implement tool capabilities.
```
Done
> // When this object was created with LanguageModelToolDeclaration, ...
> Same for the comment in promptStreaming().
Done
typedef (LanguageModelToolSuccess or LanguageModelToolError) LanguageModelToolResponse;
PTAL with PS 10. Thanks!
// TODO(crbugs.com/422803232): Implement tool call handling.
// TODO(crbugs.com/422803232): Implement tool response handling.