Hi All,
This CL is ready for review.
Thanks!
Load-balancing this to dom@ for the Blink / Blink public API bits in the mojom
// Tool use is not yet supported in browser-side model execution.
So there is no production usage of the `tool_call` and `tool_response` members that this CL adds to `AILanguageModelPromptContent` yet, right?
array<AILanguageModelToolDeclaration>? tools;
I don't see any browser-side usage of this yet. Am I missing it?
// but handle it safely. Skip this tool rather than sending invalid data.
Shouldn't we just CHECK() instead, if we're so sure it shouldn't happen? Or expose the error to JavaScript?
base::JSONReader::Read(json_str, base::JSON_PARSE_CHROMIUM_EXTENSIONS);
The behavior of `JSON_PARSE_CHROMIUM_EXTENSIONS` is web-exposed right? Is there a concern about exporting this "CHROMIUM_EXTENSION" behavior to the platform? Is it spec'd?
@Dominic - thank you for your review. PTAL with the updates.
// Tool use is not yet supported in browser-side model execution.
So there is no production usage of the `tool_call` and `tool_response` members that this CL adds to `AILanguageModelPromptContent` yet, right?
yes. no production usage yet.
content/browser/ai/echo_ai_language_model.{h,cc} use them for blink web tests.
I don't see any browser-side usage of this yet. Am I missing it?
content/browser/ai/echo_ai_language_model.{h,cc} use them for tests.
// but handle it safely. Skip this tool rather than sending invalid data.
Shouldn't we just CHECK() instead, if we're so sure it shouldn't happen? Or expose the error to JavaScript?
Exposed the error to JS. Updated test expectations.
base::JSONReader::Read(json_str, base::JSON_PARSE_CHROMIUM_EXTENSIONS);
The behavior of `JSON_PARSE_CHROMIUM_EXTENSIONS` is web-exposed right? Is there a concern about exporting this "CHROMIUM_EXTENSION" behavior to the platform? Is it spec'd?
// but handle it safely. Skip this tool rather than sending invalid data.
Frank Li: Shouldn't we just CHECK() instead, if we're so sure it shouldn't happen? Or expose the error to JavaScript?
Exposed the error to JS. Updated test expectations.
You did this by calling ToLocalChecked(), right? Doesn't this just crash if an exception was thrown in the stringifier? Does `Stringifier()` call an author's `toString()` implementation? If so, that can throw an error, and I think we want to re-throw the error to JS, which I don't think this CL does.
base::JSONReader::Read(json_str, base::JSON_PARSE_CHROMIUM_EXTENSIONS);
Frank Li: The behavior of `JSON_PARSE_CHROMIUM_EXTENSIONS` is web-exposed right? Is there a concern about exporting this "CHROMIUM_EXTENSION" behavior to the platform? Is it spec'd?
Good suggestion. Changed it to use JSON_PARSE_RFC.
The result of an object's `toString()` implementation can still provide a String that is invalid JSON. We want semantics that match `JSON.parse()` I think.
Can you add a WPT ensuring that when an inputSchema object's toString() returns invalid strings like the following, which throw in `JSON.parse()`, they also throw appropriately here?
```
JSON.parse(`{"a": 10,}`);
JSON.parse(`{
"a": 10,
// Comment.
"b": 11
}
`);
```
v8::JSON::Stringify(context, schema_v8);
Does this call an author's `toString()` implementation.
v8::JSON::Stringify(context, schema_v8);
Does this call an author's `toString()` implementation.
Oops... forgot to go back and edit this incomplete sentence. Yeah, I just wanted to re-check that if this calls the developer-supplied stringifier, we handle the error appropriately. The use of `ToLocalChecked()` below seems risky.
// that shouldn't be web-exposed.
Suggestion: "CHROMIUM_EXTENSIONS, for example, would relax the parser beyond what the platform currently allows with `JSON.parse()`."
base::JSONReader::Read(json_str, base::JSON_PARSE_RFC);
Btw, did you consider using `ParseJSON()` in renderer/platform here at all? Would it be any more ergonomic?
} else {
in the non-streaming API, the model might also output text + tool call altogether. So I think we need to package the response + tool_call together and call resolver.Resolve()?
if (schema_v8.IsEmpty() || schema_v8->IsNull()) {
return "Tool inputSchema cannot be null.";
}
To double check, does this check allow tools without parameters? Basically, the JS argument would be `{inputSchema: {}}`?
// Reject kToolCall in expectedInputs - tool calls are model outputs, not
// inputs. Tool responses should be used to send results back.
I think we still need to allow people to prefill/append ToolCall and ToolResponse for few shot examples? (if the feature is enabled)
// If all tools failed to convert (e.g., JSON.stringify failures), we
Should we throw an error if any tool failed to convert, because the prompt won't match the client's intention?
Thanks @Jingyun and @Dominic for the CR feedback!
PTAL with the updates.
in the non-streaming API, the model might also output text + tool call altogether. So I think we need to package the response + tool_call together and call resolver.Resolve()?
Good catch! Added code/test to cover this.
if (schema_v8.IsEmpty() || schema_v8->IsNull()) {
return "Tool inputSchema cannot be null.";
}
To double check, does this check allow tools without parameters? Basically, the JS argument would be `{inputSchema: {}}`?
It fails because it is an object without a type key. I have added the
"Test tool with empty object inputSchema." test.
// but handle it safely. Skip this tool rather than sending invalid data.
Frank Li: Shouldn't we just CHECK() instead, if we're so sure it shouldn't happen? Or expose the error to JavaScript?
Dominic Farolino: Exposed the error to JS. Updated test expectations.
You did this by calling ToLocalChecked(), right? Doesn't this just crash if an exception was thrown in the stringifier? Does `Stringifier()` call an author's `toString()` implementation? If so, that can throw an error, and I think we want to re-throw the error to JS, which I don't think this CL does.
I have changed the code to use `v8::TryCatch` to capture exceptions prior to calling `ToLocalChecked()`. It is safe to use it now.
>...Does Stringifier() call an author's toString() implementation?...
JSON.stringify() calls toJSON() (not toString()), and also invokes getters when accessing properties.
>...can throw an error, and I think we want to re-throw the error to JS...
Added code/test for this.
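For reference, a minimal sketch of the capture-and-rethrow pattern being described here (a fragment only; the variable and member names follow the snippets quoted in this thread, and the actual CL code may differ):

```
v8::Isolate* isolate = script_state->GetIsolate();
v8::Local<v8::Context> context = script_state->GetContext();
v8::TryCatch try_catch(isolate);

// JSON.stringify may invoke a custom toJSON() or property getters, which can throw.
v8::MaybeLocal<v8::String> maybe_json = v8::JSON::Stringify(context, schema_v8);
if (try_catch.HasCaught()) {
  // Capture the exception so the caller can reject the promise with it,
  // re-throwing the author's error to JS instead of crashing in ToLocalChecked().
  result.exception = try_catch.Exception();
  try_catch.Reset();
  return result;
}

v8::Local<v8::String> json_str;
if (!maybe_json.ToLocal(&json_str)) {
  // Empty result without a pending exception; treat it as a conversion failure.
  result.error_message = "Failed to stringify the tool inputSchema.";
  return result;
}
```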
base::JSONReader::Read(json_str, base::JSON_PARSE_CHROMIUM_EXTENSIONS);
Frank Li: The behavior of `JSON_PARSE_CHROMIUM_EXTENSIONS` is web-exposed right? Is there a concern about exporting this "CHROMIUM_EXTENSION" behavior to the platform? Is it spec'd?
Dominic Farolino: Good suggestion. Changed it to use JSON_PARSE_RFC.
The result of an object's `toString()` implementation can still provide a String that is invalid JSON. We want semantics that match `JSON.parse()` I think.
Can you add a WPT ensuring that when an inputSchema object's toString() returns invalid strings like the following, which throw in `JSON.parse()`, they also throw appropriately here?
```
JSON.parse(`{"a": 10,}`);
JSON.parse(`{
"a": 10,
// Comment.
"b": 11
}
`);
```
Added the test with your examples: `createLanguageModel with schema containing invalid-JSON-like text (trailing commas, comments) as string content succeeds because JSON.stringify escapes properly`
JSON.stringify escapes them, making the RFC parser validation unreachable through normal web API usage. The DLOG traces from the code:
```
[26716:24284:0122/121231.318:INFO:third_party\blink\renderer\modules\ai\language_model_create_client.cc:151] [TEMP] JSON.stringify output for tool '"exampleTool"':
{"type":"object","properties":{"trailingCommaExample":{"type":"string","description":"{\"a\": 10,}"},"commentExample":{"type":"string","description":"{\n \"a\": 10,\n //
Comment.\n \"b\": 11\n}"}}}
[26716:24284:0122/121231.319:INFO:third_party\blink\renderer\modules\ai\language_model_create_client.cc:161] [TEMP] RFC parsing result for tool '"exampleTool"': SUCCESS
```
Dominic Farolino: Does this call an author's `toString()` implementation.
Oops... forgot to go back and edit this incomplete sentence. Yeah, I just wanted to re-check that if this calls the developer-supplied stringifier, we handle the error appropriately. The use of `ToLocalChecked()` below seems risky.
see previous response from https://chromium-review.googlesource.com/c/chromium/src/+/7490714/comment/265e9b7f_4017c3e6/
Suggestion: "CHROMIUM_EXTENSIONS, for example, would relax the parser beyond what the platform currently allows with `JSON.parse()`."
Done
base::JSONReader::Read(json_str, base::JSON_PARSE_RFC);
Btw, did you consider using `ParseJSON()` in renderer/platform here at all? Would it be any more ergonomic?
Investigated `ParseJSON()` - it returns `JSONValue*` while we need `base::Value` for Mojo. Current approach avoids conversion. Is this acceptable or would you prefer adding the conversion step?
// Reject kToolCall in expectedInputs - tool calls are model outputs, not
// inputs. Tool responses should be used to send results back.
I think we still need to allow people to prefill/append ToolCall and ToolResponse for few shot examples? (if the feature is enabled)
I have added TODO for it.
// If all tools failed to convert (e.g., JSON.stringify failures), we
Should we throw an error if any tool failed to convert, because the prompt won't match the client's intention?
if (schema_v8.IsEmpty() || schema_v8->IsNull()) {
return "Tool inputSchema cannot be null.";
}
Frank Li: To double check, does this check allow tools without parameters? Basically, the JS argument would be `{inputSchema: {}}`?
It fails because it is an object without a type key. I have added the
"Test tool with empty object inputSchema." test.
I think we want to support tools without inputs too, especially when the tool is informative or stateful.
// Reject kToolCall in expectedInputs - tool calls are model outputs, not
// inputs. Tool responses should be used to send results back.
Frank Li: I think we still need to allow people to prefill/append ToolCall and ToolResponse for few shot examples? (if the feature is enabled)
I have added TODO for it.
thanks! can you also add TODO in LanguageModelPromptBuilder::ProcessEntry?
Hi @Jingyun,
PTAL with the updates. Thanks!
if (schema_v8.IsEmpty() || schema_v8->IsNull()) {
return "Tool inputSchema cannot be null.";
}
Frank Li: To double check, does this check allow tools without parameters? Basically, the JS argument would be `{inputSchema: {}}`?
Jingyun Liu: It fails because it is an object without a type key. I have added the
"Test tool with empty object inputSchema." test.
I think we want to support tools without inputs too, especially when the tool is informative or stateful.
Tools without input parameters are already supported.
Example: {name: "getCurrentTime", description: "...", inputSchema: {type: "object"}}
I have also added the above format to the existing no-argument test.
`inputSchema` is required by the IDL. A `type` of `object` is required by JSON Schema.
// Reject kToolCall in expectedInputs - tool calls are model outputs, not
// inputs. Tool responses should be used to send results back.
Frank Li: I think we still need to allow people to prefill/append ToolCall and ToolResponse for few shot examples? (if the feature is enabled)
Jingyun Liu: I have added TODO for it.
thanks! can you also add TODO in LanguageModelPromptBuilder::ProcessEntry?
Ah. Thanks for the reminder! I have added TODO comment there too.
Code-Review +1
// It is not used in the browser-side unit_tests yet.
`NOTREACHED();`?
v8::String::NewFromUtf8Literal(script_state->GetIsolate(), "type");
From code like [this](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/core/v8/script_iterator.h;l=210;drc=448ab749d3436af5c6686ca81457648466cf65c4) and [this](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/html/canvas/image_data.cc;l=401;drc=bf712ec1a13783224debb691ba88ad5c15b93194), I get the impression that `V8AtomicString()` might be more efficient. Could you change this and below instances?
result.exception = try_catch.Exception();
Do we need to call `try_catch.Reset()` here? When rejecting a Promise with an exception scooped up from a TryCatch, I'm used to [seeing `Reset()`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/core/v8/script_iterator.cc;l=336-338;drc=5f8643cf150e97b694f2214ee092c34cc0ccf5fe), although I admit I'm not positive if it's necessary.
base::JSONReader::Read(json_str, base::JSON_PARSE_RFC);
Frank Li: Btw, did you consider using `ParseJSON()` in renderer/platform here at all? Would it be any more ergonomic?
Investigated `ParseJSON()` - it returns `JSONValue*` while we need `base::Value` for Mojo. Current approach avoids conversion. Is this acceptable or would you prefer adding the conversion step?
Hi @Dominic,
PS 14 has the updates for your review. Thanks!
// It is not used in the browser-side unit_tests yet.
Frank Li: `NOTREACHED();`?
Done
v8::String::NewFromUtf8Literal(script_state->GetIsolate(), "type");
From code like [this](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/core/v8/script_iterator.h;l=210;drc=448ab749d3436af5c6686ca81457648466cf65c4) and [this](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/html/canvas/image_data.cc;l=401;drc=bf712ec1a13783224debb691ba88ad5c15b93194), I get the impression that `V8AtomicString()` might be more efficient. Could you change this and below instances?
Done
Do we need to call `try_catch.Reset()` here? When rejecting a Promise with an exception scooped up from a TryCatch, I'm used to [seeing `Reset()`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/core/v8/script_iterator.cc;l=336-338;drc=5f8643cf150e97b694f2214ee092c34cc0ccf5fe), although I admit I'm not positive if it's necessary.
Although the destructor also cleans it up, I have followed the existing pattern you provided to be explicit.
Thanks for working on this, and being patient with our reviews!
// Tool use is not yet supported in browser-side model execution.
optional nit:
```suggestion
// TODO(crbug.com/422803232): Support on_device_model tool use
```
#include "base/notreached.h"remove if unused
(input.find("[TOOL_SUCCESS:") != std::string::npos) ||
(input.find("[Tool Error:") != std::string::npos);nit: align tool response formats (maybe `<tool-success name='get_weather' id='123' result=['rainy', '52F']>` and `<tool-error message='Location Not Found'>` or similar. Maybe define string constants in this file for parsing and building those strings? Maybe leverage base::Value ValueView/JSON/DebugString?
// Generate tool calls if tools are available and input doesn't contain tool
It might be aligned with EchoAILanguageModel behavior to just 'echo' the tool declarations in its initialization, like `did_echo_initial_prompts_`, but we'd like to trend away from Echo impl behaviors that aren't to spec, so the pattern you're using here is better.
Perhaps this class should just emit tool calls when callers explicitly request mock tool call generation with a well-known input string prefix, like "<GenerateSimpleToolCalls>", or explicitly track a `did_emit_tool_calls_` instead of inferring from tool response presence in `input`.
// Check if any tool has a ToolCallResponsePrefix for mixed text + tool
It seems like any input (e.g. "<GenerateSimpleToolCalls>Here is your tool call:") could be echoed before tool calls are emitted (optionally stripping the pseudo-token), to simplify logic and usage quite a bit.
Per-tool prefixes might be useful if they were ordered immediately preceding the respective tool call, but currently the first tool decl with a prefix is yielded before all calls, which seems like a suboptimal first impl.
responder->OnToolCalls(std::move(tool_calls));
// Open-loop pattern: Must call OnCompletion() after OnToolCalls() to
// signal that tool call generation is complete. The renderer will convert
// tool calls to structured messages and resolve the prompt() promise.
responder->OnCompletion(
blink::mojom::ModelExecutionContextInfo::New(current_tokens_));
return;
Since GenerateSimpleToolCalls actually just ignores the `input` param, that input is never echoed. That should probably be fixed along with the suggestions above.
for (size_t prompt_idx = 0; prompt_idx < prompts.size(); ++prompt_idx) {
const auto& prompt = prompts[prompt_idx];
for (size_t content_idx = 0; content_idx < prompt->content.size();
++content_idx) {
auto& content = prompt->content[content_idx];
nit: revert if indices aren't used?
// Tool calls should not be echoed back to the model.
Eventually, we will support this (e.g. clients append assistant-role tool calls to restore past session history or serve as multi-shot examples)
if (text_value) {
result += *text_value;
}
```suggestion
result = base::StrCat({result, text_value ? *text_value : "<text>"});
```
result += "<" + (type_str ? *type_str : "unknown") + ">";nit: StrCat here and elsewhere
if (!args_json.has_value()) {
// No valid Args JSON found.
return base::Value(base::Value::Dict());
}
return std::move(*args_json);
nit:
```suggestion
return args_json.value_or(base::Value(base::Value::Dict()));
```
EchoAILanguageModel::ExtractToolCallResponsePrefix() {
nit: Use 'ToolCallPrefix' here and elsewhere instead of 'ToolCallResponsePrefix'
```suggestion
EchoAILanguageModel::ExtractToolCallPrefix() {
```
std::vector<blink::mojom::ToolCallPtr>& tool_calls) {
optional nit: make this a return type?
// Create tool call.
nit: remove comments that just repeat the code
// Extract argument hints from tool description.
base::Value args_value = ExtractArgumentHints(tool->description);
tool_call->arguments = std::move(args_value);
```suggestion
tool_call->arguments = ExtractArgumentHints(tool->description);
```
// Reject kToolCall in expectedInputs - tool calls are model outputs,
ditto, we'll eventually support this, but okay to punt for now.
// Tool declarations for `Tool Use`.
nit: remove
// Tool calls should be converted to structured messages and
// returned to JavaScript. The caller will NOT automatically execute tools -
// JavaScript must handle execution manually and send results back.
Let's generalize the recommended pattern for handling calls in this context, maybe:
```suggestion
// Clients should validate and then execute or surface calls to JS API client.
// Model sessions typically expect a tool result or error input thereafter.
```
"+base/json/json_reader.h",Per other comment, can we use v8::JSON::Parse|Stringify to avoid atypical base/json use in Blink?
// LanguageModelMessageContent with type "tool-call". This is used in `Tool Use`
// to return tool calls as structured messages instead of executing them
// automatically.
optional nit: remove second sentence
v8::Local<v8::Context> context = script_state->GetContext();
Maybe check ContextIsValid(), ditto in helper below
// Convert base::Value to JSON string.
Would V8ValueConverterImpl::ToV8Value help here?
https://source.chromium.org/chromium/chromium/src/+/main:content/renderer/v8_value_converter_impl.h;drc=bf712ec1a13783224debb691ba88ad5c15b93194;l=29
It seems like base::JSONWriter is not generally used in Blink
return ScriptValue(isolate, v8::Object::New(isolate));
Maybe ScriptValue::CreateNull(isolate) here and in other failures below?
auto* content = LanguageModelMessageContent::Create();
content->setType(V8LanguageModelMessageType(
V8LanguageModelMessageType::Enum::kToolCall));
Maybe move this down to where the value is set?
ConvertMojoValueToV8(script_state, tc->arguments);
Should this disambiguate empty args from a failure to parse args?
DummyExceptionStateForTesting exception_state;
This doesn't seem appropriate. Can we construct an ExceptionState from the isolate and actually yield them in the context of the API client?
tools; // Empty for CanCreateLanguageModel.
nit: remove comment or move above decl
result.error_message = "Tool description cannot be empty.";Maybe include the tool name in all errors here and below
result.error_message = "Duplicate tool name: " + tool->name();nit: StrCat here and elsewhere
// Check if the object is actually null (Web IDL might pass null as object).
Can't we expect automatic enforcement of this check (and the fact that schema is a non-null object?), by the fact that LanguageModelToolDeclaration idl uses the `required` keyword and specifies the member type, etc?
V8AtomicString(script_state->GetIsolate(), "type");optional nit: cache isolate for repeated use
v8::TryCatch try_catch(script_state->GetIsolate());
If we didn't explicitly catch, would that just throw to the caller, and is that sufficient?
// Check if a V8 exception was thrown (from custom toJSON/getter).
if (try_catch.HasCaught()) {
// Capture the exception value before resetting.
result.exception = try_catch.Exception();
try_catch.Reset();
return result;
}
Maybe do this before checking whether maybe_json is empty?
// TODO(crbug.com/422803232): Investigate allowing kToolCall in
// expectedInputs for few-shot prompting examples. This would require
// careful consideration of:
// 1. How to distinguish few-shot examples from actual assistant outputs.
// 2. Whether models can effectively learn from few-shot tool call
// examples.
// 3. Whether system prompts can adequately substitute for this use case.
Let's omit listing thorough considerations here, the TODO can just be:
```suggestion
// TODO(crbug.com/422803232): Maybe allow kToolCall expectedInputs.
```
if (expected->type != mojom::blink::AILanguageModelPromptType::kText &&
expected->type !=
mojom::blink::AILanguageModelPromptType::kToolResponse &&
nit: invert to check for kImage or kAudio
bool has_tool_call_in_expected_outputs = false;
Determine this from the expectedOutputs loop above
// Validate tools if provided, before processing initialPrompts.
if (options_->hasTools() && !options_->tools().empty()) {
// Check if the AIPromptAPIToolUse feature is enabled.
if (!RuntimeEnabledFeatures::AIPromptAPIToolUseEnabled(
GetExecutionContext())) {
GetResolver()->Reject(DOMException::Create(
"Tool use feature is not enabled",
DOMException::GetErrorName(DOMExceptionCode::kNotSupportedError)));
return;
}
// Validate tool declarations.
ValidationResult validation_result =
ValidateToolDeclarations(GetScriptState(), options_->tools());
// If an exception was thrown, reject the promise with it.
if (!validation_result.exception.IsEmpty()) {
GetResolver()->Reject(validation_result.exception);
return;
}
// No exception, so if there's a validation error, reject with it.
if (!validation_result.error_message.empty()) {
GetResolver()->RejectWithTypeError(validation_result.error_message);
return;
}
// Validate that expectedOutputs includes "tool-call" when tools are
// provided.
bool has_tool_call_in_expected_outputs = false;
if (options_->hasExpectedOutputs()) {
for (const auto& expected : options_->expectedOutputs()) {
if (expected->type() == V8LanguageModelMessageType::Enum::kToolCall) {
has_tool_call_in_expected_outputs = true;
break;
}
}
}
if (!has_tool_call_in_expected_outputs) {
GetResolver()->RejectWithTypeError(
"When tools are provided, expectedOutputs must include {type: "
"\"tool-call\"}.");
return;
}Perhaps some of this tool decl validation code could be moved into ValidateToolDeclarations? It's a lot of added logic in LanguageModelCreateClient::Create
maybe_tools = ConvertToolsToMojo(GetScriptState(), options_->tools());
It seems worth considering combining validation and conversion.
for (size_t i = 0; i < result_items.size(); ++i) {
avoid index if not needed
std::optional<base::Value> parsed_value =
base::JSONReader::Read(json_str, base::JSON_PARSE_RFC);
if (!parsed_value) {
// JSONReader failed to parse the stringified value.
// This can happen if the value is undefined, or in other edge cases.
return std::nullopt;
}
Stringifying and then immediately parsing seems strange; can we maybe use V8ValueConverterImpl::FromV8Value() or similar? Maybe fail later if base::JSON handling fails in lower layers?
// Tool calls should not be provided as input to the model.
// Tool calls are generated BY the model, not sent TO the model.
// JavaScript should only send back tool RESPONSES after executing tools.
// TODO(crbug.com/422803232): Consider supporting tool calls as input for
// few-shot examples (prefilling conversations with example tool
// interactions).
Ditto, please soften and simplify these comments and error strings
// LanguageModelToolResponse is a union of ToolSuccess and ToolError
// interfaces. The bindings generator properly discriminates between the
// two types.
nit: remove
if (!allowed_types_.Contains(
mojom::blink::AILanguageModelPromptType::kToolResponse)) {
Reject(DOMException::Create(
"Tool responses not supported. Session is not initialized with "
"tool support.",
DOMException::GetErrorName(DOMExceptionCode::kNotSupportedError)));
return;
}
Check this before validating the response content as success/error
// Failed to serialize tool result (circular reference, function,
// etc).
remove comments that echo code
// completion. For `Tool Use` in LanguageModel API: resolves with tool call
// messages if present, otherwise with text response. For other APIs
// (Proofreader, etc.): always resolves with text response, ignoring tool_calls
// parameter.
This generic resolve-on-completion helper could more simply note that tool calls are ignored here. (and optionally check that the vector is empty?)
// For `Tool Use`: `complete_callback` receives BOTH text response AND mojo
// tool calls. Text response is always provided (may be empty). Tool calls
// vector may be empty if no tools were called. The caller must convert
// non-empty tool calls to Blink dictionaries in V8 context when resolving.
This rather unnecessarily and inflexibly enforces model-specific details in a messaging tool.
I'd prefer to see a separate tool_call_callback handler added, leaving usage of this class to handle the ordering of streaming text, then calls, then a completion signal, and avoid changes to the complete callback and generic helpers like ResolvePromiseOnCompletion.
// `Tool Use` Open-loop: Store mojo tool calls. We'll convert them to
// LanguageModelToolCall interface instances in OnCompletion when we have a
// proper V8 context. This avoids V8 handle lifetime issues across async
// mojo callbacks.
I'm not sure this detail is pertinent here. What makes the v8 context lifetime situation different here from the streamingresponder impl?
// Test callback: Verify context_info and resolve with text
// response. Tool calls are not exercised in this test.
nit: remove comment
// Adapter lambda: Proofreader doesn't use tool calls. Extract text
It would be nice to avoid this
"platforms": ["Linux", "Mac", "Win"],we might be able to enabled Chrome OS here too
// META: title=Language Model Tool Use (Open Loop)
I'll have to review this file later
This suite runs Web Platform Tests (WPT) for the Language Model API with Tool Use
nit: wrap file at 80 chars
In the future, when it's feasible, smaller changes would likely be easier to review.
Hi Mike,
Thank you for your thorough review! PTAL with the updates (PS 15 and 16).
// Tool use is not yet supported in browser-side model execution.
optional nit:
```suggestion
// TODO(crbug.com/422803232): Support on_device_model tool use
```
Done
#include "base/notreached.h"Frank Liremove if unused
removed: "base/notreached.h"
(input.find("[TOOL_SUCCESS:") != std::string::npos) ||
(input.find("[Tool Error:") != std::string::npos);nit: align tool response formats (maybe `<tool-success name='get_weather' id='123' result=['rainy', '52F']>` and `<tool-error message='Location Not Found'>` or similar. Maybe define string constants in this file for parsing and building those strings? Maybe leverage base::Value ValueView/JSON/DebugString?
updated code in: `EchoAILanguageModel::ExtractToolResponseText()`
Changed from [TOOL_SUCCESS:] / [Tool Error:] to
<tool-success id='X' name='Y' result=[...]> / <tool-error id='X' name='Y' message="...">
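For illustration only (the constant and function names below are hypothetical, not what the CL actually defines), centralizing the markers could look roughly like this:

```
#include <string>

#include "base/strings/strcat.h"

// Hypothetical marker prefixes for the echo model's tool-response text.
constexpr char kToolSuccessPrefix[] = "<tool-success ";
constexpr char kToolErrorPrefix[] = "<tool-error ";

// Builds a marker such as:
//   <tool-success id='123' name='get_weather' result=['rainy', '52F']>
std::string BuildToolSuccessText(const std::string& id,
                                 const std::string& name,
                                 const std::string& result_json) {
  return base::StrCat({kToolSuccessPrefix, "id='", id, "' name='", name,
                       "' result=", result_json, ">"});
}
```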
// Generate tool calls if tools are available and input doesn't contain tool
It might be aligned with EchoAILanguageModel behavior to just 'echo' the tool declarations in its initialization, like `did_echo_initial_prompts_`, but we'd like to trend away from Echo impl behaviors that aren't to spec, so the pattern you're using here is better.
Perhaps this class should just emit tool calls when callers explicitly request mock tool call generation with a well-known input string prefix, like "<GenerateSimpleToolCalls>", or explicitly track a `did_emit_tool_calls_` instead of inferring from tool response presence in `input`.
// Check if any tool has a ToolCallResponsePrefix for mixed text + tool
It seems like any input (e.g. "<GenerateSimpleToolCalls>Here is your tool call:") could be echoed before tool calls are emitted (optionally stripping the pseudo-token), to simplify logic and usage quite a bit.
Per-tool prefixes might be useful if they were ordered immediately preceding the respective tool call, but currently the first tool decl with a prefix is yielded before all calls, which seems like a suboptimal first impl.
Done
responder->OnToolCalls(std::move(tool_calls));
// Open-loop pattern: Must call OnCompletion() after OnToolCalls() to
// signal that tool call generation is complete. The renderer will convert
// tool calls to structured messages and resolve the prompt() promise.
responder->OnCompletion(
blink::mojom::ModelExecutionContextInfo::New(current_tokens_));
return;
Since GenerateSimpleToolCalls actually just ignores the `input` param, that input is never echoed. That should probably be fixed along with the suggestions above.
Done
for (size_t prompt_idx = 0; prompt_idx < prompts.size(); ++prompt_idx) {
const auto& prompt = prompts[prompt_idx];
for (size_t content_idx = 0; content_idx < prompt->content.size();
++content_idx) {
auto& content = prompt->content[content_idx];
nit: revert if indices aren't used?
Done
// Tool calls should not be echoed back to the model.
Eventually, we will support this (e.g. clients append assistant-role tool calls to restore past session history or serve as multi-shot examples)
Added TODO.
```suggestion
result = base::StrCat({result, text_value ? *text_value : "<text>"});
```
Done
result += "<" + (type_str ? *type_str : "unknown") + ">";nit: StrCat here and elsewhere
Done
if (!args_json.has_value()) {
// No valid Args JSON found.
return base::Value(base::Value::Dict());
}
return std::move(*args_json);
nit:
```suggestion
return args_json.value_or(base::Value(base::Value::Dict()));
```
build error:
~~~
../../third_party/libc++/src/include\optional(1118,19): error: static assertion failed due to requirement 'is_copy_constructible_v<base::Value>': optional<T>::value_or: T must be copy constructible
1118 | static_assert(is_copy_constructible_v<_Tp>, "optional<T>::value_or: T must be copy constructible");
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../content/browser/ai/echo_ai_language_model.cc(387,20): note: in instantiation of function template specialization 'std::optional<base::Value>::value_or<base::Value>' requested here
387 | return args_json.value_or(base::Value(base::Value::Dict()));
| ^
In file included from ../../content/browser/ai/echo_ai_language_model.cc:5:
In file included from ../..\content/browser/ai/echo_ai_language_model.h:8:
../../third_party/libc++/src/include\optional(1120,32): error: call to deleted constructor of 'const value_type' (aka 'const base::Value')
1120 | return this->has_value() ? this->__get() : static_cast<_Tp>(std::forward<_Up>(__v));
| ^~~~~~~~~~~~~
../..\base/values.h(718,3): note: 'Value' has been explicitly marked deleted here
718 | Value(const Value&) = delete;
| ^
~~~
The issue is that base::Value is move-only and .value_or() requires copy-constructibility.
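For what it's worth, if the rvalue overload is acceptable here, a move-only-friendly variant exists (untested sketch): calling `value_or` on a moved-from optional uses the rvalue overload, which only requires move-constructibility, so something like this should compile while the CL keeps the explicit branch above:

```
// Hypothetical alternative: the rvalue value_or overload only needs a
// move constructor, which base::Value has.
return std::move(args_json).value_or(base::Value(base::Value::Dict()));
```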
EchoAILanguageModel::ExtractToolCallResponsePrefix() {
nit: Use 'ToolCallPrefix' here and elsewhere instead of 'ToolCallResponsePrefix'
```suggestion
EchoAILanguageModel::ExtractToolCallPrefix() {
```
Removed `ExtractToolCallResponsePrefix()` since it is no longer needed.
std::vector<blink::mojom::ToolCallPtr>& tool_calls) {
optional nit: make this a return type?
Done
nit: remove comments that just repeat the code
Done
// Extract argument hints from tool description.
base::Value args_value = ExtractArgumentHints(tool->description);
tool_call->arguments = std::move(args_value);
```suggestion
tool_call->arguments = ExtractArgumentHints(tool->description);
```
Done
// Reject kToolCall in expectedInputs - tool calls are model outputs,
ditto, we'll eventually support this, but okay to punt for now.
Added TODO.
// Tool declarations for `Tool Use`.
Frank Li: nit: remove
Done
// Tool calls should be converted to structured messages and
// returned to JavaScript. The caller will NOT automatically execute tools -
// JavaScript must handle execution manually and send results back.
Let's generalize the recommended pattern for handling calls in this context, maybe:
```suggestion
// Clients should validate and then execute or surface calls to JS API client.
// Model sessions typically expect a tool result or error input thereafter.
```
Done
Per other comment, can we use v8::JSON::Parse|Stringify to avoid atypical base/json use in Blink?
Updated to use Stringify
// LanguageModelMessageContent with type "tool-call". This is used in `Tool Use`
// to return tool calls as structured messages instead of executing them
// automatically.
optional nit: remove second sentence
Done
v8::Local<v8::Context> context = script_state->GetContext();
Maybe check ContextIsValid(), ditto in helper below
Done
Would V8ValueConverterImpl::ToV8Value help here?
https://source.chromium.org/chromium/chromium/src/+/main:content/renderer/v8_value_converter_impl.h;drc=bf712ec1a13783224debb691ba88ad5c15b93194;l=29
It seems like base::JSONWriter is not generally used in Blink
yes. Converted the code to use `V8ValueConverterImpl::ToV8Value`
return ScriptValue(isolate, v8::Object::New(isolate));
Maybe ScriptValue::CreateNull(isolate) here and in other failures below?
Done
auto* content = LanguageModelMessageContent::Create();
content->setType(V8LanguageModelMessageType(
V8LanguageModelMessageType::Enum::kToolCall));
Maybe move this down to where the value is set?
Done
Should this disambiguate empty args from a failure to parse args?
Done
This doesn't seem appropriate. Can we construct an ExceptionState from the isolate and actually yield them in the context of the API client?
Done
nit: remove comment or move above decl
Done
result.error_message = "Tool description cannot be empty.";Maybe include the tool name in all errors here and below
Done
result.error_message = "Duplicate tool name: " + tool->name();nit: StrCat here and elsewhere
Done
// Check if the object is actually null (Web IDL might pass null as object).
Can't we expect automatic enforcement of this check (and the fact that schema is a non-null object?), by the fact that LanguageModelToolDeclaration idl uses the `required` keyword and specifies the member type, etc?
removed the check.
optional nit: cache isolate for repeated use
Done
If we didn't explicitly catch, would that just throw to the caller, and is that sufficient?
No, the exception does NOT automatically propagate to JavaScript. Without the explicit TryCatch, the exception is set in the V8 isolate's pending exception state but simply returning from the method won't cause the Promise to reject with that exception. We need the TryCatch to capture the exception and pass it to GetResolver()->Reject().
// Check if a V8 exception was thrown (from custom toJSON/getter).
if (try_catch.HasCaught()) {
// Capture the exception value before resetting.
result.exception = try_catch.Exception();
try_catch.Reset();
return result;
}
Maybe do this before checking whether maybe_json is empty?
Done
// TODO(crbug.com/422803232): Investigate allowing kToolCall in
// expectedInputs for few-shot prompting examples. This would require
// careful consideration of:
// 1. How to distinguish few-shot examples from actual assistant outputs.
// 2. Whether models can effectively learn from few-shot tool call
// examples.
// 3. Whether system prompts can adequately substitute for this use case.
Let's omit listing thorough considerations here, the TODO can just be:
```suggestion
// TODO(crbug.com/422803232): Maybe allow kToolCall expectedInputs.
```
Done
if (expected->type != mojom::blink::AILanguageModelPromptType::kText &&
expected->type !=
mojom::blink::AILanguageModelPromptType::kToolResponse &&
nit: invert to check for kImage or kAudio
Done
Determine this from the expectedOutputs loop above
Done
// Validate tools if provided, before processing initialPrompts.
Perhaps some of this tool decl validation code could be moved into ValidateToolDeclarations? It's a lot of added logic in LanguageModelCreateClient::Create
combined them into `ValidateAndConvertToolDeclarations()`
maybe_tools = ConvertToolsToMojo(GetScriptState(), options_->tools());
It seems worth considering combining validation and conversion.
Done
avoid index if not needed
Done
std::optional<base::Value> parsed_value =
base::JSONReader::Read(json_str, base::JSON_PARSE_RFC);
if (!parsed_value) {
// JSONReader failed to parse the stringified value.
// This can happen if the value is undefined, or in other edge cases.
return std::nullopt;
}
Stringifying and then immediately parsing seems strange; can we maybe use V8ValueConverterImpl::FromV8Value() or similar? Maybe fail later if base::JSON handling fails in lower layers?
I have converted the code to use `WebV8ValueConverter`.
// Tool calls should not be provided as input to the model.
// Tool calls are generated BY the model, not sent TO the model.
// JavaScript should only send back tool RESPONSES after executing tools.
// TODO(crbug.com/422803232): Consider supporting tool calls as input for
// few-shot examples (prefilling conversations with example tool
// interactions).
Ditto, please soften and simplify these comments and error strings
Done
// LanguageModelToolResponse is a union of ToolSuccess and ToolError
// interfaces. The bindings generator properly discriminates between the
// two types.
Frank Li: nit: remove
Done
if (!allowed_types_.Contains(
mojom::blink::AILanguageModelPromptType::kToolResponse)) {
Reject(DOMException::Create(
"Tool responses not supported. Session is not initialized with "
"tool support.",
DOMException::GetErrorName(DOMExceptionCode::kNotSupportedError)));
return;
}
Check this before validating the response content as success/error
Done
// Failed to serialize tool result (circular reference, function,
// etc).
remove comments that echo code
Done
// completion. For `Tool Use` in LanguageModel API: resolves with tool call
// messages if present, otherwise with text response. For other APIs
// (Proofreader, etc.): always resolves with text response, ignoring tool_calls
// parameter.
This generic resolve-on-completion helper could more simply note that tool calls are ignored here. (and optionally check that the vector is empty?)
Removed the comment since we have changed to use a separate tool_call_callback handler without the extra tool_calls param here.
// For `Tool Use`: `complete_callback` receives BOTH text response AND mojo
// tool calls. Text response is always provided (may be empty). Tool calls
// vector may be empty if no tools were called. The caller must convert
// non-empty tool calls to Blink dictionaries in V8 context when resolving.
This rather unnecessarily and inflexibly enforces model-specific details in a messaging tool.
I'd prefer to see a separate tool_call_callback handler added, leaving usage of this class to handle the ordering of streaming text, then calls, then a completion signal, and avoid changes to the complete callback and generic helpers like ResolvePromiseOnCompletion.
Refactored to add a separate tool_call_callback handler.
// `Tool Use` Open-loop: Store mojo tool calls. We'll convert them to
// LanguageModelToolCall interface instances in OnCompletion when we have a
// proper V8 context. This avoids V8 handle lifetime issues across async
// mojo callbacks.
I'm not sure this detail is pertinent here. What makes the v8 context lifetime situation different here from the streamingresponder impl?
Done. Removed comment about V8 handle lifetime issues as it's not pertinent here.
// Test callback: Verify context_info and resolve with text
// response. Tool calls are not exercised in this test.
Frank Li: nit: remove comment
Done
// Adapter lambda: Proofreader doesn't use tool calls. Extract text
It would be nice to avoid this
Done. Removed the adapter lambda.
we might be able to enable Chrome OS here too
Done
I'll have to review this file later
Acknowledged
This suite runs Web Platform Tests (WPT) for the Language Model API with Tool Use
nit: wrap file at 80 chars
Hi Nathan,
Added you here as the owner of files touched under chrome/browser/ai/*.
Thanks!