Hi all,
As promised, here are the responses from the engineering team (everything in quotes is verbatim):
"The model and the browser side [do] exactly the same thing for these two APIs, the only difference is the renderer will do some aggregation on the partial results before returning the final output for prompt(). So performance wise they should be at the same level."
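A rough sketch of what that aggregation amounts to, assuming each streamed chunk is a partial result that gets concatenated into the final answer (the `promptViaStream` helper is hypothetical, for illustration only, and not the actual Chrome implementation):

```javascript
// Hypothetical helper: build prompt()-like behavior on top of
// promptStreaming() by concatenating the partial results.
async function promptViaStream(session, text, options) {
  let result = "";
  for await (const chunk of session.promptStreaming(text, options)) {
    result += chunk; // aggregate partial output into the final string
  }
  return result;
}
```

Since both paths do the same model work, the only extra cost on the prompt() side is this aggregation step, which is why the two should perform at the same level.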
For 3), here is a repro snippet courtesy of @Mingyu Lei. It seems like there's a bug, but engineering hasn't settled on a diagnosis yet: the test below suggests that aborting a complex, long prompt causes a follow-up simple prompt to take longer than expected. They are still investigating.
let s = await ai.languageModel.create();
let s2 = await ai.languageModel.create();
let c = new AbortController();
// Kick off a prompt on the first session without awaiting it, then abort.
// The catch swallows the expected AbortError rejection.
s.prompt("what's the result of 1+2?", { signal: c.signal }).catch(() => {});
console.log(Date.now());
c.abort();
// Measure how long the second session's simple prompt takes after the abort.
let ss = await s2.promptStreaming("what's the result of 1+1?");
for await (const chunk of ss) {
  console.log(Date.now());
  console.log(chunk);
}
- "The model is loaded when a session is first created
- The model is unloaded after a delay when the last session is deleted (currently 1 minute, but this may change)
So if a dev wants to be sure the model stays loaded and avoid model load cost for future sessions, the easiest way is to keep a session alive. This is only recommended if they are sure it will be used again, as the model takes up a lot of system resources."
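Given that lifecycle, the "keep a session alive" pattern might look like the sketch below. The `warmUp`/`releaseWarmSession` names are mine, and I'm assuming sessions expose a `destroy()` method; treat this as an illustration of the idea rather than a confirmed API surface:

```javascript
// Hypothetical sketch: hold one long-lived session so the model stays
// loaded, avoiding the load cost for sessions created later.
let warmSession = null;

async function warmUp(languageModel) {
  if (!warmSession) {
    warmSession = await languageModel.create(); // triggers the model load
  }
  return warmSession;
}

function releaseWarmSession() {
  if (warmSession) {
    warmSession.destroy(); // model unloads after the delay once no sessions remain
    warmSession = null;
  }
}
```

Per the note above, only do this when you're confident the model will be used again, since holding the session pins significant system resources.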
Cheers,
Tom