Hi Seth,
I'll give this a shot since nobody else has replied so far...
We're aware that this aspect of V8's optimization behavior isn't always ideal, depending on your perspective. One might argue that triggering optimization of a function after a certain amount of total usage represents our guess that the function will see enough future usage for the CPU cost of optimizing it to be amortized eventually. By that argument it's fair to optimize any function that accumulates enough total usage, no matter how slowly it reaches that point. But there is also the memory angle, as you point out.
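To make the amortization argument concrete, here is a rough back-of-the-envelope sketch. It is not V8 code: the function name, the parameters, and the microsecond figures are all made up for illustration. It only shows how many future calls are needed before the one-time compile cost is paid back by faster calls.

```python
import math

def break_even_calls(compile_cost_us, unopt_call_us, opt_call_us):
    """Number of future calls needed to amortize a one-time compile cost.

    All inputs are hypothetical per-function estimates in microseconds;
    nothing here corresponds to a real V8 counter or flag.
    """
    saving_per_call = unopt_call_us - opt_call_us
    if saving_per_call <= 0:
        # Optimized code is no faster, so the cost is never amortized.
        return None
    return math.ceil(compile_cost_us / saving_per_call)

# Example with invented numbers: 5 ms to compile, 10 us vs. 2 us per call.
print(break_even_calls(5000, 10, 2))
```

Note that this only models CPU time. It says nothing about the memory held by the optimized code in the meantime, which is the other half of the trade-off.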
I think we would generally be fine with changes that avoid (or further delay) optimization of functions that consume their budget extremely slowly, if you want to explore that problem space. I'm not aware of any concrete plans on our side to tackle this specific issue. That said, as you're probably aware, we're in the process of rolling out Maglev (and improving it further) and refactoring Turbofan (mostly for better compilation speed), and both of those will likely shake up the existing rules and heuristics for optimization. In general terms, the existence of Maglev means that Turbofan optimization can happen later than it used to; OTOH Turbofan getting faster might counterbalance that to some extent. I personally don't know what the typical size ratio between Maglevved and Turbofanned code for the same function is; *if* there is a significant difference in either direction, then that would obviously affect whether we want it to be common or uncommon that a Maglevved function is *not* Turbofanned later.
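As one possible shape for such a change, here is a minimal sketch of a rate-aware trigger. This is purely an assumption for discussion, not how V8's tiering is implemented: `should_optimize`, `threshold`, and `min_rate_per_ms` are hypothetical names, and the units are arbitrary.

```python
def should_optimize(total_used, elapsed_ms, threshold, min_rate_per_ms):
    """Hypothetical tiering check that also looks at consumption rate.

    total_used:      budget units the function has consumed so far
    elapsed_ms:      time since the function first consumed budget
    threshold:       total usage needed before optimizing at all
    min_rate_per_ms: minimum average consumption rate to still qualify
    """
    if total_used < threshold:
        # Not enough total usage yet, same as a plain budget check.
        return False
    if elapsed_ms <= 0:
        # No meaningful time window; fall back to the plain budget check.
        return True
    # Skip functions that took a very long time to reach the threshold.
    return total_used / elapsed_ms >= min_rate_per_ms

# Same total usage, reached quickly vs. very slowly (invented numbers).
print(should_optimize(1000, 100, 1000, 1.0))
print(should_optimize(1000, 100_000, 1000, 1.0))
```

A real version would more likely delay optimization (for example by raising the threshold) rather than refuse it outright, and would have to account for Maglev and Turbofan separately.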
As a related data point, for Wasm functions, unoptimized (Liftoff) code is typically (though not always) quite a bit bigger than optimized code, so the consideration that optimization has a memory cost does not apply to Wasm in the same way it applies to JavaScript (where bytecode is particularly space-efficient).
Cheers,
Jakob