I strongly disagree with this assessment. Committing to this approach is exactly what will keep AI developers from adopting local models for anything beyond demos and small toy applications.
To be clear: from an API perspective, this is absolutely the right call. The API itself should not change unless a new mode of input or output is needed (e.g., if we started sending videos to the model and getting videos back).
However, with regard to meaningful AI application behavior, a different model is not merely an "implementation detail." (This is fairly intuitive if you ask yourself: why are we bothering to build different models at all if we expect no substantial difference in behavior?)
As an AI developer, I run evaluations on a model-by-model basis. The applications I build at work on OpenAI's and Anthropic's APIs are pinned to a specific dated model snapshot, not pointed at "whatever the latest rolling version of GPT/Claude is."
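As a concrete illustration, here is a minimal sketch of what pinning looks like, assuming the official `openai` Python SDK; the snapshot identifier is illustrative and stands in for whatever dated ID your evals actually ran against:

```python
from openai import OpenAI

client = OpenAI()

# Pin to a specific dated snapshot, not a rolling alias like "gpt-4o".
# The snapshot name below is illustrative.
PINNED_MODEL = "gpt-4o-2024-08-06"

response = client.chat.completions.create(
    model=PINNED_MODEL,
    messages=[{"role": "user", "content": "Summarize this ticket in one sentence."}],
)
print(response.choices[0].message.content)
```

The point of the pin is that the behavior my evals measured is the behavior production keeps getting, until I deliberately move the pin.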
From the perspective of an AI application developer, every single change to a model is potentially a breaking change. Model updates are holistic rather than carefully scoped, and their effects on downstream applications are not fully known to the vendor or to the developer.
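To make "potentially breaking" concrete: the usual defense is a regression gate keyed to exact snapshots, where a candidate model has to match the pinned baseline on the application's own eval cases before the pin moves. A minimal sketch, again assuming the `openai` SDK; the eval cases and snapshot IDs are placeholders, not a real suite:

```python
from openai import OpenAI

client = OpenAI()

# Application-specific regression cases: a prompt plus a property the answer
# must satisfy. These are illustrative placeholders; a real suite encodes the
# behaviors the downstream application actually depends on.
EVAL_CASES = [
    {"prompt": "Reply with exactly the word OK.", "check": lambda text: text.strip() == "OK"},
    {"prompt": "Return valid JSON with a single key 'status'.", "check": lambda text: '"status"' in text},
]

def pass_rate(model_id: str) -> float:
    """Fraction of eval cases a given model snapshot passes."""
    passed = 0
    for case in EVAL_CASES:
        response = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": case["prompt"]}],
        )
        if case["check"](response.choices[0].message.content):
            passed += 1
    return passed / len(EVAL_CASES)

def should_promote(baseline: str, candidate: str, tolerance: float = 0.0) -> bool:
    """Only move the pin if the candidate matches the baseline within tolerance."""
    return pass_rate(candidate) >= pass_rate(baseline) - tolerance

# Snapshot identifiers are illustrative.
if should_promote("gpt-4o-2024-08-06", "gpt-4o-2024-11-20"):
    print("Candidate snapshot passes the regression gate; update the pin.")
else:
    print("Behavioral regression detected; keep the current pin.")
```

None of this is possible if the thing behind the identifier silently changes out from under you.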
Without a commitment to releasing information about specific models (and the details that matter about them), developers won't be able to build reliable, useful, sophisticated systems around them.
The probabilistic nature of LLMs already makes them a "moving target" for those of us building applications around these systems; that is a double-edged sword, and we work around the downsides. But if you compound the problem by pushing hidden rolling updates as though this were a product for end users rather than an API for developers, it should be no surprise that it's too unpredictable for developers to adopt.