Best,
Pulkit
You received this message because you are subscribed to the Google Groups "OpenXLA Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openxla-discu...@openxla.org.
To view this discussion on the web visit https://groups.google.com/a/openxla.org/d/msgid/openxla-discuss/CAKsyoMCYKXZHTgtojNesc996NgcaN%2Bb84XSHfEPCrkf7zN621w%40mail.gmail.com.
For more options, visit https://groups.google.com/a/openxla.org/d/optout.
Hi everyone, we have spent more time exploring TFLite use cases and compatibility requirements. As mentioned in "TensorFlow Lite update on StableHLO use", TFLite would like to consume StableHLO as an input source and leverage VHLO versioning facilities. To achieve this goal, we propose the following changes to the StableHLO compatibility guarantees. Please let us know of any feedback you have!
Proposed Compatibility Window
On average, Android phones receive a 3-year OS update guarantee (e.g. the Pixel update policy and the update policies of other companies), and such OS updates must not break existing on-device models. Given that, we would like to propose a backward compatibility window of 3 years for StableHLO to support this use case. Additionally, new models that don’t use new features must be runnable on these supported OS versions, so we would also like to propose a 3-year forward compatibility window. We would also like to work with you to determine when the compatibility window should be extended.
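To make the two windows concrete, here is a minimal Python sketch. It is only an illustration, not part of StableHLO: the function name `is_supported`, the use of release dates as the yardstick, and the 365-day year are all assumptions made for this example.

```python
from datetime import date

# Assumption: both windows are measured between the release dates of the
# producer (serializer) and the consumer (deserializer).
WINDOW_DAYS = 3 * 365  # the proposed 3-year window, ignoring leap days


def is_supported(producer_release: date, consumer_release: date,
                 uses_new_features: bool = False) -> bool:
    """Return True if the consumer is expected to load the producer's artifact."""
    gap = (consumer_release - producer_release).days
    if gap >= 0:
        # Backward compatibility: a newer consumer reads an older artifact.
        return gap <= WINDOW_DAYS
    # Forward compatibility: an older consumer reads a newer artifact, which
    # only holds if the artifact avoids features the consumer does not know.
    return -gap <= WINDOW_DAYS and not uses_new_features
```

For instance, `is_supported(date(2024, 1, 1), date(2022, 1, 1))` models a two-year-old on-device consumer loading a freshly exported model that sticks to old features.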
Proposed VHLO Compatibility Guarantees
Up until this point the StableHLO project has focused on compatibility guarantees from the opset perspective, without providing specific guarantees for implementation details like the VHLO dialect. However, on the TFLite side, we found VHLO to be really useful, and we would like to propose to formalize some of its properties. For the most part these properties are already maintained in practice, and this RFC proposes to formally document and maintain them:
The VHLO op version number must change only by increment, and must change if and only if the operator's behavior changes (e.g. add_v1 → add_v2).
VHLO ops must not be deleted within the compatibility window.
VHLO ops must always be convertible to StableHLO ops within the compatibility window using machinery maintained in the openxla/stablehlo repository (i.e. not an external tool).
VHLO programs must be roundtrippable with StableHLO ops (an equivalent of today's --vhlo-to-version='target=current' --vhlo-legalize-to-stablehlo --stablehlo-legalize-to-vhlo --vhlo-to-version='target=...'), meaning a VHLO program from an older version must be convertible to the StableHLO dialect and then back to its original version number. This allows running StableHLO passes on an older VHLO program and re-serializing for the original version of that program.
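The first two properties above lend themselves to a mechanical check. The sketch below is hypothetical: the snapshot format (op name mapped to the list of versions it carries) and the function `check_history` are invented for illustration, and "increment" is assumed to mean a step of exactly one.

```python
def check_history(old: dict[str, list[int]], new: dict[str, list[int]]) -> list[str]:
    """Compare two snapshots of a hypothetical VHLO op registry.

    Each snapshot maps an op name to the sorted list of versions it carries.
    Both snapshots are assumed to fall inside the compatibility window.
    """
    problems = []
    for op, old_versions in old.items():
        # No (op, version) pair may disappear within the window.
        missing = set(old_versions) - set(new.get(op, []))
        if missing:
            problems.append(f"{op}: versions {sorted(missing)} were deleted")
    for op, versions in new.items():
        # Versions must grow one step at a time (v1 -> v2 -> v3).
        for prev, cur in zip(versions, versions[1:]):
            if cur != prev + 1:
                problems.append(f"{op}: v{prev} -> v{cur} is not an increment")
    return problems
```

A check like this could run in CI against the op list of an older release, although the thread does not specify how these properties would be enforced.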
Proposed Documentation Enhancement
For developers who may interact directly with this serialized VHLO, we propose a documentation enhancement. Namely, there must be an easy way to access documentation detailing the changes between different versions of the same op. The exact mechanism can be determined in a follow-up RFC.
Best regards,
Zichuan Wei
For on-device use cases, the server producing the IR tends to be more frequently updated than the on-device consumer. In order for the newer version of the on-server compiler to generate portable artifacts that can be parsed correctly by the on-device consumer, we will need the downgrade capability.
In addition, there can be a gap between when an artifact is created and when additional optimization is performed. For example, a float model created a year ago may need to be quantized today, while the quantization passes exist only at head. In that case we first need to upgrade the model to the latest head so that the passes can execute correctly.
I want to point out that these use cases are not unique to on-device: the current StableHLO compatibility policy already provides such guarantees, and we’re simply proposing to extend them to 3 years:
“Portable artifacts serialized by a new version of libStablehlo have the same semantics when deserialized by an old version of libStablehlo if these versions are built from openxla/stablehlo commits which are less than 1 month apart, unless the program is using new features introduced since the old version.”
As long as no new features are introduced during the IR upgrade, I think it’s reasonable for the user to be able to downgrade the model back to the original version. This allows for authoring passes on the latest opset, instead of on VHLO directly, and passes which don’t introduce new features can still leverage StableHLO compatibility guarantees.
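As a rough illustration of the "no new features" condition, here is a toy Python model. Everything in it is hypothetical: the integer version numbers, the `INTRODUCED_IN` table, and the op named `hypothetical_new_op` exist only for this sketch and do not describe the real VHLO machinery.

```python
HEAD = 5  # hypothetical latest version

# Hypothetical table: the version in which each op first became available.
INTRODUCED_IN = {"add": 1, "multiply": 1, "convert": 2, "hypothetical_new_op": 5}


def newest_feature(ops: list[str]) -> int:
    """Return the newest version required by any op in the program."""
    return max((INTRODUCED_IN[op] for op in ops), default=0)


def can_downgrade(ops: list[str], target: int) -> bool:
    """A program can target `target` only if every op already existed there."""
    return newest_feature(ops) <= target


original = ["add", "multiply"]                  # model authored at version 2
after_safe_pass = original + ["convert"]        # pass run at head, old ops only
after_new_feature_pass = original + ["hypothetical_new_op"]
```

Under these assumptions, `can_downgrade(after_safe_pass, 2)` holds while `can_downgrade(after_new_feature_pass, 2)` does not, which mirrors the distinction drawn above between passes that stay within the original feature set and passes that introduce new features.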
Hi all, today TFLite offers open-ended compatibility, and we are aware of some Android apps that ship model assets created more than 4 years ago (e.g. MobileNetV3 remains very popular). We propose to extend the backward compatibility window to 5 years. Additionally, the TFLite team will follow up by soliciting feedback from their developer community, exploring other mechanisms to meet their developers' platform stability requirements, and assessing whether there is scope to further refine the compatibility window in a future RFC.
Reflecting on the community feedback about round-tripping: this is not a requirement for TFLite. We do propose to extend the forward compatibility window to 2 years, in order to support TFLite and other community members on an annual release cycle.
All other proposed features remain the same:
The VHLO op version number must change only by increment, and must change if and only if the operator's behavior changes (e.g. add_v1 → add_v2).
VHLO ops must not be deleted within the compatibility window.
VHLO ops must always be convertible to StableHLO ops within the compatibility window using machinery maintained in the openxla/stablehlo repository (i.e. not an external tool).
Thanks for the RFC, Zichuan / Pulkit!
Overall I’m supportive of this RFC: The compatibility window is based on existing known use cases for compatibility, it enables important use cases within the OpenXLA ecosystem (mobile deployment), there are other community members on similar annual update cycles who can leverage these guarantees, and we now have over a year of experience with evolving the opset with forward/backward compatibility guarantees, with maintenance costs proving to be fairly low. I’ve spent a bit of time thinking / iterating within our team on maintenance costs, evolution implications, and potential alternatives. A brief summary:
The maintenance cost boils down to maintaining an ever-growing VHLO opset, along with the MLIR passes and tests, including:
Maintaining the VHLO opset, which grows with every StableHLO opset change.
Maintaining IR upgrade / downgrade patterns, which grow at about the same pace as VHLO.
Applying upstream MLIR changes to the VHLO opset (e.g. properties).
Maintaining compatibility tests for all versions within the compatibility window.
The evolution cost is another potential concern. I don't foresee many changes to opset evolution with extended compatibility, nor changes to the review process. We aim to provide means of experimenting / escape hatches, and an RFC review process for standardizing useful features. Although we currently guarantee 1mo forward and 6mo backward compatibility, in practice we have >1yr forward/backward compatibility today, which I don’t believe has greatly hindered evolution. Back when authoring the initial StableHLO Compatibility RFC, I went through the MHLO dialect history and at that time, in the ~3yrs I’d looked at, there were no opset changes that would have required breaking compatibility. If push came to shove we could add something like V2 ops/attrs/types to StableHLO to keep evolving, but we should do our best to avoid that. Overall we may incur some tech debt, but given our current experience, it will likely be very manageable.
We also explored several alternatives, almost all of which push the burden of maintenance to on-device users, as deserialization is where the compatibility issues are likely to occur. It seems likely that on-device compilers will want to use StableHLO with similar compatibility guarantees, and pushing the maintenance elsewhere will have additional costs on the entire on-device ecosystem, likely amounting to a similar amount of maintenance on StableHLO maintainers, on-device deployment teams, and on-device compilers. Given that, I'm on board with proceeding with this RFC, as all of this amounts to a very reasonable cost to enable on-device StableHLO.
Interested in any feedback! Next step otherwise is to discuss when extended compatibility should kick in.
Best,
Kevin