Well, this path also uses parts of the tf2xla path for now :)
Yes and yes (although there is no hook exposed on the TF side beyond registering a GraphOptimization pass and running it internally, and support for using this via experimental_compile=true just landed).
No good example that shows the partial state unfortunately - we should add one, good point. I'd say the easiest is to build the tf-opt tool and then run `tf-opt -xla-legalize-tf=allow-partial-conversion file.mlir` on a file such as
func @fusedBatchNormGrad_noTraining(%arg0: tensor<8x8x8x8xf32>, %arg1: tensor<8x8x8x8xf32>, %arg2: tensor<8xf32>, %arg3: tensor<8xf32>, %arg4: tensor<8xf32>) -> (tensor<8x8x8x8xf32>) {
  %0:5 = "tf.FusedBatchNormGrad"(%arg0, %arg1, %arg2, %arg3, %arg4) {T = "tfdtype$DT_FLOAT", data_format = "NHWC", epsilon = 0.001 : f32, is_training = false} : (tensor<8x8x8x8xf32>, tensor<8x8x8x8xf32>, tensor<8xf32>, tensor<8xf32>, tensor<8xf32>) -> (tensor<8x8x8x8xf32>, tensor<8xf32>, tensor<8xf32>, tensor<8xf32>, tensor<8xf32>)
  %1 = "tf.Acosh"(%0#0) : (tensor<8x8x8x8xf32>) -> tensor<8x8x8x8xf32>
  return %1 : tensor<8x8x8x8xf32>
}
(which has an op, tf.Acosh, that is lowered via a different pass and so remains; that will change, but not within the next day :)). After that I'd suggest running canonicalize to fold away all the shape computations (`tf-opt -xla-legalize-tf=allow-partial-conversion -canonicalize file.mlir`), and then you get
func @fusedBatchNormGrad_noTraining(%arg0: tensor<8x8x8x8xf32>, %arg1: tensor<8x8x8x8xf32>, %arg2: tensor<8xf32>, %arg3: tensor<8xf32>, %arg4: tensor<8xf32>) -> tensor<8x8x8x8xf32> {
  %0 = mhlo.constant dense<1.000000e-03> : tensor<8xf32>
  %1 = mhlo.add %arg4, %0 : tensor<8xf32>
  %2 = "mhlo.rsqrt"(%1) : (tensor<8xf32>) -> tensor<8xf32>
  %3 = mhlo.multiply %arg2, %2 : tensor<8xf32>
  %4 = "mhlo.broadcast_in_dim"(%3) {broadcast_dimensions = dense<3> : tensor<1xi64>} : (tensor<8xf32>) -> tensor<8x8x8x8xf32>
  %5 = mhlo.multiply %arg0, %4 : tensor<8x8x8x8xf32>
  %6 = "tf.Acosh"(%5) : (tensor<8x8x8x8xf32>) -> tensor<8x8x8x8xf32>
  return %6 : tensor<8x8x8x8xf32>
}
[We recently renamed the dialect to meta HLO/mhlo, per request, to avoid ambiguity, as it has some ops that are not in XLA HLO.] From there it's open to users: if you want to execute via TF at the end, you could group all the HLO ops into some tf.MyDeviceCompileAndExecute op (where you could even encode the HloProto as a string attribute) and execute the TF graph as normal; see the sketch below.
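To make that concrete, here is a minimal hand-written sketch of what the grouped module could look like, starting from the canonicalized output above. tf.MyDeviceCompileAndExecute is the made-up op name from the previous paragraph and the hlo_module attribute name is equally made up, so treat this as an illustration rather than an existing op:

func @fusedBatchNormGrad_noTraining(%arg0: tensor<8x8x8x8xf32>, %arg1: tensor<8x8x8x8xf32>, %arg2: tensor<8xf32>, %arg3: tensor<8xf32>, %arg4: tensor<8xf32>) -> tensor<8x8x8x8xf32> {
  // All the mhlo ops (%0 through %5 above) collapsed into one opaque TF op;
  // the serialized HloProto rides along as a string attribute. (Hypothetical
  // op and attribute, for illustration only.)
  %0 = "tf.MyDeviceCompileAndExecute"(%arg0, %arg1, %arg2, %arg3, %arg4) {hlo_module = "<serialized HloProto>"} : (tensor<8x8x8x8xf32>, tensor<8x8x8x8xf32>, tensor<8xf32>, tensor<8xf32>, tensor<8xf32>) -> tensor<8x8x8x8xf32>
  // The op that legalization left behind stays a regular TF op and executes
  // through the normal TF runtime.
  %1 = "tf.Acosh"(%0) : (tensor<8x8x8x8xf32>) -> tensor<8x8x8x8xf32>
  return %1 : tensor<8x8x8x8xf32>
}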
-- Jacques