Limits to XLA variable de-capture

Joel Berkeley

Jun 14, 2026, 9:28:07 PM

to OpenXLA Discuss

Hi,

XLA can't do variable capture for higher-order functions, but it can lift values into the necessary scope if it's valid. For example, in the pseudo-code

```
main () {
  %0 tensor<1.0>
  %1 tensor<2.0>
  %2 map(%0) (x => x + %0) %1
  return %2
}
```

it can move %0 into the function `x => x + %0`, removing the variable capture. What are the limits on this? I've found that the following code is rejected, even though it looks straightforward to apply the same transformation to it:

```
module @root {
  func.func @main() -> tensor<f64> {
    %cst = stablehlo.constant dense<2.000000e+00> : tensor<f64>
    %cst_0 = stablehlo.constant dense<1.000000e+00> : tensor<f64>
    %cst_1 = stablehlo.constant dense<3.000000e-01> : tensor<1xf64>
    %cst_2 = stablehlo.constant dense<0.000000e+00> : tensor<f64>
    %0 = stablehlo.broadcast_in_dim %cst_2, dims = [] : (tensor<f64>) -> tensor<f64>
    %1 = stablehlo.reduce(%cst_1 init: %0) applies stablehlo.add across dimensions = [0] : (tensor<1xf64>, tensor<f64>) -> tensor<f64>
    %2 = stablehlo.while(%iterArg = %cst) : tensor<f64>
    cond {
      %3 = stablehlo.compare GT, %iterArg, %cst_0 : (tensor<f64>, tensor<f64>) -> tensor<i1>
      stablehlo.return %3 : tensor<i1>
    } do {
      %3 = "stablehlo.map"(%iterArg) <{dimensions = array<i64>}> ({
      ^bb0(%arg0: tensor<f64>):
        %4 = stablehlo.subtract %arg0, %1 : tensor<f64>
        stablehlo.return %4 : tensor<f64>
      }) : (tensor<f64>) -> tensor<f64>
      stablehlo.return %3 : tensor<f64>
    }
    return %2 : tensor<f64>
  }
}
```

Joel Berkeley

Jun 14, 2026, 9:38:30 PM

to OpenXLA Discuss, Joel Berkeley

This is at XLA commit 65f49e0e74ffdbfc9f475dec50607f35d368bd32.

Kevin Gleason

Jun 26, 2026, 11:26:59 AM

to Joel Berkeley, OpenXLA Discuss

Hello!

I believe the constant capture happens here. The rule is mostly "must be a ConstantOp to be captured", since capturing involves cloning the IR into the body:
https://github.com/openxla/xla/blob/cc3b30d5c7a6b75bbc2432977d14842967eb1386/xla/mlir_hlo/stablehlo_ext/transforms/stablehlo_prepare_for_hlo_export.cpp#L117
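To make the rule concrete, here is a toy Python model of it. This is only a sketch under stated assumptions, not the XLA pass: ops are hypothetical `(result, name, operands)` tuples, and the function name `clone_constants_into_body` is made up for illustration. It clones captured constants into a region body and reports whatever is still captured afterwards.

```python
def clone_constants_into_body(outer, body, body_args):
    """Clone outer constants used by `body` into the body (toy model).

    Ops are (result, name, operands) tuples. Returns (new_body, rejected),
    where `rejected` lists captured values that are not constants.
    """
    defs = {op[0]: op for op in outer}
    local = set(body_args) | {op[0] for op in body}
    cloned, rejected = [], []
    for _result, _name, operands in body:
        for v in operands:
            if v in local:
                continue  # defined inside the body, nothing to do
            d = defs.get(v)
            if d is not None and d[1] == "constant":
                cloned.append(d)  # a constant has no operands, so a copy is safe
                local.add(v)
            elif v not in rejected:
                rejected.append(v)  # non-constant capture: the toy rule gives up
    return cloned + body, rejected


# The map body from the module above captures %1, the result of a reduce.
outer = [
    ("%cst_1", "constant", []),
    ("%cst_2", "constant", []),
    ("%0", "broadcast_in_dim", ["%cst_2"]),
    ("%1", "reduce", ["%cst_1", "%0"]),
]
map_body = [("%4", "subtract", ["%arg0", "%1"])]
_, rejected = clone_constants_into_body(outer, map_body, ["%arg0"])

# A body that captures a constant directly is rewritten without complaint.
const_body = [("%5", "subtract", ["%arg0", "%cst_2"])]
fixed_body, none_rejected = clone_constants_into_body(outer, const_body, ["%arg0"])
```

In this model the first call leaves `%1` in `rejected`, because its defining op is a reduce rather than a constant, while the second call prepends a copy of `%cst_2` to the body.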

Exceptions are listed here. These ops capture values as new operands to the op rather than by cloning, so there are fewer restrictions on what can be captured:
https://source.corp.google.com/piper///depot/google3/third_party/tensorflow/compiler/xla/mlir_hlo/mhlo/transforms/prepare_for_export/prepare_for_export.cc;rcl=918286383;l=156

These ops instead capture during HLO lowering like this (search file for "implicit_operand_set" for the other instances):
https://github.com/openxla/xla/blob/cc3b30d5c7a6b75bbc2432977d14842967eb1386/xla/hlo/translate/mhlo_to_hlo/mlir_hlo_to_hlo.cc#L2003
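As a rough illustration of the "capture as new operands" approach, here is a toy Python sketch. It is not the code in `mlir_hlo_to_hlo.cc`; the tuple encoding, the `%implicitN` names, and both function names are assumptions made for this example (the first is loosely named after MLIR's `getUsedValuesDefinedAbove` utility). Nested regions are ignored.

```python
def used_values_defined_above(body, body_args):
    """Values the body reads but does not define, in first-use order."""
    local = set(body_args)
    captured = []
    for result, _name, operands in body:
        for v in operands:
            if v not in local and v not in captured:
                captured.append(v)
        local.add(result)
    return captured


def capture_as_operands(op_operands, body_args, body):
    """Turn implicit captures into explicit operands and block arguments."""
    captured = used_values_defined_above(body, body_args)
    # One fresh (hypothetical) block argument name per captured value.
    rename = {v: f"%implicit{i}" for i, v in enumerate(captured)}
    new_body = [
        (result, name, [rename.get(v, v) for v in operands])
        for result, name, operands in body
    ]
    return op_operands + captured, body_args + list(rename.values()), new_body


# A region that reads %1 from the enclosing scope.
new_operands, new_args, new_body = capture_as_operands(
    ["%cst"], ["%iterArg"], [("%3", "subtract", ["%iterArg", "%1"])]
)
```

Because the captured value is passed in rather than copied, it does not matter which op defined it. That is why this route has fewer restrictions than cloning.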

Best,

Kevin

Joel Berkeley

Jun 26, 2026, 11:56:35 AM

to OpenXLA Discuss, Kevin Gleason, OpenXLA Discuss, Joel Berkeley

Thanks.

No need to edit it for my purposes, as I understand the situation, but that second link is behind a corporate login.

Joel Berkeley

Jun 26, 2026, 12:18:02 PM

to OpenXLA Discuss, Joel Berkeley, Kevin Gleason, OpenXLA Discuss

Curious that it clones the IR. There will be cases where simply moving the op into the body would work. I don't imagine that being particularly difficult, since one could move it naively, then use standard MLIR tooling to check whether the resulting graph makes sense. I suppose a more nuanced approach would be a combination of move and clone, perhaps iteratively: move one op, check that the IR is correct, clone instead if it's not, and repeat. Obviously a more efficient algorithm would exist.

I may end up implementing this in my higher-level IR.
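The move-or-clone idea can be sketched in a few lines of Python. This is a toy model under several assumptions, not anything XLA does: there are only two flat scopes, ops are hypothetical `(result, name, operands)` tuples, `defined_before_use` stands in for a real IR verifier, and `outer_uses` is a made-up parameter listing the values the enclosing scope still needs (for example operands of other region ops or of `return`).

```python
def defined_before_use(ops, live_in=()):
    """Toy verifier: each operand must be a live-in or an earlier result."""
    seen = set(live_in)
    for result, _name, operands in ops:
        if any(v not in seen for v in operands):
            return False
        seen.add(result)
    return True


def sink_by_move_or_clone(outer, body, body_args, outer_uses, outer_args=()):
    """Pull everything `body` captures into the body, moving where possible."""
    outer = list(outer)
    position = {op[0]: i for i, op in enumerate(outer)}  # original order
    defs = {op[0]: op for op in outer}
    local = set(body_args) | {op[0] for op in body}
    pending = {v for op in body for v in op[2]} - local
    sunk, log = [], []
    while pending:
        missing = sorted(v for v in pending if v not in defs)
        if missing:
            raise ValueError(f"no defining op to sink for {missing}")
        # Visit users before their operands (latest original position first).
        v = max(pending, key=position.__getitem__)
        pending.discard(v)
        d = defs[v]
        trial = [op for op in outer if op is not d]
        if v not in outer_uses and defined_before_use(trial, outer_args):
            outer = trial  # nothing outside still needs it: move
            log.append(("move", v))
        else:
            log.append(("clone", v))  # still needed outside: keep and copy
        sunk.append(d)
        local.add(v)
        pending |= set(d[2]) - local  # the sunk op's operands are now captured
    sunk.sort(key=lambda op: position[op[0]])  # restore a valid def order
    return outer, sunk + body, log


outer = [
    ("%cst", "constant", []),
    ("%cst_1", "constant", []),
    ("%cst_2", "constant", []),
    ("%0", "broadcast_in_dim", ["%cst_2"]),
    ("%1", "reduce", ["%cst_1", "%0"]),
]
map_body = [("%4", "subtract", ["%arg0", "%1"])]
new_outer, new_body, log = sink_by_move_or_clone(outer, map_body, ["%arg0"], {"%cst"})
# Same input, but pretend the enclosing scope also reads %0.
_, _, log_shared = sink_by_move_or_clone(outer, map_body, ["%arg0"], {"%cst", "%0"})
```

In the first call every op feeding `%1` can be moved. In the second, `%0` and the `%cst_2` it depends on have to be cloned because the outer scope still uses them. One thing the sketch does not weigh is cost: sinking a loop-invariant computation into a loop body means recomputing it on every iteration.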
