In general, XLA aims to avoid materializing data when possible. It won't reverse-engineer something that looks like a broadcasted i32[3] back into an i32[3], but it should be able to avoid duplication in the i32[3] -> i32[100,3] direction by throwing away the induction variable for the broadcast dimension on lowering (x[i, j] + y[i, j] ==> x[j] + y[i, j]).
Also, I should note that this is technically implementation-defined and I've only looked into it on the TPU side, but I'm guessing the mechanics are fairly similar across the board. It seems like the kind of thing that would bite us quickly if we didn't do it.
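
To make the index rewrite concrete, here's a rough pure-Python sketch. It is not XLA code and says nothing about how any backend actually lowers this; the function names (`add_materialized`, `add_fused`) are made up for illustration. The first version builds the full [rows, 3] copy of x before adding, and the second just ignores i when reading x, which is the x[i, j] + y[i, j] ==> x[j] + y[i, j] rewrite from above.

```python
def add_materialized(x, y):
    """Broadcast x (length n) to [rows, n] first, then add elementwise."""
    rows, n = len(y), len(x)
    # Materialized broadcast: rows * n elements of x are allocated.
    x_b = [list(x) for _ in range(rows)]
    return [[x_b[i][j] + y[i][j] for j in range(n)] for i in range(rows)]


def add_fused(x, y):
    """Same result, but x is indexed by j only; no [rows, n] copy is built."""
    rows, n = len(y), len(x)
    # The induction variable i is simply dropped when reading x.
    return [[x[j] + y[i][j] for j in range(n)] for i in range(rows)]


# Hypothetical inputs matching the shapes in the discussion:
# x plays the role of i32[3], y the role of i32[100,3].
x = [1, 2, 3]
y = [[i * 3 + j for j in range(3)] for i in range(100)]
assert add_materialized(x, y) == add_fused(x, y)
```

Both functions return the same values; the only difference is whether the 100x3 copy of x ever exists.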
Best,
Kevin