Hi Chris,
To simplify these further you'll need to break U1 and U2 down so that their shapes are compatible with D1 and the 0-matrix. At the moment there is no way to take these larger matrices and simplify them against smaller ones. The mismatch would be more evident if we had size-sensitive printing. For example, seeing the U_1^T on the right as a big solid block rather than a single letter would make it easier to spot.
One way to cut the matrices into smaller blocks is with blockcut. The inputs are the input matrix, the desired row sizes, and the desired column sizes. The output is a BlockMatrix of MatrixSlice objects:
U1 = blockcut(U1, (n1,), (n1, n2))
U2 = blockcut(U2, (n2,), (n1, n2))
They look like this:
In [13]: U1
Out[13]: [U₁[:n₁, :n₁] U₁[:n₁, n₁:n₁ + n₂]]
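In case it helps, here is a self-contained sketch of the same cut. The shapes of U1 and U2 are my guess from the sizes passed to blockcut above, and the positive-integer assumption on n1 and n2 is only for illustration. The U1_cut / U2_cut names are made up so the originals stay available.

```python
from sympy import MatrixSymbol, symbols
from sympy.matrices.expressions.blockmatrix import blockcut

# Assumed symbolic sizes; positivity/integrality is an illustrative assumption.
n1, n2 = symbols('n1 n2', integer=True, positive=True)

# Shapes inferred from the blockcut calls above (an assumption about your setup).
U1 = MatrixSymbol('U1', n1, n1 + n2)
U2 = MatrixSymbol('U2', n2, n1 + n2)

# blockcut(expr, rowsizes, colsizes): one block row, two block columns.
U1_cut = blockcut(U1, (n1,), (n1, n2))
U2_cut = blockcut(U2, (n2,), (n1, n2))

# Each entry of .blocks is a MatrixSlice of the original symbol.
print(U1_cut.blockshape)
print(U1_cut.blocks[0, 1].shape)
```

The overall shape of each cut matrix is unchanged; only the block structure is new.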
After you do this (and go through the rest of your pipeline), block_collapse reduces things cleanly to a 2x2 block matrix. Unfortunately, all of the indexing can get in the way of clean printing. You might consider creating MatrixSymbols U11, U12, U21, U22 explicitly instead.
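A rough sketch of the explicit-symbol route follows. I don't have your full expression in front of me, so the product U*D below is only a hypothetical stand-in for your pipeline, and the shape of D1 (n1 by n1) and the layout of the zero blocks are assumptions.

```python
from sympy import MatrixSymbol, ZeroMatrix, symbols
from sympy.matrices.expressions.blockmatrix import BlockMatrix, block_collapse

# Assumed symbolic sizes, as in the blockcut calls above.
n1, n2 = symbols('n1 n2', integer=True, positive=True)

# Explicit blocks instead of slices of U1 and U2.
U11 = MatrixSymbol('U11', n1, n1)
U12 = MatrixSymbol('U12', n1, n2)
U21 = MatrixSymbol('U21', n2, n1)
U22 = MatrixSymbol('U22', n2, n2)
U = BlockMatrix([[U11, U12], [U21, U22]])

# Hypothetical right-hand factor: D1 padded with zero blocks (an assumption).
D1 = MatrixSymbol('D1', n1, n1)
D = BlockMatrix([[D1, ZeroMatrix(n1, n2)],
                 [ZeroMatrix(n2, n1), ZeroMatrix(n2, n2)]])

# block_collapse multiplies block by block because the block sizes line up.
collapsed = block_collapse(U * D)
print(collapsed)
```

The printed result then mentions only the short names U11, U21 and D1 rather than slice expressions, which is the main reason to prefer this over blockcut when readability matters.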
Best,
-Matt