Hello,
A point to note here is that there is symmetry in your cost function. For example, the transformation X(:, :, 1) -----> X(:, :, 1)*Q does not change the cost function for all orthogonal matrices of d-by-d, Q'*Q = Q*Q' = I. Similarly for X(:, :, 2). Therefore, the optimization should be on the product of the Grassmann manifolds and not on the product of the Stiefel manifolds.
Attached is the code for your cost function.
Regards,
BM