If the size n of the matrices is not too large: one naive approach is to set up the system of n^2 quadrics in n^2 variables, and try to find rational points. Here's some proof-of-concept code (for which I make no claims regarding its optimization):
squareRootsMatrix = A -> (
n := numrows A;
x := getSymbol "x";
R := (ring A)(monoid[x_(1,1)..x_(n,n)]);
X := genericMatrix(R,n,n);
I := ideal(X^2 - A);
apply(select(minimalPrimes I, p -> degree p == 1), p -> X % p)
)
A = matrix{{4_(ZZ/11),-1,4,3,4,5},{0,5,1,5,-5,3},{0,0,-2,0,4,-3},{0,0,0,1,-5,1},{0,0,0,0,4,-1},{0,0,0,0,0,1}}
elapsedTime squareRootsMatrix A
Brief testing indicates practical runtimes up to size n = 6. Of course one can try and conjugate the input matrix A beforehand - generally, the sparser the better. (Note that not every matrix is similar to a triangular matrix - indeed, this happens if and only if the characteristic polynomial splits into (not necessarily distinct) linear factors over the field.)
Justin