Hi Tomasz,
I can't address all of your questions, but I can say that XMALab doesn't use DLT (direct linear transformation) for the calibration. DLT is an older technique that allows for easy transformation between 2D and 3D, but it is incompatible with the virtual cameras in 3D modeling software (e.g., Maya, Blender).
Instead of DLT, XMALab uses a standard computer vision camera model with internal parameters (focal length and principal point in x and y) and external parameters (rotation and position of the camera). This is compatible with the virtual cameras in 3D modeling software. There is also an option to add distortion parameters.
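In case it helps to see that kind of camera model written out, here is a minimal sketch of a pinhole projection in Python with NumPy. This is not XMALab's code. The function name, the numbers, and the conventions (rotation maps world to camera coordinates, camera looks along +z, no distortion) are all my own assumptions for illustration, and XMALab's conventions may differ.

```python
import numpy as np

def project_point(point_3d, focal, principal, rotation, camera_position):
    """Project a 3D world point to 2D pixel coordinates (pinhole model, no distortion)."""
    fx, fy = focal          # internal parameters: focal length in x and y (pixels)
    cx, cy = principal      # internal parameters: principal point in x and y (pixels)
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    # External parameters: move the point into the camera's coordinate frame.
    # Assumed convention: `rotation` maps world axes to camera axes.
    offset = np.asarray(point_3d, dtype=float) - np.asarray(camera_position, dtype=float)
    p_cam = rotation @ offset
    # Apply the internal parameters, then divide by depth to get pixel coordinates.
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]

# Hypothetical values, chosen only to make the arithmetic easy to follow.
pixel = project_point(
    point_3d=[0.1, -0.05, 0.0],
    focal=(1000.0, 1000.0),
    principal=(512.0, 512.0),
    rotation=np.eye(3),
    camera_position=[0.0, 0.0, -2.0],
)
print(pixel)
```

The same handful of numbers (focal length, principal point, rotation, position) is what a virtual camera in 3D modeling software is built from, which is why this kind of model carries over and DLT coefficients don't.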
I can't speak to the mismatch between the estimated and real-world locations of the cameras / X-ray sources. When I've done something similar in the past, the output camera positions and orientations did not match where my cameras "actually were" either. My guess is that the estimated positions and orientations describe theoretical virtual cameras that fit the camera model parameters rather than the physical cameras, since matching the physical cameras might require additional information about the hardware itself (e.g., the type of lens or camera).
Hope that helps!
Aaron