Good day,
sure, but the real example is quite big.
I'm running into it when running the test "test_forward_mode_AD_linalg_det_singular_cpu_float64" from "test/test_ops_fwd_gradients.py" in PyTorch (https://github.com/pytorch/pytorch/).
The inputs are the following:
(Pdb) print(inputs)
(tensor([[[ 0.7286, -0.7828, 0.8605, -1.0006, 1.5612],
[ 0.4191, -0.6455, 0.4113, -0.9425, 1.5177],
[ 0.3450, -0.4250, 0.5090, -0.5508, 0.8464],
[ 0.8105, -0.1038, -0.0516, 0.0171, 0.7254],
[-0.5127, -0.0989, -0.7508, -0.4660, 0.4803]],
[[-0.6177, 0.6802, 0.2151, -0.7662, -0.8119],
[-0.2069, -0.2237, -0.0123, 0.9310, -0.1486],
[ 0.4840, 0.4670, 0.8628, -0.3466, 0.1116],
[-0.2402, -0.2788, -0.1812, 0.1283, -0.1653],
[ 0.2426, 0.3254, 0.0861, -0.8608, 0.1582]]], dtype=torch.float64,
requires_grad=True),)
The expected results are the following:
(Pdb) print(numerical_vJu)
[[tensor([-0.0134, -0.0296], dtype=torch.float64)]]
On s390x I'm getting different results:
(Pdb) print(analytical_vJu)
((tensor([-0.0000, -0.0296], dtype=torch.float64),),)
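As a platform-independent cross-check of what the derivative of det should be, here is a minimal NumPy sketch (not the PyTorch test itself; the helper names are mine) that verifies Jacobi's formula d det(A)/dA_ij = C_ij, the (i, j) cofactor, against central differences on the first 5x5 matrix from the report:

```python
import numpy as np

# First batch entry of the failing input from the report.
A = np.array([
    [ 0.7286, -0.7828,  0.8605, -1.0006,  1.5612],
    [ 0.4191, -0.6455,  0.4113, -0.9425,  1.5177],
    [ 0.3450, -0.4250,  0.5090, -0.5508,  0.8464],
    [ 0.8105, -0.1038, -0.0516,  0.0171,  0.7254],
    [-0.5127, -0.0989, -0.7508, -0.4660,  0.4803],
])

def cofactor_grad(A):
    """Analytic gradient of det(A): the cofactor matrix (valid even for singular A)."""
    n = A.shape[0]
    G = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            G[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return G

def numeric_grad(A, h=1e-6):
    """Central-difference gradient of det(A), entry by entry."""
    G = np.empty_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            Ap, Am = A.copy(), A.copy()
            Ap[i, j] += h
            Am[i, j] -= h
            G[i, j] = (np.linalg.det(Ap) - np.linalg.det(Am)) / (2 * h)
    return G

# Maximum disagreement between the analytic and numerical gradients.
print(np.max(np.abs(cofactor_grad(A) - numeric_grad(A))))
```

On a correctly working platform the two gradients agree to roughly 1e-9, so a near-zero analytical component where the numerical one is -0.0134 points at the underlying factorization rather than the differentiation formula.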
I've tracked it down the call stack; the difference first appears at the call to the OpenBLAS function dgetrf_, which I used as the basis for the example I posted in the initial post. Some memory might be uninitialized before the call, but that didn't look like the source of the current issue to me.
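To make the dgetrf_ connection concrete, here is a plain-NumPy sketch of LU factorization with partial pivoting, which is the algorithm dgetrf implements (the loop below is my own reference version, not OpenBLAS code), applied to the first matrix from the report. The determinant falls out as the pivot sign times the product of U's diagonal, which is exactly the value the faulty path would corrupt:

```python
import numpy as np

# First batch entry of the failing input from the report.
A = np.array([
    [ 0.7286, -0.7828,  0.8605, -1.0006,  1.5612],
    [ 0.4191, -0.6455,  0.4113, -0.9425,  1.5177],
    [ 0.3450, -0.4250,  0.5090, -0.5508,  0.8464],
    [ 0.8105, -0.1038, -0.0516,  0.0171,  0.7254],
    [-0.5127, -0.0989, -0.7508, -0.4660,  0.4803],
])

def lu_det(A):
    """det(A) via Gaussian elimination with partial pivoting (dgetrf-style)."""
    U = A.astype(np.float64).copy()
    n = U.shape[0]
    sign = 1.0
    for k in range(n - 1):
        # Pick the largest-magnitude pivot in column k, as dgetrf does.
        p = k + np.argmax(np.abs(U[k:, k]))
        if p != k:
            U[[k, p]] = U[[p, k]]   # row swap flips the determinant's sign
            sign = -sign
        # Eliminate entries below the pivot.
        for i in range(k + 1, n):
            U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
    return sign * np.prod(np.diag(U))

print(lu_det(A), np.linalg.det(A))
```

Since np.linalg.det itself goes through LAPACK getrf, comparing this reference value against the library's result on s390x could help isolate whether the divergence is inside the factorization or in the code consuming its output.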
Kind regards,
Aleksei Nikiforov