Incorrect Multi-Process Results from the heev Interface


韩梅

Dec 6, 2024, 10:58:03 AM
to SLATE User
Hi Mark,

When I test the heev interface with the tester program using 4 MPI processes, --dim 1000, and --nb 384, the results are incorrect. The full output is below, followed by a rough standalone sketch of the same configuration:
[mshuangzd@b08r3n11 test]$ mpirun -n 4 ./tester --dim 1000 --nb 384 --ref y heev
% SLATE version 2023.11.05, id f1c8490
% input: ./tester --dim 1000 --nb 384 --ref y heev
% 2024-12-06 23:50:27, 4 MPI ranks, CPU-only MPI, 1 OpenMP threads, 4 GPU devices per MPI rank
type  origin  target  eig   A   jobz    uplo       n    nb  ib    p    q  la  pt  value err   back err    Z orth.   time (s)  ref time (s)  status  
   d    host    task   dc   1  novec   lower    1000   384  32    2    2   1   1   3.61e-02         NA         NA      2.913         0.326  FAILED  

% Matrix kinds:
%  1: rand, cond unknown

% 1 tests FAILED: heev
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[50583,1],0]
  Exit code:    1
--------------------------------------------------------------------------
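
In case it helps, here is a minimal standalone sketch of the same configuration outside the tester. This is only my best reading of the SLATE headers: the eigenvalues-only slate::heev overload, the HermitianMatrix constructor, and the tile-access calls (tileIsLocal, Tile::at) are assumptions that may need adjusting for your SLATE version.

    // Sketch: 1000x1000 real symmetric eigenvalue problem, nb=384,
    // 2x2 process grid over 4 MPI ranks, eigenvalues only (jobz=novec).
    #include <slate/slate.hh>
    #include <mpi.h>
    #include <vector>
    #include <random>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int64_t n = 1000, nb = 384;
        int p = 2, q = 2;  // 2x2 grid, matching the tester's choice for 4 ranks

        // Lower-triangular storage; only tiles with i >= j exist.
        slate::HermitianMatrix<double> A(
            slate::Uplo::Lower, n, nb, p, q, MPI_COMM_WORLD);
        A.insertLocalTiles();

        // Fill the local tiles of the lower triangle with random data.
        std::mt19937_64 gen(1234);
        std::uniform_real_distribution<double> dist(-1.0, 1.0);
        for (int64_t j = 0; j < A.nt(); ++j) {
            for (int64_t i = j; i < A.mt(); ++i) {
                if (A.tileIsLocal(i, j)) {
                    auto T = A(i, j);
                    for (int64_t jj = 0; jj < T.nb(); ++jj)
                        for (int64_t ii = 0; ii < T.mb(); ++ii)
                            T.at(ii, jj) = dist(gen);
                }
            }
        }

        std::vector<double> Lambda(n);
        slate::heev(A, Lambda);  // assumed eigenvalues-only overload

        MPI_Finalize();
        return 0;
    }

I would build this with mpicxx linked against libslate and run it with mpirun -n 4. (In some SLATE releases the values-only routine may be spelled eig_vals instead.)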