Upgrade to PETSc 3.19.1 causes failure in TS regression test

Hammond, Glenn E

May 22, 2023, 11:46:07 AM
to pflotran-dev (pflotran-dev@googlegroups.com)
Heeho and Satish,

PETSc added a bug fix to v3.19.1 that alters the convergence of line search:

https://gitlab.com/petsc/petsc/-/commit/51dbb4743480cffa8ff1aea6c2d5dbc193819d06#58ab43bc63db1d6d2d6f8c8881d95cf981ffd3d2_1098_1097

This fix causes th_ts_1d to fail when CI is upgraded from v3.19.0 to v3.19.1:

https://gitlab.com/pflotran/pflotran/-/jobs/4318504022/artifacts/file/test-pflotran/regression_tests/pflotran-tests-2023-05-20_00-07-39.testlog

Run...
cd /scratch/regression_tests/default/anisothermal
/scratch/src/pflotran/pflotran -malloc_debug no -successful_exit_code 86 -input_prefix th_ts_1d
# th_ts_1d : run time : 0.41 seconds

ERROR : th_ts_1d : pflotran returned an error code (15) indicating that the
simulation crashed. Please check 'th_ts_1d.out' and 'th_ts_1d.stdout' for
error messages (included below).
...
== TH_TS FLOW ==================================================================
0 2r: 5.67E-05 2x: 0.00E+00 2u: 0.00E+00 ir: 5.02E-05 iu: 0.00E+00 rsn: 0
1 2r: 5.62E-05 2x: 1.06E+06 2u: 5.67E+03 ir: 4.98E-05 iu: 4.92E+03 rsn: 0
2 2r: 5.57E-05 2x: 1.06E+06 2u: 5.62E+03 ir: 4.93E-05 iu: 4.88E+03 rsn: 0
3 2r: 5.53E-05 2x: 1.06E+06 2u: 5.57E+03 ir: 4.89E-05 iu: 4.83E+03 rsn: 0
4 2r: 5.48E-05 2x: 1.07E+06 2u: 5.53E+03 ir: 4.85E-05 iu: 4.79E+03 rsn: 0
WARNING: Potential oscillatory convergence
5 2r: 5.43E-05 2x: 1.07E+06 2u: 5.48E+03 ir: 4.81E-05 iu: 4.75E+03 rsn: 0
WARNING: Potential oscillatory convergence
6 2r: 5.38E-05 2x: 1.07E+06 2u: 5.43E+03 ir: 4.76E-05 iu: 4.71E+03 rsn: 0
WARNING: Potential oscillatory convergence
...
85 2r: 2.70E-05 2x: 1.19E+06 2u: 2.72E+03 ir: 2.39E-05 iu: 2.36E+03 rsn: 0
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery
[0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc!
[0]PETSC ERROR: Option left: name:-input_prefix value: th_ts_1d source: command line
[0]PETSC ERROR: Option left: name:-successful_exit_code value: 86 source: command line
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.19.1, unknown
[0]PETSC ERROR: /scratch/src/pflotran/pflotran on a petsc-arch named buildkitsandbox by Unknown Sat May 20 00:08:41 2023
[0]PETSC ERROR: Configure options PETSC_ARCH=petsc-arch --with-cc=/scratch/mpich-4.1/install/bin/mpicc --with-cxx=/scratch/mpich-4.1/install/bin/mpicxx --with-fc=/scratch/mpich-4.1/install/bin/mpif90 --COPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0 -Wno-unused-function" --with-clanguage=c --with-debugging=1 --with-shared-libraries=0 --download-hdf5 --download-metis --download-parmetis --download-fblaslapack --download-hypre --with-hdf5-fortran-bindings=yes
[0]PETSC ERROR: #1 TSStep() at /scratch/petsc/src/ts/interface/ts.c:3470
[0]PETSC ERROR: #2 TSSolve() at /scratch/petsc/src/ts/interface/ts.c:3845
[0]PETSC ERROR: #3 /scratch/src/pflotran/timestepper_TS.F90:229
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: No error traceback is available, the problem could be in the main program.
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
[unset]: PMIU_write error; fd=-1 buf=:cmd=abort exitcode=59 message=application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
:
system msg for write_line failure : Bad file descriptor
----------------------------------------------------------------------------------------------------

When I run the test problem through the debugger, the maximum step with the updated scaling by linesearch->fnorm is ~5000, whereas before maxstep was on the order of 100K to 1M.
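
The iteration log above appears consistent with a capped step: in the lines shown, each update norm (2u) equals the previous residual norm (2r) times 1e8, and the residual drops by less than 1% per iteration. Here is a minimal Python sketch of that check, assuming the column layout printed above. The regex and variable names are illustrative only and are not part of PFLOTRAN or PETSc.

```python
import re

# Iteration lines copied from the test log above (warnings and elided lines omitted).
LOG = """\
0 2r: 5.67E-05 2x: 0.00E+00 2u: 0.00E+00 ir: 5.02E-05 iu: 0.00E+00 rsn: 0
1 2r: 5.62E-05 2x: 1.06E+06 2u: 5.67E+03 ir: 4.98E-05 iu: 4.92E+03 rsn: 0
2 2r: 5.57E-05 2x: 1.06E+06 2u: 5.62E+03 ir: 4.93E-05 iu: 4.88E+03 rsn: 0
3 2r: 5.53E-05 2x: 1.06E+06 2u: 5.57E+03 ir: 4.89E-05 iu: 4.83E+03 rsn: 0
4 2r: 5.48E-05 2x: 1.07E+06 2u: 5.53E+03 ir: 4.85E-05 iu: 4.79E+03 rsn: 0
5 2r: 5.43E-05 2x: 1.07E+06 2u: 5.48E+03 ir: 4.81E-05 iu: 4.75E+03 rsn: 0
6 2r: 5.38E-05 2x: 1.07E+06 2u: 5.43E+03 ir: 4.76E-05 iu: 4.71E+03 rsn: 0
85 2r: 2.70E-05 2x: 1.19E+06 2u: 2.72E+03 ir: 2.39E-05 iu: 2.36E+03 rsn: 0
"""

# Assumed layout: "<iter> 2r: <residual 2-norm> 2x: <...> 2u: <update 2-norm> ..."
PATTERN = re.compile(r"^\s*(\d+)\s+2r:\s*(\S+)\s+2x:\s*(\S+)\s+2u:\s*(\S+)")


def parse(log):
    """Return a list of (iteration, residual_norm, update_norm) tuples."""
    rows = []
    for line in log.splitlines():
        m = PATTERN.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(4))))
    return rows


rows = parse(LOG)

# Only compare iterations that are adjacent in the log (skips the 6 -> 85 gap).
pairs = [(a, b) for a, b in zip(rows, rows[1:]) if b[0] == a[0] + 1]

# Residual reduction factor per iteration.
residual_ratios = [b[1] / a[1] for a, b in pairs]

# Update norm at iteration k divided by residual norm at iteration k-1.
step_over_residual = [b[2] / a[1] for a, b in pairs]

# Average reduction factor per iteration between the first and last lines shown.
first, last = rows[0], rows[-1]
mean_ratio = (last[1] / first[1]) ** (1.0 / (last[0] - first[0]))

print("residual ratios:", [f"{r:.4f}" for r in residual_ratios])
print("2u[k] / 2r[k-1]:", [f"{s:.3e}" for s in step_over_residual])
print(f"mean ratio over {last[0]} iterations: {mean_ratio:.4f}")
```

If the step length is limited to a multiple of the residual norm, a slow, nearly constant reduction like this is what one would expect, though the log alone does not prove that this is the cause.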

Do either of you have thoughts regarding a solution? Perhaps begin the simulation with a smaller time step?

Glenn

heeho...@gmail.com

May 24, 2023, 4:11:04 PM
to pflotran-dev
Yes, I was seeing this error too. That's why I pushed my commit with 3.19.0. I will have to use a debugger to look into this further.