trouble with floating point divide by zero in libmesh, petsc, lapack, gesvd (ieeeck?)

197 views
Skip to first unread message

Andrew....@csiro.au

unread,
Sep 8, 2014, 7:00:13 AM9/8/14
to moose...@googlegroups.com
This is rather a long shot that anyone will be able to help me.  I’m using petsc’s lapack gesvd to calculate the singular values of a matrix.  Libmesh bombs out with a floating point divide by zero.  I set a breakpoint at libmesh_handleFPE, and lldb gives a backtrace starting with:

* thread #1: tid = 0x481c1, 0x00000001051d7446 libpetsc.dylib`ieeeck_ + 166, queue = 'com.apple.main-thread', stop reason = EXC_ARITHMETIC (code=EXC_I386_SSEEXTERR, subcode=0x1f37)

  * frame #0: 0x00000001051d7446 libpetsc.dylib`ieeeck_ + 166

    frame #1: 0x00000001051d85f7 libpetsc.dylib`ilaenv_ + 4419

    frame #2: 0x00000001051af80f libpetsc.dylib`dlasq2_ + 1535

    frame #3: 0x00000001051af161 libpetsc.dylib`dlasq1_ + 775

    frame #4: 0x00000001051729c4 libpetsc.dylib`dbdsqr_ + 642

    frame #5: 0x0000000105186821 libpetsc.dylib`dgesvd_ + 31985

    frame #6: 0x0000000100166b0e libtensor_mechanics-dbg.0.dylib`FiniteStrainMultiPlasticity::singularValuesOfR(std::vector<RankTwoTensor, std::allocator<RankTwoTensor> > const&, std::vector<double, std::allocator<double> >&) + 1646


It’s rather strange, because I’m pretty confident that calling gesvd with identical arguments has already succeeded in the same simulation.  If you look at the “ieeeck” routine, it’s all about checking that 1/0 is “correctly” calculated, and similar things to that.  Anyway, as I said, this is a long shot, but perhaps someone can help….

a

Jed Brown

unread,
Sep 8, 2014, 10:05:42 AM9/8/14
to Andrew....@csiro.au, moose...@googlegroups.com
Andrew....@csiro.au writes:
> It's rather strange, because I'm pretty confident that calling gesvd
> with identical arguments has already succeeded in the same simulation.
> If you look at the "ieeeck" routine, it's all about checking that 1/0
> is "correctly" calculated, and similar things to that. Anyway, as I
> said, this is a long shot, but perhaps someone can help....

When PETSc calls functions that do this, we disable the FP trapping to
work around this (presumably unintended) side-effect of LAPACK.

Cody Permann

unread,
Sep 8, 2014, 10:08:14 AM9/8/14
to <moose-users@googlegroups.com>, Andy Wilkins
So we turn on all floating point trapping in debug mode which catches even normal floating point operations (things like infinity) or intentional divisions by zero.  In MOOSE if you don't want to trap these operations, simply use "--no-trap-fpe" on the command line.

Cody

Jed Brown

unread,
Sep 8, 2014, 10:27:45 AM9/8/14
to Cody Permann, <moose-users@googlegroups.com>, Andy Wilkins
Cody Permann <codyp...@gmail.com> writes:

> So we turn on all floating point trapping in debug mode which catches even
> normal floating point operations (things like infinity) or intentional
> divisions by zero. In MOOSE if you don't want to trap these operations,
> simply use "--no-trap-fpe" on the command line.

That prevents being able to trap the interesting ones. This is why
PETSc disables trapping only for the troublesome operation, restoring it
afterward.

Andrew....@csiro.au

unread,
Sep 8, 2014, 4:33:11 PM9/8/14
to j...@jedbrown.org, codyp...@gmail.com, moose...@googlegroups.com
Aha, thank you Cody and Jed. Now my code runs fine in opt and debug. I
do think what Jed says below makes sense, and I¹ll leave it to you devels
to ponder over this issue and address it or not. (The particular function
call is LAPACKgesvd_("N", "N", &bm, &bn , &a[0], &bm, &s[0], &u[0],
&sizeu, &vt[0], &sizevt, &work[0], &sizework, &info) , provided by
petscblaslapack.h)

a
Reply all
Reply to author
Forward
0 new messages