super lu


Alexander Lindsay

Sep 15, 2015, 11:53:13 AM
to moose-users
Hi all,

Does anyone on the list have much experience with SuperLU? (I suppose I could also send this to the PETSc list.) I'm repeatedly encountering the following behavior:

Time Step 121, time = 2.40013e-07
                dt = 9e-10
    |residual|_2 of individual variables:
                  potential: 43259.3
                  em:        0.0322282
                  Arp:       0.00186253

 0 Nonlinear |R| = 4.325927e+04

 ****** Warning from MC64A/AD. INFO(1) =  2
        Some scaling factors may be too large.
    |residual|_2 of individual variables:
                  potential: 0.00626003
                  em:        0.00197176
                  Arp:       3.50658e-05

 1 Nonlinear |R| = 6.563314e-03
    |residual|_2 of individual variables:
                  potential: 1.87338e-09
                  em:        2.12766e-05
                  Arp:       2.92495e-08

 2 Nonlinear |R| = 2.127659e-05
Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
 Solve Converged!

Outlier Variable Residual Norms:
  em: 2.127657e-05

Time Step 122, time = 2.41093e-07
                dt = 1.08e-09
    |residual|_2 of individual variables:
                  potential: 51886.9
                  em:        0.0323299
                  Arp:       0.00186719

 0 Nonlinear |R| = 5.188692e+04

 ****** Warning from MC64A/AD. INFO(1) =  2
        Some scaling factors may be too large.
Nonlinear solve did not converge due to DIVERGED_LINE_SEARCH iterations 0
 Solve Did NOT Converge!

Solve failed, cutting timestep.


Two questions:

1) I glanced at some SuperLU source code, trying to figure out what causes the "Some scaling factors may be too large" warning, but it wasn't immediately clear to me why that warning would be issued.

2) What are good strategies for combating DIVERGED_LINE_SEARCH?




Cody Permann

Sep 15, 2015, 12:14:54 PM
to moose-users
Which version of PETSc are you using? Also have you verified that your simulation works up to that point just using serial lu? The reason I ask is that we are currently investigating a separate issue with the latest version of SuperLU that was bundled with PETSc in version 3.6.x.

Cody



--
You received this message because you are subscribed to the Google Groups "moose-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to moose-users...@googlegroups.com.
Visit this group at http://groups.google.com/group/moose-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/moose-users/2a3e8b97-1232-4d00-adca-6408de1b7020%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexander Lindsay

Sep 15, 2015, 12:15:36 PM
to moose-users
I can probably answer my first question... the warning seems to appear before the first nonlinear iteration, when the residual of the potential is quite large. However, if there aren't any problems with the line search, the residual of the potential is reduced by orders of magnitude after the first nonlinear iteration, and the warning no longer appears for subsequent nonlinear iterations.

I'm torn about my scaling of the potential: if the problem converges for a given time step, then the final residual of the potential is quite small, smaller than the residuals of my other variables. However, I'm wondering whether the initially large residual of the potential is causing the problems for time steps where convergence fails with DIVERGED_LINE_SEARCH.
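
As a rough check on how strongly the potential dominates the combined norm, the sketch below recombines the per-variable residual norms printed for Time Step 121 above. It assumes that the reported `Nonlinear |R|` is the 2-norm over all variables, i.e. the square root of the sum of the squared per-variable norms; the helper name `combined_norm` is made up for illustration and is not part of MOOSE or PETSc.

```python
import math


def combined_norm(per_variable_norms):
    """2-norm of the stacked residual, given each variable's own 2-norm."""
    return math.sqrt(sum(r * r for r in per_variable_norms.values()))


# Per-variable norms copied from the Time Step 121 output in this thread.
initial = {"potential": 43259.3, "em": 0.0322282, "Arp": 0.00186253}
after_one = {"potential": 0.00626003, "em": 0.00197176, "Arp": 3.50658e-05}

r0 = combined_norm(initial)
r1 = combined_norm(after_one)

# Fraction of the squared norm contributed by the potential at iteration 0.
potential_share = initial["potential"] ** 2 / r0 ** 2

print(f"0 Nonlinear |R| ~ {r0:.6e}")
print(f"1 Nonlinear |R| ~ {r1:.6e}")
print(f"potential share of |R|^2 at iteration 0: {potential_share:.12f}")
```

Under that assumption, the potential contributes essentially all of the initial norm, so any relative tolerance is effectively measured against the potential residual alone.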

Alexander Lindsay

Sep 15, 2015, 12:18:54 PM
to moose-users
The problem solves fantastically using serial lu.

PETSc version is 3.6.0.

Cody Permann

Sep 15, 2015, 2:14:21 PM
to moose-users
This is a shot in the dark but you might try the previous version of PETSc/SuperLU to see if you get the same behavior. If you have the MOOSE package you should be able to swap out the module and rebuild everything from libMesh up after running this command:

module swap moose/.mpich_petsc-3.6.0-clang moose/.mpich_petsc-3.5.4-clang

Cody

Alexander Lindsay

Sep 16, 2015, 11:58:34 AM
to moose...@googlegroups.com
Cody, I will look into trying that, although when I run my case on my laptop, which uses petsc-3.5.2, I encounter the exact same problems.

I posted my issues on the PETSc list, and one recommendation was to try MUMPS, another direct matrix solver capable of running in parallel, instead of SuperLU. From some testing, it doesn't look like MUMPS comes with the PETSc that is bundled in the MOOSE deb package. I'm going to look into the possibility of configuring/installing MUMPS. If anyone has any helpful suggestions, they are welcome.

Alex

Cody Permann

Sep 16, 2015, 5:31:02 PM
to moose...@googlegroups.com
We are looking into adding MUMPS into our PETSc build. Stay tuned.

Alexander Lindsay

Sep 17, 2015, 10:27:00 AM
to moose...@googlegroups.com
After building petsc with mumps and rebuilding moose, I was able to solve my problem beautifully with mumps without changing any other input from the super-lu case. Go mumps!

Alex

Cody Permann

Sep 17, 2015, 11:34:40 AM
to moose...@googlegroups.com
Great news! We plan to add Mumps to PETSc libraries in our next MOOSE package so you'll have more options in the future.

John

Jan 3, 2018, 6:31:10 PM
to moose-users
Alex,

What were all the executioner options you used to get this to converge? I am having the same issue:

Time Step 201, time = 517.675
                dt = 0.0125829
 0 Nonlinear |R| = 7.227210e-10

 ****** Warning from MC64A/AD. INFO(1) =        2
        Some scaling factors may be too large.
Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
 Solve Did NOT Converge!

John

Alexander Lindsay

Jan 4, 2018, 10:42:08 AM
to moose...@googlegroups.com
Wow, this was a long time ago...So you saw that error with super-lu? Have you tried with MUMPS?


John Haase

Jan 4, 2018, 10:49:46 AM
to moose...@googlegroups.com
Alex,

Here is my executioner block

[Executioner]
  type = Transient
  end_time = ${EndTime} # ${/ 1e-3 ${time_units}}

  petsc_options = '-snes_ksp_ew -superlu_dist -snes_converged_reason -snes_linesearch_monitor'
  solve_type = NEWTON
  petsc_options_iname = '-pc_type -pc_factor_mat_solver_package -snes_linesearch_minlambda -ksp_gmres_restart'
  petsc_options_value = 'lu superlu_dist 1e-3 100'

  nl_rel_tol = 5E-3
  nl_abs_tol = 1E-14

  dtmin = ${/ 1E-19 ${time_units}}
  dtmax = ${/ ${onTime} 50 }
  nl_max_its = 40
  [./TimeStepper]
    type = IterationAdaptiveDT
    dt = 0.004096
    cutback_factor = 0.8
    growth_factor = 1.2
    optimal_iterations = 25
  [../]
[]

Regards,

John R. Haase


John Haase

Jan 4, 2018, 11:00:47 AM
to moose...@googlegroups.com
I'm not sure what you mean by am I using MUMPS...

I have "download-mumps=1" in my petsc install command.

Regards,

John R. Haase

Alexander Lindsay

Jan 4, 2018, 11:09:55 AM
to moose...@googlegroups.com
Ok good, that means you should have it available. MUMPS is just another parallel direct solver package.

So try changing your petsc options to:

petsc_options_iname = '-pc_type -pc_factor_mat_solver_package -snes_linesearch_minlambda -ksp_gmres_restart'
petsc_options_value = 'lu mumps 1e-3 100'

John Haase

Jan 4, 2018, 11:26:38 AM
to moose...@googlegroups.com
Alex,

I get a different error message

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Error in external library
[0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-1, INFO(2)=7

[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.6.4, Apr, 12, 2016
[0]PETSC ERROR: /afs/crc.nd.edu/user/j/jhaase1/zapdos/zapdos-opt on a arch-linux2-c-opt named crcfe02.crc.nd.edu by jhaase1 Thu Jan  4 11:20:19 2018
[0]PETSC ERROR: Configure options --prefix=/afs/crc.nd.edu/user/j/jhaase1/ZapdosDir/petsc/petsc-3.6.4/gcc-opt --download-hypre=1 --with-ssl=0 --with-debugging=no --with-pic=1 --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 --download-scalapack=1 --download-mumps=1 CC=mpicc CXX=mpicxx FC=mpif90 F77=mpif77 F90=mpif90 CFLAGS="-fPIC -fopenmp" CXXFLAGS="-fPIC -fopenmp" FFLAGS="-fPIC -fopenmp" FCFLAGS="-fPIC -fopenmp" F90FLAGS="-fPIC -fopenmp" F77FLAGS="-fPIC -fopenmp" PETSC_DIR=/afs/crc.nd.edu/user/j/jhaase1/ZapdosDir/petsc/temp/petsc-3.6.4
[0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() line 1172 in /afs/crc.nd.edu/user/j/jhaase1/ZapdosDir/petsc/temp/petsc-3.6.4/src/mat/impls/aij/mpi/mumps/mumps.c
[0]PETSC ERROR: #2 MatLUFactorNumeric() line 2958 in /afs/crc.nd.edu/user/j/jhaase1/ZapdosDir/petsc/temp/petsc-3.6.4/src/mat/interface/matrix.c
[0]PETSC ERROR: #3 PCSetUp_LU() line 152 in /afs/crc.nd.edu/user/j/jhaase1/ZapdosDir/petsc/temp/petsc-3.6.4/src/ksp/pc/impls/factor/lu/lu.c
[0]PETSC ERROR: #4 PCSetUp() line 983 in /afs/crc.nd.edu/user/j/jhaase1/ZapdosDir/petsc/temp/petsc-3.6.4/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: #5 KSPSetUp() line 332 in /afs/crc.nd.edu/user/j/jhaase1/ZapdosDir/petsc/temp/petsc-3.6.4/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: #6 KSPSolve() line 547 in /afs/crc.nd.edu/user/j/jhaase1/ZapdosDir/petsc/temp/petsc-3.6.4/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 233 in /afs/crc.nd.edu/user/j/jhaase1/ZapdosDir/petsc/temp/petsc-3.6.4/src/snes/impls/ls/ls.c
[0]PETSC ERROR: #8 SNESSolve() line 3906 in /afs/crc.nd.edu/user/j/jhaase1/ZapdosDir/petsc/temp/petsc-3.6.4/src/snes/interface/snes.c

 

Regards,

John R. Haase

Alexander Lindsay

Jan 5, 2018, 1:00:46 PM
to moose...@googlegroups.com
Did the error occur on the same time step?

John Haase

Jan 5, 2018, 1:04:37 PM
to moose...@googlegroups.com
Approximately. I'm trying the SMP preconditioner with a FD solve type. But it's not a viable option because I've been on the first time step for 30 minutes

Regards,

John R. Haase

Alexander Lindsay

Jan 5, 2018, 1:13:05 PM
to moose...@googlegroups.com
Is it possible for you to use a newer PETSc version? I'm not sure which version of SuperLU is built by 3.6.4, but I know there were some fairly recent bug fixes in SuperLU that may have resolved the issue you're seeing. Have you tried an iterative solver as opposed to a direct one? I know our plasma systems often require a very strong preconditioner.

I believe an FD solve type will form both your Jacobian and preconditioning matrices with finite differencing and it will not use coloring, so it does not surprise me that it is extremely slow (your own Jacobian statements will never get called with an FD solve type).
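
To illustrate why a finite-difference Jacobian without coloring is slow, here is a minimal sketch in plain Python (not MOOSE or PETSc code) of a dense forward-difference Jacobian. The toy `residual` function, the step size `eps`, and the call counter are all assumptions made for illustration; the point is only that one residual evaluation is needed per unknown, plus one for the base state.

```python
def fd_jacobian(residual, u, eps=1e-7):
    """Dense forward-difference Jacobian: one extra residual call per unknown."""
    n = len(u)
    r0 = residual(u)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        perturbed = list(u)
        perturbed[j] += eps
        rj = residual(perturbed)
        for i in range(n):
            jac[i][j] = (rj[i] - r0[i]) / eps
    return jac


calls = 0


def residual(u):
    """Toy residual with tridiagonal coupling, standing in for a real PDE."""
    global calls
    calls += 1
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * u[i] - left - right + u[i] ** 2)
    return out


u = [0.1 * i for i in range(50)]
jac = fd_jacobian(residual, u)
print("unknowns:", len(u), "residual evaluations:", calls)
```

The evaluation count grows linearly with the number of unknowns, and in a real application each evaluation is a full residual assembly.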

Are you worried that the Jacobian is wrong?

Alex



John Haase

Jan 5, 2018, 2:47:29 PM
to moose...@googlegroups.com
I'll try updating to v3.7.7 to see if that fixes it.

PJFNK fails the same time as Newton. The problem is a DIVERGED_LINEAR_SOLVE that occurs when there is an electric field of 0 at the cathode.

I don't believe my Jacobians are wrong; I was just trying all the levers I can think of. And I know that going to FD is an "in case of emergency break glass" option.

Regards,

John R. Haase

Alexander Lindsay

Jan 5, 2018, 3:16:11 PM
to moose...@googlegroups.com
Is the electric field non-zero before that failing time step?

Can you run with `-ksp_monitor_singular_value -ksp_monitor_true_residual` and paste in the solve (non-linear and linear solve output) for the failing time step when running with PJFNK?

Alexander Lindsay

Jan 5, 2018, 3:16:44 PM
to moose...@googlegroups.com
And also maybe solve information for a time step that succeeds

John Haase

Jan 5, 2018, 4:00:19 PM
to moose...@googlegroups.com
Alex,

The electric field is zero before that, just not at the cathode. I've attached the output for the last successful time step and the final one. Also, here are my PETSc options:


petsc_options = '-snes_ksp_ew -superlu_dist -snes_converged_reason -snes_linesearch_monitor -ksp_monitor_singular_value -ksp_monitor_true_residual'
solve_type = PJFNK # NEWTON

petsc_options_iname = '-pc_type -pc_factor_mat_solver_package -pc_factor_shift_type -pc_factor_shift_amount -snes_linesearch_minlambda -ksp_gmres_restart'
petsc_options_value = 'lu mumps NONZERO 1.e-10 1e-3 100'




Regards,

John R. Haase

Output.txt

John Haase

Jan 5, 2018, 6:19:51 PM
to moose...@googlegroups.com
Alex,

I crept up on the failure point and then ran forward in gdb. Running in gdb, I just get "Solve did not converge!"

Regards,

John R. Haase

Alexander Lindsay

Jan 8, 2018, 9:34:13 AM
to moose...@googlegroups.com
This looks to me to just be a tough solve. All your variable residuals, with the exception of your ions, are around your solve's absolute tolerance. Have you tried without a line search?

How many DOFs do you have in your problem? If it's not a lot, I'd be curious if you could try running with NEWTON and with '-pc_type svd -pc_svd_monitor' and see whether you can get to your failure point. If you can, I'm wondering what your condition number looks like with such small residuals. (Don't run this if you have a large problem...you'll run out of memory)

'-ksp_monitor_singular_value' is useful for estimating condition numbers of large systems but it's only useful when running with `-pc_type none`, and if we ran with that option, I suspect you wouldn't get out to 507 seconds.

John Haase

Jan 8, 2018, 10:25:03 AM
to moose...@googlegroups.com
Alex,

I haven't tried without a line search. Just to make sure I'm looking at the right thing, is that "Executioner/line_search = none"?

Regards,

John R. Haase

Alexander Lindsay

Jan 8, 2018, 12:04:29 PM
to moose...@googlegroups.com

John Haase

Jan 8, 2018, 5:43:06 PM
to moose...@googlegroups.com
Alex,

Here are the results of all these fun experiments
  • Without a line search the simulation fails at 517.164 ns.
  • Doing SVD it's been sitting at time step 1, 0th nonlinear iteration for 2+ hours (starting at 507.45 ns). The number of DOFs in the non-linear system is 15005 and the number of DOFs for the auxiliary system is 69000, I'm running on 8 cores.
  • Without a preconditioner it can advance from 0 ns to 3E-6 ns.

Regards,

John R. Haase

Alexander Lindsay

Jan 8, 2018, 8:16:29 PM
to moose...@googlegroups.com
On Mon, Jan 8, 2018 at 3:42 PM, John Haase <jha...@nd.edu> wrote:
> Alex,
>
> Here are the results of all these fun experiments
>   • Without a line search the simulation fails at 517.164 ns.

So are your base units in nanoseconds? If so, that means you made it a little further without a line search, correct?

>   • Doing SVD it's been sitting at time step 1, 0th nonlinear iteration for 2+ hours (starting at 507.45 ns). The number of DOFs in the non-linear system is 15005 and the number of DOFs for the auxiliary system is 69000, I'm running on 8 cores.

Does that mean you did a restart? (And presumably used a different preconditioner up to that time stamp?) Yeah, that may be a little large for it. However, if you're going to do a restart, then you could instead use `-pc_type none -ksp_monitor_singular_value -ksp_gmres_restart 1000` to estimate the singular values and condition number within the GMRES cycle. This solve will definitely fail without the preconditioner, but you could then paste in the condition number estimate at the end of the linear solve. You can run this either with NEWTON or JFNK.

>   • Without a preconditioner it can advance from 0 ns to 3E-6 ns.

Not too surprising if the condition number is not close to unity.

Alex 

John Haase

Jan 23, 2018, 2:58:36 PM
to moose...@googlegroups.com
Alex,

Going back to my old idea, is there a place where I could put in the line

if ( (number of non-linear iterations) < 1 ) { run_non_linear_iteration; }

? Because then I don't need to have the high absolute tolerance.

I've been digging down into FEProblemBase.C and I can't make heads or tails of it.

Regards,

John R. Haase

Kong, Fande

Jan 23, 2018, 3:16:41 PM
to moose...@googlegroups.com
Does "-snes_force_iteration" work for you?

It will force PETSc take at least one nonlinear iteration.

Fande,

John Haase

Jan 23, 2018, 3:49:55 PM
to moose...@googlegroups.com
No, it does iteration 0 and then says

Time Step 362, time = 14.1586
                dt = 0.04
 0 Nonlinear |R| = 2.837148e-10
 Solve Converged!

Outlier Variable Residual Norms:
  potential: 2.006165e-10
  native_potential: 2.006165e-10

Postprocessor Values:
+----------------+----------------------+------------------------+
| time           | Full_EmissionCurrent | Native_EmissionCurrent |
+----------------+----------------------+------------------------+
:                :                      :                        :
|   1.359864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.363864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.367864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.371864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.375864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.379864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.383864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.387864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.391864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.395864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.399864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.403864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.407864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.411864e+01 |         1.625915e-02 |           1.625269e-02 |
|   1.415864e+01 |         1.625915e-02 |           1.625269e-02 |
+----------------+----------------------+------------------------+



Regards,

John R. Haase

Kong, Fande

Jan 23, 2018, 4:31:01 PM
to moose...@googlegroups.com
How about this?

  petsc_options = "-log_view"

  petsc_options_iname = '-snes_force_iteration'
  petsc_options_value = '1'

You may want to put these into the executioner block.

Fande,

John Haase

Jan 23, 2018, 5:54:41 PM
to moose...@googlegroups.com
Same thing

Time Step 746, time = 58.7539
                dt = 0.08
 0 Nonlinear |R| = 9.991944e-11

 Solve Converged!
Outlier Variable Residual Norms:
  Arp: 9.991866e-11

Postprocessor Values:
+----------------+----------------------+------------------------+
| time           | Full_EmissionCurrent | Native_EmissionCurrent |
+----------------+----------------------+------------------------+
:                :                      :                        :
|   5.763389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.771389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.779389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.787389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.795389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.803389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.811389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.819389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.827389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.835389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.843389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.851389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.859389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.867389e+01 |         2.499758e-02 |           1.556269e-02 |
|   5.875389e+01 |         2.499758e-02 |           1.556269e-02 |
+----------------+----------------------+------------------------+


Regards,

John R. Haase

Kong, Fande

Jan 23, 2018, 6:06:53 PM
to moose...@googlegroups.com
John,

Could you send us the entire output? Not sure if PETSc takes your options or not.

Thanks,

Fande,

John

Jan 23, 2018, 6:22:44 PM
to moose-users
See attached input file and output
Initial.i
output.txt

Kong, Fande

Jan 23, 2018, 6:47:06 PM
to moose...@googlegroups.com
You do not have any options specified in the Executioner block. Do you have multiple input files, or did you send us the wrong one? Also, there is no PETSc output in "output.txt", which indicates that the PETSc options were not set properly.

I would like to try something as follows:

[Executioner]
   type = Transient
   petsc_options = '-log_view -log_summary'

   petsc_options_iname = '-snes_force_iteration'
   petsc_options_value = '1'
   num_steps = 3
[]


Do you have a MOOSE-based input file that we could try?

Fande,


John

Jan 23, 2018, 7:12:07 PM
to moose-users
Fande,

Initial.i is an input file. The 'petsc_options' is on line 76, 'petsc_options_iname' is on line 82, and 'petsc_options_value' is on 83.

However, I cleaned up the Executioner block in Input.i. So 'petsc_options' is on 68, 'petsc_options_iname' is on 71, and 'petsc_options_value' is on 72.

John
Input.i
output.txt

Alexander Lindsay

Jan 24, 2018, 12:50:01 PM
to moose...@googlegroups.com
John, this is an issue on our end. We use our own convergence testing instead of using Petsc's default, and currently we don't honor the `-snes_force_iteration` option. However, Fande is submitting a PR that'll fix this.

Alex
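
The behaviour discussed above can be sketched as a small convergence check. This is plain Python written for illustration, not the actual MOOSE or PETSc code: the function `converged`, its arguments, and the tolerance values are assumptions. The idea is that a custom convergence test which never looks at a force-iteration flag will accept an already-small initial residual at iteration 0, whereas a test that honors the flag will refuse to stop before one nonlinear iteration has been taken.

```python
def converged(fnorm, fnorm0, iteration, abs_tol, rel_tol, force_iteration=False):
    """Return a reason string if the nonlinear solve may stop, else None."""
    # With force_iteration set, never accept the initial residual (iteration 0).
    if force_iteration and iteration == 0:
        return None
    if fnorm < abs_tol:
        return "CONVERGED_FNORM_ABS"
    if iteration > 0 and fnorm <= rel_tol * fnorm0:
        return "CONVERGED_FNORM_RELATIVE"
    return None


# Initial residual taken from Time Step 362 in this thread; the tolerances are
# hypothetical values chosen only so that iteration 0 passes the absolute test.
fnorm0 = 2.837148e-10
abs_tol = 1e-9
rel_tol = 5e-3

print(converged(fnorm0, fnorm0, 0, abs_tol, rel_tol))
print(converged(fnorm0, fnorm0, 0, abs_tol, rel_tol, force_iteration=True))
print(converged(1e-12, fnorm0, 1, abs_tol, rel_tol, force_iteration=True))
```

Without the flag, the first call reports convergence at iteration 0, which matches the "0 Nonlinear |R|" followed directly by "Solve Converged!" seen in the output earlier in the thread.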


Kong, Fande

Jan 24, 2018, 2:29:44 PM
to moose...@googlegroups.com
Hi John,

This PR should fix this issue. https://github.com/idaholab/moose/pull/10599

BTW, it requires an API update on the PETSc side. If you want to try it right now, you could just check out my PETSc branch at https://bitbucket.org/petsc/petsc/pull-requests/843/added-api-snesgetforceiteration/diff

Thanks!

Fande,

John Haase

Jan 25, 2018, 1:07:32 PM
to moose...@googlegroups.com
Fande,

This is going to sound like a stupid question... How do I install the new version of Petsc? In my projects directory, I went to petsc then 

cd ~/projects/petsc
git clone -b maint https://bitbucket.org/petsc/petsc petsc
cd petsc
git checkout Fande-Kong/get_force_iteration
PETSC_DIR=`pwd -P`
 
./configure \
--prefix=$PETSC_DIR \
--download-hypre=1 \
--with-ssl=0 \
--with-debugging=no \
--with-pic=1 \
--with-shared-libraries=1 \
--with-cc=mpicc \
--with-cxx=mpicxx \
--with-fc=mpif90 \
--download-fblaslapack=1 \
--download-metis=1 \
--download-parmetis=1 \
--download-superlu_dist=1 \
--download-scalapack=1 \
--download-mumps=1 \
CC=mpicc CXX=mpicxx FC=mpif90 F77=mpif77 F90=mpif90 \
CFLAGS='-fPIC -fopenmp' \
CXXFLAGS='-fPIC -fopenmp' \
FFLAGS='-fPIC -fopenmp' \
FCFLAGS='-fPIC -fopenmp' \
F90FLAGS='-fPIC -fopenmp' \
F77FLAGS='-fPIC -fopenmp' \
PETSC_DIR=`pwd`

And that doesn't work because prefix and PETSC_DIR are the same. I also get an error if I take out the prefix directory.

What am I doing wrong?

Regards,

John R. Haase


Kong, Fande

Jan 25, 2018, 2:12:00 PM
to moose...@googlegroups.com
Hi John,

You may not need to use "prefix"; you can just install in the current PETSc directory, because this is a temporary installation for you. For this temporary installation, you need to specify the environment variables PETSC_DIR and PETSC_ARCH.


After installing PETSc, you may want to rebuild MOOSE and libMesh.

Fande,

John Haase

Jan 25, 2018, 2:36:28 PM
to moose...@googlegroups.com
Fande,

Taking out prefix does not work. I attached the entire configure.log

petsc $ echo $PETSC_DIR
~/project/petsc/petsc
petsc $ echo $PETSC_ARCH
gcc
./configure \
--download-hypre=1 \
--with-ssl=0 \
--with-debugging=no \
--with-pic=1 \
--with-shared-libraries=1 \
--with-cc=mpicc \
--with-cxx=mpicxx \
--with-fc=mpif90 \
--download-fblaslapack=1 \
--download-metis=1 \
--download-parmetis=1 \
--download-superlu_dist=1 \
--download-scalapack=1 \
--download-mumps=1 \
CC=mpicc CXX=mpicxx FC=mpif90 F77=mpif77 F90=mpif90 \
CFLAGS='-fPIC -fopenmp' \
CXXFLAGS='-fPIC -fopenmp' \
FFLAGS='-fPIC -fopenmp' \
FCFLAGS='-fPIC -fopenmp' \
F90FLAGS='-fPIC -fopenmp' \
F77FLAGS='-fPIC -fopenmp' \
PETSC_DIR=`pwd`

         UNABLE to CONFIGURE with GIVEN OPTIONS    (see configure.log for details):
-------------------------------------------------------------------------------
--download-fblaslapack libraries cannot be used


Regards,

John R. Haase

configure.log

Alexander Lindsay

Jan 25, 2018, 5:52:15 PM
to moose...@googlegroups.com
I usually configure with --prefix=$HOME/petsc/installed (i.e. different from the root directory $HOME/petsc).

John Haase

Jan 25, 2018, 6:23:16 PM
to moose...@googlegroups.com
Hi Alex,

Thank you for the suggestion. I'm still getting the error:

*******************************************************************************
         UNABLE to CONFIGURE with GIVEN OPTIONS    (see configure.log for details):
-------------------------------------------------------------------------------
--download-fblaslapack libraries cannot be used
*******************************************************************************
  File "./config/configure.py", line 393, in petsc_configure
    framework.configure(out = sys.stdout)
  File "~/projects/petsc/petsc/config/BuildSystem/config/framework.py", line 1097, in configure
    self.processChildren()
  File "~/projects/petsc/petsc/config/BuildSystem/config/framework.py", line 1086, in processChildren
    self.serialEvaluation(self.childGraph)
  File "~/projects/petsc/petsc/config/BuildSystem/config/framework.py", line 1067, in serialEvaluation
    child.configure()
  File "~/projects/petsc/petsc/config/BuildSystem/config/package.py", line 857, in configure
    self.executeTest(self.configureLibrary)
  File "~/projects/petsc/petsc/config/BuildSystem/config/base.py", line 126, in executeTest
    ret = test(*args,**kargs)
  File "~/projects/petsc/petsc/config/BuildSystem/config/packages/BlasLapack.py", line 339, in configureLibrary
    for (name, blasLibrary, lapackLibrary) in self.generateGuesses():
  File "~/projects/petsc/petsc/config/BuildSystem/config/packages/BlasLapack.py", line 164, in generateGuesses
    raise RuntimeError('--download-fblaslapack libraries cannot be used')
================================================================================
Finishing Configure Run at Thu Jan 25 17:58:44 2018
================================================================================

Regards,

John R. Haase

Kong, Fande

Jan 25, 2018, 6:42:00 PM
to moose...@googlegroups.com
Hi John,

Please send your config.log to the PETSc list (petsc...@mcs.anl.gov). They should be able to help with this.

Fande,

John Haase

Jan 27, 2018, 12:56:16 AM
to moose...@googlegroups.com
Fande,

Thank you for the advice to go to the PETSc list. They got the installation sorted out. However, when I came back to MOOSE and tried to rebuild libMesh, I got the error:

/usr/bin/ld: gk_jbufs: TLS definition in contrib/.libs/libcontrib_dbg.a(libdbg_la-error.o) section .tbss mismatches non-TLS definition in ${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so section .bss
${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
make[1]: *** [libmesh_dbg.la] Error 1
make[1]: Leaving directory `${HOME}/projects/moose/libmesh/build'


Regards,

John R. Haase

Alexander Lindsay

Jan 27, 2018, 10:10:46 AM
to moose...@googlegroups.com
How did you rebuild?

Are you certain that your environment is clear of your previous petsc installation? E.g. it's not in your library or include paths?

John Haase

Jan 27, 2018, 1:40:23 PM
to moose...@googlegroups.com
Alex,

I changed all the environment variables over

~ $ tail -n 20 .bashrc
export MOOSE_BASE=${HOME}/projects/
export MOOSE_DIR=$MOOSE_BASE/moose/
export ZAPDIR=$MOOSE_BASE/zapdos/
export BOLOSDIR=$MOOSE_BASE/bolos/
export PATH=$MOOSE_BASE/moose/gui:$PATH
export CC=mpicc
export CXX=mpicxx
export F90=mpif90
export F77=mpif77
export FC=mpif90
source ~/.moose-profile.bash
clear
~ $ cat .moose-profile.bash
#!/bin/csh
 
#  Moose location
export PACKAGES_DIR=$MOOSE_BASE/
export CC=mpicc
export CXX=mpicxx
export F77=mpif77
export F90=mpif90
export FC=mpif90
export ARCH=gcc
export PETSC_DIR=$PACKAGES_DIR/petsc/petsc/ #-3.6.4/$ARCH-opt

I deleted my libmesh folder as was recommended here, then tried to rebuild like normal in Step 3 of the Getting started guide

~ $ cd ~/projects/moose/
moose $ echo $PETSC_DIR
{HOME}/projects///petsc/petsc/
moose $ rm -rf libmesh/
moose $ scripts/update_and_rebuild_libmesh.sh
.
.
.
/usr/bin/ld: gk_jbufs: TLS definition in contrib/.libs/libcontrib_dbg.a(libdbg_la-error.o) section .tbss mismatches non-TLS definition in ${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so section .bss
${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
make[1]: *** [libmesh_dbg.la] Error 1
make[1]: Leaving directory `${HOME}/projects/moose/libmesh/build'
make: *** [all-recursive] Error 1

And attached is the full output


Regards,

John R. Haase

output.txt

Kong, Fande

unread,
Jan 29, 2018, 10:17:34 AM1/29/18
to moose...@googlegroups.com
If I remember correctly, your PETSc environment variables should look like: PETSC_DIR=${HOME}/projects/petsc/petsc PETSC_ARCH=arch-linux2-c-opt


Please do:

PETSC_DIR=${HOME}/projects/petsc/petsc PETSC_ARCH=arch-linux2-c-opt ./scripts/update_and_rebuild_libmesh.sh
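
Before running the rebuild, it may help to confirm that the two variables actually point at a configured PETSc. Below is a minimal sketch, assuming a bash shell. The helper names `strip_trailing_slashes` and `find_petscconf` are hypothetical (they are not part of MOOSE or PETSc), and the two header locations checked are the ones discussed later in this thread.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check PETSC_DIR / PETSC_ARCH before rebuilding libmesh.
# Both function names are illustrative only.

# Strip trailing slashes so paths do not end up looking like "projects///petsc".
strip_trailing_slashes() {
  local p="$1"
  while [ "${#p}" -gt 1 ] && [ "${p%/}" != "$p" ]; do
    p="${p%/}"
  done
  printf '%s\n' "$p"
}

# Usage: find_petscconf <PETSC_DIR> <PETSC_ARCH>
# Prints the first petscconf.h found for that pair, or returns non-zero.
find_petscconf() {
  local dir arch candidate
  dir="$(strip_trailing_slashes "$1")"
  arch="${2:-}"
  for candidate in "$dir/include/petscconf.h" "$dir/$arch/include/petscconf.h"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

One possible use is `find_petscconf "$PETSC_DIR" "$PETSC_ARCH" || echo "petscconf.h not found"`, run in the same shell that will run the rebuild script.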



Fande,

Cody Permann

unread,
Jan 29, 2018, 10:35:53 AM1/29/18
to moose...@googlegroups.com
Fande, This is probably not the best way to advise users to install their own version of PETSc (as this thread is quickly showing).

Jason created a "petsc_head" module for users to load that makes it very easy to install a custom version of PETSc. Jason, do we have instructions on the wiki somewhere that explain this process?

Thanks! 


Peterson, JW

unread,
Jan 29, 2018, 10:43:47 AM1/29/18
to moose-users
On Fri, Jan 26, 2018 at 10:55 PM, John Haase <jha...@nd.edu> wrote:
Fande,

Thank you for the advice to go to the petsc board. They got the installation out. However, when I came back to Moose and tried to rebuild libmesh I get the error

/usr/bin/ld: gk_jbufs: TLS definition in contrib/.libs/libcontrib_dbg.a(libdbg_la-error.o) section .tbss mismatches non-TLS definition in ${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so section .bss
${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
make[1]: *** [libmesh_dbg.la] Error 1
make[1]: Leaving directory `${HOME}/projects/moose/libmesh/build'


This error means the Metis that comes with PETSc is conflicting with the one that is in libMesh. 

To fix this, you have to configure libmesh with --with-metis=PETSc

Normally you don't have to do this with the PETSc that comes in the MOOSE package because we don't build it with Metis, but if you install your own PETSc with your own configure flags you have to be a bit more careful.
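
A rough sketch of how one might decide whether the flag is needed, assuming a bash shell and a PETSc laid out as ${PETSC_DIR}/${PETSC_ARCH}/lib as in the error message above. The function names are hypothetical; only the `--with-metis=PETSc` option itself comes from this thread.

```shell
#!/usr/bin/env bash
# Sketch: decide whether libmesh should be told to reuse PETSc's Metis.
# Function names are illustrative only.

# Usage: petsc_ships_metis <PETSC_DIR> <PETSC_ARCH>
# Succeeds if the PETSc lib directory contains a Metis library.
petsc_ships_metis() {
  local libdir="$1/$2/lib"
  ls "$libdir"/libmetis.* >/dev/null 2>&1
}

# Usage: suggest_metis_flag <PETSC_DIR> <PETSC_ARCH>
# Prints the configure option suggested in this thread, or nothing at all.
suggest_metis_flag() {
  if petsc_ships_metis "$1" "$2"; then
    printf '%s\n' '--with-metis=PETSc'
  fi
}
```

The sketch only prints the option. Whether scripts/update_and_rebuild_libmesh.sh forwards extra arguments to libmesh's configure is an assumption to confirm by reading the script before relying on it.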

--
John

John Haase

unread,
Jan 29, 2018, 11:22:14 AM1/29/18
to moose...@googlegroups.com
John,

I was following the PETSc install instructions from http://mooseframework.org/wiki/ClusterInstructions/ClusterSetupSingleUser/, and they include "--download-metis=1".

Regards,

John R. Haase


luksh...@gmail.com

unread,
Jan 29, 2018, 11:23:33 AM1/29/18
to moose...@googlegroups.com
On Monday, January 29, 2018 11:43 PM, Peterson, JW wrote:
>
>
> On Fri, Jan 26, 2018 at 10:55 PM, John Haase <jha...@nd.edu
> <mailto:jha...@nd.edu>> wrote:
>
> Fande,
>
> Thank you for the advice to go to the petsc board. They got the
> installation out. However, when I came back to Moose and tried to
> rebuild libmesh I get the error
>
> /usr/bin/ld: gk_jbufs: TLS definition in
> contrib/.libs/libcontrib_dbg.a(libdbg_la-error.o) section .tbss
> mismatches non-TLS definition in
> ${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so
> section .bss
> ${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so:
> error adding symbols: Bad value
> collect2: error: ld returned 1 exit status
> make[1]: *** [libmesh_dbg.la] Error 1
> make[1]: Leaving directory `${HOME}/projects/moose/libmesh/build'
>
>
>
> This error means the Metis that comes with PETSc is conflicting with the
> one that is in libMesh.
>
> To fix this, you have to configure libmesh with --with-metis=PETSc
>
> Normally you don't have to do this with the PETSc that comes in the
> MOOSE package because we don't build it with Metis, but if you install
> your own PETSc with your own configure flags you have to be a bit more
> careful.

John,

PETSc from the moose-environment package did come built with Metis. It's
here: /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so,
for example. Am I reading you correctly?

I'm chiming in because I encountered a similar problem a while ago: a
conflict between a system Metis and the moose-provided one. Here's the
error message when linking unit_tests-dbg:

```
CXXLD unit_tests-dbg
/usr/bin/ld: gk_cur_jbufs: TLS definition in
/usr/lib/x86_64-linux-gnu/libmetis.so.5 section .tdata mismatches
non-TLS definition in
/opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so section .data
/usr/lib/x86_64-linux-gnu/libmetis.so.5: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
```

I did try `--enable-metis --with-metis=PETSc` but got the same error.

What is your advice on the configure options to use to avoid such a collision?

Thanks in advance,
-- st

Peterson, JW

unread,
Jan 29, 2018, 11:35:20 AM1/29/18
to moose-users
On Mon, Jan 29, 2018 at 9:21 AM, John Haase <jha...@nd.edu> wrote:
John,

I was following the Petsc install instructions from http://mooseframework.org/wiki/ClusterInstructions/ClusterSetupSingleUser/. And it has the "--download-metis=1"

OK, it sounds like we need to update our cluster setup instructions to include the --with-metis=PETSc option when configuring libmesh.

--
John

Peterson, JW

unread,
Jan 29, 2018, 11:41:55 AM1/29/18
to moose-users
Hmm... I admit I have not tested our configure scripts in an environment that has a Metis in /usr/lib; that is a pretty non-standard configuration.

In your case, the conflict is not between libMesh's Metis and PETSc's Metis, but between PETSc's Metis and the one in /usr/lib. This *could* happen if you somehow have /usr/lib in your link line before /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib, but we do our best to make sure that doesn't happen.
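
To check the ordering concern described above, here is a minimal sketch, assuming a bash shell, that walks the -L entries of a link line in order and reports the first directory that contains a given library. The function name `first_dir_with_lib` is hypothetical. It models only the order of -L directories; the linker's built-in default directories, and libraries pulled in indirectly as dependencies of other shared libraries, are not covered.

```shell
#!/usr/bin/env bash
# Sketch: report which -L directory on a link line provides a library first.
# The function name is illustrative only.

# Usage: first_dir_with_lib <libname> <link-line words...>
# Walks the -L<dir> entries in order and prints the first directory that
# contains lib<libname>.so or lib<libname>.a. Returns non-zero if none does.
first_dir_with_lib() {
  local name="$1" word dir
  shift
  for word in "$@"; do
    case "$word" in
      -L?*)
        dir="${word#-L}"
        if [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]; then
          printf '%s\n' "$dir"
          return 0
        fi
        ;;
    esac
  done
  return 1
}
```

For example, the words of a libmesh_optional_LIBS value from config.log could be passed after the library name, as in `first_dir_with_lib metis $libmesh_optional_LIBS` (left unquoted on purpose so the value is split into words).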

I don't suppose you still have a config.log file lying around for this failed build?

--
John

John Haase

unread,
Jan 29, 2018, 11:10:21 PM1/29/18
to moose...@googlegroups.com
John,

I had not saved the config files because I recompiled with the old version of petsc. However, I tried to recompile moose with the new petsc. Attached are the two config.log's that I found in my moose folder after the attempt. The names are the paths from ~/projects/moose.

Regards,

John R. Haase

libmesh-build-contrib-netcdf-v4-config.log
libmesh-build-config.log

luksh...@gmail.com

unread,
Jan 30, 2018, 6:53:15 AM1/30/18
to moose...@googlegroups.com
On Tuesday, January 30, 2018 12:41 AM, Peterson, JW wrote:
>
>
> On Mon, Jan 29, 2018 at 9:23 AM, <luksh...@gmail.com
> <mailto:luksh...@gmail.com>> wrote:
>
> On Monday, January 29, 2018 11:43 PM, Peterson, JW wrote:
>
>
>
> On Fri, Jan 26, 2018 at 10:55 PM, John Haase <jha...@nd.edu
> <mailto:jha...@nd.edu> <mailto:jha...@nd.edu
> <mailto:jha...@nd.edu>>> wrote:
>
>     Fande,
>
>     Thank you for the advice to go to the petsc board. They got the
>     installation out. However, when I came back to Moose and
> tried to
>     rebuild libmesh I get the error
>
>     /usr/bin/ld: gk_jbufs: TLS definition in
>     contrib/.libs/libcontrib_dbg.a(libdbg_la-error.o) section .tbss
>     mismatches non-TLS definition in
>     ${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so
>     section .bss
>     ${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so:
>     error adding symbols: Bad value
>     collect2: error: ld returned 1 exit status
>     make[1]: *** [libmesh_dbg.la] Error 1
>     make[1]: Leaving directory `${HOME}/projects/moose/libmesh/build'
>
>
>
> This error means the Metis that comes with PETSc is conflicting
> with the one that is in libMesh.
>
> To fix this, you have to configure libmesh with --with-metis=PETSc
>
> Normally you don't have to do this with the PETSc that comes in
> the MOOSE package because we don't build it with Metis, but if
> you install your own PETSc with your own configure flags you
> have to be a bit more careful.
>
>
> John,
>
> PETSc from the moose-environment package did come built with Metis.
> It's here
> /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so,
> for example. Am I reading you correctly?
>
> I chipped in because I encounter a similar problem a while ago with
> a conflict between a system Metis and the moose-provided one. Here's
> the error message when linking unit_tests-dbg
>
>
> ```
> CXXLD    unit_tests-dbg
> /usr/bin/ld: gk_cur_jbufs: TLS definition in
> /usr/lib/x86_64-linux-gnu/libmetis.so.5 section .tdata mismatches
> non-TLS definition in
> /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so
> section .data
> /usr/lib/x86_64-linux-gnu/libmetis.so.5: error adding symbols: Bad value
> collect2: error: ld returned 1 exit status
> ```
>
> I did try `--enable-metis --with-metis=PETSc` but got the same error.
>
>
> Hmm... I admit I have not tested our configure scripts in an environment
> that has a Metis in /usr/lib, that is a pretty non-standard configuration.

This Metis is a dependency of some other program installed by the
debian/ubuntu package management system.

>
> In your case, the conflict is not between libMesh's Metis and PETSc's
> Metis, but between PETSc's Metis and the one in /usr/lib. This *could*
> happen if you somehow have /usr/lib in your link line
> before /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib, but we do our
> best to make sure that doesn't happen.

I suppose this is the options line passed to the linker:

```
libmesh_optional_LIBS='-lz -Wl,-rpath,/opt/moose/tbb44_20150728/lib
-L/opt/moose/tbb44_20150728/lib -ltbb -ltbbmalloc
-L/opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib
-Wl,-rpath,/opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib
-L/opt/moose/mpich-3.2/gcc-7.2.0/lib
-L/opt/moose/gcc-7.2.0/lib/gcc/x86_64-pc-linux-gnu/7.2.0
-L/opt/moose/gcc-7.2.0/lib64 -L/opt/moose/gcc-7.2.0/lib -lpetsc
-lsuperlu_dist -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord
-lparmetis -lmetis -lHYPRE -lscalapack -lflapack -lfblas -lX11 -lmpifort
-lgfortran -lquadmath -lmpicxx -lstdc++ -lm
-Wl,-rpath,/opt/moose/gcc-7.2.0/lib/gcc/x86_64-pc-linux-gnu/7.2.0
-Wl,-rpath,/opt/moose/gcc-7.2.0/lib64
-Wl,-rpath,/opt/moose/gcc-7.2.0/lib
-Wl,-rpath,/opt/moose/mpich-3.2/gcc-7.2.0/lib -lmpi -lgomp -lgcc_s
-lpthread -ldl -L/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
-Wl,-rpath,/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu'
```

and therefore it should find the Metis from PETSc first before searching
/usr/lib/x86_64-linux-gnu but apparently it didn't. Complete config.log
(gzipped) attached.

Regards,
-- st
config.log.gz

Peterson, JW

unread,
Jan 30, 2018, 10:26:15 AM1/30/18
to moose-users
On Tue, Jan 30, 2018 at 4:52 AM, <luksh...@gmail.com> wrote:

    PETSc from the moose-environment package did come built with Metis.
    It's here
    /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so/


    I chipped in because I encounter a similar problem a while ago with
    a conflict between a system Metis and the moose-provided one. Here's
    the error message when linking unit_tests-dbg

    ```
    CXXLD    unit_tests-dbg
    /usr/bin/ld: gk_cur_jbufs: TLS definition in
    /usr/lib/x86_64-linux-gnu/libmetis.so.5 section .tdata mismatches
    non-TLS definition in
    /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so
    section .data
    /usr/lib/x86_64-linux-gnu/libmetis.so.5: error adding symbols: Bad value
    collect2: error: ld returned 1 exit status
    ```

    I did try `--enable-metis --with-metis=PETSc` but got the same error.


You mistyped the option. According to your config.log file, you used "--with-petsc=PETSc", but it needs to be "--with-metis=PETSc".

This may still not fix the problem, but I don't see any other obvious issues (link line looks OK).

--
John

John Haase

unread,
Jan 30, 2018, 11:01:40 AM1/30/18
to moose...@googlegroups.com
John,

Did anything stand out in my config files?

Regards,

John R. Haase

libmesh-build-contrib-netcdf-v4-config.log
libmesh-build-config.log

Peterson, JW

unread,
Jan 30, 2018, 1:10:14 PM1/30/18
to moose-users
On Mon, Jan 29, 2018 at 9:09 PM, John Haase <jha...@nd.edu> wrote:
John,

I had not saved the config files because I recompiled with the old version of petsc. However, I tried to recompile moose with the new petsc. Attached are the two config.log's that I found in my moose folder after the attempt. The names are the paths from ~/projects/moose.

Looks OK to me. Are you still getting the exact same linker error

> /usr/bin/ld: gk_jbufs: TLS definition in contrib/.libs/libcontrib_dbg.a(libdbg_la-error.o) section .tbss mismatches non-TLS definition in ${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so section .bss
> ${HOME}/projects/petsc/petsc/arch-linux2-c-opt/lib/libmetis.so: error adding symbols: Bad value

from this configuration, or is there some new problem now?

--
John

John Haase

unread,
Jan 30, 2018, 3:36:13 PM1/30/18
to moose...@googlegroups.com
It is still the same error.

Regards,

John R. Haase


Peterson, JW

unread,
Jan 30, 2018, 4:01:16 PM1/30/18
to moose-users
On Tue, Jan 30, 2018 at 1:35 PM, John Haase <jha...@nd.edu> wrote:
It is still the same error.

OK, first make sure the PETSc you are using is really built with Metis. Look in the file ${PETSC_DIR}/include/petscconf.h and see what it sets for the

PETSC_HAVE_METIS 
PETSC_HAVE_PARMETIS

#defines.

Then make sure that the Metis in contrib/metis is *not* being built by libmesh. Look in the $MOOSE_DIR/libmesh/build/contrib/Makefile. If Metis is *not* being built by libmesh, you should see a bunch of commented out lines like the following:

#am__append_99 = metis
#am__append_100 = metis/libopt.la
#am__append_101 = metis/libdbg.la
#am__append_102 = metis/libdevel.la
#am__append_103 = metis/libprof.la
#am__append_104 = metis/liboprof.la
#am__append_105 = parmetis
#am__append_106 = parmetis/libopt.la
#am__append_107 = parmetis/libdbg.la
#am__append_108 = parmetis/libdevel.la
#am__append_109 = parmetis/libprof.la
#am__append_110 = parmetis/liboprof.la

If both of the above things are correct, then it is possible that there is an old contrib/.libs/libcontrib_dbg.a sitting around that still has the mismatched symbols in it and you need to blow away your entire libmesh build directory, and try building again.
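
The Makefile check described above can be scripted. This is a minimal sketch, assuming a bash shell and assuming the active lines look exactly like the commented-out ones minus the leading `#`. The function name `libmesh_builds_own_metis` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether libmesh's contrib Makefile still builds its own Metis.
# The function name is illustrative only.

# Usage: libmesh_builds_own_metis <path to contrib/Makefile>
# Succeeds if the Makefile has an active (uncommented) automake line that
# appends metis or parmetis, e.g. "am__append_100 = metis".
# Commented-out lines start with "#" and are therefore not matched.
libmesh_builds_own_metis() {
  grep -Eq '^am__append_[0-9]+ = (par)?metis(/|[[:space:]]*$)' "$1"
}
```

It could be run as `libmesh_builds_own_metis "$MOOSE_DIR/libmesh/build/contrib/Makefile" && echo "libmesh is building its own Metis"`. The check is read-only; it does not modify the generated Makefile.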

If none of that fixes the problem, we probably won't be able to solve it without having access to the offending system.

--
John

John Haase

unread,
Jan 30, 2018, 6:37:46 PM1/30/18
to moose...@googlegroups.com
John,

I don't have ${PETSC_DIR}/include/petscconf.h. However, I do have ${PETSC_DIR}/${PETSC_ARCH}/include/petscconf.h, so I will assume those are equivalent. Please let me know if that is an issue.

#ifndef PETSC_HAVE_METIS
#define PETSC_HAVE_METIS 1
#endif

#ifndef PETSC_HAVE_PARMETIS
#define PETSC_HAVE_PARMETIS 1
#endif

In $MOOSE_DIR/libmesh/build/contrib/Makefile the lines were not commented out, so I commented them out

am__append_100 = metis
am__append_101 = metis/libopt.la
am__append_102 = metis/libdbg.la
am__append_103 = metis/libdevel.la
am__append_104 = metis/libprof.la
am__append_105 = metis/liboprof.la
am__append_106 = parmetis
am__append_107 = parmetis/libopt.la
am__append_108 = parmetis/libdbg.la
am__append_109 = parmetis/libdevel.la
am__append_110 = parmetis/libprof.la
am__append_111 = parmetis/liboprof.la

Deleting the libmesh folder did not solve the problem.

/usr/bin/ld: gk_jbufs: TLS definition in contrib/.libs/libcontrib_dbg.a(libdbg_la-error.o) section .tbss mismatches non-TLS definition in $HOME/projects/petsc/arch-linux2-c-opt/lib/libmetis.so section .bss
$HOME/projects/petsc/arch-linux2-c-opt/lib/libmetis.so: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
make[1]: *** [libmesh_dbg.la] Error 1
make[1]: Leaving directory `$HOME/projects/moose/libmesh/build'
make: *** [all-recursive] Error 1


Regards,

John R. Haase


luksh...@gmail.com

unread,
Jan 31, 2018, 9:57:31 AM1/31/18
to moose...@googlegroups.com
On Tuesday, January 30, 2018 11:25 PM, Peterson, JW wrote:
>
>
> On Tue, Jan 30, 2018 at 4:52 AM, <luksh...@gmail.com
> <mailto:luksh...@gmail.com>> wrote:
>
>
>     PETSc from the moose-environment package did come built
> with Metis.
>     It's here
> /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so,
> for example. Am I reading you correctly?
>
>     I chipped in because I encounter a similar problem a while
> ago with
>     a conflict between a system Metis and the moose-provided
> one. Here's
>     the error message when linking unit_tests-dbg
>
>     ```
>     CXXLD    unit_tests-dbg
>     /usr/bin/ld: gk_cur_jbufs: TLS definition in
>     /usr/lib/x86_64-linux-gnu/libmetis.so.5 section .tdata
> mismatches
>     non-TLS definition in
>     /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so
>     section .data
>     /usr/lib/x86_64-linux-gnu/libmetis.so.5: error adding
> symbols: Bad value
>     collect2: error: ld returned 1 exit status
>     ```
>
>     I did try `--enable-metis --with-metis=PETSc` but got the
> same error.
>
>
>
> You mistyped the option. According to your config.log file, you used
> "--with-petsc=PETSc", but it needs to be "--with-metis=PETSc".

My bad. :-(

>
> This may still not fix the problem, but I don't see any other obvious
> issues (link line looks OK).

Tried again but got the same error. Gzipped config.log attached.

It seems the error occurred when linking unit_tests-dbg. Is there a
--disable-unit_tests option or something similar, so that the problem can be
narrowed down?

Regards,
-- st
config.log.20180131.gz

Peterson, JW

unread,
Jan 31, 2018, 10:18:57 AM1/31/18
to moose-users
On Tue, Jan 30, 2018 at 4:37 PM, John Haase <jha...@nd.edu> wrote:
John,

I don't have ${PETSC_DIR}/include/petscconf.h. However, I do have ${PETSC_DIR}/${PETSC_ARCH}/include/petscconf.h, so I will assume those are equivalent. Please let me know if that is an issue.

We actually search in both locations, so yeah that's fine. 

 

#ifndef PETSC_HAVE_METIS
#define PETSC_HAVE_METIS 1
#endif

#ifndef PETSC_HAVE_PARMETIS
#define PETSC_HAVE_PARMETIS 1
#endif

OK, that confirms your PETSc was indeed built with Metis and ParMetis.


 
In $MOOSE_DIR/libmesh/build/contrib/Makefile the lines were not commented out, so I commented them out

am__append_100 = metis
am__append_101 = metis/libopt.la
am__append_102 = metis/libdbg.la
am__append_103 = metis/libdevel.la
am__append_104 = metis/libprof.la
am__append_105 = metis/liboprof.la
am__append_106 = parmetis
am__append_107 = parmetis/libopt.la
am__append_108 = parmetis/libdbg.la
am__append_109 = parmetis/libdevel.la
am__append_110 = parmetis/libprof.la
am__append_111 = parmetis/liboprof.la

If the lines *weren't* commented out, then that is the problem: libmesh is still building its own copy of Metis for some reason, and not detecting that PETSc has its own.

BTW: I don't think you can just manually comment out these lines before building. Automake-generated Makefiles are very complicated and should never be edited by hand...



In the same shell where you have been configuring/building libmesh, what does running the following command do?

cat ${PETSC_DIR}/include/petscconf.h ${PETSC_DIR}/${PETSC_ARCH}/include/petscconf.h 2>/dev/null | grep -c PETSC_HAVE_METIS

There may be some problem with the handling of ${PETSC_ARCH} in our configure scripts, or the value may not be set right in your environment, or something else...
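
For reading the result of that command: `grep -c` counts matching lines, not definitions. A header that defines the macro inside an `#ifndef` / `#define` guard, as in the petscconf.h excerpt quoted above, contributes two matching lines, while a missing header contributes none. Below is a minimal sketch, assuming a bash shell, that wraps the same pipeline in a function. The function name `count_metis_lines` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count PETSC_HAVE_METIS lines the same way the command above does.
# The function name is illustrative only.

# Usage: count_metis_lines <PETSC_DIR> <PETSC_ARCH>
# Concatenates whichever of the two candidate headers exist and prints the
# number of lines that mention PETSC_HAVE_METIS (0 if neither file exists).
# Lines mentioning only PETSC_HAVE_PARMETIS do not match the pattern.
count_metis_lines() {
  cat "$1/include/petscconf.h" "$1/$2/include/petscconf.h" 2>/dev/null \
    | grep -c PETSC_HAVE_METIS
}
```

A result of 0 would suggest that neither header was found or that PETSc was built without Metis. Note that `grep -c` still prints 0 in that case but exits with a non-zero status.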

--
John

Peterson, JW

unread,
Jan 31, 2018, 10:28:34 AM1/31/18
to moose-users
<<< Found PETSc 3.7.6 installation in /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt ... >>>

Are you also using a $PETSC_ARCH? Can you look for those same lines I asked John Haase to look for in $MOOSE_DIR/libmesh/build/contrib/Makefile and petscconf.h? It's possible that you guys are having the same issue.

 

It seems the error occurred when linking unit_tests-dbg. Is there a --disable-unit_tests option or something similar to see if it can be narrowed down.

I think configuring with --disable-cppunit will do this.

--
John

luksh...@gmail.com

unread,
Feb 1, 2018, 8:48:22 AM2/1/18
to moose...@googlegroups.com
On Wednesday, January 31, 2018 11:28 PM, Peterson, JW wrote:
>
>
> On Wed, Jan 31, 2018 at 7:57 AM, <luksh...@gmail.com
> <mailto:luksh...@gmail.com>> wrote:
>
> On Tuesday, January 30, 2018 11:25 PM, Peterson, JW wrote:
>
>
>
> On Tue, Jan 30, 2018 at 4:52 AM, <luksh...@gmail.com
> <mailto:luksh...@gmail.com> <mailto:luksh...@gmail.com
> <mailto:luksh...@gmail.com>>> wrote:
>
> This may still not fix the problem, but I don't see any other
> obvious issues (link line looks OK).
>
>
> Tried again but got the same error. Gzipped config.log attached.
>
>
> <<< Found PETSc 3.7.6 installation in
> /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt ... >>>
>
> Are you also using a $PETSC_ARCH? Can you look for those same lines I

No, it's not set. The PETSc I used comes with the moose-environment
package (ubuntu xenial) and it has metis support. I don't have any other
PETSc in the system.

> asked John Haase to look for in
> $MOOSE_DIR/libmesh/build/contrib/Makefile and petscconf.h? It's possible
> that you guys are having the same issue.

I think John Haase's problem is that his PETSc is built with
metis/parmetis support but when libmesh is configured, somehow it also
enables metis (which comes with libmesh) and hence the collision during
linking. He has only 1 metis under PETSc (before building libmesh).

My case is that I have 2 - the moose-environment PETSc (with metis
support) and at the same time a system-installed metis in the standard
/usr hierarchy.

Maybe getting the configure options of --enable-metis and --with-metis
clarified will move us a step forward. Please check if my understanding
is correct, with questions/comments interposed.

1) --enable-metis (but --with-metis not set)
Build libmesh with metis support. Metis will be searched (and linked)
based on the logic of the linker (gcc/ld for both of us).

Comment: The linker should be able to find the correct metis (if
there is more than one) based on its search logic, but it didn't.

2) --disable-metis
libmesh will not be built with metis support and will not search for it
at all *and* will not build the metis in the contrib/ sub-directory as well.

Comment: Linking should always work (even if there is more than one
metis in the search path), but it didn't.

Q: What is the default if it's not used?

3) --with-metis=internal
Searching for metis is modified and libmesh will build and use the metis
under the contrib/ subdirectory.

Q. Does using it imply --enable-metis?
Q. How will the search order be affected?

4) --with-metis=PETSc
Do not build metis under contrib/ but modify the search order to look
for it under PETSc first.

Q. Does using it imply --enable-metis?
Q. How will the search order be affected?

I think there should be a way to alter the search order
(environment modules are used to find the correct PETSc, for example),
but I don't fully understand the (gcc/ld) manual.

Grateful for any enlightenment. :-)

>
>
> It seems the error occurred when linking unit_tests-dbg. Is there a
> --disable-unit_tests option or something similar to see if it can be
> narrowed down.
>
>
> I think configuring with --disable-cppunit will do this.

I've wiped out old working trees but I'll try building with this option
and report back (also with the answers to the questions you posed to John Haase).

Thanks very much,
-- st

John Haase

unread,
Feb 1, 2018, 11:13:56 AM2/1/18
to moose...@googlegroups.com
John,

The output of the command

cat ${PETSC_DIR}/include/petscconf.h ${PETSC_DIR}/${PETSC_ARCH}/include/petscconf.h 2>/dev/null | grep -c PETSC_HAVE_METIS

is 2.


Regards,

John R. Haase


Peterson, JW

unread,
Feb 1, 2018, 11:51:09 AM2/1/18
to moose-users
On Thu, Feb 1, 2018 at 9:13 AM, John Haase <jha...@nd.edu> wrote:
John,

The output of the command

cat ${PETSC_DIR}/include/petscconf.h ${PETSC_DIR}/${PETSC_ARCH}/include/petscconf.h 2>/dev/null | grep -c PETSC_HAVE_METIS

is 2.

Well, this makes no sense to me at all then.

If the above command works for you in the shell, it should also work during configure, and therefore libmesh should not build its own Metis. 

But that contradicts the Makefile lines you showed us previously, so something is inconsistent.

I don't think I can debug this further without having access to the system where it is happening.

--
John

Peterson, JW

unread,
Feb 1, 2018, 12:50:33 PM2/1/18
to moose-users
On Thu, Feb 1, 2018 at 6:47 AM, <luksh...@gmail.com> wrote:
On Wednesday, January 31, 2018 11:28 PM, Peterson, JW wrote:


On Wed, Jan 31, 2018 at 7:57 AM, <luksh...@gmail.com <mailto:luksh...@gmail.com>> wrote:

    On Tuesday, January 30, 2018 11:25 PM, Peterson, JW wrote:



        On Tue, Jan 30, 2018 at 4:52 AM, <luksh...@gmail.com
        <mailto:luksh...@gmail.com> <mailto:luksh...@gmail.com
        <mailto:luksh...@gmail.com>>> wrote:

        This may still not fix the problem, but I don't see any other
        obvious issues (link line looks OK).


    Tried again but got the same error. Gzipped config.log attached.


<<< Found PETSc 3.7.6 installation in /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt ... >>>

Are you also using a $PETSC_ARCH? Can you look for those same lines I

No, it's not set. The PETSc I used comes with the moose-environment package (ubuntu xenial) and it has metis support. I don't have any other PETSc in the system.

asked John Haase to look for in $MOOSE_DIR/libmesh/build/contrib/Makefile and petscconf.h? It's possible that you guys are having the same issue.

I think John Haase's problem is that his PETSc is built with metis/parmetis support, but when libmesh is configured it somehow also enables the metis that ships with libmesh, hence the collision during linking. He has only one metis, under PETSc (before building libmesh).

In my case I have two: the moose-environment PETSc (with metis support) and a system-installed metis in the standard /usr hierarchy.


If this is really the issue, then short of uninstalling the metis from /usr/lib, I don't know how to fix the problem. Your link line looked OK to me, that was my only idea for what could be the problem.


 
Maybe getting the configure options of --enable-metis and --with-metis clarified will move us a step forward. Please check if my understanding is correct, with questions/comments interposed.

1) --enable-metis (but --with-metis not set)
Build libmesh with metis support. Metis will be searched (and linked) based on the logic of the linker (gcc/ld for both of us).

Comment: The linker should be able to find the correct metis (if there is more than one) based on its search logic, but it didn't.

Yes, but I just verified that if you have a PETSc-with-Metis installed, you *do not* need to explicitly specify --with-metis=PETSc, it should happen automatically.

 

2) --disable-metis
libmesh will not be built with metis support, will not search for it at all, *and* will not build the metis in the contrib/ subdirectory either.

Comment: Linking should always work, even if there is more than one metis in the search path, but it didn't.

Right, --disable-metis at the libmesh level just means you won't be able to use the MetisPartitioner. PETSc will still have Metis and there can still potentially be a conflict between PETSc's Metis and the one in /usr/lib (as you are seeing).

 
Q: What is the default if it's not used?

--enable-metis is the default, and --with-metis=PETSc is automatically set if the proper #defines are detected in petscconf.h.

 

3) --with-metis=internal
Searching for metis is modified and libmesh will build and use the metis under the contrib/ subdirectory.

This should really never be explicitly specified, I don't think. 
 

Q. Does using it imply --enable-metis?
Q. How will the search order be affected?

4) --with-metis=PETSc
Do not build metis under contrib/ but modify the search order to look for it under PETSc first.

Q. Does using it imply --enable-metis?
Q. How will the search order be affected?

If by search order you are referring to the order of libraries on the link line, that is roughly controlled by the order in which external and contributed packages are tested for and found by libmesh's configure.

The one modification we do is, after building the initial string of "-L/foo -lbar" tokens, we *move* all the "-L/usr/lib" directives to the *end* of the link line. You can see this in your config.log file if you search for the libmesh_optional_LIBS line: all the /usr/lib/x86_64-linux-gnu stuff is at the end, and in particular they are *after* -L/opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib, so that's why I'm stumped by the link error.
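To make the reordering concrete, here is a rough shell sketch of the idea. It is not the actual libmesh configure code; the function name `reorder_link_line` and the choice to match only `-L/usr/lib` and its subdirectories are assumptions made for illustration.

```shell
# Illustration only, not libmesh's real m4 logic: move every -L token that
# points at /usr/lib (or a subdirectory of it) to the end of the link line,
# keeping the relative order of everything else.
# Note: tokens containing whitespace are not handled by this sketch.
reorder_link_line() (
  front=""
  back=""
  for tok in "$@"; do
    case "$tok" in
      -L/usr/lib|-L/usr/lib/*) back="$back $tok" ;;
      *)                       front="$front $tok" ;;
    esac
  done
  set -f                 # no globbing while we re-split the collected tokens
  set -- $front $back
  printf '%s\n' "$*"
)

# Example use:
# reorder_link_line -L/usr/lib/x86_64-linux-gnu -lmetis -L/opt/petsc/lib -lpetsc
```

With the system directories last, `-lmetis` should be satisfied from the PETSc directory at link time, which is why the error is surprising.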

Can you try running ldd on /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libpetsc.3.7.6.so (just guessing at the filename here). Maybe the wrong Metis is already linked in at that level?

--
John

luksh...@gmail.com

unread,
Feb 2, 2018, 2:06:23 AM2/2/18
to moose...@googlegroups.com
On Friday, February 02, 2018 01:50 AM, Peterson, JW wrote:

[snipped]
>
> <<< Found PETSc 3.7.6 installation in
> /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt ... >>>
>
> Are you also using a $PETSC_ARCH? Can you look for those same
> lines I
>
>
> No, it's not set. The PETSc I used comes with the moose-environment
> package (ubuntu xenial) and it has metis support. I don't have any
> other PETSc in the system.
>
> asked John Haase to look for in
> $MOOSE_DIR/libmesh/build/contrib/Makefile and petscconf.h? It's
> possible that you guys are having the same issue.
>
>
> I think John Haase's problem is that his PETSc is built with
> metis/parmetis support but when libmesh is configured, somehow it
> also enables metis (which comes with libmesh) and hence the
> collision during linking. He has only 1 metis under PETSc (before
> building libmesh).
>
> My case is that I have 2 - the moose-environment PETSc (with metis
> support) and at the same time a system-installed metis in the
> standard /usr hierarchy.
>
>
>
> If this is really the issue, then short of uninstalling the metis from
> /usr/lib, I don't know how to fix the problem. Your link line looked OK
> to me, that was my only idea for what could be the problem.

Yes, that worked. Linking libmesh with my own PETSc using the system
metis and other libraries also works.

BTW, for moose, is gcc-7 really needed? Will gcc-5.4 (the default in
ubuntu xenial) do?
Thanks for a clear explanation.

>
> If by search order you are referring to the order of libraries on the
> link line, that is roughly controlled by the order in which external and
> contributed packages are tested for and found by libmesh's configure.
>
> The one modification we do is, after building the initial string of
> "-L/foo -lbar" tokens, we *move* all the "-L/usr/lib" directives to the
> *end* of the link line. You can see this in your config.log file if you
> search for the libmesh_optional_LIBS line: all the
> /usr/lib/x86_64-linux-gnu stuff is at the end, and in particular they
> are *after* -L/opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib, so
> that's why I'm stumped by the link error.

Yes, I am puzzled too. That should make the linker search the PETSc
location first before the system defaults as man ld says.

```
The directories are searched in the order in which they are specified on
the command line. Directories specified on the command line are searched
before the default directories. All -L options apply to all -l options,
regardless of the order in which the options appear.
```
I'm a bit uneasy about the last sentence. Maybe somewhere in the
configuring to build some binaries, the one in /usr/lib is somehow
found. I don't know how it happened or why it should happen, though.

>
> Can you try running ldd on
> /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libpetsc.3.7.6.so
> (just guessing at the filename here). Maybe
> the wrong Metis is already linked in at that level?

```
$ ldd /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libpetsc.so

linux-vdso.so.1 => (0x00007ffd1bbec000)
libsuperlu_dist.so.5 => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libsuperlu_dist.so.5 (0x00007f6e456ff000)
libparmetis.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libparmetis.so (0x00007f6e454c1000)
libmetis.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so (0x00007f6e45249000)
libmpicxx.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpicxx.so.12 (0x00007f6e45023000)
libstdc++.so.6 => /opt/moose/gcc-7.2.0/lib64/libstdc++.so.6 (0x00007f6e44ca3000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6e4499a000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f6e44660000)
libmpifort.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpifort.so.12 (0x00007f6e44426000)
libgfortran.so.4 => /opt/moose/gcc-7.2.0/lib64/libgfortran.so.4 (0x00007f6e44055000)
libquadmath.so.0 => /opt/moose/gcc-7.2.0/lib64/libquadmath.so.0 (0x00007f6e43e17000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f6e43c13000)
libmpi.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpi.so.12 (0x00007f6e4368b000)
libgomp.so.1 => /opt/moose/gcc-7.2.0/lib64/libgomp.so.1 (0x00007f6e4345e000)
libgcc_s.so.1 => /opt/moose/gcc-7.2.0/lib64/libgcc_s.so.1 (0x00007f6e43248000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6e4302b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6e42c61000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6e42a59000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6e46e09000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f6e42837000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f6e42633000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f6e4242d000)
```

Perfectly normal, and before even sourcing moose_profile. Is the PETSc
location hardwired into the optional libraries when you build PETSc?

I'll try to clone a new moose and test the --disable-cppunit option and
see if I get anything different.

Is there an example/test in libmesh that uses metis so that I can have
an additional test as well?

Regards,
-- st

luksh...@gmail.com

unread,
Feb 2, 2018, 2:43:13 AM2/2/18
to moose...@googlegroups.com
Just to make sure, can you do an alternative grep on the PETSc you built
using something like

```
grep METIS\ 1 ${PETSC_DIR}/include/petscconf.h \
  ${PETSC_DIR}/${PETSC_ARCH}/include/petscconf.h 2>/dev/null
```

On the PETSc provided by moose which comes with metis, it shows

```
/opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/include/petscconf.h:#define PETSC_HAVE_PARMETIS 1
/opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/include/petscconf.h:#define PETSC_HAVE_METIS 1
/opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/include/petscconf.h:#define PETSC_HAVE_LIBMETIS 1
/opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/include/petscconf.h:#define PETSC_HAVE_LIBPARMETIS 1
```

Can you try using ldd as well to see if it's really linked to metis?
Something like

ldd ${PETSC_DIR}/lib/libpetsc.so
ldd ${PETSC_DIR}/${PETSC_ARCH}/lib/libpetsc.so

Regards
-- st

Jed Brown

unread,
Feb 2, 2018, 1:57:03 PM2/2/18
to Peterson, JW, moose-users
"Peterson, JW" <jw.pe...@inl.gov> writes:

> Can you try running ldd on
> /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libpetsc.3.7.6.so (just
> guessing at the filename here). Maybe the wrong Metis is already linked in
> at that level?

I suspect it's something along these lines where you're getting the bad
version indirectly. You can use

readelf -d libfoo.so

or

objdump -p libfoo.so

to see the direct dependencies of an executable or library (ldd includes
all the transitive dependencies and doesn't distinguish). The above
commands also tell you about RUNPATH or RPATH which tell you where that
library searches for its dependencies (searched before LD_LIBRARY_PATH
which comes before default paths). The lddtree tool (in pax-utils) is
also very handy.

https://codeyarns.com/2015/12/16/how-to-view-hierarchy-of-shared-library-dependencies-using-lddtree/
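As a rough illustration of reading that information, the sketch below filters `readelf -d` output down to the direct dependencies and the run-time search path. The function names `direct_deps` and `runpath_dirs` are hypothetical helpers; the only assumption is the usual `(NEEDED) ... [name]` and `(RUNPATH) ... [dir:dir]` line layout that binutils readelf prints.

```shell
# Hypothetical helpers: both read `readelf -d <file>` output on stdin.

# Print the direct (NEEDED) dependencies, one per line.
direct_deps() {
  sed -n 's/.*(NEEDED).*\[\([^]]*\)\].*/\1/p'
}

# Print the RUNPATH or RPATH directories, one per line, in search order.
runpath_dirs() {
  sed -n -e 's/.*(RUNPATH).*\[\([^]]*\)\].*/\1/p' \
         -e 's/.*(RPATH).*\[\([^]]*\)\].*/\1/p' | tr ':' '\n'
}

# Example use:
# readelf -d libfoo.so | direct_deps
# readelf -d libfoo.so | runpath_dirs
```

Comparing the `direct_deps` list of each library in the chain shows which one actually asks for a given soname, which ldd alone cannot tell you.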

Jed Brown

unread,
Feb 2, 2018, 1:57:03 PM2/2/18
to luksh...@gmail.com, moose...@googlegroups.com
luksh...@gmail.com writes:

> Yes, I am puzzled too. That should make the linker search the PETSc
> location first before the system defaults as man ld says.
>
> ```
> The directories are searched in the order in which they are specified on
> the command line. Directories specified on the command line are searched
> before the default directories. All -L options apply to all -l options,
> regardless of the order in which the options appear.
> ```
> I'm a bit uneasy about the last sentence. Maybe somewhere in the
> configuring to build some binaries, the one in /usr/lib is somehow
> found. I don't know how it happened or why it should happen, though.

The -L/the/path option applies to link (build) time while RPATH
(-Wl,-rpath,/the/path) is stored in the shared object and used at load
(run) time before searching LD_LIBRARY_PATH. Use objdump or readelf to
tell what information was written into the shared object when it was
built. Use ldd or lddtree to see how it is resolved at load (run) time.

luksh...@gmail.com

unread,
Feb 5, 2018, 12:46:06 AM2/5/18
to moose...@googlegroups.com
On Friday, February 02, 2018 12:06 PM, Jed Brown wrote:

Thanks! I have a lot of questions ... :-)

>> Can you try running ldd on
>> /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libpetsc.3.7.6.so (just
>> guessing at the filename here). Maybe the wrong Metis is already linked in
>> at that level?
>
> I suspect it's something along these lines where you're getting the bad
> version indirectly. You can use
>
> readelf -d libfoo.so

```
$ readelf -d /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libpetsc.so

...

0x000000000000001d (RUNPATH) Library runpath: [/opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib:/opt/moose/mpich-3.2/gcc-7.2.0/lib:/opt/moose/gcc-7.2.0/lib/gcc/x86_64-pc-linux-gnu/7.2.0:/opt/moose/gcc-7.2.0/lib64:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/moose/gcc-7.2.0/lib]
```

The /lib/x86_64-linux-gnu and /usr/lib/x86_64-linux-gnu come after
/opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib. Order seems OK?

> to see the direct dependencies of an executable or library (ldd includes
> all the transitive dependencies and doesn't distinguish). The above
> commands also tell you about RUNPATH or RPATH which tell you where that
> library searches for its dependencies (searched before LD_LIBRARY_PATH
> which comes before default paths). The lddtree tool (in pax-utils) is
> also very handy.
```

$ lddtree /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libpetsc.so

libpetsc.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libpetsc.so (interpreter => none)
libsuperlu_dist.so.5 => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libsuperlu_dist.so.5
libparmetis.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libparmetis.so
libmetis.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so
libmpicxx.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpicxx.so.12
librt.so.1 => not found
libstdc++.so.6 => /opt/moose/gcc-7.2.0/lib64/libstdc++.so.6
ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6
libxcb.so.1 => not found
libmpifort.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpifort.so.12
libgfortran.so.4 => /opt/moose/gcc-7.2.0/lib64/libgfortran.so.4
libquadmath.so.0 => /opt/moose/gcc-7.2.0/lib64/libquadmath.so.0
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
libmpi.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpi.so.12
libgomp.so.1 => /opt/moose/gcc-7.2.0/lib64/libgomp.so.1
libgcc_s.so.1 => /opt/moose/gcc-7.2.0/lib64/libgcc_s.so.1
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
```

For comparison, here's the ldd result

```
$ ldd /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libpetsc.so
linux-vdso.so.1 => (0x00007ffcbcfd5000)
libsuperlu_dist.so.5 => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libsuperlu_dist.so.5 (0x00007efdb7a56000)
libparmetis.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libparmetis.so (0x00007efdb7818000)
libmetis.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so (0x00007efdb75a0000)
libmpicxx.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpicxx.so.12 (0x00007efdb737a000)
libstdc++.so.6 => /opt/moose/gcc-7.2.0/lib64/libstdc++.so.6 (0x00007efdb6ffa000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007efdb6cf1000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007efdb69b7000)
libmpifort.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpifort.so.12 (0x00007efdb677d000)
libgfortran.so.4 => /opt/moose/gcc-7.2.0/lib64/libgfortran.so.4 (0x00007efdb63ac000)
libquadmath.so.0 => /opt/moose/gcc-7.2.0/lib64/libquadmath.so.0 (0x00007efdb616e000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007efdb5f6a000)
libmpi.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpi.so.12 (0x00007efdb59e2000)
libgomp.so.1 => /opt/moose/gcc-7.2.0/lib64/libgomp.so.1 (0x00007efdb57b5000)
libgcc_s.so.1 => /opt/moose/gcc-7.2.0/lib64/libgcc_s.so.1 (0x00007efdb559f000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007efdb5382000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efdb4fb8000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007efdb4db0000)
/lib64/ld-linux-x86-64.so.2 (0x00007efdb9160000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007efdb4b8e000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007efdb498a000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007efdb4784000)
```

Why can't lddtree find some of the (lower-level) libraries when ldd
finds them all?

What does 'interpreter => none' mean? Is it harmful?

Finally, a question about the error message when building libmesh

```
CXXLD unit_tests-dbg
CXX src/systems/libmesh_oprof_la-dg_fem_context.lo
CXX src/systems/libmesh_oprof_la-diff_context.lo
/usr/bin/ld: gk_cur_jbufs: TLS definition in /usr/lib/x86_64-linux-gnu/libmetis.so.5 section .tdata mismatches non-TLS definition in /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so section .data
/usr/lib/x86_64-linux-gnu/libmetis.so.5: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
Makefile:1368: recipe for target 'unit_tests-dbg' failed
make[1]: *** [unit_tests-dbg] Error 1
make[1]: *** Waiting for unfinished jobs....
```

I don't understand the error message. What's the difference between a
TLS definition and a non-TLS one? readelf shows both .tdata and .data
sections in my system libmetis.so but only a .data section in the
libmetis.so provided by the moose-environment package.

Regards,
-- st

Peterson, JW

unread,
Feb 5, 2018, 11:35:34 AM2/5/18
to moose-users
On Sun, Feb 4, 2018 at 10:45 PM, <luksh...@gmail.com> wrote:

Finally, a question about the error message when building libmesh

```
  CXXLD    unit_tests-dbg
  CXX      src/systems/libmesh_oprof_la-dg_fem_context.lo
  CXX      src/systems/libmesh_oprof_la-diff_context.lo
/usr/bin/ld: gk_cur_jbufs: TLS definition in /usr/lib/x86_64-linux-gnu/libmetis.so.5 section .tdata mismatches non-TLS definition in /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so section .data
/usr/lib/x86_64-linux-gnu/libmetis.so.5: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
Makefile:1368: recipe for target 'unit_tests-dbg' failed
make[1]: *** [unit_tests-dbg] Error 1
make[1]: *** Waiting for unfinished jobs....
```

Does the unit_tests-opt executable actually link successfully for you, with the error occurring only for the dbg-mode executable?


 
I don't understand the error message. What's the difference between a TLS definition and a non-TLS one? Readelf shows both a .tdata and .data sections in my system libmetis.so but only .data section in libmetis.so provided by the moose-environment package.

TLS is an abbreviation for "thread local storage".  A variable can be flagged as using thread local storage by prefixing it with "__thread" or some other compiler-specific directive.

The problematic variable, gk_cur_jbufs, is defined in GKlib/gk_externs.h. The PETSc version of this header defines the "metis__thread" specifier to refer to the compiler-specific variant of TLS:

extern metis__thread int gk_cur_jbufs;

And I just realized that PETSc always disables TLS in their Metis builds; the cmake test in externalpackages/git.metis/GKlib/GKlibSystem.cmake is completely commented out:

set(GKlib_TLS "")
# Custom check for TLS.
#foreach(tls_name "__thread" "__declspec(thread)")
#  check_c_source_runs("${tls_name} int x; int main() {x=0; return x;}" HAVE${tls_name})
#  if(HAVE${tls_name})
#    set(GKlib_TLS "${tls_name}")
#    message(STATUS "checking for ${tls_name} thread-local storage - found")
#    break()
#  else()
#    message(STATUS "checking for ${tls_name} thread-local storage - not found")
#  endif(HAVE${tls_name})
#endforeach()

so we should probably just do the same in libMesh.

That won't fix the issue with Ubuntu or other system Metis installs declaring a TLS specifier, and us accidentally linking to it, however...
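One way to confirm which flavor a given libmetis was built with is to look at the symbol's type in the dynamic symbol table. The sketch below is only an illustration: `symbol_is_tls` is a made-up helper, and it assumes the standard eight-column `readelf -sW` layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name).

```shell
# Hypothetical helper: reads `readelf -sW <lib>` output on stdin and succeeds
# if the symbol named in $1 is listed with type TLS.
symbol_is_tls() {
  awk -v sym="$1" '
    { name = $8; sub(/@.*/, "", name) }        # drop any @VERSION suffix
    $4 == "TLS" && name == sym { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Example use:
# readelf -sW /usr/lib/x86_64-linux-gnu/libmetis.so.5 | symbol_is_tls gk_cur_jbufs \
#   && echo "gk_cur_jbufs is thread-local in this build"
```

A library built without the TLS specifier would list the same symbol with type OBJECT instead, which matches the .tdata versus .data mismatch in the linker message.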

--
John

Jed Brown

unread,
Feb 5, 2018, 12:22:25 PM2/5/18
to luksh...@gmail.com, moose...@googlegroups.com
> libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007efdb4b8e000)
> libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007efdb498a000)
> libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6
> (0x00007efdb4784000)
> ```
>
> Why lddtree can't find some of the (lower level) libraries but all are
> found by ldd?

lddtree just reads the ELF files instead of executing (thus running code
in ld-linux-x86-64.so.2). I don't know why lddtree isn't resolving
libxcb.so.1, but that's why libXau.so.6 isn't found.

> What does 'interpreter => none' mean? Is it harmful?

The ELF INTERP header is normally only set for executables. It is
possible to build libraries that are also executables (libc.so.* is
usually an exception; you can also run it directly), in which case they
must specify their dynamic linker.

> Finally, a question about the error message when building libmesh
>
> ```
> CXXLD unit_tests-dbg
> CXX src/systems/libmesh_oprof_la-dg_fem_context.lo
> CXX src/systems/libmesh_oprof_la-diff_context.lo
> /usr/bin/ld: gk_cur_jbufs: TLS definition in
> /usr/lib/x86_64-linux-gnu/libmetis.so.5 section .tdata mismatches
> non-TLS definition in
> /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so section .data
> /usr/lib/x86_64-linux-gnu/libmetis.so.5: error adding symbols: Bad value
> collect2: error: ld returned 1 exit status
> Makefile:1368: recipe for target 'unit_tests-dbg' failed
> make[1]: *** [unit_tests-dbg] Error 1
> make[1]: *** Waiting for unfinished jobs....
> ```
>
> I don't understand the error message. What's the difference between a
> TLS definition and a non-TLS one? Readelf shows both a .tdata and .data
> sections in my system libmetis.so but only .data section in libmetis.so
> provided by the moose-environment package.

TLS means Thread Local Storage. One version was built with threading
support and the other without.

I can't tell from what I've seen so far how
/usr/lib/x86_64-linux-gnu/libmetis.so.5 is being found.

luksh...@gmail.com

unread,
Feb 6, 2018, 12:55:29 AM2/6/18
to moose...@googlegroups.com
On Tuesday, February 06, 2018 12:35 AM, Peterson, JW wrote:
>
>
> On Sun, Feb 4, 2018 at 10:45 PM, <luksh...@gmail.com
> <mailto:luksh...@gmail.com>> wrote:
>
>
> Finally, a question about the error message when building libmesh
>
> ```
>   CXXLD    unit_tests-dbg
>   CXX      src/systems/libmesh_oprof_la-dg_fem_context.lo
>   CXX      src/systems/libmesh_oprof_la-diff_context.lo
> /usr/bin/ld: gk_cur_jbufs: TLS definition in
> /usr/lib/x86_64-linux-gnu/libmetis.so.5 section .tdata mismatches
> non-TLS definition in
> /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so
> section .data
> /usr/lib/x86_64-linux-gnu/libmetis.so.5: error adding symbols: Bad value
> collect2: error: ld returned 1 exit status
> Makefile:1368: recipe for target 'unit_tests-dbg' failed
> make[1]: *** [unit_tests-dbg] Error 1
> make[1]: *** Waiting for unfinished jobs....
> ```
>
>
> Does the unit_tests-opt executable actually link successfully for you,
> and the error is only with the dbg mode executable?

No, not in the failed builds. I can't find it in a successful build
either, though.
Thanks for the explanation!

I see that "-Wl,-rpath/usr/lib/x86_64-linux-gnu" etc are specified for
the system library directories as well. Would removing them help to
avoid the accidental linking?

Please look at my reply to Jed that I'll re-post to the list in a moment
for some additional information. (I hit the wrong button in that reply.)

Regards,
-- st

luksh...@gmail.com

unread,
Feb 6, 2018, 12:57:17 AM2/6/18
to moose-users
On Monday, February 05, 2018 10:45 PM, Jed Brown wrote:

Thanks for the explanation!

[snipped]

>> Finally, a question about the error message when building libmesh
>>
>> ```
>> CXXLD unit_tests-dbg
>> CXX src/systems/libmesh_oprof_la-dg_fem_context.lo
>> CXX src/systems/libmesh_oprof_la-diff_context.lo
>> /usr/bin/ld: gk_cur_jbufs: TLS definition in
>> /usr/lib/x86_64-linux-gnu/libmetis.so.5 section .tdata mismatches
>> non-TLS definition in
>> /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so section .data
>> /usr/lib/x86_64-linux-gnu/libmetis.so.5: error adding symbols: Bad value
>> collect2: error: ld returned 1 exit status
>> Makefile:1368: recipe for target 'unit_tests-dbg' failed
>> make[1]: *** [unit_tests-dbg] Error 1
>> make[1]: *** Waiting for unfinished jobs....
>> ```
>>
>> I don't understand the error message. What's the difference between a
>> TLS definition and a non-TLS one? Readelf shows both a .tdata and .data
>> sections in my system libmetis.so but only .data section in libmetis.so
>> provided by the moose-environment package.
>
> TLS means Thread Local Storage. One version was built with threading
> support and the other without.
>
> I can't tell from what I've seen so far how
> /usr/lib/x86_64-linux-gnu/libmetis.so.5 is being found.
>

I think I found it. While trying to build libmesh again adding the
option --disable-cppunit, the compilation again broke down somewhere
further with a similar error:

```
CXXLD fparser_parse-opt
CXXLD getpot_parse-opt
/usr/bin/ld: gk_cur_jbufs: TLS definition in /usr/lib/x86_64-linux-gnu/libmetis.so.5 section .tdata mismatches non-TLS definition in /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so section .data
/usr/lib/x86_64-linux-gnu/libmetis.so.5: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
Makefile:11020: recipe for target 'fparser_parse-opt' failed
make[1]: *** [fparser_parse-opt] Error 1
make[1]: *** Waiting for unfinished jobs....
/usr/bin/ld: gk_cur_jbufs: TLS definition in /usr/lib/x86_64-linux-gnu/libmetis.so.5 section .tdata mismatches non-TLS definition in /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so section .data
/usr/lib/x86_64-linux-gnu/libmetis.so.5: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
Makefile:11038: recipe for target 'getpot_parse-opt' failed
make[1]: *** [getpot_parse-opt] Error 1
```

Running lddtree on the libraries built in .libs/ showed

```
$ lddtree libmesh/build/.libs/libmesh_opt.so

libmesh_opt.so => ../moose.git/libmesh/build/.libs/libmesh_opt.so (interpreter => none)
libnetcdf.so.11 => /home/srv/w/fem/moose/git/00build/metis-conflict/20180205/moose.git/libmesh/build/contrib/netcdf/v4/liblib/.libs/libnetcdf.so.11
libcurl-gnutls.so.4 => /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
libidn.so.11 => not found
librtmp.so.1 => not found
libnettle.so.6 => not found
libgnutls.so.30 => not found
libgssapi_krb5.so.2 => not found
liblber-2.4.so.2 => not found
libldap_r-2.4.so.2 => not found
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1
libtbb.so.2 => /usr/lib/x86_64-linux-gnu/libtbb.so.2
libtbbmalloc.so.2 => /usr/lib/x86_64-linux-gnu/libtbbmalloc.so.2
libpetsc.so.3.7 => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libpetsc.so.3.7
libmetis.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so
libsuperlu_dist.so.5 => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libsuperlu_dist.so.5
libparmetis.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libparmetis.so
libmetis.so.5 => /usr/lib/x86_64-linux-gnu/libmetis.so.5
ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6
libxcb.so.1 => not found
libmpifort.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpifort.so.12
libgfortran.so.4 => /opt/moose/gcc-7.2.0/lib/../lib64/libgfortran.so.4
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
libmpicxx.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpicxx.so.12
libmpi.so.12 => /opt/moose/mpich-3.2/gcc-7.2.0/lib/libmpi.so.12
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
libgcc_s.so.1 => /opt/moose/gcc-7.2.0/lib/../lib64/libgcc_s.so.1
```

It did find the petsc-provided metis first but later, it hit the system
metis at /usr/lib/x86_64-linux-gnu. I was using these additional options
on the commandline for update_and_rebuild_libmesh.sh

--disable-slepc --disable-eigen --with-metis=PETSc --disable-cppunit

It seems the configure script tries to search for metis in the default
places even with --with-metis=PETSc turned on.
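Spotting this kind of double resolution by eye is error-prone in long listings, so here is a small sketch that flags it automatically. `duplicate_libs` is a hypothetical helper; it assumes unwrapped `name => /path` lines as printed by ldd or lddtree, and it treats everything before `.so` as the library's base name.

```shell
# Hypothetical helper: reads ldd or lddtree output on stdin and reports every
# library base name (text before ".so") that resolves to more than one path.
duplicate_libs() {
  awk '$2 == "=>" && $3 ~ /^\// {
         stem = $1
         sub(/\.so.*/, "", stem)
         if (!(stem in path))        path[stem] = $3
         else if (path[stem] != $3)  print stem ": " path[stem] " vs " $3
       }'
}

# Example use:
# lddtree libmesh/build/.libs/libmesh_opt.so | duplicate_libs
```

Lines reported as "not found" are skipped, since only entries with an absolute path are compared.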

Regards,
-- st

Peterson, JW

unread,
Feb 6, 2018, 1:16:13 PM2/6/18
to moose-users
Libmesh's configure *doesn't search* for Metis at all (you can check out the test logic in libmesh/m4/metis.m4). If PETSc doesn't have Metis, then we use the one in contrib.

I don't know if the ordering of lddtree tells us anything, but it just occurred to me that PETSc's *Parmetis* could be the culprit here. It may be what is linking in the system Metis instead of PETSc's Metis by accident...

I'll take a closer look at PETSc's Parmetis build scripts...

--
John

Jed Brown

unread,
Feb 6, 2018, 6:01:53 PM2/6/18
to Peterson, JW, moose-users
"Peterson, JW" <jw.pe...@inl.gov> writes:

>> It seems the configure script tries to search for metis in the default
>> places even with --with-metis=PETSc turned on.
>>
>
> Libmesh's configure *doesn't search* for Metis at all (you can check out
> the test logic in libmesh/m4/metis.m4) . If PETSc doesn't have Metis, then
> we use the one in contrib.
>
> I don't know if the ordering of lddtree tells us anything, but it just
> occurred to me that PETSc's *Parmetis* could be the culprit here. It may be
> what is linking in the system Metis instead of PETSc's Metis by accident...
>
> I'll take a closer look at PETSc's Parmetis build scripts...

You can run ldd or lddtree on that libparmetis.so. It didn't look like
that was the culprit from my parsing of the parent email, but it's easy
to misread...

John

unread,
Feb 6, 2018, 7:41:47 PM2/6/18
to moose-users
Hi All,

I got MOOSE working. I had been setting the environment variables by hand in the terminal because I wanted to keep other simulations running, and that turned out to be the problem.

However, even after installing I am not getting the forced iterations. In my input file I have

petsc_options_iname = '-pc_type -pc_factor_mat_solver_package -snes_force_iteration'
petsc_options_value = 'lu superlu_dist 1'
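Since the two lists are matched up by position, a quick sanity check is to pair them explicitly. The sketch below is only an illustration; `pair_petsc_options` is a made-up helper and is not part of MOOSE or PETSc.

```shell
# Hypothetical helper: pair each option name in $1 with the value at the same
# position in $2, and fail if the two lists have different lengths.
pair_petsc_options() (
  names=$1
  set -f                      # no globbing while splitting the lists
  set -- $2                   # positional parameters now hold the values
  for name in $names; do
    if [ $# -eq 0 ]; then
      echo "missing value for $name" >&2
      exit 1
    fi
    printf '%s %s\n' "$name" "$1"
    shift
  done
  if [ $# -ne 0 ]; then
    echo "extra values: $*" >&2
    exit 1
  fi
)

# Example use:
# pair_petsc_options '-pc_type -pc_factor_mat_solver_package -snes_force_iteration' \
#                    'lu superlu_dist 1'
```

For the lines above, the names and values do line up one to one, so a mismatch in list length is not the cause here.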

Time Step 94, time = 3.43864
                dt = 0.04
 0 Nonlinear |R| = 1.027560e-08
 1 Nonlinear |R| = 3.808505e-14
 Solve Converged!

Outlier Variable Residual Norms:
  potential: 2.968572e-14

Postprocessor Values:
+----------------+----------------------+------------------------+----------------------------+
| time           | Full_EmissionCurrent | Native_EmissionCurrent | Thermionic_EmissionCurrent |
+----------------+----------------------+------------------------+----------------------------+
:                :                      :                        :                            :
|   2.878642e+00 |         1.134771e-02 |           1.134671e-02 |               1.134771e-02 |
|   2.918642e+00 |         1.133180e-02 |           1.133080e-02 |               1.133180e-02 |
|   2.958642e+00 |         1.132109e-02 |           1.132009e-02 |               1.132109e-02 |
|   2.998642e+00 |         1.131387e-02 |           1.131287e-02 |               1.131387e-02 |
|   3.038642e+00 |         1.130900e-02 |           1.130800e-02 |               1.130900e-02 |
|   3.078642e+00 |         1.130571e-02 |           1.130471e-02 |               1.130571e-02 |
|   3.118642e+00 |         1.130349e-02 |           1.130250e-02 |               1.130349e-02 |
|   3.158642e+00 |         1.130200e-02 |           1.130100e-02 |               1.130200e-02 |
|   3.198642e+00 |         1.130099e-02 |           1.129999e-02 |               1.130099e-02 |
|   3.238642e+00 |         1.130030e-02 |           1.129930e-02 |               1.130030e-02 |
|   3.278642e+00 |         1.129984e-02 |           1.129884e-02 |               1.129984e-02 |
|   3.318642e+00 |         1.129953e-02 |           1.129853e-02 |               1.129953e-02 |
|   3.358642e+00 |         1.129932e-02 |           1.129832e-02 |               1.129932e-02 |
|   3.398642e+00 |         1.129918e-02 |           1.129818e-02 |               1.129918e-02 |
|   3.438642e+00 |         1.129908e-02 |           1.129808e-02 |               1.129908e-02 |
+----------------+----------------------+------------------------+----------------------------+


Time Step 95, time = 3.47864
                dt = 0.04
 0 Nonlinear |R| = 6.942181e-09
 Solve Converged!

Outlier Variable Residual Norms:
  potential: 4.908863e-09
  native_potential: 4.908863e-09

Postprocessor Values:
+----------------+----------------------+------------------------+----------------------------+
| time           | Full_EmissionCurrent | Native_EmissionCurrent | Thermionic_EmissionCurrent |
+----------------+----------------------+------------------------+----------------------------+
:                :                      :                        :                            :
|   2.918642e+00 |         1.133180e-02 |           1.133080e-02 |               1.133180e-02 |
|   2.958642e+00 |         1.132109e-02 |           1.132009e-02 |               1.132109e-02 |
|   2.998642e+00 |         1.131387e-02 |           1.131287e-02 |               1.131387e-02 |
|   3.038642e+00 |         1.130900e-02 |           1.130800e-02 |               1.130900e-02 |
|   3.078642e+00 |         1.130571e-02 |           1.130471e-02 |               1.130571e-02 |
|   3.118642e+00 |         1.130349e-02 |           1.130250e-02 |               1.130349e-02 |
|   3.158642e+00 |         1.130200e-02 |           1.130100e-02 |               1.130200e-02 |
|   3.198642e+00 |         1.130099e-02 |           1.129999e-02 |               1.130099e-02 |
|   3.238642e+00 |         1.130030e-02 |           1.129930e-02 |               1.130030e-02 |
|   3.278642e+00 |         1.129984e-02 |           1.129884e-02 |               1.129984e-02 |
|   3.318642e+00 |         1.129953e-02 |           1.129853e-02 |               1.129953e-02 |
|   3.358642e+00 |         1.129932e-02 |           1.129832e-02 |               1.129932e-02 |
|   3.398642e+00 |         1.129918e-02 |           1.129818e-02 |               1.129918e-02 |
|   3.438642e+00 |         1.129908e-02 |           1.129808e-02 |               1.129908e-02 |
|   3.478642e+00 |         1.129908e-02 |           1.129808e-02 |               1.129908e-02 |
+----------------+----------------------+------------------------+----------------------------+


Time Step 96, time = 3.51864
                dt = 0.04
 0 Nonlinear |R| = 1.163231e-08
 1 Nonlinear |R| = 3.236214e-14
 Solve Converged!

Outlier Variable Residual Norms:
  potential: 2.542211e-14

Postprocessor Values:
+----------------+----------------------+------------------------+----------------------------+
| time           | Full_EmissionCurrent | Native_EmissionCurrent | Thermionic_EmissionCurrent |
+----------------+----------------------+------------------------+----------------------------+
:                :                      :                        :                            :
|   2.958642e+00 |         1.132109e-02 |           1.132009e-02 |               1.132109e-02 |
|   2.998642e+00 |         1.131387e-02 |           1.131287e-02 |               1.131387e-02 |
|   3.038642e+00 |         1.130900e-02 |           1.130800e-02 |               1.130900e-02 |
|   3.078642e+00 |         1.130571e-02 |           1.130471e-02 |               1.130571e-02 |
|   3.118642e+00 |         1.130349e-02 |           1.130250e-02 |               1.130349e-02 |
|   3.158642e+00 |         1.130200e-02 |           1.130100e-02 |               1.130200e-02 |
|   3.198642e+00 |         1.130099e-02 |           1.129999e-02 |               1.130099e-02 |
|   3.238642e+00 |         1.130030e-02 |           1.129930e-02 |               1.130030e-02 |
|   3.278642e+00 |         1.129984e-02 |           1.129884e-02 |               1.129984e-02 |
|   3.318642e+00 |         1.129953e-02 |           1.129853e-02 |               1.129953e-02 |
|   3.358642e+00 |         1.129932e-02 |           1.129832e-02 |               1.129932e-02 |
|   3.398642e+00 |         1.129918e-02 |           1.129818e-02 |               1.129918e-02 |
|   3.438642e+00 |         1.129908e-02 |           1.129808e-02 |               1.129908e-02 |
|   3.478642e+00 |         1.129908e-02 |           1.129808e-02 |               1.129908e-02 |
|   3.518642e+00 |         1.129897e-02 |           1.129797e-02 |               1.129897e-02 |
+----------------+----------------------+------------------------+----------------------------+

luksh...@gmail.com

Feb 6, 2018, 11:09:38 PM
to moose...@googlegroups.com
```
$ ldd /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libparmetis.so
linux-vdso.so.1 => (0x00007ffc569c3000)
libmetis.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so (0x00007f44179f1000)
...

$ lddtree /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libparmetis.so
libparmetis.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libparmetis.so (interpreter => none)
libmetis.so => /opt/moose/petsc-3.7.6/mpich-3.2_gcc-7.2.0-opt/lib/libmetis.so
...
```

So PETSc's ParMETIS is linked against its own METIS, as designed.

I've taken the liberty of renaming the subject of this thread, as it no
longer matches the original (and I'm glad that John Haase has solved his
linking problem).

Regards,
-- st

Peterson, JW

Feb 7, 2018, 9:58:02 AM
to moose-users
On Tue, Feb 6, 2018 at 5:41 PM, John <haas...@gmail.com> wrote:
Hi All,

I got MOOSE working. I was setting the environment variables in the terminal because I wanted to keep other simulations running, which turned out to be a problem.

Wait, what? Can you please elaborate on this, because it might help someone else with the same problem in the future...

--
John

John Haase

Feb 9, 2018, 12:23:36 AM
to moose...@googlegroups.com
@John,

From the command line I input

PETSC_DIR=${HOME}/projects/petsc/ PETSC_ARCH=arch-linux2-c-opt ; MOOSE_DIR=${HOME}/projects/moose/

and got install errors.
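
For anyone hitting the same thing: a plain assignment like the one above only sets variables in the current shell. Without `export`, child processes such as a build script do not inherit them. Whether that fully explains the install errors here is an assumption. A minimal sketch, using a hypothetical variable `DEMO_DIR` chosen only for illustration:

```shell
# DEMO_DIR is a hypothetical name used only for this illustration.
unset DEMO_DIR

# Plain assignment: visible in this shell, but NOT passed to child processes.
DEMO_DIR="${HOME}/projects/demo"
unexported=$(sh -c 'printf "%s" "${DEMO_DIR:-unset}"')

# After export, child processes (make, configure, ...) inherit the value.
export DEMO_DIR
exported=$(sh -c 'printf "%s" "${DEMO_DIR:-unset}"')

echo "before export, child sees: ${unexported}"
echo "after export, child sees:  ${exported}"
```

Exporting in one terminal affects only that terminal session, so simulations already running in other terminals keep the environment they started with.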


Regards,

John R. Haase


Kong, Fande

Feb 19, 2018, 11:52:07 AM
to moose...@googlegroups.com
On Tue, Feb 6, 2018 at 5:41 PM, John <haas...@gmail.com> wrote:
Hi All,

I got MOOSE working. I was setting the environment variables in the terminal because I wanted to keep other simulations running, which turned out to be a problem.

However, even after installing, I am not getting the forced iterations. In my input file I have

We decided to put this feature in the next PETSc release (PETSc-3.8.4). This makes it easier for us to maintain MOOSE.
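
To see which PETSc version a given install provides, one option is to read the version macros from `include/petscversion.h`. This is a minimal sketch, assuming a standard PETSc directory layout; the function name `petsc_version` is made up for illustration and is not part of PETSc or MOOSE:

```shell
# Print the version recorded in a PETSc install as major.minor.subminor.
# $1 is the PETSc root directory (assumed to contain include/petscversion.h).
petsc_version() {
  header="$1/include/petscversion.h"
  # Each macro is defined on a line like: #define PETSC_VERSION_MAJOR 3
  major=$(awk '$1 == "#define" && $2 == "PETSC_VERSION_MAJOR" {print $3}' "$header")
  minor=$(awk '$1 == "#define" && $2 == "PETSC_VERSION_MINOR" {print $3}' "$header")
  sub=$(awk '$1 == "#define" && $2 == "PETSC_VERSION_SUBMINOR" {print $3}' "$header")
  echo "${major}.${minor}.${sub}"
}
```

Called as `petsc_version "$PETSC_DIR"`, it prints the version of the install that `PETSC_DIR` points to, which can then be compared against the release mentioned above.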

Fande,

 
