PETSc options for the cases with distributed mesh


Yangyiwei Yang

Sep 21, 2018, 4:40:13 AM
to moose-users
Dear MOOSE users

Recently I've been dealing with cases that couple Cahn-Hilliard, Allen-Cahn, and thermal conduction. They work perfectly on small 2D and 3D meshes. Then I moved to a mesh file over 400 MB and tried the "Split/Distributed Mesh" approach with 100 processors (2400 MB of memory per process). Unfortunately, I encountered some problems.

Here's the status I had:


Parallelism:
  Num Processors:          100
  Num Threads:             1

Mesh: 
  Parallel Type:           distributed
  Mesh Dimension:          3
  Spatial Dimension:       3
  Nodes:                   
    Total:                 2065551
    Local:                 22774
  Elems:                   
    Total:                 2000000
    Local:                 20006
  Num Subdomains:          1
  Num Partitions:          100
  Partitioner:             parmetis

Nonlinear System:
  Num DOFs:                18589959
  Num Local DOFs:          204966
  Variables:               { "c" "w" "T" "gr0" "gr1" "gr2" "gr3" "gr4" "gr5" } 
  Finite Element Types:    "LAGRANGE" 
  Approximation Orders:    "FIRST" 

Auxiliary System:
  Num DOFs:                10065551
  Num Local DOFs:          102798
  Variables:               "bnds" { "var_indices" "unique_grains" } { "M" "dM/dT" } 
  Finite Element Types:    "LAGRANGE" "MONOMIAL" "MONOMIAL" 
  Approximation Orders:    "FIRST" "CONSTANT" "CONSTANT" 

Relationship Managers:
  Geometric                : GrainTrackerHaloRM (2 layers)

Execution Information:
  Executioner:             Transient
  TimeStepper:             IterationAdaptiveDT
  Solver Mode:             Preconditioned JFNK


Here are the problems I encountered:

1) When I use 

  petsc_options_iname = '-pc_type -ksp_gmres_restart -sub_ksp_type -sub_pc_type -pc_asm_overlap -pc_factor_mat_solver_package'
  petsc_options_value = 'asm      1201               preonly       ilu          4               superlu_dist'

then the following error appears, or the run core dumps directly:

 1 Nonlinear |R| = 1.362830e+05
[1537473700.924691] [hpb0315:14918:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473700.996639] [hpb0315:14911:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473701.109700] [hpb0315:14927:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473719.355914] [hpb0313:16349:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473719.460046] [hpb0313:16368:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473719.786913] [hpb0313:16352:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473720.138977] [hpb0313:16369:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473720.255803] [hpb0313:16365:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473720.800708] [hpb0313:16369:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473721.175852] [hpb0313:16366:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473721.181278] [hpb0313:16366:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473722.047582] [hpb0313:16347:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473722.816282] [hpb0313:16367:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473723.035251] [hpb0313:16353:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473723.497716] [hpb0313:16353:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473736.753293] [hpb0313:16364:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473736.854541] [hpb0313:16364:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473737.164661] [hpb0313:16359:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537473739.069100] [hpb0313:16355:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
      0 Linear |R| = 1.362830e+05
When I use

  petsc_options_iname = '-pc_type -ksp_gmres_restart  -sub_pc_type  -sub_ksp_type  -pc_asm_overlap -pc_factor_mat_solver_package'
  petsc_options_value = 'asm       1201                ksp           preonly        4               mumps'

the problem is the same.

2) When I use 

  petsc_options_iname = '-pc_type -pc_factor_mat_solver_package -ksp_gmres_restart'
  petsc_options_value = 'ksp      mumps                         1201'

It runs and converges well at the beginning, then it slows down after a few time steps and eventually stops without any output or log; no core files are generated.


I tried modifying the parameters and other preconditioning options, but the problem is much the same. I don't know where I went wrong, or whether there is actually a suitable set of PETSc options for such a large mesh. I would like to hear your response.

Sincerely,
Yang

Cody Permann

Sep 21, 2018, 11:41:39 AM
to moose...@googlegroups.com
Why are you using a direct solver for the Cahn-Hilliard equation? This is a very easy problem to precondition, so you can just use "asm". The direct solver will use a lot more memory and probably won't improve your convergence much, if at all. It's likely you'll need a few more order parameters as well if you are running some sort of polycrystal simulation; generally you'll need 8 in 2D.
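For reference, a minimal asm-only option set in the [Executioner] block could look something like the sketch below (the restart and overlap values are only placeholders, not tuned recommendations):

  # Sketch: plain ASM with ILU sub-blocks, no direct sub-solver
  petsc_options_iname = '-pc_type -sub_pc_type -ksp_gmres_restart -pc_asm_overlap'
  petsc_options_value = 'asm      ilu          31                 2'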


Yangyiwei Yang

Sep 21, 2018, 11:54:13 AM
to moose-users
Dear Cody

Thanks very much for your reply! Yes, I'm dealing with some sintering-related cases with multiple particles (grains). Since the particles are all equal in size and we don't need to see them coalesce, I use an algorithm that reduces the number of orientation order parameters while keeping them well distributed among the particles.

I'll try asm only and see whether it works.

Best regards
Yang

Cody Permann

Sep 21, 2018, 11:58:33 AM
to moose...@googlegroups.com
Thanks for sharing. Feel free to share videos or screen shots of your results when you get your simulations running!

Yangyiwei Yang

Sep 25, 2018, 5:12:06 AM
to moose-users
Dear Cody
I'm afraid the problem still exists when using ASM (the default sub_pc_type is ilu); the error message was

[1537866246.311887] [hpb0223:23910:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537866258.061171] [hpb0213:3018 :0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537866267.473720] [hpb0229:16941:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537866267.704965] [hpb0229:16934:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537866268.893111] [hpb0229:16934:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537866270.750484] [hpb0229:16936:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537866275.082610] [hpb0229:16938:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537866276.408986] [hpb0229:16938:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument
[1537866285.135036] [hpb0229:16937:0]        knem_ep.c:84   UCX  ERROR KNEM inline copy failed, err = -1 Invalid argument

It seems this comes from the MPI layer (the UCX/KNEM messages above).

Yang

Yangyiwei Yang

Sep 25, 2018, 5:20:10 AM
to moose-users
This is the console log (attached). The Preconditioning and Executioner blocks are as follows:

[Executioner]
  type = Transient
  solve_type = 'PJFNK'  # Preconditioned Jacobian-free Newton–Krylov
  # solve_type = 'NEWTON'
  l_max_its = 30
  l_tol = 1e-4
  nl_max_its = 50
  nl_abs_tol = 1e-5
  nl_rel_tol = 1e-06
  end_time = 10000
  # petsc_options_iname = '-pc_type -ksp_gmres_restart -sub_ksp_type -sub_pc_type -pc_asm_overlap -pc_factor_mat_solver_package'
  # petsc_options_value = 'asm      601                preonly       ilu          4  superlu_dist'

  petsc_options_iname = '-pc_type -ksp_gmres_restart -pc_asm_overlap'
  petsc_options_value = 'asm      1201                 4'

  # petsc_options_iname = '-pc_type -ksp_gmres_restart -sub_ksp_type -sub_pc_type -pc_asm_overlap'
  # petsc_options_value = 'asm      31                  preonly       ilu          2'


  [./TimeStepper]
    type = IterationAdaptiveDT
    dt = 1
    # dt_max = 10
    cutback_factor = 0.8
    growth_factor = 1.5
    optimal_iterations = 5
  [../]
  [./Adaptivity]
    initial_adaptivity = 3 
    refine_fraction = 0.9 
    coarsen_fraction = 0.1 
    max_h_level = 3 
  [../]
[]

[Preconditioning]
  [./coupled]
    type = SMP
    full = true
  [../]
[]
moose.log

walkand...@gmail.com

Sep 25, 2018, 5:32:47 AM
to moose-users
Thank you Cody,

If I want to solve a problem where the Cahn-Hilliard equation is coupled with mechanics, for example with a hyperelastic material (large deformation), which kind of preconditioner is preferred or suggested? Currently I'm using -pc_type lu with superlu_dist (or mumps). I also tried -pc_type asm, but it seems asm is not very efficient for this kind of coupled problem?
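(Concretely, the direct-solver options I describe above would look roughly like this in the input file:)

  petsc_options_iname = '-pc_type -pc_factor_mat_solver_package'
  petsc_options_value = 'lu       superlu_dist'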

Best regards
Yang

On Friday, September 21, 2018 at 5:41:39 PM UTC+2, Cody Permann wrote:

Derek Gaston

Sep 25, 2018, 10:45:14 AM
to MOOSE

Yangyiwei Yang

Sep 25, 2018, 10:51:07 AM
to moose-users
Dear Derek

Thanks for your reply. The problem is that I use the split form of the Cahn-Hilliard equation, which is said to work terribly with Hypre/BoomerAMG. Or do you actually suggest that I switch to the direct (non-split) form of Cahn-Hilliard?
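(For clarity, by Hypre/BoomerAMG I mean the usual option pair, roughly:)

  petsc_options_iname = '-pc_type -pc_hypre_type'
  petsc_options_value = 'hypre    boomeramg'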

Best regard
Yang

Fande Kong

Sep 25, 2018, 11:10:26 AM
to moose...@googlegroups.com
You could try fieldsplit to separate the CH and mechanics fields, and each field may then be handled using hypre or asm.
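A rough sketch of such a split in the MOOSE [Preconditioning] block is shown below; the split names and variable names (e.g. disp_x/disp_y/disp_z for the displacements) are placeholders that would have to match your own input file, and the per-split options are only examples:

  [Preconditioning]
    [./FSP]
      type = FSP
      topsplit = 'ch_mech'
      [./ch_mech]
        splitting = 'ch mech'          # split the system into the CH and mechanics fields
        splitting_type = additive      # block-Jacobi-like coupling between the two fields
      [../]
      [./ch]
        vars = 'c w'                   # split Cahn-Hilliard variables
        petsc_options_iname = '-pc_type -pc_asm_overlap'
        petsc_options_value = 'asm      2'
      [../]
      [./mech]
        vars = 'disp_x disp_y disp_z'  # displacement variables
        petsc_options_iname = '-pc_type -pc_hypre_type'
        petsc_options_value = 'hypre    boomeramg'
      [../]
    [../]
  []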

Fande,

On Tue, Sep 25, 2018 at 3:32 AM <walkand...@gmail.com> wrote: