deal.II installation issue on NERSC Cori


Aaditya Lakshmanan

Oct 22, 2020, 9:52:21 PM
to deal.II User Group
Hi Everyone,
   I have been trying to install deal.II on the NERSC Cori system. CMake configuration and make install seem to succeed, but make test then reports failures for all the tests. Below I briefly describe the procedure I followed and the outputs and errors I obtained.

As prerequisites, I installed p4est using its setup script (with some modifications) and PETSc manually with its configure.py script and appropriate settings. The following modules were loaded:

Currently Loaded Modulefiles:
  1) modules/3.2.11.4              5) craype-network-aries          9) PrgEnv-intel/6.0.5           13) valgrind/3.15.0
  2) altd/2.0                      6) craype/2.6.2                 10) craype-mic-knl               14) zlib/1.2.11
  3) darshan/3.1.7                 7) pmi/5.0.14                   11) cray-mpich/7.7.14            15) cmake/3.14.4
  4) intel/19.0.3.199              8) atp/2.1.3                    12) craype-hugepages2M           16) cray-hdf5-parallel/1.10.5.2


After setting the environment variables P4EST_DIR, PETSC_DIR and PETSC_ARCH, I configured deal.II 9.1.1 as follows:

cmake -DCMAKE_SYSTEM_NAME=CrayLinuxEnvironment -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC -DCMAKE_Fortran_COMPILER=ftn -DDEAL_II_WITH_MPI=ON -DDEAL_II_WITH_PETSC=ON -DDEAL_II_WITH_P4EST=ON -DDEAL_II_WITH_LAPACK=ON  -DCMAKE_INSTALL_PREFIX=$CPFE_PACKAGES/dealii_install  ../dealii

which ran successfully. I have attached the file detailed.log. Then installing via

make -j8 install 

ran without issues. I have attached the output from the above command in the file make_output.txt. Then running the tests as 

make -j4 test 

reports failures for all the tests. I have attached the entire output of the above command in the file test_output.txt; an example error is below:

 2/11 Test  #2: step.release .....................***Failed    0.69 sec
gmake[6]: *** read jobs pipe: Bad file descriptor.  Stop.
gmake[6]: *** Waiting for unfinished jobs....
gmake[5]: *** [CMakeFiles/Makefile2:10861: tests/quick_tests/CMakeFiles/step.release.run.dir/rule] Error 2
gmake[4]: *** [Makefile:3713: step.release.run] Error 2
Test step.release: BUILD
===============================   OUTPUT BEGIN  ===============================
step.release: BUILD failed. Output:
[  1%] Built target obj_boost_iostreams_release


step.release: ******    BUILD failed    *******

===============================    OUTPUT END   ===============================
Expected stage PASSED - aborting
CMake Error at /global/project/projectdirs/m2360/packagesCPFE/dealii/cmake/scripts/run_test.cmake:140 (MESSAGE):
  *** abort

I am unable to understand what the issue might be. Any insight on this will be appreciated. Thank you.

Best,
Aaditya

detailed.log
make_output.txt
test_output.txt

Wolfgang Bangerth

Oct 22, 2020, 11:29:38 PM
to dea...@googlegroups.com

Aaditya,
is NERSC Cori a machine where the front-end node runs a different
processor/system than the compute nodes? If so, you're building a library for
the compute nodes, and the tests are also built for the compute nodes. But
you're trying to run the test executables on the front-end node -- which might
help explain the error you see.

Best
W.


--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Aaditya Lakshmanan

Oct 23, 2020, 2:00:00 AM
to deal.II User Group
Hi Wolfgang,
   Thanks for your suggestion. I just checked: the Cori login nodes have Haswell processors, while the compute nodes on which I eventually want to run simulations have KNL processors (some of the loaded modules were chosen with that in mind). I will use idev to access the compute nodes interactively and try the build, installation and testing phases again.

Best,
Aaditya

Aaditya Lakshmanan

Oct 23, 2020, 5:05:10 PM
to deal.II User Group
Hi Wolfgang,
   I tried compiling, installing and running the deal.II tests interactively on a KNL compute node. Both compilation and installation seemed to succeed, but all the tests failed. I have attached the output of make test in the file test_output.txt. An example error looks like the following:
2/11 Test  #1: step.debug .......................***Failed  265.99 sec
gmake[6]: *** write jobserver: Invalid argument.  Stop.
gmake[6]: *** Waiting for unfinished jobs....
gmake[6]: *** write jobserver: Invalid argument.  Stop.
gmake[5]: *** [CMakeFiles/Makefile2:10829: tests/quick_tests/CMakeFiles/step.debug.run.dir/rule] Error 2
gmake[4]: *** [Makefile:3700: step.debug.run] Error 2
Test step.debug: BUILD
===============================   OUTPUT BEGIN  ===============================
step.debug: BUILD failed. Output:
[  0%] Built target obj_sundials_inst
[ 21%] Built target obj_physics_elasticity_inst
[ 30%] Built target obj_dofs_inst
[ 25%] Built target obj_matrix_free_inst
[ 23%] Built target obj_amd_global_debug
[ 34%] Built target obj_umfpack_DI_TRIPLET_MAP_NOX_debug
[ 30%] Built target obj_differentiation_ad_inst
[ 34%] Built target obj_multigrid_inst
[ 59%] Built target obj_physics_inst
[ 59%] Built target obj_non_matching_inst
[ 59%] Built target obj_gmsh_inst
[ 34%] Built target obj_distributed_inst
[ 34%] Built target kill-step.debug-OK
[ 34%] Built target obj_muparser_debug
[ 34%] Built target obj_umfpack_DL_SOLVE_debug
[ 59%] Built target obj_numerics_inst
[ 59%] Built target obj_boost_system_debug
[ 59%] Built target obj_umfpack_DL_STORE_debug
[ 59%] Built target obj_umfpack_DL_TRIPLET_MAP_NOX_debug
[ 59%] Built target obj_umfpack_DL_TRIPLET_MAP_X_debug
[ 59%] Built target obj_umfpack_DI_TSOLVE_debug
[ 78%] Built target obj_base_inst
[ 78%] Built target obj_grid_inst
[ 59%] Built target obj_umfpack_DI_SOLVE_debug
[ 34%] Built target obj_amd_int_debug
[ 59%] Built target obj_algorithms_inst
[ 59%] Built target obj_meshworker_inst
[ 59%] Built target obj_hp_inst
[ 78%] Built target obj_particle_inst
[ 59%] Built target obj_differentiation_sd_inst
[ 59%] Built target obj_opencascade_inst
[ 78%] Built target obj_lac_inst
[ 59%] Built target obj_amd_long_debug
[ 59%] Built target obj_umfpack_DL_TSOLVE_debug
[ 59%] Built target obj_umfpack_DI_STORE_debug
[ 59%] Built target obj_umfpack_DL_TRIPLET_NOMAP_X_debug
[ 59%] Built target obj_umfpack_DI_TRIPLET_MAP_X_debug
[ 59%] Built target obj_umfpack_DI_TRIPLET_NOMAP_X_debug
[ 59%] Built target obj_umfpack_GENERIC_debug
[ 78%] Built target obj_umfpack_DL_ASSEMBLE_debug
[ 59%] Built target obj_umfpack_DI_ASSEMBLE_debug
[ 84%] Built target obj_fe_inst


step.debug: ******    BUILD failed    *******

===============================    OUTPUT END   ===============================
Expected stage PASSED - aborting
CMake Error at /global/project/projectdirs/m2360/packagesCPFE/dealii/cmake/scripts/run_test.cmake:140 (MESSAGE):
  *** abort


Best,
Aaditya
test_output.txt

Timo Heister

Oct 24, 2020, 10:38:49 AM
to dea...@googlegroups.com
Running the tests seems to have triggered a rebuild of deal.II. Can you try running some of the tutorial steps instead?
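A minimal sketch of this suggestion: build and run a tutorial program against the installed library instead of going through the test targets. It assumes the install prefix from the cmake command in the first message ($CPFE_PACKAGES/dealii_install, falling back to a hypothetical $HOME/dealii_install if that variable is unset) and that the tutorials are installed under examples/ there, as in this thread; the scratch directory name is also a hypothetical choice.

```shell
#!/bin/sh
# Install prefix from the cmake command earlier in the thread
# (the $HOME fallback is a hypothetical default).
prefix="${CPFE_PACKAGES:-$HOME}/dealii_install"
# Hypothetical scratch directory for a private copy of the example.
workdir="${TMPDIR:-/tmp}/dealii-step-1"

if [ -d "$prefix/examples/step-1" ]; then
  rm -rf "$workdir"
  cp -r "$prefix/examples/step-1" "$workdir"
  # DEAL_II_DIR tells the example's CMakeLists.txt where the installation lives.
  (cd "$workdir" && cmake -DDEAL_II_DIR="$prefix" . && make && make run)
else
  echo "no step-1 example found under $prefix/examples" >&2
fi
```

Working on a copy keeps the installed examples directory untouched, so a failed attempt can simply be deleted and retried.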


Aaditya Lakshmanan

Oct 24, 2020, 5:43:50 PM
to deal.II User Group
Hi Timo,
    Thank you for your suggestion. I repeated the installation process, and this time I didn't run the quick_tests. Instead, I changed to the examples/step-1 directory in the deal.II installation and executed the following:

cmake .  
make 

which exited with the following error :

Scanning dependencies of target step-1
[ 50%] Building CXX object CMakeFiles/step-1.dir/step-1.cc.o
[100%] Linking CXX executable step-1
/usr/bin/ld: /global/project/projectdirs/m2360/packagesCPFE/dealii_install/lib/libdeal_II.g.so: file not recognized: file truncated
make[2]: *** [CMakeFiles/step-1.dir/build.make:95: step-1] Error 1
make[1]: *** [CMakeFiles/Makefile2:137: CMakeFiles/step-1.dir/all] Error 2
make: *** [Makefile:84: all] Error 2


I am not sure why this happens, since the installation completed without issues. I have attached the files detailed.log and make_output.txt (the output of make -j8 install on the KNL compute node).

Best,
Aaditya
detailed.log
make_output.txt

Aaditya Lakshmanan

Oct 25, 2020, 12:39:13 AM
to deal.II User Group
Hi Timo, 
    I tried another build and installation of deal.II without PETSc, p4est and LAPACK (by setting the corresponding DEAL_II_WITH_PACKAGENAME=OFF). After the installation completed successfully, I tried the examples step-1 and step-2 with cmake . ; make ; make run, and both of them ran successfully. However, switching back to the build directory and running make test again resulted in failures for all the quick tests. So there might be some issue with the LAPACK, p4est or PETSc installation I am using. I will try the installation process with different subsets of these packages to pinpoint the issue. What do you think?

Timo Heister

Oct 25, 2020, 2:10:21 PM
to dea...@googlegroups.com
> Seems like there might then be some issue with the LAPACK, P4EST or PETSC installation I am using.

I doubt it, when looking at this error:

>> /usr/bin/ld: /global/project/projectdirs/m2360/packagesCPFE/dealii_install/lib/libdeal_II.g.so: file not recognized: file truncated

This sounds like the linking of the .so file failed. Can you take a
look at this file (check the size?)? You can try deleting the file and
running "make install" again. Maybe the last linker step failed?
If this does not help, maybe try the release mode examples as well.
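One way to carry out these checks, as a sketch: the path is the one from the linker error above, and the destructive rm/make install step is left as a comment so nothing is deleted by accident.

```shell
#!/bin/sh
# Library path taken from the linker error quoted above.
lib=/global/project/projectdirs/m2360/packagesCPFE/dealii_install/lib/libdeal_II.g.so

if [ -f "$lib" ]; then
  ls -lh "$lib"   # size on disk; a truncated copy is usually suspiciously small
  file "$lib"     # an intact library is reported as an ELF shared object
else
  echo "not found: $lib"
fi

# If the file looks damaged, delete it and rerun the install step
# from the deal.II build directory:
#   rm "$lib" && make install
```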




--
Timo Heister
http://www.math.clemson.edu/~heister/

Aaditya Lakshmanan

Oct 25, 2020, 7:25:41 PM
to deal.II User Group
Hi Timo,
   I cleaned out the previous installation and started with a fresh build. This time, after building and installing (with MPI, LAPACK, p4est and PETSc), I tried to run examples/step-1 in the installation directory:

cmake .
make -j16

ran successfully after which 

make run 

resulted in the following output :

[ 66%] Built target step-1
[100%] Run step-1 with Debug configuration

OpenBLAS Warning : The number of CPU/Cores(272) is beyond the limit(256). Terminated.
make[3]: *** [CMakeFiles/run.dir/build.make:58: CMakeFiles/run] Error 1
make[2]: *** [CMakeFiles/Makefile2:174: CMakeFiles/run.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:181: CMakeFiles/run.dir/rule] Error 2
make: *** [Makefile:157: run] Error 2

Regarding your questions and suggestions: the libdeal_II.g.so file is about 1.6 GB. Do you know what the above error signifies? Thank you.

Best,
Aaditya

Timo Heister

Oct 26, 2020, 9:01:44 AM
to dea...@googlegroups.com
Can you try setting

export OMP_NUM_THREADS=1

before running any of the examples?

If you are planning on running MPI-only codes, you don't need to use
OpenMP in your BLAS implementation.
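The OpenBLAS warning in the previous message compares the number of logical CPUs (272) with a limit of 256. A Cori KNL node has 68 cores with 4 hardware threads each, which gives exactly 272. The sketch below prints that count and then applies the setting suggested here; it assumes a Linux node where `getconf _NPROCESSORS_ONLN` is available.

```shell
#!/bin/sh
# Number of logical CPUs visible on this node (272 on a KNL node).
ncpu=$(getconf _NPROCESSORS_ONLN)
echo "logical CPUs: $ncpu"
if [ "$ncpu" -gt 256 ]; then
  echo "more CPUs than the limit of 256 named in the OpenBLAS warning"
fi

# MPI-only runs: restrict OpenMP (and thus threaded BLAS) to a single thread.
export OMP_NUM_THREADS=1
# Then, in the example directory: make run
```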

Aaditya Lakshmanan

Oct 26, 2020, 11:27:42 AM
to deal.II User Group
Hi Timo,
   Thanks for your suggestion. I tested step-1 and step-2 again, and they ran successfully after I set the number of OpenMP threads to 1. I only plan on running MPI codes without any multi-threading. The BLAS implementation I am using is part of the Intel MKL library, which is already provided by one of the default modules that loads the Intel compilers and Intel implementations of other commonly used libraries. Would you recommend using another BLAS implementation or installing one manually?

Installing deal.II on the Cori system was definitely a bit convoluted, since even the original cmake and make install had to be run on a compute node. That was extremely slow, taking a few hours even with make -j64 install (on the login nodes the entire build finishes in about 30 minutes). In one of their presentations (https://www.nersc.gov/assets/Uploads/06-Compiling-Codes.pdf, slide 14), NERSC does recommend swapping the haswell module for the mic-knl module, and compiling on the login nodes whenever possible (slide 20). That is what I had tried earlier, and I did so again yesterday; running step-1 (interactively on a compute node) then yielded the following error:

make[3]: *** No rule to make target '/usr/lib64/libopenblas.so', needed by 'step-1'. Stop.
make[2]: *** [CMakeFiles/Makefile2:137: CMakeFiles/step-1.dir/all] Error 2

make[1]: *** [CMakeFiles/Makefile2:181: CMakeFiles/run.dir/rule] Error 2
make: *** [Makefile:157: run] Error 2 

I don't know the reason for this. I will probably go through the entire installation process that worked for me once again and document it. Eventually I need deal.II as a prerequisite for another package that is built on it. Thank you for your help.

Best,
Aaditya
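For reference, a sketch of the login-node cross-compilation setup described in the NERSC slides mentioned above. The module names craype-haswell and craype-mic-knl are the ones used on Cori (the module list in the first message already shows craype-mic-knl loaded); the guard only keeps the snippet harmless on machines without environment modules.

```shell
#!/bin/bash
# Login node default target (Haswell) and compute node target (KNL), as named on Cori.
host_target=craype-haswell
knl_target=craype-mic-knl

if type module >/dev/null 2>&1; then
  # Make the cc/CC/ftn compiler wrappers generate code for KNL.
  module swap "$host_target" "$knl_target"
  module list 2>&1 | grep craype-
else
  echo "environment modules not available here; run this on a Cori login node" >&2
fi
# Afterwards, configure and build deal.II with the cmake command from the first message.
```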

Timo Heister

Oct 26, 2020, 1:25:42 PM
to dea...@googlegroups.com
> Thanks for your suggestion. I tested step-1 and step-2 again, and they ran successfully after I set the number of OpenMP threads to 1. I only plan on running MPI codes without any multi-threading. The BLAS implementation I am using is part of the Intel MKL library, which is already provided by one of the default modules that loads the Intel compilers and Intel implementations of other commonly used libraries. Would you recommend using another BLAS implementation or installing one manually?

No, just make sure you set OMP_NUM_THREADS=1 in your job script.
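A minimal sketch of such a job script, written out with a here-document so it can be inspected before submitting. The node count, the knl constraint, the debug QOS, the time limit and the single-rank srun line are illustrative assumptions, not settings taken from this thread.

```shell
#!/bin/sh
# Write a hypothetical Slurm batch script for an MPI-only run on one KNL node.
cat > step-1-job.sh <<'EOF'
#!/bin/bash
#SBATCH -N 1
#SBATCH -C knl
#SBATCH -q debug
#SBATCH -t 00:10:00

# MPI-only run: keep OpenMP and threaded BLAS to a single thread.
export OMP_NUM_THREADS=1
srun -n 1 ./step-1
EOF
# Submit from the directory containing the step-1 executable:
#   sbatch step-1-job.sh
```

Setting the variable inside the script, rather than relying on the login shell, ensures it is in effect on the compute node where the program actually runs.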

> Installing deal.II on the Cori system was definitely a bit convoluted, since even the original cmake and make install had to be run on a compute node. That was extremely slow, taking a few hours even with make -j64 install (on the login nodes the entire build finishes in about 30 minutes). In one of their presentations (https://www.nersc.gov/assets/Uploads/06-Compiling-Codes.pdf, slide 14), NERSC does recommend swapping the haswell module for the mic-knl module, and compiling on the login nodes whenever possible (slide 20). That is what I had tried earlier, and I did so again yesterday; running step-1 (interactively on a compute node) then yielded the following error:

Yes, while I have no experience working on systems like this, I am
not surprised that this is challenging.

> make[3]: *** No rule to make target '/usr/lib64/libopenblas.so', needed by 'step-1'. Stop.

That means this library is missing on the compute node.
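A sketch for confirming this on the compute node. The link error names /usr/lib64/libopenblas.so, which suggests that CMake recorded the system OpenBLAS found on the login node rather than the MKL BLAS mentioned above; grepping CMakeCache.txt in the deal.II build directory shows which BLAS/LAPACK libraries were actually picked up.

```shell
#!/bin/sh
# Run on the compute node where linking step-1 failed.
blas=/usr/lib64/libopenblas.so
if [ -e "$blas" ]; then status=present; else status=missing; fi
echo "$blas is $status on $(uname -n)"

# From the deal.II build directory: which BLAS/LAPACK did CMake record?
if [ -f CMakeCache.txt ]; then
  grep -i -E 'blas|lapack' CMakeCache.txt || true
fi
```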