[1515] GA on infiniband cluster.


jeff....@pnl.gov

Sep 3, 2010, 5:43:18 AM
to hpct...@googlegroups.com
"Entered on 10/26/2009 at 11:34:27 PDT (GMT-0700) by Abhinav Vishnu:

Can you please let us know if you are still facing the problem?

Thanks,

:- Abhinav


Entered on 07/29/2009 at 22:07:24 PDT (GMT-0700) by Abhinav Vishnu:

Hi,

We have recently released ga-4.2. Can you please download it and let us know if you are still facing the problem?

Thanks and best regards,

:- Abhinav

Entered on 04/09/2009 at 10:27:19 PDT (GMT-0700) by EMAIL_HIDDEN:

Hi,

Thanks for the logs and details. The issue is probably due to compiling GA/ARMCI with icc. The OFED
version is new enough and is supported by ARMCI. Could you please try compiling with gcc and
ifort? If problems persist, please attach the new make log.

Thanks,
Sriram.K

Entered on 04/09/2009 at 00:27:06 PDT (GMT-0700) by EMAIL_HIDDEN:

Hi,
Thank you for your answer. Attached are the script used for building GA and
the make log. The script differs slightly from the one already sent: it uses
ifort and icc instead of mpif90 and mpicc, but that produces the same error.

Compiler: ifort and icc version 11.0

mpicc:
icc -I/share/apps/openmpi/intel/openmpi-1.2.8/include -pthread
-L/share/apps/openmpi/intel/openmpi-1.2.8/lib -lmpi -lopen-rte -lopen-pal
-ldl -Wl,--export-dynamic -lnsl -lutil

mpif90:
ifort -I/share/apps/openmpi/intel/openmpi-1.2.8/include
-I/share/apps/openmpi/intel/openmpi-1.2.8/lib
-L/share/apps/openmpi/intel/openmpi-1.2.8/lib -lmpi_f90 -lmpi_f77 -lmpi
-lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil

OFED-1.3.1

Best regards,
Jean-Pierre Dognon

---------------------------------------------------------------------
Jean-Pierre Dognon
CEA/SACLAY
DSM/IRAMIS/SIS2M
Laboratoire Claude Fréjacques (CEA-CNRS URA 331)
Bat.125
91191 GIF SUR YVETTE CEDEX
FRANCE
Phone: 33 1 69 08 37 14
Fax: 33 1 69 08 66 40
mailto: EMAIL_HIDDEN
---------------------------------------------------------------------

> From: HPC Tools Support <EMAIL_HIDDEN>
> Reply-To: <EMAIL_HIDDEN>
> Date: Wed, 8 Apr 2009 11:17 -0700
> To: Jean-Pierre Dognon <EMAIL_HIDDEN>
> Subject: GA on infiniband cluster. ISSUE=1515 PROJ=19
>
> [Duplicate message snipped]

Entered on 04/08/2009 at 11:17:45 PDT (GMT-0700) by Sriram Krishnamoorthy:

Hi,

Sorry for the delayed response. Could you please send us the make log? Also, can you tell me
which version of OFED is installed on the machine, and which compilers you are using? (You can
get the compiler details with mpicc -show, etc.)

Thanks,
Sriram.K
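
The details requested above (compiler wrappers, compiler versions, OFED version) could be gathered in one pass with something like the sketch below. It assumes Open MPI wrappers, which print their underlying command line with -showme, and an OFED install that provides ofed_info. The helper names report and collect_build_info are made up for this illustration and are not part of GA or ARMCI.

```shell
#!/bin/sh
# Run one diagnostic command under a labelled header, or note that the
# command is not available on this machine. (Hypothetical helper.)
report() {
    label=$1
    shift
    printf '== %s ==\n' "$label"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        printf '%s: not found on this machine\n' "$1"
    fi
}

# Collect the wrapper, compiler, and OFED details in one report.
# Assumes Open MPI wrappers (-showme) and Intel compilers; adjust as needed.
collect_build_info() {
    report "C wrapper" mpicc -showme
    report "Fortran wrapper" mpif90 -showme
    report "C compiler" icc --version
    report "Fortran compiler" ifort --version
    report "OFED" ofed_info -s
}
```

Running collect_build_info > build-info.txt produces a file to attach to the ticket, and make 2>&1 | tee make.log captures the make log including error output.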

Entered on 04/03/2009 at 08:03:05 PDT (GMT-0700) by EMAIL_HIDDEN:


Hi,

I am a user of the Molpro quantum chemistry software package. I cannot run jobs
across several nodes on my new cluster (AMD, CentOS, OpenMPI 1.2.8, InfiniBand
OpenIB, Intel compilers). There is no problem on 1 node (8 cores). After
debugging with Molpro support, we concluded that the problem lies in my
compilation of GA 4.1.1.

Testing with:
mpirun -host compute-0-13,compute-0-13,compute-0-12,compute-0-12 -np 4
./global/testing/test.x

produces the following error (no error on 1 node):

ARMCI configured for 2 cluster nodes. Network protocol is 'OpenIB Verbs
API'.
(rank:0 hostname:compute-0-13.local pid:28879):ARMCI DASSERT fail.
signaltrap.c:SigSegvHandler():301 cond:0
[compute-0-13.local:28879] MPI_ABORT invoked on rank 0 in communicator
MPI_COMM_WORLD with errorcode 0
(rank:2 hostname:compute-0-12.local pid:8727):ARMCI DASSERT fail.
signaltrap.c:SigSegvHandler():301 cond:0
[compute-0-12.local:08727] MPI_ABORT invoked on rank 2 in communicator
MPI_COMM_WORLD with errorcode 0
Last System Error Message from Task 1:: Operation now in progress
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libintlc.so.5 00002B3B7BCC0A41 Unknown Unknown Unknown
libintlc.so.5 00002B3B7BCBFA15 Unknown Unknown Unknown
libifcore.so.5 00002B3B7B977CF3 Unknown Unknown Unknown
libifcore.so.5 00002B3B7B8F08FE Unknown Unknown Unknown
libifcore.so.5 00002B3B7B900019 Unknown Unknown Unknown
test.x 00000000004F2500 Unknown Unknown Unknown
test.x 00000000004E2B2A Unknown Unknown Unknown
test.x 00000000004F2A8C Unknown Unknown Unknown
libc.so.6 0000003869C301B0 Unknown Unknown Unknown
libpthread.so.0 000000386A80DF82 Unknown Unknown Unknown
libopen-pal.so.0 00002B3B7C32DB47 Unknown Unknown Unknown
libopen-pal.so.0 00002B3B7C32E8C4 Unknown Unknown Unknown
libopen-pal.so.0 00002B3B7C32BCB8 Unknown Unknown Unknown
libopen-pal.so.0 00002B3B7C326E0B Unknown Unknown Unknown
libmpi.so.0 00002B3B7AEC2175 Unknown Unknown Unknown
libmpi.so.0 00002B3B7AF02142 Unknown Unknown Unknown
libmpi.so.0 00002B3B7AF06D7C Unknown Unknown Unknown
libmpi.so.0 00002B3B7AED577F Unknown Unknown Unknown
test.x 000000000050E17D Unknown Unknown Unknown
test.x 0000000000507D32 Unknown Unknown Unknown
test.x 00000000004F8917 Unknown Unknown Unknown
test.x 00000000004E08C5 Unknown Unknown Unknown
test.x 0000000000515913 Unknown Unknown Unknown
test.x 0000000000515134 Unknown Unknown Unknown
test.x 0000000000514D42 Unknown Unknown Unknown
test.x 0000000000402E54 Unknown Unknown Unknown
test.x 0000000000402DFC Unknown Unknown Unknown
libc.so.6 0000003869C1D8B4 Unknown Unknown Unknown
test.x 0000000000402D09 Unknown Unknown Unknown
Last System Error Message from Task 3:: Operation now in progress
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libintlc.so.5 00002AE194E89A41 Unknown Unknown Unknown
libintlc.so.5 00002AE194E88A15 Unknown Unknown Unknown
libifcore.so.5 00002AE194B40CF3 Unknown Unknown Unknown
libifcore.so.5 00002AE194AB98FE Unknown Unknown Unknown
libifcore.so.5 00002AE194AC9019 Unknown Unknown Unknown
test.x 00000000004F2500 Unknown Unknown Unknown
test.x 00000000004E2B2A Unknown Unknown Unknown
test.x 00000000004F2A8C Unknown Unknown Unknown
libc.so.6 00000039B1C301B0 Unknown Unknown Unknown
libc.so.6 00000039B1C305B0 Unknown Unknown Unknown
libopen-pal.so.0 00002AE1954F7844 Unknown Unknown Unknown
libopen-pal.so.0 00002AE1954F4CB8 Unknown Unknown Unknown
libopen-pal.so.0 00002AE1954EFE0B Unknown Unknown Unknown
libmpi.so.0 00002AE19408B175 Unknown Unknown Unknown
libmpi.so.0 00002AE1940CB142 Unknown Unknown Unknown
libmpi.so.0 00002AE1940CFD7C Unknown Unknown Unknown
libmpi.so.0 00002AE19409E77F Unknown Unknown Unknown
test.x 000000000050E17D Unknown Unknown Unknown
test.x 0000000000507D32 Unknown Unknown Unknown
test.x 00000000004F8917 Unknown Unknown Unknown
test.x 00000000004E08C5 Unknown Unknown Unknown
test.x 0000000000515913 Unknown Unknown Unknown
test.x 0000000000515134 Unknown Unknown Unknown
test.x 0000000000514D42 Unknown Unknown Unknown
test.x 0000000000402E54 Unknown Unknown Unknown
test.x 0000000000402DFC Unknown Unknown Unknown
libc.so.6 00000039B1C1D8B4 Unknown Unknown Unknown
test.x 0000000000402D09 Unknown Unknown Unknown

My script used to build GA is given as an attached file.

Could you help me?
Thank you very much in advance,
Best regards,
Jean-Pierre Dognon
---------------------------------------------------------------------
Jean-Pierre Dognon
CEA/SACLAY
DSM/IRAMIS/SIS2M
Laboratoire Claude Fréjacques (CEA-CNRS URA 331)
Bat.125
91191 GIF SUR YVETTE CEDEX
FRANCE
Phone: 33 1 69 08 37 14
Fax: 33 1 69 08 66 40
mailto: EMAIL_HIDDEN
---------------------------------------------------------------------"

pott...@gmail.com

Apr 29, 2019, 3:49:26 PM
to hpctools
oo