Re: [dirac-users] Lucita memory failure (update)


Peterson, Kirk

Sep 2, 2021, 6:07:49 PM
to dirac...@googlegroups.com

All,

 

so in some desperation, I compiled Dirac21 with ifort and gcc.  This is a combination that I believe has worked well in the past.  Everything else looks fine, but unfortunately Lucita crashes again, this time in a seemingly very different way.  Below is part of the output and the traceback.

 

Any ideas?

 

best regards,  -Kirk

 

 

   ==> allocation of two CI vectors and one resolution vector <==

 

   current available free memory in double words:       799892201

   allocate two CI vectors each of length:                   7577

   allocate resolution vector of length:                     5776

   ==============================================================

 

DIRAC pam run in /home/kipeters/Dirac21/test/lucita_large

 

====  below this line is the stderr stream  ====

forrtl: severe (174): SIGSEGV, segmentation fault occurred

Image              PC                Routine            Line        Source            

dirac.x            0000000001A90464  Unknown               Unknown  Unknown

libpthread-2.12.s  0000003ACCA0F7E0  Unknown               Unknown  Unknown

dirac.x            00000000013229AC  getstr_totsm_spgp        1508  strings.F

dirac.x            0000000001319FC1  gasdias_                 1140  diagonal.F

dirac.x            0000000001319A99  gasdiat_                 1359  diagonal.F

dirac.x            00000000009E6287  gasci_                    499  program.F
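A quick back-of-the-envelope check of the allocation printed just before the crash is sketched below. It uses only the numbers from the LUCITA output and assumes 8-byte (double-precision) words. The request is a tiny fraction of the reported free memory, so the SIGSEGV is more plausibly caused by a bad index or address than by running out of memory, consistent with the offset explanation elsewhere in this thread.

```python
# Sanity check of the LUCITA allocation reported right before the SIGSEGV.
# All numbers are copied from the output; 8-byte words are assumed.

free_words = 799_892_201   # "current available free memory in double words"
ci_vector_len = 7_577      # length of each of the two CI vectors
resolution_len = 5_776     # length of the resolution vector

needed_words = 2 * ci_vector_len + resolution_len

print(f"needed: {needed_words} words ({needed_words * 8 / 1024**2:.2f} MB)")
print(f"free:   {free_words} words ({free_words * 8 / 1024**3:.2f} GB)")
print(f"fraction of free memory requested: {needed_words / free_words:.2e}")
```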

 

From: "'Peterson, Kirk' via dirac-users" <dirac...@googlegroups.com>
Reply-To: "dirac...@googlegroups.com" <dirac...@googlegroups.com>
Date: Wednesday, September 1, 2021 at 7:05 AM
To: "dirac...@googlegroups.com" <dirac...@googlegroups.com>
Subject: Re: [dirac-users] Lucita memory failure

 

Dear Hans Jørgen,

 

thanks, this makes a lot of sense and explains the error message I am currently getting with the i*4 version I was trying.  I guess the original question remains: is it possible to get a functioning Lucita with the GNU compiler suite (with int64, which is my normal build)?

 

best regards,

 

-Kirk

 

From: 'Hans Jørgen Aagaard Jensen' via dirac-users <dirac...@googlegroups.com>
Reply-To: "dirac...@googlegroups.com" <dirac...@googlegroups.com>
Date: Wednesday, September 1, 2021 at 1:06 AM
To: "dirac...@googlegroups.com" <dirac...@googlegroups.com>
Subject: Re: [dirac-users] Lucita memory failure

 

Dear Kirk,

 

On many current computers you need int64 to run LUCITA and LUCIAREL (including when they are called from KRCI), even for small tests.  The reason is that in the original pre-Dirac versions, all memory for arrays was allocated from a WORK array in a common block. To keep this static common-block allocation from making a lot of memory unusable for the rest of Dirac, we made an ad hoc solution: we calculated the offset in memory between the two work arrays, allocated only WORK(1) in the common block, and added this offset to every WORK(Ksomething) reference in LUCI*. Nowadays this offset is often bigger than 2**31 - 1, the largest integer*4 value, which is why one usually needs int64 to run LUCITA or LUCIAREL.
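The offset trick described above can be modelled in a few lines. This is a hypothetical sketch, not DIRAC code: the addresses are made up, 8-byte (REAL*8) words are assumed, and the function names are invented for illustration. It shows why the offset between a low static common-block address and a high dynamically allocated address easily exceeds the integer*4 range, and what a two's-complement wraparound would leave in a 4-byte integer (the Fortran standard does not define integer overflow, so wraparound is only the typical behaviour). The K_OFFSET value is the one reported by test_lucita_wrk_space_offset in this thread.

```python
# Hypothetical model of the LUCI* offset trick: WORK(1) lives in a static
# common block, the real workspace is allocated dynamically elsewhere, and a
# constant offset (in words) is added to every WORK index.

WORD_BYTES = 8                       # assumed REAL*8 words
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def offset_in_words(static_addr: int, dynamic_addr: int) -> int:
    """Distance from WORK(1) in the common block to the dynamic array, in words."""
    delta = dynamic_addr - static_addr
    if delta % WORD_BYTES != 0:
        raise ValueError("arrays are not word-aligned relative to each other")
    return delta // WORD_BYTES


def fits_int32(value: int) -> bool:
    """True if value is representable as an integer*4."""
    return INT32_MIN <= value <= INT32_MAX


def as_int32(value: int) -> int:
    """Two's-complement wraparound: what typically ends up in an integer*4
    when a larger value is forced into it."""
    return (value + 2**31) % 2**32 - 2**31


# Hypothetical layout: common block in the low static data segment,
# dynamic workspace in a high heap/mmap region.
static_addr = 0x0060_0000
dynamic_addr = 0x7F12_3456_7000

k_offset = offset_in_words(static_addr, dynamic_addr)
print(f"offset = {k_offset} words, fits in integer*4: {fits_int32(k_offset)}")
print(f"value left in an integer*4: {as_int32(k_offset)}")

reported = 5_889_449_601_119  # K_OFFSET from the test_lucita_wrk_space_offset error
print(f"reported K_OFFSET fits in integer*4: {fits_int32(reported)}")
```

A wrapped offset points WORK accesses at an essentially arbitrary address, which would fit both the "memory corrupted" and the segmentation-fault symptoms reported in this thread.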

 

The clean solution would be to rewrite the memory allocation in LUCITA and LUCIAREL, but that is a lot of work which has so far not been the top priority for any of the developers.

 

Regards, Hans Jørgen.

 

From: "'Peterson, Kirk' via dirac-users" <dirac...@googlegroups.com>
Reply-To: "dirac...@googlegroups.com" <dirac...@googlegroups.com>
Date: Tuesday, August 31, 2021 at 10:49 PM
To: "dirac...@googlegroups.com" <dirac...@googlegroups.com>
Subject: Re: [dirac-users] Lucita memory failure

 

Dear Dirac experts,

 

so I built an integer*4 version of Dirac17 to see if this would give me a functioning Lucita code.  Now when I run the lucita_short test job I get:

 

Information about the restricted kinetic balance scheme:

* Default RKB projection:

   1: Pre-projection in scalar basis

   2: Removal of unphysical solutions (via diagonalization of free particle Hamiltonian)

controlled stop: only int64

 

FATAL ERROR in test_lucita_wrk_space_offset: memory offset (dynamic memory - static memory) is too big for i*4

K_OFFSET and KBASE_LUCITA:       5889449601119          1049438303

 

I don't think it's actually a memory issue; here is the top of the output:

 

  ** interface to 32-bit integer MPI enabled **

 

DIRAC serial starts by allocating 100000000 words (    762.94 MB -  0.745 GB)

of memory    out of the allowed maximum of 200000000 words (   1525.88 MB -  1.490 GB)
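The header figures can be reproduced by treating DIRAC's words as 8 bytes and using binary units (1 MB = 1024**2 bytes); the sketch below assumes exactly that. Applying the same conversion to K_OFFSET gives tens of terabytes, tens of thousands of times the allowed maximum. That supports the reading that K_OFFSET is a distance between two addresses rather than a memory request, in line with Hans Jørgen's explanation of the offset between the static and dynamic work arrays.

```python
# Convert DIRAC memory figures (8-byte words) to binary MB/GB/TB.
# Inputs are copied from the output and error message in this thread.

WORD_BYTES = 8  # assumed double-precision words


def words_to_mb(words: int) -> float:
    return words * WORD_BYTES / 1024**2


def words_to_gb(words: int) -> float:
    return words * WORD_BYTES / 1024**3


allocated = 100_000_000        # "starts by allocating 100000000 words"
max_allowed = 200_000_000      # "allowed maximum of 200000000 words"
k_offset = 5_889_449_601_119   # K_OFFSET from test_lucita_wrk_space_offset

for label, words in [("allocated", allocated), ("max allowed", max_allowed)]:
    print(f"{label:>12}: {words_to_mb(words):10.2f} MB  {words_to_gb(words):6.3f} GB")

# Read as an allocation, K_OFFSET would be absurdly large; it is an address gap.
print(f"K_OFFSET spans {words_to_gb(k_offset) / 1024:.1f} TB, "
      f"{k_offset / max_allowed:.0f}x the allowed maximum")
```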

 

 

Can anyone shed some light on this?

 

regards,  -Kirk

 

From: "'Peterson, Kirk' via dirac-users" <dirac...@googlegroups.com>
Reply-To: "dirac...@googlegroups.com" <dirac...@googlegroups.com>
Date: Friday, August 27, 2021 at 5:32 PM
To: "dirac...@googlegroups.com" <dirac...@googlegroups.com>
Subject: [dirac-users] Lucita memory failure

 

Hi all,

 

this is resurrecting a previous thread (under a new subject title), but now for Dirac21.  Does anyone have a functioning version of Lucita? Both lucita_large and lucita_short test jobs give me a familiar error:

 

  Identifier  Start of free memory

  =================================

   -INI--               5885249840561

 

  Master node : --- SEVERE ERROR, PROGRAM WILL BE ABORTED ---

 

Date and time (Linux) : Fri Aug 27 17:26:20 2021

*** error in MEMMAN: memory corrupted. ***

 

I've had this issue since moving to Dirac19 and got around it by using an old build of Dirac17.  Unfortunately I accidentally deleted that build the other day doing some housekeeping....

 

For Dirac21, I'm compiling a 64-bit integer version with GNU 10.2.1 and OpenMPI 4.0.2.  Perhaps it's just a matter of building an i*4 version?

 

thanks in advance,

 

-Kirk

--
You received this message because you are subscribed to the Google Groups "dirac-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dirac-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dirac-users/MWHPR01MB323142DFAD40EEB6FDB5B801D6C99%40MWHPR01MB3231.prod.exchangelabs.com.


Peterson, Kirk

Sep 5, 2021, 12:55:43 AM
to dirac...@googlegroups.com

All,

 

just to more or less close this thread: I finally got a working version of Lucita.  It did, however, involve going back to Dirac17, and I also had to use the Intel Fortran compiler (but still gcc for C and C++).  Using GNU throughout resulted in the same errors as with Dirac21 (and Dirac19).

 

best regards,

Visscher, L.

Sep 9, 2021, 8:12:37 AM
to dirac...@googlegroups.com
Dear Kirk,

I have a pragmatic fix that solves the issue on my Mac. I hope to convince my fellow developers to include it in the patch release that will come out soon. If not, I'll mail it to you separately so that you can check whether it works for you.

The memory handling in these codes is really outdated but as there are no active developers of this part, it is hard to make an elegant fix.

best regards,

Luuk


Visscher, L.

Sep 13, 2021, 2:54:14 AM
to dirac...@googlegroups.com
Dear Kirk,

Update on this issue: the fix that I've implemented will be part of the patch that we plan to release soon. This will make Lucita work with Dirac21, albeit only with 64-bit integers.
I am optimistic that this revision of the memory handling will also make it easier to restore a 32-bit version, but that will require more time than I can spend on this issue. I hope the current solution works for you and others.

best regards,

Luuk

Peterson, Kirk

Sep 13, 2021, 10:17:23 AM
to dirac...@googlegroups.com

Dear Luuk,

 

thanks, this is great news. Personally I don't have much interest in a 32-bit version, so that's not a priority for us.

 

best regards,

 

-Kirk
