[nmag-users] Segfault error


Ilya Dubitskiy

May 4, 2014, 2:33:23 PM
to nmag-...@lists.soton.ac.uk
Dear all,

I have been using Nmag for several months and have greatly enjoyed the wide range of possibilities it provides.

Now I am trying to simulate a rather large structure (700 nm per side). Unfortunately, after the BEM matrix had been populated, I got a segmentation fault.

To track down the cause of the problem, I considered a rather simple geometry (a cube); the .geo file is attached.
The mesh was built in Netgen 4.9.13 with the following settings: max mesh size = 0.42, mesh granularity = very fine.
The resulting mesh consists of approximately 419e3 nodes, 2300e3 volume elements and 64e3 surface elements.

Then I tried to load this mesh into Nmag (please see the attached script). But I got a segmentation fault (please see the attached log file).
The segfault occurs regardless of whether or not I use HLib.
For a mesh consisting of 360e3 nodes Nmag works fine.
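
In outline, the script just loads the mesh and relaxes the magnetisation, roughly like this (a simplified sketch; the material parameters and file names below are placeholders, the attached script has the exact values):

    import nmag
    from nmag import SI

    # Simplified sketch of the attached script; the material parameters
    # and file names are placeholders rather than the values I actually use.
    Py = nmag.MagMaterial(name="Py",
                          Ms=SI(0.86e6, "A/m"),
                          exchange_coupling=SI(13.0e-12, "J/m"))

    sim = nmag.Simulation()
    sim.load_mesh("cube.nmesh.h5", [("cube", Py)], unit_length=SI(1e-9, "m"))
    sim.set_m([1, 0, 0])
    sim.relax()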

I run Nmag in an Ubuntu virtual machine with 28 GB of RAM allocated to it, which I would expect to be enough memory even for such a large mesh. Interestingly, the segfault occurs only after all the matrices have been built.

I read that in an early Nmag release (Release 0.1 beta (5834)) diagnostic debug code that caused segfaults for large meshes was disabled.
Maybe I have encountered a problem with the same origin, or maybe I simply compiled Nmag incorrectly (please see bug-report.tgz).

Has anyone encountered the same problem?

Thank you in advance.

Sincerely,

Ilya Dubitskiy,

PhD student,
Faculty of Physics
Saint-Petersburg State University
tel. +7(812)4284588
fax +7(812)4287200, +7(812)4284588
Address: Petrodvorets, Ulyanovskaya str., 1,
Saint-Petersburg, Russia, 198504
bar_relax_test_without_hlib.py
bar_relax_test_without_hlib_log.log
bar28.28_28.28_28.28.geo
bug-report.tgz

fangohr

May 6, 2014, 5:42:40 PM
to Ilya Dubitskiy, Hans Fangohr, nmag-...@lists.soton.ac.uk
Dear Ilya,

On 4 May 2014, at 19:33, Ilya Dubitskiy <i.dub...@phys.spbu.ru> wrote:

Dear all,

I have been using Nmag for several months and have greatly enjoyed the wide range of possibilities it provides.

Now I am trying to simulate a rather large structure (700 nm per side). Unfortunately, after the BEM matrix had been populated, I got a segmentation fault.

To track down the cause of the problem, I considered a rather simple geometry (a cube); the .geo file is attached.
The mesh was built in Netgen 4.9.13 with the following settings: max mesh size = 0.42, mesh granularity = very fine.
The resulting mesh consists of approximately 419e3 nodes, 2300e3 volume elements and 64e3 surface elements.


I didn’t have a chance to run your files (have we got any other volunteers who could do this?), but will provide some comments that may help anyway.


Then I tried to load this mesh into Nmag (please see the attached script). But I got a segmentation fault (please see the attached log file).
The segfault occurs regardless of whether or not I use HLib.
For a mesh consisting of 360e3 nodes Nmag works fine.

So the mesh where it works has 360e3 nodes and the failing mesh has 419e3? This may suggest that you do run out of memory: even the volume nodes (i.e. the ‘nodes’) take some memory.

What is the size of the job when the simulation with 360e3 nodes is fully set up? (The columns "VIRT" and "RES" in the output of the ‘top’ command would be useful.)

If it is close to your 28GB (in particular the ‘RES’), then this is the problem.


I run Nmag in an Ubuntu virtual machine with 28 GB of RAM allocated to it, which I would expect to be enough memory even for such a large mesh. Interestingly, the segfault occurs only after all the matrices have been built.

The BEM needs exactly 8 bytes * n_s^2, where n_s is the number of surface nodes (see http://nmag.soton.ac.uk/nmag/0.2/manual/html/executables.html#memory-requirements-of-boundary-element-matrix). 

So in your example with 64e3 surface nodes, this is 8 * (64e3)^2 ~ 32GB if I am not mistaken? So this alone would push the memory requirements above your 28GB?
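
For completeness, the arithmetic in a couple of lines of Python (just restating the formula above, and taking the 64e3 figure from your mail as the number of surface nodes):

    # Dense BEM matrix memory: 8 bytes per double * n_s^2,
    # where n_s is the number of surface nodes.
    n_s = 64e3
    bem_bytes = 8 * n_s ** 2
    print(bem_bytes / 1e9)   # ~32.8 GB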

Note that you can run "nmeshpp -i" on a given nmesh file, and it will tell you how much RAM is required for the BEM matrix for this mesh.

Hope this is useful.

Best wishes,

Hans



I read that in an early Nmag release (Release 0.1 beta (5834)) diagnostic debug code that caused segfaults for large meshes was disabled.
Maybe I have encountered a problem with the same origin, or maybe I simply compiled Nmag incorrectly (please see bug-report.tgz).

Has anyone encountered the same problem?

Thank you in advance.

Sincerely,

Ilya Dubitskiy,

PhD student,
Faculty of Physics
Saint-Petersburg State University
tel. +7(812)4284588
fax +7(812)4287200, +7(812)4284588
Address: Petrodvorets, Ulyanovskaya str., 1,
Saint-Petersburg, Russia, 198504
<bar_relax_test_without_hlib.py><bar_relax_test_without_hlib_log.log><bar28.28_28.28_28.28.geo><bug-report.tgz>

Ilya Dubitskiy

May 12, 2014, 12:15:58 PM
to fangohr, nmag-...@lists.soton.ac.uk
Dear Hans,

Thank you for your answer and advice!


"
What is the size of the job when the simulation with 360e3 nodes is fully set up? (The columns "VIRT" and "RES" in the output of the ‘top’ command would be useful.)
"

The size of the job for the simulation with 360e3 nodes is 14.4 GB (VIRT) and 11 GB (RES).

"
The BEM needs exactly 8 bytes * n_s^2, where n_s is the number of surface nodes (see http://nmag.soton.ac.uk/nmag/0.2/manual/html/executables.html#memory-requirements-of-boundary-element-matrix). 
 
So in your example with 64e3 surface nodes, this is 8 * (64e3)^2 ~ 32GB if I am not mistaken? So this alone would push the memory requirements above your 28GB?
"
I use HLib for matrix compression, so even for the mesh containing 419e3 nodes the compressed BEM matrix is only about 400 MB.
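
(For reference, I switch the compression on in the way the manual describes, roughly like this; the default HMatrix setup is shown here rather than my exact settings, and I am quoting the name of the default setup object from memory:)

    import nmag

    # BEM matrix compressed with HLib; default HMatrix setup
    # (name of the default setup object quoted from memory).
    sim = nmag.Simulation(phi_BEM=nmag.default_hmatrix_setup)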

I noticed that the segfault occurs right after the following line in the log file: "INFO Populating BEM took 105.888495 seconds.". At this stage the size of the job is 9.5 GB (VIRT) or 9.0 GB (RES).
This is a bit puzzling, since as I understand it memory is only needed for compiling the linalg_machine and populating the BEM matrix. But maybe some additional memory is used by Nmag later. If that is the case, the segfault may be caused by running out of memory; if not, things look strange.

Sincerely,

Ilya

fangohr

May 12, 2014, 3:16:36 PM
to Ilya Dubitskiy, Hans Fangohr, nmag-...@lists.soton.ac.uk
Dear Ilya,


On 12 May 2014, at 17:15, Ilya Dubitskiy <i.dub...@phys.spbu.ru> wrote:

Dear Hans,

Thank you for your answer and advice!


"
What is the size of the job when the simulation with 360e3 nodes is fully set up? (The columns "VIRT" and "RES" in the output of the ‘top’ command would be useful.)
"

The size of the job for the simulation with 360e3 nodes is 14.4 GB (VIRT) and 11 GB (RES).

… so that should be okay with your 28GB available.


"
The BEM needs exactly 8 bytes * n_s^2, where n_s is the number of surface nodes (see http://nmag.soton.ac.uk/nmag/0.2/manual/html/executables.html#memory-requirements-of-boundary-element-matrix). 
 
So in your example with 64e3 surface nodes, this is 8 * (64e3)^2 ~ 32GB if I am not mistaken? So this alone would push the memory requirements above your 28GB?
"
I use HLib for matrix compression, so even for the mesh containing 419e3 nodes the compressed BEM matrix is only about 400 MB.

Okay, so that is not the problem then.

I noticed that the segfault occurs right after the following line in the log file: "INFO Populating BEM took 105.888495 seconds.". At this stage the size of the job is 9.5 GB (VIRT) or 9.0 GB (RES).
This is a bit puzzling, since as I understand it memory is only needed for compiling the linalg_machine and populating the BEM matrix. But maybe some additional memory is used by Nmag later. If that is the case, the segfault may be caused by running out of memory; if not, things look strange.

Yes, that’s right. There are other steps that require additional RAM, including for example computation of the Jacobian matrix (used to accelerate the time integration).

You could try to run the script so that it produces additional logging information. For example, use

nsim YOURSCRIPT.py --loglevel debug


This may reveal in more detail what’s happening, and whether additional memory allocation might be the problem. If nothing else, it gives some idea which parts of the code execute. However, printing the debug messages may slow down execution.
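
If you want to track the memory footprint from inside the script as well, you could print the peak resident set size at a few points with Python's standard resource module (a generic Python sketch, not an nmag feature):

    import resource

    def report_peak_rss(label):
        # ru_maxrss is reported in kilobytes on Linux
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print("%s: peak RSS %.1f GB" % (label, peak_kb / 1e6))

    # e.g. call report_peak_rss("after load_mesh") and
    # report_peak_rss("after relax") at the points of interest.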

Regards,

Hans

Ilya Dubitskiy

May 13, 2014, 12:55:32 PM
to fangohr, nmag-...@lists.soton.ac.uk
Dear Hans,

"
Yes, that’s right. There are other steps that require additional RAM, including for example computation of the Jacobian matrix (used to accelerate the time integration).

You could try to run the script so that it produces additional logging information. For example, use

nsim YOURSCRIPT.py --loglevel debug


This may reveal in more detail what’s happening, and whether additional memory allocation might be the problem. If nothing else, it gives some idea which parts of the code execute. However, printing the debug messages may slow down execution.
"

Thank you, I will try it!

Sincerely,

Ilya