Segmentation fault (core dumped) error


Ehsan

Mar 4, 2016, 9:31:23 AM
to deal.II User Group
Hi,

I modified step-14 to solve my own problem.
In 2D the code runs without any error.
In 3D, however, after a few refinements I get the following error:
/var/spool/slurmd/job472819/slurm_script: line 13: 13182 Segmentation fault (core dumped) ./step14_changed

The number of refinements the code survives depends on the parameters of refine_and_coarsen_fixed_fraction: for top_fraction = 0.1 it refines 25 times without any problem and crashes with the above error on the 26th refinement, while for top_fraction = 0.8 it only works for 3 refinements.
I have no idea why it works fine for 25 refinements and crashes on the 26th. I ran the same code in 2D with many more DoFs without any problem, so I don't think the error is related to the number of DoFs.
I googled this error, and it is apparently caused by reading uninitialized memory or by accessing memory the program is not allowed to touch.
Why does this problem not appear in the first 25 refinements?

Best regards.
Ehsan

Bruno Turcksin

Mar 4, 2016, 10:35:52 AM
to deal.II User Group
Ehsan,

it could be a problem with the memory. In 3d the number of dofs increases very fast, so you may think that you are not using too many dofs, but once you refine a few times the number of dofs explodes and you run out of memory. The best way to find out is to compile your code in debug mode, run it in gdb, and look at the backtrace to see where the error comes from:
gdb ./step-14
run
bt
This will show you where the code crashes.

Best,

Bruno

Ehsan

Mar 4, 2016, 10:46:09 AM
to deal.II User Group
Bruno,

I checked the number of DoFs.
At the 26th refinement, where the error occurs, the number of DoFs is only about 7x10^4, but I previously ran the same code in 2D with about 2x10^6 DoFs.

Thanks.
Ehsan

Ehsan

Mar 7, 2016, 6:07:01 AM
to deal.II User Group
Hello,

I checked the CPU and memory usage: when the code crashes, it uses only 20% of the total available memory and 5% of the allocated CPUs.
So it has plenty of memory and CPU left.

I don't think this error is caused by insufficient memory.

Thanks.
Ehsan

Wolfgang Bangerth

Mar 7, 2016, 7:24:21 AM
to dea...@googlegroups.com
On 03/07/2016 05:07 AM, Ehsan wrote:
>
>
> I checked CPU and memory usage and when the code crashes, it only uses 0.2 of
> total available memory and 0.05 of total allocated CPUs.
> So it has plenty of memory and CPU to use.
>
> I don't think this error relates to insufficient memory.

It's hard to tell. Can you run the program in a debugger and get a backtrace
to see where the problem happens?

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@math.tamu.edu
www: http://www.math.tamu.edu/~bangerth/

Guido Kanschat

Mar 14, 2016, 3:17:35 PM
to dea...@googlegroups.com
Did you run in debug mode?






--
Prof. Dr. Guido Kanschat
Interdisziplinäres Zentrum für Wissenschaftliches Rechnen
Universität Heidelberg
Im Neuenheimer Feld 368, 69120 Heidelberg

Ehsan

May 20, 2016, 8:47:28 AM
to deal.II User Group
Yes, I ran the code in gdb and got the following error:

Program received signal SIGSEGV, Segmentation fault.
dealii::SparsityPattern::reinit (this=this@entry=0x7fffffffb498, m=m@entry=829992, n=n@entry=829992, row_lengths=...) at /deal_II/dealii-8.3.0/source/lac/sparsity_pattern.cc:369
369       std::fill_n (&colnums[0], vec_len, invalid_entry);


How can I check whether this error is due to a memory shortage?

Bruno Turcksin

May 20, 2016, 8:57:29 AM
to dea...@googlegroups.com
Ehsan,

2016-05-20 8:47 GMT-04:00 Ehsan <rabizad...@gmail.com>:
> Yes I run the code in gdb and I receive below error:
>
> Program received signal SIGSEGV, Segmentation fault.
> dealii::SparsityPattern::reinit (this=this@entry=0x7fffffffb498,
> m=m@entry=829992, n=n@entry=829992, row_lengths=...) at
> /deal_II/dealii-8.3.0/source/lac/sparsity_pattern.cc:369
> 369 std::fill_n (&colnums[0], vec_len, invalid_entry);
>
> How can I check if this error is due to the memory shortage or not?
We need to see the entire backtrace. One line is not enough to
understand what's happening.

Best,

Bruno

Ehsan

May 20, 2016, 9:12:16 AM
to deal.II User Group
Here is the entire backtrace:

(gdb) bt
#0  dealii::SparsityPattern::reinit (this=this@entry=0x7fffffffb498, m=m@entry=829992, n=n@entry=829992, row_lengths=...) at /deal_II/dealii-8.3.0/source/lac/sparsity_pattern.cc:369
#1  0x00007ffff5cb5fdc in dealii::SparsityPattern::reinit (this=this@entry=0x7fffffffb498, m=m@entry=829992, n=n@entry=829992, row_lengths=...)
    at /deal_II/dealii-8.3.0/source/lac/sparsity_pattern.cc:564
#2  0x00007ffff5cb614b in dealii::SparsityPattern::reinit (this=this@entry=0x7fffffffb498, m=829992, n=829992, max_per_row=8032)
    at /deal_II/dealii-8.3.0/source/lac/sparsity_pattern.cc:251
#3  0x00000000004a9330 in Step14::LaplaceSolver::Solver<3>::LinearSystem::LinearSystem (this=0x7fffffffb3f0, dof_handler=...) at /deal_II/J2/modified_step_14.cc:693
#4  0x00000000004acd22 in Step14::LaplaceSolver::Solver<3>::solve_problem (this=0x89d1c0) at /deal_II/J2/modified_step_14.cc:447
#5  0x00000000004ace37 in Step14::LaplaceSolver::DualSolver<3>::solve_problem (this=<optimized out>) at /deal_II/J2/modified_step_14.cc:1743
#6  0x00000000004ace56 in Step14::LaplaceSolver::WeightedResidual<3>::solve_dual_problem (this=<optimized out>) at /deal_II/J2/modified_step_14.cc:2179
#7  0x0000000000474dac in operator()<, void> (__object=..., this=<optimized out>) at /usr/include/c++/4.8.2/functional:588
#8  __call<void, 0ul> (__args=<optimized out>, this=<optimized out>) at /usr/include/c++/4.8.2/functional:1296
#9  operator()<, void> (this=<optimized out>) at /usr/include/c++/4.8.2/functional:1355
#10 std::_Function_handler<void (), std::_Bind<std::_Mem_fn<void (Step14::LaplaceSolver::WeightedResidual<3>::*)()> (std::reference_wrapper<Step14::LaplaceSolver::WeightedResidual<3> >)> >::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/4.8.2/functional:2071
#11 0x0000000000476976 in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/4.8.2/functional:2471
#12 0x000000000047698a in call (function=...) at /deal_II_install_dir/include/deal.II/base/thread_management.h:756
#13 dealii::Threads::internal::TaskEntryPoint<void>::execute (this=<optimized out>) at /deal_II_install_dir/include/deal.II/base/thread_management.h:2689
#14 0x00007ffff306cfb1 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x8f1080, parent=..., child=child@entry=0x0)
    at /deal_II/dealii-8.3.0/bundled/tbb41_20130401oss/src/tbb/custom_scheduler.h:455
#15 0x00007ffff306df6e in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::wait_for_all (this=<optimized out>, parent=..., child=0x0)
    at /deal_II/dealii-8.3.0/bundled/tbb41_20130401oss/src/tbb/custom_scheduler.h:89
#16 0x000000000047f0db in wait_for_all (this=<optimized out>) at /deal_II_install_dir/include/deal.II/bundled/tbb/task.h:717
#17 join (this=<optimized out>) at /deal_II_install_dir/include/deal.II/base/thread_management.h:2888
#18 dealii::Threads::Task<void>::join (this=this@entry=0x2570660) at /deal_II_install_dir/include/deal.II/base/thread_management.h:3003
#19 0x000000000048b12a in join_all (this=0x7fffffffb8b0) at /deal_II_install_dir/include/deal.II/base/thread_management.h:4011
#20 Step14::LaplaceSolver::WeightedResidual<3>::solve_problem (this=0x89ce50) at /deal_II/J2/modified_step_14.cc:2144
#21 0x00000000004a6e78 in Step14::Framework<3>::run (descriptor=...) at /deal_II/J2/modified_step_14.cc:3004
#22 0x0000000000471b16 in main () at /deal_II/J2/modified_step_14.cc:3072


Thanks
Ehsan

Bruno Turcksin

May 20, 2016, 9:27:04 AM
to dea...@googlegroups.com
2016-05-20 9:12 GMT-04:00 Ehsan <rabizad...@gmail.com>:
> #2 0x00007ffff5cb614b in dealii::SparsityPattern::reinit
> (this=this@entry=0x7fffffffb498, m=829992, n=829992, max_per_row=8032)
> at /deal_II/dealii-8.3.0/source/lac/sparsity_pattern.cc:251
It could be a memory problem. You are trying to create a matrix with 829,992
rows and up to 8,032 entries per row. That means 829,992 * 8,032 =
6,666,495,744 entries, or 6,666,495,744 * 8 = 53,331,965,952 bytes,
roughly 53 GB. If you are only using one processor, you won't have enough
memory. However, max_per_row seems excessively large; you can probably
find a sharper bound.

Best,

Bruno

Martin Kronbichler

May 20, 2016, 9:38:33 AM
to dea...@googlegroups.com
Hi Bruno,
If I look at the current source code of SparsityPattern, I think there
is indeed a bug: we define the variable max_vec_len to be of type
size_type (= types::global_dof_index), whereas it actually should be
std::size_t. The indices may be 32-bit integers while the total number
of entries still exceeds 4G. I think that could be the problem here. Do
you have time to take a look? (I think one needs to check all the
functions in SparsityPattern and the interface to SparseMatrix for
whether all the types are correct.)

Best,
Martin

Ehsan

May 20, 2016, 9:41:43 AM
to deal.II User Group
But I allocated more than 500 GB of RAM!

How does this sparsity_pattern.reinit work?
Does it first compute the required memory, compare it with what is available, and return an error if there is not enough?
Or does it start allocating and return the error only when it actually runs out of memory?

Thanks.
Ehsan

Bruno Turcksin

May 20, 2016, 9:49:16 AM
to dea...@googlegroups.com
Martin,

2016-05-20 9:38 GMT-04:00 Martin Kronbichler <kronbichl...@gmail.com>:

> If I look at the current source code of SparsityPattern, I think there is
> indeed a bug: We define the variable max_vec_len to be of type size_type (=
> types::global_dof_index) whereas it actually should be std::size_t. We can
> have 32 bit integers but still more than 4G entries. I think that could be a
> problem here. Do you have time to have a look (I think one needs to check
> all the functions in SparsityPattern and the interface to SparseMatrix
> whether all types are correct)?
yeah, I will look into it.

Best,

Bruno

Bruno Turcksin

May 20, 2016, 9:56:25 AM
to dea...@googlegroups.com
Ehsan,

2016-05-20 9:41 GMT-04:00 Ehsan <rabizad...@gmail.com>:
> it starts to allocate and when it faces the memory shortage it returns the
> error
That is what it does, but you may have hit a bug in deal.II. You have
fewer than 4 billion degrees of freedom, but you are trying to create a
matrix with more than 4 billion entries, and because deal.II uses
unsigned int by default, you get an integer overflow and the code does
weird things. The easy fix is to recompile deal.II with
-DDEAL_II_WITH_64BIT_INDICES=ON; then the library uses unsigned long
long int instead of unsigned int and you won't get any overflow
problem.

Best,

Bruno

Bruno Turcksin

May 20, 2016, 11:55:30 AM
to dea...@googlegroups.com
Ehsan,

looking more at the code, I don't think that this is your problem after
all. Is your code parallel or serial? You said that you allocated 500 GB
of RAM, but you probably don't have a single node with 500 GB, so you
would need to use several nodes, which means your code has to be parallel.

Best,

Bruno

Bruno Turcksin

May 20, 2016, 12:01:14 PM
to dea...@googlegroups.com
2016-05-20 11:55 GMT-04:00 Bruno Turcksin <bruno.t...@gmail.com>:
> looking more at the code I don't think that this is your problem. Is
> your code parallel or serial? You said that you allocated 500GB of ram
> but you probably don't have a node with 500GB of ram so you need to
> use several nodes -> you need your code to be parallel.
Scratch that. It could be your problem actually.

Sorry,

Bruno

Ehsan

May 23, 2016, 9:39:30 AM
to deal.II User Group
Dear Bruno,
I did what you recommended and the problem is solved.
Thanks.
Ehsan