segmentation fault in init_grid


Alexis Berny

Feb 26, 2018, 12:04:32 PM
to basilisk-fr
Dear Basilisk users,

I ran into a very strange bug recently. I'm working on a remote server via ssh, with the latest version of basilisk. When I launch a simulation, I get a segmentation fault. At first I thought it was coming from my own function, but using gdb I found that it occurs in the init_grid function.

Here is the beginning of my code:

#define L0 10

int main(int argc, char const *argv[]) {

  size (L0);
  origin (-L0/2., 0.);
  init_grid (1 << (9));
}

I've tried this code on a Mac and on Ubuntu 16.04, with no problem.

Here is the message I get while using the makefile:

aberny@renoir $ make test.tst

[test.tst]

/poisson/temporaires/aberny/basilisk/src/grid/tree.h:206:error: Program received signal SIGSEGV, Segmentation fault.

/poisson/temporaires/aberny/basilisk/src/Makefile.defs:49: recipe for target 'test.tst' failed

make: *** [test.tst] Error 1


If anyone has a clue about what is going on, it would be very helpful.


Best

Wojciech Aniszewski

Feb 26, 2018, 12:07:17 PM
to Alexis Berny, basilisk-fr
Do you compile with -O2 ?


--
Wojciech (Wojtek) ANISZEWSKI
[Fr: vôitek anichévsky]
[Eng: voyteck aanishevsky]

Post-doctoral Researcher
Sorbonne University
Institut ∂'Alembert

www:
[in English:] http://www.coria.fr/spip.php?auteur1606
[in Polish:] http://nauka-polska.pl/dhtml/raporty/ludzieNauki?rtype=opis&objectId=240452&lang=pl

/^..^\ ,-------------------------------------,
( (••) ) ►►►►| My public GPG key ID: AC66485E |
(|)_._(|)~ | please use email encryption! |
`-------------------------------------"

Alexis BERNY

Feb 26, 2018, 12:09:02 PM
to anisz...@dalembert.upmc.fr, basilisk-fr

On Feb 26, 2018, at 18:07, Wojciech Aniszewski <anisz...@dalembert.upmc.fr> wrote:

Do you compile with -O2 ?

yes, I do

-- 
Alexis BERNY




Wojciech Aniszewski

Feb 26, 2018, 12:10:38 PM
to Alexis BERNY, anisz...@dalembert.upmc.fr, basilisk-fr
Can you check with -O1?

Alexis BERNY

Feb 26, 2018, 12:15:55 PM
to anisz...@dalembert.upmc.fr, basilisk-fr
On Feb 26, 2018, at 18:10, Wojciech Aniszewski <anisz...@dalembert.upmc.fr> wrote:

Can you check with -O1
It’s working.

Thanks.
-- 
Alexis BERNY




Wouter Mostert

Feb 26, 2018, 12:27:50 PM
to basilisk-fr
Does anyone know why, in general, changing an optimization flag could make segfaults appear or disappear?
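
One common general answer, offered here as an illustration and not as a diagnosis of this particular crash: when a program contains undefined behaviour, the compiler is allowed to assume that it never happens, and higher optimisation levels tend to exploit that assumption more aggressively. The same source can then run "fine" at one level and crash at another. The sketch below uses a hypothetical function, first_or_default, that has nothing to do with Basilisk; the risky pattern appears only inside a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration; not taken from Basilisk.

   Risky pattern (comment only, because it is undefined behaviour
   when p is NULL):

     int v = *p;                 // dereference first...
     if (p == NULL) return -1;   // ...so an optimiser may drop this check

   Dereferencing NULL is undefined, so after the first line the
   compiler is allowed to assume that p is not NULL. */

/* Well-defined version: test the pointer before using it. */
static int first_or_default (const int * p, int fallback)
{
  if (p == NULL)
    return fallback;
  return *p;
}
```

Calling first_or_default (NULL, -1) returns the fallback at every optimisation level, because no undefined behaviour is involved.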

Stephane Popinet

Feb 26, 2018, 12:34:16 PM
to basil...@googlegroups.com
> I had a very strange bug recently. I'm working on a remote server via
> ssh, with the latest version of basilisk.

How did you install/upgrade the latest version of basilisk on this
server? Did you do:

cd $BASILISK
darcs pull
make clean
make

Stephane

Alexis Berny

Feb 27, 2018, 3:10:02 AM
to basilisk-fr
Yes, I did.

I first updated my basilisk version with the command you gave.
Since it wasn't working, I deleted and reinstalled basilisk.
I also tried to precompile my code on a local computer (where basilisk is correctly installed) with "qcc -source -grid=quadtree", but I got the same segmentation fault when compiling on the remote server (with gcc -O2).

Christoph Buchner

Feb 27, 2018, 3:30:21 AM
to basilisk-fr
Just some ideas from googling around:
One old forum post suggests using "-O2 -fno-strict-aliasing" to see whether a strict aliasing problem causes this.
If not, the list of gcc optimizations that get switched on with -O2 can be found here; someone could switch them on one by one to identify the culprit.
Otherwise, has valgrind been run on this code to find problems?

Disclaimer: My strong C/C++ days are far behind me, so I may be mistaken here about the best course to identify the problem cause.

Best,
Christoph

Stephane Popinet

Feb 27, 2018, 4:20:06 AM
to basil...@googlegroups.com
> I first updated my basilisk version with the command you gave.
> Since it wasn't working, I deleted and reinstall basilisk.

What do you mean by "it wasn't working"? Do you mean it gave you the
"segmentation fault" you mention?

> I also try to precompile my code on a local computer (where basilisk is
> correctly installed) with "qcc -source -grid=quadtree", but I had the
> same segmentation fault when compiling on the distant server (with gcc -O2)

Did you try to compile the source file produced by "qcc -source", using
exactly the same command on the local system as on the distant server?

Did it produce the same seg fault?

If it didn't, then the only differences between the local machine and
the server are the compilers, what are their versions?

Stephane

Wojciech Aniszewski

Feb 27, 2018, 5:30:48 AM
to Stephane Popinet, basil...@googlegroups.com

I humbly suggest this might have nothing to do with the remote system.
I can reproduce it on my laptop; here's a 'reproduce it' setup. The file unitsquare.c is a chopped-down case file with a now-nonsense
flow defined, just enough to get Basilisk going. The 'compile_openMP_run.sh' script should
be run to reproduce the bug. Give it the number of threads as an argument, for example:
./compile_openMP_run.sh 1

If you want to make the code work, just change "-O2" to "-O1" or remove the optimisation flags
altogether.

Here's my OS spec (debian testing):
Linux orion 4.14.0-3-amd64 #1 SMP Debian 4.14.17-1 (2018-02-14) x86_64 GNU/Linux

here's my gcc introducing itself:
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 7.3.0-3' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.3.0 (Debian 7.3.0-3)

and finally, here's the state of basilisk:
$darcs log
patch 864f9cbc438ab4cda1101b27edea40218e294809
Author: Stephane Popinet <pop...@basilisk.fr>
Date: Fri Feb 16 17:36:45 CET 2018
tagged release 18-02-16


also:

wojciech@orion:~/fortran/basilisk$ darcs whatsnew
hunk ./src/config.gcc 25
-OPENGLIBS = -lfb_osmesa -lGLU -lOSMesa
-# OPENGLIBS = -lfb_glx -lGLU -lGLEW -lGL -lX11
+#OPENGLIBS = -lfb_osmesa -lGLU -lOSMesa
+ OPENGLIBS = -lfb_glx -lGLU -lGLEW -lGL -lX11

Hmm, I see I modified the config to preset my configuration for an X11, interactive session.
I haven't yet checked without OpenMP, however.

It *might* actually be Christoph Buchner who's right, with some optimisations having changed recently...

best
w
compile_openMP_run.sh
unitsquare.c

Stephane Popinet

Feb 27, 2018, 5:50:01 AM
to anisz...@dalembert.upmc.fr, basil...@googlegroups.com
Ok, thanks for the test.

It seems to run fine on my system (Debian 8.10), which uses an older
version of gcc (4.9.2), at least until I stopped it after >1000 steps.

This is a serious issue since the workaround (less optimization) must
have a large impact on performance.

I will see if I can install a more recent version of gcc to check this
out, meanwhile anything you can do to pinpoint where the problem comes
from would be most useful.

Stephane

----

uname -a
Linux aldebaran 3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1
(2018-01-08) x86_64 GNU/Linux

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.9/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.9.2-10'
--with-bugurl=file:///usr/share/doc/gcc-4.9/README.Bugs
--enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.9 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.9 --libdir=/usr/lib
--enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--with-system-zlib --disable-browser-plugin --enable-java-awt=gtk
--enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64/jre
--enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-amd64
--with-arch-directory=amd64
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc
--enable-multiarch --with-arch-32=i586 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.9.2 (Debian 4.9.2-10)

Alexis Berny

Feb 27, 2018, 9:14:27 AM
to basilisk-fr
> What do you mean by "it wasn't working"? Do you mean it gave you the 
> "segmentation fault" you mention? 

Indeed

> Did you try to compile the source file produced by "qcc -source", using 
> exactly the same command on the local system as on the distant server? 

Yes


> Did it produce the same seg fault? 

No

> If it didn't, then the only differences between the local machine and 
> the server are the compilers, what are their versions? 

On my local machine:

alexisberny@ubuntu $ gcc -v

Using built-in specs.

COLLECT_GCC=gcc

COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper

Target: x86_64-linux-gnu

Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.9' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu

Thread model: posix

gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)


On the remote server:

aberny@renoir $ gcc -v

Using built-in specs.

COLLECT_GCC=gcc

COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper

Target: x86_64-linux-gnu

Configured with: ../src/configure -v --with-pkgversion='Debian 6.3.0-18' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu

Thread model: posix

gcc version 6.3.0 20170516 (Debian 6.3.0-18)

Stephane Popinet

Feb 27, 2018, 10:15:11 AM
to basil...@googlegroups.com
Ok, thanks. I have upgraded my version of gcc to 6.3.0 and I can
reproduce the bug. The minimal example to reproduce this is just:

int main() {
  init_grid (1);
}

compiled with:

qcc -g -O2 test.c -o test -lm

which gives within gdb:

(gdb) run
Starting program: /home/popinet/basilisk_1_0/src/test/test

Program received signal SIGSEGV, Segmentation fault.
layer_add_row (l=<optimized out>, i=<optimized out>, j=<optimized out>)
at /home/popinet/basilisk_1_0/src/grid/tree.h:206
206 refarray (l->m[i], l->len, sizeof(char *));

Using valgrind gives:

valgrind --tool=memcheck ./test

==28371== Invalid read of size 4
==28371== at 0x10AC45: refarray (tree.h:93)
==28371== by 0x10AC45: layer_add_row.isra.9 (tree.h:206)
==28371== by 0x1145FA: init_grid (tree.h:1634)
==28371== by 0x108F52: main (unitsquare1.c:2)
==28371== Address 0x28 is not stack'd, malloc'd or (recently) free'd

and indeed the error goes away when optimisation is turned off.

Note that this is an example of the kind of bug report I would have liked.

cheers,

Stephane
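
A note on reading the valgrind report above: an invalid read at an address as small as 0x28 usually points to a null base pointer plus a small offset. For instance, 5 * 8 = 40 = 0x28, which would fit a counter stored right after five 8-byte pointers. The sketch below illustrates that arithmetic with a hypothetical reference-counted array. The names sketch_new_refarray, sketch_refcount_offset and sketch_ref, and the layout itself, are assumptions made for illustration; they are not copied from tree.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch, not the Basilisk source: an array of `len`
   elements of `size` bytes each, followed by an int reference counter
   stored in one extra element. */
static void * sketch_new_refarray (size_t len, size_t size)
{
  assert (size >= sizeof (int));   /* the extra element must hold an int */
  return calloc (len + 1, size);   /* zero-filled, so the counter starts at 0 */
}

/* Byte offset of the counter from the start of the array. */
static size_t sketch_refcount_offset (size_t len, size_t size)
{
  return len*size;
}

/* Increment the counter and return its new value, or -1 if p is NULL.
   Without the guard, a NULL p would turn the counter address into the
   bare offset, e.g. 0x28 for len = 5 and size = 8. */
static int sketch_ref (void * p, size_t len, size_t size)
{
  if (p == NULL)
    return -1;
  int * refcount = (int *) ((char *) p + sketch_refcount_offset (len, size));
  return ++(*refcount);
}
```

With len = 5 and size = 8 (the size of a pointer on x86_64), sketch_refcount_offset returns 40, i.e. 0x28, the address reported by valgrind.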

Stephane Popinet

Feb 27, 2018, 12:36:45 PM
to basil...@googlegroups.com
Hi Alexis et al,

It seems that the problem is caused by a possible bug in gcc > 6 (I am
cautious since compiler bugs are much rarer than coding errors...). I
have filed a bug report here if you are interested:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84600

I have found a workaround (as suggested in the bug report), which will
be in the next release.

cheers,

Stephane

Wojciech Aniszewski

Feb 27, 2018, 3:43:36 PM
to Stephane Popinet, basil...@googlegroups.com
By the way, an announcement.
This bug also causes bview to malfunction: it hangs indefinitely when trying to restore from dump files.
(Until now we hadn't realized the connection to the gcc version, and SP couldn't reproduce this bview behaviour either.)
So you might be left, as I am, with a bunch of unopenable files. As a temporary workaround (i.e. before the bug fix)
you can recompile the entire basilisk stack with this change in the Makefiles:

CFLAGS += -O2 switched to CFLAGS += -O1

this will make bview act normally (again).
regards
w

Stephane Popinet

Feb 28, 2018, 4:39:29 AM
to basil...@googlegroups.com
The gcc people have commented on my bug report (in remarkably short
time). Apparently this is not a bug in gcc but some strange quirk in the
semantics of pointer types in C. See

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84600

if you are interested.

I believe that one can still debate whether having a different
"undefined behaviour" when using function inlining and/or changing
compiler versions is a bug or a "feature" of gcc...

Stephane
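
As a generic illustration of how pointer types can be surprising in C (the thread does not spell out which rule was involved, so this is not a description of the Basilisk code or of its fix): void * is a generic pointer, but void ** is not a generic pointer-to-pointer. An object declared as char ** may not be written through an lvalue of type void *. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical illustration; not the Basilisk code or its fix.

   Tempting generic helper (comment only). Calling it as
   set_slot ((void **) rows, i, row), with rows of type char ***,
   writes a char ** object through an lvalue of type void *, which
   the aliasing rules of C do not allow:

     static void set_slot (void ** m, int i, void * p) { m[i] = p; }
*/

/* Well-defined version: the lvalue m[i] has type char **, the same
   type as the object it designates. */
static void set_row (char *** m, int i, char ** row)
{
  m[i] = row;
}

/* Allocate n row slots and set each one to NULL explicitly. */
static char *** new_rows (size_t n)
{
  char *** m = malloc (n*sizeof (char **));
  if (m != NULL)
    for (size_t i = 0; i < n; i++)
      m[i] = NULL;
  return m;
}
```

A typed setter like set_row costs some genericity, but the compiler can then reason about the stores without any aliasing violation.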

Wojciech Aniszewski

Feb 28, 2018, 6:46:13 AM
to Stephane Popinet, basil...@googlegroups.com
Good to know.

Torvalds says the kernel uses -fno-strict-aliasing
(https://stackoverflow.com/questions/2958633/gcc-strict-aliasing-and-horror-stories#2959468)
so you're not alone on this here...

Meanwhile, I'm in the process of switching my codes to -O3 -fno-strict-aliasing, which works on the laptop so far.
Are any of the -O3 level optimisations "dangerous" for the code, by the way?

w

Stephane Popinet

Feb 28, 2018, 6:54:03 AM
to basil...@googlegroups.com
> Meanwhile, I'm in the process of switching my codes to: -O3 -fno-strict-aliasing, works on the laptop so far.
> Is any of the O3 level optimisations "dangerous" for the code, by the way?

Note that the patch I am about to push will not require
-fno-strict-aliasing. Actually strict aliasing is a good thing to have
since it can lead to better code optimisation.

As for -O3 I don't know if it is "dangerous", but when I tested it a
while back it made little difference in performance.

Stephane
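
To make the remark about strict aliasing and optimisation concrete, here is a minimal sketch with a hypothetical function, store_both. Under strict aliasing the compiler may assume that an int object and a float object never overlap, so it is free to return the constant 1 without re-reading *a after the store to *b. With -fno-strict-aliasing it has to allow for an overlap and reload the value.

```c
#include <assert.h>

/* Hypothetical illustration. Callers must pass pointers to two
   distinct objects of the stated types; the behaviour is then well
   defined whatever the optimisation level. */
static int store_both (int * a, float * b)
{
  *a = 1;
  *b = 2.0f;   /* under strict aliasing this store cannot modify *a */
  return *a;   /* so the compiler may fold this into "return 1" */
}
```

Called on a separate int and float, store_both always returns 1; the only thing that changes with the flag is how much freedom the optimiser has.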

Ali Shareef

Oct 10, 2019, 1:26:24 PM
to basilisk-fr
Hi Stephane

I am trying to use basilisk. I was able to compile the model without any errors, but when I run the attached code I get a segmentation fault. I compiled the model with -O2 and used the same option for this code. I installed the latest basilisk using darcs. I am working on Ubuntu 18 with gcc 7.4.0.

I have also tried the different options I found in this thread, to no avail: they all give the same error.

I would appreciate it if someone could tell me what I am doing wrong here. I am very new to this.
Hul2Dcp.c

Stephane Popinet

Oct 11, 2019, 2:51:25 AM
to basil...@googlegroups.com
Hi Ali,

The code you attached cannot possibly compile. I would expect something
like:

#include "saint-venant.h"
etc.

at the top.

cheers,

Stephane

Ali Shareef

Oct 11, 2019, 2:55:53 PM
to Stephane Popinet, basil...@googlegroups.com
Hi Stephane

I have tried it with 
#include "green-naghdi.h"
#include "terrain.h"
#include "saint-venant.h"

but it still gives me the segmentation fault. Do you think this could still have something to do with the latest Ubuntu compiler, or is it something in my code? The error I keep getting is
feof.c:35:error: Program received signal SIGSEGV, Segmentation fault

which is certainly not very helpful. Any suggestions or ideas to get over this?

Regards
Ali



--
"There are no unanswered prayers although sometimes the answer is NO"