Segmentation fault

tsup...@gmail.com

Oct 8, 2013, 10:40:39 PM
to mpas-atmos...@googlegroups.com
I've downloaded and compiled MPAS 1.5 on Mac OS X 10.6.8, and got part of the way through the first (static) run of init_atmosphere_model when I got a segmentation fault. I'm running the coarsest resolution, and it was taking up less memory than Chrome on my laptop, so I don't think it's a memory space issue.

On a related note, is there any way to tell MPAS to read coarser geo data than the 30sec data? Because I'm only running at 240 km resolution, 30-arc-second geo data seems like overkill. It takes ~20 minutes to go through one set of 30-arc-second files for the entire globe, so it took almost an hour and a half to get to that error.

The last several lines of the log.0000.err file:

BEGIN INTERPOLATION OF STATISTICAL FIELDS FOR GRAVITY WAVE DRAG OVER OROGRAPHY
min MeshD = 1.0000000000000000
max MeshD = 1.0000000000000000
min dcEdge = 200090.77843043293
max dcEdge = 254288.69858080952
dir_gwdo = orogwd_2deg

/Users/tsupinie/wrf/geog/orogwd_2deg/con/00001-00180.00001-00090
--- end interpolate CON
/Users/tsupinie/wrf/geog/orogwd_2deg/oa1/00001-00180.00001-00090
--- end interpolate OA1
/Users/tsupinie/wrf/geog/orogwd_2deg/oa2/00001-00180.00001-00090
--- end interpolate OA2
/Users/tsupinie/wrf/geog/orogwd_2deg/oa3/00001-00180.00001-00090
--- end interpolate OA3
/Users/tsupinie/wrf/geog/orogwd_2deg/oa4/00001-00180.00001-00090
--- end interpolate OA4
/Users/tsupinie/wrf/geog/orogwd_2deg/ol1/00001-00180.00001-00090
--- end interpolate OL1
/Users/tsupinie/wrf/geog/orogwd_2deg/ol2/00001-00180.00001-00090
--- end interpolate OL2
/Users/tsupinie/wrf/geog/orogwd_2deg/ol3/00001-00180.00001-00090
--- end interpolate OL3
/Users/tsupinie/wrf/geog/orogwd_2deg/ol4/00001-00180.00001-00090
--- end interpolate OL4
/Users/tsupinie/wrf/geog/orogwd_2deg/var/00001-00180.00001-00090
--- end interpolate VAR2D

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x1008e4e8d
#1 0x1008e539b
#2 0x7fff888431b9
#3 0x1000dd11f
#4 0x100103e94
#5 0x1001df71d
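
A backtrace with only raw addresses, like the one above, usually means the executable was built without debug symbols. As a minimal sketch, assuming the backtrace lines always have the `#N 0xADDR` shape seen in this log, the frame addresses can be pulled out of log.0000.err so they can be handed to a symbolizer such as atos (OS X) or addr2line (Linux). The helper name `backtrace_frames` is made up for illustration and is not part of MPAS.

```python
import re

# Matches lines such as "#3 0x1000dd11f" (frame number, then hex address).
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+)")


def backtrace_frames(log_text):
    """Return [(frame_number, address), ...] found in the log text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), int(match.group(2), 16)))
    return frames


# The backtrace quoted in this post, used as sample input.
sample = """Backtrace for this error:
#0 0x1008e4e8d
#1 0x1008e539b
#2 0x7fff888431b9
#3 0x1000dd11f
#4 0x100103e94
#5 0x1001df71d
"""

for number, address in backtrace_frames(sample):
    print(f"frame {number}: {address:#x}")
```

The addresses only become meaningful once they are resolved against the same executable that crashed, so a rebuild with debug symbols is still the more direct route.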

I don't know if I'll get any particularly useful responses, since I suspect it has to do with the fact that I'm running this on OS X, and a lot of people prefer Linux. But I thought I'd ask anyway.

Thanks!
Tim Supinie

Laura Fowler

Oct 9, 2013, 3:16:29 PM
to tsup...@gmail.com, mpas-atmos...@googlegroups.com
Hi Tim:
Thanks for your e-mail. I had no trouble creating the static
file for the 10242 mesh using pgi and OSX 10.7.5. I compiled and ran
MPAS 1.5 with the DEBUG option on. Which compiler are you using?
Thanks,
Laura

!----------------------------------------------------
Laura D. Fowler
Mesoscale and Microscale Meteorology Division (MMM)
National Center for Atmospheric Research
P.O. Box 3000, Boulder CO 80307-3000

e-mail: la...@ucar.edu
phone : 303-497-1628

!----------------------------------------------------

Tim Supinie

Oct 9, 2013, 3:24:36 PM
to Laura Fowler, mpas-atmos...@googlegroups.com
I used GNU compilers for everything (gcc 4.8).  Maybe I'll try recompiling with the DEBUG option on and see if that gets me any more useful information.

Thanks!
Tim



Tim Supinie

Oct 9, 2013, 5:14:08 PM
to Laura Fowler, mpas-atmos...@googlegroups.com
Huh.  I just re-ran it with the DEBUG option on, and it worked fine.

Thanks for your help!
Tim

Laura Fowler

Oct 9, 2013, 6:12:14 PM
to Tim Supinie, Laura Fowler, mpas-atmos...@googlegroups.com
Hi Tim:
I am glad that it ran but you should be able to use the full
optimization as well.
Laura

du...@ucar.edu

Oct 17, 2013, 7:48:41 PM
to mpas-atmos...@googlegroups.com, Tim Supinie, Laura Fowler
Hi, Tim.

I'll give it a try with gcc 4.8 and see whether I can reproduce the segfault. If there is some underlying bug in the code (rather than in the optimizations that gfortran is doing, for example), it would be nice to get it fixed.

To answer your question about coarser resolutions of static fields, there isn't currently any code in the init_atmosphere core to read data other than 30", which we do realize is overkill in most cases. The ability to use coarser data is something that will probably come along in the not-too-distant future.

Regards,
Michael

du...@ucar.edu

Oct 17, 2013, 9:26:59 PM
to mpas-atmos...@googlegroups.com, Tim Supinie, Laura Fowler
Hi, Tim.

I was able to successfully run init_atmosphere_model on a Linux system (x86_64) with the gcc 4.8.0 compilers for the x1.10242 mesh. If I'm able to get gcc 4.8 on a Mac, I'll give it another try.

Are you by chance using OpenMPI on your system? We've had other users who have had problems with OpenMPI when writing NetCDF output files, and from the log output that you provided, it seems possible that the executable had reached the point where it started to write the static file.

Regards,
Michael

Tim Supinie

Oct 18, 2013, 12:36:04 PM
to du...@ucar.edu, mpas-atmos...@googlegroups.com, Laura Fowler
Yeah, after some messing around with compiling with and without debug options and running various stages of the init_atmosphere_model, it seems like the segmentation fault occurs in the middle of writing out whatever NetCDF file is being written.  I'm using MPICH on my machine.

Thanks!
Tim

du...@ucar.edu

Nov 5, 2013, 8:04:21 PM
to mpas-atmos...@googlegroups.com, du...@ucar.edu, Laura Fowler, tsup...@gmail.com
Hi, Tim.

We can see whether the segfaults are happening in the Parallel-NetCDF code (rather than in the MPAS code) by making a few small code modifications. In src/framework/mpas_io_input.F and src/framework/mpas_io_output.F, you can change MPAS_IO_PNETCDF to MPAS_IO_NETCDF in a total of three places; then, you can re-compile MPAS and see whether you still have problems. At least with OpenMPI, there's apparently some problem in writing buffered output through the Parallel-NetCDF library via PIO, and switching to the serial NetCDF library avoids the issue at the expense of slower I/O. It might be interesting to see if we've found an example of the same behavior with MPICH.
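
The substitution described above could be scripted roughly as follows. This is only a sketch: the function names `swap_io_type` and `patch_tree` are made up for illustration, the sample source line is hypothetical rather than copied from MPAS, and the script assumes the two file paths named above exist relative to the MPAS source root. It reports how many replacements it made per file so the total can be checked against the three places mentioned.

```python
import re
from pathlib import Path

OLD, NEW = "MPAS_IO_PNETCDF", "MPAS_IO_NETCDF"
# The two files named in the post, relative to the MPAS source root.
FILES = ["src/framework/mpas_io_input.F", "src/framework/mpas_io_output.F"]


def swap_io_type(text, old=OLD, new=NEW):
    """Replace whole-word occurrences of old with new; return (text, count)."""
    return re.subn(r"\b" + re.escape(old) + r"\b", new, text)


def patch_tree(root):
    """Patch the listed files under root, keeping a .orig backup of each."""
    total = 0
    for rel in FILES:
        path = Path(root) / rel
        if not path.exists():
            print(f"skipping {path}: not found")
            continue
        original = path.read_text()
        patched, count = swap_io_type(original)
        if count:
            Path(str(path) + ".orig").write_text(original)  # backup first
            path.write_text(patched)
        print(f"{path}: {count} replacement(s)")
        total += count
    return total


# Demonstration on a hypothetical source line (not taken from MPAS).
demo_line = "io_type = MPAS_IO_PNETCDF"
print(swap_io_type(demo_line)[0])
```

Calling `patch_tree(".")` from the top of the source tree would apply the change; MPAS then needs to be recompiled, and the `.orig` backups make it easy to revert once the test is done.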

Regards,
Michael

enkiho...@gmail.com

Mar 23, 2017, 1:33:25 PM
to MPAS-Atmosphere Help, du...@ucar.edu, la...@ucar.edu, tsup...@gmail.com
Hi -
Just started experimenting with MPAS. I see this thread is a few years old, but it seems close to what I encountered. I ran into this problem with the following configuration:
MPAS 5.0
netcdf-4.1.3
PIO 1.9.23
mpich 3.04 (but was run single threaded per instructions)
gfortran/gcc 4.8.5 on Scientific Linux 7.3

Worked fine at 120km (successful runs with live FILE* from our daily GARW runs, cool graphics using NCL), but init_atmo crashed using either the 60km or 30km meshes with errors in the pnetcdf libraries, ncmpii_mgetput (nonblocking.c), called ultimately from mpas_io.F.

Based on the advice upthread, I edited mpas_io.F in the region of lines 259-276 to set pio_iotype to PIO_iotype_netcdf rather than PIO_iotype_pnetcdf (and changed pio_mode appropriately). That "fixed" the problem . . . will keep poking at it, but it seems to be in the pnetcdf/pio world. Made a brief detour into PIO2 but had trouble getting a clean compile and was impatient, so reverted to netcdf4.1.3/pio1.9.23. Note this only seems to happen in init_atmosphere_model; so far the model itself seems to work fine on the unedited code.

Thanks,

Chuck

Chuck Watson
