Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

multi-threaded FFTW segmentation fault with gfortran

265 views
Skip to first unread message

mpbro

unread,
Dec 18, 2007, 12:15:07 PM12/18/07
to
I'm setting up the requisite buildsystem to benchmark a variety of FFT
and compiler options. Specific to the purpose of this post, I'm
testing 3D, single precision, real-to-complex-even FFTW
(sfftw_plan_dft_r2c_3d). I'm comparing ifort to gfortran, and single-
threaded to multi-threaded. I'm running an Intel quad-core x86_64
machine with Fedora Core 7.

To make a long story short: in gfortran, any use of FFTW's thread
manipulation functions (sfftw_init_threads, sfftw_plan_with_nthreads,
sfftw_cleanup_threads) causes a segmentation fault.

First let me show the main program:

!-------------------------------------------------------------------------------
program FFTW_Test

use system_time_mod

implicit none

include 'fftw3.f'

integer :: i, j, k, n1, n2, n3,
nthreads, stat
integer*8 :: planf, planb
type(timer) :: t
character(len=100) :: ffttype
real, dimension(:,:,:), allocatable :: data

#ifdef FFTW
ffttype = 'FFTW'
#elif MKL
ffttype = 'MKL'
#else
ffttype = 'UNKNOWN'
#endif

n1=750
n2=750
n3=750

call system_time_init()

!-----------------------------------------------------------------------------
! 3D in-place, single-threaded, real-to-complex FFT
!-----------------------------------------------------------------------------
write(0,*)
'================================================================='
write(0,*) '3D in-place, single-threaded, real-to-complex FFT with
',ffttype
write(0,*) 'Array size=',n1,n2,n3

allocate( data(2*(n1/2+1),n2,n3) )

do i=1,n1
do j=1,n2
do k=1,n3
data(i,j,k) = 1.0*(i+j+k)
end do
end do
end do

write(0,*) ' data(25:27,25,25)
',data(25,25,25),data(26,25,25),&
data(27,25,25)

#ifdef MULTI
nthreads = 4
#else
nthreads = 1
#endif

write(0,*) 'nthreads=',nthreads

#ifdef FFTW
write(0,*) "Using FFTW multi-threading"
call sfftw_init_threads(stat)
call sfftw_plan_with_nthreads(nthreads)
#elif MKL
write(0,*) "Using MKL multi-threading"
call mkl_set_num_threads(nthreads)
#endif

call sfftw_plan_dft_r2c_3d(planf, n1, n2, n3, data, data,
FFTW_ESTIMATE);
call sfftw_plan_dft_c2r_3d(planb, n1, n2, n3, data, data,
FFTW_ESTIMATE);

call start_timer( t )
call sfftw_execute(planf)
call sfftw_execute(planb)
call stop_timer( t )

write(0,*) 'FFT^{-1}[FFT[data(25:27,25,25)]]',data(25,25,25)/
(n1*n2*n3), &
data(26,25,25)/
(n1*n2*n3), &
data(27,25,25)/
(n1*n2*n3)

write(0,*) 'elapsed time:',t%telapsed

call sfftw_destroy_plan(planf)
call sfftw_destroy_plan(planb)
#ifdef FFTW
call sfftw_cleanup_threads()
#endif
deallocate( data )

call exit(0)

end program FFTW_Test
!-------------------------------------------------------------------------------

My apologies for the mangled whitespace and proliferation of
preprocessor directives. The system_time_mod module is simply a
(compiler-dependent) wrapper around system_clock(). I can include the
source code if you are interested. I use the fpp preprocessor with
ifort and the -x f95-cpp-input preprocessor option with gfortran.

Here is how I build the executable using ifort:

----------------------------
fpp -Difort ../../Src/system_time.f90 > fpp_system_time.f90
ifort -c -assume bscc -assume byterecl -fpp -mtune=pentium4 -O3 -
static-intel -vms -w -WB fpp_system_time.f90 -o system_time.o
fpp -DFFTW -DMULTI -I/usr/local/include FFTW_Test.f90 >
fpp_FFTW_Test.f90
ifort -c -assume bscc -assume byterecl -fpp -mtune=pentium4 -O3 -
static-intel -vms -w -WB -I/usr/local/include fpp_FFTW_Test.f90 -o
FFTW_Test.o
ifort -assume bscc -assume byterecl -fpp -mtune=pentium4 -O3 -static-
intel -vms -w -WB -I/usr/local/include FFTW_Test.o system_time.o -L/
usr/local/lib -lfftw3f -lfftw3 -lfftw3f_threads -lpthread -lm -o
FFTW_native_multithreaded_ifort
----------------------------

And here is how I build the executable using gfortran:

----------------------------
gfortran -E -x f95-cpp-input -Dgfortran ../../Src/system_time.f90 >
fpp_system_time.f90
gfortran -c -O2 -static -m64 -w fpp_system_time.f90 -o system_time.o
gfortran -E -x f95-cpp-input -DFFTW -DMULTI -I/usr/local/include
FFTW_Test.f90 > fpp_FFTW_Test.f90
gfortran -c -O2 -static -m64 -w -I/usr/local/include fpp_FFTW_Test.f90
-o FFTW_Test.o
gfortran -O2 -static -m64 -w -I/usr/local/include FFTW_Test.o
system_time.o -L/usr/local/lib -lfftw3f -lfftw3 -lfftw3f_threads -
lpthread -lm -o FFTW_native_multithreaded_gfortran
----------------------------

The ifort version works as expected, in both single-threaded and multi-
threaded mode. The gfortran version works only if I comment out the
three lines of FFTW thread manipulation code.

Would appreciate any insights. Please let me know if I have provided
sufficient information to recognize/diagnose the problem.

Regards,
Morgan

Steven G. Kargl

unread,
Dec 18, 2007, 1:24:59 PM12/18/07
to
In article <19c8c608-aff7-4743...@b40g2000prf.googlegroups.com>,

mpbro <morgan...@gmail.com> writes:
>
> To make a long story short: in gfortran, any use of FFTW's thread
> manipulation functions (sfftw_init_threads, sfftw_plan_with_nthreads,
> sfftw_cleanup_threads) causes a segmentation fault.
>

How was fftw built? With gcc or icc? What options?

What version of gfortran? What happens if you don't use -m64?

Have you tried -Wall -fbounds-check? Or -fbacktrace and/or -g?

--
Steve
http://troutmask.apl.washington.edu/~kargl/

mpbro

unread,
Dec 18, 2007, 1:44:34 PM12/18/07
to
Steve,

Thanks for your response...

On Dec 18, 12:24 pm, ka...@troutmask.apl.washington.edu (Steven G.


Kargl) wrote:
>
> How was fftw built? With gcc or icc? What options?

./configure --enable-single --enable-threads

I believe FFTW defaults to CC=gcc and F77=gfortran

> What version of gfortran? What happens if you don't use -m64?

GNU Fortran (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27)

It looks like I am out-of-date. I see other references to 4.1.x being
buggy, so I should install 4.3. Mea culpa.

Removing -m64 doesn't seem to do anything.

> Have you tried -Wall -fbounds-check? Or -fbacktrace and/or -g?

I tried -Wall -fbounds-check and nothing different happened. The
backtrace option doesn't seem to be active for v4.1.2. I'll try some
other things before I get into the debugger. ;-)

Cheers,
Morgan

Steven G. Johnson

unread,
Dec 18, 2007, 2:02:06 PM12/18/07
to
On Dec 18, 12:15 pm, mpbro <morganpbr...@gmail.com> wrote:
> To make a long story short: in gfortran, any use of FFTW's thread
> manipulation functions (sfftw_init_threads, sfftw_plan_with_nthreads,
> sfftw_cleanup_threads) causes a segmentation fault.

Sorry if this thread is a bit off-topic for comp.lang.fortran.

I can't replicate your problem. Using FFTW 3.1.2 (built with gcc) and
gfortran 4.1.2 (in both cases using the Debian packages), the test
program below seems to run fine on both x86 and x86_64.

I haven't looked carefully at the rest of your program, but perhaps
there is a bug elsewhere in your code, e.g. you are passing the wrong
array sizes to FFTW, and it is purely fortuitous that it doesn't crash
when you don't call the threads code. If the code below works for you
and FFTW's "make check" passes, then this especially likely.

Regards,
Steven G. Johnson

program foo
integer ok
integer*8 plan
double complex data(1024)
call dfftw_init_threads(ok)
call dfftw_plan_with_nthreads(4)
call dfftw_plan_dft_1d(plan, 1024, data, data, -1, 0)
call dfftw_execute(plan)
call dfftw_destroy_plan(plan)
call dfftw_cleanup_threads
end

mpbro

unread,
Dec 18, 2007, 2:42:22 PM12/18/07
to
Professor Johnson,

Thanks for your timely response (to this and to other similar posts).

> Sorry if this thread is a bit off-topic for comp.lang.fortran.

I was torn between comp.dsp and comp.lang.fortran, but I am not sure
if it is an FFTW issue, a gfortran issue, or (most likely) user error.

> I can't replicate your problem. Using FFTW 3.1.2 (built with gcc) and
> gfortran 4.1.2 (in both cases using the Debian packages), the test
> program below seems to run fine on both x86 and x86_64.

Thank you for your program. I tested it (replacing dfftw... with
sfftw...) and it ran successfully with gfortran and ifort, using the
same compiler flags as my earlier test.

> I haven't looked carefully at the rest of your program, but perhaps
> there is a bug elsewhere in your code, e.g. you are passing the wrong
> array sizes to FFTW, and it is purely fortuitous that it doesn't crash
> when you don't call the threads code. If the code below works for you
> and FFTW's "make check" passes, then this especially likely.

I carefully read the FFTW documentation regarding array sizes in c2r/
r2c DFTs, so at least I believe I have gotten that right.

I apologize to the board for providing a klunky example earlier. Too
many degrees of freedom. Here is a stripped down, standalone version
of the earlier program which I provided:

!-------------------------------------------------------------------------------
program FFTW_Test_Stripped

implicit none

include 'fftw3.f'

integer :: i, j, k, n1, n2, n3,
nthreads, stat
integer*8 :: planf, planb

real, dimension(:,:,:), allocatable :: data

n1=750; n2=750; n3=750

allocate( data(2*(n1/2+1),n2,n3) )

do i=1,n1
do j=1,n2
do k=1,n3
data(i,j,k) = 1.0*(i+j+k)
end do
end do
end do

write(0,*) ' data(25:27,25,25)
',data(25,25,25),data(26,25,25),&
data(27,25,25)

nthreads = 4
call sfftw_init_threads(stat)
call sfftw_plan_with_nthreads(nthreads)


call sfftw_plan_dft_r2c_3d(planf, n1, n2, n3, data, data,
FFTW_ESTIMATE);
call sfftw_plan_dft_c2r_3d(planb, n1, n2, n3, data, data,
FFTW_ESTIMATE);

call sfftw_execute(planf)
call sfftw_execute(planb)

write(0,*) 'FFT^{-1}[FFT[data(25:27,25,25)]]',data(25,25,25)/
(n1*n2*n3), &
data(26,25,25)/
(n1*n2*n3), &
data(27,25,25)/
(n1*n2*n3)

call sfftw_destroy_plan(planf)
call sfftw_destroy_plan(planb)
call sfftw_cleanup_threads()

deallocate( data )

call exit(0)

end program FFTW_Test_Stripped
!-------------------------------------------------------------------------------

If you save the above program to FFTW_Test_stripped.f90, here is the
ifort compilation and execution:

----------


ifort -assume bscc -assume byterecl -fpp -mtune=pentium4 -O3 -static-
intel \

-vms -w -WB -I/usr/local/include FFTW_Test_stripped.f90 \
-L/usr/local/lib -lfftw3f -lfftw3 -lfftw3f_threads -lpthread -lm
\
-o prog

./prog
data(25:27,25,25) 75.00000 76.00000
77.00000
FFT^{-1}[FFT[data(25:27,25,25)]] 75.00010 76.00011
77.00010
----------

Here is the gfortran compilation and execution:

----------
gfortran -O2 -static -w -Wall -fbounds-check -I/usr/local/include \
FFTW_Test_stripped.f90 -L/usr/local/lib -lfftw3f -lfftw3 \
-lfftw3f_threads -lpthread -lm -o prog

./prog
make: *** [stripped_gfortran] Segmentation fault
----------

I am currently installing gcc 4.2.1. The above example was run with
gfortran 4.1.2.

Regards,
Morgan Brown

mpbro

unread,
Dec 18, 2007, 3:44:39 PM12/18/07
to
Compiling the aforementioned "stripped down" code with -g and running
in gdb doesn't yield any clues:

---------
(gdb) exec-file prog
(gdb) run
Starting program: /home/seispak/LINUX64/svn/DEV-morgan/Test/FFT/prog
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: shared library handler failed to enable breakpoint

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
---------

I haven't compiled FFTW with debug flags...

Steven G. Johnson

unread,
Dec 18, 2007, 4:24:44 PM12/18/07
to
Thanks for your test program; I don't see any problems with the code,
and I can replicate your segfault now. (I reduced the transform size
to 70x70x70, which makes the code run more quickly and doesn't change
the segfault for me.)

Oddly enough, the problem seems to be with the -static flag. When I
remove that flag, the segfault goes away for me! Specifically,

gfortran -I/usr/include fftwthrd.f90 -lfftw3f -lfftw3f_threads -
lpthread -lm -o prog
./prog

works for me but

gfortran -static -I/usr/include fftwthrd.f90 -lfftw3f -


lfftw3f_threads -lpthread -lm -o prog
./prog

segfaults on x86_64 with gfortran 4.1.2 (Debian GNU/Linux). However,
both versions work for me on x86 with the same version of gfortran.

I tried writing an "equivalent" C program to see if the same problem
occurred there (i.e. if it was purely an ld problem), but I couldn't
replicate the segfault.

So, it seems that there is some strange interaction between gfortran's
static linker and FFTW's thread library on x86_64??? I'm not sure how
to debug this, but it seems that it *may* be gfortran's problem after
all.

Regards,
Steven G. Johnson

Steven G. Johnson

unread,
Dec 18, 2007, 4:33:31 PM12/18/07
to
On Dec 18, 4:24 pm, "Steven G. Johnson" <stev...@alum.mit.edu> wrote:
> So, it seems that there is some strange interaction between gfortran's
> static linker and FFTW's thread library on x86_64??? I'm not sure how
> to debug this, but it seems that it *may* be gfortran's problem after
> all.

(By the way, if you want to save local variables in gfortran, the
correct flag is -fno-automatic, not -static. The -static flag is for
static linking.)

Steven G. Kargl

unread,
Dec 18, 2007, 4:43:57 PM12/18/07
to
In article <c3d5e235-dfb9-4369...@1g2000hsl.googlegroups.com>,

Well, the correct way to link in pthreads is to use the -pthreads
option instead of just adding -lpthread.

If you search the GCC email archives, you'll find that one can't/shouldn't
use -static with -pthreads because the static libc may not properly
deal with TLS. See

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30471

--
Steve
http://troutmask.apl.washington.edu/~kargl/

Steven G. Johnson

unread,
Dec 18, 2007, 5:05:24 PM12/18/07
to
On Dec 18, 4:43 pm, ka...@troutmask.apl.washington.edu (Steven G.

Kargl) wrote:
> > So, it seems that there is some strange interaction between gfortran's
> > static linker and FFTW's thread library on x86_64??? I'm not sure how
> > to debug this, but it seems that it *may* be gfortran's problem after
> > all.
>
> Well, the correct way to link in pthreads is to use the -pthreads
> option instead of just adding -lpthread.

I assume you mean -pthread, not -pthreads; thanks for the info on the
problem with -lpthread and -static, which I wasn't aware of. However,
unfortunately this does not fix the problem:

gfortran -static -pthread -I/usr/include fftwthrd.f90 -lfftw3f -
lfftw3f_threads -lm -o prog
./prog

still segfaults for me, and works without -static.

Regards,
Steven G. Johnson

Steven G. Johnson

unread,
Dec 18, 2007, 5:09:32 PM12/18/07
to
Ooops, I reread your message and the link, and I guess you are saying
that -static is not supported with the glibc pthread library no matter
how you link it.

The workaround listed on that page, using -Wl,--whole-archive -
lpthread -Wl,--no-whole-archive, does fix the segfault for me. So, I
guess this is an instance of the same glibc "feature".

Thanks again, this is good to know about.

Regards,
Steven G. Johnson

mpbro

unread,
Dec 18, 2007, 5:30:11 PM12/18/07
to
Thanks for a very informative series of posts.

Steven G. Johnson

unread,
Dec 18, 2007, 5:34:32 PM12/18/07
to
On Dec 18, 4:43 pm, ka...@troutmask.apl.washington.edu (Steven G.
Kargl) wrote:
> If you search the GCC email archives, you'll find that one can't/shouldn't
> use -static with -pthreads because the static libc may not properly
> deal with TLS. See
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30471

Any idea why it works okay when statically linking a C program with
FFTW threads on the same system? Has the C compiler implemented some
workaround for this that gfortran hasn't?

Regards,
Steven G. Johnson

Steven G. Kargl

unread,
Dec 18, 2007, 6:46:25 PM12/18/07
to
In article <ad2f8ffd-72f7-4930...@i12g2000prf.googlegroups.com>,

"Steven G. Johnson" <ste...@alum.mit.edu> writes:

I don't know. This is getting deep into parts of GCC that I never
ventured. It could be that you're just lucky with the C program
while gfortran is forcing some other memory access pattern. I suppose
that you might check the linker options with the -v options to see
if some linker magic is happening.

With regards to -pthreads versus -pthread, on some targets the former
is an alias for the latter. I can never remember which is preferred.

--
Steve
http://troutmask.apl.washington.edu/~kargl/

glen herrmannsfeldt

unread,
Dec 18, 2007, 7:41:32 PM12/18/07
to
Steven G. Johnson wrote:
(snip)

> double complex data(1024)

I have never seen this form before, bug g95 seems to accept it.
g95 also accepts complex*16, also non-standard but maybe
reasonably common.

-- glen

robert....@sun.com

unread,
Dec 19, 2007, 3:00:11 AM12/19/07
to

DOUBLE COMPLEX is also reasonably common. The original
AT&T f77 compiler supported it. It was available in other
compilers before the FORTRAN 77 standard was approved.

The extensions QUADRUPLE PRECISION and QUADRUPLE COMPLEX
are uncommon, even though many compilers support REAL*16
and COMPLEX*32. It might be that the length of the
keywords kept them from being implemented. I am unaware
of any compiler that implemented them.

Bob Corbett

glen herrmannsfeldt

unread,
Dec 19, 2007, 4:09:44 AM12/19/07
to
robert....@sun.com wrote:
> On Dec 18, 4:41 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

>>Steven G. Johnson wrote:

>>(snip)

>>> double complex data(1024)

>>I have never seen this form before, bug g95 seems to accept it.
>>g95 also accepts complex*16, also non-standard but maybe
>>reasonably common.

> DOUBLE COMPLEX is also reasonably common. The original
> AT&T f77 compiler supported it. It was available in other
> compilers before the FORTRAN 77 standard was approved.

COMPLEX*16 goes at least to OS/360 Fortran G and H.
REAL*16 and COMPLEX*32 to H Extended and the 360/85,
and I believe also in VS Fortran.

-- glen

Richard Maine

unread,
Dec 19, 2007, 11:14:38 AM12/19/07
to
<robert....@sun.com> wrote:

> On Dec 18, 4:41 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> > Steven G. Johnson wrote:
> >
> > (snip)
> >
> > > double complex data(1024)
> >
> > I have never seen this form before, bug g95 seems to accept it.
> > g95 also accepts complex*16, also non-standard but maybe
> > reasonably common.
>
> DOUBLE COMPLEX is also reasonably common. The original
> AT&T f77 compiler supported it. It was available in other
> compilers before the FORTRAN 77 standard was approved.

Yes. I used several compilers hat supported it. I don't recall names,
but I do recall finding it difficult to deal with making code that used
double precision complex (with any spelling) portable. Some compilers
supported the double precision complex syntax, others supported
complex*16, some supported both, and some didn't support anything of the
sort. Of course, all forms of double precision complex were non-standard
until f90.

I dealt with at least one fairly widely used library that used both
syntaxes, making for maximal portability problems. :-( I don't recall
for sure what library that was. Fftpack keeps comming to mind, but I
might be confusing this with its other coding problems (It had several).

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain

Paul van Delst

unread,
Dec 19, 2007, 11:37:58 AM12/19/07
to
Richard Maine wrote:

[snip]

> Of course, all forms of double precision complex were non-standard
> until f90.

Huh. I did not know that.

All the early complex problems didn't suffer from precision issues then? :o)

cheers,

paulv

Jan Gerrit Kootstra

unread,
Dec 19, 2007, 12:01:21 PM12/19/07
to
Paul van Delst,


The computers would suffer from lack of memory to support such a
standard ;-)

A complex number is represent as a "special record" of 2 real numbers,
so it takes about 2 times as much memory space.

COMPILER BUILDERS will correct me on this.


Kind regards,


Jan Gerrit Kootstra

Paul van Delst

unread,
Dec 19, 2007, 12:03:49 PM12/19/07
to
Jan Gerrit Kootstra wrote:
> Paul van Delst wrote:
>> Richard Maine wrote:
>>
>> [snip]
>>
>>> Of course, all forms of double precision complex were non-standard
>>> until f90.
>>
>>
>> Huh. I did not know that.
>>
>> All the early complex problems didn't suffer from precision issues
>> then? :o)
>>
>> cheers,
>>
>> paulv
> Paul van Delst,
>
>
> The computers would suffer from lack of memory to support such a
> standard ;-)
>
> A complex number is represent as a "special record" of 2 real numbers,
> so it takes about 2 times as much memory space.

So do half the problem at a time. :o)

cheers,

paulv

Richard Maine

unread,
Dec 19, 2007, 12:31:29 PM12/19/07
to

It was a pain. Some codes just used one of the nonstandard forms. Other
codes avoided using complex type at all; they just used separate double
precision variables for the real and imaginary parts and expanded out
the necessary operations.

And then, of course, some of the preeminent scientific machines of the
era used 60-bit single precision floats anyway. Use of double precision
on the CDCs was rare (and often as not was a result of porting codes
originally written on other machines; the double precision wasn't realy
needed on the CDC, but was sometimes left in rather than trying to
convert the code).

I long thought it ironic that direct language support of complex was
touted as one of Fortran's strengths, yet before f90 that support was so
crippled that many codes had to choose between using nonstandard forms
or avoiding the use of the feature.

0 new messages