To make a long story short: in gfortran, any use of FFTW's thread
manipulation functions (sfftw_init_threads, sfftw_plan_with_nthreads,
sfftw_cleanup_threads) causes a segmentation fault.
First let me show the main program:
!-------------------------------------------------------------------------------
program FFTW_Test
use system_time_mod
implicit none
include 'fftw3.f'
integer :: i, j, k, n1, n2, n3,
nthreads, stat
integer*8 :: planf, planb
type(timer) :: t
character(len=100) :: ffttype
real, dimension(:,:,:), allocatable :: data
#ifdef FFTW
ffttype = 'FFTW'
#elif MKL
ffttype = 'MKL'
#else
ffttype = 'UNKNOWN'
#endif
n1=750
n2=750
n3=750
call system_time_init()
!-----------------------------------------------------------------------------
! 3D in-place, single-threaded, real-to-complex FFT
!-----------------------------------------------------------------------------
write(0,*)
'================================================================='
write(0,*) '3D in-place, single-threaded, real-to-complex FFT with
',ffttype
write(0,*) 'Array size=',n1,n2,n3
allocate( data(2*(n1/2+1),n2,n3) )
do i=1,n1
do j=1,n2
do k=1,n3
data(i,j,k) = 1.0*(i+j+k)
end do
end do
end do
write(0,*) ' data(25:27,25,25)
',data(25,25,25),data(26,25,25),&
data(27,25,25)
#ifdef MULTI
nthreads = 4
#else
nthreads = 1
#endif
write(0,*) 'nthreads=',nthreads
#ifdef FFTW
write(0,*) "Using FFTW multi-threading"
call sfftw_init_threads(stat)
call sfftw_plan_with_nthreads(nthreads)
#elif MKL
write(0,*) "Using MKL multi-threading"
call mkl_set_num_threads(nthreads)
#endif
call sfftw_plan_dft_r2c_3d(planf, n1, n2, n3, data, data,
FFTW_ESTIMATE);
call sfftw_plan_dft_c2r_3d(planb, n1, n2, n3, data, data,
FFTW_ESTIMATE);
call start_timer( t )
call sfftw_execute(planf)
call sfftw_execute(planb)
call stop_timer( t )
write(0,*) 'FFT^{-1}[FFT[data(25:27,25,25)]]',data(25,25,25)/
(n1*n2*n3), &
data(26,25,25)/
(n1*n2*n3), &
data(27,25,25)/
(n1*n2*n3)
write(0,*) 'elapsed time:',t%telapsed
call sfftw_destroy_plan(planf)
call sfftw_destroy_plan(planb)
#ifdef FFTW
call sfftw_cleanup_threads()
#endif
deallocate( data )
call exit(0)
end program FFTW_Test
!-------------------------------------------------------------------------------
My apologies for the mangled whitespace and proliferation of
preprocessor directives. The system_time_mod module is simply a
(compiler-dependent) wrapper around system_clock(). I can include the
source code if you are interested. I use the fpp preprocessor with
ifort and the -x f95-cpp-input preprocessor option with gfortran.
Here is how I build the executable using ifort:
----------------------------
fpp -Difort ../../Src/system_time.f90 > fpp_system_time.f90
ifort -c -assume bscc -assume byterecl -fpp -mtune=pentium4 -O3 -
static-intel -vms -w -WB fpp_system_time.f90 -o system_time.o
fpp -DFFTW -DMULTI -I/usr/local/include FFTW_Test.f90 >
fpp_FFTW_Test.f90
ifort -c -assume bscc -assume byterecl -fpp -mtune=pentium4 -O3 -
static-intel -vms -w -WB -I/usr/local/include fpp_FFTW_Test.f90 -o
FFTW_Test.o
ifort -assume bscc -assume byterecl -fpp -mtune=pentium4 -O3 -static-
intel -vms -w -WB -I/usr/local/include FFTW_Test.o system_time.o -L/
usr/local/lib -lfftw3f -lfftw3 -lfftw3f_threads -lpthread -lm -o
FFTW_native_multithreaded_ifort
----------------------------
And here is how I build the executable using gfortran:
----------------------------
gfortran -E -x f95-cpp-input -Dgfortran ../../Src/system_time.f90 >
fpp_system_time.f90
gfortran -c -O2 -static -m64 -w fpp_system_time.f90 -o system_time.o
gfortran -E -x f95-cpp-input -DFFTW -DMULTI -I/usr/local/include
FFTW_Test.f90 > fpp_FFTW_Test.f90
gfortran -c -O2 -static -m64 -w -I/usr/local/include fpp_FFTW_Test.f90
-o FFTW_Test.o
gfortran -O2 -static -m64 -w -I/usr/local/include FFTW_Test.o
system_time.o -L/usr/local/lib -lfftw3f -lfftw3 -lfftw3f_threads -
lpthread -lm -o FFTW_native_multithreaded_gfortran
----------------------------
The ifort version works as expected, in both single-threaded and multi-
threaded mode. The gfortran version works only if I comment out the
three lines of FFTW thread manipulation code.
Would appreciate any insights. Please let me know if I have provided
sufficient information to recognize/diagnose the problem.
Regards,
Morgan
How was fftw built? With gcc or icc? What options?
What version of gfortran? What happens if you don't use -m64?
Have you tried -Wall -fbounds-check? Or -fbacktrace and/or -g?
Thanks for your response...
On Dec 18, 12:24 pm, ka...@troutmask.apl.washington.edu (Steven G.
Kargl) wrote:
>
> How was fftw built? With gcc or icc? What options?
./configure --enable-single --enable-threads
I believe FFTW defaults to CC=gcc and F77=gfortran
> What version of gfortran? What happens if you don't use -m64?
GNU Fortran (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27)
It looks like I am out-of-date. I see other references to 4.1.x being
buggy, so I should install 4.3. Mea culpa.
Removing -m64 doesn't seem to do anything.
> Have you tried -Wall -fbounds-check? Or -fbacktrace and/or -g?
I tried -Wall -fbounds-check and nothing different happened. The
backtrace option doesn't seem to be active for v4.1.2. I'll try some
other things before I get into the debugger. ;-)
Cheers,
Morgan
Sorry if this thread is a bit off-topic for comp.lang.fortran.
I can't replicate your problem. Using FFTW 3.1.2 (built with gcc) and
gfortran 4.1.2 (in both cases using the Debian packages), the test
program below seems to run fine on both x86 and x86_64.
I haven't looked carefully at the rest of your program, but perhaps
there is a bug elsewhere in your code, e.g. you are passing the wrong
array sizes to FFTW, and it is purely fortuitous that it doesn't crash
when you don't call the threads code. If the code below works for you
and FFTW's "make check" passes, then this especially likely.
Regards,
Steven G. Johnson
program foo
integer ok
integer*8 plan
double complex data(1024)
call dfftw_init_threads(ok)
call dfftw_plan_with_nthreads(4)
call dfftw_plan_dft_1d(plan, 1024, data, data, -1, 0)
call dfftw_execute(plan)
call dfftw_destroy_plan(plan)
call dfftw_cleanup_threads
end
Thanks for your timely response (to this and to other similar posts).
> Sorry if this thread is a bit off-topic for comp.lang.fortran.
I was torn between comp.dsp and comp.lang.fortran, but I am not sure
if it is an FFTW issue, a gfortran issue, or (most likely) user error.
> I can't replicate your problem. Using FFTW 3.1.2 (built with gcc) and
> gfortran 4.1.2 (in both cases using the Debian packages), the test
> program below seems to run fine on both x86 and x86_64.
Thank you for your program. I tested it (replacing dfftw... with
sfftw...) and it ran successfully with gfortran and ifort, using the
same compiler flags as my earlier test.
> I haven't looked carefully at the rest of your program, but perhaps
> there is a bug elsewhere in your code, e.g. you are passing the wrong
> array sizes to FFTW, and it is purely fortuitous that it doesn't crash
> when you don't call the threads code. If the code below works for you
> and FFTW's "make check" passes, then this especially likely.
I carefully read the FFTW documentation regarding array sizes in c2r/
r2c DFTs, so at least I believe I have gotten that right.
I apologize to the board for providing a klunky example earlier. Too
many degrees of freedom. Here is a stripped down, standalone version
of the earlier program which I provided:
!-------------------------------------------------------------------------------
program FFTW_Test_Stripped
implicit none
include 'fftw3.f'
integer :: i, j, k, n1, n2, n3,
nthreads, stat
integer*8 :: planf, planb
real, dimension(:,:,:), allocatable :: data
n1=750; n2=750; n3=750
allocate( data(2*(n1/2+1),n2,n3) )
do i=1,n1
do j=1,n2
do k=1,n3
data(i,j,k) = 1.0*(i+j+k)
end do
end do
end do
write(0,*) ' data(25:27,25,25)
',data(25,25,25),data(26,25,25),&
data(27,25,25)
nthreads = 4
call sfftw_init_threads(stat)
call sfftw_plan_with_nthreads(nthreads)
call sfftw_plan_dft_r2c_3d(planf, n1, n2, n3, data, data,
FFTW_ESTIMATE);
call sfftw_plan_dft_c2r_3d(planb, n1, n2, n3, data, data,
FFTW_ESTIMATE);
call sfftw_execute(planf)
call sfftw_execute(planb)
write(0,*) 'FFT^{-1}[FFT[data(25:27,25,25)]]',data(25,25,25)/
(n1*n2*n3), &
data(26,25,25)/
(n1*n2*n3), &
data(27,25,25)/
(n1*n2*n3)
call sfftw_destroy_plan(planf)
call sfftw_destroy_plan(planb)
call sfftw_cleanup_threads()
deallocate( data )
call exit(0)
end program FFTW_Test_Stripped
!-------------------------------------------------------------------------------
If you save the above program to FFTW_Test_stripped.f90, here is the
ifort compilation and execution:
----------
ifort -assume bscc -assume byterecl -fpp -mtune=pentium4 -O3 -static-
intel \
-vms -w -WB -I/usr/local/include FFTW_Test_stripped.f90 \
-L/usr/local/lib -lfftw3f -lfftw3 -lfftw3f_threads -lpthread -lm
\
-o prog
./prog
data(25:27,25,25) 75.00000 76.00000
77.00000
FFT^{-1}[FFT[data(25:27,25,25)]] 75.00010 76.00011
77.00010
----------
Here is the gfortran compilation and execution:
----------
gfortran -O2 -static -w -Wall -fbounds-check -I/usr/local/include \
FFTW_Test_stripped.f90 -L/usr/local/lib -lfftw3f -lfftw3 \
-lfftw3f_threads -lpthread -lm -o prog
./prog
make: *** [stripped_gfortran] Segmentation fault
----------
I am currently installing gcc 4.2.1. The above example was run with
gfortran 4.1.2.
Regards,
Morgan Brown
---------
(gdb) exec-file prog
(gdb) run
Starting program: /home/seispak/LINUX64/svn/DEV-morgan/Test/FFT/prog
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: shared library handler failed to enable breakpoint
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
---------
I haven't compiled FFTW with debug flags...
Oddly enough, the problem seems to be with the -static flag. When I
remove that flag, the segfault goes away for me! Specifically,
gfortran -I/usr/include fftwthrd.f90 -lfftw3f -lfftw3f_threads -
lpthread -lm -o prog
./prog
works for me but
gfortran -static -I/usr/include fftwthrd.f90 -lfftw3f -
lfftw3f_threads -lpthread -lm -o prog
./prog
segfaults on x86_64 with gfortran 4.1.2 (Debian GNU/Linux). However,
both versions work for me on x86 with the same version of gfortran.
I tried writing an "equivalent" C program to see if the same problem
occurred there (i.e. if it was purely an ld problem), but I couldn't
replicate the segfault.
So, it seems that there is some strange interaction between gfortran's
static linker and FFTW's thread library on x86_64??? I'm not sure how
to debug this, but it seems that it *may* be gfortran's problem after
all.
Regards,
Steven G. Johnson
(By the way, if you want to save local variables in gfortran, the
correct flag is -fno-automatic, not -static. The -static flag is for
static linking.)
Well, the correct way to link in pthreads is to use the -pthreads
option instead of just adding -lpthread.
If you search the GCC email archives, you'll find that one can't/shouldn't
use -static with -pthreads because the static libc may not properly
deal with TLS. See
I assume you mean -pthread, not -pthreads; thanks for the info on the
problem with -lpthread and -static, which I wasn't aware of. However,
unfortunately this does not fix the problem:
gfortran -static -pthread -I/usr/include fftwthrd.f90 -lfftw3f -
lfftw3f_threads -lm -o prog
./prog
still segfaults for me, and works without -static.
Regards,
Steven G. Johnson
The workaround listed on that page, using -Wl,--whole-archive -
lpthread -Wl,--no-whole-archive, does fix the segfault for me. So, I
guess this is an instance of the same glibc "feature".
Thanks again, this is good to know about.
Regards,
Steven G. Johnson
Any idea why it works okay when statically linking a C program with
FFTW threads on the same system? Has the C compiler implemented some
workaround for this that gfortran hasn't?
Regards,
Steven G. Johnson
I don't know. This is getting deep into parts of GCC that I never
ventured. It could be that you're just lucky with the C program
while gfortran is forcing some other memory access pattern. I suppose
that you might check the linker options with the -v options to see
if some linker magic is happening.
With regards to -pthreads versus -pthread, on some targets the former
is an alias for the latter. I can never remember which is preferred.
> double complex data(1024)
I have never seen this form before, bug g95 seems to accept it.
g95 also accepts complex*16, also non-standard but maybe
reasonably common.
-- glen
DOUBLE COMPLEX is also reasonably common. The original
AT&T f77 compiler supported it. It was available in other
compilers before the FORTRAN 77 standard was approved.
The extensions QUADRUPLE PRECISION and QUADRUPLE COMPLEX
are uncommon, even though many compilers support REAL*16
and COMPLEX*32. It might be that the length of the
keywords kept them from being implemented. I am unaware
of any compiler that implemented them.
Bob Corbett
>>Steven G. Johnson wrote:
>>(snip)
>>> double complex data(1024)
>>I have never seen this form before, bug g95 seems to accept it.
>>g95 also accepts complex*16, also non-standard but maybe
>>reasonably common.
> DOUBLE COMPLEX is also reasonably common. The original
> AT&T f77 compiler supported it. It was available in other
> compilers before the FORTRAN 77 standard was approved.
COMPLEX*16 goes at least to OS/360 Fortran G and H.
REAL*16 and COMPLEX*32 to H Extended and the 360/85,
and I believe also in VS Fortran.
-- glen
> On Dec 18, 4:41 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> > Steven G. Johnson wrote:
> >
> > (snip)
> >
> > > double complex data(1024)
> >
> > I have never seen this form before, bug g95 seems to accept it.
> > g95 also accepts complex*16, also non-standard but maybe
> > reasonably common.
>
> DOUBLE COMPLEX is also reasonably common. The original
> AT&T f77 compiler supported it. It was available in other
> compilers before the FORTRAN 77 standard was approved.
Yes. I used several compilers hat supported it. I don't recall names,
but I do recall finding it difficult to deal with making code that used
double precision complex (with any spelling) portable. Some compilers
supported the double precision complex syntax, others supported
complex*16, some supported both, and some didn't support anything of the
sort. Of course, all forms of double precision complex were non-standard
until f90.
I dealt with at least one fairly widely used library that used both
syntaxes, making for maximal portability problems. :-( I don't recall
for sure what library that was. Fftpack keeps comming to mind, but I
might be confusing this with its other coding problems (It had several).
--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
[snip]
> Of course, all forms of double precision complex were non-standard
> until f90.
Huh. I did not know that.
All the early complex problems didn't suffer from precision issues then? :o)
cheers,
paulv
The computers would suffer from lack of memory to support such a
standard ;-)
A complex number is represent as a "special record" of 2 real numbers,
so it takes about 2 times as much memory space.
COMPILER BUILDERS will correct me on this.
Kind regards,
Jan Gerrit Kootstra
So do half the problem at a time. :o)
cheers,
paulv
It was a pain. Some codes just used one of the nonstandard forms. Other
codes avoided using complex type at all; they just used separate double
precision variables for the real and imaginary parts and expanded out
the necessary operations.
And then, of course, some of the preeminent scientific machines of the
era used 60-bit single precision floats anyway. Use of double precision
on the CDCs was rare (and often as not was a result of porting codes
originally written on other machines; the double precision wasn't realy
needed on the CDC, but was sometimes left in rather than trying to
convert the code).
I long thought it ironic that direct language support of complex was
touted as one of Fortran's strengths, yet before f90 that support was so
crippled that many codes had to choose between using nonstandard forms
or avoiding the use of the feature.