need advise for optimisation, for array size of 1000,000

malik...@gmail.com

unread,

Sep 11, 2012, 8:49:23 PM9/11/12

to

Gents,
My program is I think working fine but it takes so long to solve. for array size n=100 it takes about 30min to solve, for 10,000 it takes more than 5 days and haven't finished. my aim is to run it for 1000,000 and to get the result within (at most) one hour.
May I have inputs from the masters as to what can be optimised? my routine is shown below. Many thanksss for thissss

call normal(n,mutaup,stdtaup,taup)
call normal(n,muHs,stdHs,Hs)
call normal(n,mumu,stdmu,mu)
call normal(n,muwsub,stdwsub,wsub)

sum=0
do i=1,n

dhyd=dia + two*(tcoat + tmg)

omep(i)=two*pi/taup(i)

vcur=vcref*(((1+0.001/dhyd)*log(dhyd/0.001+1)-1)/log(zcref/0.001+1))

do j=1,nout
ome=omegamo/nout*j
call kspect(n,ome,taup,siga,sigb,Hs,alpha,gamma,depth,see,suu)
end do

j=0
suuold=zero
seeold=zero
fm0(i)=zero
fm2(i)=zero
scumm=zero
do
j=j + 1
ome=delint*j
call kspect(n,ome,taup,siga,sigb,Hs,alpha,gamma,depth,see,suu)
if ((max(see(i), suu(i)).lt.cutint).and.(ome.gt.omep(i))) exit
ttt1=half*(suuold + suu(i))*delint
ttt2=half*(seeold + see(i))*delint
fm0(i)=fm0(i) + ttt1
fm2(i)=fm2(i) + ome*ome*ttt1
scumm=scumm + ttt2
suuold=suu(i)
seeold=see(i)
end do

us(i)=two*sqrt(fm0(i))

tu(i)=two*pi*sqrt(fm0(i)/fm2(i))

vel(i)=us(i)+vcur

fd(i)=half*rho*cdrag*dhyd*abs(vel(i))*vel(i)
acc(i)=two*pi*us(i)/tu(i)
fi(i)=rho*(one + cadd)*pi*dhyd*dhyd*acc(i)/four
fh(i)=fd(i) + fi(i)

fl(i)=half*rho*clift*dhyd*abs(vel(i))*vel(i)
fv(i)=fl(i)

! FAILURE PROBABILITY CALCULATION

R(i)=mu(i)*wsub(i)
L(i)=fh(i)+mu(i)*fv(i)

if(R(i)-L(i).le.0)then
count=1
else
count=0
end if
sum=sum+count
end do
Pf=sum/n

end program

dpb

unread,

Sep 11, 2012, 8:56:48 PM9/11/12

to

On 9/11/2012 7:49 PM, malik...@gmail.com wrote:
...
> May I have inputs from the masters as to what can be optimised?...

Before you start optimising, profile, profile, profile...be absolutely
sure you know where the time is being spent.

Don't think it'll help much on this code anyway; I'm guessing kspect()
is the guts of it and there's nothing provided for it--certainly a
cursory glance there surely doesn't look like anything in the top level
here that would take that kind of time.

--

malik...@gmail.com

unread,

Sep 11, 2012, 9:27:54 PM9/11/12

to

hehehe..apologies.. i'm a real beginner, i've just started programming a month ago. I dont even know how to do 'profile'.. i guess by running the subroutines separately, time them and compare them?
anyway..you can find the routine for kspect() below.
Many thanks dpb!

do i=1,n

omep(i)=two*pi/taup(i)
sigw=siga
alpha(i)=five/sixteen*Hs(i)**2*omep(i)**(4)/grav**2*(1-0.287*log(gamma))
if (ome.gt.omep(i)) sigw=sigb
if (ome.gt.1d-10) then
ee1(i)=-half*(((ome - omep(i))/(sigw*omep(i)))**2)
ee2(i)=-five/four*(ome/omep(i))**(-4)
see(i)=alpha(i)*grav*grav*ome**(-5)*exp(ee2(i))*gamma**exp(ee1(i))
else
see=zero
end if

! fk number, use bisection method
fk=zero
if (ome.gt.1.0d-10) then
fkbelow=max(ome*ome/grav, ome/sqrt(grav*depth))
fkabove=two*fkbelow
fkmid=fkbelow
j=1
do
ttt=fkmid - ome*ome/grav/tanh(fkmid*depth)
if (abs(ttt).lt.1.0d-10) exit
if (ttt.gt.zero) then
fkabove=fkmid
else
fkbelow=fkmid
end if
fkmid=half*(fkbelow + fkabove)
j=j + 1
if (j.gt.1000) then
write(iou,*)'ERROR in load routine'
write(iou,*)'internal error, failed to converge'
write(iou,*)'number in 1000 iterations'
end if
end do
fk=fkmid
end if

!
ttt=fk*depth
if (ttt.lt.1.0d-6) then
gtran=sqrt(grav/depth)
else if (ttt.lt.150.0d0) then
gtran=ome/sinh(ttt)
else
gtran=zero
end if

!
suu(i)=gtran*gtran*see(i)
end do
return

end subroutine

Steven G. Kargl

unread,

Sep 11, 2012, 10:17:18 PM9/11/12

to

On Tue, 11 Sep 2012 18:27:54 -0700, malik.astri wrote:

> hehehe..apologies.. i'm a real beginner, i've just started programming
> a month ago. I dont even know how to do 'profile'.. i guess by running
> the subroutines separately, time them and compare them?
> anyway..you can find the routine for kspect() below.

What compiler are you using? What operating system? What options are
you providing with to the compiler.

Please post the entire piece of code. What you posted below does not
show any declarations. In addition, provide whatever input you use
during testing. This way we/someone can compile and inspect whats
going on.

>
> do i=1,n
>

(code deleted to keep this short).

> suu(i)=gtran*gtran*see(i)
> end do
> return

log() and tanh() can be slow.

--
steve

malik...@gmail.com

unread,

Sep 11, 2012, 10:45:01 PM9/11/12

to

Steven,
I'm using Silverfrost Plato for the compiler, the free version for f.95 from a website. For the operating system, it is Intel(R) Core(TM)i7-2600 C...@3.4GHz, RAM 4GB and 32 bit operating System.
I didnt touch the options in the compiler at all, so I guess they are all on default values.

Below, is the whole code and underneath it I've posted also the input file.

Man many thanks Steven!

program
implicit none
integer, parameter :: ni = 1
integer, parameter :: nn = 10000
integer :: nout
integer, parameter :: iin = 16
integer, parameter :: iou = 17
integer, parameter :: iou1 = 18
integer :: i,j,n,count
real(2), parameter :: pi = 3.141592654358979323846d0
real(2), parameter :: zero = 0.0d0
real(2), parameter :: half = 0.5d0
real(2), parameter :: one = 1.0d0
real(2), parameter :: two = 2.0d0
real(2), parameter :: four = 4.0d0
real(2), parameter :: five = 5.0d0
real(2), parameter :: sixteen = 16.0d0
real(2), parameter :: grav = 9.81
real(2) :: siga
real(2) :: sigb
real(2) :: gamma
real(2) :: rho
real(2) :: cdrag
real(2) :: clift
real(2) :: cadd
real(2) :: zcref
real(2) :: vcref
real(2) :: dia
real(2) :: tcoat
real(2) :: tmg
real(2) :: depth
real(2) :: omegamo
real(2) :: delint
real(2) :: cutint
real(2) :: dhyd,fh(nn),fv(nn)
real(2) :: vcur
real(2) :: acc(nn),fi(nn),fd(nn),fl(nn)
real(2) :: mumu,stdmu,L(nn),R(nn),Pf,sum,mu(nn)
real(2) :: muwsub,stdwsub,wsub(nn)
real(2) :: mutaup,stdtaup,taup(nn)
real(2) :: muHs,stdHs,Hs(nn)
character :: input*20
real(2) :: us(nn),tu(nn),vel(nn)
real(2) :: ome
real(2) :: suuold,seeold,fm0(nn),fm2(nn),scumm,ttt1,ttt2
real(2) :: omep(nn),alpha(nn),see(nn),suu(nn)

print*,"type the input file with extention"
read (*,*) input
open (unit=iin,file=input)
read (iin,*) dia
read (iin,*) tcoat
read (iin,*) tmg
read (iin,*) muwsub,stdwsub
read (iin,*) gamma
read (iin,*) siga
read (iin,*) sigb
read (iin,*) depth
read (iin,*) mutaup,stdtaup
read (iin,*) muHs,stdHs
read (iin,*) omegamo
read (iin,*) nout
read (iin,*) delint
read (iin,*) cutint
read (iin,*) zcref
read (iin,*) vcref
read (iin,*) rho
read (iin,*) cdrag
read (iin,*) cadd
read (iin,*) clift
read (iin,*) mumu,stdmu
read (iin,*) n

call normal(n,mutaup,stdtaup,taup)
call normal(n,muHs,stdHs,Hs)
call normal(n,mumu,stdmu,mu)
call normal(n,muwsub,stdwsub,wsub)

sum=0
do i=1,n

!

dhyd=dia + two*(tcoat + tmg)

!
omep(i)=two*pi/taup(i)
!
vcur=vcref*(((1+0.001/dhyd)*log(dhyd/0.001+1)-1)/log(zcref/0.001+1))
!

do j=1,nout
ome=omegamo/nout*j
call kspect(n,ome,taup,siga,sigb,Hs,alpha,gamma,depth,see,suu)
end do

!

j=0
suuold=zero
seeold=zero
fm0(i)=zero
fm2(i)=zero
scumm=zero
do
j=j + 1
ome=delint*j
call kspect(n,ome,taup,siga,sigb,Hs,alpha,gamma,depth,see,suu)
if ((max(see(i), suu(i)).lt.cutint).and.(ome.gt.omep(i))) exit
ttt1=half*(suuold + suu(i))*delint
ttt2=half*(seeold + see(i))*delint
fm0(i)=fm0(i) + ttt1
fm2(i)=fm2(i) + ome*ome*ttt1
scumm=scumm + ttt2
suuold=suu(i)
seeold=see(i)
end do

!
us(i)=two*sqrt(fm0(i))
!
tu(i)=two*pi*sqrt(fm0(i)/fm2(i))
!
vel(i)=us(i)+vcur

!

fd(i)=half*rho*cdrag*dhyd*abs(vel(i))*vel(i)
acc(i)=two*pi*us(i)/tu(i)
fi(i)=rho*(one + cadd)*pi*dhyd*dhyd*acc(i)/four
fh(i)=fd(i) + fi(i)

!

fl(i)=half*rho*clift*dhyd*abs(vel(i))*vel(i)
fv(i)=fl(i)

! FAILURE PROBABILITY CALCULATION

!
R(i)=mu(i)*wsub(i)
L(i)=fh(i)+mu(i)*fv(i)

!

if(R(i)-L(i).le.0)then
count=1
else
count=0
end if
sum=sum+count
end do
Pf=sum/n

open (unit=iou,file="output.txt",action="write",status="replace")
write (iou,*) 'The probability of failure is ',Pf,' no. of failure ',sum
write (iou,*) R(1)-L(1),R(2)-L(2),R(3)-L(3),R(4)-L(4),R(5)-L(5)
write (iou,*) R(60)-L(60),R(70)-L(70),R(80)-L(80),R(90)-L(90),R(100)-L(100)
close (iou)

end program

subroutine kspect(n,ome,taup,siga,sigb,Hs,alpha,gamma,depth,see,suu)

integer, parameter :: nn = 10000
integer, parameter :: iou = 17
real(2) :: ome
real(2) :: fk
real(2) :: gtran
real(2), parameter :: pi = 3.141592654358979323846d0
real(2), parameter :: zero = 0.0d0
real(2), parameter :: half = 0.5d0
real(2), parameter :: one = 1.0d0
real(2), parameter :: two = 2.0d0
real(2), parameter :: four = 4.0d0
real(2), parameter :: five = 5.0d0
real(2), parameter :: sixteen = 16.0d0
real(2), parameter :: grav = 9.81
real(2) :: siga
real(2) :: sigb
real(2) :: depth
real(2) :: gamma
real(2) :: fkbelow,fkabove,fkmid,ttt,sigw
integer:: i,j,n
real(2) :: taup(nn)
real(2) :: Hs(nn)
real(2) :: omep(nn),alpha(nn),ee1(nn),ee2(nn),see(nn),suu(nn)

do i=1,n
! surface spectral density function

write(iou,*)'ERROR in dload routine'
write(iou,*)'internal error, failed to converge on wave'
write(iou,*)'number in 1000 iternations'

end if
end do
fk=fkmid
end if

!
ttt=fk*depth
if (ttt.lt.1.0d-6) then
gtran=sqrt(grav/depth)
else if (ttt.lt.150.0d0) then
gtran=ome/sinh(ttt)
else
gtran=zero
end if

!
suu(i)=gtran*gtran*see(i)
end do
return

end subroutine

subroutine normal(n,mu,std,dinvnorm)

!symbols parameters
REAL(KIND=2)::RANDOM@,Prob(10000)
real*8:: dinvnorm(10000)
real*8:: p_low,p_high
real*8:: a1,a2,a3,a4,a5,a6
real*8:: b1,b2,b3,b4,b5
real*8:: c1,c2,c3,c4,c5,c6
real*8:: d1,d2,d3,d4
real*8:: z,q,r
real*8:: mu,std
integer::n,i
!constant parameters
a1=-39.6968302866538
a2=220.946098424521
a3=-275.928510446969
a4=138.357751867269
a5=-30.6647980661472
a6=2.50662827745924
b1=-54.4760987982241
b2=161.585836858041
b3=-155.698979859887
b4=66.8013118877197
b5=-13.2806815528857
c1=-0.00778489400243029
c2=-0.322396458041136
c3=-2.40075827716184
c4=-2.54973253934373
c5=4.37466414146497
c6=2.93816398269878
d1=0.00778469570904146
d2=0.32246712907004
d3=2.445134137143
d4=3.75440866190742
p_low=0.02425
p_high=1-p_low

!start subroutine
DO i=1,n
Prob(i)=RANDOM@()
if(Prob(i).lt.p_low) goto 201
if(Prob(i).ge.p_low) goto 301
201 q=dsqrt(-2*dlog(Prob(i)))
z=(((((c1*q+c2)*q+c3)*q+c4)*q+c5)*q+c6)/((((d1*q+d2)*q+d3)*q+d4)*q+1)
goto 204
301 if((Prob(i).ge.p_low).and.(Prob(i).le.p_high)) goto 202
if(Prob(i).gt.p_high) goto 302
202 q=Prob(i)-0.5
r=q*q
z=(((((a1*r+a2)*r+a3)*r+a4)*r+a5)*r+a6)*q/(((((b1*r+b2)*r+b3)*r+b4)*r+b5)*r+1)
goto 204
302 if((Prob(i).gt.p_high).and.(Prob(i).lt.1)) goto 203
203 q=dsqrt(-2*dlog(1-Prob(i)))
z=-(((((c1*q+c2)*q+c3)*q+c4)*q+c5)*q+c6)/((((d1*q+d2)*q+d3)*q+d4)*q+1)
204 dinvnorm(i)=z*std+mu
END DO

end

**********************************************************************

The input data is:

0.4064 !od
0.0213 !tcoat
0.006 !tmg
905.4,0 !wsub (mean,std)
3.27 ! gamma
0.068 ! Signma A
0.087 ! Signma B
119 ! depth
7.89,2.19 ! Taup(mean,std)
8.43,2.34 ! Hs(mean,std)
5 ! omep(rad/s)
1000 ! nou
1e-4 ! delint
1e-10 ! cutint
1 ! zcref
1.052 ! vcref
1025 !
1.4 !
2.29 !
1.0 !
0.64,0.2 ! mu(mean,std)
10000 ! number of samples

Louisa

unread,

Sep 12, 2012, 2:41:40 AM9/12/12

to

On Sep 12, 12:45 pm, malik.as...@gmail.com wrote:

I don't presume to have any idea of what this computation does,
or where time is being spent.
Nevertheless, I do have some specific comments that may be helpful.

dhyd/0.001 etc?
This is better written as 1000*dhyd etc for two reasons:
* 0.001 is single precision; and
* multiplication is faster than division.

Thus, you are losing precision.

No IMPLICIT NONE?

five/four? Why not just 1.25d0?
(ome/omep(i))**(-4) ? Why not (omep(i)/ome) ** 4
May be quicker.

> see(i)=alpha(i)*grav*grav*ome**(-5)*exp(ee2(i))*gamma**exp(ee1(i))

Just a mention that exp can be slow (as also log elsewhere).
...grav*ome**(-5) may be better as grav/ome**5

IMPLICIT NONE?

> !symbols parameters
> REAL(KIND=2)::RANDOM@,Prob(10000)
> real*8:: dinvnorm(10000)
> real*8:: p_low,p_high
> real*8:: a1,a2,a3,a4,a5,a6
> real*8:: b1,b2,b3,b4,b5
> real*8:: c1,c2,c3,c4,c5,c6
> real*8:: d1,d2,d3,d4
> real*8:: z,q,r
> real*8:: mu,std
> integer::n,i
> !constant parameters

You have gone to the trouble of specifying 15 digits in all
the constants below, but they are all single precision because
you omitted "d0".

Les Neilson

unread,

Sep 12, 2012, 6:15:34 AM9/12/12

to

<malik...@gmail.com> wrote in message
news:99275ca0-2605-406e...@googlegroups.com...

> Steven,
> I'm using Silverfrost Plato for the compiler, the free version for f.95
> from a website. For the operating system, it is Intel(R) Core(TM)i7-2600
> C...@3.4GHz, RAM 4GB and 32 bit operating System.
> I didnt touch the options in the compiler at all, so I guess they are all
> on default values.
>
> Below, is the whole code and underneath it I've posted also the input
> file.
>
> Man many thanks Steven!
>

>
> subroutine normal(n,mu,std,dinvnorm)
>
> !symbols parameters
> REAL(KIND=2)::RANDOM@,Prob(10000)

As others have said you should really look at using a code profiler to find
where the optimisation is really needed but :
My little testing of the code shows that the infinite do/enddo loop in your
main program takes a long time to converge to the exit condition.
In my current debugging session i = 1 and j = 1400 and so far ttt1 and ttt2
are always zero and so fm0, fm2, scumm, suuold and seeold are also always
zero
That's a lot of effort for adding zero to things.

A couple of small points :

In subroutine normal
Prob doesn't need to be an array.
It is only used in the do loop and one element at a time Prob(i), without
reference to any previous subscripted elements.
so it can be replaced as a scalar variable.
Also you could do
subroutine normal(n,mu,std,dinvnorm)
implicit none
integer::n
real*8:: mu
real*8:: std
real*8:: dinvnorm(n) ! Since you pass the size of the array as n

You could declare another parameter : twopi as pi*2
replace two*pi with twopi

declare a parameter gravsq as grav*grav
replace grav*grav and grav**2 with gravsq

replace five/sixteen with 0.3125d0

you could replace
302 if((Prob.gt.p_high).and.(Prob.lt.1)) goto 203
with
302 continue
since whatever the value of Prob the next statement executed is at 203

Hope this helps
Les

Les Neilson

unread,

Sep 12, 2012, 7:10:59 AM9/12/12

to

Another small point.
You have write(iou, ... statements in subroutine kspect
but the file has not been opened
In itself this would not be a problem but if you want the messages to go to
output.txt then you need to open that file, probably when you open the input
file.

Les

Les Neilson

unread,

Sep 12, 2012, 10:04:10 AM9/12/12

to

Looking at the code further I am wondering if this is part of a larger
program since there are several arrays which only need to be scalar
variables
Also NN has a value of 10000 but the code shown only outputs a few values of
R and L (maximum R(100) and L(100))
so presumably your do loop in main (and kpsect?) could be reduced to do i =
1, 100

Otherwise
For the code as shown
in kpsect the arrays omep, ee1, and ee2 can just be scalars
and the same in main for
omep, us, tu, vel, fh, fv, acc, fi and fd
and fl doesn't need to be declared at all. It is calculated and immediately
assigned to fv(i) you could instead just assign the calculation to the
scalar fv

Les

Dick Hendrickson

unread,

Sep 12, 2012, 10:40:45 AM9/12/12

to

On 9/11/12 9:45 PM, malik...@gmail.com wrote:
> Steven,
> I'm using Silverfrost Plato for the compiler, the free version for f.95 from a website. For the operating system, it is Intel(R) Core(TM)i7-2600 C...@3.4GHz, RAM 4GB and 32 bit operating System.
> I didnt touch the options in the compiler at all, so I guess they are all on default values.
>

I'd try a newer compiler. The newer compilers generally do a better job
at optimizing for newer hardware. The g* fortrans are free and
relatively easy to install.

You should also look at some of the compiler command line switches.
Many compilers are more or less set to "debug mode" by default. You
should look for general purpose optimization levels, with names like -O
or \OPT; as opposed to ones with long names that you don't understand.
You don't want to mess around with things like
-\assume-no-auto-realloc.

As a general rule, you should run a program with 2 different compilers
when you are developing it. Some compilers are better at detecting odd
errors than others. Also, if you try different optimization levels be
sure to compare the outputs. You'd be surprised what a good optimizer
can do to an incorrectly written program :) .

Dick Hendrickson

Les Neilson

unread,

Sep 12, 2012, 11:36:21 AM9/12/12

to

In subroutine kspect you have

if (ome .gt. 1d-10 ) then
...
else
see = zero ! <== This sets the whole array 'see' to zero
! should it be see(i) = zero ?
endif

Les

Louis Krupp

unread,

Sep 12, 2012, 4:44:28 PM9/12/12

to

On Tue, 11 Sep 2012 19:45:01 -0700 (PDT), malik...@gmail.com wrote:

<snip>

>For the operating system, it is Intel(R) Core(TM)i7-2600 C...@3.4GHz, RAM 4GB and 32 bit operating System.

That's not the operating system; that's the hardware. The operating
system would be Windows or Linux (and there are different versions of
those).

<snip>

> real(2), parameter :: zero = 0.0d0
> real(2), parameter :: half = 0.5d0
> real(2), parameter :: one = 1.0d0
> real(2), parameter :: two = 2.0d0
> real(2), parameter :: four = 4.0d0
> real(2), parameter :: five = 5.0d0
> real(2), parameter :: sixteen = 16.0d0

Are all of those named whole numbers really better than 0, 1, 2, 4, 5
and 16? If, for example, "four" is really a magic number, say
"zfactor," then call it "zfactor."

Louis

Louisa

unread,

Sep 12, 2012, 8:47:41 PM9/12/12

to

On Sep 12, 7:44 pm, "Les Neilson"
<l.neil...@acecadsoftware.REMOVEME.com> wrote:
> <malik.as...@gmail.com> wrote in message

Even better is:
subroutine normal(mu,std,dinvnorm)
implicit none
real*8:: mu
real*8:: std
real(kind=kind(1.0dp)) :: dinvnorm( : )
real(kind=kind(1.0dp)) :: Prob(ubound(dinvnorm,1))
integer::n

n = ubound(dinvnorm,1)

as Prob seems to require the same size as divnorm.
Better than real*8 and kind=2 (which are not portable),
is kind expressed in some way dependent on double precision
or some other comparable means.

Steven G. Kargl

unread,

Sep 12, 2012, 10:26:54 PM9/12/12

to

On Tue, 11 Sep 2012 19:45:01 -0700, malik.astri wrote:

> Steven,
> I'm using Silverfrost Plato for the compiler, the free version for f.95 from a website.
> For the operating system, it is Intel(R) Core(TM)i7-2600 C...@3.4GHz, RAM 4GB and 32 bit
> operating System. I didnt touch the options in the compiler at all, so I guess they are
> all on default values.
>
> Below, is the whole code and underneath it I've posted also the input file.
>
> Man many thanks Steven!
>

(code deleted)

First, a profiler shows that the program spends all its time in
the kspect routine, so that's the routine you need to concentrate
on for reducing the computation time.

In testing over just 20 steps, I've managed to reduce the time
per step from 1/3 to 1/2 of the time for your original code.
Look at the code within the DO-LOOPs and remove anything that
is a constant expression. For example, in the line

alpha(i)=five/sixteen*Hs(i)**2*omep(i)**(4)/grav**2*(1-0.287*log(gamma))

you can compute

five/sixteen*(1-0.287*log(gamma))

outside the loop. Note that log() is expensive to compute, and this save
quite a bit of time. There are several opportunities to optimize. Now,
unfortanately, you can't move

gamma**exp(ee1(i))
tanh(fkmid*depth)
sinh(ttt)

outside of the loop, and that first expression is slowwwww.

Finally, as noted in another reply, the convergence in the bisection method
also kills performance. Here you'll need to improve or change the
algorithm.

--
steve

Louis Krupp

unread,

Sep 12, 2012, 11:34:49 PM9/12/12

to

If no one else has mentioned this:

If you compute log(gamma) outside the loop and store in a variable,
say, lgamma, then -- if I remember basic algebra correctly, which is
not a given at this point -- you might be able to save a *little* time
if you write:

exp(exp(ee1) * lgamma)

instead of

gamma**exp(ee1)

(As Les noted, ee1 probably doesn't have to be an array.)

Louis

Richard Maine

unread,

Sep 13, 2012, 1:23:39 AM9/13/12

to

Steven G. Kargl <s...@REMOVEtroutmask.apl.washington.edu> wrote:

> Finally, as noted in another reply, the convergence in the bisection method
> also kills performance. Here you'll need to improve or change the
> algorithm.

Yeah. I didn't happen to see that other post (yet?), but I was going to
say something along those lines. Twiddling with the "little" stuff, as
all the other replies here are about, can certainly help a bit. I can
well imagine factors of several.

But the OP was looking to take something that ran in 5 days for size
10,000 and run it in an hour for size 100,000. The *ONLY* way that's
going to happen is if he completely reexamines his algorithm long before
getting into issues of programming it.

Bisection is an awfully poor algorithm for most things (I'm afraid I
didn't spend enough time to figure out whether it might be a decent
choice for the particular application, but I rather doubt it). Since
there's a bisection in the innermost loop, fixing that might plausibly
buy a few orders of magnitude. Probably still not enough, but a lot
better than any of the other twiddles (which shoudl probably also be
done).

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain

malik...@gmail.com

unread,

Sep 13, 2012, 3:51:21 AM9/13/12

to

Gents,

I cant thank you enough for the inputs and comments (steven, louisa,Les, Dick, Richard, Louis,dpb). I've been juggling this with other stuff so I can accomodate your inputs as much as possible. based on you inputs, as richard says, the improvement in the processing speed is not enough to reach my target, although i didnt expect it to be quite significant sometimes over a small change. For example, just putting several things outside the loop can drop the total time by 4 secs for a small array..this could be of great significance when applied to much larger input arrays.
Yesterday, when DPB mention about profile I immediately follow his suggestion and I found out that the longest time took place here:

j=0

do
j=j + 1
ome=delint*j

call kspect(n,ome,taup,siga,sigb,Hs,conalpha,alpha,gamma,depth,see,suu)
if ((max(see(i), suu(i)).lt.cutint).and.(ome.gt.omep)) exit
ttt1=half*(suuold + suu(i))*delint

fm0(i)=fm0(i) + ttt1
fm2(i)=fm2(i) + ome*ome*ttt1

suuold=suu(i)
end do

So, after trying to apply most of your inputs and kept not getting to where I want, I finally found a cheat, which is I think acceptable. As everyone else has said, kspect is the problem. Hence, I change the one of the inputs for kspect -->delint
initially, it was 1e-4.. I changed it to 5e-2. it's dropped the processing time by alot obviously but the results are still ok. with this, I think i can run 100,000 number in 5-6hrs approx. which i can still accept.

Please see below, the modified code:

program
implicit none
integer, parameter :: ni = 1
integer, parameter :: nn = 100

integer :: nout ! Number frequency output samples
integer, parameter :: iin = 16 ! input unit
integer, parameter :: iou = 17 ! output unit
integer, parameter :: iou1 = 18 ! output unit 2

integer :: i,j,n,count
real(2), parameter :: pi = 3.141592654358979323846d0

real(2), parameter :: zero = 0.0d0
real(2), parameter :: half = 0.5d0
real(2), parameter :: one = 1.0d0
real(2), parameter :: two = 2.0d0
real(2), parameter :: four = 4.0d0
real(2), parameter :: five = 5.0d0
real(2), parameter :: sixteen = 16.0d0

real(2), parameter :: grav = 9.81

real(2), parameter :: gravsq = 96.2361
real(2), parameter :: twopi = pi*2

real(2) :: siga
real(2) :: sigb
real(2) :: gamma
real(2) :: rho
real(2) :: cdrag
real(2) :: clift
real(2) :: cadd
real(2) :: zcref
real(2) :: vcref
real(2) :: dia
real(2) :: tcoat
real(2) :: tmg
real(2) :: depth
real(2) :: omegamo
real(2) :: delint
real(2) :: cutint

real(2) :: vcur
real(2) :: mumu,stdmu,Pf,sum
real(2) :: muwsub,stdwsub
real(2) :: mutaup,stdtaup
real(2) :: muHs,stdHs
character :: input*20
real(2) :: ome,conalpha
real(2) :: suuold,ttt1
real(2) :: omep,alpha,see(nn),suu(nn),dhyd,Hs(nn)
real(2) :: fh,fv,acc,fi,fd,fl
real(2) :: L,R,us(nn),tu(nn),vel,fm0(nn),fm2(nn),mu(nn),wsub(nn),taup(nn)
REAL TIME1, TIME2

print*,"type the input file with extention"
read (*,*) input
open (unit=iin,file=input)

CALL CPU_TIME(TIME1)

read (iin,*) dia
read (iin,*) tcoat
read (iin,*) tmg
read (iin,*) muwsub,stdwsub
read (iin,*) gamma
read (iin,*) siga
read (iin,*) sigb
read (iin,*) depth
read (iin,*) mutaup,stdtaup
read (iin,*) muHs,stdHs
read (iin,*) omegamo
read (iin,*) nout
read (iin,*) delint
read (iin,*) cutint
read (iin,*) zcref
read (iin,*) vcref
read (iin,*) rho
read (iin,*) cdrag
read (iin,*) cadd
read (iin,*) clift
read (iin,*) mumu,stdmu
read (iin,*) n

call normal(n,mutaup,stdtaup,taup)
call normal(n,muHs,stdHs,Hs)
call normal(n,mumu,stdmu,mu)
call normal(n,muwsub,stdwsub,wsub)

! initialize
conalpha=0.3125d0*(1-0.287*log(gamma))/gravsq
suuold=zero
!

dhyd=dia + two*(tcoat + tmg)

!
vcur=vcref*(((1+0.001/dhyd)*log(1000.*dhyd+1)-1)/log(1000.*zcref+1))

do i=1,n
! omega peak
omep=twopi/taup(i)
! output spectrums
fm0(i)=zero
fm2(i)=zero
j=0

do
j=j + 1
ome=delint*j

call kspect(n,ome,taup,siga,sigb,Hs,conalpha,alpha,gamma,depth,see,suu)
if ((max(see(i), suu(i)).lt.cutint).and.(ome.gt.omep)) exit
ttt1=half*(suuold + suu(i))*delint

fm0(i)=fm0(i) + ttt1
fm2(i)=fm2(i) + ome*ome*ttt1

suuold=suu(i)
end do

!

!
us(i)=two*sqrt(fm0(i))
!
tu(i)=two*pi*sqrt(fm0(i)/fm2(i))
!

vel=us(i)+vcur

!
!
fd=half*rho*cdrag*dhyd*abs(vel)*vel
acc=twopi*us(i)/tu(i)
fi=rho*(one + cadd)*pi*dhyd*dhyd*acc/four
fh=fd + fi

!
fl=half*rho*clift*dhyd*abs(vel)*vel
fv=fl

!
!
R=mu(i)*wsub(i)
L=fh+mu(i)*fv

! monte carlo
if(R-L.le.0)then

count=1
else
count=0
end if
sum=sum+count
end do
Pf=sum/n

CALL CPU_TIME(TIME2)

open (unit=iou,file="output1.txt",action="write",status="replace")
write (iou,*) 'M0 : ',fm0(1)
write (iou,*) 'M2 : ',fm2(1)
write (iou,*) 'Tu : ',tu(1)
write (iou,*) 'U : ',us(1)
write (iou,*) 'V : ',vcur
write (iou,*) 'alpha : ',alpha
write (iou,*) 'Probability ',Pf
write (iou,*) 'Processing time was ', TIME2-TIME1, ' seconds'
close (iou)

print*,'Processing time was ', TIME2-TIME1, ' seconds'

end program

subroutine kspect(n,ome,taup,siga,sigb,Hs,conalpha,alpha,gamma,depth,see,suu)

integer, parameter :: iou = 19 ! output unit

real(2) :: ome
real(2) :: fk
real(2) :: gtran
real(2), parameter :: pi = 3.141592654358979323846d0

real(2), parameter :: zero = 0.0d0
real(2), parameter :: half = 0.5d0
real(2), parameter :: one = 1.0d0
real(2), parameter :: two = 2.0d0
real(2), parameter :: four = 4.0d0
real(2), parameter :: five = 5.0d0
real(2), parameter :: sixteen = 16.0d0

real(2), parameter :: grav = 9.81

real(2), parameter :: gravsq = 96.2361
real(2), parameter :: twopi = pi*2

real(2) :: siga
real(2) :: sigb
real(2) :: depth
real(2) :: gamma

real(2) :: fkbelow,fkabove,fkmid,ttt,sigw,conalpha
integer:: i,j,n
real(2) :: omep,alpha,ee1,ee2,see(n),suu(n),taup(n),Hs(n)

! initialize
sigw=siga

! fk number, use bisection method
fk=zero
if (ome.gt.1.0d-10) then
fkbelow=max(ome*ome/grav, ome/sqrt(grav*depth))
fkabove=two*fkbelow
fkmid=fkbelow
j=1
do
ttt=fkmid - ome*ome/grav/tanh(fkmid*depth)
if (abs(ttt).lt.1.0d-10) exit
if (ttt.gt.zero) then
fkabove=fkmid
else
fkbelow=fkmid
end if
fkmid=half*(fkbelow + fkabove)
j=j + 1
if (j.gt.1000) then

open (unit=iou,file="error.txt",action="write",status="replace")

write(iou,*)'ERROR in dload routine'
write(iou,*)'internal error, failed to converge on wave'

write(iou,*)'number in 1000 iterations'
close (iou)

end if
end do
fk=fkmid
end if

!
ttt=fk*depth
if (ttt.lt.1.0d-6) then
gtran=sqrt(grav/depth)
else if (ttt.lt.150.0d0) then
gtran=ome/sinh(ttt)
else
gtran=zero
end if

do i=1,n
!
omep=twopi/taup(i)
alpha=conalpha*Hs(i)**2*omep**(4)
if (ome.gt.omep) sigw=sigb
if (ome.gt.1d-10) then
ee1=-half*(((ome - omep)/(sigw*omep))**2)
ee2=-five/four*(ome/omep)**(-4)
see(i)=alpha*gravsq*ome**(-5)*exp(ee2)*gamma**exp(ee1)
else
see(i)=zero

end if

!
suu(i)=gtran*gtran*see(i)
end do
return

end subroutine

subroutine normal(n,mu,std,dinvnorm)

!symbols parameters
REAL(KIND=2)::RANDOM@,Prob(n)
real*8:: dinvnorm(n)

real*8:: p_low,p_high
real*8:: a1,a2,a3,a4,a5,a6
real*8:: b1,b2,b3,b4,b5
real*8:: c1,c2,c3,c4,c5,c6
real*8:: d1,d2,d3,d4
real*8:: z,q,r
real*8:: mu,std
integer::n,i
!constant parameters

**********************************************************
input data:

0.4064 !od
0.0213 !tcoat
0.006 !mg
905.4,0 !wsub(mean,std)

3.27 ! gamma
0.068 ! Signma A
0.087 ! Signma B
119 ! depth
7.89,2.19 ! Taup(mean,std)
8.43,2.34 ! Hs (mean,std)
5 !

1000 !
5e-2 ! delint

1e-10 ! cutint
1 ! zcref
1.052 ! vcref

1025 ! density
1.4 ! cd
2.29 ! cm
1.0 ! cl
0.64,0.2 ! mu(mean,std)
100 ! number of samples

malik...@gmail.com

unread,

Sep 13, 2012, 4:00:06 AM9/13/12

to

The other thing that I think can greatly increase the speed is by arranging this:

do i=1,n
! omega peak
omep=twopi/taup(i)
! output spectrums
fm0(i)=zero
fm2(i)=zero
j=0

do
j=j + 1
ome=delint*j

call kspect(n,ome,taup,siga,sigb,Hs,conalpha,alpha,gamma,depth,see,suu)
if ((max(see(i), suu(i)).lt.cutint).and.(ome.gt.omep)) exit
ttt1=half*(suuold + suu(i))*delint

fm0(i)=fm0(i) + ttt1
fm2(i)=fm2(i) + ome*ome*ttt1

suuold=suu(i)
end do

to this:

j=0

do i=1,n
! omega peak
omep=twopi/taup(i)
! output spectrums
fm0(i)=zero
fm2(i)=zero

do
j=j + 1
ome=delint*j

call kspect(n,ome,taup,siga,sigb,Hs,conalpha,alpha,gamma,depth,see,suu)
if ((max(see(i), suu(i)).lt.cutint).and.(ome.gt.omep)) exit
ttt1=half*(suuold + suu(i))*delint

fm0(i)=fm0(i) + ttt1
fm2(i)=fm2(i) + ome*ome*ttt1

suuold=suu(i)
end do

Note that the only thing I do is just relocating 'j=0' otuside the DO loop. this will help greatly. But i will gives wrong result. This is what I dont understand. j=0 is independednt of the loop anyway, why would change the result? can somebody explain? thank youu

Les Neilson

unread,

Sep 13, 2012, 6:47:37 AM9/13/12

to

<malik...@gmail.com> wrote in message
news:4630786f-4453-4c0d...@googlegroups.com...
<snip>

>Note that the only thing I do is just relocating 'j=0' otuside the DO loop.
>this will help greatly. But i will gives wrong result. This is what I dont
>understand. j=0 is >independednt of the loop anyway, why would change the
>result? can somebody explain? thank youu

When i = 1 then j will be zero
When i = 2 or 3 or ... then j will be whatever value it had at the end of
the other (inner) do loop

Les

Paul Anton Letnes

unread,

Sep 13, 2012, 8:04:00 AM9/13/12

to

Also, to nitpick on something non-performance related,

>> real(2), parameter :: zero = 0.0d0

should be written
>> real(dp), parameter :: zero = 0.0d0
with a paramenter
integer, parameter :: dp = kind(1.0d0)
somerwhere else. Since people are suggesting switching compilers, it
could be important indeed, as the "2" is non-portable, whereas the
kind(1.0d0) is.

Paul

dpb

unread,

Sep 13, 2012, 9:16:48 AM9/13/12

to

...

Yesterday, when DPB mention about profile I immediately follow his
suggestion and I found out that the longest time took place here:

j=0

do
j=j + 1
ome=delint*j
call kspect(n,ome,taup,siga,sigb,Hs,

conalpha,alpha,gamma,depth,see,suu)
if ((max(see(i), suu(i)).lt.cutint).and.(ome.gt.omep)) exit
ttt1=half*(suuold + suu(i))*delint

fm0(i)=fm0(i) + ttt1
fm2(i)=fm2(i) + ome*ome*ttt1

suuold=suu(i)
end do

So, after trying to apply most of your inputs and kept not getting to
where I want, I finally found a cheat, which is I think acceptable. As
everyone else has said, kspect is the problem. Hence, I change the one
of the inputs for kspect -->delint
initially, it was 1e-4.. I changed it to 5e-2. it's dropped the
processing time by alot obviously but the results are still ok. with
this, I think i can run 100,000 number in 5-6hrs approx. which i can
still accept.

...

subroutine kspect(n,ome,taup,siga,sigb,Hs,
conalpha,alpha,gamma,depth,see,suu)

...

! fk number, use bisection method
fk=zero
if (ome.gt.1.0d-10) then
fkbelow=max(ome*ome/grav, ome/sqrt(grav*depth))
fkabove=two*fkbelow
fkmid=fkbelow
j=1
do
ttt=fkmid - ome*ome/grav/tanh(fkmid*depth)
if (abs(ttt).lt.1.0d-10) exit
if (ttt.gt.zero) then
fkabove=fkmid
else
fkbelow=fkmid
end if
fkmid=half*(fkbelow + fkabove)
j=j + 1
if (j.gt.1000) then

...

write(iou,*)'internal error, failed to converge on wave'
write(iou,*)'number in 1000 iterations'

...

end if
end do
fk=fkmid
end if

...

I'd think if you were going to modify a convergence criterion it might
be more fruitful to look at the above 1E-10 and instead use a tolerance
on the value -- I have no idea what the numerical values are; perhaps it
is pretty loose but it might be quite tight...

Also, as others have noted, bisection generally isn't know as being
particularly quick; how good/bad will depend greatly on how "wiggly" the
function is and how good the initial guesses are.

Again, to gain much ground, this is probably the place to look...

If you're not up to writing a different solver, there are many around
that you can download...if you're not aware of it Netlib and the GAMS
taxonomy is worth knowing about if nothing else...

<http://www.netlib.org/misc/gams.html>
<http://netlib3.cs.utk.edu/cgi-bin/search.pl?query=gams/F1b*>
<http://netlib3.cs.utk.edu/go/zeroin.f>

--

Louisa

unread,

Sep 13, 2012, 10:44:29 AM9/13/12

to

On Sep 13, 5:51 pm, malik.as...@gmail.com wrote:

All these constants are still single precision:--

> !constant parameters
> a1=-39.6968302866538
> a2=220.946098424521
> a3=-275.928510446969
> a4=138.357751867269
> a5=-30.6647980661472
> a6=2.50662827745924
> b1=-54.4760987982241
> b2=161.585836858041
> b3=-155.698979859887
> b4=66.8013118877197
> b5=-13.2806815528857
> c1=-0.00778489400243029
> c2=-0.322396458041136
> c3=-2.40075827716184
> c4=-2.54973253934373
> c5=4.37466414146497
> c6=2.93816398269878
> d1=0.00778469570904146
> d2=0.32246712907004
> d3=2.445134137143
> d4=3.75440866190742
> p_low=0.02425

They are not, repeat not, double precision.

dpb

unread,

Sep 13, 2012, 11:31:31 AM9/13/12

to

On 9/13/2012 9:44 AM, Louisa wrote:
> On Sep 13, 5:51 pm, malik.as...@gmail.com wrote:
>
> All these constants are still single precision:--
>
>> !constant parameters
>> a1=-39.6968302866538
>> a2=220.946098424521

...

>> d4=3.75440866190742
>> p_low=0.02425
>
> They are not, repeat not, double precision.

Which reminds me of another possible algorithmic optimization (albeit of
lesser importance than the ksect(sp?) subroutine bottleneck) so I'll
stick it here...

Since this is called so many times, generally there are
acceptance/rejection methods for generating pseudorandom normal deviates
that are generally quicker than the inverse cdf form.

There is one essentially trivial optimization possible in this routine
that may help a little--harvesting either all or a large fraction at a
time of the initial uniform rn's by replacing the RANDOM() function call
as was done in the opening program.

Again, these are after dealing w/ the main problem first, of course,
altho the RANDOM() replacement is easy enough that it might as well be
done given the other mod's made...

--

glen herrmannsfeldt

unread,

Sep 13, 2012, 11:40:19 AM9/13/12

to

Louisa <louisa...@gmail.com> wrote:
> On Sep 13, 5:51�pm, malik.as...@gmail.com wrote:

> All these constants are still single precision:--

>> !constant parameters
>> a1=-39.6968302866538
>> a2=220.946098424521
>> a3=-275.928510446969
>> a4=138.357751867269
>> a5=-30.6647980661472
>> a6=2.50662827745924
>> b1=-54.4760987982241
>> b2=161.585836858041

(snip)

That is certainly true, but most likely doesn't have much
effect on any optimization.

It would be intersting to know where the numbers came from
with that many digits.

-- glen

dpb

unread,

Sep 13, 2012, 12:07:37 PM9/13/12

to

On 9/13/2012 10:40 AM, glen herrmannsfeldt wrote:
...

> It would be intersting to know where the numbers came from
> with that many digits.

While I didn't bother to look it up for certain, Abramowitz and Stegun
is a good bet... :)

--

dpb

unread,

Sep 13, 2012, 12:36:32 PM9/13/12

to

On 9/13/2012 8:16 AM, dpb wrote:
> ...
> Yesterday, when DPB mention about profile I immediately follow his
> suggestion and I found out that the longest time took place here:
>
> j=0
> do
> j=j + 1
> ome=delint*j
> call kspect(n,ome,taup,siga,sigb,Hs,
> conalpha,alpha,gamma,depth,see,suu)

...

OK, one last set of thoughts...

If you're interested in tail probabilities as I presume must be owing to
wanting high numbers of trials, the next area to explore would likely be
stratified sampling.

Another "trick" in MC simulation w/ high-cost evaluation models can be
to replace the complicated model w/ a less expensive approximation. I
have no idea what the response is your model is calculating nor what it
looks like or even how many independent variables there are but if the
response is reasonably smooth it might be feasible to replace the direct
calculation being done now with a response surface or other
approximating model. How suitable this is depends on how well-behaved
the response under study is, of course, but it has been used w/ success
in many areas.

These are, of course, much larger kinds of modifications than simply
changing the existing formulation a little. But, otoh, if the need is
there or maybe you can extend an existing art in the particular area of
application and become famous.... :)

--

Steven G. Kargl

unread,

Sep 13, 2012, 1:17:24 PM9/13/12

to

On Thu, 13 Sep 2012 15:40:19 +0000, glen herrmannsfeldt wrote:

> Louisa <louisa...@gmail.com> wrote:
>> On Sep 13, 5:51 pm, malik.as...@gmail.com wrote:
>
>> All these constants are still single precision:--
>
>>> !constant parameters
>>> a1=-39.6968302866538
>>> a2=220.946098424521
>>> a3=-275.928510446969
>>> a4=138.357751867269
>>> a5=-30.6647980661472
>>> a6=2.50662827745924
>>> b1=-54.4760987982241
>>> b2=161.585836858041
>
> (snip)
>
> That is certainly true, but most likely doesn't have much
> effect on any optimization.

No change in execution time. Changes in computed output
values start at about the 8 or 9th decimal digit.

--
steve

glen herrmannsfeldt

unread,

Sep 13, 2012, 1:29:05 PM9/13/12

to

Oh, yes, my always favorite source for Gaussian Quadrature constants.

A fair number of algorithms need to do computations in double
precision to generate single precision results. (Or worse).

Pretty much anything involving numerical solutions to differential
equations needs to subtract nearly equal values, and so needs
the extra precision.

In most cases, then, you don't need all the digits.

Still, might as well use them.

-- glen

dpb

unread,

Sep 13, 2012, 9:24:34 PM9/13/12

to

On 9/13/2012 12:29 PM, glen herrmannsfeldt wrote:
> dpb<no...@non.net> wrote:
>> On 9/13/2012 10:40 AM, glen herrmannsfeldt wrote:
>> ...
>
>>> It would be intersting to know where the numbers came from
>>> with that many digits.
>
>> While I didn't bother to look it up for certain, Abramowitz and Stegun
>> is a good bet... :)
>
> Oh, yes, my always favorite source for Gaussian Quadrature constants.
>
> A fair number of algorithms need to do computations in double
> precision to generate single precision results. (Or worse).
>

...

> In most cases, then, you don't need all the digits.
>
> Still, might as well use them.

Well, you _would_ add that... :)

OK, it's not from A&S directly altho it's possible it's the Muller
expansion referenced in the back of the section. A quick perusal didn't
find the Muller paper so I quit at that point... :)

Either way, I'd guess one of the more modern acceptance/rejection
methods would outperform it...dontcha just _hate_ it when there's
nothing at all that gives the source of stuff in the code... :( (Of
course, our poster may have trimmed all the "fluff" out to try to keep
posting length to minimum).

--

e p chandler

unread,

Sep 14, 2012, 12:55:58 AM9/14/12

to

On Tuesday, September 11, 2012 10:45:01 PM UTC-4, malik...@gmail.com wrote:
> Below, is the whole code and underneath it I've posted also the input file.

[code snipped]

Examine what subroutine KSPECT does. It only modifies the argument arrays ALPHA, SEE and SUU. ALPHA is not used outside the routine. In the process, it computes FK, which is a function only of OME and DEPTH. Both of these are arguments to the routine but they are used without change. Yet, FK is re-computed N times. GRAV is a constant. Also note that ALL of SEE and SUU are computed on each call to KSPECT.

Let's see how KSPECT is used:

>do i=1,n
> ...

> do j=1,nout
> ome=omegamo/nout*j
> call kspect(n,ome,taup,siga,sigb,Hs,alpha,gamma,depth,see,suu)
> end do

So each of N times you do NOTHING with the results of KSPECT here.

> j=0
> ...
> do
> j= j + 1

> ome=delint*j
> call kspect(n,ome,taup,siga,sigb,Hs,alpha,gamma,depth,see,suu)
> if ((max(see(i), suu(i)).lt.cutint).and.(ome.gt.omep(i))) exit
> ttt1=half*(suuold + suu(i))*delint
> ttt2=half*(seeold + see(i))*delint
> fm0(i)=fm0(i) + ttt1
> fm2(i)=fm2(i) + ome*ome*ttt1
> scumm=scumm + ttt2
> suuold=suu(i)
> seeold=see(i)
> end do

Here you compute ALL of SEE and SUU but use only the Ith entry.

And after all of this prodigious effort, here is what comes out!

> ...

> if(R(i)-L(i).le.0)then
> count=1
> else
> count=0
> end if
> sum=sum+count
>end do

>Pf=sum/n

COMMENT:

After having struggled with this program, I appreciate the features of modern Fortran more and more. If the OP had used INTENT in his subroutines or had documented which arguments were pure inputs or which were modified by the routine, I think some of his possible optimizations would have been much clearer just by reading the program listing. Also, I think that the OP needs to pay more attention to the notion of SCOPE of variables and what LOCAL variables are.

Excuse my posting here, but most of this thread is no longer available in my newsreader.

--- e

dpb

unread,

Sep 14, 2012, 11:28:37 AM9/14/12

to

On 9/13/2012 11:55 PM, e p chandler wrote:
> On Tuesday, September 11, 2012 10:45:01 PM UTC-4, malik...@gmail.com
> wrote:
>> Below, is the whole code and underneath it I've posted also the
>> input file.
>
> [code snipped]
>

...

...[erudite discussion snipped for brevity]...

> And after all of this prodigious effort, here is what comes out!
>
>> ...
if(R(i)-L(i).le.0)then
count=1
else
count=0
end if
>> sum=sum+count

or, of course,

if(R(i)<=L(i)) sum=sum+1

....

You've hit on where I was headed but hadn't had the time to actually get
there yet...I wondered if the code was/is computing anything at all
close to whatever it is the OP thinks it is...my tentative conclusion is
"I'm not sure...and I doubt he is" :(

What makes it all even worse is look at how R and L above are computed--

R(i)=mu(i)*wsub(i)

and mu and wsub are simply the standard deviates returned from the
subroutine normal and are never modified. So, they are computed from
the original calls at the head of the program--

call normal(n,mumu,stdmu,mu) ! mu, wsub returned
call normal(n,muwsub,stdwsub,wsub)

Now L has a little more in it--

L(i)=fh(i)+mu(i)*fv(i)

but
fi(i)=rho*(1.0D0 + cadd)*pi*dhyd*dhyd*acc(i)/4.0d0
fh(i)=fd(i) + fi(i)
fl(i)=0.5D0*rho*clift*dhyd*abs(vel(i))*vel(i)
fv(i)=fl(i)

working backwards, we finally find that vel is dependent on suu which is
the (apparent) end objective of subroutine kspect but it is the only
one--everything else (I think unless I've missed it which is possible
'cuz I sorta' finished this here on the fly) is either constant or just
a pre-computable scaled sample from a normal distribution.

I've not had time to delve more deeply into kspect itself yet but
certainly a very large fraction of what's being done here is, as you
point out, totally superfluous.

I agree on the lack of use of features in the language that could help
but a little bit of just working through the problem from basics before
beginning coding would also seemed to have gone a long ways... :)

--

e p chandler

unread,

Sep 14, 2012, 7:06:57 PM9/14/12

to

"dpb" wrote in message news:k2vif4$jdk$1...@speranza.aioe.org...

> I wondered if the code was/is computing anything at all close to whatever
> it is the OP thinks it is

[snip]

> I've not had time to delve more deeply into kspect itself yet but
> certainly a very large fraction of what's being done here is, as you point
> out, totally superfluous.

The OP has posted a revised program that removes some of the junk but KSPECT
still loops on I to compute all of SEE and SUU when only the Ith entry is
needed. At least
FK is only computed once.

One quick fix would be for the OP to pass I as an argument to KSPECT and
eliminate the DO loop on I.
Another might be to get rid of KSPECT entirely and write its code within the
main program.

Also, for the sake of future readers of his code, I would recast subroutine
NORMAL in modern Fortran along the lines of

CALL RANDOM_NUMBER(P)
IF (P < lower_tail) then
calculate Z one way
ELSE IF (P > upper_tail) then
calculate Z a second way
ELSE
calculate Z a third way
END IF
transform Z

and test it thoroughly.

---- e

dpb

unread,

Sep 15, 2012, 9:06:40 AM9/15/12

to

On 9/14/2012 6:06 PM, e p chandler wrote:
...

> Also, for the sake of future readers of his code, I would recast
> subroutine NORMAL in modern Fortran along the lines of
>
> CALL RANDOM_NUMBER(P)
> IF (P < lower_tail) then
> calculate Z one way
> ELSE IF (P > upper_tail) then
> calculate Z a second way
> ELSE
> calculate Z a third way
> END IF
> transform Z
>
> and test it thoroughly.

...

Indeed.

I'd probably download a newer acceptance/rejection algorithm from NETLIB
or NIST or one of the other sources... :) Along w/ a modern uniform
generator--I'm always nervous not knowing what a vendor has implemented
and for very high numbers of samples things like serial correlation can
really skew estimates of tail probabilities...

It'll be the first place reviewers will probe if OP ever submits results
to a journal for publication ime...

--

e p chandler

unread,

Sep 19, 2012, 8:46:38 PM9/19/12

to

On Friday, September 14, 2012 11:28:41 AM UTC-4, dpb wrote:
> On 9/13/2012 11:55 PM, e p chandler wrote:
> > On Tuesday, September 11, 2012 10:45:01 PM UTC-4, malik...@gmail.com
> > wrote:
> >> Below, is the whole code and underneath it I've posted also the
> >> input file.
> > [code snipped]
> ...[erudite discussion snipped for brevity]...

> > [discussion of optimization by removing an un-needed inner loop]

> You've hit on where I was headed but hadn't had the time to actually get
> there yet...I wondered if the code was/is computing anything at all
> close to whatever it is the OP thinks it is...my tentative conclusion is
> "I'm not sure...and I doubt he is" :(
> What makes it all even worse is look at how R and L above are computed--
> R(i)=mu(i)*wsub(i)
> and mu and wsub are simply the standard deviates returned from the

Based on the input data, wsub is a constant since its S.D. is zero.

> subroutine normal and are never modified. So, they are computed from
> the original calls at the head of the program--

> Now L has a little more in it--
> L(i)=fh(i)+mu(i)*fv(i)
> but
> fi(i)=rho*(1.0D0 + cadd)*pi*dhyd*dhyd*acc(i)/4.0d0
> fh(i)=fd(i) + fi(i)
> fl(i)=0.5D0*rho*clift*dhyd*abs(vel(i))*vel(i)
> fv(i)=fl(i)
> working backwards, we finally find that vel is dependent on suu which is
> the (apparent) end objective of subroutine kspect but it is the only
> one--everything else (I think unless I've missed it which is possible
> 'cuz I sorta' finished this here on the fly) is either constant or just
> a pre-computable scaled sample from a normal distribution.
>
> I've not had time to delve more deeply into kspect itself yet but
> certainly a very large fraction of what's being done here is, as you
> point out, totally superfluous.

One part of the program is numerical integration which essentially
sums SUU(i) and omega**2 * SUU(i). But this is done in fixed increments of 0.05 instead of what I think is the more natural (omega/omega_peak). Maybe more of the problem would collapse to constants or known values in these terms.

In addition I noticed an additional bug. The test for convergence in bisection is not good. When it fails, the program infinite loops instead of executing a STOP at the first convergence error.

--- e

Robert Miles

unread,

Oct 25, 2012, 11:10:57 PM10/25/12

to

On 9/13/2012 12:29 PM, glen herrmannsfeldt wrote:

> dpb <no...@non.net> wrote:
>> On 9/13/2012 10:40 AM, glen herrmannsfeldt wrote:
>> ...
>
>>> It would be intersting to know where the numbers came from
>>> with that many digits.
>
>> While I didn't bother to look it up for certain, Abramowitz and Stegun
>> is a good bet... :)
>
> Oh, yes, my always favorite source for Gaussian Quadrature constants.
>
> A fair number of algorithms need to do computations in double
> precision to generate single precision results. (Or worse).

I've even worked with a mostly Fortran program that needed quadruple
precision to generate single precision results. Octuple precision
was added later, but I never saw anything clearly indicating that it
was needed.

The program originally had assembler functions to handle the
double precision operations before the hardware supported double
precision; the higher precisions were added by changing those
functions later.