> "gah4" wrote in message
> news:0ab6410a-af0e-4f5d...@googlegroups.com...
> On Wednesday, December 15, 2021 at 11:06:07 PM UTC-8,
bjbr...@gmail.com
> wrote:
> (snip, I wrote)
> Just to be sure... I was not concerned about the overhead of the function
> call.
> > Once again the structure of that function Foo, here written as a scalar
> > elemental function. This is the three-term recurrence for
> > Laguerre polynomial r = L[n](x).
> Oh, that one. Many years ago (Fortran 66 days) I was working on a program
> written by someone else that way too slow. It computed a bunch of
> integrals with Gaussian quadrature, the first time I knew about that.
> Now, Gaussian quadrature is pretty fast, but it was doing a lot of them.
> As well as remember now, there was a subroutine that would compute
> two of them, then divide, with the second one being a normalization.
> In any case, for reasons similar to what you say, it was recomputing
> the same thing many times. I think to fix it, I separately computed the
> integrals, moved some things out of a loop, and so eliminated much
> duplication.
Here you know that the O.P. is not doing Gauss-Laguerre quadrature because
his function returns only L[n](x) and not also L[n-1](x). I'm not sure what
application needs so many values of Laguerre polynomials: one I can think
of is hydrogenlike atoms, but they need generalized Laguerre polynomials
and there is no extra parameter in his code either.
> > Suppose now that this function is called with scalar argument (n) and
> > array argument (x). Then I would think that we really want a loop within
> > each branch of the if-then-else construct and maybe within the (do k =
> > 2, n)
> > as well; we don't want one single loop over the entire code, with the
> > test
> > on (n) inside the loop. And if Foo is declared elemental and there is
> > not
> > a special version for the case of scalar (n) and array (x), then I
> > expect the
> > compiler to do just that, place the loop outside the if-then-else.
> > However, to add to my confusion I just did a timing test.
> > One instance where Foo is just an elemental function, no further
> > overloading, and another version where the elemental function is
> > overloaded with a function for arguments scalar (n) and array (x).
> > Makes no difference in the timing when using gfortran;
> > I tried it with -O0 and with -O3.
> I think that sounds right. The statements outside the inner loop
> are fairly fast, and no so many, so doing them mutliple times
> make a small difference. Branch prediction will mean almost
> no delay to doing the if, that you might have expected.
> If there was a large part of the calculation depending on n,
> but not on x, then the difference would be larger.
> I think your idea is good, but doesn't apply to this example.
Well, I looked at some of my code:
subroutine laguerre(n,alpha,x,y,y1,nsturm)
integer, intent(in) :: n
real(wp), intent(in) :: alpha
real(wp), intent(in) :: x
real(wp), intent(out) :: y
real(wp), intent(out) :: y1
integer, optional :: nsturm
integer k
real(wp) y2
y = 1
y1 = 0
if(present(nsturm)) nsturm = 0
do k = 0, n-1
y2 = y1
y1 = y
y = ((2*k+alpha+1-x)*y1-k*y2)/(k+alpha+1)
if(present(nsturm)) nsturm = merge(nsturm+1,nsturm, y/=0 .AND. y*y1 <=
0)
end do
end subroutine laguerre
On comparison I see that there are early-out tests in the O.P.'s code which
are unnecessary. Of course for his usage there is no need for the Sturm
sequence value nor the parameter alpha which makes it generalized rather
than simple Laguerre. Also he doesn't have to return L[n-1](x) because, as
pointed out earlier, he can't be doing Gaussian quadrature.
But there is a big potential advantage in the array call vs. the elemental
call: an x86 processor can perform 4 wide multiply-accumulates on double
precision data via two pipelines. Have you checked whether gfortran produces
4-wide AVX code for the array version? Those divisions take a long time. It
might be useful to create an array of reciprocals on function entry and then
reuse it in the main loop.
But in any case, don't try to guess what the compiler is doing here but look
at the assembly code it generates.