Count function using defined array values

nano3174

Apr 23, 2023, 3:01:20 PM

This program was originally a test to see whether I could count numbers in a specific range. Now I am writing a program that does the same thing for 4 different columns separately (20 bins for each of the columns b, c, d, e). For column b, I want 20 bins covering the complete range from -3 to 3. For example, bin 1 for column b counts how many numbers in that column fall in the range (-3, -2.7), bin 2 counts the range (-2.7, -2.4), and so on. I want the bin width and the bin ranges to be the same for every column, so that each column is counted over the same ranges. Originally I had a horribly inefficient program (which worked), but in the end it had 80 if statements (20 for each column). I wanted to see if I could use arrays and the count function to reduce it to just 4 lines. I have seen suggestions in the comments to use statements that look like:

binb(i)=binb(i)+count(b>=lower(i) .and. b<upper(i))
binc(i)=binc(i)+count(c>=lower(i) .and. c<upper(i))
bind(i)=bind(i)+count(d>=lower(i) .and. d<upper(i))
bine(i)=bine(i)+count(e>=lower(i) .and. e<upper(I))

but lower and upper must be arrays instead of scalars... Here is my program thus far:

program mean_analysis
  implicit none

  integer i, j, k, N, l
  double precision a, b, c, d, e
  integer binb(1:20),binc(1:20),bind(1:20),bine(1:20)
  real lower(1:20),upper(1:20)

  character(100) event

  upper(1)=-2.7
  lower(1)=-3

  open(unit = 7, file="zpc_initial_momenta.dat")
  do l=2,20
    lower(l)=lower(l-1)+.3
    upper(l)=upper(l-1)+.3
  end do

  do k=1, 10
    read(7,'(A)') event
    do j=1,4000
      read(7,*) a, b, c, d, e
      do i=1,20
        binb(i)=binb(i)+count(b>=lower(i:) .and. b<upper(:i))
        binc(i)=binc(i)+count(c>=lower(i:) .and. c<upper(:i))
        bind(i)=bind(i)+count(d>=lower(i:) .and. d<upper(:i))
        bine(i)=bine(i)+count(e>=lower(i:) .and. e<upper(:i))

      end do
    end do
  end do

  close(7)

  open(unit = 8, file="outputanalysis.dat")
  write(8,*) 'The bins in each column are as follows:'
  write(8,*) 'FIRST COLUMN (MOMENTUM IN X DIRECTION)'
  write(8,*) binb

  close(8)

end program

I have tried to remedy the lower/upper scalar issue by implementing an idea someone had on another post of mine, replacing lower with lower(i:) and upper with upper(:i), but it does not use the correct i-th values of the upper and lower arrays that I defined earlier with a do loop. Any suggestions or help would be greatly appreciated. Thank you!

gah4

Apr 23, 2023, 4:24:54 PM

On Sunday, April 23, 2023 at 12:01:20 PM UTC-7, nano3174 wrote:
> This program originally was a test program to see if I could count numbers
> in a specific range. Now I am making a program to do the same thing,
> but for 4 different columns separately (20 bins for each column b,c,d,e.)
> So, for column b I want 20 bins with the complete bin range going from (-3 to 3).
> For example, bin1 for column b will count how many numbers in that column
> are in the range (-3 to -2.7), bin2 would count for the range (-2.7, -2.4), etc..
> I want the bin width and each bin range to be the same for each column so
> that each column is counted for the same ranges.

> Originally, I had a horribly inefficient program (which worked) but in the end
> I had 80 if statements (20 for each column.) I wanted to see if I could use
> arrays and the count function to reduce it to just 4 lines.
> I have seen suggestions in the comments to have an array which looks like:

It should work with scalars instead of array sections for upper and lower, that is, with lower(i) and upper(i) as in the suggested lines.
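For illustration, here is a minimal sketch of the count form with scalar bounds lower(i) and upper(i). Note that the count intrinsic needs a logical array as its mask, so this sketch assumes the whole column has first been read into an array; the array b and its six sample values are hypothetical and not taken from the original data file.

```fortran
program count_bins_demo
  implicit none
  integer, parameter :: nbins = 20
  double precision :: b(6)                       ! hypothetical sample column
  double precision :: lower(nbins), upper(nbins)
  integer :: binb(nbins), i

  b = [-2.95d0, -2.8d0, -2.5d0, 0.1d0, 0.2d0, 2.9d0]

  ! Bin i covers [lower(i), upper(i)), width 0.3, spanning -3 to 3.
  do i = 1, nbins
    lower(i) = -3.0d0 + 0.3d0*(i-1)
    upper(i) = -3.0d0 + 0.3d0*i
  end do

  binb = 0                                       ! counters must start at zero
  do i = 1, nbins
    ! The mask is an array because b is an array; the bounds are scalars.
    binb(i) = binb(i) + count(b >= lower(i) .and. b < upper(i))
  end do

  print '(20i3)', binb
end program count_bins_demo
```

With these sample values, bins 1 and 11 each receive two entries and bins 2 and 20 one entry each. If the values are read one at a time into a scalar, as in the original program, count does not apply and a plain if test per bin would be needed instead.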

Maybe it isn't a problem in your case, but some worry about cumulative
rounding, adding the binary representation of 0.3 multiple times.
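One way to look at the rounding concern is to compute each edge directly from its index and compare it with the accumulated value. The sketch below does that in default real, as in the original program; the names acc and direct are made up for this example, and whether the two columns differ at all depends on the compiler and platform.

```fortran
program edge_rounding_demo
  implicit none
  integer, parameter :: nbins = 20
  real :: acc(nbins), direct(nbins)
  integer :: i

  ! Lower edges by repeated addition, as in the original program.
  acc(1) = -3.0
  do i = 2, nbins
    acc(i) = acc(i-1) + 0.3
  end do

  ! Lower edges computed from the index, so errors do not accumulate.
  do i = 1, nbins
    direct(i) = -3.0 + 6.0*real(i-1)/real(nbins)
  end do

  ! Columns: index, accumulated edge, direct edge, difference.
  do i = 1, nbins
    print '(i3, 2f12.7, es12.3)', i, acc(i), direct(i), acc(i) - direct(i)
  end do
end program edge_rounding_demo
```

Any drift shows up in the last column. For 20 bins it is likely to be tiny, but a value sitting very close to an edge could still land in a neighbouring bin.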

Otherwise, I might have done:

! bin is an integer index; b is the scalar value just read
bin = 20.0*(b-(-3.0))/(3.0-(-3.0))+1
if(bin>=1 .and. bin <=20) binb(bin) = binb(bin) +1

The above statements are executed once per value instead of 20 times.
Each one might take longer than a single comparison, but it should
still be faster than 20 of them. (You need that for each of
b, c, d, e, though it might also be done in a loop.)
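As a self-contained sketch of the direct index computation, the program below bins a handful of values with one arithmetic step each. The sample values and the names xmin, xmax, and ibin are assumptions for this example; it uses floor rather than integer truncation, which is a small variation on the two lines above so that values just below the range map to index 0 explicitly.

```fortran
program direct_bin_demo
  implicit none
  integer, parameter :: nbins = 20
  double precision, parameter :: xmin = -3.0d0, xmax = 3.0d0
  double precision :: samples(5)                 ! hypothetical values
  integer :: binb(nbins), ibin, j

  samples = [-3.5d0, -2.95d0, 0.1d0, 2.9d0, 3.2d0]
  binb = 0

  do j = 1, size(samples)
    ! Map [xmin, xmax) onto indices 1..nbins in a single step.
    ibin = floor(nbins*(samples(j) - xmin)/(xmax - xmin)) + 1
    ! Values outside the range give an index outside 1..nbins and are skipped.
    if (ibin >= 1 .and. ibin <= nbins) binb(ibin) = binb(ibin) + 1
  end do

  print '(20i3)', binb
end program direct_bin_demo
```

Here -3.5 and 3.2 fall outside the range and are ignored, while the other three values land in bins 1, 11, and 20.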

At the beginning you mentioned efficiency, but I believe that your
program does just as much comparing and assigning as the
version with 80 if statements.

In fact, loop unrolling is an optimization that some compilers
(and people) do, replacing loops with separate statements.

Edmondo Giovannozzi

Apr 28, 2023, 7:30:14 AM

First of all I would define a subroutine that will update the bin count, so in the loop you will just call this subroutine.

do j=1,4000
  read(7,*) a, b, c, d, e

  call update_bin_count(a, bina, lower, upper)
  call update_bin_count(b, binb, lower, upper)
  call update_bin_count(c, binc, lower, upper)
  call update_bin_count(d, bind, lower, upper)
  call update_bin_count(e, bine, lower, upper)
end do

As suggested before, I would use:

bin = 20.0*(b-(-3.0))/(3.0-(-3.0))+1
if(bin>=1 .and. bin <=20) binb(bin) = binb(bin) +1

inside update_bin_count if the bins are regular; otherwise a simple loop will be fine:

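subroutine update_bin_count(v, bin, lower, upper)
  double precision, intent(in) :: v          ! value to be binned
  integer, intent(inout) :: bin(:)           ! bin counters (integer, as in the program)
  real, intent(in) :: lower(:), upper(:)     ! bin edges
  integer :: i
  do i = 1, size(bin)
    if (v >= lower(i) .and. v < upper(i)) then
      bin(i) = bin(i) + 1
      exit                                   ! a value belongs to at most one bin
    endif
  enddo
end subroutine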

The routine should be inside a module; otherwise it is not going to work, because it uses assumed-shape arrays, which require an explicit interface.
By the way, in new software every Fortran routine should be defined inside a module.
As you can see, this scales as the number of elements times the number of bins.

If you have a huge number of elements and a huge number of irregular bins, you may need a different algorithm.
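To make the module advice concrete, here is a minimal sketch that places the routine in a module and calls it from a small driver. The module name binning, the driver, and the three irregular bins are all made up for this example, and the edges are declared double precision here purely for simplicity.

```fortran
module binning
  implicit none
contains
  ! Increment the counter of the first bin i with lower(i) <= v < upper(i).
  subroutine update_bin_count(v, bin, lower, upper)
    double precision, intent(in)    :: v
    integer,          intent(inout) :: bin(:)
    double precision, intent(in)    :: lower(:), upper(:)
    integer :: i
    do i = 1, size(bin)
      if (v >= lower(i) .and. v < upper(i)) then
        bin(i) = bin(i) + 1
        exit
      end if
    end do
  end subroutine update_bin_count
end module binning

program module_demo
  use binning          ! gives the explicit interface that assumed-shape arguments need
  implicit none
  double precision :: lower(3), upper(3)
  integer :: bin(3)

  lower = [0.0d0, 1.0d0, 3.0d0]     ! hypothetical irregular bins
  upper = [1.0d0, 3.0d0, 10.0d0]
  bin = 0

  call update_bin_count(0.5d0,  bin, lower, upper)
  call update_bin_count(2.0d0,  bin, lower, upper)
  call update_bin_count(2.5d0,  bin, lower, upper)
  call update_bin_count(42.0d0, bin, lower, upper)   ! outside every bin, ignored

  print '(3i3)', bin
end program module_demo
```

With these calls the first bin gets one count, the second gets two, and the third stays at zero.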

Ron Shepard

Apr 28, 2023, 11:07:27 AM

On 4/28/23 6:30 AM, Edmondo Giovannozzi wrote:
> If you have a huge number of elements and a huge number of irregular bins, you may need a different algorithm.

One easy algorithm change for the irregular bin case would be to do a
binary search on the bin index rather than the simple loop over the
bins. For random input values, this would reduce the average effort in
the search from size(bin)/2 to log2(size(bin)).
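A binary search over the bin boundaries could be sketched as follows. This assumes the bins are contiguous and described by a single sorted array of edges, with bin i covering edges(i) up to but not including edges(i+1); that layout, and the names bin_search, find_bin, and edges, are assumptions for this example rather than anything from the thread.

```fortran
module bin_search
  implicit none
contains
  ! Return i such that edges(i) <= v < edges(i+1), or 0 if v is out of range.
  ! edges must be sorted in ascending order.
  integer function find_bin(v, edges) result(ibin)
    double precision, intent(in) :: v, edges(:)
    integer :: lo, hi, mid
    ibin = 0
    if (size(edges) < 2) return
    if (v < edges(1) .or. v >= edges(size(edges))) return
    lo = 1
    hi = size(edges)
    ! Invariant: edges(lo) <= v < edges(hi); halve the interval each pass.
    do while (hi - lo > 1)
      mid = lo + (hi - lo)/2
      if (v >= edges(mid)) then
        lo = mid
      else
        hi = mid
      end if
    end do
    ibin = lo
  end function find_bin
end module bin_search

program bin_search_demo
  use bin_search
  implicit none
  double precision :: edges(5)
  edges = [0.0d0, 1.0d0, 3.0d0, 10.0d0, 100.0d0]   ! four irregular bins
  print '(6i3)', find_bin(0.5d0, edges), find_bin(2.0d0, edges), &
                 find_bin(3.0d0, edges), find_bin(50.0d0, edges), &
                 find_bin(100.0d0, edges), find_bin(-1.0d0, edges)
end program bin_search_demo
```

The demo prints the bin indices 1, 2, 3, 4 for the four in-range values and 0 for the two values outside the edges.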

The regular bin case of course does not have this factor, so if
appropriate that would be a more efficient choice. Another possibility
is to set the bins based on some function of the value (say log(v) for
positive v, or some monotonic user function). You pay the cost of the
function evaluation for each value, but then you have just the regular
grid division operation to determine the bin index.
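As a sketch of the function-based approach, the program below bins positive values on a logarithmic scale: log(v) is mapped onto a regular grid, so the index again comes from a single division. The range vmin to vmax, the four bins, and the sample values are hypothetical.

```fortran
program log_bin_demo
  implicit none
  integer, parameter :: nbins = 4
  double precision, parameter :: vmin = 1.0d0, vmax = 1.0d4   ! hypothetical range
  double precision :: samples(5), t
  integer :: bin(nbins), ibin, j

  samples = [0.5d0, 3.0d0, 50.0d0, 700.0d0, 2.0d4]
  bin = 0

  do j = 1, size(samples)
    if (samples(j) <= 0.0d0) cycle     ! log is only defined for positive v
    ! t runs from 0 to 1 as v runs from vmin to vmax on a log scale.
    t = (log(samples(j)) - log(vmin))/(log(vmax) - log(vmin))
    ibin = floor(nbins*t) + 1
    if (ibin >= 1 .and. ibin <= nbins) bin(ibin) = bin(ibin) + 1
  end do

  print '(4i3)', bin
end program log_bin_demo
```

Here each bin spans one decade, so 3, 50, and 700 land in bins 1, 2, and 3, while 0.5 and 20000 fall outside the range and are skipped.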

$.02 -Ron Shepard