Find the indice of the element with the nearest value of a float in an array

3,326 views
Skip to first unread message

Fred

unread,
Apr 10, 2016, 7:40:07 AM4/10/16
to julia-users
Hi,

I am looking for the most efficient (fastest) way to find the indice of the element with the nearest value of a float in an array.

x = [1:0.1:10]

julia
> x
91-element Array{Float64,1}:
 
1.0
 
1.1
 
1.2
 
1.3
 
1.4
 
 
 
9.4
 
9.5
 
9.6
 
9.7
 
9.8
 
9.9
 
10.0

It is very easy to find the indice of an exact value of x, for example 8.2

julia
> find(x .== 8.2)
1-element Array{Int64,1}:
 
73

But if I want the indice of the closest value of 8.22

julia
> minimum(abs(x-8.22))
0.02000000000000135

julia
> find(x .== minimum(abs(x-8.22)))
0-element Array{Int64,1}


Of course it is easy to do that with a loop but is it the fastest solution ?

min_i
= 0
min_x
= 1.0

 
for i=[1:length(x)]
       e
= abs(collect(x)[i] - 8.22)
       
if e < min_x
           min_x
= e
           min_i
= i
       
end
 
end

println(min_x, " -> ", min_i)
0.02000000000000135 -> 73


Thanks for your comments !

Tim Holy

unread,
Apr 10, 2016, 7:59:07 AM4/10/16
to julia...@googlegroups.com
indmin(abs(x-val)) is easy and pretty good, but it does create two
temporaries. Faster would be

function closest_index(x, val)
ibest = start(eachindex(x))
dxbest = abs(x[ibest]-val)
for I in eachindex(x)
dx = abs(x[I]-val)
if dx < dxbest
dxbest = dx
ibest = I
end
end
ibest
end

This should not allocate any memory and is likely the fastest. (It might be
slightly faster with @inbounds, of course...)

It would be possible to create an indmin(f, x) that applies f to each element
of x and returns the index of the minimum; this would be efficient in the
development version of julia but not julia-0.4.

Best,
--Tim

Erik Schnetter

unread,
Apr 10, 2016, 8:01:49 AM4/10/16
to julia...@googlegroups.com
Fred

Yes, the fastest way is a loop. (It might be possible to extend e.g.
`minimum` by a "predicate" argument, but that would require a change
to the base library.)

You would write the loop slightly differently, though:
```Julia
mini = 0
minx = 0.0
mindx = typemax(Float64)
for (i,x) in enumerate(array)
dx = abs(x - x0)
if dx < mindx
mini, minx, mindx = i, x, dx
end
end
mini, minx, mindx
```

This will return `imin=0` for empty inputs.

-erik
--
Erik Schnetter <schn...@gmail.com>
http://www.perimeterinstitute.ca/personal/eschnetter/

Ravi S

unread,
Apr 10, 2016, 8:19:48 AM4/10/16
to julia-users
Not sure which is faster, but here's another way to do it:

julia> reduce((x,y)->x[2]<y[2]?x:y,(0,1000),map((x,y)->(x,abs(y-8.22)),1:length(x),x))
(73,0.02000000000000135)

- Ravi

Mauro

unread,
Apr 10, 2016, 8:20:34 AM4/10/16
to julia...@googlegroups.com
If your array is sorted, as your example suggests, there maybe faster
methods, binary search comes to mind (implemented in searchsorted).
Also, if the array is unsorted but you need to look up many values, it
might be worth sorting it first. Mauro

Fred

unread,
Apr 10, 2016, 8:30:48 AM4/10/16
to julia-users
Thank you very much !

I give you the results :

my loop solution :
0.000419 seconds (547 allocations: 708.797 KB)
0.02000000000000135 -> 73

 @time closest_index(x,8.22)
  0.000003 seconds (4 allocations: 160 bytes)
73

@time for (i,x) in enumerate(array)...
0.000181 seconds (821 allocations: 19.953 KB)
738.20.02000000000000135

@time reduce((x,y)->x[2]<y[2]?x:y,(0,1000),map((x,y)->(x,abs(y-8.22)),1:length(x),x))
  0.005890 seconds (892 allocations: 24.617 KB)

Of course in my particular situation the array is sorted, so in that case I although think about using dichotomy

Fred

unread,
Apr 10, 2016, 8:42:41 AM4/10/16
to julia-users
Thank you very much Mauro !  searchsorted is the simplest solution and one of the fastest but it gives two indices so another comparison is needed to find the closest value :

@time searchsorted(x, 8.22)
  0.000004 seconds (7 allocations: 240 bytes)
74:73

abs(x[73] - 8.22) > abs(x[74] - 8.22) ? i = 74 : i = 73
73

Steven G. Johnson

unread,
Apr 10, 2016, 8:56:33 AM4/10/16
to julia-users


On Sunday, April 10, 2016 at 8:30:48 AM UTC-4, Fred wrote:
my loop solution :
0.000419 seconds (547 allocations: 708.797 KB)
0.02000000000000135 -> 73

 Probably you are doing this wrong; it shouldn't be allocating so much memory.  Is your loop using global variables?  Did you remember to time it twice (since the first time you call it there is compilation overhead.)  Did you try the function Tim Holy posted?

Fred

unread,
Apr 10, 2016, 9:10:19 AM4/10/16
to julia-users
I tested my loop monre than 2 times as it is written in my post and I have always the same results. The function Tim Holy posted is much faster, I posted the results above :)

Mauro

unread,
Apr 10, 2016, 9:14:39 AM4/10/16
to julia...@googlegroups.com
> I tested my loop monre than 2 times as it is written in my post and I have
> always the same results. The function Tim Holy posted is much faster, I
> posted the results above :)

I seem to recall that your example loop was not in a function(?) If so,
that makes it lots slower.

Fred

unread,
Apr 10, 2016, 9:24:34 AM4/10/16
to julia-users
That's true ! But why a loop is faster in a function ? :)

Mauro

unread,
Apr 10, 2016, 9:41:52 AM4/10/16
to julia...@googlegroups.com
On Sun, 2016-04-10 at 15:24, Fred <fred.so...@gmail.com> wrote:
> That's true ! But why a loop is faster in a function ? :)

Check out:
http://docs.julialang.org/en/release-0.4/manual/performance-tips/#avoid-global-variables
Message has been deleted

Fred

unread,
Apr 10, 2016, 10:17:46 AM4/10/16
to julia-users
Hi,

I post here my best solution taking advantage that the array is sorted. I expected that solution to be a lot much faster than other solutions, but not really.
Indeed I am very impressed by the speed of searchsorted
 :

x
= [1:0.1:1000000]
val
= 8.22

function dicotomy(x, val)
  a
= start(eachindex(x))
  b
= length(x)
  j
= length(x)
  dxbest
= abs(x[a]-val)
  dx
= dxbest

 
while true
    dx
< dxbest ? dxbest = dx : 1
    j
= round(Int,(a+b)/2)
    dx
= x[j]-val
    x
[j]-val < 0 ? a = j : b = j
    abs
(a-b) < 2 && break
 
end
 
return a,b
end

@time dicotomy(x, 8.22)
 
0.000004 seconds (5 allocations: 192 bytes)
(73,74)


@time searchsorted(x, 8.22)
 
0.000005 seconds (7 allocations: 240 bytes)

@time closest_index(x,8.22)
 
0.027618 seconds (4 allocations: 160 bytes)
73




Tim Holy

unread,
Apr 10, 2016, 10:18:03 AM4/10/16
to julia...@googlegroups.com
Even better: get rid of the brackets around 1:0.1:100000, and you'll be that
much more impressed.

--Tim

On Sunday, April 10, 2016 07:16:16 AM Fred wrote:
> Hi,
>
> I post here my best solution taking advantage that the array is sorted. I
> expected to be a lot much faster than other solutions, but not really.

Fred

unread,
Apr 10, 2016, 10:26:06 AM4/10/16
to julia-users
Maybe my array is too small to see a difference, but if I increase the size I will lack of RAM ;)

julia> x = 1:0.1:1000000
1.0:0.1:1.0e6

julia
> @time searchsorted(x, 8.22)
 
0.045590 seconds (33.21 k allocations: 1.535 MB)
74:73

julia
> @time searchsorted(x, 8.22)
 
0.000005 seconds (8 allocations: 288 bytes)
74:73

julia
> @time searchsorted(x, 8.22)
 
0.000005 seconds (8 allocations: 288 bytes)
74:73

julia
> @time closest_index(x,8.22)
 
0.103219 seconds (4.37 k allocations: 222.884 KB)
73

julia
> @time closest_index(x,8.22)
 
0.095684 seconds (4 allocations: 160 bytes)
73

julia
> @time dicotomy(x, 8.22)
 
0.009142 seconds (3.45 k allocations: 173.973 KB)
(73,74)

julia
> @time dicotomy(x, 8.22)
 
0.000005 seconds (5 allocations: 192 bytes)
(73,74)

julia
> @time dicotomy(x, 8.22)

 
0.000004 seconds (5 allocations: 192 bytes)
(73,74)

Tim Holy

unread,
Apr 10, 2016, 11:50:02 AM4/10/16
to julia...@googlegroups.com
Just FYI:

julia> x = 1:0.1:1000000
1.0:0.1:1.0e6

julia> y = collect(x); # this is the same as y = [x;]

julia> sizeof(x)
32

julia> sizeof(y)
79999928

--Tim

Fred

unread,
Apr 10, 2016, 12:03:06 PM4/10/16
to julia-users
A huge size difference ! I have to read my array from data file so I suppose it is like Y and X is only for simulations ?

Tim Holy

unread,
Apr 10, 2016, 12:55:28 PM4/10/16
to julia...@googlegroups.com
On Sunday, April 10, 2016 09:03:06 AM Fred wrote:
> A huge size difference ! I have to read my array from data file so I
> suppose it is like Y and X is only for simulations ?

There turn out to be many situations in which you can take shortcuts if you
know the values are sorted in increasing order with no skips in them. For
example, a[1:5] has better performance than a[[1:5;]], even though they yield
the same answer. One of julia's strengths is that you can leverage this
knowledge very effectively. So don't convert to an array unless you have some
reason that you want/need to.

But yes, any data you read from a data file will likely be an array.

Best,
--Tim

DNF

unread,
Apr 10, 2016, 5:30:10 PM4/10/16
to julia-users
The big error you made originally is calling collect in every iteration of the loop. Just deleting collect speeds things up by 100x. The lesson is that you should (almost) never use collect.

The other lesson is: don't do [1:0.1:10]. It makes your code slower, and very soon your it will stop working correctly. Just delete [].

function mindist0(x, val)
   
# your original

    min_i
= 0
    min_x
= 1.0

   
for i=1:length(x)
        e
= abs(collect(x)[i] - val)

       
if e < min_x
            min_x
= e
            min_i
= i
       
end
   
end

   
return (min_i, min_x)
end


function mindist1(x, val)
   
#  same as the original, except removing 'collect'

    min_i
= 0
    min_x
= 1.0

   
for i = 1:length(x)
        e
= abs(x[i] - val)

       
if e < min_x
            min_x
= e
            min_i
= i
       
end
   
end

   
return (min_i, min_x)
end


function mindist2(x, val)
   
# using enumerate to avoid indexing
    min_i
= 0
    min_x
= Inf
   
for (i, xi) in enumerate(x)
        dist
= abs(xi - val)
       
if dist < min_x
            min_x
= dist
            min_i
= i
       
end
   
end
   
return (min_i, min_x)
end


function test_dist(x, val, N)
   
# putting time testing inside a function for optimal result
   
@assert mindist0(x, val) == mindist1(x, val) == mindist2(x, val)
   
@time for _ in 1:N mindist0(x, val) end
   
@time for _ in 1:N mindist1(x, val) end
   
@time for _ in 1:N mindist2(x, val) end
end


>> test_dist
(1:0.1:10, 8.22, 1)
0.000060 seconds (182 allocations: 76.781 KB)
0.000001 seconds
0.000001 seconds

>> test_dist(1:0.1:10, 8.22, 1)
0.641741 seconds (1.82 M allocations: 749.817 MB, 7.57% gc time)
0.007380 seconds
0.005570 seconds

DNF

unread,
Apr 10, 2016, 5:34:25 PM4/10/16
to julia-users

I messed up copy-paste here:

>> test_dist(1:0.1:10, 8.22, 1)
0.641741 seconds (1.82 M allocations: 749.817 MB, 7.57% gc time)
0.007380 seconds
0.005570 seconds


It should be (looping 10000 times)
>> test_dist(1:0.1:10, 8.22, 10^4)

Cedric St-Jean

unread,
Apr 10, 2016, 10:02:38 PM4/10/16
to julia-users
Am I alone in pining for a predicate version of findmin/findmax et al., or a `by` keyword argument? It would make this code much simpler. 

Fred

unread,
Apr 11, 2016, 6:15:31 AM4/11/16
to julia-users
Thank you for this explanation DNF ! It is the first time I use collect and the reason why I used it was an error message suggesting that I should use it :) maybe because of my first mistake to use []. 

DNF

unread,
Apr 11, 2016, 6:30:23 AM4/11/16
to julia-users
That's right.  '[]' for collecting should be replaced with 'collect'. But in your case you shouldn't have used '[]' in the first place :)

It seems that it is very common for people to collect ranges into arrays, but it is rare that you actually need to. The best approach is to not collect, and then see if it works the way you intend it to. 
Reply all
Reply to author
Forward
0 new messages