Find the indice of the element with the nearest value of a float in an array

Fred

unread,

Apr 10, 2016, 7:40:07 AM4/10/16

to julia-users

Hi,

I am looking for the most efficient (fastest) way to find the indice of the element with the nearest value of a float in an array.

x = [1:0.1:10]

julia> x
91-element Array{Float64,1}:
  1.0
  1.1
  1.2
  1.3
  1.4
  ⋮  
  9.4
  9.5
  9.6
  9.7
  9.8
  9.9
 10.0

It is very easy to find the indice of an exact value of x, for example 8.2

julia> find(x .== 8.2)
1-element Array{Int64,1}:
 73

But if I want the indice of the closest value of 8.22

julia> minimum(abs(x-8.22))
0.02000000000000135

julia> find(x .== minimum(abs(x-8.22)))
0-element Array{Int64,1}


Of course it is easy to do that with a loop but is it the fastest solution ?

min_i = 0
min_x = 1.0

 for i=[1:length(x)]
       e = abs(collect(x)[i] - 8.22)
       if e < min_x
           min_x = e
           min_i = i
       end
 end

println(min_x, " -> ", min_i)
0.02000000000000135 -> 73

Thanks for your comments !

Tim Holy

unread,

Apr 10, 2016, 7:59:07 AM4/10/16

to julia...@googlegroups.com

indmin(abs(x-val)) is easy and pretty good, but it does create two
temporaries. Faster would be

function closest_index(x, val)
ibest = start(eachindex(x))
dxbest = abs(x[ibest]-val)
for I in eachindex(x)
dx = abs(x[I]-val)
if dx < dxbest
dxbest = dx
ibest = I
end
end
ibest
end

This should not allocate any memory and is likely the fastest. (It might be
slightly faster with @inbounds, of course...)

It would be possible to create an indmin(f, x) that applies f to each element
of x and returns the index of the minimum; this would be efficient in the
development version of julia but not julia-0.4.

Best,
--Tim

Erik Schnetter

unread,

Apr 10, 2016, 8:01:49 AM4/10/16

to julia...@googlegroups.com

Fred

Yes, the fastest way is a loop. (It might be possible to extend e.g.
`minimum` by a "predicate" argument, but that would require a change
to the base library.)

You would write the loop slightly differently, though:
```Julia
mini = 0
minx = 0.0
mindx = typemax(Float64)
for (i,x) in enumerate(array)
dx = abs(x - x0)
if dx < mindx
mini, minx, mindx = i, x, dx
end
end
mini, minx, mindx
```

This will return `imin=0` for empty inputs.

-erik

--
Erik Schnetter <schn...@gmail.com>
http://www.perimeterinstitute.ca/personal/eschnetter/

Ravi S

unread,

Apr 10, 2016, 8:19:48 AM4/10/16

to julia-users

Not sure which is faster, but here's another way to do it:

julia> reduce((x,y)->x[2]<y[2]?x:y,(0,1000),map((x,y)->(x,abs(y-8.22)),1:length(x),x))
(73,0.02000000000000135)

- Ravi

Mauro

unread,

Apr 10, 2016, 8:20:34 AM4/10/16

to julia...@googlegroups.com

If your array is sorted, as your example suggests, there maybe faster
methods, binary search comes to mind (implemented in searchsorted).
Also, if the array is unsorted but you need to look up many values, it
might be worth sorting it first. Mauro

Fred

unread,

Apr 10, 2016, 8:30:48 AM4/10/16

to julia-users

Thank you very much !

I give you the results :

my loop solution :
0.000419 seconds (547 allocations: 708.797 KB)
0.02000000000000135 -> 73

@time closest_index(x,8.22)
0.000003 seconds (4 allocations: 160 bytes)
73

@time for (i,x) in enumerate(array)...
0.000181 seconds (821 allocations: 19.953 KB)
738.20.02000000000000135

@time reduce((x,y)->x[2]<y[2]?x:y,(0,1000),map((x,y)->(x,abs(y-8.22)),1:length(x),x))
0.005890 seconds (892 allocations: 24.617 KB)

Of course in my particular situation the array is sorted, so in that case I although think about using dichotomy

Fred

unread,

Apr 10, 2016, 8:42:41 AM4/10/16

to julia-users

Thank you very much Mauro ! searchsorted is the simplest solution and one of the fastest but it gives two indices so another comparison is needed to find the closest value :

@time searchsorted(x, 8.22)
0.000004 seconds (7 allocations: 240 bytes)
74:73

abs(x[73] - 8.22) > abs(x[74] - 8.22) ? i = 74 : i = 73
73

Steven G. Johnson

unread,

Apr 10, 2016, 8:56:33 AM4/10/16

to julia-users

On Sunday, April 10, 2016 at 8:30:48 AM UTC-4, Fred wrote:

my loop solution :
0.000419 seconds (547 allocations: 708.797 KB)
0.02000000000000135 -> 73

Probably you are doing this wrong; it shouldn't be allocating so much memory. Is your loop using global variables? Did you remember to time it twice (since the first time you call it there is compilation overhead.) Did you try the function Tim Holy posted?

Fred

unread,

Apr 10, 2016, 9:10:19 AM4/10/16

to julia-users

I tested my loop monre than 2 times as it is written in my post and I have always the same results. The function Tim Holy posted is much faster, I posted the results above :)

Mauro

unread,

Apr 10, 2016, 9:14:39 AM4/10/16

to julia...@googlegroups.com

> I tested my loop monre than 2 times as it is written in my post and I have
> always the same results. The function Tim Holy posted is much faster, I
> posted the results above :)

I seem to recall that your example loop was not in a function(?) If so,
that makes it lots slower.

Fred

unread,

Apr 10, 2016, 9:24:34 AM4/10/16

to julia-users

That's true ! But why a loop is faster in a function ? :)

Mauro

unread,

Apr 10, 2016, 9:41:52 AM4/10/16

to julia...@googlegroups.com

On Sun, 2016-04-10 at 15:24, Fred <fred.so...@gmail.com> wrote:
> That's true ! But why a loop is faster in a function ? :)

Check out:
http://docs.julialang.org/en/release-0.4/manual/performance-tips/#avoid-global-variables

Message has been deleted

Fred

unread,

Apr 10, 2016, 10:17:46 AM4/10/16

to julia-users

Hi,

I post here my best solution taking advantage that the array is sorted. I expected that solution to be a lot much faster than other solutions, but not really.
Indeed I am very impressed by the speed of searchsorted
:


x = [1:0.1:1000000]
val = 8.22

function dicotomy(x, val)
  a = start(eachindex(x))
  b = length(x)
  j = length(x)
  dxbest = abs(x[a]-val)
  dx = dxbest

  while true
    dx < dxbest ? dxbest = dx : 1
    j = round(Int,(a+b)/2)
    dx = x[j]-val
    x[j]-val < 0 ? a = j : b = j
    abs(a-b) < 2 && break
  end
  return a,b
end

@time dicotomy(x, 8.22)
 0.000004 seconds (5 allocations: 192 bytes)
(73,74)


@time searchsorted(x, 8.22)
  0.000005 seconds (7 allocations: 240 bytes)

@time closest_index(x,8.22)
  0.027618 seconds (4 allocations: 160 bytes)
73

Tim Holy

unread,

Apr 10, 2016, 10:18:03 AM4/10/16

to julia...@googlegroups.com

Even better: get rid of the brackets around 1:0.1:100000, and you'll be that
much more impressed.

--Tim

On Sunday, April 10, 2016 07:16:16 AM Fred wrote:
> Hi,
>
> I post here my best solution taking advantage that the array is sorted. I

> expected to be a lot much faster than other solutions, but not really.

Fred

unread,

Apr 10, 2016, 10:26:06 AM4/10/16

to julia-users

Maybe my array is too small to see a difference, but if I increase the size I will lack of RAM ;)

julia> x = 1:0.1:1000000
1.0:0.1:1.0e6

julia> @time searchsorted(x, 8.22)
  0.045590 seconds (33.21 k allocations: 1.535 MB)
74:73

julia> @time searchsorted(x, 8.22)
  0.000005 seconds (8 allocations: 288 bytes)
74:73

julia> @time searchsorted(x, 8.22)
  0.000005 seconds (8 allocations: 288 bytes)
74:73

julia> @time closest_index(x,8.22)
  0.103219 seconds (4.37 k allocations: 222.884 KB)
73

julia> @time closest_index(x,8.22)
  0.095684 seconds (4 allocations: 160 bytes)
73

julia> @time dicotomy(x, 8.22)
  0.009142 seconds (3.45 k allocations: 173.973 KB)
(73,74)

julia> @time dicotomy(x, 8.22)
  0.000005 seconds (5 allocations: 192 bytes)
(73,74)

julia> @time dicotomy(x, 8.22)


  0.000004 seconds (5 allocations: 192 bytes)
(73,74)

Tim Holy

unread,

Apr 10, 2016, 11:50:02 AM4/10/16

to julia...@googlegroups.com

Just FYI:

julia> x = 1:0.1:1000000
1.0:0.1:1.0e6

julia> y = collect(x); # this is the same as y = [x;]

julia> sizeof(x)
32

julia> sizeof(y)
79999928

--Tim

Fred

unread,

Apr 10, 2016, 12:03:06 PM4/10/16

to julia-users

A huge size difference ! I have to read my array from data file so I suppose it is like Y and X is only for simulations ?

Tim Holy

unread,

Apr 10, 2016, 12:55:28 PM4/10/16

to julia...@googlegroups.com

On Sunday, April 10, 2016 09:03:06 AM Fred wrote:
> A huge size difference ! I have to read my array from data file so I
> suppose it is like Y and X is only for simulations ?

There turn out to be many situations in which you can take shortcuts if you
know the values are sorted in increasing order with no skips in them. For
example, a[1:5] has better performance than a[[1:5;]], even though they yield
the same answer. One of julia's strengths is that you can leverage this
knowledge very effectively. So don't convert to an array unless you have some
reason that you want/need to.

But yes, any data you read from a data file will likely be an array.

Best,
--Tim

DNF

unread,

Apr 10, 2016, 5:30:10 PM4/10/16

to julia-users

The big error you made originally is calling collect in every iteration of the loop. Just deleting collect speeds things up by 100x. The lesson is that you should (almost) never use collect.

The other lesson is: don't do [1:0.1:10]. It makes your code slower, and very soon your it will stop working correctly. Just delete [].

function mindist0(x, val)
    # your original


    min_i = 0
    min_x = 1.0


    for i=1:length(x)
        e = abs(collect(x)[i] - val)


        if e < min_x
            min_x = e
            min_i = i
        end
    end


    return (min_i, min_x)
end


function mindist1(x, val)
    #  same as the original, except removing 'collect'


    min_i = 0
    min_x = 1.0


    for i = 1:length(x)
        e = abs(x[i] - val)


        if e < min_x
            min_x = e
            min_i = i
        end
    end


    return (min_i, min_x)
end


function mindist2(x, val)
    # using enumerate to avoid indexing
    min_i = 0
    min_x = Inf
    for (i, xi) in enumerate(x)
        dist = abs(xi - val)
        if dist < min_x
            min_x = dist
            min_i = i
        end
    end
    return (min_i, min_x)
end


function test_dist(x, val, N)
    # putting time testing inside a function for optimal result
    @assert mindist0(x, val) == mindist1(x, val) == mindist2(x, val)
    @time for _ in 1:N mindist0(x, val) end
    @time for _ in 1:N mindist1(x, val) end
    @time for _ in 1:N mindist2(x, val) end
end


>> test_dist(1:0.1:10, 8.22, 1)
0.000060 seconds (182 allocations: 76.781 KB)
0.000001 seconds
0.000001 seconds

>> test_dist(1:0.1:10, 8.22, 1)
0.641741 seconds (1.82 M allocations: 749.817 MB, 7.57% gc time)
0.007380 seconds
0.005570 seconds

DNF

unread,

Apr 10, 2016, 5:34:25 PM4/10/16

to julia-users

I messed up copy-paste here:

>> test_dist(1:0.1:10, 8.22, 1) 0.641741 seconds (1.82 M allocations: 749.817 MB, 7.57% gc time) 0.007380 seconds 0.005570 seconds

It should be (looping 10000 times)

>> test_dist(1:0.1:10, 8.22, 10^4)

Cedric St-Jean

unread,

Apr 10, 2016, 10:02:38 PM4/10/16

to julia-users

Am I alone in pining for a predicate version of findmin/findmax et al., or a `by` keyword argument? It would make this code much simpler.

Fred

unread,

Apr 11, 2016, 6:15:31 AM4/11/16

to julia-users

Thank you for this explanation DNF ! It is the first time I use collect and the reason why I used it was an error message suggesting that I should use it :) maybe because of my first mistake to use [].

DNF

unread,

Apr 11, 2016, 6:30:23 AM4/11/16

to julia-users

That's right. '[]' for collecting should be replaced with 'collect'. But in your case you shouldn't have used '[]' in the first place :)

It seems that it is very common for people to collect ranges into arrays, but it is rare that you actually need to. The best approach is to not collect, and then see if it works the way you intend it to.

Reply all

Reply to author

Forward