Uniform float equality handling in iris

Carwyn Pelley

Oct 19, 2016, 8:01:01 AM
to scitools...@googlegroups.com
Currently, Iris does not handle floats very well: we perform exact equality checks on floats, and whenever a use case exposes a problem, a tolerance is implemented for that one specific equality.
I propose that floats within iris be handled correctly and uniformly (i.e. we put in place the means to eventually remove this problem, and certainly not extend it further).

Here is a UI proxy to what I have done within our ANTS project that I would like to put forward for iris:

iris.utils.ndarray.greater(x1, x2)
iris.utils.ndarray.less(x1, x2)
iris.utils.ndarray.isclose(x1, x2)
iris.utils.ndarray.allclose(x1, x2)


...

Which could all utilise a global tolerance as defined somewhere like `iris.config.TOLERANCE`

Here is an illustration of how it can be easily implemented:

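import numpy as np


def _numpy_arithmetic_handling(x1, x2):
    # Coerce both inputs to arrays and find their common (promoted) dtype.
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    ttype = np.promote_types(x1.dtype, x2.dtype)
    # Only floating-point comparisons need a tolerance; integers stay exact.
    use_tolerance = np.issubdtype(ttype, np.floating)
    return x1, x2, use_tolerance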


def greater(x1, x2):
    """
    Return the truth value of (x1 > x2) element-wise with tolerance defined
    by ants.config.TOLERANCE.

    See Also
    --------
    :func:`numpy.greater`

    """
    x1, x2, use_tolerance = _numpy_arithmetic_handling(x1, x2)
    x3 = x2
    if use_tolerance:
        x3 = x2 - ants.config.TOLERANCE
    return np.greater(x1, x3)

...
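
Following the same pattern as `greater`, the `isclose` and `allclose` entries in the list might look roughly like the sketch below. This is only an illustration, not the ANTS code: the module-level `TOLERANCE` stands in for `iris.config.TOLERANCE`, the helper name `_as_arrays` is made up, and treating the tolerance as purely absolute (`rtol=0`) is an assumption.

```python
import numpy as np

# Hypothetical stand-in for a global such as iris.config.TOLERANCE.
TOLERANCE = 1e-10


def _as_arrays(x1, x2):
    # Coerce to arrays and report whether the promoted dtype is floating point.
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    is_float = np.issubdtype(np.promote_types(x1.dtype, x2.dtype), np.floating)
    return x1, x2, is_float


def isclose(x1, x2):
    # Element-wise equality: absolute tolerance for floats, exact otherwise.
    x1, x2, is_float = _as_arrays(x1, x2)
    if is_float:
        return np.isclose(x1, x2, rtol=0.0, atol=TOLERANCE)
    return np.equal(x1, x2)


def allclose(x1, x2):
    # True only if every element compares equal under isclose.
    return bool(np.all(isclose(x1, x2)))
```

With this, `allclose([0.1 + 0.2, 1.0], [0.3, 1.0])` is true even though `0.1 + 0.2 == 0.3` is not, while integer inputs are still compared exactly.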


Here is a non-exhaustive list of cases where equality with floats has caused problems (I'm sure a more thorough search would reveal many more):
https://github.com/SciTools/iris/pull/2201
https://github.com/SciTools/iris/pull/2062
https://github.com/SciTools/iris/pull/1663

What do people think?

Stephan Hoyer

Oct 19, 2016, 11:52:30 AM
to Carwyn Pelley, Iris-dev
Usually a better approach than setting global config options is to use function arguments or a context manager.

The global tolerance might be fixed at zero or something near machine precision. To use a looser tolerance, you could write something like:

with iris.set_tolerance(1e-6):
    # assuming x1 and x2 are not base numpy.ndarray objects
    y = x1 < x2

or maybe:
y = iris.utils.ndarray.less(x1, x2, tolerance=1e-6)
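
A rough sketch of the function-argument variant, assuming a hypothetical module default `DEFAULT_TOLERANCE` that applies when no `tolerance` is passed. The name, the zero default, and the "shift `x2` by the tolerance" semantics (mirroring the `greater` illustration in the original post) are all assumptions rather than an existing iris API.

```python
import numpy as np

# Hypothetical package-wide default; zero means exact comparison.
DEFAULT_TOLERANCE = 0.0


def less(x1, x2, tolerance=None):
    # Element-wise (x1 < x2), with an optional per-call tolerance for floats.
    if tolerance is None:
        tolerance = DEFAULT_TOLERANCE
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    if np.issubdtype(np.promote_types(x1.dtype, x2.dtype), np.floating):
        # Mirror of greater(): widen the comparison by the tolerance.
        return np.less(x1, x2 + tolerance)
    # Integer (and other non-float) inputs are compared exactly.
    return np.less(x1, x2)
```

The caller decides per call how loose the comparison is, so no shared state changes: `less(a, b)` stays exact, while `less(a, b, tolerance=1e-6)` applies the tolerance only to that call.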

Cheers,
Stephan

On Wed, Oct 19, 2016 at 5:01 AM, Carwyn Pelley <cpell...@gmail.com> wrote:
Currently, Iris does not handle floats very well, and whenever a use case exposes a problem, a tolerance is implemented for that one specific equality.
I propose that floats within iris be handled correctly, uniformly, and with control given to the end user (i.e. we put in place the means to eventually remove this problem, and certainly not extend it further).


Here is a UI proxy to what I have done within our ANTS project that I would like to put forward for iris:

`iris.utils.ndarray.greater(x1, x2)`
`iris.utils.ndarray.less(x1, x2)`
`iris.utils.ndarray.isclose(x1, x2)`
`iris.utils.ndarray.allclose(x1, x2)`
...

Which could all utilise a global tolerance as defined somewhere like `iris.config.TOLERANCE`.


Carwyn Pelley

Oct 20, 2016, 4:26:19 AM
to Iris-dev, cpell...@gmail.com
Hi Stephan,


> or maybe:
> y = iris.utils.ndarray.less(x1, x2, tolerance=1e-6)

Thanks for taking an interest. You may have misunderstood the intentions of my proposal.

My main motivation at this point is laying the groundwork for float handling within iris itself (under the hood), not providing utilities for end-user usage.

Perhaps an illustration of the problem might clear things up.

Take a regridding operation like `source.regrid(target, AreaWeighted())`
This is just one example code path where iris handling of floats is just plain broken from start to finish:

1. This regridding operation involves coordinate system comparisons, which contain floats.
 - This can unexpectedly fail because 32-bit floats are compared with 64-bit floats.
 - This can unexpectedly fail because it is often the derived attributes that are used in the comparison (GeogCS can have two of the following three elements supplied: semi_major_axis, semi_minor_axis, and inverse_flattening. Deriving the missing element means a loss of precision and a failure of equality).
2. Coordinate comparisons are required and most often involve floats.
3. Longitude coordinates are often on different ranges, [-180, 180] or [0, 360]. Iris wrapping these ranges means that you lose precision, exacerbating float comparison problems.
4. Containment tests on floats are fraught with danger (within_bounds, used by the area-weighted regridder, for example).
...

The list goes on, and iris handling of floats at any of these stages has caused numerous problems for us over time.
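
A few of the failure modes listed above can be reproduced with plain NumPy, independent of iris. The values below (a WGS84-like ellipsoid, a longitude of 359.9) are chosen purely for illustration, and the tolerances are arbitrary picks, not anything iris defines.

```python
import numpy as np

# 32-bit vs 64-bit: the "same" number differs once both are held as float64.
a32 = np.float32(0.1)
a64 = np.float64(0.1)
print(a32 == a64)                                  # exact comparison fails
print(np.isclose(a32, a64, rtol=0.0, atol=1e-7))   # tolerant comparison passes

# Derived ellipsoid attributes: supply two elements, derive the third,
# then derive the second back again. The round trip need not be bit-identical.
semi_major_axis = 6378137.0
inverse_flattening = 298.257223563
semi_minor_axis = semi_major_axis * (1.0 - 1.0 / inverse_flattening)
derived_inverse_flattening = semi_major_axis / (semi_major_axis - semi_minor_axis)
print(abs(derived_inverse_flattening - inverse_flattening))

# Longitude wrapping: moving 359.9 from [0, 360] onto [-180, 180] does not
# give the same double as writing -0.1 directly.
wrapped = 359.9 - 360.0
print(wrapped == -0.1)                 # exact comparison fails
print(abs(wrapped - (-0.1)) < 1e-10)   # tolerant comparison passes
```

In each case the values are equal for any practical purpose, and only an exact `==` reports otherwise.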

What I'm proposing is that coord comparison, crs comparison, and any other algorithm within iris which deals with floats switch to utilising something like the proposed `iris.utils.greater` utility functions.
This allows a consistent and reasonable handling of floats.

marqh

Oct 21, 2016, 5:34:46 AM
to Iris-dev, cpell...@gmail.com
I'd like to take a little time to understand this in more detail, which I have not yet done.

In principle, I'm in favour of not comparing floats:
  'every time two floats are compared for equality, a fairy dies' :(

I am keen to limit the impact on API users, which will influence structural changes, but hopefully only at a detail level.


Carwyn Pelley

Nov 25, 2016, 5:49:55 AM
to scitools...@googlegroups.com, cpell...@gmail.com
Stephan:
> ...which is why I would suggest exposing it as a function parameter (rather than a global constant).

I'm sorry, I don't see the benefit of writing a new equality method on every object in iris and helper functions for every numpy operation. A global doesn't need to mean strictly 'constant'. This is why I'm trying to separate the issue of end-user utility from tackling the fundamental issue at hand. For example, let's take your context manager approach (which I like, by the way) and apply it to our 'global constant':

import config
from contextlib import contextmanager


@contextmanager
def set_global_tolerance(value):
    # Temporarily override the global tolerance, restoring it on exit
    # even if the body of the with-block raises.
    orig = config.TOLERANCE
    config.TOLERANCE = value
    try:
        yield
    finally:
        config.TOLERANCE = orig


print(config.TOLERANCE)

with set_global_tolerance(1e-6):
    print(config.TOLERANCE)

print(config.TOLERANCE)



Here would then be the output:

1e-10
1e-06
1e-10



> In principle, I'm in favour of not comparing floats

I'm afraid this is simply not a luxury we have with Python. Because it is a duck-typed language, our operations necessarily need to behave reasonably when faced with integers or floats. I would certainly not want to go down the route of unnecessary conditionals based on type. For example, a coordinate's points may be integer or float.

> ...I am keen to limit impact to API users...

If you're responding to my proposal, again I don't understand your point. My proposal would NOT change anything for the end users directly.
This proposal is about providing the utilities for iris itself to deal with floating point numbers correctly. That is, any '==', '!=', '<=', '>=', '<', '>', 'np.equal', 'np.greater', ... etc. calls on the relevant objects are replaced with calls to these helper functions.
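
As a sketch of what that replacement could mean for an object's own equality, consider a stripped-down stand-in for a coordinate. `SimpleCoord`, the `allclose` helper, and the module-level `TOLERANCE` are all hypothetical names for illustration and are not iris classes or functions; the absolute tolerance of 1e-10 is likewise an assumption.

```python
import numpy as np

# Hypothetical stand-in for a global such as iris.config.TOLERANCE.
TOLERANCE = 1e-10


def allclose(x1, x2):
    # Tolerant comparison for floats, exact comparison for everything else.
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    if np.issubdtype(np.promote_types(x1.dtype, x2.dtype), np.floating):
        return bool(np.allclose(x1, x2, rtol=0.0, atol=TOLERANCE))
    return bool(np.array_equal(x1, x2))


class SimpleCoord:
    # Minimal illustrative object holding integer or float points.

    def __init__(self, points):
        self.points = np.asarray(points)

    def __eq__(self, other):
        if not isinstance(other, SimpleCoord):
            return NotImplemented
        if self.points.shape != other.points.shape:
            return False
        # The helper replaces a direct np.array_equal / '==' on the points.
        return allclose(self.points, other.points)


print(SimpleCoord([0.1 + 0.2, 1.0]) == SimpleCoord([0.3, 1.0]))
print(SimpleCoord([1, 2]) == SimpleCoord([1, 3]))
```

The caller still writes `coord_a == coord_b`; only the comparison inside `__eq__` changes, and no type-based branching leaks into the calling code.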

I want to stress that this is not simply an enhancement but a package-wide solution to numerous 'bugs' both discovered and undiscovered due to the handling of floats by iris.
I should also stress that concern over iris's handling of floats is shared by other users in science.

I still feel the problem I'm addressing and how I'm proposing to solve it is not understood.
