The isequal function is an equivalence relation on all objects. The == function is not: it doesn't always return a Bool, and it's not reflexive, since NaN != NaN. People could introduce other non-equivalence behaviors, although we obviously want == to remain at least symmetric and mostly reflexive and transitive. The NA object is a good candidate for breaking the equivalence properties of ==: we probably want NA == NA to be NA.
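Python floats follow the same IEEE 754 rule, so they can illustrate the point. The sketch below assumes a hypothetical isequal helper (not a Python built-in, and only a rough analogue of Julia's isequal) that treats any two NaNs as equal, which restores reflexivity. It deliberately ignores other cases a full isequal might handle specially, such as signed zeros.

```python
import math


def isequal(a, b):
    """Hypothetical analogue of Julia's isequal, restricted to floats.

    Unlike ==, it treats any two NaNs as equal, so it is reflexive on
    every float and therefore an equivalence relation on floats.
    """
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


nan = float("nan")
print(nan == nan)          # False: == is not reflexive
print(isequal(nan, nan))   # True
print(isequal(1.0, 1.0))   # True
print(isequal(nan, 1.0))   # False
```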
The isless function is a strict partial order, whereas < isn't, for the same reasons that == isn't an equivalence relation. Moreover, the relation (a,b)->!isless(a,b)&!isless(b,a) is a partial equivalence relation that is compatible with isequal(a,b) in the sense that whenever the former is defined, it agrees with the latter, and whenever it isn't, the latter is false.
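A minimal sketch of such an isless for floats, assuming the convention that NaN sorts after every other value (the helper names isless, equiv and compare are hypothetical). The derived relation equiv treats two NaNs as equivalent, matching what an isequal that identifies NaNs would say, and the order is consistent enough to drive a sort.

```python
import math
from functools import cmp_to_key


def isless(a, b):
    """Hypothetical strict order on floats that places NaN after all other values."""
    if math.isnan(a):
        return False          # NaN is never less than anything, including NaN
    if math.isnan(b):
        return True           # every non-NaN is less than NaN
    return a < b


def equiv(a, b):
    """The derived relation !isless(a,b) & !isless(b,a)."""
    return not isless(a, b) and not isless(b, a)


def compare(a, b):
    """Three-way comparison built from isless, for use with cmp_to_key."""
    if isless(a, b):
        return -1
    if isless(b, a):
        return 1
    return 0


nan = float("nan")
print(nan < 1.0, 1.0 < nan)   # False False: < cannot order NaN
print(isless(1.0, nan))       # True
print(equiv(nan, nan))        # True, agreeing with an isequal that identifies NaNs
print(sorted([3.0, nan, 1.0, 2.0], key=cmp_to_key(compare)))  # [1.0, 2.0, 3.0, nan]
```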
It seems to me that it is the multivalued case that causes eq(x,y)
not to be an equivalence relation, or not to be boolean. Examples:
- A .== B is non-boolean since it compares multiple values
- NA can be thought of as the set of all possible values, so the
answer to NA==1 is both yes and no, i.e. NA.
- NaNs usually arise when the numeric result would have many (or no?)
possible values:
* Inf - Inf has the entire real line as its possible values, if you take Inf
to mean any variable that goes to infinity (so Inf is also multivalued).
* Likewise with 0/0, taken as the solution to 0*x = 0
* I think 1/0 should have been NaN, since 1x = 0 has no solutions.
(But it gives Inf)
> * I think 1/0 should have been NaN, since 1x = 0 has no solutions.
> (But it gives Inf)
You probably meant 0x = 1 here...
Also, changing IEEE 754 semantics is simply not up for debate, regardless of how much or little sense alternate proposals make. Just saying.
> It seems to me that it is the multivalued case that causes eq(x,y)
> not to be an equivalence relation, or not to be boolean. Examples:
> - A .== B is non-boolean since it compares multiple values
> - NA can be thought of as the set of all possible values, so the
> answer to NA==1 is both yes and no, i.e. NA.
Three-valued logic (as used in SQL) is even less intuitive than NaN.
I would stay as far away from it as possible.
> - NaNs usually arise when the numeric result would have many (or no?)
> possible values:
> * Inf - Inf has the entire real line as possible value, if you take Inf
> to mean any variable that goes to infinity. (so Inf is also multivalued)
> * Likewise with 0/0, taken as the solution to 0*x = 0
> * I think 1/0 should have been NaN, since 1x = 0 has no solutions.
> (But it gives Inf)
That's because the limit of 1/x as x goes to 0 from above is +infinity. In
floating-point terms, 0.0 means the range from exact 0 (inclusive) to
the smallest representable positive number (exclusive), and Inf means
the range from the largest representable positive number (exclusive)
on up. So when you divide 1 (or any "normal" value) by a very small
quantity, you get a very large quantity. Note that the reciprocal of a
sufficiently small denormal number is Inf, because there is no overflow
analogue of denormal numbers.
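Python floats are IEEE 754 doubles, so this overflow can be checked directly. One caveat: Python raises ZeroDivisionError for 1.0/0.0 rather than returning Inf, so the sketch divides by tiny nonzero values instead. It also shows that the reciprocal of the smallest normal double is still finite, which is why only sufficiently small denormals overflow.

```python
import math
import sys

smallest_denormal = 5e-324             # 2**-1074, the smallest positive double
smallest_normal = sys.float_info.min   # 2**-1022, about 2.2e-308

# The exact reciprocal (about 2e323) exceeds the largest double, so it rounds to inf.
print(1.0 / smallest_denormal)         # inf
# The reciprocal of the smallest normal is 2**1022, which is still representable.
print(1.0 / smallest_normal)
# Inf - Inf has no single meaningful value, so IEEE 754 gives NaN.
print(math.inf - math.inf)             # nan
```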
However, sometimes NaN actually does mean "no such real number". For
example, there is no real number whose cosine is 2, and consequently
acos(2) => NaN. In the complex field, acos(2+0im) => 0.0 +
1.31695789692482im.
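The same real-versus-complex distinction can be seen in Python, with one difference in how the domain error is reported: math.acos raises ValueError for an argument outside [-1, 1] instead of returning NaN, while cmath.acos returns a complex result whose cosine is 2. The sketch checks that property rather than relying on exact printed digits.

```python
import cmath
import math

# math works over the reals and reports an out-of-domain argument as an error.
try:
    math.acos(2.0)
    domain_error_raised = False
except ValueError:
    domain_error_raised = True
print(domain_error_raised)           # True

# cmath works over the complex numbers, where a solution exists.
w = cmath.acos(2.0)
print(w)
print(cmath.cos(w))                  # approximately (2+0j)
print(abs(w.imag), math.acosh(2.0))  # equal up to rounding: log(2 + sqrt(3))
```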
> * Keys for sorting/hashing etc. should be compared using invariant, boolean
> predicates; e.g. mutable keys compare by identity.
> The cleanest way would be to use EGAL, but then 1 and int8(1) would hash
> differently. Is this desirable?
Yes, it is. Consider the case of memoizing functions, ones which
store already-computed results in a hash table so that when you call
the same function again, the result is just looked up. A
general-purpose memoization facility wants to use EGAL as the equality
test for keys in the table. A function that is sensitive to the
type (not just the value) of its argument should still be memoizable.
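A minimal sketch of the memoization argument in Python, where dictionary keys are matched by == and hash, so 1, 1.0 and True land in the same slot. The memoize helper and its by_type flag are hypothetical; keying on (type, value) is only a rough stand-in for EGAL on immutable scalars.

```python
def memoize(fn, by_type):
    """Hypothetical single-argument memoizer.

    by_type=False keys the cache on the value alone, so Python's ==/hash
    decide identity (1, 1.0 and True all collide).
    by_type=True keys on (type, value), a rough stand-in for EGAL on
    immutable scalars. It is not a full EGAL: it still conflates 0.0 and
    -0.0, and mutable objects are not keyed by identity.
    """
    cache = {}

    def wrapper(x):
        key = (type(x), x) if by_type else x
        if key not in cache:
            cache[key] = fn(x)
        return cache[key]

    return wrapper


def type_name(x):
    # A function whose result depends on the argument's type, not just its value.
    return type(x).__name__


by_value = memoize(type_name, by_type=False)
print(by_value(1), by_value(1.0))   # int int  (stale hit: 1.0 == 1 and hash(1.0) == hash(1))

by_egal = memoize(type_name, by_type=True)
print(by_egal(1), by_egal(1.0))     # int float
```

Python's standard functools.lru_cache exposes the same choice through its typed parameter, which caches arguments of different types separately when set to true.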
> * With an invariant comparison for sorting/hashing, is there a motivation for
> keeping the current (contingent) isequal?
The trouble is that what counts as contingent equality is different
for each different data structure. So if we keep it, we should be
clear that people are expected to add new methods for
equality-testing.
For purposes of forward error analysis at least, floating point numbers are not ranges. If f(0.0) can give the right answer for zero, it must not give the answer for the smallest denormal, or eps/2, or anything else.
Sure, infinitely many real numbers round to a given float, but once the rounding is done, the float represents only that one number. You are not free to make up extra digits unless you only care about backward error.
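The point that a float denotes exactly one number can be made concrete with Python's fractions and decimal modules, which convert a float to its exact stored value without any further rounding. The value 0.1 is used only as an example.

```python
from decimal import Decimal
from fractions import Fraction

x = 0.1  # the literal is rounded once, to the nearest double

# After that rounding, x denotes exactly one rational number, not a range.
print(x.as_integer_ratio())   # (3602879701896397, 36028797018963968)
print(Fraction(x))            # 3602879701896397/36028797018963968
print(Decimal(x))             # 0.1000000000000000055511151231257827021181583404541015625

# That number is close to 1/10 but not equal to it.
print(Fraction(x) == Fraction(1, 10))   # False
```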
Yes, but this is something that we're going to deal with anyway. A lot
of data analysis needs to be able to deal with missing values.
The current idea (as per Harlan's DataFrame) is to have standalone NA
objects that overload the various comparison operators. I'd
personally be happier if NAs were more tightly integrated, because it
means that their behaviour would have been properly vetted over a
wider range of situations. I understand that it comes with complexity
and performance costs even in cases where no NAs are present.
Keeping NAs as an add-on bolted onto the DataFrame objects should work
for most cases, and NA could propagate as necessary, but not having any
type of NAs would make life a lot harder for a number of people.
All of that to say that this is an issue that will need to be
tackled, even if it's not a core element of the language.
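A minimal sketch of the standalone-NA approach in Python, assuming NA is a singleton whose comparison operators return NA itself (the NA == NA gives NA behavior proposed at the top of the thread). The names NAType, NA and is_na are hypothetical and are not the DataFrame package's API; only comparisons propagate NA here, arithmetic is not covered.

```python
class NAType:
    """Hypothetical missing-value singleton: comparisons involving NA return NA."""

    _instance = None

    def __new__(cls):
        # Always hand back the same object, so identity checks work.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "NA"

    # Every comparison with NA propagates NA instead of returning a bool.
    # Python falls back to these reflected methods for 1 == NA, 3 < NA, etc.
    def __eq__(self, other):
        return self

    def __ne__(self, other):
        return self

    def __lt__(self, other):
        return self

    def __gt__(self, other):
        return self

    def __le__(self, other):
        return self

    def __ge__(self, other):
        return self

    # Defining __eq__ would otherwise make NA unhashable.
    __hash__ = object.__hash__

    def __bool__(self):
        # Refuse to silently pick True or False in an if-statement.
        raise TypeError("truth value of NA is ambiguous")


NA = NAType()


def is_na(x):
    """Identity-based missingness test, in the spirit of R's is.na."""
    return x is NA


print(NA == 1, 1 == NA, NA == NA)   # NA NA NA
print(is_na(NA), is_na(1))          # True False
```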
Julia's rationals, which allow only limited precision, are a reasonable compromise.
From the R help doc:
> NA is a logical constant of length 1 which contains a missing value indicator. NA can be coerced to any other vector type except raw.
> There are also constants NA_integer_, NA_real_, NA_complex_ and NA_character_ of the other atomic vector types which support missing values: all .. are reserved words in the R language.
> The generic function is.na indicates which elements are missing.
> The generic function is.na<- sets elements to NA.
<key point is that R has 5 different (and differently represented) NA values, one for each core type>
<they look alike, which has caused some confusion, but it allows NA everywhere in a type-correct way>