On Friday, June 24, 2016 at 23:09 -0700, Jessica Koh wrote:
> I actually really agree with this! Does it mean we need to change the
> existing function's source code to deal with the problem as you
> suggest?
The code for cov() will have to be more complex than the sum()
illustration Andreas gave: to compute the covariance, you need to skip
all observations for which either of the two variables is missing. This
gets even more complex when computing covariances between columns of
matrices, since you need to decide whether to skip every row with at
least one missing value (listwise deletion), or to use a different
subset of rows for each pair of columns (pairwise deletion).
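To make the difference concrete, here is a minimal sketch of both strategies, written against Julia's `missing` value (which later replaced `Nullable` for this purpose). `cov_skipmissing`, `cov_listwise` and `cov_pairwise` are hypothetical helpers, not functions from an existing package, and they use the usual n - 1 denominator.

```julia
# Hypothetical helpers (not a library API), written against Julia's `missing`.

# Covariance of two vectors over the observations where neither value is missing.
function cov_skipmissing(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    idx = [i for i in eachindex(x, y) if !ismissing(x[i]) && !ismissing(y[i])]
    n = length(idx)
    n > 1 || return NaN  # not enough complete pairs to estimate a covariance
    mx = sum(x[i] for i in idx) / n
    my = sum(y[i] for i in idx) / n
    return sum((x[i] - mx) * (y[i] - my) for i in idx) / (n - 1)
end

# Listwise deletion: drop every row with at least one missing value, so all
# entries of the matrix are computed on the same subset of rows.
function cov_listwise(X::AbstractMatrix)
    rows = [r for r in axes(X, 1) if !any(ismissing, X[r, :])]
    p = size(X, 2)
    C = Matrix{Float64}(undef, p, p)
    for j in 1:p, k in 1:p
        C[j, k] = cov_skipmissing(X[rows, j], X[rows, k])
    end
    return C
end

# Pairwise deletion: each pair of columns uses its own set of complete rows.
function cov_pairwise(X::AbstractMatrix)
    p = size(X, 2)
    C = Matrix{Float64}(undef, p, p)
    for j in 1:p, k in 1:p
        C[j, k] = cov_skipmissing(X[:, j], X[:, k])
    end
    return C
end

X = [1.0 2.0; 2.0 4.0; missing 6.0; 4.0 missing]
cov_listwise(X)[1, 1]  # 0.5: only rows 1 and 2 are complete
cov_pairwise(X)[1, 1]  # about 2.33 (7/3): column 1 is observed in rows 1, 2 and 4
```

In this small example the variance of the first column changes from 0.5 to about 2.33 depending on the strategy, so the choice is not just an implementation detail.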
Alternatively, I wonder whether this problem could be solved using
special pseudo-weight types. This would allow sharing code with the
weighted covariance function: a special weights type would simply be
passed, with weight 1 for non-missing observations and 0 for missing
ones. These weights could be a custom (internal) number type for which
0 * NULL returns 0, so that the corresponding observations are skipped.
Anyway, one will need to experiment with these approaches in practice
to see whether they would work.
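A minimal sketch of the pseudo-weights idea, assuming Julia's `missing` rather than `Nullable`: the hypothetical `MaskWeight` type plays the role of the internal number type, with `MaskWeight(false) * missing` returning 0, so that a single hypothetical `weighted_cov` serves both ordinary weights and the 0/1 pseudo-weights. The frequency-weight style correction (dividing by the total weight minus 1) is an assumption chosen so that 0/1 weights reproduce the unweighted sample covariance.

```julia
# Hypothetical pseudo-weight type: `true` (weight 1) for an observed pair,
# `false` (weight 0) for a pair where x or y is missing.
struct MaskWeight
    observed::Bool
end

# The key rule: a zero weight times `missing` gives 0, so masked
# observations drop out of every weighted sum instead of poisoning it.
Base.:*(w::MaskWeight, x::Missing) = w.observed ? missing : 0.0
Base.:*(w::MaskWeight, x::Number) = w.observed ? float(x) : 0.0

# Numeric value of a weight, for ordinary weights and for MaskWeight.
weightvalue(w::Real) = float(w)
weightvalue(w::MaskWeight) = w.observed ? 1.0 : 0.0

# Weighted covariance written once (assumed frequency-weight correction W - 1);
# it works unchanged for plain numeric weights and for MaskWeight.
function weighted_cov(x::AbstractVector, y::AbstractVector, w::AbstractVector)
    W = sum(weightvalue, w)
    mx = sum(w[i] * x[i] for i in eachindex(x, y, w)) / W
    my = sum(w[i] * y[i] for i in eachindex(x, y, w)) / W
    return sum(w[i] * ((x[i] - mx) * (y[i] - my)) for i in eachindex(x, y, w)) / (W - 1)
end

# Build the 0/1 pseudo-weights for a pair of vectors (pairwise deletion).
maskweights(x, y) = [MaskWeight(!ismissing(a) && !ismissing(b)) for (a, b) in zip(x, y)]

x = [1.0, 2.0, missing, 4.0]
y = [2.0, 4.0, 6.0, missing]
weighted_cov(x, y, maskweights(x, y))                    # 1.0, from rows 1 and 2 only
weighted_cov([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], ones(3))  # 1.0, ordinary weights
```

For listwise deletion on a matrix, one would instead build a single mask from all columns and pass it for every pair.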
Regards
> > It would be great if we could come up with a solution where the
> > NA/Nullable handling wouldn't have to be hard-coded in a specific
> > statistical function, say cov. It's early and I haven't had coffee
> > yet, so the idea is probably flawed, but in general it might be
> > useful to use a dedicated `Accumulator` type when doing
> > accumulations; e.g. a sum would be something like
> >
> > function sum(x::AbstractVector)
> >     acc = Acc{typeof(zero(eltype(x)) + zero(eltype(x)))}(0)
> >     for xx in x
> >         acc !+ xx
> >     end
> >     return acc
> > end
> >
> > Then, instead of specifying the NA handling for every statistical
> > function, it would be a matter of defining something like
> > `(!+)(x::Acc, y::Nullable) = x` to "remove" the effect of NAs in
> > the accumulation. Of course, you don't always want to remove NAs, so
> > this would have to be adjustable. What kind of functionality exists
> > in NullableArrays for handling Nullable in different ways?
> >
> > The original reason I started to consider the accumulator type
> > was to have a way of handling memory reuse, e.g. for BigFloats and
> > JuMP expressions, but maybe it could also be useful for NA/Nullable
> > handling.
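A minimal runnable sketch of this accumulator idea, assuming modern Julia's `missing` (which later replaced `Nullable` for this purpose). Since `!+` is not an existing Julia infix operator, a hypothetical `add!` function stands in for it, and the "adjustable" part is a hypothetical policy type parameter on the accumulator; `Acc`, `add!`, the policy types and `mysum` are all illustrative names, not an existing API.

```julia
# Hypothetical names throughout; a sketch of the idea, not a package API.

# Policies deciding what an accumulator does when it meets a missing value.
abstract type MissingPolicy end
struct SkipMissingPolicy <: MissingPolicy end
struct PropagateMissingPolicy <: MissingPolicy end

# A running total whose missing-value handling is fixed by its type parameter P.
mutable struct Acc{T,P<:MissingPolicy}
    value::Union{T,Missing}
end
Acc(::Type{T}, ::P) where {T,P<:MissingPolicy} = Acc{T,P}(zero(T))

# Ordinary values are added to the running total (a missing total stays missing).
add!(acc::Acc, x) = (acc.value += x; acc)
# Under the skip policy, a missing value leaves the total unchanged...
add!(acc::Acc{T,SkipMissingPolicy}, ::Missing) where {T} = acc
# ...while under the propagate policy it makes the result missing.
add!(acc::Acc{T,PropagateMissingPolicy}, ::Missing) where {T} = (acc.value = missing; acc)

# The reduction is written once; missing handling comes from dispatch on add!.
function mysum(x::AbstractVector, policy::MissingPolicy = SkipMissingPolicy())
    T = nonmissingtype(eltype(x))
    acc = Acc(T, policy)
    for xx in x
        add!(acc, xx)
    end
    return acc.value
end

mysum([1, missing, 3])                            # 4
mysum([1, missing, 3], PropagateMissingPolicy())  # missing
```

Putting the policy in the accumulator's type means a statistical function only calls `add!`, and the caller chooses the NA behaviour.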
> >
> >
> > On Wed, Jun 8, 2016 at 4:42 AM, Milan Bouchet-Valat