Hi!
I've been looking at ways to get things like [1, NA] to create DataArrays. Currently this triggers an error because there is a promote rule [1] (with a skeptical comment in the code) under which NAtype is promoted to any type it is combined with. Since there is no convert() method from NAtype to that type (and there cannot be), this fails.
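To make the failure concrete, here is a minimal sketch written in current Julia syntax rather than the version this post targets. NAtype and NA below are local stand-ins, not the DataArrays definitions, and the promote rule only mimics the shape described above (NAtype promoted to whatever it is combined with).

```julia
# Local stand-ins for illustration only (not the DataArrays.jl definitions).
struct NAtype end
const NA = NAtype()

# A rule of the shape described above: NAtype promotes to the other type.
Base.promote_rule(::Type{NAtype}, ::Type{T}) where {T} = T

# [1, NA] promotes both elements to Int and then needs convert(Int, NA),
# which has no method, so building the array throws.
err = try
    [1, NA]
catch e
    e
end

println(typeof(err))
```

The promotion step itself succeeds; it is the element-by-element conversion afterwards that has nothing to call.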
I know that one can currently write @data [1, NA] to get the intended behavior, and this works very well. Still, I'm wondering whether we can do better, since [1, NA] not working at all is really disturbing.
So far I have found one possible design that would allow this behavior in a (I think) Julian fashion. A new type could be created that is similar to DataArray but holds a single value or NA; let's call it DataValue:
    type DataValue{T}
        value::T
        isna::Bool
    end
Any combination of NA and another type would be promoted to a DataValue of that type, with a very simple "conversion":
    import Base.convert
    convert{T}(::Type{DataValue{T}}, x::T) = DataValue{T}(x, false)
    convert{T}(::Type{DataValue{T}}, x::NAtype) = DataValue{T}(0, true)
(With this definition, DataValue{T}(0, true) only works when 0 can be converted to T, as with Int. I couldn't find a generic way of creating a placeholder value of any type, but one surely exists...)
But of course this does not yet allow creating DataArrays:
    julia> [1, NA]
    2-element Array{DataValue{Int64},1}:
     DataValue{Int64}(1,false)
     DataValue{Int64}(0,true)
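For what it's worth, here is one way the generic placeholder could be handled, sketched in current Julia syntax (struct instead of type, where clauses for type parameters). This is only an assumption about how the design might be filled in: the value field is widened to Union{T, Nothing} so that `nothing` can stand in for any T, and NAtype and NA are local stand-ins rather than the DataArrays definitions.

```julia
# Local stand-ins for illustration only (not the DataArrays.jl definitions).
struct NAtype end
const NA = NAtype()

# Same idea as DataValue above, except that the value slot may hold
# `nothing`, which serves as a placeholder for any T when isna is true.
struct DataValue{T}
    value::Union{T, Nothing}
    isna::Bool
end

# Combining NA with a T promotes to DataValue{T}.
Base.promote_rule(::Type{NAtype}, ::Type{T}) where {T} = DataValue{T}

# The two "conversions", with no Int-specific placeholder.
Base.convert(::Type{DataValue{T}}, x::T) where {T} = DataValue{T}(x, false)
Base.convert(::Type{DataValue{T}}, ::NAtype) where {T} = DataValue{T}(nothing, true)

ints = [1, NA]
strs = ["a", NA]
println(eltype(ints))
println(eltype(strs))
```

The isna flag becomes redundant with the `nothing` check in this variant; it is kept here only to stay close to the type proposed above.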
Looking at how [ ] and thus vcat() work, it appears that cat(1, ...) at abstractarray.jl:792 is called, and that the type promotion happens there. So I figured a solution would be to make cat() call another method just after it has determined the type to which the elements being concatenated will be promoted. This would let DataArrays hook into the process at that point, using a specific method for DataValues that converts them into a proper DataArray. NAs would then be handled in a very integrated way, without hardcoding them in core Julia at all.
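The hook could look something like the sketch below, again in current Julia syntax. Everything in it is hypothetical: finish_cat stands for the extra method that cat() would call once the promoted element type is known, MaskedVector is a bare-bones stand-in for DataArray (a value vector plus an NA mask), and DataValue uses `nothing` as an assumed placeholder for the NA case.

```julia
# Single value or NA; `nothing` is an assumed placeholder for the NA case.
struct DataValue{T}
    value::Union{T, Nothing}
    isna::Bool
end

# Hypothetical stand-in for DataArray: values plus a mask marking NAs.
struct MaskedVector{T}
    data::Vector{T}
    na::BitVector
end

# Hypothetical hook: by default, the concatenated array is returned as is.
finish_cat(a::AbstractArray) = a

# A package owning DataValue adds a method that repacks the elements.
function finish_cat(a::Vector{DataValue{T}}) where {T}
    data = Vector{T}(undef, length(a))  # NA slots are left unset
    na = falses(length(a))
    for (i, x) in enumerate(a)
        if x.isna
            na[i] = true
        else
            data[i] = x.value
        end
    end
    return MaskedVector{T}(data, na)
end

plain = finish_cat([1, 2, 3])
packed = finish_cat([DataValue{Int}(1, false), DataValue{Int}(nothing, true)])
println(packed.na)
```

Because the default method just returns its argument, ordinary arrays would be unaffected, and only the package that defines DataValue needs to know about NAs.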
Creating DataValues just to combine them a moment later adds overhead, but the [ ] syntax is mostly useful for small arrays, since all the values have to be written out literally (concatenating DataArrays is still handled differently, of course). This syntax is particularly useful when presenting examples and teaching, where showing that NAs are handled consistently in the language is very important for convincing people.
Does that sound reasonable, or am I totally on crack?
Regards
[1] https://github.com/JuliaStats/DataArrays.jl/blob/master/src/natype.jl#L36