Ignoring NaNs


Roger Herikstad

Feb 5, 2014, 11:11:45 PM
to julia...@googlegroups.com
Hi,
Are there equivalent functions to Matlab's nanmean and nanstd, i.e. functions for computing the mean and standard deviation while ignoring NaNs? It's simple to put something together, of course, e.g.

function nanmean(x)
 mean(~isnan(x))
end

but it would be nice to have this as part of Base, or perhaps StatsBase?

~ Roger

John Myles White

Feb 5, 2014, 11:14:32 PM
to julia...@googlegroups.com
We could do this in StatsBase. In general, our approach has been to encourage development of tools for working with missing data in a more generic way than using NaN, which is what DataArrays are meant for. But NaNs are a totally reasonable tool when you're willing to commit to floating point arithmetic, so we might want to add some flags to the functions in StatsBase to do this.

-- John

John Myles White

Sep 7, 2014, 1:45:28 PM
to julia...@googlegroups.com
I think we’re still not really interested in promoting the use of NaN as a surrogate for NULL, especially given that Nullable is going to be added to Base in 0.4.

Your functions would perform substantially better if you iterated over the values of A. For example,

function nanmean(A::Array)
    s, n = 0.0, 0
    for val in A
        if !isnan(val)
            s += val
            n += 1
        end
    end
    return s / n
end
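
The same iteration pattern carries over to the standard deviation. What follows is only a sketch, assuming Julia 1.x syntax; `nanstd_loop` is a hypothetical name rather than anything in Base or StatsBase, and it uses two passes (mean first, then squared deviations) with the sample (n - 1) denominator.

```julia
# Sketch only: `nanstd_loop` is a hypothetical name, written for Julia 1.x.
function nanstd_loop(A::AbstractArray{<:AbstractFloat})
    # First pass: sum and count of the non-NaN values.
    s, n = 0.0, 0
    for val in A
        if !isnan(val)
            s += val
            n += 1
        end
    end
    # Fewer than two valid values: the sample standard deviation is undefined.
    n < 2 && return NaN
    m = s / n

    # Second pass: sum of squared deviations from the mean.
    ss = 0.0
    for val in A
        if !isnan(val)
            ss += (val - m)^2
        end
    end
    return sqrt(ss / (n - 1))  # sample standard deviation (n - 1 denominator)
end
```

No temporary arrays are allocated, which is where the speedup over the masked-array versions would come from.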

 — John

On Sep 7, 2014, at 10:36 AM, Alex <holli...@gmail.com> wrote:

I know it's a little late, but I was looking for the same thing and couldn't find it. I've made some slight adjustments to some code I found on github and made functions for nanmean and nanstd. I did not optimize for performance and wanted them to be able to handle arrays of various sizes.  

NANMEAN:

function nanmean(x::Array)

  z=similar(x)
  fill!(z,1)
  z[isnan(x)]=0
  numb_not_NaN_in_x=sum(z)

  nansum_x=sum(x) do x isnan(x) ? 0 : x end #from https://gist.github.com/milktrader/5213361
  nansum_x/numb_not_NaN_in_x

end


NANSTD
function nanstd(x::Array)

  z=similar(x)
  fill!(z,1)
  z[isnan(x)]=0
  numb_not_NaN_in_x=sum(z)

  nansum_x=sum(x) do x isnan(x) ? 0 : x end #from https://gist.github.com/milktrader/5213361
  nanmean_x=nansum_x/numb_not_NaN_in_x

  y=(x-nanmean_x).*(x-nanmean_x)

  ## NanMean for Sample
  function nanmean_sample(y::Array)
    w=similar(y)
    fill!(w,1)
    w[isnan(y)]=0
    numb_not_NaN_in_y=sum(w)

    nansum_y=sum(y) do y isnan(y) ? 0 : y end #from https://gist.github.com/milktrader/5213361
    nansum_y/(numb_not_NaN_in_y-1)

  end

  nanstd_x=sqrt(nanmean_sample(y))

end
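
For comparison, the two functions above can be written much more compactly in current Julia. This is a sketch assuming Julia 1.x, where `mean` and `std` live in the `Statistics` standard library; the names `nanmean_filtered` and `nanstd_filtered` are hypothetical. `filter` flattens a matrix to a vector of the kept values, so arrays of any shape are accepted, and `std` uses the n - 1 denominator by default, matching the `nanmean_sample` step.

```julia
using Statistics  # standard library in Julia 1.x; provides mean and std

# Hypothetical helper names; not part of Base or StatsBase.
# filter(!isnan, x) returns a Vector holding only the non-NaN entries.
nanmean_filtered(x) = mean(filter(!isnan, x))
nanstd_filtered(x) = std(filter(!isnan, x))  # n - 1 denominator by default

x = [1.0 2.0 NaN; 4.0 NaN 6.0]
m = nanmean_filtered(x)   # mean of 1, 2, 4, 6
sd = nanstd_filtered(x)   # sample standard deviation of 1, 2, 4, 6
```

This version still allocates a temporary vector, so it trades some speed for brevity.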

Alex

Sep 7, 2014, 1:47:41 PM
to julia...@googlegroups.com
Hi There, 

I know it's a little late, but I've had the same need for nanmean and nanstd, and when I stumbled across this I noticed that your example function will not give the same answer as Matlab. For example,

x= [1 2 NaN]
 
will produce the answer 1.5 in Matlab, but 2/3 using your function, since it computes the fraction of values that are not NaN rather than the mean of the non-NaN values.
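
Both numbers can be reproduced directly. This is a small sketch in Julia 1.x syntax, where the elementwise mask is written `.!isnan.(x)` instead of `~isnan(x)`; the variable names are only illustrative.

```julia
using Statistics  # standard library in Julia 1.x; provides mean

x = [1 2 NaN]                # 1x3 matrix of Float64

mask = .!isnan.(x)           # [true true false]
frac_valid = mean(mask)      # 2/3: fraction of entries that are not NaN
mean_valid = mean(x[mask])   # 1.5: mean of the non-NaN entries 1.0 and 2.0
```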

I've written a nanmean and a nanstd function based off of some code I found on github for nansum. I have not optimized it for performance, but I wanted it to be able to handle arrays of various sizes.
Best, 

Alex


On Wednesday, February 5, 2014 9:11:45 PM UTC-7, Roger Herikstad wrote:

Alex

Sep 7, 2014, 1:53:35 PM
to julia...@googlegroups.com
Thanks for the tip John. I'll add that feature to my functions!