Ignoring NaNs

Roger Herikstad

Feb 5, 2014, 11:11:45 PM

to julia...@googlegroups.com

Hi,

Are there equivalent functions to Matlab's nanmean and nanstd, i.e. functions for computing the mean and standard deviation while ignoring NaNs? It's simple to put something together, of course, e.g.

function nanmean(x)
    mean(~isnan(x))
end

but it would be nice to have this as part of Base, or perhaps StatsBase?

~ Roger

John Myles White

Feb 5, 2014, 11:14:32 PM

to julia...@googlegroups.com

We could do this in StatsBase. In general, our approach has been to encourage development of tools for working with missing data in a more generic way than using NaN, which is what DataArrays are meant for. But NaNs are a totally reasonable tool when you're willing to commit to floating point arithmetic, so we might want to add some flags to the functions in StatsBase to do this.

-- John

Message has been deleted

John Myles White

Sep 7, 2014, 1:45:28 PM

to julia...@googlegroups.com

I think we’re still not really interested in promoting the use of NaN as a surrogate for NULL, especially given that Nullable is going to be added to Base in 0.4.

Your functions would perform substantially better if you iterated over the values of A. For example,

function nanmean(A::Array)
    s, n = 0.0, 0
    for val in A
        if !isnan(val)
            s += val
            n += 1
        end
    end
    return s / n
end
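
The same loop pattern can be carried over to the standard deviation. The following is only a minimal sketch, not code from this thread: the name nanstd_loop is hypothetical, and it assumes a simple two-pass computation with the sample (n - 1) normalization that Matlab's nanstd uses by default.

```julia
# Hypothetical helper, not part of Base or StatsBase.
function nanstd_loop(A::AbstractArray)
    # First pass: sum and count of the values that are not NaN.
    s, n = 0.0, 0
    for val in A
        if !isnan(val)
            s += val
            n += 1
        end
    end
    m = s / n

    # Second pass: sum of squared deviations from that mean.
    ss = 0.0
    for val in A
        if !isnan(val)
            ss += (val - m)^2
        end
    end

    # Assumption: sample normalization (n - 1), as in Matlab's default.
    return sqrt(ss / (n - 1))
end
```

With fewer than two non-NaN values the division yields NaN, so callers that need a different convention would have to check the count themselves.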

— John

On Sep 7, 2014, at 10:36 AM, Alex <holli...@gmail.com> wrote:

I know it's a little late, but I was looking for the same thing and couldn't find it. I've made some slight adjustments to some code I found on github and made functions for nanmean and nanstd. I did not optimize for performance and wanted them to be able to handle arrays of various sizes.

NANMEAN:

function nanmean(x::Array)
    z=similar(x)
    fill!(z,1)
    z[isnan(x)]=0
    numb_not_NaN_in_x=sum(z)

    nansum_x=sum(x) do x  #from https://gist.github.com/milktrader/5213361
        isnan(x) ? 0 : x
    end
    nansum_x/numb_not_NaN_in_x
end

NANSTD:

function nanstd(x::Array)
    z=similar(x)
    fill!(z,1)
    z[isnan(x)]=0
    numb_not_NaN_in_x=sum(z)

    nansum_x=sum(x) do x  #from https://gist.github.com/milktrader/5213361
        isnan(x) ? 0 : x
    end
    nanmean_x=nansum_x/numb_not_NaN_in_x

    y=(x-nanmean_x).*(x-nanmean_x)

    ## NanMean for Sample
    function nanmean_sample(y::Array)
        w=similar(y)
        fill!(w,1)
        w[isnan(y)]=0
        numb_not_NaN_in_y=sum(w)

        nansum_y=sum(y) do y  #from https://gist.github.com/milktrader/5213361
            isnan(y) ? 0 : y
        end
        nansum_y/(numb_not_NaN_in_y-1)
    end

    nanstd_x=sqrt(nanmean_sample(y))
end

Alex

Sep 7, 2014, 1:47:41 PM

to julia...@googlegroups.com

Hi There,

I know it's a little late, but I've had the same need for nanmean and nanstd, and when I stumbled across this I noticed that your function example will not provide the same answer as Matlab. For example,

x= [1 2 NaN]

will produce the answer of 1.5 in Matlab, but 2/3 using your function, as it seems to compute the fraction of values that are not NaN rather than their mean.
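
To make the difference concrete, here is a small sketch of the two quantities. The variable names are only illustrative, and it uses a plain vector and sum/length rather than mean so that nothing beyond basic functions is assumed.

```julia
x = [1.0, 2.0, NaN]

# Averaging the mask itself: this is the fraction of entries that are not NaN.
mask = map(v -> !isnan(v), x)            # true, true, false
frac_not_nan = sum(mask) / length(mask)  # 2 of 3 entries

# Averaging the values that survive the mask: this is what nanmean should return.
kept = filter(v -> !isnan(v), x)         # 1.0 and 2.0
mean_not_nan = sum(kept) / length(kept)  # (1.0 + 2.0) / 2
```

The first quantity depends only on how many NaNs there are, while the second depends on the remaining values, which is why the two results disagree.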

I've written a nanmean and a nanstd function based off of some code I found on github for nansum. I have not optimized it for performance, but I wanted it to be able to handle arrays of various sizes.

Best,

Alex

On Wednesday, February 5, 2014 9:11:45 PM UTC-7, Roger Herikstad wrote:

Alex

Sep 7, 2014, 1:53:35 PM

to julia...@googlegroups.com

Thanks for the tip John. I'll add that feature to my functions!
