Count non-null values in a pandas DataFrame or Series

Abhishek Pratap

Dec 14, 2012, 2:43:46 PM
to pyd...@googlegroups.com
I thought this would be straightforward with .count(), but I am not sure how it ignores NA/NaN values.

s = pandas.Series([1,2,3,4,5,6,'NA'])
s.count()
7

s = pandas.Series([1,2,3,4,5,6,'NaN'])
s.count()
7

**would expect 6**

I guess I am not sure how it recognizes null values. All the columns in the data frame have null values; I just need a count of the non-null values.

Thanks!
-Abhi

Adam Hughes

Dec 14, 2012, 3:49:50 PM
to pyd...@googlegroups.com
Hi,

First, if you are using 'NA' or 'NaN' in quotes, these are ordinary strings and not NaNs, so count() treats them as valid values. NaN is a special floating-point value that numpy exposes as np.nan, so you can do:

import numpy as np
from pandas import Series
In [36]: s = Series([1, 2, 3, 4, 5, 6, np.nan])

In [37]: s
Out[37]: 
0     1
1     2
2     3
3     4
4     5
5     6
6   NaN

When the NaNs are properly understood, s.count() will ignore them by default.  Notice my Series s has 7 elements (0-6), so its len() evaluates to 7.

In [50]: len(s)
Out[50]: 7

In [51]: s.count()
Out[51]: 6
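
If the Series already holds the string 'NA', as in the original question, one option is to coerce the values to numbers before counting. This is only a sketch: pd.to_numeric with errors='coerce' is my suggestion rather than something from the posts above, and the small DataFrame is made up for illustration.

```python
import pandas as pd

# The string 'NA' is an ordinary Python object, so count() includes it.
s = pd.Series([1, 2, 3, 4, 5, 6, 'NA'])
raw_count = s.count()

# Coerce anything that cannot be parsed as a number to NaN, then count again.
clean = pd.to_numeric(s, errors='coerce')
clean_count = clean.count()

# On a DataFrame, count() returns the non-null count for each column.
# (Made-up data; None is treated as a missing value.)
df = pd.DataFrame({'a': [1, None, 3], 'b': ['x', 'y', None]})
per_column = df.count()

print(raw_count, clean_count)
print(per_column)
```

Note that coercion turns every non-numeric string into NaN, not just 'NA', so it only fits columns that are meant to be numeric.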


If you are reading in data from a file using pandas (from_csv(), or read_csv() in current versions), there is a special parameter you can pass in (na_values in read_csv()) so that certain character strings (like whitespace, 0, etc.) are converted to NaNs.
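
As a rough illustration of that parameter, here is a sketch using read_csv() and its na_values argument. The CSV text and the '-' and 'missing' markers are invented for the example, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical file contents; '-' and 'missing' are made-up missing-value markers.
csv_text = "x,y\n1,a\n-,b\n3,missing\nNA,d\n"

# na_values adds extra strings to the set that read_csv() converts to NaN.
# By default, strings such as 'NA' and 'NaN' in the file are already converted.
df = pd.read_csv(io.StringIO(csv_text), na_values=['-', 'missing'])

# Non-null count per column.
counts = df.count()
print(counts)
```

With the missing markers converted on load, count() on the resulting DataFrame should give the non-null totals directly.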