Mean of consecutive days in an array?

39 views
Skip to first unread message

Bruce Godfrey

unread,
Feb 14, 2013, 4:21:38 PM2/14/13
to py...@googlegroups.com
This should probably be obvious to me but I can't seem to find a solution.  

In the example below I get the air temperature for one day for each year of 10 years.

What's the most efficient way to get the mean for 10 consecutive days for each year of 10 years?  I'd like to get the mean of days 0-10 for each year of the decade.

#####
from pydap.client import open_url
import numpy as np
import matplotlib.pyplot as plt
import os


tmp = dataset['air_temperature']
var = tmp.array[0:3649:365,int(100),int(100)]
var1d = var.flatten()

tm = dataset['time']
timed = tm[0:3649:365,int(100),int(100)]

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot_date(x=timed,y=var1d,fmt="b-")
plt.savefig("graph.png")
os.startfile("graph.png")
#####

Thanks,
-Bruce

James Hiebert

unread,
Feb 14, 2013, 4:43:49 PM2/14/13
to py...@googlegroups.com
Just use numpy.mean...

my_means = np.array( [ np.mean(x[i*365:i*365+10,:,:], 0) for i in range(10) ] )

The second argument to mean is "axis", so it will mean the time axis, but leave the x and y axes alone. That will give you an array that's 10x100x100 (so you'll need to make multiple plots). Obviously, watch out for leap years or caledars with days != 365.

~James
> --
> You received this message because you are subscribed to the Google Groups "pydap" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pydap+un...@googlegroups.com.
> To post to this group, send email to py...@googlegroups.com.
> Visit this group at http://groups.google.com/group/pydap?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

--
James Hiebert
Lead, Computational Support
Pacific Climate Impacts Consortium
http://www.pacificclimate.org
Room 112, University House 1, University of Victoria
PO Box 1700 Sta CSC, Victoria, BC V8V 2Y2
E-mail: hie...@uvic.ca
Tel: (250) 472-4521
Fax: (250) 472-4830

Roberto De Almeida

unread,
Feb 15, 2013, 8:39:15 AM2/15/13
to py...@googlegroups.com
On Thu, Feb 14, 2013 at 11:43 PM, James Hiebert <ja...@hiebert.name> wrote:
my_means = np.array( [ np.mean(x[i*365:i*365+10,:,:], 0) for i in range(10) ] )

The second argument to mean is "axis", so it will mean the time axis, but leave the x and y axes alone. That will give you an array that's 10x100x100 (so you'll need to make multiple plots). Obviously, watch out for leap years or calendars with days != 365.

This is how I'd do it too. In this case the dataset has a "noleap" calendar, so no need to worry about leap years.

--Rob



--
Roberto De Almeida, PhD

Bruce Godfrey

unread,
Feb 19, 2013, 12:37:39 PM2/19/13
to py...@googlegroups.com, rob...@dealmeida.net
James, Rob,

Thank you very much.  That works great on the dataset I am testing above.

One follow-up question.  On an opendap server I have setup in a development environment I'm trying the same code against an aggregated dataset.  The aggregated dataset encompasses 6 .nc files (55 years of data rather than 10 years).  When I run the same code I get the error:

ServerError: 'Server error 403: "Request too big=18900.0 Mbytes, max=500.0"'

I could increase the "max" parameter on the server but this could get quite large.  I was wondering if I should be using another approach for coding this type of request?  If I just want to get a single value for each year it works fine like this with no size error: 
var = tmp.array[0:18250:365,int(100),int(100)]

It's just when I try to average the 10-day window that I get the error:
var = np.array( [ np.mean(tmp[i*365:i*365+10,int(100),int(100)], 0) for i in range(10) ] ) 

Do you recommend a different approach?

Thanks,
-Bruce

Roberto De Almeida

unread,
Feb 19, 2013, 1:19:14 PM2/19/13
to py...@googlegroups.com
On Tue, Feb 19, 2013 at 7:37 PM, Bruce Godfrey <brg...@gmail.com> wrote:
ServerError: 'Server error 403: "Request too big=18900.0 Mbytes, max=500.0"'

I could increase the "max" parameter on the server but this could get quite large.  I was wondering if I should be using another approach for coding this type of request?  If I just want to get a single value for each year it works fine like this with no size error: 
var = tmp.array[0:18250:365,int(100),int(100)]

It's just when I try to average the 10-day window that I get the error:
var = np.array( [ np.mean(tmp[i*365:i*365+10,int(100),int(100)], 0) for i in range(10) ] ) 

Do you recommend a different approach?

This is strange, because the first request has size 18250/365, 1, 1 = (50, 1, 1)
While the second does 10 requests of size (10, 1, 1), so individual requests are smaller!

You can try this:

var = np.mean( np.array([ tmp[i:18250:365,100,100] for i in range(10) ]), axis=0 )
 
This will create an array of size (10, 50, 1, 1), corresponding to 10 days, 50 years and a point location. We then average across the 10 first days, resulting on a (50, 1, 1) array.

Let me know if it works.

Thanks,
--Rob
 

 
Thanks,
-Bruce

--
You received this message because you are subscribed to the Google Groups "pydap" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydap+un...@googlegroups.com.
To post to this group, send email to py...@googlegroups.com.
Visit this group at http://groups.google.com/group/pydap?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--

Bruce Godfrey

unread,
Feb 19, 2013, 2:50:09 PM2/19/13
to py...@googlegroups.com, rob...@dealmeida.net
You can try this:

var = np.mean( np.array([ tmp[i:18250:365,100,100] for i in range(10) ]), axis=0 )
 
This will create an array of size (10, 50, 1, 1), corresponding to 10 days, 50 years and a point location. We then average across the 10 first days, resulting on a (50, 1, 1) array.

Let me know if it works.

Thanks,
--Rob

 Hi Rob, I end up getting the same error using that syntax as well.  It is strange.  This is a THREDDS server I'm testing against.  I wonder if there might be something buggy with THREDDS trying to determine the size of the request.  The request works with the pydap server for a file with 1 decade worth of data.  If I put the identical file on the THREDDS server and make an identical request I get the "Request too big" message.  

-Bruce 

Roberto De Almeida

unread,
Feb 20, 2013, 4:11:11 AM2/20/13
to py...@googlegroups.com
Indeed, this looks like a bug. Each request has only 200 bytes!

I think the problem is in the way the THREDDS server aggregates the data. It seems that it's trying to request the entire dataset; 18900.0 Mbytes is exactly its size (55y * 365d * 442lat * 523lon * 4bytes). 

Can you send me your script so I can take a look?

Thanks,
--Rob



-Bruce 

--
You received this message because you are subscribed to the Google Groups "pydap" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydap+un...@googlegroups.com.
To post to this group, send email to py...@googlegroups.com.
Visit this group at http://groups.google.com/group/pydap?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Bruce Godfrey

unread,
Feb 20, 2013, 11:44:48 AM2/20/13
to py...@googlegroups.com, rob...@dealmeida.net


On Wednesday, February 20, 2013 1:11:11 AM UTC-8, Roberto De Almeida wrote:
Indeed, this looks like a bug. Each request has only 200 bytes!

I think the problem is in the way the THREDDS server aggregates the data. It seems that it's trying to request the entire dataset; 18900.0 Mbytes is exactly its size (55y * 365d * 442lat * 523lon * 4bytes). 

Can you send me your script so I can take a look?

Thanks,
--Rob



Hi Rob, here is the code I've been testing.  

This works for either dataset:  var = ds.array[0:18250:365,int(100),int(100)]

This works for the dataset at "cloud.insideidaho" but not "inside-dev1":  var = np.array( [ np.mean(ds[i*365:i*365+9,int(100),int(100)], 0) for i in range(10) ] ) 

"inside-dev1" is behind a firewall at this point.  

#####
from pydap.client import open_url
import numpy as np
import matplotlib.pyplot as plt
import os


ds = dataset['specific_humidity']

#var = ds.array[0:18250:365,int(100),int(100)]
var = np.array( [ np.mean(ds[i*365:i*365+9,int(100),int(100)], 0) for i in range(10) ] ) 
var1d = var.flatten()

tm = dataset['time']
timed = tm[0:18250:365,int(100),int(100)]

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot_date(x=timed,y=var1d,fmt="b-")
plt.savefig("graph.png")
os.startfile("graph.png")
#####

-Bruce

Bruce Godfrey

unread,
Feb 21, 2013, 4:29:02 PM2/21/13
to py...@googlegroups.com, rob...@dealmeida.net
Just an additional note.  If I edit the THREDDS configuration file increasing the size of the request limit I then get this error when I run the script: 

Traceback (most recent call last):
  File "C:\temp\REACCH\improveSpeedFeb\specific_humidity.py", line 14, in <module>
    var = np.array( [ np.mean(ds[i*365:i*365+9,int(100),int(100)], 0) for i in range(10) ] )
  File "C:\Python27\ArcGIS10.1\lib\site-packages\pydap-3.1.rc1-py2.7.egg\pydap\model.py", line 662, in __getitem__
    var.data = var.data[slice_]
  File "C:\Python27\ArcGIS10.1\lib\site-packages\pydap-3.1.rc1-py2.7.egg\pydap\proxy.py", line 118, in __getitem__
    data = data2 = DapUnpacker(xdrdata, dataset).getvalue()
  File "C:\Python27\ArcGIS10.1\lib\site-packages\pydap-3.1.rc1-py2.7.egg\pydap\xdr.py", line 119, in getvalue
    out.append(self.getvalue())
  File "C:\Python27\ArcGIS10.1\lib\site-packages\pydap-3.1.rc1-py2.7.egg\pydap\xdr.py", line 146, in getvalue
    out.shape = self.var.shape
ValueError: total size of new array must be unchanged


Reply all
Reply to author
Forward
0 new messages