Some doubts on cube slicing

243 views
Skip to first unread message

Alejandro Villamarin

unread,
Jan 23, 2014, 4:15:12 AM1/23/14
to scitoo...@googlegroups.com
Hi,

I'm having some issues with data extraction from a cube.

I'm opening a GRIB2 file from NOAA (you can find it here: https://copy.com/9DnUVd2W9zkl) , that if I use wgrib2 with it, I get the following information (only showing 10 records of 255):

1:0:06Z16jan2014:WDIR Wind Direction (from which blowing) [deg]:lvl1=(1,1) lvl2=(255,missing):surface:anl:
2:28715:06Z16jan2014:SWDIR Direction of Swell Waves [deg]:lvl1=(241,1) lvl2=(255,missing):1 in sequence:anl:
3:44159:06Z16jan2014:SWDIR Direction of Swell Waves [deg]:lvl1=(241,2) lvl2=(255,missing):2 in sequence:anl:
4:54536:06Z16jan2014:WDIR Wind Direction (from which blowing) [deg]:lvl1=(1,1) lvl2=(255,missing):surface:1 hour fcst:
5:83772:06Z16jan2014:SWDIR Direction of Swell Waves [deg]:lvl1=(241,1) lvl2=(255,missing):1 in sequence:1 hour fcst:
6:98217:06Z16jan2014:SWDIR Direction of Swell Waves [deg]:lvl1=(241,2) lvl2=(255,missing):2 in sequence:1 hour fcst:
7:108464:06Z16jan2014:WDIR Wind Direction (from which blowing) [deg]:lvl1=(1,1) lvl2=(255,missing):surface:2 hour fcst:
8:137555:06Z16jan2014:SWDIR Direction of Swell Waves [deg]:lvl1=(241,1) lvl2=(255,missing):1 in sequence:2 hour fcst:
9:151239:06Z16jan2014:SWDIR Direction of Swell Waves [deg]:lvl1=(241,2) lvl2=(255,missing):2 in sequence:2 hour fcst:
10:161535:06Z16jan2014:WDIR Wind Direction (from which blowing) [deg]:lvl1=(1,1) lvl2=(255,missing):surface:3 hour fcst:

From this I infer that the file contains information about Wind direction and Direction of Swell Waves....I'm just starting with GRIB2/Multi-dimensional files, so forgive my ignorance if something of what I say sounds ridicoulous...:D

If I get more specific about a record, say number 10, for instance:

10:161535:vt=2014011609:surface:3 hour fcst:WDIR Wind Direction (from which blowing) [deg]:
    ndata=76845:undef=61318:mean=198.327:min=113.51:max=330.03
    grid_template=0:winds(N/S):
    lat-lon grid:(327 x 235) units 1e-06 input WE:NS output WE:SN res 48
    lat 49.100000 to 40.910000 by 0.035000
    lon 267.799984 to 284.099968 by 0.050000 #points=76845

This looks like its telling me that this reacord has 76845 "data" points, mean value for it is 198.327, max 330.03, min 113.51, etc...right?

Ok, now I go back to IRIS and open the same file:

filename = 'glw.grl.WDIR.grb2'
cubes = iris.load(filename)
print cubes

The output is:

0: unknown / (unknown)                 (time: 85; latitude: 235; longitude: 327)
1: unknown / (unknown)                 (time: 85; latitude: 235; longitude: 327)
2: unknown / (unknown)                 (time: 85; latitude: 235; longitude: 327)

I really like the concept of Cube from IRIS, it does make a lot of sense to me. From this output, I understand the that file holds info about 3 phenomena that IRIS packed into 3 "cubes" with dimensions 85x235x327 each. Now...why is IRIS printing unknown/unknown...does that mean that the file is corrupt or header information incomplete?

Going deeper, say I get the first cube:

time = cubes[0]

Get the first element of the data array:

print time[0,0,0]

Output:

unknown / (unknown)                 (scalar cube)
     Scalar coordinates:
          forecast_period: 0 hours
          latitude: 49.1 degrees
          longitude: 267.799984 degrees
          originating_centre: US National Weather Service, National Centres for Environmental Predic...
          time: 2014-01-16 00:00:00

Some questions...how to I get the actual value? I can't see it anywhere...from my understanding [0,0,0] should represent an actual discrete data point right? probably at a given time, latitude, longitude?

Thanks!
Alejandro

Andrew Dawson

unread,
Jan 23, 2014, 4:43:41 AM1/23/14
to scitoo...@googlegroups.com
From this output, I understand the that file holds info about 3 phenomena that IRIS packed into 3 "cubes" with dimensions 85x235x327 each.
 
That is correct.
 
Now...why is IRIS printing unknown/unknown...does that mean that the file is corrupt or header information incomplete?
 
There is nothing wrong with the data or header, this is simply a limitation of the GRIB handling in Iris I think. "unknown / (unknown)" simply means Iris doesn't know what phenomenon the cube is or what units it has. You can set these yourself with:
 
cube = cubes[0]
cube
.rename('name_of_phenomenon')
 
If you provided a recognised CF standard name Iris will understand it, I don't have access to Iris right now so I can't tell you if this will automatically set units, but you can always do that manually too:
 
cube.units = '...'   # whatever they are, expressed in udunits compatible form
 
Some questions...how to I get the actual value? I can't see it anywhere...from my understanding [0,0,0] should represent an actual discrete data point right? probably at a given time, latitude, longitude?
 
When you slice a cube you always get a cube, even if it contains only a single value. To see the value you can use the data attribute, which will show the actual data array:
 
cube[0, 0, 0].data
 
or
 
cube.data[0, 0, 0]
 
Hope that helps.

Alejandro Villamarin

unread,
Jan 23, 2014, 5:29:30 AM1/23/14
to Andrew Dawson, scitoo...@googlegroups.com
Hello Andrew,

Indeed, that seems to work. A couple more questions:

1.- Is there a range for each of the dimensions (85, 235, 327)? I'm asking because I will want to itereate the data array, and I don't know if the range goes from 0..N or it follows some other pattern...

2.- This leads me to ask...I've found many data are empty (or NaN)...is there a built-in function to "remove all NaN values"? That would be great

Thanks!
Alejandro


--
You received this message because you are subscribed to a topic in the Google Groups "Iris" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/scitools-iris/AEE-jVIqbK0/unsubscribe.
To unsubscribe from this group and all of its topics, send an email to scitools-iri...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Alejandro Villamarin

unread,
Jan 23, 2014, 11:45:42 AM1/23/14
to scitoo...@googlegroups.com, Andrew Dawson
Hello again,

I've managed to get rid of the NaN values, altough I'm not sure if it is a good idea...I'm losing dimensions by doing so...going from a multidimensional array to a 1D array:

    filename = 'glw.grl.WDIR.grb2'
    cubes
= iris.load(filename)
   
print
cubes
    waves
= cubes[2]
   
print "Current shape for wave cube is:"
   
print waves.shape
    data
= waves.data
    data
= data[~np.isnan(data)] #Get rid of the NaN values
   
print data.shape #data becomes a 1D array here
   
for i in range(data.shape[0]): #Iterate the remaining Cubes, get rid of the NaN values too and print them
        a
= waves[i]
        values
= a.data
        values
= values[~np.isnan(values)]
       
print values.shape
       
for j in range(values.shape[0]):
           
print str(values[j])
       
       

   


Output:

0: unknown / (unknown)                 (time: 85; latitude: 235; longitude: 327)
1: unknown / (unknown)                 (time: 85; latitude: 235; longitude: 327)
2: unknown / (unknown)                 (time: 85; latitude: 235; longitude: 327)
Current shape for wave cube is:
(85, 235, 327)
(146011,)
(2548,)
351.15
357.56
356.24
352.29
350.08
347.77
281.02
70.68
101.85
102.65
102.15
102.33
358.27
353.49
353.13
350.35
346.44

Now, as I said before, I'd like to know what the ranges for each dimension are....is this possible? I've tried this:

    for i in range(indexOne):
       
for j in range(indexTwo):
           
for k in range(indexThree):
                a
= waves[i,j,k]
               
if str(a.data) == '--':
                   
print "Empty value"
               
else:
                   
print "Not empty. Value is " + str(a.data)


But I can't get a reference to the index, besides the counter, obviosly. I'm asking for this to know the correlation between the 3 dimensions.

Regards,
Alejandro
To unsubscribe from this group and all of its topics, send an email to scitools-iris+unsubscribe@googlegroups.com.

bblay

unread,
Jan 23, 2014, 12:29:58 PM1/23/14
to scitoo...@googlegroups.com, Andrew Dawson
Hi,


I'm losing dimensions by doing so

You might not need to get rid of all the NaN values.
Iris' analysis and plotting routines should ignore them.

Alejandro Villamarin

unread,
Jan 23, 2014, 3:39:34 PM1/23/14
to scitoo...@googlegroups.com, Andrew Dawson
Hi,

Ok, altough I don't plan to use (not yet) IRIS for plotting...I just want to extract the data (the values) from GRIB2 files and probably store it after some parsing. Thus...any clues on how to get the ranges for each dimension? I really think I need this to know what each value means...otherwise is just raw data with no sense...

Besides, when I print a Cube, and it prints for instance latitude and longitude with an 'X'...what is that supposed to mean? That it has a value?

Regards,
Alejandro

Andrew Dawson

unread,
Jan 24, 2014, 2:47:49 PM1/24/14
to Alejandro Villamarin, scitoo...@googlegroups.com
Ok, altough I don't plan to use (not yet) IRIS for plotting...I just want to extract the data (the values) from GRIB2 files and probably store it after some parsing.

It is not really clear what you are trying to achieve, so it is difficult to offer you targeted advice. I would like to mention that Iris provides a very rich data analysis environment, so it is possible your analysis can be done without resorting to ripping the data array from the cube (and thus losing the valuable metadata).

Thus...any clues on how to get the ranges for each dimension?

Not entirely sure what you mean, do you mean how big each dimension is? If so you can use the shape attribute of the cube, which behaves exactly the same way as that of a numpy array. If you want the values of the coordinates then you can extract them from the cube:

print cube.coord('latitude')   # iris coordinate object
print cube.coord('latitude').points    # numpy array

Looking at your previous example it seems that you may want to read up on numpy arrays and in particular masked arrays. I noticed you were converting values to strings and checking for '--' to indicate missing values, but there are many advanced features of masked arrays that mean you don't need to do this (and in fact you probably shouldn't do this!). The numpy documentation is pretty good, try the user guide http://docs.scipy.org/doc/numpy-1.7.0/user/ and the reference documentation http://docs.scipy.org/doc/numpy-1.7.0/reference/. Iris is built around numpy so a good understanding of the numpy basics will go a long way.

Besides, when I print a Cube, and it prints for instance latitude and longitude with an 'X'...what is that supposed to mean? That it has a value?

This is telling you which dimension coordinates are associated with each dimension of the cube. It is possible to have multiple auxiliary coordinates representing the same dimension, and it is also possible for a single auxiliary coordinate to represent more than one cube dimension, which is why it is necessary to present the information in this way.

Alejandro Villamarin

unread,
Jan 24, 2014, 3:47:59 PM1/24/14
to Andrew Dawson, scitoo...@googlegroups.com
Hello Andrew,

Thanks for your very detailed response. Yes, you're right, maybe I should state what I'm trying to achieve: I need to parse lots of GRIB2 files from NOAA, probably those from the WaveWatch Model III. When I say parse, I mean get the timeseries values for a particular variable (e.g. Swell wave) and probably store it in another place (noSQL db most surely). I'll be doing some correlation with this data and some other afterwards. Thus, the way I see this using IRIS is:

1º Load the file and get the cubes for the measured phenomena
2º For each cube, get the data from the grid associated.
3º Store the extracted data

The main problem I'm having is that I'm not able to get the metadata about the phenomena each Cube is measuring. I can print the contents of the associated numpy array for each cube yes, but its just raw data that makes no sense to me (yet), thus, I'd like to be able to get something like cell in a cube something like, e.g.:

This value "X", belongs to dimension time (06:00:00), at longitude (43.04) and latitude (34.56).

Not sure if this is possible, or I should rethink this :)

As I said previously, I don't plan to do any plotting, just need to extract and store somewhere else, with another format no as structured as GRIB2.

Hope this helps,
Alejandro

bblay

unread,
Jan 27, 2014, 12:42:56 PM1/27/14
to scitoo...@googlegroups.com, Andrew Dawson
Hi,

Does this help?
for ndi in np.ndindex(cube.shape):
    value
= cube.data[ndi]
   
print "index", ndi, "value", value,
   
for dim in range(cube.ndim):
        coord
= cube.coord(dimensions=dim)
        coord_index
= ndi[dim]
       
print coord.name(), coord.points[coord_index],
   
print ""

Here is a simpler alternative:
for ndi in np.ndindex(cube.shape):
    point_cube
= cube[ndi]
   
print point_cube.data,
   
for coord_name in ["latitude", "longitude"]:
       
print coord_name, point_cube.coord(coord_name).points[0],
   
print ""


B

Alejandro Villamarin

unread,
Jan 27, 2014, 5:44:40 PM1/27/14
to scitoo...@googlegroups.com, Andrew Dawson

Hello bblay,

First snippet isn't working. Second one outputs each latitude and longitude from each point in the cube right?

This is helpful indeed, but I'm still unsure of what phenomena I'm measuring...for instance, printing the cube results in:


unknown / (unknown)                 (time: 85; latitude: 235; longitude: 327)
     Dimension coordinates:
          time                           x             -               -
          latitude                       -             x               -
          longitude                      -             -               x
     Auxiliary coordinates:
          forecast_period                x             -               -
     Scalar coordinates:

          originating_centre: US National Weather Service, National Centres for Environmental Predic...

Thus, how to know what I'm measuring here? I can get the values yes, but not sure what they mean yet. From the previous explanation from Andrew (regarding the X), looks like each dimension coordinate is associated with itself...this puzzles me...what does it mean that time dimension is associated with time? Same goes for longitude-latitude.

I really like IRIS, but is getting really hard to make some sense of the data stored in a Cube...probably because of my lack of experience in the grid files domain :)

Regards,
Alejandro

Andrew Dawson

unread,
Jan 28, 2014, 4:04:13 AM1/28/14
to scitoo...@googlegroups.com, Andrew Dawson
Thus, how to know what I'm measuring here? I can get the values yes, but not sure what they mean yet.
 
As you saw in your own first example, you can index the cube as well as the data array the cube contains. So if I have a 3d cube and I select a particular point with cube[0, 0, 0] then this will result in a cube which contains one value. All the metadata (coordinates, units etc.) are contained in the cube, and the actual value is contained in this cube's data attribute. So if you want all the metadata to be available, you need to work with cubes, not the data arrays they contain.
 
From the previous explanation from Andrew (regarding the X), looks like each dimension coordinate is associated with itself...this puzzles me...what does it mean that time dimension is associated with time? Same goes for longitude-latitude.
 
In Iris a cube can have as many or as few metadata coordinates as you like. It is possible to have a cube containing 4-d data that has no coordinates (i.e. no metadata, values for the coordinate points for any of the 4 dimensions). In this case the print would show 4 dimensions along the top and list no coordinates down the side. In a simple case where you have a 4-d cube with 4 metadata coordinates, a single 1d coordinate to represent each dimension, you would see 4 coordinates listed down the side, with a X under the cube dimension they describe. The name printed on the top is just for convenience (and may even be removed in the future) as the top row simply represents the dimensionality of the contained data.
 
Once you understand the difference between a dimension of an array (listed along the top) and a coordinate which describes a dimension (listed down the side), it should be clear that the coordinates are not associated with themselves, but with a dimension of the contained data. The coordinates I have discussed so far are actually called dimension coordinates (part of the CF specification) and are restricted to be 1-d and monotonic. A second type of coordinate is the auxiliary coordinate, which is not restricted in the same way as a dimension coordinate. It is when you have some of these coordinates as well that the representation with the Xs and really comes into play.
 
Let's say you have a 2-d cube on some projection other than latitude-longitude, and it has 2 dimension coordinates associated with its 2 data dimensions, these will be called projection_x_coordinate and projection_y_coordinate, and they'll measure some distance in projection coordinates. These coordinates are 1-d monotonic and describe a grid (just like lat-lon coordinates do on a lat-lon projection). Now, this cube can also have two auxiliary coordinates that describe the actual latitude and longitude of each point. These coordinates will have to be 2-d since the data points won't necessarily form a grid in lat-lon space. In this case the cube output would look like this:
 
cube_phenomenon / (units)           (projection_y_coordinate: 96; projection_x_coordinate: 121)
     
Dimension coordinates:
          projection_y_coordinate                           x                            
-
          projection_x_coordinate                          
-                            x
     
Auxiliary coordinates:
          latitude                                          x                            x
          longitude                                         x                            x

This tells you that whilst the 1-d coordinates projection_y_coordinate and projection_x_coordinate are associated with one data dimension each, there is a value of latitude and longitude for every point in the data array.
 
This reply was a bit longer than I originally intended! I hope it makes sense and helps you understand the structure of the cube with respect to coordinates.
 
 

Carwyn Pelley

unread,
Jan 31, 2014, 9:03:59 AM1/31/14
to
> From the previous explanation from Andrew (regarding the X), looks like each dimension coordinate is associated with itself...this puzzles me...what does it mean that time dimension is associated with time? Same goes for longitude-latitude.

Have you tried looking at the user guide?  It has a detailed explanation on the topics that your having difficulty with.  This should help you get started:
http://scitools.org.uk/iris/docs/latest/userguide/iris_cubes.html


> Thus, how to know what I'm measuring here? I can get the values yes, but not sure what they mean yet.

As A. J. Dawson mentioned, "unknown / (unknown)" simply means Iris doesn't know what phenomenon the cube is or what units it has.  You could write a callback to pull in attributes from the grib message, since it sounds like you are not doing any analysis.  This will allow you to identify what this data represents in the terminology of the grib message.  You can then fill-in this metadata as mentioned above:

def callback_grib_missingmeta(cube, grib_wrapper, filename):
   
try:
        cube
.attributes.update({'table2Version':grib_wrapper.table2Version,
                               
'indicatorOfParameter':grib_wrapper.indicatorOfParameter,
                               
'grib.edition':grib_wrapper.grib.edition})
   
except AttributeError:
       
print 'Cannot find GRIB key parameterCategory'


cubes
= iris.load(filename, callback=callback_grib_missingmeta)

Hope this helps,

@cpelley
Reply all
Reply to author
Forward
0 new messages