Wrapped extract

316 views
Skip to first unread message

bblay

unread,
Jan 8, 2013, 12:10:35 PM1/8/13
to scitools...@googlegroups.com

From https://github.com/SciTools/iris/issues/83

It's time to nail this down into a concrete plan.
The purpose of this post is to present a solution for critical analysis and/or mockery.

I suggest:

  • Upgrade iris._constraints.Constraint.extract():
    • Look for a coord constraint on a circular coord where the constraint goes beyond the points range.
      • This can be done by taking the left and right adjacent coords (i.e with + and - mod) and checking if any points pass the test in those, but not the original.
      • Possibly ugly, but I think this works? Does floating point accuracy get in the way?
    • If found, apply an updated version of Carwyn's code (above).
      • The update will be to the shift amount, which will apply an "appropriate shift", not just halve it.
      • An "appropriate shift" is the number of points from the first point to the first extraction point.

So. Is this solution acceptable? Is it flawed? Are there any nicer alternatves?

Andrew Dawson

unread,
Jan 8, 2013, 12:47:16 PM1/8/13
to scitools...@googlegroups.com
From a user perspective I think this is the right way to do it in general. The rolling of the cube should definitely take place behind the scenes and just leave the user with the end result they expected.

As for the technical side, I've been thinking about this a lot myself recently, and yours seems like a reasonable strategy.

Possibly ugly, but I think this works? Does floating point accuracy get in the way?

If we are thinking along the same lines the grid would have to be really fine for FP accuracy to be a real problem I think...

 Of course it would be helpful to see a prototype of this in case there is some behaviour that has been overlooked etc.

bblay

unread,
Jan 16, 2013, 7:25:36 AM1/16/13
to scitools...@googlegroups.com
Thanks for the input.

Whilst working on a prototype, a question occured:
Would we ever want to use extract on circular a coord without wrapping?

I don't image we would ever want to.

Andrew Dawson

unread,
Jan 16, 2013, 7:35:15 AM1/16/13
to scitools...@googlegroups.com
I don't think so. The circular aspect of the coordinate is part of the abstraction, so it should always be handled with wrapping I think.

bblay

unread,
Jan 16, 2013, 6:21:33 PM1/16/13
to scitools...@googlegroups.com
Here's a first working draft: https://github.com/bblay/iris/commit/c614f890afcb8962df4548e1f0247ddfbcea2fa7

Output:
(61,)
[-30 -29 -28 -27 -26 -25 -24 -23 -22 -21 -20 -19 -18 -17 -16 -15 -14 -13
 
-12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5
   
6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23
 
24  25  26  27  28  29  30]
(61,)
[330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347
 
348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365
 
366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383
 
384 385 386 387 388 389 390]

bblay

unread,
Jan 17, 2013, 12:08:08 PM1/17/13
to scitools...@googlegroups.com
This doesn't work for some >= constraints.

======================================================================
FAIL
: test_combined (iris.tests.test_cdm.TestCubeExtract)
----------------------------------------------------------------------
Traceback (most recent call last):
 
File "/net/home/h05/itbb/git/iris/lib/iris/tests/test_cdm.py", line 576, in test_combined
   
self.assertCML(self.single_cube.extract(constraint), ('cdm', 'extract', 'lat_gt_10_and_lon_ge_10.cml'))
 
File "/net/home/h05/itbb/git/iris/lib/iris/tests/__init__.py", line 189, in assertCML
   
self._check_same(xml, reference_path, reference_filename)
 
File "/net/home/h05/itbb/git/iris/lib/iris/tests/__init__.py", line 245, in _check_same
   
self._assert_str_same(reference, item, reference_filename, type_comparison_name)
 
File "/net/home/h05/itbb/git/iris/lib/iris/tests/__init__.py", line 135, in _assert_str_same
   
self.fail("%s do not match: %s\n%s" % (type_comparison_name, reference_filename, diff))
AssertionError: CML do not match: ('cdm', 'extract', 'lat_gt_10_and_lon_ge_10.cml')
--- Reference
+++ Test result
@@ -80,2 +80,2 @@
-        <dimCoord id="63128986" points="[11.25, 13.125, 15.0, ..., 354.375, 356.25,
-        358.125]"
shape="(186,)" standard_name="longitude" units="Unit('degrees')" value_type="float32">
+        <dimCoord id="63128986" points="[11.25, 13.125, 15.0, ..., 365.625, 367.5,
+        369.375]"
shape="(192,)" standard_name="longitude" units="Unit('degrees')" value_type="float32">
@@ -150 +150 @@
-    <data checksum="0x5552b088" dtype="float32" shape="(38, 64, 186)"/>
+    <data checksum="0x7684b27" dtype="float32" shape="(38, 64, 192)"/>


It's picked 6 extra longitude points.

bblay

unread,
Jan 17, 2013, 12:09:55 PM1/17/13
to scitools...@googlegroups.com
I think it's a bug in the implementation, rather than a flaw in the method, because it is supposed to pick the same number of points as the original coord.

pp-mo

unread,
Jan 17, 2013, 12:17:25 PM1/17/13
to scitools...@googlegroups.com
Could it be this ? ...I looked into that code, and I thought lines 166-172 looked suspicious...
  new_coord.points = numpy.concatenate((
                          l_coord.points[-roll:],
                          c_coord.points[:-roll]))
  if new_coord.has_bounds():
      new_coord.bounds = numpy.concatenate((
                              l_coord.bounds[-roll:],
                              coord.bounds[:-roll]))

Surrely that last line should have 'c_coord' instead of 'coord' , by analogy with the previous?
Maybe this also proves the test could be better ;-)

bblay

unread,
Jan 18, 2013, 5:27:00 AM1/18/13
to scitools...@googlegroups.com
thanks, pp-mo.

looks like a fail bug, so might not be the cause
but very grateful for you spotting this.

yes, the tests definitely need to test left and right wrapping with bounds too.
will do.

bblay

unread,
Jan 18, 2013, 5:39:51 AM1/18/13
to scitools...@googlegroups.com
It's not a bug.

The source longitude ranges from 0 to 358.125 in 192 steps.
With the given constraint of lon >= 10
 - The old code just chopped the first 6 points off.
 - The new code honours the circularity, as discussed above,
   taking the next 6 points around the circle, as it's supposed to do.

I'll fix the cml and add bounded tests, then reopen the PR.

RHattersley

unread,
Jan 18, 2013, 6:03:47 AM1/18/13
to scitools...@googlegroups.com
For a circular coordinate I'm not really sure what a one-sided constraint such as "lon >= 170" means. i.e. I don't know what result I'd expect. So that means I'd rather it didn't exist!

I started with longitude values from -180 to 180, then "lon >= 170" could give:
 1) just [170, 180]
 2) or [170, 180, 190, ... 530]

There's a relatively unambiguous result for two-sided constraints "lon in range 170 to 190", but that doesn't have to be expressed as "170 <= lon <= 190". If you do express it that way then "lon >= 170" would have to give result (2), but in such a way that adding "lon <= 190" gives just [170, 180, 190].

bblay

unread,
Jan 18, 2013, 8:01:21 AM1/18/13
to scitools...@googlegroups.com
For a circular coordinate I'm not really sure what a one-sided constraint such as "lon >= 170" means. i.e. I don't know what result I'd expect.

I don't agree.
If you ask for "lon >=170" you should only ever expect to get all the points.
If you ask for "170 <= lon <= 190", then you should only ever expect to get a few points.

However, I do see a problem when combining two separate constraints.
(Apologies if that's what you were getting at.)
>170, combined with <190 would roll it one way, then roll it the other, and give you all the points, which is wrong.
Perhaps we should just raise a warnings or exception if we find two constraints on the same circular coord.

Alternatively, we could consider an upgrade to the constraints system itself, where there's no ambiguity.
(Could include a solution to [this] (https://groups.google.com/forum/?fromgroups=&hl=en-GB#!topic/scitools-iris-dev/6y5XKCpojxQ))

Phil Elson

unread,
Jan 18, 2013, 8:05:23 AM1/18/13
to scitools...@googlegroups.com
Fundamentally I agree with RHattersley - I don't think funky (but useful) circular finctionality should be the same syntax as the standard constraint.

Personally, I think wrapped extractions should be an opt-in thing, and frankly I'm not convinced it is something that is necessary to be done at load time - IMHO a separate method/function would be perfectly suitable.

Andrew Dawson

unread,
Jan 18, 2013, 8:26:12 AM1/18/13
to scitools...@googlegroups.com
I am massively in favour of circular dimensions being dealt with automatically by the constraints system. I don't want to have to check what representation my longitude dimension has if it is circular, it should be arbitrary and I should be able to extract any subset I choose without doing anything special. However, with this power comes responsibility: my responsibility as a user not to do something like applying two constraints to the same circular coord of the kind lon > 170 and lon < 190. If doing so I should expect and understand that I will get all the longitude values.

If this requires an overhaul of the constraints system in order to be implemented in a way where there is no ambiguity then perhaps that is a good idea... Personally I am fine with getting all points from a circular dimension if I ask for lon > 170 etc. (although a warning might be useful, I don't know). I'd rather just have this as a well documented caveat and have sensible handling of circular dimensions in constraints, than have to have a separate interface for circular extraction.

Andrew Dawson

unread,
Jan 18, 2013, 8:41:34 AM1/18/13
to scitools...@googlegroups.com
In addition to my previous remarks, we also have to consider if applying a constraint such as:

Constraint(longitude=lambda l: l > 170)

to a circular dimension actually makes any sense? I don't think it does. I think that the ambiguity lies in the constraint and not in the implementation of the constraints system. The only way this constraint can make sense is if the user has checked where their longitudes end and wants everything from 170 to that end point. If that is the case then they already know what their end point is and should really be using a constraint of the form:

Constraint(longitude=lambda l: 170 < l < end_point)

which is unambiguous and well-formed for a circular dimension.

If people feel it is necessary then there could be some way to turn off the circular handling, but I strongly feel that handling circular dimension naturally as bblay is trying to implement should be the default.




Carwyn Pelley

unread,
Mar 1, 2013, 11:40:07 AM3/1/13
to scitools...@googlegroups.com
bump!

Hi all, Iv noticed that this thread has somewhat stalled.  This is a feature I know many people have requested for some time.  It would be good to come to some agreement on the behaviour that we might expect and on the implementation that were happy with.

Summary:
Wrapped extraction at load time? or separate function?
Should we really get all point when providing a single constraints? (i.e. lon >=170 would return all points)

Andrew Dawson

unread,
Mar 1, 2013, 12:23:58 PM3/1/13
to scitools...@googlegroups.com
Thanks for bringing this up again. I just want to clarify the case for doing this automatically:

Let's say I'm analysing data from multiple sources, some climate models, some reanalysis data sets. Some of these data sets have longitudes defined (-180, 180), some (0, 360). My data analysis requires me to extract the sub-region 80W-40E. I need to be able to write one program that handles this for all the data sets, without me having to code in a special cases for the data sets that are not defined  around the particular range I am interested in. In this scenario, I shouldn't have to care about how the longitudes are defined in each case, all I need to know is that they are circular, and I should be good to go.

I appreciate that this feature is seen as a fancy extra by some, but I want to convince you that it really is a fundamental and necessary thing for iris to be able to do, and something it should do behind the scenes, no extra function call or overhead placed on the user. The specific range of a circular dimension should be treated as arbitrary as part of the concept of circular dimensions, so single sided constraints don't really mean anything for circular dimensions anyway do they?

bblay

unread,
Mar 4, 2013, 8:49:33 AM3/4/13
to scitools...@googlegroups.com
My thinking feels as circular as a coordinate.
Feel free to skip to the last sentence.

I came to believe believe that ">170" should not automatically wrap.
User code should be reasonably robust, and should be able to handle reasonable data after deployment.
Processing an unexpectedly circular field should not result in a nasty surprise,
which is what would happen if we automatically wrap ">170".

Nobody needs a "one-sided" wrap, like (-30 < x).
That question was a side-effect of the pull request.
The only driver for wrapping is ranged, like (-30 < x < 30).
I doubt we can happily get lambdas like (-30 < x) to never wrap
whilst getting lambdas like (-30 < x < 30) to always wrap,
so we need to provide a range constraint that always wraps, and don't wrap lambdas.

The circularity comes in here:
If we don't care about a one-sided wrap, but we allow a one-sided extract,
then people that use the one-sided extract might expect a wrap...

Here's the only useful part of this post:
I think the only happy solution would be for the constraints system to be completely removed,
in favour of user code, where we can call a bbox extract function that always wraps. Raised here.

RHattersley

unread,
Feb 4, 2014, 7:26:16 AM2/4/14
to scitools...@googlegroups.com
Right then ... it's time to resurrect this issue and see if we can get a solution in place.

My focus is going to be on how to support queries such as "I want the longitude range -30 to 30" even when the source data is defined over the range 0  to 360, as that's the issue that people keep raising.

The user API suggestions I'm aware of are:

Constraint(longitude=WrapThingy(-30, 30, rh_inclusive=False))

WrapConstraint('longitude', -30, 30, rh_inclusive=False)

cube
= wrapped_extract(cube, 'longitude', -30, 30, rh_inclusive=False)  # or just `extract(cube, ...)`

# Not strictly equivalent as it requires knowledge of the dimension order.
cube = cube.loc[:, -30:30]

Are there any others that I've missed?


I'm currently tempted by the utility function. This might give user code such as:

def select_my_data(cubes, extent):
    lon_min
, lon_max, lat_min, lat_max = extent
    cube
= cube.extract(Constraint('latitude': lambda lat_min <= cell <= lat_max))
    cube
= wrapped_extract(cube, 'longitude', lon_min, lon_max)

# or (eventually?) just ...
def select_my_data(cubes, extent):
    lon_min
, lon_max, lat_min, lat_max = extent
    cube
= extract(cube, 'latitude', lat_min, lat_max))

    cube = extract(cube, 'longitude', lon_min, lon_max)

Thoughts welcome!

Andrew Dawson

unread,
Feb 4, 2014, 8:16:08 AM2/4/14
to scitools...@googlegroups.com
My primary thought is that we should try and make wrapped extractions as "unspecial" as possible. I don't think it should be regarded as a special case that requires us to use a different API (i.e. special function). After all it is not a special case, although a lot of people seem to think it is... As we normally extract from cubes based on metadata (coordinates), so we should for circular coordinates since the information that tells us we can wrap is in the metadata.
 
I appreciate it is hard to see this seamless operation when you are the one who has to implement it! But I do think it is where we should be aiming.

RHattersley

unread,
Feb 4, 2014, 8:31:42 AM2/4/14
to scitools...@googlegroups.com
+1 for "unspecial" and "seamless"

But as things stand I'm not comfortable with trying to achieve wrapping whilst still specifying ranges using black-box functions - e.g. "lambda cell: a <= cell <= b". The intent behind a function such as this is obvious to a human because we can see the whole expression at once, but during execution we only have access to one comparison at a time with no real way to know if it's supposed to be half of a range.

In the context of various other discussions I'm wondering if we should be aiming to have a collection of utility functions/methods to help perform the various constraint/callback duties instead of ever more complex functional forms. In other words, an "extract" function would become "unspecial" because it'd be used everywhere.

Andrew Dawson

unread,
Feb 4, 2014, 8:36:56 AM2/4/14
to scitools...@googlegroups.com
In the context of various other discussions I'm wondering if we should be aiming to have a collection of utility functions/methods to help perform the various constraint/callback duties instead of ever more complex functional forms. In other words, an "extract" function would become "unspecial" because it'd be used everywhere.
 
Yep. I think it should be seamless but that does not necessarily mean we are wedded to the current style. If we can develop something new that covers all cases I would be extremely pleased. 

Matthew Mizielinski

unread,
Feb 4, 2014, 8:40:38 AM2/4/14
to scitools...@googlegroups.com

I'd second that, my preference would be for

cube
= cube.extract(Constraint(longitude=lambda l: -30 < l < 30))

to do the required work. However, I can imagine that this will introduce a set of magic numbers; iris will need to know that the full range of longitudes covers 360 degrees. I can imagine other circumstances where similar knowledge could be necessary/useful. For example if I've computed an annual cycle of precipitation I might want to be able to set the "month" or "day of year" coordinate to be circular and then smooth using a rolling window or Lanczos filter, in which case I'd need to have something associated with the coordinate to say what its maximum value is (easy for "month", but tricky for "day of year" in climate science, where this depends on the calendar used).




bblay

unread,
Feb 4, 2014, 8:52:38 AM2/4/14
to scitools...@googlegroups.com
an "extract" function would become "unspecial" because it'd be used everywhere.

If we can develop something new that covers all cases I would be extremely pleased.

That sounds best.
Is it too much to ask for the "eventually" scenario without precursors?
I think it's important to disallow lambdas in wrapped extraction.
Would the new style need to allow lambdas?

Message has been deleted

bblay

unread,
Feb 4, 2014, 10:03:57 AM2/4/14
to scitools...@googlegroups.com
iris will need to know that the full range of longitudes covers 360 degrees

The modulus can be set in the units, which I think should help in most situations.
E.g Iris uses that in nearest_neighbour_index.

RHattersley

unread,
Feb 5, 2014, 8:53:06 AM2/5/14
to scitools...@googlegroups.com
Next question ... what to do when extracting using a bounded dimension coordinate where the bounds don't align with the wrap value?

For example, consider a longitude dimension coordinate with just four cells: 0 to 90, 90 to 180, 180 to 270, and 270 to 360 (assume the point value is set to the mid-point). When requesting the range -60 to 100, what would one expect? What about the range -60 to 300?

Requesting -60 to 100 might produce:
 * 0 to 90 only (i.e. only cells entirely within the requested range)
 * -60 to 0, 0 to 90, and 90 to 100, where the first and last cells have had their extents modified.
 * -60 to 0 and 0 to 90, where only the first cell has had its extents modified
 * an exeception!

For reference, if the cells only had point values (i.e. 45, 135, 225, 315), then the result would be -45, 45.

To be honest, I don't really like any of those options!
 * Option one means the number of cells resulting from an extraction can vary depending on whether the bounds are defined.
 * Option two has no clear solution for how to update other coordinates which share the same dimension, (e.g. An AuxCoord containing cell areas.) and it still doesn't return the same number of cells as for the point-only case.
 * Option three has the problem with updating other coordinates on the dimension.
 * Option four is decidely unfriendly!

Can someone offer a different perspective?

RHattersley

unread,
Feb 5, 2014, 9:39:22 AM2/5/14
to
On Wednesday, 5 February 2014 13:53:06 UTC, RHattersley wrote:
For example, consider a longitude dimension coordinate with just four cells: 0 to 90, 90 to 180, 180 to 270, and 270 to 360 (assume the point value is set to the mid-point). When requesting the range -60 to 100, what would one expect? What about the range -60 to 300?

Requesting -60 to 100 might produce:
 * 0 to 90 only (i.e. only cells entirely within the requested range)
 * -60 to 0, 0 to 90, and 90 to 100, where the first and last cells have had their extents modified.
 * -60 to 0 and 0 to 90, where only the first cell has had its extents modified
 * an exeception!

Another option would be to expand the returned range to include the selected cells. This would return: -90 to 0, 0 to 90, and 90 to 180. Note how the returned values are not all within the requested -60 to 100 range. NB. With this approach, requesting -60 to 300 would result in -90 to 0, 0 to 90, 90 to 180 and 180 to 270, which is also asymmetric (possibly by always using the lower bound, possibly determined by point-within-range). Alternatively it could result in an exception.

bblay

unread,
Feb 5, 2014, 10:42:17 AM2/5/14
to scitools...@googlegroups.com
What kind of fiendish mind could devise such a question?!

Option two gives a cube in the specified range so that seems right in some circumstances; when it's ok to lose it's sibling dim coords.
Options one and five give a cube that can keep it's sibling dim coords, which seem right in other situations.

I can imagine wanting each of these at some point. Must they be mutually exclusive?
We have three valid flavours, perhaps called; "extract_cut", "extract_inner" and "extract_outer".

Andrew Dawson

unread,
Feb 5, 2014, 3:28:03 PM2/5/14
to bblay, scitools...@googlegroups.com
We have three valid flavours, perhaps called; "extract_cut", "extract_inner" and "extract_outer".

This is how CDAT does it, when you take a subset you also get to specify how the bounds should be used. See this (really really old) presentation (slide 21)  http://www2-pcmdi.llnl.gov/cdat/tutorials/training/cdat_2004/05-data-ingestion.pdf or the cdms 5 (not quite so old but still old) manual (page 50) http://www2-pcmdi.llnl.gov/cdat/manuals/cdms5.pdf.


--
You received this message because you are subscribed to the Google Groups "Iris-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scitools-iris-...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ian Edwards

unread,
Feb 5, 2014, 3:40:58 PM2/5/14
to scitools...@googlegroups.com

In the GIS world...
"extract_cut" is called "clip"
"extract_inner" is called "contains"
"extract_outer" is called "intersects"

Reply all
Reply to author
Forward
0 new messages