Hello all,
I wanted to let the community know about some recent development I have been doing as part of my work in imglib2-ops. imglib2-ops is a library for supporting mathematical operations on data. It is used by IJ2 (for instance to support the Process >> Math >> * plugins, the Image Calculator, noise addition, etc.) and also by KNIP. I am hoping to solicit feedback and ideas from other ImgLib2 developers regarding this work. In particular, my work may be of interest to the authors of the ROI and labeling packages (Lee) as well as Views (Tobias), and I would value any insight or ideas for improvement regarding the design.
(I'll discuss similar imglib classes later)
The PointSet interface arose out of the need for a function to calculate values from a set of data. For instance a median function works on a list of values. It doesn't need to be limited to values pulled from a square 2d neighborhood. In the function package there are functions that take different kinds of input. The two most common are long[] input and PointSet input.
There are currently a couple basic implementations of the PointSet interface:
HyperVolumePointSet is one of them and it represents an n-dim region of space. (I'll address why this is not an Interval below). This is the most basic PointSet and was a natural outgrowth of code that relied on calculating values from fixed regions (origin/span kind of stuff).
GeneralPointSet is another PointSet implementation and it represents a list of long[] points. They are arbitrarily specified by the api user. One can define all kinds of neighborhoods using this facility.
There are classes that stretch the power of PointSets:
PointSets can filter values. A ConditionalPointSet will filter any PointSet by a Condition. Note that all you need to know about Conditions is that they have an isTrue() method and that it takes a long[] as input. One can construct any kind of Condition. For instance one define one that tests that sin(point[0]) > 0.5. So one can take an existing PointSet and filter it to only give values back that match the given condition.
PointSets can also be combined algebraically. There are a few PointSet implementations that handle these operations (PointSetUnion, PointSetIntersection, PointSetDifference, and PointSetComplement).
All of this is in the process of being tied together with a simple interface. A TextSpecifiedPointSet allows one to define complex PointSets using text input. The text input language is based upon list comprehension syntax from Haskell. I'll show some examples to give the flavor of what you can do.
PointSet ps = new TextSpecifiedPointSet("x=[1..5]");
You can iterate a single dimension 1,2,3,4,5
PointSet ps = new TextSpecifiedPointSet("x=[1,12,53], y=[-1,3..12]");
x can take on three values. y takes on these values (-1,3,7,11). Think of y's spec as saying "start at -1 and by 4 go to 12. 4 comes from the size of the step taken from -1 to 3.
PointSet ps = new TextSpecifiedPointSet("x=[1..50], y=[1..50], (x-15)^2 + (y-20)^2 <= 10");
(x,y) are those points that lie in a circle of radius 10 centered at (15,20).
PointSet ps = new TextSpecifiedPointSet("x=[1..50], y=[1..50], ((x-15)^2 + (y-20)^2 <= 10) || ((x-20)^2 + (y-20)^2 <= 10)");
(x,y) are those points that lie in the circle of radius 10 centered at (15,20) or in the circle of radius 10 centered at (20,20).
These last examples show how one can conditionally restrict the points to iterate. If one adds more conditions they are also applied. Note that conditional restriction currently gets parsed but not enforced. The pieces are all present but need to be put together. With some (a weekend of?) work I can get this running. (Note also that the current implementation uses a quickly hatched parser as a proof of concept which is meant to be replaced)
You can test the text definition capability by defining your own regions using the Plugins >> Sandbox >> PointSet Test plugin in IJ2. Note that because the conditional restriction code is not yet hooked up TextSpecifiedPointSet currently only supports the specification of dimensions. So its not yet super sexy but will be soon.
How the current design came to be:
When doing design work efficiency has not been the primary goal. Rather I've been looking to make small pieces of code that could be combined in powerful ways. So iterating a complicated PointSet might take a while (in fact I haven't tested complicated sets much). But allowing one to iterate arbitrarily complex point sets is useful.
If a complicated PointSet turns out to have a slow iteration it is possible to turn the PointSet into a GeneralPointSet by explode()'ing it. This can make sense if you're going to iterate a complicated region more than once and it's size is not too large. Calling GeneralPointSet.explode(pointSet) can create a new PointSet which is really just a list of long[] points.
The design does not use some classes from imglib2 that initially appear appropriate. I'll outline why now.
In the case of HyperVolumePointSet (noted before) one might represent this as an InterableInterval. But in ops there are times where you define a neighborhood and then slide it across an input Img and compute values for it. For instance you might create a AverageFunction that you want to sample at every point of input for a 3x3 neighborhood. With the PointSet implementation, for efficency reasons, you can relocate a PointSet in space and iterate the new set of locations. At the time I looked at this I could not see how to take an IterableInterval and redefine it on the fly by moving the whole interval 3 spaces left for instance. And whether existing cursors would just continue to work on the new space.
This is also why PointSets have iterators rather than Cursors or IntervalIterators. RandomAccess is used in the function package where appropriate.
Another case where the intervals are inapplicable is apparent in the function package. I don't want to get tangential here. I've wanted to support the idea of functions composed of other functions each with their own set of points. I didn't see an easy way to do so with Intervals.
I do see where PointSets overlap some with RegionsOfInterest. Since regions of interest are based on double[] I did not look carefully at what was there. I was hoping to visit the double[] support after I had fleshed out the discrete case. I have to admit I have not been very aware of the RegionOfInterest package. What PointSet brings to the table here are things like conditional membership iteration, text specification of input data, more relocatable sets, etc.
The PointSet implementation is an initial exploratory effort and that is why at the moment it only supports discrete data. I have not given much thought yet into how to expand to double[] data points. That depends upon how this code is received and how it evolves.
I hope the PointSet ideas result in useful tools for api users. Thanks for taking the time to read this and please let me know if you see any areas of commonality between areas of the ImgLib2 codebase that could be streamlined or unified. Perhaps there are other places in ImgLib2 where the PointSet classes could be useful?
Barry