What made us give up on FITS

Perry Greenfield

Jul 19, 2013, 3:23:21 PM
to astro...@googlegroups.com
I'm providing links to messages I posted a while ago on the referenced astrodataformat google group detailing the issues that eventually made us throw up our hands in disgust and give up. These center on the problems of handling WCS information well within FITS. I'd argue that these problems exist primarily because of the restrictions that FITS has intrinsically.

https://groups.google.com/forum/#!search/astrodataformat/astrodataformat/GhrsBeXm1G4/Z25eY1UfBvwJ
https://groups.google.com/forum/#!search/astrodataformat/astrodataformat/XcEs_yw9F4E/kBQe2YZu-SMJ

Perry Greenfield

Jul 19, 2013, 3:30:29 PM
to astro...@googlegroups.com
It's been pointed out that only members can see this. I'll post the messages directly.

Perry Greenfield

Jul 19, 2013, 3:31:10 PM
to astro...@googlegroups.com
Previously I promised to give some specific examples of problems to help explain why we find FITS limiting. The best ones we have concern WCS issues.

I base this example on real HST needs. There is nothing contrived about it. The real situation is actually more complex than this example.

We wish to represent a WCS model using several component distortions/projections. These components, in the order that they need to be applied to each of the two ACS 4096x2048 detectors are:

1) correction for discontinuities in CCD pixels. This is only applied in one dimension. Every block of 32 pixels has a corresponding pixel offset correction due to fabrication errors in masking the pixels on the CCD (WFC3 has offsets in both dimensions).

2) a general polynomial distortion correction for two-dimensional coordinates. The terms are time dependent.

3) a residual distortion grid, represented by a 65x33 array, to be interpolated and applied to a 4K by 2K science array.

4) a tangent projection onto sky coordinates (e.g., RA, Dec) with appropriate translation and rotation to the target coordinate system.

To properly align ACS images requires distortion corrections that are highly accurate. The current model (based on the above scheme) is accurate to roughly 0.01 pixel. Even so, residual systematic errors on the order of 0.003 pixels are apparent in analysis results.
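In rough Python, the four-stage chain above might be sketched as follows. This is a toy illustration only: the function names, coefficient values, and the small-angle tangent projection are all made-up placeholders, not the real ACS calibration.

```python
import numpy as np

def block_offset(y, offsets, block_size=32):
    # Stage 1: every block of 32 pixels gets its own additive offset
    # (one dimension only for ACS).
    idx = (np.asarray(y, dtype=int) - 1) // block_size
    return y + offsets[idx]

def polynomial(u, v, cx, cy):
    # Stage 2: general 2-D polynomial distortion; in the real model the
    # coefficients cx, cy would be time dependent.
    pv = np.polynomial.polynomial.polyval2d
    return pv(u, v, cx), pv(u, v, cy)

def grid_residual(x, y, grid_dx, grid_dy, domain=(4096, 2048)):
    # Stage 3: a coarse (e.g., 65x33) residual grid interpolated over the
    # full frame; nearest-neighbour stands in for real interpolation here.
    ny, nx = grid_dx.shape
    ix = np.round((np.asarray(x) - 1) * (nx - 1) / (domain[0] - 1)).astype(int)
    iy = np.round((np.asarray(y) - 1) * (ny - 1) / (domain[1] - 1)).astype(int)
    return grid_dx[iy, ix], grid_dy[iy, ix]

def tangent_project(u, v, crval, cd):
    # Stage 4: CD matrix followed by a tangent projection; a small-angle
    # approximation is used to keep the sketch short.
    xi, eta = cd @ np.array([u, v])
    return crval[0] + xi, crval[1] + eta
```

Chaining is then just function composition; the point is that each stage remains a separately stored, separately updatable component.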

There is no FITS WCS in the existing standards that handles this case as presented. About the closest one can come is to use a pixel mapping array (as specified in Paper III). This has significant drawbacks, however.

1) it greatly bloats the size of the data file since two arrays the size of the science data must be included in the data files.

2) it requires all the separately modeled and fit distortion components be merged into one entity. Updating one component requires regenerating the pixel maps from the original model components. This is particularly annoying since we have to do this for essentially every case since one component is time dependent. This means keeping the components around someplace to regenerate the net WCS. That wouldn't be necessary if most of them could be individually stored in the header and only the time varying component (just a few coefficients) updated.

3) it still requires doing things that don't quite conform even to the conventions. For example, the CD matrix has to be applied separately from the grid correction: ideally the grid correction should be applied after the polynomial terms are added but before the CD matrix is applied, and the SIP model doesn't allow for that.

The HLA would like to store multiple versions of the WCS in the same file. This approach then becomes completely untenable given the size of it (never mind other complexities).

There were other conventions available that we used to try to keep these components separate. We used the SIP convention for the polynomial distortion (another non-standard convention). The residual distortion grids were stored in two separate extensions (using the Paper III -TAB construct), as was the pixel offset table (based on a Paper IV construct, also not yet part of the standard). Although we use only FITS standards, proposed standards, and conventions, it is pretty much the case that no software other than our own will understand our WCS. Some tools may understand parts of it (e.g., DS9 will use the SIP component, I believe). The result is really a hodge-podge of kludges.

Finally, as mentioned, the HLA wishes to store several versions of the WCS with the file. Yes, we could store extra sets of extensions with the file, but this leads to complex extension management issues. What names do we choose for these extensions, and how do we make their connections apparent? In principle it isn't important: we could give them arbitrary names and have WCS elements in the primary header point to these extensions. One set of WCS elements in the header would have keywords pointing to extensions wcs0001, wcs0002, wcs0003; a different WCS would point to wcs0004, wcs0005, wcs0006; and so forth.

But someone looking at the contents of the file won't see the connections between the extensions, or how they are related, without examining the relevant headers in detail. So we should use a naming convention that illustrates this: perhaps wcsA0001, wcsA0002, wcsA0003 for a WCS associated with the name A, wcsB0001, etc., for the ones associated with the name B, and so forth. At least that shows the associations of extensions. But it is simply our ad hoc convention; hopefully users find it transparent enough. Note that more than one detector is included in the same file, so all of these are doubled, with the added issue of making transparent which WCS belongs to which detector. This is an excellent example of the whole extension naming and grouping problem: making these associations transparent, i.e., not requiring users to look at the details of a header to understand the grouping, leaves the issue to extension names; and since multiple groupings are often desired, any naming scheme that expresses them is completely ad hoc, because there is no accepted scheme for grouping by naming conventions (nor is there likely any workable one).

Depending on the specifics, swapping the default WCS may involve massive keyword renaming, particularly if the polynomial coefficients have to be renamed, which presumably means the associated WCS extensions have to be renamed as well. In the end, we ended up grouping all the alternate WCS elements into their own FITS file, and then stuffing that file into a nonstandard FITS extension just to make management of these WCSs easier.

Yes, it is all possible in FITS, but it has become incredibly awkward and clumsy, and certainly not transportable, in the sense that it isn't obvious within the file what is supposed to be done with all these WCS elements without accompanying documentation or our software. (I'd guess that the difficulty of WCS handling in FITS has added at least an FTE-year to getting the ACS WCS definitions sorted out.) So much code must now be devoted to keyword and extension management and bookkeeping. Most of this is simply unnecessary using different approaches.

A large part of the problem is that FITS headers do not allow for any transparent grouping that doesn't require lots of explicit linkages in keywords and their values. Any grouping of items in the header has to be done explicitly by constructing keyword conventions. This leads to large numbers of keywords that have to be explicitly defined and documented.

I'd like to make this more concrete by illustrating how an alternate approach would work, at least in outline. There are lots of ways of doing this so don't take this as a specific proposal. (And don't get hung up on syntax details or omissions of some information; I'm trying to keep this fairly concise for the purposes of illustration)

wcs_default: {
  input_axes_labels: [x, y],
  output_axes_labels: [ra, dec],
  transform: [
    {
      type: regular_offset_transform,  # this one applies to only one axis
      input_axes_labels: [y],
      output_axes_labels: [yp],
      block_size: 32,
      block_start: 1,
      offsets: [0.12124, -1.1043, ... 0.4321]},
    {
      type: translation_transform,
      input_axes_labels: [x, yp],
      output_axes_labels: [u, v],
      offsets: [2048, 1024]},  # CRPIX effectively
    {
      type: polynomial_transform,
      input_axes_labels: [u, v],
      output_axes_labels: [up, vp],
      degree: [3, 3],  # even this is probably unnecessary since it can be inferred
      polynomial_type: chebyshev,
      coefficients: [[1.7874343, 7.323534, 1.343242e-3],
                     [-2.788873, 1.224545, 0.0 ],
                     [3.3089498, 0.0, 0.0 ]]},
    {
      type: grid_transform,
      input_axes_labels: [x, yp],
      output_axes_labels: [ucp, vcp],
      dimensions: 2,
      domain: [4096, 2048],  # range of array size to apply to
      grid: [[[ 0.21, 0.33, ... -1.03],
              ...
              [-0.12, -0.03, ... 0.56]],
             [[ 1.23, -0.42, ... -0.83],
              ...
              [-0.01, 0.24, ... 0.94]]]},
    {
      type: add_coordinates,
      input_axes_label_sets: [[up, vp], [ucp, vcp]],
      output_axes_labels: [ut, vt]},
    {
      type: linear,
      input_axes_labels: [ut, vt],
      output_axes_labels: [ur, vr],
      matrix: [[0.0124232, 0.04567],
               [-0.009323, 0.096736]]},  # CD matrix equivalent
    {
      type: tangent_projection,
      input_axes_labels: [ur, vr],
      output_axes_labels: [skyxrel, skyyrel]},
    {
      type: translation_rotation_transform,
      input_axes_labels: [skyxrel, skyyrel],
      output_axes_labels: [ra, dec],
      offsets: [10.3456, -57.9000],  # degrees
      position_angle: 25.4343}  # position angle of skyyrel axis in ra,dec coordinates
  ],
  coordsys: FK5
}

In this case, the connection between input and output coordinates of each transform step is done by explicitly giving each a name, and the names are used to associate transform coordinates to the relevant quantities. There are many other ways to handle this; this is just one example.
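A toy evaluator for such a labeled chain could look like the following. The step contents are made up for illustration; this is not a proposed standard.

```python
# Each step reads its inputs from, and writes its outputs to, a dict
# keyed by axis label, so steps connect purely by name.

def run_chain(steps, coords):
    for step in steps:
        outs = step["func"](*[coords[name] for name in step["in"]])
        if not isinstance(outs, tuple):
            outs = (outs,)
        coords.update(zip(step["out"], outs))
    return coords

steps = [
    {"in": ["y"], "out": ["yp"],            # stand-in offset correction
     "func": lambda y: y + 0.1},
    {"in": ["x", "yp"], "out": ["u", "v"],  # CRPIX-style translation
     "func": lambda x, yp: (x - 2048.0, yp - 1024.0)},
]

result = run_chain(steps, {"x": 2048.0, "y": 1024.0})
# result now holds x, y, yp, u, v, all addressable by label
```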

Note that to make common cases more concise, many of these can be combined into standard wcs models. E.g., the standard tangent projection with the usual PC matrix, crval, crpix could be made one transform bundle. This would be very useful for images with no distortions (e.g., the product of astrodrizzle).

But this scheme is very flexible (it does, of course, require standard definitions of the transform models): one can connect arbitrary transforms together, add corrections, etc. Constructing multiple versions of a WCS requires no new keyword names (the explicit grouping mechanism allows attribute names to be reused as often as one wants, since they are local to each item). Nothing prevents some of the components from referring to binary arrays (e.g., the grid correction), but that's a whole different topic that I don't wish to go into here.

Perry

Perry Greenfield

Jul 19, 2013, 3:32:01 PM
to astro...@googlegroups.com

Another realistic example we face (particularly for JWST) is having to deal with IFU data. In this case we have to map 2-d arrays into a 3-d space and deal with discontinuous areas of the 2-d image. In other words, each slice of the sky has a generally well behaved mapping to the cube, but there is a discontinuity in transitioning between slices. No simple smooth model can cover all slices.

A pixel mapping of the 2-d array will obviously suffice in handling discontinuities, but for reasons previously mentioned, it's a cumbersome way of representing the real underlying model for the WCS. It's also memory/disk-space inefficient in that the underlying model(s) are generally far more compact in the information sense (by orders of magnitude). Use of this sort of solution complicates iterative inversion techniques due to the discontinuities. As far as I can tell, there is no simple way of handling this in the current FITS standards, even including the unaccepted distortion paper.

And trying to include it in the FITS WCS framework would require some currently non-standard extensions.

Looking at how this could be framed in an alternative approach, it would be useful to have a domain mapping mechanism that defines regions (of n dimensions, depending on the dimensionality of the original data) and associates a different WCS model with each region. For the JWST IFUs, each slice corresponds to a region in the original image, and that region has a smooth WCS valid only within it. Ideally the regions could be defined in either of two standard ways. Most abstract and compact would be a geometric definition (e.g., a polygon, since the regions are rarely pure rectangles and often cannot even be divided into rectangles, since they overlap in at least one axis). A more efficient mechanism would be a region mapping array, in which each pixel holds a region number used to index into a list of WCS models. Both representations could be used, presumably with the former as the original definition and the latter computed from it. Once computed, such an array would compress greatly, since it contains only a few distinct integer values.
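As a toy illustration of going from the compact polygon form to the region mapping array, one could rasterize each polygon with even-odd ray casting. The polygons below are made-up rectangles standing in for real slice footprints.

```python
import numpy as np

def in_polygon(px, py, verts):
    # even-odd ray casting; verts is a list of (x, y) vertices
    inside = np.zeros(np.shape(px), dtype=bool)
    n = len(verts)
    for i in range(n):
        x1, y1 = verts[i]
        x2, y2 = verts[(i + 1) % n]
        if y1 == y2:
            continue  # horizontal edges never cross the test ray
        crosses = ((y1 > py) != (y2 > py)) & \
                  (px < (x2 - x1) * (py - y1) / (y2 - y1) + x1)
        inside ^= crosses
    return inside

def region_map(shape, polygons):
    # 0 means "no region"; region k corresponds to polygons[k - 1]
    yy, xx = np.mgrid[1:shape[0] + 1, 1:shape[1] + 1].astype(float)
    labels = np.zeros(shape, dtype=np.int16)
    for k, verts in enumerate(polygons, start=1):
        labels[in_polygon(xx, yy, verts)] = k
    return labels

# two toy "slices", each a band of rows in a tiny 4x8 frame
slices = [[(0.5, 0.5), (8.5, 0.5), (8.5, 2.5), (0.5, 2.5)],
          [(0.5, 2.5), (8.5, 2.5), (8.5, 4.5), (0.5, 4.5)]]
labels = region_map((4, 8), slices)
```

The resulting integer array is exactly the kind that compresses well, since it holds only a few distinct values.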

To extend the example given in my previous message one could have a WCS instance that was of a type that indicated it acted as a switch to other WCS instances, with the switch driven by the input coordinates.

{
transform: indirect,
domain_map: [
{
type: region,
form: polygon,
vertices: [(1,1), (2048,1), (2048,50), (1,50)]},  # simple rectangle
...]
wcs_list: [
<list of wcs definitions, matching region list in size>
]
}

This could also be done in a few different ways.

The corresponding array mapping would have an array for the domain map (again, I don't want to get into how binary arrays could be stored, but presumably that is how it would be done).

The inverse transform could use very much the same kind of mechanism, at least for the purposes of identifying which WCS is associated (forward if no inverse provided, for the purposes of iterative inverse computation).
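A toy sketch of such a switch (with made-up labels and per-slice transforms; not a proposed interface) might be:

```python
import numpy as np

class SwitchWCS:
    # the region label of the input pixel chooses which per-slice
    # transform to apply
    def __init__(self, region_labels, wcs_list):
        self.labels = region_labels   # per-pixel region number, 0 = none
        self.wcs_list = wcs_list      # region k dispatches to wcs_list[k - 1]

    def __call__(self, x, y):
        k = self.labels[int(y) - 1, int(x) - 1]
        if k == 0:
            raise ValueError("pixel not in any region")
        return self.wcs_list[k - 1](x, y)

labels = np.array([[1, 1],                 # row 1 belongs to slice 1
                   [2, 2]])                # row 2 belongs to slice 2
per_slice = [lambda x, y: (x + 10.0, y),   # stand-in for slice 1's WCS
             lambda x, y: (x + 20.0, y)]   # stand-in for slice 2's WCS
wcs = SwitchWCS(labels, per_slice)
```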

To do this in FITS would require coming up with a way of defining regions (presumably limited to the region mapping array) and some means of mapping regions to WCS models. But here we run out of letters, since there will be cases with more than 27 slices, so tables will be needed to represent the polynomial terms (and again, it is very likely that other kinds of components will be needed). So it becomes pretty cumbersome here as well, and also requires new mechanisms not in the current standard.

Perry

James Turner

Jul 19, 2013, 7:36:00 PM
to astro...@googlegroups.com
I'm just quickly checking emails from vacation before bed and
can't comment on all Perry's detailed points but FWIW I agree that
it's really WCS that will be the deal breaker in sticking to FITS.
I haven't considered this quite as extensively as STScI, but after
a couple of long days studying how to represent the co-ordinate
transformations for our new spectroscopic processing in FITS, it
was pretty clear that it can't be done (other than via a lookup
table, I suppose, with the associated limitations).

I'm also interested in IFUs, of course, but beyond that astronomers
commonly complain about unnecessary resampling and it's much better
to track and chain together individual co-ordinate transformations
rather than having to apply each one to the data in turn. That
includes being able to invert the transformation (actually it's
really the forward one) so that one can transform calibrations to
the science data rather than vice versa.

Of course this could be represented in FITS somehow, but not using
the FITS WCS standard. I think this is already recognized by others
here who are interested in working on a Python WCS object but
there's not much harm in emphasizing it.

Cheers,

James.
--
James E.H. Turner
Gemini Observatory Southern Operations Centre,
Casilla 603, Tel. (+56) 51 205609
La Serena, Chile. Fax. (+56) 51 205650

Tim Jenness

Jul 19, 2013, 7:51:50 PM
to astro...@googlegroups.com, James Turner
On Fri, Jul 19, 2013 at 4:36 PM, James Turner <jtu...@gemini.edu> wrote:
I'm just quickly checking emails from vacation before bed and
can't comment on all Perry's detailed points but FWIW I agree that
it's really WCS that will be the deal breaker in sticking to FITS.
I haven't considered this quite as extensively as STScI, but after
a couple of long days studying how to represent the co-ordinate
transformations for our new spectroscopic processing in FITS, it
was pretty clear that it can't be done (other than with via lookup
table, I suppose, with the associated limitations).


The AST library can do it and does it quite often. Obviously you abandon the FITS-WCS model and you come up with a way of serializing your WCS object model so that it can fit in a FITS header. This is what the "Native" channel does in AST and that can round trip. DS9 can read these headers already because it uses AST for WCS handling.

 
I'm also interested in IFUs, of course, but beyond that astronomers
commonly complain about unnecessary resampling and it's much better
to track and chain together individual co-ordinate transformations
rather than having to apply each one to the data in turn. That
includes being able to invert the transformation (actually it's
really the forward one) so that one can transform calibrations to
the science data rather than vice versa.


Yes. AST can handle IFUs as well with a switchMap. Paul Hirst knows a bit about that as UKIRT had an IFU years ago.

 
Of course this could be represented in FITS somehow, but not using
the FITS WCS standard. I think this is already recognized by others
here who are interested in working on a Python WCS object but
there's not much harm in emphasizing it.

Yes, you have to abandon the FITS WCS keywords and model but it's obviously possible for people to simply collectively come up with an alternative way of specifying WCS in a FITS header if they share the tools. 

-- 
Tim Jenness

David Berry

Jul 22, 2013, 3:33:09 AM
to astro...@googlegroups.com, James Turner
I second Tim's point - just because FITS-WCS has big limitations is
not a reason to abandon FITS. We've been using almost exactly the WCS
model being discussed here, based on chaining a wide variety of atomic
mappings, for pushing 17 years and find no serious difficulty in
representing it as a set of non-standard FITS headers.

BTW - from our experience the *big* problem with chaining atomic
mappings, is that it is very difficult to avoid them blowing up
exponentially in size as general purpose software concatenates
mappings together. You need to recognise when complex compound
Mappings can be simplified - e.g. by cancelling out mutually inverse
pairs of transformations, or combining adjacent mappings together into
one of a different class. This is easily the most fragile aspect of
this approach to WCS.

David
> --
> You received this message because you are subscribed to the Google Groups
> "astropy-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to astropy-dev...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Perry Greenfield

Jul 22, 2013, 8:50:34 AM
to astro...@googlegroups.com

On Jul 22, 2013, at 3:33 AM, David Berry wrote:

> I second Tim's point - just because FITS-WCS has big limitations is
> not a reason to abandon FITS. We've been using almost exactly the WCS
> model being discussed here, based on chaining a wide variety of atomic
> mappings, for pushing 17 years and find no serious difficulty in
> representing it as a set of non-standard FITS headers.
>
I'm not quite so persuaded. How do switch maps get represented in FITS headers for example?

I haven't found the details of the serialization except for an example in one of the documents, but it does seem susceptible to card ordering issues and namespace overlaps. It doesn't seem to provide the ability to share a WCS other than by copying headers. WCS isn't the only problem we've had with FITS files, just the most severe. I'm sure there is a way to put anything in a FITS file, but that doesn't mean that it is generically understood or adopted. It also means a fair amount of code dealing with that mapping (e.g., you still have to deal with keyword size limitations and such), and there are WCS models that will require a substantial amount of header space, leading to a lot of bloat.

> BTW - from our experience the *big* problem with chaining atomic
> mappings, is that it is very difficult to avoid them blowing up
> exponentially in size as general purpose software concatenates
> mappings together. You need to recognise when complex compound
> Mappings can be simplified - e.g. by cancelling out mutually inverse
> pairs of transformations, or combining adjacent mappings together into
> one of a different class. This is easily the most fragile aspect of
> this approach to WCS.

I'm surprised at this. Exponentially? Really? I don't envision many operations that automatically concatenate transformations (particularly unknowingly). It sounds like I'm misunderstanding how these are being used. Can you give an example of how this happens in a little more detail?

Thanks, Perry

David Berry

Jul 22, 2013, 10:04:05 AM
to astro...@googlegroups.com
On 22 July 2013 13:50, Perry Greenfield <stsci...@gmail.com> wrote:
>
> On Jul 22, 2013, at 3:33 AM, David Berry wrote:
>
>> I second Tim's point - just because FITS-WCS has big limitations is
>> not a reason to abandon FITS. We've been using almost exactly the WCS
>> model being discussed here, based on chaining a wide variety of atomic
>> mappings, for pushing 17 years and find no serious difficulty in
>> representing it as a set of non-standard FITS headers.
>>
> I'm not quite so persuaded. How do switch maps get represented in FITS headers for example?
>
> I haven't found the details of the serialization except for an example in one of the documents, but it does seem susceptible to card ordering issues, and namespace overlaps. It doesn't seem to provide the ability to share a wcs other than copying in headers. WCS isn't the only problem we've had with FITS files, just the most severe. I'm sure there is a way to put anything in a FITS files, but that doesn't mean that it is generically understood or adopted. And it also means a fair amount of code dealing with that mapping (e.g., you still have to deal with keyword size limitations, and such), and there are WCS models that will require a substantial amount of header space leading to a lot of bloat.

Some of what you say is certainly true in principle. But in practice
it doesn't really limit what we can do. For instance, whilst in
principle the ordering issue is a killer, in practice all software
I've come across follows the recommendation in the fits standard "It
is recommended that the order of the keywords
in FITS files be preserved during data processing operations". Given
that ordering is not a problem in practice, the namespace issue can be
solved too by the use of starting and ending marker cards.

Don't get me wrong - I'm not saying that a better system could not
easily be devised for storing serialised WCS info, all I'm saying is
that the limitations of FITS headers are not a show-stopper.


>> BTW - from our experience the *big* problem with chaining atomic
>> mappings, is that it is very difficult to avoid them blowing up
>> exponentially in size as general purpose software concatenates
>> mappings together. You need to recognise when complex compound
>> Mappings can be simplified - e.g. by cancelling out mutually inverse
>> pairs of transformations, or combining adjacent mappings together into
>> one of a different class. This is easily the most fragile aspect of
>> this approach to WCS.
>
> I'm surprised at this. Exponentially? Really? I don't envision many operations that automatically concatenate transformations (particularly unknowingly). It sounds like I'm misunderstanding how these are being used. Can you give an example of how this happens in a little more detail?

When you do any sort of geometric transformation to an array, you add
one or more extra mappings. For instance, a simple "rotate" command
would

1) Create a mapping (a MatrixMap in AST) that describes the rotation
between the old and new pixel axes.
2) Concatenate this mapping with the original pixel->WCS mapping read
from the input array, to get the pixel->WCS mapping for the output
array.
3) Generate the output resampled pixel values and store the new WCS with them.

So the output WCS would contain more mappings than the input WCS.
True, this is maybe linear rather than exponential growth, but there
are less common but still possible cases that *are* exponential. For
instance say you want to align pixel array A with pixel array B to
create array C - you would take the pixel->WCS mapping from A and
concatenate it with the WCS->pixel from B to get the pixel A->pixel B
mapping. This Mapping will typically be twice the size of either
input mapping (i.e. will contain twice the number of atomic mappings).
Having some means of simplifying a complex mapping (a la astSimplify)
has two benefits: 1) the transformation can be done faster and more
accurately as fewer numerical steps are involved, and 2) the mapping
occupies less space when serialised.

Whilst you may often just throw away the concatenated mapping in the
above example once the output was calculated, retaining just the WCS
from array B in array C, there are certainly cases where you may want
to retain it. Say, for instance, you wanted to record the pixel
coordinates of array A as an extra coordinate frame in array C.

This is just one example. If you have a scheme that allows the
construction of arbitrarily complex mappings, people will find all
sorts of inventive ways of using it to create arbitrarily complex
mappings, which can get out of hand unless you have some way of
simplifying them.
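The growth-and-simplify issue can be sketched in a toy model where every atomic mapping is a 2x2 matrix, so a chain is just a list of matrices. (Real mapping classes are mostly not linear, which is exactly why simplification is the hard and fragile part; here adjacent steps always fold together.)

```python
import numpy as np

def invert(chain):
    # inverse chain: invert each step and reverse the order
    return [np.linalg.inv(m) for m in reversed(chain)]

def simplify(chain):
    # adjacent linear maps always fold into a single matrix product
    out = np.eye(2)
    for m in chain:
        out = m @ out
    return [out]

pix2wcs_a = [np.diag([2.0, 2.0]), np.array([[0.0, -1.0], [1.0, 0.0]])]
pix2wcs_b = [np.diag([4.0, 4.0])]

# aligning A with B concatenates A's chain with B's inverse: the result
# is longer than either input, and keeps growing with every such step
# unless something like simplify() folds it back down
a2b = pix2wcs_a + invert(pix2wcs_b)
```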

David

Perry Greenfield

Jul 22, 2013, 10:10:28 AM
to astro...@googlegroups.com

On Jul 22, 2013, at 10:04 AM, David Berry wrote:

> On 22 July 2013 13:50, Perry Greenfield <stsci...@gmail.com> wrote:
>>
>> On Jul 22, 2013, at 3:33 AM, David Berry wrote:
>>
>>> I second Tim's point - just because FITS-WCS has big limitations is
>>> not a reason to abandon FITS. We've been using almost exactly the WCS
>>> model being discussed here, based on chaining a wide variety of atomic
>>> mappings, for pushing 17 years and find no serious difficulty in
>>> representing it as a set of non-standard FITS headers.
>>>
>> I'm not quite so persuaded. How do switch maps get represented in FITS headers for example?
>>
>> I haven't found the details of the serialization except for an example in one of the documents, but it does seem susceptible to card ordering issues, and namespace overlaps. It doesn't seem to provide the ability to share a wcs other than copying in headers. WCS isn't the only problem we've had with FITS files, just the most severe. I'm sure there is a way to put anything in a FITS files, but that doesn't mean that it is generically understood or adopted. And it also means a fair amount of code dealing with that mapping (e.g., you still have to deal with keyword size limitations, and such), and there are WCS models that will require a substantial amount of header space leading to a lot of bloat.
>
> Some of what you say is certainly true in principle. But in practice
> it doesn't really limit what we can do. For instance, whilst in
> principle the ordering issue is a killer, in practice all software
> I've come across follows the recommendation in the fits standard "It
> is recommended that the order of the keywords
> in FITS files be preserved during data processing operations". Given
> that ordering is not a problem in practice, the namespace issue can be
> solved too by the use of starting and ending marker cards.
>
> Don't get me wrong - I'm not saying that a better system could not
> easily be devised for storing serialised WCS info, all I'm saying is
> that the limitations of FITS headers is not a show-stopper.
>
Sure. But what about switch maps? How are they represented in headers?
And if you are resampling, why not generate a new simple WCS model rather than keep a complex concatenation of models?

Cheers, Perry


Tim Jenness

Jul 22, 2013, 11:28:52 AM
to astro...@googlegroups.com
On Mon, Jul 22, 2013 at 7:10 AM, Perry Greenfield <stsci...@gmail.com> wrote:

On Jul 22, 2013, at 10:04 AM, David Berry wrote:

> On 22 July 2013 13:50, Perry Greenfield <stsci...@gmail.com> wrote:
>>
>> On Jul 22, 2013, at 3:33 AM, David Berry wrote:
>>
>>> I second Tim's point - just because FITS-WCS has big limitations is
>>> not a reason to abandon FITS. We've been using almost exactly the WCS
>>> model being discussed here, based on chaining a wide variety of atomic
>>> mappings, for pushing 17 years and find no serious difficulty in
>>> representing it as a set of non-standard FITS headers.
>>>
>> I'm not quite so persuaded. How do switch maps get represented in FITS headers for example?

The WCS being stored in the FITS header is a serialization of the WCS object itself. To use it you read the header and recreate the WCS object. Then if you started with switch maps you still have switch maps. You don't get to look at individual FITS headers to extract a subset of the information.

 
>>
>> I haven't found the details of the serialization except for an example in one of the documents, but it does seem susceptible to card ordering issues, and namespace overlaps. It doesn't seem to provide the ability to share a wcs other than copying in headers. WCS isn't the only problem we've had with FITS files, just the most severe. I'm sure there is a way to put anything in a FITS files, but that doesn't mean that it is generically understood or adopted. And it also means a fair amount of code dealing with that mapping (e.g., you still have to deal with keyword size limitations, and such), and there are WCS models that will require a substantial amount of header space leading to a lot of bloat.
>

I think your new format still has to support conversion to FITS. Starlink had the same problem, since they had the NDF format and they had the WCS object. In NDF-land the WCS object is stored in native serialization form as a string array. AST has Channel classes for import and export of the WCS objects. The FitsChan class handles the craziness that is the historical set of WCS as stored in FITS files for import. For export it attempts to convert the WCS object to a "standard" form (FITS-WCS, AIPS, CLASS), but it still has to deal with a WCS object that cannot fit into the classical model. What happens then is that there is a "native" serialization which uses bizarre truncated 8-character FITS headers specifically chosen not to clash with other WCS headers. Most software will ignore it, but GAIA and DS9 do understand the format and so can generate the WCS object properly. Obviously this also round-trips back into NDF format.

The native serialization of an AST object looks something like:

  Begin FrameSet
#    Title = "3-d compound coordinate system"   # Title of coordinate system
#    Naxes = 3  # Number of coordinate axes
#    Domain = "SKY-SPECTRUM"    # Coordinate system domain
#    Epoch = 2009.95136411369   # Julian epoch of observation
#    Lbl1 = "Right ascension offset"    # Label for axis 1
#    Lbl2 = "Declination offset"        # Label for axis 2
#    Lbl3 = "Wavelength"        # Label for axis 3
#    System = "Compound"        # Coordinate system type
#    Uni1 = "hh:mm:ss.sss"      # Units for axis 1
#    Uni2 = "ddd:mm:ss.ss"      # Units for axis 2
#    Uni3 = "m" # Units for axis 3 (metre)
#    Dig1 = 9   # Individual precision for axis 1
#    Dig2 = 9   # Individual precision for axis 2
#    Digits = 7 # Default formatting precision
#    Fmt1 = "hms.3"     # Format specifier for axis 1
#    Fmt2 = "dms.2"     # Format specifier for axis 2
#    Dir1 = 0   # Plot axis 1 in reverse direction
  IsA Frame     # Coordinate system description
     Nframe = 5 # Number of Frames in FrameSet
#    Base = 1   # Index of base Frame
     Currnt = 5 # Index of current Frame
     Nnode = 6  # Number of nodes in FrameSet
     Nod1 = 3   # Frame 1 is associated with node 3
     Nod2 = 4   # Frame 2 is associated with node 4
     Nod3 = 5   # Frame 3 is associated with node 5
     Nod4 = 6   # Frame 4 is associated with node 6
     Nod5 = 2   # Frame 5 is associated with node 2
     Lnk2 = 1   # Node 2 is derived from node 1
     Lnk3 = 1   # Node 3 is derived from node 1
     Lnk4 = 1   # Node 4 is derived from node 1
     Lnk5 = 1   # Node 5 is derived from node 1
     Lnk6 = 1   # Node 6 is derived from node 1
  Frm1 =        # Frame number 1
     Begin Frame
        Title = "Data grid indices; first pixel at (1,1,1)"     # Title of coordinate system
        Naxes = 3       # Number of coordinate axes
        Domain = "GRID" # Coordinate system domain
#       Lbl1 = "Data grid index 1"      # Label for axis 1
#       Lbl2 = "Data grid index 2"      # Label for axis 2
#       Lbl3 = "Data grid index 3"      # Label for axis 3
#       Uni1 = "pixel"  # Units for axis 1
#       Uni2 = "pixel"  # Units for axis 2
#       Uni3 = "pixel"  # Units for axis 3
     Ax1 =      # Axis number 1
        Begin Axis
           Label = "Data grid index 1"  # Axis Label
           Symbol = "g1"        # Axis symbol
           Unit = "pixel"       # Axis units
           Format = "%3.1f"     # Format specifier
...

You could conceive of having a version of this in YAML.
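For instance, the opening of the dump above might translate into YAML along these lines (purely an illustrative sketch using the values shown above; no such schema has been defined):

```yaml
FrameSet:
  Title: "3-d compound coordinate system"
  Naxes: 3
  Domain: SKY-SPECTRUM
  Epoch: 2009.95136411369
  Nframe: 5
  Currnt: 5
  frames:
    - Frame:
        Title: "Data grid indices; first pixel at (1,1,1)"
        Naxes: 3
        Domain: GRID
        axes:
          - Axis:
              Label: "Data grid index 1"
              Symbol: g1
              Unit: pixel
              Format: "%3.1f"
```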

The FITS header equivalent would be (modulo the lack of fixed-width font):

COMMENT AST ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ AST
COMMENT AST                  WCS information in AST format                   AST
COMMENT AST                See http://www.starlink.ac.uk/ast/                AST
COMMENT AST            Beginning of AST data for FrameSet object             AST
COMMENT AST ................................................................ AST
BEGAST_A= 'FrameSet'           / Set of inter-related coordinate systems
NFRAME_A=                    2 / Number of Frames in FrameSet
CURRNT_A=                    2 / Index of current Frame
NOD1_A  =                    2 / Frame 1 is associated with node 2
NOD2_A  =                    1 / Frame 2 is associated with node 1
LNK2_A  =                    1 / Node 2 is derived from node 1
FRM1_A  = '        '           / Frame number 1
BEGAST_B= 'Frame   '           / Coordinate system description
TITLE_A = 'Data grid indices; first pixel at (1&' / Title of coordinate system
CONTINUE  ',1,1)   '
NAXES_A =                    3 / Number of coordinate axes
DOMAIN_A= 'GRID    '           / Coordinate system domain
AX1_A   = '        '           / Axis number 1
BEGAST_C= 'Axis    '           / Coordinate axis
LABEL_A = 'Data grid index 1'  / Axis Label
SYMBOL_A= 'g1      '           / Axis symbol
UNIT_A  = 'pixel   '           / Axis units
FORMAT_A= '%3.1f   '           / Format specifier
ENDAST_A= 'Axis    '           / End of object definition
AX2_A   = '        '           / Axis number 2
BEGAST_D= 'Axis    '           / Coordinate axis
LABEL_B = 'Data grid index 2'  / Axis Label
SYMBOL_B= 'g2      '           / Axis symbol
UNIT_B  = 'pixel   '           / Axis units
FORMAT_B= '%3.1f   '           / Format specifier
ENDAST_B= 'Axis    '           / End of object definition
AX3_A   = '        '           / Axis number 3
BEGAST_E= 'Axis    '           / Coordinate axis
LABEL_C = 'Data grid index 3'  / Axis Label
SYMBOL_C= 'g3      '           / Axis symbol
UNIT_C  = 'pixel   '           / Axis units
FORMAT_C= '%3.1f   '           / Format specifier
ENDAST_C= 'Axis    '           / End of object definition
ENDAST_D= 'Frame   '           / End of object definition
FRM2_A  = '        '           / Frame number 2
BEGAST_F= 'CmpFrame'           / Compound coordinate system description
FRAMEA_A= '        '           / First component Frame
BEGAST_G= 'SkyFrame'           / Description of celestial coordinate system


 
> Some of what you say is certainly true in principle. But in practice
> it doesn't really limit what we can do. For instance, whilst in
> principle the ordering issue is a killer, in practice all software
> I've come across follows the recommendation in the fits standard "It
> is recommended that the order of the keywords
> in FITS files be preserved during data processing operations". Given
> that ordering is not a problem in practice, the namespace issue can be
> solved too by the use of starting and ending marker cards.
>
> Don't get me wrong - I'm not saying that a better system could not
> easily be devised for storing serialised WCS info, all I'm saying is
> that the limitations of FITS headers is not a show-stopper.
>
Sure. But what about switch maps? How are they represented in headers?


See above. Exactly like any other mapping.
The output image would just have the WCS of the reference image. When you are doing the resampling, you would like it to be an efficient mapping, and the simplification step can yield quite a large efficiency saving. Minimizing the intermediate transformations helps a lot and reduces rounding errors.

People have been known to set up mappings that do things like "+1, -1" or "rotate by 45, followed by rotate by -45", and having the simplification system can really help. Simplification is not simple code, but fifteen years of using the stacked mapping scheme in the real world have shown that it is an extremely important part of the system.
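The "rotate by 45, followed by rotate by -45" case is easy to demonstrate outside AST with plain 2x2 matrices; a minimal Python sketch (nothing AST-specific) shows why a simplifier can discard the pair entirely:

```python
import math

def rot(deg):
    """2x2 rotation matrix for a rotation of `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Compose "rotate by 45" with "rotate by -45": the product is the
# identity (to rounding error), so a simplification pass can replace
# the two-step chain with a no-op instead of evaluating both steps
# for every point transformed.
combined = matmul(rot(-45), rot(45))
is_identity = all(abs(combined[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                  for i in range(2) for j in range(2))
print(is_identity)  # True
```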

-- 
Tim Jenness 

David Berry

unread,
Jul 22, 2013, 11:37:13 AM7/22/13
to astro...@googlegroups.com
On 22 July 2013 15:10, Perry Greenfield <stsci...@gmail.com> wrote:
>
> On Jul 22, 2013, at 10:04 AM, David Berry wrote:
>
>> On 22 July 2013 13:50, Perry Greenfield <stsci...@gmail.com> wrote:
>>>
>>> On Jul 22, 2013, at 3:33 AM, David Berry wrote:
>>>
>>>> I second Tim's point - just because FITS-WCS has big limitations is
>>>> not a reason to abandon FITS. We've been using almost exactly the WCS
>>>> model being discussed here, based on chaining a wide variety of atomic
>>>> mappings, for pushing 17 years and find no serious difficulty in
>>>> representing it as a set of non-standard FITS headers.
>>>>
>>> I'm not quite so persuaded. How do switch maps get represented in FITS headers for example?
>>>
>>> I haven't found the details of the serialization except for an example in one of the documents, but it does seem susceptible to card ordering issues and namespace overlaps. It doesn't seem to provide the ability to share a WCS other than by copying headers. WCS isn't the only problem we've had with FITS files, just the most severe. I'm sure there is a way to put anything in a FITS file, but that doesn't mean that it is generically understood or adopted. And it also means a fair amount of code dealing with that mapping (e.g., you still have to deal with keyword size limitations, and such), and there are WCS models that will require a substantial amount of header space, leading to a lot of bloat.
>>
>> Some of what you say is certainly true in principle. But in practice
>> it doesn't really limit what we can do. For instance, whilst in
>> principle the ordering issue is a killer, in practice all software
>> I've come across follows the recommendation in the fits standard "It
>> is recommended that the order of the keywords
>> in FITS files be preserved during data processing operations". Given
>> that ordering is not a problem in practice, the namespace issue can be
>> solved too by the use of starting and ending marker cards.
>>
>> Don't get me wrong - I'm not saying that a better system could not
>> easily be devised for storing serialised WCS info, all I'm saying is
>> that the limitations of FITS headers is not a show-stopper.
>>
> Sure. But what about switch maps? How are they represented in headers?

Switchmaps are necessarily quite complex. Sadly I've not yet added
wrappers for the relevant classes to pyast, so I'll need to illustrate
it using C code. This is a simple case of a 200x100 grid that is
divided down the middle into two equal square areas. The switchmap is
a mapping that transforms a 2D position, using one of these two
Mappings, depending on where in the array the position is located -
the left hand square is mapped using a ShiftMap (a shift of origin)
and the right hand square is mapped using a MatrixMap (a rotation in
this case). For simplicity I have not defined an inverse mapping in
this example. The code is attached. The native AST dump is shown in
dump.txt and the FITS header equivalent is shown in fits.txt.
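The selection logic of the switchmap just described can be sketched in Python (the shift and rotation values here are invented for illustration; the real example uses AST's ShiftMap and MatrixMap classes):

```python
import math

def shift_map(x, y):
    """Left-hand square: a shift of origin (illustrative offsets)."""
    return x + 10.0, y + 20.0

def matrix_map(x, y, deg=30.0):
    """Right-hand square: a rotation about the origin (illustrative angle)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return c * x - s * y, s * x + c * y

def switch_map(x, y):
    """Transform a 2D position with one of two mappings, chosen by
    where in the 200x100 grid the position lies."""
    if x <= 100.0:               # left-hand square
        return shift_map(x, y)
    return matrix_map(x, y)      # right-hand square

print(switch_map(50.0, 50.0))    # left half: shifted, gives (60.0, 70.0)
print(switch_map(150.0, 50.0))   # right half: rotated
```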
How would you do that? All you know is that the original image had
some completely arbitrary pixel->WCS mapping, and the new WCS has to
be the same except for an extra rotation in the pixel plane. Where
would you begin, given that you can make no assumptions about the
nature of the original WCS?

David
dump.txt
code.c
fits.txt

Perry Greenfield

unread,
Jul 22, 2013, 11:58:57 AM7/22/13
to astro...@googlegroups.com

On Jul 22, 2013, at 11:37 AM, David Berry wrote:

> On 22 July 2013 15:10, Perry Greenfield <stsci...@gmail.com> wrote:
>>>
>>>
>>>
>> Sure. But what about switch maps? How are they represented in headers?
>
> Switchmaps are necessarily quite complex. Sadly I've not yet added
> wrappers for the relevant classes to pyast, so I'll need to illustrate
> it using C code. This is a simple case of a 200x100 grid that is
> divided down the middle into two equal square areas. The switchmap is
> a mapping that transforms a 2D position, using one of these two
> Mappings, depending on where in the array the position is located -
> the left hand square is mapped using a ShiftMap (a shift of origin)
> and the right hand square is mapped using a MatrixMap (a rotation in
> this case). For simplicity I have not defined an inverse mapping in
> this example. The code is attached. The native AST dump is shown in
> dump.txt and the FITS header equivalent is shown in fits.txt.

This is a good illustrative example. But it's not very useful for our needs in this form. Our regions are not simple, and while they can be described pretty well by a polygon, that representation can't easily be used efficiently. One essentially needs to turn the region into an array mask, and that can be a moderately expensive operation if it has to be done every time an image is opened. Ideally, one would like to keep the mask directly as part of the data file and WCS definition, but that's not easily done in a header. It makes more sense to store it as a compressed binary object.
>>
>> But if you are resampling, why not generate a new simple WCS model rather than keep a complex concatenation of models?
>
> How would you do that? All you know is that the original image had
> some completely arbitrary pixel->WCS mapping, and the new WCS has to
> be the same except for an extra rotation in the pixel plane. Where
> would you begin, given that you can make no assumptions about the
> nature of the original WCS?

I'm not sure I understand. If I'm rotating the original image, I'm assuming it does have a WCS model attached to it. It may be a complex one (e.g., including distortions and other effects). But if I actually want to resample the image, I'd usually prefer the output to use a simple projection (e.g., TAN), and by concatenating that rotation with the original WCS model, I'd end up with a clear definition of how to map the pixels in the input image to the output WCS, which is enough to do the resampling. The only place that I've seen distortions lead to huge variations in sampling sizes over the array is for spectral data, where dispersion may vary quite a bit. But almost all those cases are either one-dimensional, or nearly aligned in one dimension. These are not usually rotated in the spatial sense we are thinking about.

On the other hand, if I'm constructing a WCS to eventually define a future resampling, then I don't try to simplify the WCS model until the resampling is selected. Or perhaps the WCS is being constructed to do fitting and other analysis in the raw pixels, using the WCS to map the fit results to world coordinates. But usually this doesn't involve huge numbers of complex transformations. (E.g., will one keep adding rotations, translations, and other operations in a long chain? And even if one did, presumably the WCS machinery could recognize such chains and apply matrix simplifications to consecutive simple operations.)
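The sort of chain-collapsing just described can be sketched as follows (a hypothetical toy using homogeneous 2D coordinates; real WCS machinery would operate on its own mapping objects):

```python
import math

def translate(tx, ty):
    """Homogeneous 3x3 matrix for a 2D translation."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotate(deg):
    """Homogeneous 3x3 matrix for a 2D rotation about the origin."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, x, y):
    """Apply a homogeneous matrix to a 2D point."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# A chain of simple steps, applied first-to-last...
chain = [translate(5.0, 0.0), rotate(90.0), translate(0.0, -3.0)]

# ...collapsed into one equivalent matrix, so each point is transformed
# once instead of three times.
combined = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for step in chain:
    combined = matmul(step, combined)   # later steps multiply on the left

x, y = apply(combined, 1.0, 2.0)        # same result as applying each step in turn
```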

Cheers, Perry

David Berry

unread,
Jul 22, 2013, 12:40:41 PM7/22/13
to astro...@googlegroups.com
On 22 July 2013 16:58, Perry Greenfield <stsci...@gmail.com> wrote:
>
> On Jul 22, 2013, at 11:37 AM, David Berry wrote:
>
>> On 22 July 2013 15:10, Perry Greenfield <stsci...@gmail.com> wrote:
>>>>
>>>>
>>>>
>>> Sure. But what about switch maps? How are they represented in headers?
>>
>> Switchmaps are necessarily quite complex. Sadly I've not yet added
>> wrappers for the relevant classes to pyast, so I'll need to illustrate
>> it using C code. This is a simple case of a 200x100 grid that is
>> divided down the middle into two equal square areas. The switchmap is
>> a mapping that transforms a 2D position, using one of these two
>> Mappings, depending on where in the array the position is located -
>> the left hand square is mapped using a ShiftMap (a shift of origin)
>> and the right hand square is mapped using a MatrixMap (a rotation in
>> this case). For simplicity I have not defined an inverse mapping in
>> this example. The code is attached. The native AST dump is shown in
>> dump.txt and the FITS header equivalent is shown in fits.txt.
>
> This is a good illustrative example. But it's not very useful for our needs in this form. Our regions are not simple, and while they can be described pretty well by a polygon, that representation can't easily be used efficiently. One essentially needs to turn the region into an array mask, and that can be a moderately expensive operation if it has to be done every time an image is opened. Ideally, one would like to keep the mask directly as part of the data file and WCS definition, but that's not easily done in a header. It makes more sense to store it as a compressed binary object.

Fair enough - if your regions cannot be expressed as geometric shapes
or a list of explicit positions then you would need to use a binary
mask which AST does not support. The same is true for the "-TAB"
algorithm defined by FITS-WCS paper 3, it requires a binary table to
store the look-up table and so cannot be completely represented by a
set of headers. AST requires call-backs to be supplied to access such
binary data.


>>>
>>> But if you are resampling, why not generate a new simple WCS model rather than keep a complex concatenation of models?
>>
>> How would you do that? All you know is that the original image had
>> some completely arbitrary pixel->WCS mapping, and the new WCS has to
>> be the same except for an extra rotation in the pixel plane. Where
>> would you begin, given that you can make no assumptions about the
>> nature of the original WCS?
>
> I'm not sure I understand. If I'm rotating the original image, I'm assuming it does have a WCS model attached to it. It may be a complex one (e.g., including distortions and other effects). But if I actually want to resample the image, I'd usually prefer the output to use a simple projection (e.g., TAN),

Yes, that's what you would want to do some of the time, but what about
when you want to write general purpose software that simply resamples
image A onto the grid of image B, no matter what the WCS of image B
may be? That is, you want the output to have the WCS of image B
without any changes. I suspect we may be at crossed purposes, so I'll
be explicit: you want to create an output array C in which each pixel
has the same WCS position as the corresponding pixel in image B but
has a pixel value derived by resampling image A.
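Spelled out as a sketch (hypothetical Python; `b_pixel_to_world`, `a_world_to_pixel`, and `sample` stand in for whatever WCS and interpolation machinery is actually in use), that operation is:

```python
def resample(image_a, shape_b, b_pixel_to_world, a_world_to_pixel, sample):
    """Build output array C on image B's pixel grid with values from A.

    Each output pixel is mapped through B's WCS to world coordinates,
    then through A's inverse WCS back into A's pixel frame, and A is
    sampled there.  Only the composed mapping B -> world -> A is ever
    evaluated, so B's WCS can be arbitrarily complex.
    """
    ny, nx = shape_b
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            world = b_pixel_to_world(i, j)
            xa, ya = a_world_to_pixel(*world)
            out[j][i] = sample(image_a, xa, ya)
    return out
```

With identity mappings and nearest-neighbour sampling this degenerates to a straight copy, which is a handy sanity check.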

> and by concatenating that rotation with the original WCS model, I'd end up with a clear definition of how to map the pixels in the input image to the output WCS, which is enough to do the resampling. The only place that I've seen distortions lead to huge variations in sampling sizes over the array is for spectral data, where dispersion may vary quite a bit. But almost all those cases are either one-dimensional, or nearly aligned in one dimension. These are not usually rotated in the spatial sense we are thinking about.
>
> On the other hand, if I'm constructing a WCS to eventually define a future resampling, then I don't try to simplify the WCS model until the resampling is selected. Or perhaps the WCS is being constructed to do fitting and other analysis in the raw pixels, using the WCS to map the fit results to world coordinates. But usually this doesn't involve huge numbers of complex transformations. (E.g., will one keep adding rotations, translations, and other operations in a long chain? And even if one did, presumably the WCS machinery could recognize such chains and apply matrix simplifications to consecutive simple operations.)

That's what I'm describing, except it can't be restricted to just
matrix simplifications because you could in principle end up with all
sorts of other potential simplifications.

David

Perry Greenfield

unread,
Jul 22, 2013, 1:35:01 PM7/22/13
to astro...@googlegroups.com

On Jul 22, 2013, at 12:40 PM, David Berry wrote:

> On 22 July 2013 16:58, Perry Greenfield <stsci...@gmail.com> wrote:
>>
>>>
>>
>> I'm not sure I understand. If I'm rotating the original image, I'm assuming it does have a WCS model attached to it. It may be a complex one (e.g., including distortions and other effects). But if I actually want to resample the image, I'd usually prefer the output to use a simple projection (e.g., TAN),
>
> Yes, that's what you would want to do some of the time, but what about
> when you want to write general purpose software that simply resamples
> image A onto the grid of image B, no matter what the WCS of image B
> may be? That is, you want the output to have the WCS of image B
> without any changes. I suspect we may be at crossed purposes, so I'll
> be explicit: you want to create an output array C in which each pixel
> has the same WCS position as the corresponding pixel in image B but
> has a pixel value derived by resampling image A.
>
That's a common case actually (e.g., does this pixel in this image agree with the corresponding one in another image). But again, whatever is in A is erased when forced onto B's WCS. B's may be complex or not, but the operation doesn't add to its complexity. So I think we agree here.

>> and by concatenating that rotation with the original WCS model, I'd end up with a clear definition of how to map the pixels in the input image to the output WCS, which is enough to do the resampling. The only place that I've seen distortions lead to huge variations in sampling sizes over the array is for spectral data, where dispersion may vary quite a bit. But almost all those cases are either one-dimensional, or nearly aligned in one dimension. These are not usually rotated in the spatial sense we are thinking about.
>>
>> On the other hand, if I'm constructing a WCS to eventually define a future resampling, then I don't try to simplify the WCS model until the resampling is selected. Or perhaps the WCS is being constructed to do fitting and other analysis in the raw pixels, using the WCS to map the fit results to world coordinates. But usually this doesn't involve huge numbers of complex transformations. (E.g., will one keep adding rotations, translations, and other operations in a long chain? And even if one did, presumably the WCS machinery could recognize such chains and apply matrix simplifications to consecutive simple operations.)
>
> That's what I'm describing, except it can't be restricted to just
> matrix simplifications because you could in principle end up with all
> sorts of other potential simplifications.

Yes, it depends on how smart you want to make the machinery. Right now, I just don't want it to be stupid :-).

Cheers, Perry

David Berry

unread,
Jul 23, 2013, 4:39:44 AM7/23/13
to astro...@googlegroups.com
On 22 July 2013 18:35, Perry Greenfield <stsci...@gmail.com> wrote:
>
> On Jul 22, 2013, at 12:40 PM, David Berry wrote:
>
>> On 22 July 2013 16:58, Perry Greenfield <stsci...@gmail.com> wrote:
>>>
>>>>
>>>
>>> I'm not sure I understand. If I'm rotating the original image, I'm assuming it does have a WCS model attached to it. It may be a complex one (e.g., including distortions and other effects). But if I actually want to resample the image, I'd usually prefer the output to use a simple projection (e.g., TAN),
>>
>> Yes, that's what you would want to do some of the time, but what about
>> when you want to write general purpose software that simply resamples
>> image A onto the grid of image B, no matter what the WCS of image B
>> may be? That is, you want the output to have the WCS of image B
>> without any changes. I suspect we may be at crossed purposes, so I'll
>> be explicit: you want to create an output array C in which each pixel
>> has the same WCS position as the corresponding pixel in image B but
>> has a pixel value derived by resampling image A.
>>
> That's a common case actually (e.g., does this pixel in this image agree with the corresponding one in another image). But again, whatever is in A is erased when forced onto B's WCS. B's may be complex or not, but the operation doesn't add to its complexity. So I think we agree here.

Yes you would probably throw away the concatenated mapping after using
it to evaluate the output position, but my point is that there is
always the option not to - for instance the user may choose to retain
knowledge of the pixel coords in A as an alternate coordinate system
in the output. A "toolkit" WCS system should allow the user to do
things like that - we can't predict everything the user may want to do
with WCS. If a WCS system provides generic tools that give people the
freedom to construct and use custom WCS systems in creative and novel
ways, such as concatenating arbitrary mappings, then people *will* use
these facilities in ways you have not thought of, and this often
results in complex mappings being concatenated to form doubly complex
mappings, when in fact they can be simplified to much simpler mappings
to give faster, smaller, and more accurate transformations.


>>> and by concatenating that rotation with the original WCS model, I'd end up with a clear definition of how to map the pixels in the input image to the output WCS, which is enough to do the resampling. The only place that I've seen distortions lead to huge variations in sampling sizes over the array is for spectral data, where dispersion may vary quite a bit. But almost all those cases are either one-dimensional, or nearly aligned in one dimension. These are not usually rotated in the spatial sense we are thinking about.
>>>
>>> On the other hand, if I'm constructing a WCS to eventually define a future resampling, then I don't try to simplify the WCS model until the resampling is selected. Or perhaps the WCS is being constructed to do fitting and other analysis in the raw pixels, using the WCS to map the fit results to world coordinates. But usually this doesn't involve huge numbers of complex transformations. (E.g., will one keep adding rotations, translations, and other operations in a long chain? And even if one did, presumably the WCS machinery could recognize such chains and apply matrix simplifications to consecutive simple operations.)
>>
>> That's what I'm describing, except it can't be restricted to just
>> matrix simplifications because you could in principle end up with all
>> sorts of other potential simplifications.
>
> Yes, it depends on how smart you want to make the machinery. Right now, I just don't want it to be stupid :-).
>
> Cheers, Perry
>
>

James Turner

unread,
Jul 23, 2013, 7:28:44 AM7/23/13
to astro...@googlegroups.com
> Yes you would probably throw away the concatenated mapping after using
> it to evaluate the output position, but my point is that there is
> always the option not to - for instance the user may choose to retain
> knowledge of the pixel coords in A as an alternate coordinate system
> in the output.

(I'm only half following this, but) I think it's always going to
be useful to be able to ask "where did this value come from in the
raw data" (or in each raw dataset, where co-added), e.g. to understand
whether some feature could be a residual artifact. Also, there are
other applications for transforming backwards after resampling,
e.g. applying an averaged, possibly linearized/normalized flat
field or sky spectrum to science data in the raw coordinates (cf.
Kelson, 2003). So in general I probably wouldn't throw away the
concatenation of models after resampling, but obviously the
relationships I'm talking about no longer actually represent the
resampled WCS itself.

Cheers,

James.

Perry Greenfield

unread,
Jul 23, 2013, 7:35:32 AM7/23/13
to astro...@googlegroups.com
I agree completely. But the ability to automatically simplify shouldn't be an obstacle to acceptance, particularly given the alternative in FITS (often complex, but inflexible). It's something to consider in future enhancements, of course. And perhaps simplification isn't always what is wanted, if users want to know exactly the chain of transformations that led to the WCS (which reveals something in itself).

Perry

David Berry

unread,
Jul 23, 2013, 7:39:36 AM7/23/13
to astro...@googlegroups.com
OK. Agreed. A possible future enhancement. I agree that simplification
should be optional. Using AST, application code needs to call the
astSimplify function explicitly; it doesn't happen automatically.

David

Marten van Kerkwijk

unread,
Jul 23, 2013, 7:44:09 PM7/23/13
to astro...@googlegroups.com, d.b...@jach.hawaii.edu
Hi Perry,

Thanks very much for your two posts. They really help clarify the problems! And I do certainly see the arguments for having a header file that is readable, and thus against HDF5 (the number of times I've just viewed FITS headers with `less` is countless). Still, having somewhat fond memories of Starlink and its data format (I vaguely recall using FIGARO to reduce CGS4 data...): was that format considered as an alternative to FITS? It does seem Dave and Tim considered many of the same problems before...

All best wishes,

Marten

Tim Jenness

unread,
Jul 23, 2013, 11:29:55 PM7/23/13
to astro...@googlegroups.com, David Berry
On Tue, Jul 23, 2013 at 4:44 PM, Marten van Kerkwijk <m.h.van...@gmail.com> wrote:
Hi Perry,

Thanks very much for your two posts. They really help clarify the problems! And I do certainly see the arguments for having a header file that is readable, and thus against HDF5 (the number of times I've just viewed FITS headers with `less` is countless). Still, having somewhat fond memories of Starlink and its data format (I vaguely recall using FIGARO to reduce CGS4 data...): was that format considered as an alternative to FITS? It does seem Dave and Tim considered many of the same problems before...


I did mention Starlink NDF to Perry a couple of years ago so I imagine he's had a look at it. There are two parts to NDF. The first is the underlying file format which is the Hierarchical Data System (HDS) developed in the early 80s (in the UK) around the same time as HDF was being developed. That could be replaced pretty easily with another system that can support hierarchical data structures. The NDF (N-Dimensional Data Format) part is the data model layer that describes the convention on how the structures represent astronomical data. NDF was developed in the early 90s to standardize the Starlink software interface to files. It supports errors, masks, the concept of pixel coordinates (CRPIX) distinct from data array coordinates, and a few other things. I discussed it at the last Astropy coordination meeting in the context of NDData. There's no reason why some of the concepts couldn't work in the new YAML format.

Some of the philosophy can be found here: http://www.starlink.ac.uk/docs/sgp38.htx/sgp38.html

The NDF API can be found at http://www.starlink.ac.uk/docs/sun33.htx/sun33.html and there is a python interface.

-- 
Tim Jenness
 

David Berry

unread,
Jul 24, 2013, 3:32:27 AM7/24/13
to astro...@googlegroups.com
Tim raises an important point regardless of what format may be adopted
- the distinction between a data format and a data model. I've heard
countless discussions at ADASS over the years about whether we should
replace FITS with something more modern, but they nearly always focus
on how you store and organise bytes in a file, with almost nothing on
what those bytes mean astronomically. When Starlink wrote HDS in the
early 80s, people started producing all sorts of different HDS files,
giving their own names to components and designing their own
structures to suit their own needs. Exchanging data was difficult
because no one else knew what any of it meant. A few common
conventions emerged but it was all very informal and fragile.

To solve this situation we came up with NDF, which is a definition of
what components should/can exist in a data file, what they are called
and what they represent. Importantly it was extensible, so there was
somewhere to put any extra non-standard items you really needed to
keep. We produced (in fact Rodney Warren-Smith, the original creator
of AST, produced) an NDF library that implemented an API for accessing
this data model, which used HDS to store the bytes. Later we added
facilities to allow the bytes to be stored in FITS format.

FITS is equivalent to HDS - it's a means of storing bytes in a file
without any idea of what those bytes mean. There are some common
conventions, but no real data model. If FITS is to be replaced, I
think it's important that the question of the data model is discussed
as well.

Obviously the IVOA have already put a huge amount of effort into this.

David

Perry Greenfield

unread,
Jul 24, 2013, 9:02:52 AM7/24/13
to astro...@googlegroups.com
I fully agree that these are two important distinctions. I think I alluded to the fact that a new format doesn't solve the data model aspect by itself. A more flexible format can make the data model problem easier to address, though. I think the solutions that the VO came up with for data models require real contortions to get them into FITS. (And without being able to handle binary elements, arguably VOTable wasn't all that useful for data models either, unless you relegated all binary elements to FITS files pointed to by the VOTable.)

Our initial goal would be to do the first so that experimentation and work could be advanced on data models. Yes, a lot of effort has been provided by the VO on this, with somewhat mixed results. Some of them appear to be more practical than others. When I was looking at the characterization data model some years ago, I somehow found that things like

exposure time: timeAxis.coverage.support.extent

and

image position angle: spatialAxis.coverage.bounds.limits.charBox.size2.PosAngle

may not be viewed as the most convenient or intuitive items. But that may be just me. In any case, there is a lot of good work that should be reused.

I do think it is good to keep the data format framework separate from data model conventions.

Perry

Tim Jenness

unread,
Jul 25, 2013, 5:31:10 PM7/25/13
to astro...@googlegroups.com, David Berry, Norman Gray
On Tue, Jul 23, 2013 at 4:44 PM, Marten van Kerkwijk <m.h.van...@gmail.com> wrote:
Hi Perry,

Thanks very much for your two posts. They really help clarify the problems! And I do certainly see the arguments for having a header file that is readable, and thus against HDF5 (the number of times I've just viewed FITS headers with `less` is countless). Still, having somewhat fond memories of Starlink and its data format (I vaguely recall using FIGARO to reduce CGS4 data...): was that format considered as an alternative to FITS? It does seem Dave and Tim considered many of the same problems before...


Just a final comment that Starlink actually worked on a standalone data model, distinct from NDF and FITS but building upon them, that they called NDX:


That is also worth having a look at. It also might be a useful read for people working on NDData.


-- 
Tim Jenness

Demitri Muna

unread,
Aug 1, 2013, 3:48:23 PM8/1/13
to astro...@googlegroups.com
Hi,

Sorry, I didn't mean to kick off this big thread and then run away (work and Life interfered).

One clear thing I'm getting from this discussion is the need for a standardized way to specify WCS data in FITS. This is clearly an issue (as evidenced by people jamming data into the headers). Despite my suggestion for a new format (well, I was really just trying to get a conversation going about it to identify the issues), I think it might be worth proposing a FITS 2 format. Here are a few proposals I'd have:

1. Relax the 1970s FORTRAN character limits in the FITS headers.
2. Add a new FITS extension type to store WCS data. This new HDU type would be specified to hold any WCS definition.
3. Add a new FITS extension type to store text. This new HDU type would contain usage notes, documentation, etc. The suggested formatting would be something that is human-readable but allows for some markup, e.g. Markdown. This is to address long series of "COMMENT" headers. Perhaps a single long text field could be included in each HDU as an alternate to a new type.
4. Disallow variable-length rows in tables.
5. Define standards specifically for astronomical metadata and other uses (e.g. HDU names).

My aim would be to minimize the work to modify existing code to work with "FITS 2". Point 1 is trivial to account for. New extensions could simply be ignored by existing software until it is updated. Point 4 drops a "feature", so existing software would not write new files with variable length rows. Point 5 is defining convention and does not change the file format.

I don't know how a WCS extension would be implemented, perhaps using Perry's idea of YAML (which would mean that it could simply be a text type with a specified header). I'm nowhere close to being a WCS expert, but my initial concern is the possibility of malicious code execution from an open-ended format.
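(Editorially, Demitri's code-execution worry is concrete for some serializers: "full" loaders such as PyYAML's `yaml.load` can instantiate arbitrary objects from untrusted input, which is why `yaml.safe_load` exists. A minimal stdlib sketch of the same distinction, using `ast.literal_eval` in place of a YAML parser to stay dependency-free:)

```python
import ast

# Untrusted header-like text: WCS-ish parameters serialized as a Python
# literal. A hostile file could instead embed an expression with side effects.
trusted = "{'ctype': 'RA---TAN', 'crval': 150.1}"
hostile = "__import__('os').system('echo pwned')"

# ast.literal_eval accepts only literals (strings, numbers, dicts, ...),
# so the trusted payload parses while the hostile one raises.
params = ast.literal_eval(trusted)
assert params["ctype"] == "RA---TAN"

try:
    ast.literal_eval(hostile)
    executed = True
except ValueError:
    executed = False
assert not executed  # nothing was run; the parse was simply rejected
```

The point being that an "open-ended" text format is only as dangerous as the loader: a loader restricted to data literals cannot execute code.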

How far would this address people's concerns/problems with FITS?

Cheers,
Demitri

_________________________________________
Demitri Muna

Department of Astronomy
Ohio State University

http://scicoder.org/


Erik Bray

unread,
Aug 1, 2013, 4:03:41 PM8/1/13
to astro...@googlegroups.com
On Thu, Aug 1, 2013 at 3:48 PM, Demitri Muna <demitr...@gmail.com> wrote:
> Hi,
>
> Sorry, I didn't mean to kick off this big thread and then run away (work and Life interfered).
>
> One clear thing I'm getting from this discussion is the need for a standardized way to specify WCS data in FITS. This is clearly an issue (as evidenced by people jamming data into the headers). Despite my suggestion for a new format (well, I was really just trying to get a conversation going about it to identify the issues), I think it might be worth proposing a FITS 2 format. Here are a few proposals I'd have:
>
> 1. Relax the 1970s FORTRAN character limits in the FITS headers.

Perry has proposed this repeatedly over the years and had it shot
down. The reason is that, because the FITS format has no notion of a
format version, any FITS file ever written has to be readable, on some
level, by pretty much any FITS parser that has ever existed.

If you're going to break the format in backwards incompatible ways why
stick with that format at all? Even if you do relax the character
limit you're still left with an extremely archaic format in which it
is virtually impossible to represent compound data structures in any
sensible way.

> 2. Add a new FITS extension type to store WCS data. This new HDU type would be specified to hold any WCS definition.

This is what we're already doing. It still requires our software to
understand the WCS stored this way. Existing software like ds9 can't
make any use of it. It's otherwise an opaque blob of data as far as
any existing software is concerned.

> 3. Add a new FITS extension type to store text. This new HDU type would contain usage notes, documentation, etc. The suggested formatting would be something that is human-readable but allows for some markup, e.g. Markdown. This is to address long series of "COMMENT" headers. Perhaps a single long text field could be included in each HDU as an alternate to a new type.

I've considered adding a specification for something like this before.
But I'm not sure what it really gains you. You can already do this
by defining an array of characters. One could even support arbitrary
encodings by declaring it an array of bytes and specifying the
encoding in the header.
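(A toy illustration of Erik's suggestion, with the HDU machinery elided: the text becomes an opaque byte array, and a header keyword names the encoding. `TXTENC` is a hypothetical keyword, not any existing FITS convention.)

```python
# Free text stored as bytes, with the encoding recorded in the header.
notes = "Usage notes: flat fields taken at twilight; seeing ~0.8 arcsec."
header = {"XTENSION": "IMAGE", "BITPIX": 8, "TXTENC": "utf-8"}  # TXTENC is hypothetical
payload = notes.encode(header["TXTENC"])   # this byte array is the HDU data

# A reader needs nothing beyond the header keyword to recover the text.
recovered = payload.decode(header["TXTENC"])
assert recovered == notes
```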

> 4. Disallow variable-length rows in tables.

Can't do this if you want to support the FITS tile compression format,
which relies on this heavily. I would gladly see a new compression
format defined that doesn't explicitly require the VLA table format.
In fact this would be required for any new file format as well. Most
of the details don't need to be any different from the existing
standard--just the means of locating specific compressed tiles within
the file.

> 5. Define standards specifically for astronomical metadata and other uses (e.g. HDU names).
>
> My aim would be to minimize the work to modify existing code to work with "FITS 2". Point 1 is trivial to account for. New extensions could simply be ignored by existing software until it is updated. Point 4 drops a "feature", so existing software would not write new files with variable length rows. Point 5 is defining convention and does not change the file format.
>
> I don't know how a WCS extension would be implemented, perhaps using Perry's idea of YAML (which would mean that it could simply be a text type with a specified header). I'm nowhere close to being a WCS expert, but my initial concern is the possibility of malicious code execution from an open-ended format.
>
> How far would this address people's concerns/problems with FITS?
>
> Cheers,
> Demitri
>
> _________________________________________
> Demitri Muna
>
> Department of Astronomy
> Ohio State University
>
> http://scicoder.org/
>
>

Demitri Muna

unread,
Aug 1, 2013, 5:49:55 PM8/1/13
to astro...@googlegroups.com
Hi Erik,

On Aug 1, 2013, at 4:03 PM, Erik Bray <erik....@gmail.com> wrote:

Perry has proposed this repeatedly over the years and had it shot
down. The reason is that, because the FITS format has no notion of a
format version, any FITS file ever written has to be readable, on some
level, by pretty much any FITS parser that has ever existed.

This is a non-answer. (I fully appreciate this is not *your* answer.) All one must do is add a format version. If you query a file and it doesn't respond with a format version, it's version 1. I take it then the official position is that FITS is locked forever? Can't we as a community fork FITS? It's time for the young ones to have a say here.
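(Demitri's "absent version keyword means version 1" rule is easy to sketch. `FITSVER` is a hypothetical keyword name used for illustration, not an existing FITS convention:)

```python
def format_version(cards):
    """Return the declared format version of a header, treating a missing
    version keyword as version 1. FITSVER is a hypothetical keyword."""
    for card in cards:
        key, sep, rest = card.partition("=")
        if sep and key.strip() == "FITSVER":
            return int(rest.split("/")[0].strip())  # drop trailing comment
    return 1

v1_header = ["SIMPLE  =                    T", "BITPIX  =                  -32"]
v2_header = ["SIMPLE  =                    T", "FITSVER =                    2 / format version"]
assert format_version(v1_header) == 1  # legacy file, no keyword
assert format_version(v2_header) == 2
```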

If you're going to break the format in backwards incompatible ways why
stick with that format at all?  Even if you do relax the character
limit you're still left with an extremely archaic format in which it
is virtually impossible to represent compound data structures in any
sensible way.

It's what astronomers are very familiar with, and updating existing tools is less effort than creating new ones from scratch (and we know how much effort is being made there).

If the alternate is a new format (which I'm not opposed to), I'd like to see one proposed, or at least some momentum to do so on a defined time frame (otherwise, nothing will change).

This is what we're already doing.  It still requires our software to
understand the WCS stored this way.  Existing software like ds9 can't
make any use of it.  It's otherwise an opaque blob of data as far as
any existing software is concerned.

Is this a specification that is defined somewhere? If one were to write a new FITS browser (I am), is there a reference one can go to that at least some group of people have agreed upon? If not, can it be turned into a specification?

3. Add a new FITS extension type to store text. This new HDU type would contain usage notes, documentation, etc. The suggested formatting would be something that is human-readable but allows for some markup, e.g. Markdown. This is to address long series of "COMMENT" headers. Perhaps a single long text field could be included in each HDU as an alternate to a new type.

I've considered adding a specification for something like this before.
But I'm not sure what it really gains you.  You can already do this
by defining an array of characters.  One could even support arbitrary
encodings by declaring it an array of bytes and specifying the
encoding in the header.

Perhaps, but it's a hack. If we're going to make tweaks to the format, this seems like an easy one.

4. Disallow variable-length rows in tables.

Can't do this if you want to support the FITS tile compression format,
which relies on this heavily.  I would gladly see a new compression
format defined that doesn't explicitly require the VLA table format.
In fact this would be required for any new file format as well.  Most
of the details don't need to be any different from the existing
standard--just the means of locating specific compressed tiles within
the file.

Is this for images? I'm not really familiar with it. I'm more referring to tables of numbers. While we're at it, I'd get rid of support for n-dimensional arrays in table cells. Who thought of that?

Brian Kloppenborg

unread,
Aug 1, 2013, 6:14:38 PM8/1/13
to astro...@googlegroups.com, Demitri Muna

On 08/01/2013 11:49 PM, Demitri Muna wrote:
4. Disallow variable-length rows in tables.

Can't do this if you want to support the FITS tile compression format,
which relies on this heavily.  I would gladly see a new compression
format defined that doesn't explicitly require the VLA table format.
In fact this would be required for any new file format as well.  Most
of the details don't need to be any different from the existing
standard--just the means of locating specific compressed tiles within
the file.

Is this for images? I'm not really familiar with it. I'm more referring to tables of numbers. While we're at it, I'd get rid of support for n-dimensional arrays in table cells. Who thought of that?
Although I very much dislike this aspect of the FITS format, the Optical Interferometry FITS (OIFITS) convention uses it extensively. There is discussion about writing a new OIFITS standard to accommodate new data products, so I could suggest that n-dimensional arrays in table cells be dropped. I would just need to suggest a clear alternative and implement a backwards-compatible layer for older software.

Brian

Erik Bray

unread,
Aug 1, 2013, 6:19:31 PM8/1/13
to astro...@googlegroups.com
On Thu, Aug 1, 2013 at 5:49 PM, Demitri Muna <demitr...@gmail.com> wrote:
> Hi Erik,
>
> On Aug 1, 2013, at 4:03 PM, Erik Bray <erik....@gmail.com> wrote:
>
> > Perry has proposed this repeatedly over the years and had it shot
> > down. The reason is that, because the FITS format has no notion of a
> > format version, any FITS file ever written has to be readable, on some
> > level, by pretty much any FITS parser that has ever existed.
>
>
> This is a non-answer. (I fully appreciate this is not *your* answer.) All
> one must do is add a format version. If you query a file and it doesn't
> respond with a format version, it's version 1. I take it then the official
> position is that FITS is locked forever? Can't we as a community fork FITS?
> It's time for the young ones to have a say here.

Yes but your proposed solution, while sensible, violates the
requirement against ever making "FITS" files that can't be parsed as
FITS files by older software. If you make something better, but still
call it FITS, then you're stuck having to support both your new
conventions on top of all the older FITS conventions and it just makes
something even uglier than you had before. If you're going to break
backwards compatibility with FITS it's better to just make a clean
break.

> > If you're going to break the format in backwards incompatible ways why
> > stick with that format at all? Even if you do relax the character
> > limit you're still left with an extremely archaic format in which it
> > is virtually impossible to represent compound data structures in any
> > sensible way.
>
> It's what astronomers are very familiar with, and updating existing tools is
> less effort than creating new ones from scratch (and we know how much effort
> is being made there).

If we were doing our job well in the first place astronomers would
seldom, if ever, need to worry much about the file format at all. The
only reason they do have to think hard about what to put into FITS
files is because the format is so fussy and difficult to extend.
There's too much unnecessary nuance to FITS. The model we're working
toward with Astropy, as well as other software is one in which
scientists developing software and pipelines work with semantically
meaningful object-oriented representations of their data, and don't
have to worry about limitations or details of any kind about the
serialization format until a result is being written to disk. This
is, of course, idealistic. But moving away from kludgy file formats
where what you do with the data *must* be centered around the file
format itself (see FITS WCS) is a necessary movement.

Astronomers may be familiar with FITS, but I think most astronomers
born since the obsolescence of punch cards recognize that it's not a
very good format and only use it because they're stuck with it.

> If the alternate is a new format (which I'm not opposed to), I'd like to see
> one proposed, or at least some momentum to do so on a defined time frame
> (otherwise, nothing will change).

Working on that. Once an initial draft of the format is completed and
some software libraries implemented I would like to start working with
archives like MAST [just speaking for myself here] to add support for
downloading files converted to the new format too. Even though nobody
will use them at first (because other software won't support the
format initially) at least it's *there* and available and accessible.

> > This is what we're already doing. It still requires our software to
> >
> > understand the WCS stored this way. Existing software like ds9 can't
> > make any use of it. It's otherwise an opaque blob of data as far as
> > any existing software is concerned.
>
>
> Is this a specification that is defined somewhere? If one were to write a
> new FITS browser (I am), is there a reference one can go to that at least
> some group of people have agreed upon? If not, can it be turned into a
> specification?

I believe this is the relevant documentation:
http://documents.stsci.edu/hst/HST_overview/documents/DrizzlePac/ch35.html

While documented and useful, it's still rather kludgy and opaque.

> > > 4. Disallow variable-length rows in tables.
>
>
> > Can't do this if you want to support the FITS tile compression format,
> > which relies on this heavily. I would gladly see a new compression
> > format defined that doesn't explicitly require the VLA table format.
> > In fact this would be required for any new file format as well. Most
> > of the details don't need to be any different from the existing
> > standard--just the means of locating specific compressed tiles within
> > the file.
>
> Is this for images? I'm not really familiar with it. I'm more referring to
> tables of numbers.

This is for images, but it's implemented in tables with
variable-length arrays (VLA). There is a VLA column for the
compressed data in which each row contains the compressed data for a
single tile (the number or rows depends on how the image is
tiled--information which is only captured in the header). Since each
tile compresses to a different number of bytes the use of a VLA is
required.  There can be other columns in this table, such as a column
for uncompressible data (this occurs, for example, on tiles that fail
quantization in the lossy floating-point compression algorithm,
H_COMPRESS).

That said, I'm also for getting rid of VLA support in tables--it's too
complicated and messy to support in software and its uses seem fewer
and fewer as storage space becomes cheaper.  But that's just my
opinion. One could still develop different but otherwise equivalent
and just as efficient formats for storing compressed image tiles.
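(The reason tile compression forces variable-length rows can be shown in a few lines of stdlib Python: equal-sized tiles with different content compress to different byte counts, so a fixed-width binary-table column would have to pad or truncate. This is an illustrative sketch, not the actual FITS tile-compression layout.)

```python
import zlib

# Two equal-sized "tiles" of a toy image with very different content.
blank_tile = b"\x00" * 1024              # compresses extremely well
noisy_tile = bytes(range(256)) * 4       # same size, compresses far worse

# Each compressed tile becomes one row; the lengths differ, hence the
# need for a variable-length array (VLA) column.
rows = [zlib.compress(t) for t in (blank_tile, noisy_tile)]
lengths = [len(r) for r in rows]
assert lengths[0] != lengths[1]

# Decompression recovers each tile exactly, whatever its stored length.
assert [zlib.decompress(r) for r in rows] == [blank_tile, noisy_tile]
```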

> While we're at it, I'd get rid of support for
> n-dimensional arrays in table cells. Who thought of that?

I always thought that seemed pretty useful to me. Say you have a
table containing a column of vectors--otherwise you'd have to create a
separate column for each element of the vector. To me that's much
uglier.
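(Erik's point in miniature, using plain Python dicts as stand-ins for table rows: a vector-valued column, like a FITS binary-table `3E` column where each cell holds a fixed-length array, keeps a coordinate together as one value, versus three parallel scalar columns that must be kept in sync by hand.)

```python
# One vector column per row...
table_vector = [
    {"name": "src1", "pos": (150.1, 2.2, 0.0)},
    {"name": "src2", "pos": (150.3, 2.4, 0.1)},
]
# ...versus three scalar columns holding the same data.
table_scalar = [
    {"name": "src1", "pos_0": 150.1, "pos_1": 2.2, "pos_2": 0.0},
    {"name": "src2", "pos_0": 150.3, "pos_1": 2.4, "pos_2": 0.1},
]

# With the vector column, a whole coordinate travels as a single value.
for vec_row, sca_row in zip(table_vector, table_scalar):
    assert list(vec_row["pos"]) == [sca_row[f"pos_{i}"] for i in range(3)]
```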

Best,

Erik

Perry Greenfield

unread,
Aug 2, 2013, 12:04:05 PM8/2/13
to astro...@googlegroups.com

On Aug 1, 2013, at 3:48 PM, Demitri Muna wrote:

> Hi,
>
> Sorry, I didn't mean to kick off this big thread and then run away (work and Life interfered).
>
> One clear thing I'm getting from this discussion is the need for a standardized way to specify WCS data in FITS. This is clearly an issue (as evidenced by people jamming data into the headers). Despite my suggestion for a new format (well, I was really just trying to get a conversation going about it to identify the issues), I think it might be worth proposing a FITS 2 format. Here are a few proposals I'd have:
>
> 1. Relax the 1970s FORTRAN character limits in the FITS headers.

As Erik mentions, it isn't so much Fortran as it is punched cards. But what does this mean? Just longer cards? A completely new syntax? As such, it isn't a proposal that says anything. Many, if not nearly all, efforts along these lines will further complicate FITS header parsers beyond their current complexity, which I think is a very bad thing.

> 2. Add a new FITS extension type to store WCS data. This new HDU type would be specified to hold any WCS definition.

Again, a lot of new stuff to define (both the content of the extensions and new keyword/extension naming conventions to settle). Plus, you have to keep supporting the old WCS models (once there is substantial data out there that uses them, they ain't going away). Yikes.

> 3. Add a new FITS extension type to store text. This new HDU type would contain usage notes, documentation, etc. The suggested formatting would be something that is human-readable but allows for some markup, e.g. Markdown. This is to address long series of "COMMENT" headers. Perhaps a single long text field could be included in each HDU as an alternate to a new type.

We're doing this anyway.

> 4. Disallow variable-length rows in tables.

Too late, they are already heavily used. They won't go away in FITS. And there are reasonable use cases regardless.

> 5. Define standards specifically for astronomical metadata and other uses (e.g. HDU names).
>
> My aim would be to minimize the work to modify existing code to work with "FITS 2". Point 1 is trivial to account for. New extensions could simply be ignored by existing software until it is updated. Point 4 drops a "feature", so existing software would not write new files with variable length rows. Point 5 is defining convention and does not change the file format.

I disagree on point 1.
>
> I don't know how a WCS extension would be implemented, perhaps using Perry's idea of YAML (which would mean that it could simply be a text type with a specified header). I'm nowhere close to being a WCS expert, but my initial concern is the possibility of malicious code execution from an open-ended format.
>
> How far would this address people's concerns/problems with FITS?

At this point, I'd say I'm not sure discussion helps. It's always difficult to convey an alternative in the abstract. (This works both ways; with an abstract alternative, some see the existing item as awful, while others see the abstract alternative as threatening.) We can argue about it forever. But I think that has been the flaw of much of the development of conventions in astronomical software. It's been very much a design-by-committee approach, and that's just not a good approach generally. And much of the discussion doesn't really prove much. I think it is much more productive to show a working alternative. It's when people can try it out and see what its plusses and minuses are that they will really understand it (and which of the developer's claims are overblown!). Anyone can say "it will work this way," but until there is a real working example, it's hard to know how well it works in practice.

I think we (STScI) are well down this path and should have something (not nearly complete) to show within a couple of months, giving a more concrete idea of what we are suggesting; then we can begin to contrast the advantages and disadvantages with FITS in a more easily understood way.

Tim Jenness

unread,
Aug 8, 2013, 1:01:39 PM8/8/13
to astro...@googlegroups.com
The discussion at astrodataformat is ongoing and we are currently discussing a paper to be presented at ADASS.

If anyone is interested in joining the discussion (or signing up to the ADASS paper in support) then please send an email to astrodatafor...@googlegroups.com

-- 
Tim Jenness


On Fri, Jul 19, 2013 at 12:30 PM, Perry Greenfield <stsci...@gmail.com> wrote:
It's been pointed out that only members can see this. I'll post the messages directly.

On Jul 19, 2013, at 3:23 PM, Perry Greenfield wrote:

> I'm providing links to messages I posted a while ago on the referenced astrodataformat google group detailing the issues that eventually made us throw up our hands in disgust and give up. These center on the problems of handling WCS information well within FITS. I'd argue that these problems exist primarily because of the restrictions that FITS  has intrinsically.
>
> https://groups.google.com/forum/#!search/astrodataformat/astrodataformat/GhrsBeXm1G4/Z25eY1UfBvwJ
> https://groups.google.com/forum/#!search/astrodataformat/astrodataformat/XcEs_yw9F4E/kBQe2YZu-SMJ