On Thu, Aug 1, 2013 at 5:49 PM, Demitri Muna <demitr...@gmail.com> wrote:
> Hi Erik,
>
> On Aug 1, 2013, at 4:03 PM, Erik Bray <erik....@gmail.com> wrote:
>
> > Perry has proposed this repeatedly over the years and had it shot
> > down. Reason being that because the FITS format does not have any
> > sense of a format version, any FITS file ever written has to be
> > readable, on some level, by any existing FITS parser that has ever
> > existed pretty much.
>
>
> This is a non-answer. (I fully appreciate this is not *your* answer.) All
> one must do is add a format version. If you query a file and it doesn't
> respond with a format version, it's version 1. I take it then the official
> position is that FITS is locked forever? Can't we as a community fork FITS?
> It's time for the young ones to have a say here.
Yes, but your proposed solution, while sensible, violates the
requirement against ever making "FITS" files that can't be parsed as
FITS files by older software. If you make something better but still
call it FITS, then you're stuck supporting your new conventions on top
of all the older FITS conventions, and you end up with something even
uglier than you had before. If you're going to break backwards
compatibility with FITS, it's better to just make a clean break.
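For concreteness, the fallback you describe might look roughly like
the sketch below. The FMTVER keyword is hypothetical (nothing like it
exists in the FITS standard), and the card builder is deliberately
simplified rather than a full implementation of the header syntax.

```python
CARD_LEN = 80  # FITS header cards are fixed-width 80-character records


def make_card(keyword, value):
    """Build one simplified 80-character header card (illustration only)."""
    return f"{keyword:<8}= {value:>20}".ljust(CARD_LEN)


def format_version(header, keyword="FMTVER"):
    """Return the declared version, or 1 if the hypothetical keyword is absent."""
    for start in range(0, len(header), CARD_LEN):
        card = header[start:start + CARD_LEN]
        name = card[:8].rstrip()
        if name == "END":
            break
        if name == keyword and card[8:10] == "= ":
            # Drop any trailing "/ comment" before converting the value.
            return int(card[10:].split("/")[0])
    return 1  # no version keyword: treat the file as "version 1"


legacy = make_card("SIMPLE", "T") + make_card("BITPIX", 16) + "END".ljust(CARD_LEN)
newer = make_card("SIMPLE", "T") + make_card("FMTVER", 2) + "END".ljust(CARD_LEN)

print(format_version(legacy), format_version(newer))
```

New readers can branch on the result easily enough. The trouble is
the other direction: older software has never heard of the keyword,
so it would try to parse the newer file as ordinary FITS.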
> > If you're going to break the format in backwards incompatible ways why
> > stick with that format at all? Even if you do relax the character
> > limit you're still left with an extremely archaic format in which it
> > is virtually impossible to represent compound data structures in any
> > sensible way.
>
> It's what astronomers are very familiar with, and updating existing tools is
> less effort than creating new ones from scratch (and we know how much effort
> is being made there).
If we were doing our job well in the first place astronomers would
seldom, if ever, need to worry much about the file format at all. The
only reason they do have to think hard about what to put into FITS
files is because the format is so fussy and difficult to extend.
There's too much unnecessary nuance to FITS. The model we're working
toward with Astropy, as well as other software, is one in which
scientists developing software and pipelines work with semantically
meaningful object-oriented representations of their data, and don't
have to worry about limitations or details of any kind about the
serialization format until a result is being written to disk. This
is, of course, idealistic. But moving away from kludgy file formats
where what you do with the data *must* be centered around the file
format itself (see FITS WCS) is a necessary movement.
Astronomers may be familiar with FITS, but I think most astronomers
born since the obsolescence of punch cards recognize that it's not a
very good format and only use it because they're stuck with it.
> If the alternate is a new format (which I'm not opposed to), I'd like to see
> one proposed, or at least some momentum to do so on a defined time frame
> (otherwise, nothing will change).
Working on that. Once an initial draft of the format is completed and
some software libraries are implemented, I would like to start working with
archives like MAST [just speaking for myself here] to add support for
downloading files converted to the new format too. Even though nobody
will use them at first (because other software won't support the
format initially) at least it's *there* and available and accessible.
> > This is what we're already doing. It still requires our software to
> > understand the WCS stored this way. Existing software like ds9 can't
> > make any use of it. It's otherwise an opaque blob of data as far as
> > any existing software is concerned.
>
>
> Is this a specification that is defined somewhere? If one were to write a
> new FITS browser (I am), is there a reference one can go to that at least
> some group of people have agreed upon? If not, can it be turned into a
> specification?
I believe this is the relevant documentation:
http://documents.stsci.edu/hst/HST_overview/documents/DrizzlePac/ch35.html
While documented and useful, it's still rather kludgy and opaque.
> > > 4. Disallow variable-length rows in tables.
>
>
> > Can't do this if you want to support the FITS tile compression format,
> > which relies on this heavily. I would gladly see a new compression
> > format defined that doesn't explicitly require the VLA table format.
> > In fact this would be required for any new file format as well. Most
> > of the details don't need to be any different from the existing
> > standard--just the means of locating specific compressed tiles within
> > the file.
>
> Is this for images? I'm not really familiar with this. I'm referring more to
> tables of numbers.
This is for images, but it's implemented in tables with
variable-length arrays (VLA). There is a VLA column for the
compressed data in which each row contains the compressed data for a
single tile (the number of rows depends on how the image is
tiled--information which is only captured in the header). Since each
tile compresses to a different number of bytes, the use of a VLA is
required. There can be other columns in this table, such as a column
for incompressible data (this occurs, for example, on tiles that fail
quantization in the lossy floating point compression algorithm,
H_COMPRESS).
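To illustrate why the compressed tiles end up in a variable-length
column, here is a rough sketch. It is not the actual tile compression
convention: zlib stands in for the real algorithms, and the
(length, offset) descriptors pointing into a shared heap are only
loosely modelled on how VLA columns locate their data.

```python
import struct
import zlib

# Three 256-byte "tiles" with different amounts of structure.
tiles = [
    bytes(256),             # constant tile: compresses very well
    bytes(range(16)) * 16,  # repeating pattern: compresses fairly well
    bytes(range(256)),      # no repetition: barely compresses at all
]

heap = bytearray()  # compressed bytes for all tiles, stored back to back
descriptors = []    # one fixed-width (length, offset) row per tile

for tile in tiles:
    compressed = zlib.compress(tile)
    descriptors.append(struct.pack(">ii", len(compressed), len(heap)))
    heap += compressed


def read_tile(index):
    """Locate one tile through its descriptor and decompress it."""
    length, offset = struct.unpack(">ii", descriptors[index])
    return zlib.decompress(bytes(heap[offset:offset + length]))


# The compressed size differs from tile to tile.
lengths = [struct.unpack(">ii", d)[0] for d in descriptors]
print(lengths)
```

The table rows themselves stay fixed-width (each descriptor is 8
bytes); only the heap entries vary in size. Any replacement scheme
would still need some equivalent way of finding each tile's bytes.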
That said, I'm also for getting rid of VLA support in tables--it's too
complicated and messy to support in software and its uses seem fewer
and fewer as storage space becomes cheaper. But that's just my
opinion. One could still develop different but otherwise equivalent
and just as efficient formats for storing compressed image tiles.
> While we're at it, I'd get rid of support of
> n-dimensional arrays in table cells. Who thought of that?
I always thought that seemed pretty useful. Say you have a
table containing a column of vectors--otherwise you'd have to create a
separate column for each element of the vector. To me that's much
uglier.
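As a small sketch of the difference (the column names here are made
up, and Python's struct module stands in for a real table reader),
compare a layout with one vector cell against the flattened
alternative:

```python
import struct

# Hypothetical column layouts, written with FITS-style TFORM codes:
# "1J" is one 32-bit integer, "3E" is three 32-bit floats in a single cell.
with_vector_cell = [("ID", "1J"), ("VELOCITY", "3E")]
flattened = [("ID", "1J"), ("VEL_X", "1E"), ("VEL_Y", "1E"), ("VEL_Z", "1E")]

# Either layout occupies the same 16 bytes per row on disk (big-endian).
ROW = struct.Struct(">i3f")


def pack_row(ident, vector):
    """Serialize one row: an integer ID followed by a 3-element vector."""
    return ROW.pack(ident, *vector)


def unpack_row(raw):
    """Read a row back, keeping the vector together as a single value."""
    ident, *vector = ROW.unpack(raw)
    return ident, tuple(vector)


raw = pack_row(7, (1.5, -2.0, 0.25))
print(ROW.size, unpack_row(raw))
```

The bytes are identical either way; the vector cell just spares you
from inventing and tracking a separate column name per element.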
Best,
Erik