On Thu, Mar 20, 2014 at 8:16 PM, Demitri Muna <demitr...@gmail.com> wrote:
> Hi Erik,
>
> On 20 Mar 2014, at 5:25 PM, Erik Bray <erik....@gmail.com> wrote:
>
>> My aim here is somewhere in-between developing a "standard" and working with FITS in Python.
>
> I say let's do both! I wasn't clear enough in my original email. I'm not suggesting using other schemes like JSON or XML to perform the actual validation - absolutely, make a Python tool because that's what people will use and that makes sense. What I'm saying is make validation description *independent* of Python. Any tool you write will read/parse this description using Python and can provide the same interface you're proposing. Don't have a validation rule that says "field x must be equal to the result of this Python function", because at that point the scheme is locked into Python specifically.
Yes, obviously.  See my previous comment that "The temptation, which I
admit is difficult to resist (especially for standards bodies :), is
to invent an entirely new syntax that is independent of any specific
programming environment or language".
In principle you would be correct and I would also have similar
preferences.  However, this still entirely ignores the point that
defining such a format immediately becomes an enormous constraint on
what you can actually do with it.  The more power and flexibility it
gains the more one spends time and effort implementing what ends up
amounting to a new programming language.  Believe me--I've put a lot
of thought into this.  If someone wants to spend the effort developing
a "language independent" format (which, as I've stated, will really be
a new language unto itself) I won't discourage them.  I think it could
still be better than tying specifically to Python.  But that's not an
effort I'm willing to spend my time on.
Also, I don't think I'm being too kind to astronomers.  I'm sure there
are some out there who would scoff at the need for such a thing.  But
I'm hardly the first one to think of this, not even in the context of
FITS.  That's partly what XDF and FITSML were about.  I suspect those
failed, in part, because they were constrained by what one could
describe in pure XML.
On Thu, Mar 20, 2014 at 9:24 PM, Michael Droettboom <mdb...@gmail.com> wrote:
> Erik,
>
> I think this is really cool. I think I share some of Demitri’s concerns
> about the Python-specificness of this, but I also agree with you that any
> language sufficiently useful would also approach Turing completeness, and
> therein lies madness. So, just in terms of getting something up and running
> without a gargantuan effort, I think this makes a lot of sense. I do notice
> from the docs, however, that most of the rules are ultimately expressed in
> simple lambda expressions that really just use the basic expression syntax.
> I wonder how far you could get building the description on something like a
> JSON tree supplemented with functions that only include the basic binary
> operations (i.e. not looping constructs or anything more sophisticated). If
I'm not sure that really gains much.  The only schemas I've
implemented so far cover the basic FITS Standard (plus the
checksum convention), and for the *most* part those do, thankfully,
only involve a few simple expressions in the lambdas.  However, if you
scroll through you'll note a few commented-out rules that are marked
TODO.  I will get to those, but they involve a bit more complexity,
such as parsing table formats, or units (something that I think I'll
only bother with if/when this is merged into Astropy, although I might
just backport the FITS unit *parser*).
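For illustration, rules of the "few simple expressions" kind might look like the minimal sketch below. The dictionary layout, the `validate` helper, and the `header` context argument are hypothetical and are not the PyFITS schema API; the constraints themselves (SIMPLE, BITPIX, NAXIS as the first three cards, the allowed BITPIX values, NAXIS between 0 and 999) come from the FITS Standard.

```python
# Hypothetical schema layout for illustration only; not the PyFITS schema API.
BASIC_PRIMARY = {
    "SIMPLE": {"mandatory": True, "position": 0,
               "value": lambda v, **ctx: v is True},
    "BITPIX": {"mandatory": True, "position": 1,
               "value": lambda v, **ctx: v in (8, 16, 32, 64, -32, -64)},
    "NAXIS": {"mandatory": True, "position": 2,
              # type() check so that a boolean is not accepted as an int
              "value": lambda v, **ctx: type(v) is int and 0 <= v <= 999},
}


def validate(header, schema):
    """Return error messages; `header` is a list of (keyword, value) cards."""
    errors = []
    keywords = [k for k, _ in header]
    values = dict(header)
    for name, rule in schema.items():
        if name not in values:
            if rule.get("mandatory"):
                errors.append(f"missing mandatory keyword {name}")
            continue
        pos = rule.get("position")
        if pos is not None and keywords.index(name) != pos:
            errors.append(f"{name} must be card {pos}")
        check = rule.get("value")
        # Every checker receives the whole header as extra context.
        if check is not None and not check(values[name], header=values):
            errors.append(f"invalid value for {name}: {values[name]!r}")
    return errors


good = [("SIMPLE", True), ("BITPIX", 16), ("NAXIS", 0)]
print(validate(good, BASIC_PRIMARY))  # []
```

Each rule is a one-line expression, which is why a JSON tree plus basic operators looks sufficient at this stage.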
And the rules for the FITS Standard itself are still comparatively
simple.  I've started on a prototype schema for FITS WCS, and the
kinds of rules one has to define grow considerably more complex,
requiring flow control constructs like branching and looping.  It all
still fits nicely into the schema organization, but the value checkers
are non-trivial.  One could argue, "Well, if this format is
FITS-specific anyway, just make a few special rules for the FITS
Standard and FITS WCS and call it a day." But that would obviate the
need for a special schema format in the first place, and the tool would
just become a basic FITS checker.  I envision this being useful for
conventions beyond the FITS Standard itself; it should also be useful
for instrument-specific conventions, which can themselves be fairly
complex.
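To make the contrast concrete, here is a hypothetical WCS-style checker that needs both a branch and a loop. The rule is illustrative: it uses WCSAXES when present and otherwise falls back to the larger of NAXIS and the highest axis index found, which roughly mirrors the FITS WCS default. The keyword subset, the "index exceeds WCSAXES" check, the error messages, and the `check_wcs_axes` name are all assumptions for the example.

```python
import re

# Illustrative subset of indexed WCS keywords (CTYPE1, CRPIX2, ...);
# FITS WCS defines many more.
_INDEXED = re.compile(r"^(CTYPE|CRPIX|CRVAL|CDELT)([1-9][0-9]?)$")


def check_wcs_axes(header):
    """Hypothetical value checker; `header` is a plain dict of keyword -> value."""
    errors = []
    indices = {}
    for key in header:
        m = _INDEXED.match(key)
        if m:
            indices[key] = int(m.group(2))
    highest = max(indices.values(), default=0)

    if "WCSAXES" in header:
        # Branch 1: an explicit WCSAXES bounds every axis index.
        naxes = header["WCSAXES"]
        for key, idx in sorted(indices.items()):
            if idx > naxes:
                errors.append(f"{key} exceeds WCSAXES={naxes}")
    else:
        # Branch 2: otherwise use the larger of NAXIS and the highest index seen.
        naxes = max(header.get("NAXIS", 0), highest)

    # Loop over every WCS axis, checking that any CTYPEi present is a string.
    for i in range(1, naxes + 1):
        ctype = header.get(f"CTYPE{i}")
        if ctype is not None and not isinstance(ctype, str):
            errors.append(f"CTYPE{i} must be a string")
    return errors


print(check_wcs_axes({"NAXIS": 2, "WCSAXES": 2, "CTYPE1": "RA---TAN",
                      "CTYPE2": "DEC--TAN", "CRPIX3": 1.0}))
# ['CRPIX3 exceeds WCSAXES=2']
```

Even this small rule goes beyond what a tree of binary operators can express comfortably.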
There are two points I could make that might point toward a compromise:
1. Most of the schema organization is *not* Python specific.  Yes, I
use Python classes to represent schemas, but that doesn't have to be
the case.  Most of the structure could fit just as easily in a
JavaScript object, for example.  The inheritance rules are distinctly
Python-influenced, but there's no reason they couldn't be implemented
in another language. That just leaves the validation functions
themselves.  Again, I believe there should be no *arbitrary*
restrictions on these that make certain things impossible.  But that
doesn't mean there can be *no* restrictions whatsoever.  Although I
don't see much value in devising some kind of expression microformat,
one could, say, limit the allowed Python language constructs to ones
that can be very easily translated to other languages, for example to
JavaScript through the Pyjs translator <http://pyjs.org/Translator.html>.
As a related alternative, the validation functions could be provided
as strings, but again rather than inventing a new language they might
as well use an existing language.  I chose Python because PyFITS is
written in Python, but it could just as easily be JavaScript or
something else.  As long as it's possible to fire up an interpreter
for that language and pass the string through it, the approach can be useful.  This
calls to mind how MongoDB uses JavaScript as its query language.
2. On the Python end of things one thing I still intend to do is
simplify how the validation functions or lambdas are written.
Currently every function has to accept that **ctx dict of arbitrary
keyword arguments.  But there's a lot I can do with function signature
introspection to limit the number of arguments a function needs to
accept when executing the schema.
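Both points above can be sketched in a few lines of standard-library Python: validators stored as expression strings that are checked against a whitelist of easily translatable constructs, and plain callables invoked with only the context arguments their signatures declare. The whitelist, the `compile_rule` and `call_with_context` helpers, and the context names `value` and `header` are hypothetical; this is one possible approach, not what PyFITS does.

```python
import ast
import inspect

# Hypothetical whitelist: constructs that map almost 1:1 onto languages such
# as JavaScript (no loops, comprehensions, calls, or attribute access).
_ALLOWED = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not, ast.USub,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.In, ast.NotIn, ast.BinOp, ast.Add, ast.Sub, ast.Mult,
    ast.Name, ast.Load, ast.Constant, ast.Tuple,
)


def compile_rule(source):
    """Compile a validator given as an expression string, rejecting other syntax."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            raise ValueError(f"construct not allowed: {type(node).__name__}")
    code = compile(tree, "<rule>", "eval")
    names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    # No builtins, and only the context names the expression refers to.
    # (A whitelist like this is not a security sandbox.)
    return lambda **ctx: eval(code, {"__builtins__": {}},
                              {k: ctx[k] for k in names})


def call_with_context(func, **ctx):
    """Call func with only the context keywords its signature accepts."""
    params = inspect.signature(func).parameters.values()
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return func(**ctx)  # func still takes **ctx, so pass everything
    wanted = {p.name for p in params
              if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                            inspect.Parameter.KEYWORD_ONLY)}
    return func(**{k: v for k, v in ctx.items() if k in wanted})


rule = compile_rule("value in (8, 16, 32, 64, -32, -64)")
print(call_with_context(rule, value=-32, header={}))                   # True
print(call_with_context(lambda value: value > 0, value=16, header={}))  # True
```

With this, a simple validator like `lambda value: value > 0` no longer has to declare `**ctx`, and a string rule such as `"[x for x in value]"` is rejected at load time because comprehensions are not on the whitelist.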
> that, plus a few convenience functions like the WCSAxes trick gets you most
> of what you need, you’d have a pretty simple language-agnostic format on
> your hands. In any case, practicality may trump that at least in the short
> term.
Unfortunately "a few convenience functions" can still only get you so
far.  Limiting rules to them would prevent implementing validators for
custom conventions, of which there are countless in FITS.  Part of my goal
in all this is to try to introduce some sanity to the universe of FITS formats.
> To change the subject a bit — this directly relates to a couple of tools
> I’ve worked on that I’d love to punt off to a more robust and powerful tool
> such as this.
<...snip...>
Regarding tools like fitsblender and fits_generator, I have definitely
been thinking along the lines of integration with these kinds of
tools.  On the fitsblender end of things, PyFITS currently has very
loose rules for combining FITS headers.  Currently, when using an
existing header in a new HDU it does things like strip out all the
keywords particular to the FITS data format, such as the BITPIX and
NAXIS keywords.  There's just a list of these (along with the
umpteenth block of code to loop over the NAXISn keywords).  But now I
have a very explicit way of pointing to which keywords are part of
some "standard" and should be kept, and which are custom keywords, or
keywords specific to another convention.  That still does not fully
answer how they should be combined. But I'm thinking along those
lines.
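As a rough illustration, the hard-coded approach amounts to something like the sketch below. The idea described above is that a schema, rather than a fixed regular expression, would say which keywords describe the data layout. The `STRUCTURAL` pattern and the `split_header` helper are hypothetical, and the keyword list is a partial one chosen for the example.

```python
import re

# Hypothetical stand-in for what a schema would declare: keywords that
# describe one HDU's data layout and should not carry over to a new HDU.
STRUCTURAL = re.compile(r"^(SIMPLE|XTENSION|BITPIX|NAXIS[0-9]*|EXTEND|PCOUNT|GCOUNT)$")


def split_header(cards):
    """Split (keyword, value) cards into structural and carried-over lists."""
    structural, carried = [], []
    for key, value in cards:
        (structural if STRUCTURAL.match(key) else carried).append((key, value))
    return structural, carried


cards = [("SIMPLE", True), ("BITPIX", 16), ("NAXIS", 2), ("NAXIS1", 100),
         ("NAXIS2", 50), ("OBJECT", "M31"), ("EXPTIME", 30.0)]
structural, carried = split_header(cards)
print(carried)  # [('OBJECT', 'M31'), ('EXPTIME', 30.0)]
```

Driving this from a schema would also let keywords from other conventions be recognized, instead of treating everything that is not structural as custom.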
Likewise, on the fits_generator end of things, I am definitely
thinking about how this can be used to translate between different
FITS conventions, or even to/from FITS and a different data format
entirely (i.e. FINF).  I'm not sure about that yet, but implementing
a standard ruleset for FITS itself was only the first step.