On Thu, Sep 19, 2019 at 1:13 PM Aldcroft, Tom <tald...@gmail.com> wrote:
>
> Hi Erik,
>
> I think Christoph made good points and I'll just add a few more. ECSV is definitely active within astropy, and that is reflected on GitHub rather than on astropy-dev. In particular I've done a fair bit of work (with quite a lot of support from Marten) to allow lossless serialization of mixin columns like Time, Quantity, SkyCoord. We have done this with an astropy-specific convention (à la FITS conventions, for better or worse) for putting particular meta into the ECSV output. This did not require changing the spec, and I specifically do not want any Python-specific convention in the ECSV spec itself.
Indeed, there's nothing about ECSV that need be Python-specific.
Shoehorning in application-specific conventions, as you say, may be
slightly unfortunate, but also perfectly doable if there's a local
need; no worse than doing the same with a JSON file.
> But back to the main point of using ECSV outside of astropy, in particular with pandas DataFrame. As Christoph said, it is rather easy to do this right now as long as you accept the astropy dependency. These days I'm not nearly so concerned about "big dependencies" like astropy since "pip install astropy" just works in a matter of a couple of seconds. (I used to fret about pandas, but no more, and people are now comfortable doing "pip install tensorflow"...)
I don't necessarily find that acceptable when it's such a
relatively simple thing to write a separate library for. Which is not
to say I'm volunteering anyone to do that. This is purely a matter of
taste though, I think, so I won't argue about it.
> This is all a way to say that I personally don't have much motivation to spend time *pushing* for ECSV adoption outside astropy. That said, of course I would be happy if someone wrote `read_ecsv` and `write_ecsv` methods in Pandas! I don't know if this would be accepted (no clue about their community embracing a slightly domain-specific format).
You and Christoph both referred to it as "domain-specific" but that's
only the case because its only implementation is buried in Astropy,
and it's unknown outside the Astropy user community (maybe that's why
you wrote "slightly"). Of course it's usable for any purpose.
> About TOPCAT: that is interesting, but I just have no idea what kind of metadata is available in their table representation.
>
> About the idea of a standalone library for parsing, one of the other key motivations for ECSV was basically to make that not necessary. In effect you have two parts to the file:
>
> Header: just strip off the leading # character and drop into any YAML parser in your app (in Java, C, Perl, whatever)
> Data: read as CSV in your app
>
> So it is really just a few lines of code to get to a header data structure and the data. From there it is up to the app (e.g. TOPCAT) to coerce that into its own table representation.
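For illustration, a minimal Python sketch of the two-part split described above, using only the standard library. It assumes the layout astropy wrote at the time: a first line like `# %ECSV 0.9`, a YAML header whose lines each start with `# `, then space-delimited data whose first row holds the column names. The function name `split_ecsv` is hypothetical, and it stops at returning the header text; parsing that text is left to whatever YAML parser the application already has.

```python
import csv


def split_ecsv(text):
    """Split ECSV text into (yaml_header, rows).

    yaml_header: the header with each leading '#' (plus one space) removed,
                 ready to pass to any YAML parser.
    rows: the data section read as delimited text; rows[0] holds the
          column names.
    """
    header_lines = []
    data_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            content = line[1:]
            if content.startswith(" "):
                content = content[1:]  # drop one space, keep YAML indentation
            header_lines.append(content)
        elif line.strip():
            data_lines.append(line)  # skip blank lines
    # The first header line is the version marker (e.g. "%ECSV 0.9");
    # drop it so only the YAML document starting at "---" remains.
    if header_lines and header_lines[0].startswith("%ECSV"):
        header_lines = header_lines[1:]
    # Assumption: space delimiter (ECSV's default); a fuller reader would use
    # the "delimiter" key from the parsed header instead.
    rows = list(csv.reader(data_lines, delimiter=" "))
    return "\n".join(header_lines), rows


sample = """\
# %ECSV 0.9
# ---
# datatype:
# - {name: a, datatype: int64}
# - {name: b, datatype: string}
a b
1 "hello world"
2 x
"""

header_yaml, rows = split_ecsv(sample)
print(header_yaml)
print(rows)  # [['a', 'b'], ['1', 'hello world'], ['2', 'x']]
```

With PyYAML, for example, `yaml.safe_load(header_yaml)` should give a dict with a `datatype` list. Because the sketch splits on lines before the CSV step, quoted fields containing embedded newlines are not handled.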
Well, that's just it: in theory you can write an ECSV "parser" in a few
lines of code. But if you also want validation, and conversion to a
native table representation, it takes a little more work. I could see
a case for a (still, very simple) library with built-in support for
conversion to some native table format given the CSV columns and the
datatype dict from the ECSV header. I think there are at least a few
bits there that can avoid repetition (especially if/when new versions
of the spec ever do come out :)
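A rough, purely hypothetical sketch of the validation and conversion step such a library might centralize, again in Python. It assumes the header's `datatype` list has already been parsed from YAML into dicts with `name` and `datatype` keys, and that the CSV rows are lists of strings with the column names first. The names `convert_columns` and `_converter` are made up for this example; the datatype strings follow the ECSV spec's naming (`bool`, `int*`/`uint*`, `float*`, `string`), and missing-value handling is deliberately left out.

```python
def _to_bool(text):
    # Assumption: booleans are written as True/False, as astropy does.
    lowered = text.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


def _converter(dtype):
    """Map an ECSV datatype name to a Python conversion function."""
    if dtype == "bool":
        return _to_bool
    if dtype.startswith(("int", "uint")):
        return int
    if dtype.startswith("float"):
        return float
    if dtype == "string":
        return str
    raise ValueError(f"unsupported datatype: {dtype!r}")


def convert_columns(datatype, rows):
    """Validate rows against the header and return {column name: values}."""
    names = rows[0]
    declared = [col["name"] for col in datatype]
    if names != declared:
        raise ValueError(f"column names {names} do not match header {declared}")
    converters = [_converter(col["datatype"]) for col in datatype]
    table = {name: [] for name in names}
    for i, row in enumerate(rows[1:], start=1):
        if len(row) != len(names):
            raise ValueError(f"data row {i} has {len(row)} fields, expected {len(names)}")
        for name, convert, field in zip(names, converters, row):
            table[name].append(convert(field))
    return table


datatype = [
    {"name": "a", "datatype": "int64"},
    {"name": "b", "datatype": "float64"},
    {"name": "flag", "datatype": "bool"},
]
rows = [["a", "b", "flag"], ["1", "2.5", "True"], ["2", "3.0", "False"]]
print(convert_columns(datatype, rows))
# {'a': [1, 2], 'b': [2.5, 3.0], 'flag': [True, False]}
```

Keeping the datatype-to-converter mapping in one place is the part that would avoid repetition across applications, and the only part that would need touching if a later spec version adds types.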
Anyway, I just wanted to make sure the format wasn't dead, and if
it's still being actively used, even in a niche application, that's
good enough for me. Perhaps I can write such a library and spread its
use to new domains :)
I'm also still considering using ASDF here. File format
standardization in the bioinformatics world (to the extent there is
any such thing as "file formats" at all) is quite all over the place,
and ASDF could have a lot of application there, if only it were
known...
Best,
Erik