Status of ECSV?

E. Madison Bray

Sep 19, 2019, 4:47:09 AM
to astropy-dev
Hi folks,

I have a question / proposal for a topic of discussion: Do we know if
anyone is actively using and/or promoting the ECSV format supported by
Astropy?

Those of you who have been here a while may know of the ECSV format
[1] spec'd by Tom Aldcroft, and included in Astropy as of quite some
time ago. It's a very simple, straightforward format to simply output
a CSV file with some metadata attached to it.

I searched this mailing list and haven't even seen mention of it since
2017. (The spec also mentions future versions of the format, which don't
seem to have ever materialized.) Though maybe people are just
using it happily without complaint? I don't know...

I ask because I'm working on a project where I have exactly this
problem: I have some data structure that consists of some JSON-like
structured metadata, and some table-like data (stored in CSV, since
it's not so large, and the users would like to be able to read it
manually).

So I would like to use ECSV, or something like it, but it never seems
to have gained traction as a format, much less outside of Astropy.
Although it works well *in* Astropy due to the great CSV engine and
Table format, I wonder if it would make sense to have a stand-alone
ECSV library not dependent on Astropy. Additionally, in the project
I'm working on, they're using Pandas to read and write these files, so
maybe instead of one stand-alone library it could be of interest to
add ECSV support to Pandas. Not sure if they would go for that though
if there isn't strong interest in the format...

Additionally, if there are any alternatives to ECSV that people have
been flocking to I'd be curious (not that I've heard of any). I know
of ASDF of course (I wrote the damn spec, to paraphrase an American
presidential candidate...), but I fear it might be too complicated for
me to be able to easily sell my users on. But I don't know, maybe I
should try...

Thanks for your input,
Erik


[1] https://github.com/astropy/astropy-APEs/blob/master/APE6.rst

Christoph Deil

Sep 19, 2019, 5:18:57 AM
to astropy-dev
Hi Erik,

I’m a heavy and happy user of ECSV for the past years, and it’s used by several of my astronomy colleagues. I love it.

It’s normal that it’s only used by astronomers and Astropy users if that’s the only tool supporting it.
Probably it would be useful to promote ECSV more, and to add support for it in other tools (e.g. Java & TOPCAT) if wider adoption is the goal.
But having it as a great way to serialise tables within Astropy is also something; IMO it could stay as-is forever if no-one has the time to do more.

Concerning pandas, the integration is pretty good: https://docs.astropy.org/en/stable/table/pandas.html
I guess wrapping the other way, i.e. ECSV support by `pandas.read_csv` directly, could be nice.
But that’s only a small convenience and possibly they don’t want it because it’s too domain-specific.

The one thing I wish was in `table.to_pandas` was an option to drop things that can’t be represented in the pandas data frame, that’s set to True by default (maybe emit a warning).
This way that would “just work”, whereas now I often have tables read from FITS with an array column, and still want to process the other columns with pandas.
I know this column selection can be done with one line, but I can never remember how to do it, and others don’t know about it, so I think the extra option would add real value in practice.

IMO it would be nice if pandas DataFrame had a meta dict just like Astropy table, then interoperability would be much higher.
Probably has been proposed and rejected years ago?

Concerning other formats, I see more and more use of HDF5, and also https://parquet.apache.org/ and https://arrow.apache.org/.
But that’s pretty different from ECSV, more in the same league as ASDF.

Hope that helps a bit as one user report data point with the choices for your project!

Christoph

--
You received this message because you are subscribed to the Google Groups "astropy-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to astropy-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/astropy-dev/CAOTD34botym%3DRVOxaxo17ZikuU1_GaPcOba9EwzM6QoWcHFE7g%40mail.gmail.com.

Aldcroft, Tom

Sep 19, 2019, 7:13:20 AM
to astropy-dev
Hi Erik,

I think Christoph made good points and I'll just add a few more.  ECSV is definitely active within astropy, and that is reflected on GitHub not astropy-dev.  In particular I've done a fair bit of work (with quite a lot of support from Marten) to allow lossless serialization of mixin columns like Time, Quantity, SkyCoord.  We have done this with an astropy-specific convention (ala FITS conventions, for better or worse) for putting particular meta into the ECSV output.  This did not require changing the spec, and I specifically do not want any Python-specific convention in the ECSV spec itself.  For example:

In [1]: astro
astropy=3.2.1
In [2]: tm = Time(['2000:001', '2000:002'])
In [3]: q = [3, 4] * u.m
In [4]: t = QTable([tm, q], names=['tm', 'q'])
In [5]: t.write('junk.ecsv', overwrite=True)
In [6]: cat junk.ecsv
# %ECSV 0.9
# ---
# datatype:
# - {name: tm, datatype: string}
# - {name: q, unit: m, datatype: float64}
# meta: !!omap
# - __serialized_columns__:
#     q:
#       __class__: astropy.units.quantity.Quantity
#       unit: !astropy.units.Unit {unit: m}
#       value: !astropy.table.SerializedColumn {name: q}
#     tm:
#       __class__: astropy.time.core.Time
#       format: yday
#       in_subfmt: '*'
#       out_subfmt: '*'
#       precision: 3
#       scale: utc
#       value: !astropy.table.SerializedColumn {name: tm}
# schema: astropy-2.0
tm q
2000:001:00:00:00.000 3.0
2000:002:00:00:00.000 4.0

The two of us have been remiss in not giving this "convention" proper documentation, but even now I don't think it belongs in the core ECSV spec.

But back to the main point of using ECSV outside of astropy, in particular with pandas DataFrame.  As Christoph said, it is rather easy to do this right now as long as you accept the astropy dependency.  These days I'm not nearly so concerned about "big dependencies" like astropy since "pip install astropy" just works in a matter of a couple of seconds.  (I used to fret about pandas, but no more, and people are now comfortable doing "pip install tensorflow"...)

Of course the big selling point of ECSV is carrying all the metadata.  AFAIK the only bit of ECSV that would be relevant for pandas is the dtype since everything else is not supported in DataFrame.  That's a shame and I'm sure people have asked, but I don't know where things stand.  But beyond that, it is very important to recognize that the prime directive of ECSV is that the files can be read by pretty-much any CSV parser, you just lose the metadata.  So there is never a problem *writing* files as ECSV.

This is all a way to say that I personally don't have much motivation to spend time *pushing* for ECSV adoption outside astropy.  That said, of course I would be happy if someone wrote `read_ecsv` and `write_ecsv` methods in Pandas!  I don't know if this would be accepted (no clue about their community embracing a slightly domain-specific format).

About TOPCAT, that is interesting, I just have no idea what kind of metadata is available in their table representation.

About the idea of a standalone library for parsing, one of the other key motivations for ECSV was basically to make that not necessary.  In effect you have two parts to the file:
  • Header: just strip off the leading # character and drop into any YAML parser in your app (in java, C, perl, whatever)
  • Data: read as CSV in your app
So it is really just a few lines of code to get to a header data structure and the data.  From there it is up to the app (e.g. TOPCAT) to coerce that into its own table representation.
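
The two-part split described above can be sketched with nothing but the Python standard library. This is only an illustration, not the astropy reader: the function name `split_ecsv` and the `SAMPLE` text (a shortened copy of the file shown earlier) are made up for the example, and it assumes the default space delimiter. The returned header text is what would then be handed to a YAML parser of your choice; application-specific tags such as `!astropy.units.Unit` would need extra handling there.

```python
import csv
import io

# Shortened copy of the ECSV file shown earlier in this message.
SAMPLE = """\
# %ECSV 0.9
# ---
# datatype:
# - {name: tm, datatype: string}
# - {name: q, unit: m, datatype: float64}
tm q
2000:001:00:00:00.000 3.0
2000:002:00:00:00.000 4.0
"""


def split_ecsv(text, delimiter=" "):
    """Split ECSV text into (version, header_yaml, column_names, rows).

    Hypothetical helper: it does no validation beyond checking
    for the leading '%ECSV <version>' line.
    """
    header_lines = []
    data_lines = []
    for line in text.splitlines():
        if line.startswith("#") and not data_lines:
            # Header: drop the leading '#' and at most one following space,
            # so the YAML indentation of nested keys is preserved.
            stripped = line[1:]
            if stripped.startswith(" "):
                stripped = stripped[1:]
            header_lines.append(stripped)
        elif line.strip():
            # Data: everything after the header that is not blank.
            data_lines.append(line)

    if not header_lines or not header_lines[0].startswith("%ECSV"):
        raise ValueError("missing '%ECSV <version>' line")
    version = header_lines[0][len("%ECSV"):].strip()

    # The remaining header lines form a YAML document (not parsed here).
    header_yaml = "\n".join(header_lines[1:])

    # The data part is ordinary delimited text; the first row holds the names.
    reader = csv.reader(io.StringIO("\n".join(data_lines)), delimiter=delimiter)
    names = next(reader)
    rows = list(reader)
    return version, header_yaml, names, rows


version, header_yaml, names, rows = split_ecsv(SAMPLE)
```

At this point every value in `rows` is still a string; turning them into typed columns is up to the application.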

Note that I have spent noticeable time recently working on interoperability of Table and DataFrame, and continuing in this direction is a good thing.  Christoph, your idea sounds good, open an issue!  Even better, a PR.  :-)

Cheers,
Tom

Mark Taylor

Sep 19, 2019, 7:47:43 AM
to astropy-dev
On Thu, 19 Sep 2019, Aldcroft, Tom wrote:

> About TOPCAT, that is interesting, I just have no idea what kind of
> metadata is available in their table representation.
>
> About the idea of a standalone library for parsing, one of the other key
> motivations for ECSV was basically to make that not necessary. In effect
> you have two parts to the file:
>
> - Header: just strip off the leading # character and drop into any YAML
> parser in your app (in java, C, perl, whatever)
> - Data: read as CSV in your app
>
> So it is really just a few lines of code to get to a header data structure
> and the data. From there it is up to the app (e.g. TOPCAT) to coerce that
> into its own table representation.

TOPCAT (and its underlying table I/O library STIL) reads/writes and
generally has use for column metadata fields name, datatype, unit, and
description, as well as some VO-specific things like UCD and Utype;
it can also work with arbitrary per-column metadata for file formats
where that's appropriate. So having done a quick scan of the ECSV spec,
it looks in principle like STIL I/O handlers could fit the format
reasonably well.

However, I'm generally not too enthusiastic about adding support for
additional ASCII-like formats, partly because there are so many of
them out there. If it turned out there was huge demand for ECSV
from topcat users (or potential topcat users) I might consider it,
but I'd probably take a bit of persuasion (e.g. I don't currently
have a YAML parser).

Mark

--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.t...@bris.ac.uk +44-117-9288776 http://www.star.bris.ac.uk/~mbt/

Marten van Kerkwijk

Sep 19, 2019, 9:59:51 AM
to astro...@googlegroups.com
Hi Mark,

This is slightly off-topic of the thread, but since you wrote: what ascii format with metadata would you think is the best? E.g., astropy currently can read but not write `cds` format - is that useful to have?

Thanks,

Marten

Mark Taylor

Sep 19, 2019, 11:22:07 AM
to astro...@googlegroups.com
Marten,

the only human-readable format with non-minimal metadata that
TOPCAT/STIL really supports is IPAC.
It can also read/write CSV (with commas) and a format it calls "ASCII"
(basic whitespace-separated values) but neither of those
has metadata beyond column name. There are a couple of others,
but they're really only of historical interest.
Details for reference are at
http://www.starlink.ac.uk/stil/sun252/tableBuilders.html

I have had requests for quite a few others over the years,
but no single format has had more than a handful of requests,
and many of them have not been very well defined, so I haven't
really considered it worth my while to implement any others.
In most cases I think people can get the data in some other
format than the one they're asking about from source;
probably in some other cases people use astropy to convert to
something more topcat-friendly.

I generally try to persuade people to use FITS where possible,
since topcat can handle it much more efficiently than text-based
formats, at least for large tables, though I admit that something
human-readable/-writable has benefits in some circumstances.

Tim Jenness

Sep 19, 2019, 12:37:11 PM
to astro...@googlegroups.com
We recently adopted ECSV at LSST for serializing defect masks and QE curves in a format for easy human curation (we convert to FITS when we want to use them in production). We really like how easy it is to use. We had two requirements: first, a simple text format so we could see what was going on and git would track changes; second, an easy way to include metadata in the file to let us record which sensor the data came from.

-- 
Tim Jenness

David Kirkby

Sep 19, 2019, 2:08:46 PM
to astro...@googlegroups.com

E. Madison Bray

Sep 20, 2019, 4:50:26 AM
to astropy-dev
On Thu, Sep 19, 2019 at 11:18 AM 'Christoph Deil' via astropy-dev
<astro...@googlegroups.com> wrote:
>
> Hi Erik,
>
> I’m a heavy and happy user of ECSV for the past years, and it’s used by several of my astronomy colleagues. I love it.
>
> It’s normal that it’s only used by astronomers and Astropy users if that’s the only tool supporting it.
> Probably it would be useful to promote ECSV more, and to add support for it in other tools (e.g. Java & TOPCAT) if wider adoption is the goal.
> But having it as a great way to serialise tables within Astropy is also something; IMO it could stay as-is forever if no-one has the time to do more.

Thanks, that's good to know. Yes, it's simple enough that it would be
easy to implement in other languages, I think.

> Concerning pandas, the integration is pretty good: https://docs.astropy.org/en/stable/table/pandas.html
> I guess wrapping the other way, i.e. ECSV support by `pandas.read_csv` directly, could be nice.
> But that’s only a small convenience and possibly they don’t want it because it’s too domain-specific.

That's exactly what I was thinking! Just look at the signature for
pandas.read_csv; it's a nightmare:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv

Of course, any of us who have dealt with arbitrary "csv" files knows
why it's such a nightmare so that isn't a criticism. Just, one of the
many nice things about ECSV is that it *is* standardized, and doesn't
require as many bells and whistles to parse properly, at least in
principle. A pandas.read_ecsv would no doubt be simpler...

> IMO it would be nice if pandas DataFrame had a meta dict just like Astropy table, then interoperability would be much higher.
> Probably has been proposed and rejected years ago?

True, but absent anything of the sort, the DataFrame could be returned
along with the metadata dict as a tuple, or something.

> Concerning other formats, I see more and more use of HDF5, and also https://parquet.apache.org/ and https://arrow.apache.org/.
> But that’s pretty different from ECSV, more in the same league as ASDF.

Indeed, pandas also supports reading from parquet. For my purposes
that's all overkill though.

> Hope that helps a bit as one user report data point with the choices for your project!

Thanks!

E. Madison Bray

Sep 20, 2019, 5:04:42 AM
to astropy-dev
On Thu, Sep 19, 2019 at 1:13 PM Aldcroft, Tom <tald...@gmail.com> wrote:
>
> Hi Erik,
>
> I think Christoph made good points and I'll just add a few more. ECSV is definitely active within astropy, and that is reflected on GitHub not astropy-dev. In particular I've done a fair bit of work (with quite a lot of support from Marten) to allow lossless serialization of mixin columns like Time, Quantity, SkyCoord. We have done this with an astropy-specific convention (ala FITS conventions, for better or worse) for putting particular meta into the ECSV output. This did not require changing the spec, and I specifically do not want any Python-specific convention in the ECSV spec itself.

Indeed, there's nothing about ECSV that need be Python-specific.
Shoehorning in application-specific conventions, as you say, may be
slightly unfortunate, but also perfectly doable if there's a local
need; no worse than doing the same with a JSON file.

> But back to the main point of using ECSV outside of astropy, in particular with pandas DataFrame. As Christoph said, it is rather easy to do this right now as long as you accept the astropy dependency. These days I'm not nearly so concerned about "big dependencies" like astropy since "pip install astropy" just works in a matter of a couple of seconds. (I used to fret about pandas, but no more, and people are now comfortable doing "pip install tensorflow"...)

I don't necessarily find that as acceptable when it's such a
relatively simple thing to write a separate library for. Which is not
to say I'm volunteering anyone to do that. This is purely a matter of
taste though, I think, so I won't argue about it.

> This is all a way to say that I personally don't have much motivation to spend time *pushing* for ECSV adoption outside astropy. That said, of course I would be happy if someone wrote `read_ecsv` and `write_ecsv` methods in Pandas! I don't know if this would be accepted (no clue about their community embracing a slightly domain-specific format).

You and Christoph both referred to it as "domain-specific" but that's
only the case because its only implementation is buried in Astropy,
and it's unknown outside the Astropy user community (maybe that's why
you wrote "slightly"). Of course it's usable for any purpose.

> About TOPCAT, that is interesting, I just have no idea what kind of metadata is available in their table representation.
>
> About the idea of a standalone library for parsing, one of the other key motivations for ECSV was basically to make that not necessary. In effect you have two parts to the file:
>
> Header: just strip off the leading # character and drop into any YAML parser in your app (in java, C, perl, whatever)
> Data: read as CSV in your app
>
> So it is really just a few lines of code to get to a header data structure and the data. From there it is up to the app (e.g. TOPCAT) to coerce that into its own table representation.

Well that's just it--in theory you can write an ECSV "parser" in a few
lines of code. But if you also want validation, and conversion to a
native table representation, it takes a little more work. I could see
a case for a (still, very simple) library with built-in support for
conversion to some native table format given the CSV columns and the
datatype dict from the ECSV header. I think there are at least a few
bits there that can avoid repetition (especially if/when new versions
of the spec ever do come out :)
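
To make the "little more work" concrete, here is a rough sketch of the conversion step: given the column names and string rows from the CSV part, plus the `datatype` list from the header, build typed Python columns. Everything here is an assumption for illustration (the names `CONVERTERS` and `coerce_columns`, the `True`/`False` spelling of booleans, the choice to skip complex types and missing values); it is not the astropy implementation or part of the spec.

```python
def _to_bool(value):
    # Assumes booleans are spelled 'True' / 'False' in the data.
    return {"True": True, "False": False}[value]


# Map ECSV datatype names to plain-Python converters (complex types omitted).
CONVERTERS = {"bool": _to_bool, "string": str}
CONVERTERS.update({f"int{bits}": int for bits in (8, 16, 32, 64)})
CONVERTERS.update({f"uint{bits}": int for bits in (8, 16, 32, 64)})
CONVERTERS.update({f"float{bits}": float for bits in (16, 32, 64, 128)})


def coerce_columns(names, rows, datatype):
    """Return {column name: list of typed values}.

    `datatype` is the list found under the 'datatype' key of the
    parsed header, e.g. [{'name': 'q', 'unit': 'm', 'datatype': 'float64'}].
    """
    # Minimal validation: the header must describe exactly the CSV columns.
    declared = [col["name"] for col in datatype]
    if declared != list(names):
        raise ValueError(f"header columns {declared} != data columns {list(names)}")

    columns = {}
    for index, col in enumerate(datatype):
        try:
            convert = CONVERTERS[col["datatype"]]
        except KeyError:
            raise ValueError(f"unsupported datatype {col['datatype']!r}") from None
        columns[col["name"]] = [convert(row[index]) for row in rows]
    return columns


columns = coerce_columns(
    ["tm", "q"],
    [["2000:001:00:00:00.000", "3.0"], ["2000:002:00:00:00.000", "4.0"]],
    [{"name": "tm", "datatype": "string"},
     {"name": "q", "unit": "m", "datatype": "float64"}],
)
```

A dict of lists like this is easy to hand to whatever table type the application uses; keeping the remaining header fields (unit, description, meta) alongside it is the part each application would have to decide for itself.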

Anyways, I just wanted to make sure the format wasn't dead, and if
it's still being actively used, even in a niche application, that's
good enough for me. Perhaps I can write such a library and spread its
use to new domains :)

I'm also still considering using ASDF here. File format
standardization in the bioinformatics world (to the extent there is
any such thing as "file formats" at all) is quite all over the place,
and ASDF could have a lot of application there, if only it were
known...

Best,
Erik

Mark Taylor

May 1, 2020, 5:39:26 PM
to astropy-dev
Following up this topic from a while back - having had a few requests for
ECSV support in TOPCAT, I've gone ahead and implemented it.
This isn't in public release so far, but a pre-release is available at

ftp://andromeda.star.bris.ac.uk/pub/star/topcat/pre/topcat-full_ecsv.jar

if anybody wants to try it. Any feedback welcome.

Tom Aldcroft

May 2, 2020, 6:47:38 AM
to astropy-dev mailing list
Hi Mark,

That's awesome, thanks!

- Tom


Zhiyuan Ma

May 4, 2020, 1:27:35 PM
to astro...@googlegroups.com
Thank you Mark and this is awesome.

I would add that we (the TolTEC project) also use ECSV.

Here is a very simple C++ parser that I created: https://github.com/toltec-astro/common_utils/blob/kids_dev/src/utils/ecsv.h
It is within a larger package but is header only. I'll probably make it a stand alone package if I have the time.
