HATS format in the IVOA upgrades


Melissa DeLucchi

Aug 29, 2025, 3:00:48 PM
to hat...@googlegroups.com, gwyn...@gmail.com, fxpi...@gmail.com, antara.r....@nasa.gov, Faisst, Andreas, bsi...@gmail.com, Vandana Desai, gp...@ipac.caltech.edu, Raen, Troy J., sgr...@ipac.caltech.edu, David Shupe, ctsl...@uw.edu, Melissa DeLucchi, Jeremy Kubica, Max West, mju...@uw.edu, Sean McGuire, Samuel Dillon Wyatt, Carlos Adean, Julia Gschwend, ldacosta, Wilson, Tom J., Kiessling, Alina A (3266), fri...@slac.stanford.edu, ga...@slac.stanford.edu, Brian Hayden, Erik Tollerud, Steve Lubow, Brian McLean, msan...@stsci.edu, Rick White, Bernie Shiao, Susan Mullally, Sharon Shen, Travis Berger, Tom Donaldson, Lovro Palaversa, mariano.ja...@gmail.com, emil...@gmail.com, yoclaudi...@gmail.com, Andy Tzanidakis, Eric C. Bellm, jca...@lsst.org, Gustavo Schwarz, Jonathan Hargis, luigi....@gmail.com, Manon Marchand, Mark Allen, Zach Claytor, ada....@astro.unistra.fr, Derek Jones, enrique.ut...@ext.esa.int, Francois-Xavier PINEAU, hector....@ext.esa.int, jos.de....@esa.int, Kostya Malanchev, Olivia Lynn, Pierre Fernique, Sara Nieto, Sandro Campos, xl...@fqa.ub.edu, Neven Caplar

The HATS IVOA note v1.0 has been published!


https://www.ivoa.net/documents/Notes/HATS/

https://github.com/ivoa/hats


Thank you to everyone in the working group for making this a reality!

We've made several format upgrades in the lead-up to this milestone, in response to the recommendations of this working group and of the IVOA Applications working group. Some changes also respond to user feedback or to findings from the LINCC Frameworks team.


If you (or your organization) are hosting HATS data, we ask that you please update your datasets to use the latest HATS format and metadata. We hope that this process will be straightforward, and as always, if you have any questions along the way, please reach out.


  • The most recent version of the HATS and LSDB libraries is v0.6.4. These versions should be able to open all HATS catalogs you are hosting:


import lsdb

# Replace the placeholder with the path or URL of a catalog you host.
catalog = lsdb.open_catalog("path/to/catalog")



  • If you are still storing or hosting data in the old HiPSCat format, please convert this data to HATS as soon as you can. We do not support the HiPSCat format now and have no plans to support it in the future. We have provided a conversion tool to help you accomplish this.


Major changes:

  • properties -> hats.properties

    • We have renamed the top-level metadata properties file to hats.properties to better allow for co-hosting of HATS and HiPS catalog data. 

    • HATS libraries will continue to read the old properties files at least through the versions released up to July 2026, but we plan to phase these files out in favor of hats.properties.

  • point_map.fits -> skymap.fits

    • We have renamed this HEALPix skymap file to better reflect the underlying data it stores. Additionally, we can provide down-sampled skymaps for UI tools to use. Any alternative down-sampled files should be listed in the hats_skymap_alt_orders property.

    • As with the properties file, we will continue to read the previous file (point_map.fits) at least through July 2026.

  • Drop Norder/Dir/Npix columns inside parquet files

    • These fields are no longer expected by LSDB APIs. If you create catalogs with anything other than the hats-import pipeline, you can safely stop generating them.

  • Custom parquet extensions (or leaf directories)

    • Data partitions may be named with whatever extension you prefer, based on system requirements (e.g. .parq or .pq). A partition may even take the form of a directory, where all files within are considered data files for the leaf partition.

    • This is supported with the hats_npix_suffix property.

  • data_thumbnail.parquet

    • This is an optional file to ease prototyping analysis pipelines. LSDB currently doesn't do anything with this file, but we've started generating it for catalogs in preparation.

  • Catalog Collections

    • We have introduced the concept of a catalog collection, based on our internal conventions for storing and hosting catalog data together with its supplemental margin and index tables.

    • If you're hosting margin catalogs or index tables alongside the primary object/source catalogs, I'd encourage creating collections to ease user access.

  • Increased support for list columns and nested formats

    • Complex nested data like lightcurves and spectra can be stored easily in parquet format (within the same tables as the primary source/object data), and we have been working on user-friendly access APIs. 

    • If you have complex data that you're interested in serving alongside basic catalog identifiers, we can help!

  • Import with alternative compression or row group creation

    • Parquet supports a lot of ways to optimize your files for whatever storage requirements and access patterns you find most valuable. Catalogs imported via hats-import can customize the compression (the default is now ZSTD instead of Snappy) and tweak the row groups as well.

    • Row group customization is supported with row_group_kwargs. It accepts two alternatives: num_rows and subtile_order_delta. The first sets a custom number of rows per row group. The second sets the number of HEALPix order splits for the row groups (e.g. with subtile_order_delta=1, a partition of order 3 will comprise up to 4 row groups, each containing the data for the corresponding pixel at order 4).

    • Alternative compression schemes are still up to the parquet reader to support, so use caution!
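
To make the subtile_order_delta example above concrete, here is a minimal sketch of the pixel arithmetic, assuming the HEALPix nested numbering scheme. The helper name subtile_pixels is hypothetical and is not part of hats-import; it only lists which higher-order pixels fall inside one partition, and does not write any row groups.

```python
def subtile_pixels(order: int, pixel: int, subtile_order_delta: int) -> list[int]:
    """List the nested-scheme pixels at (order + delta) inside pixel (order, pixel).

    Hypothetical helper for illustration only; not a hats-import API.
    """
    if order < 0 or subtile_order_delta < 0:
        raise ValueError("order and subtile_order_delta must be non-negative")
    if not 0 <= pixel < 12 * 4**order:
        raise ValueError("pixel is out of range for this order")
    # Each additional order splits a pixel into 4 children,
    # so delta extra orders give 4**delta sub-pixels.
    n_children = 4**subtile_order_delta
    first = pixel * n_children
    return list(range(first, first + n_children))


# A partition at order 3, pixel 5, split with subtile_order_delta=1,
# covers four pixels at order 4.
print(subtile_pixels(3, 5, 1))  # [20, 21, 22, 23]
```

With a delta of 2 the same partition would cover 16 sub-pixels, so larger deltas quickly multiply the maximum number of row groups per file.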

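For the two renamed metadata files listed under Major changes, a quick local check can show which catalogs still need attention. This is a rough sketch using only the Python standard library: the helper find_legacy_files is hypothetical (not part of HATS or LSDB), it assumes both files sit at the top level of a catalog directory on a local filesystem, and it looks at file names only, not at file contents.

```python
from pathlib import Path

# Old file name -> new file name, as listed under "Major changes".
RENAMED_FILES = {
    "properties": "hats.properties",
    "point_map.fits": "skymap.fits",
}


def find_legacy_files(catalog_dir: str) -> list[tuple[str, str]]:
    """Return (old, new) name pairs where only the old file exists.

    Hypothetical helper for illustration; it inspects file names only.
    """
    root = Path(catalog_dir)
    return [
        (old, new)
        for old, new in RENAMED_FILES.items()
        if (root / old).is_file() and not (root / new).exists()
    ]


# "path/to/catalog" is a placeholder; point this at a catalog you host.
for old, new in find_legacy_files("path/to/catalog"):
    print(f"{old} is present but {new} is missing")
```

An empty result only means the new file names are in place; it does not confirm that the metadata inside them is up to date.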

I'm reminded of the refrain from the IVOA interop at UMD this summer: if you want to go fast, go alone; if you want to go far, go together. This can feel like a long journey, but we're building something big together.

-Melissa, on behalf of the HATS/LSDB LINCC Frameworks team

--
=======
Melissa DeLucchi (duh-LOO-kee)
she/they