Experience with HDF5


Berit Müller

Feb 24, 2021, 9:21:57 AM
to openmod-i...@googlegroups.com

Dear open-mods,

 

As part of an overarching project, we are creating a database of building-sector measurement data drawn from hundreds of projects. We originally planned to use a PostgreSQL database. However, the previous project had already started building the database with HDF5.

Therefore, I am now looking for experience with using HDF5, and in particular for software that can search HDF5 files.

 

I am looking forward to answers from your wealth of experience.

Berit

-------------------------------------------------------

Dipl.-Ing. Berit Müller

managing director

DGS - German society for solar energy

section Berlin Brandenburg 

Erich-Steinfurth-Straße 8          

D - 10243 Berlin

Phone: +49 30 293812 67

Fax: +49 30 293812 61

Mail: b...@dgs-berlin.de

www.dgs-berlin.de

 

District court of Berlin-Charlottenburg VR 7591 B

 

Jack Kelly

Feb 24, 2021, 9:47:52 AM
to Berit Müller, openmod-i...@googlegroups.com
Hi Berit!

HDF5 is a great file format for dense n-dimensional arrays (like numerical weather predictions or satellite imagery), and it is easy to use from a variety of programming languages. There are several ways of searching HDF5 from Python; for example, here's the documentation for Pandas' HDF5 functionality: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-hdf5

When written through Pandas, HDF5 data can be stored in two formats: 'fixed' and 'table'. Fixed format is fastest when reading the entire file into memory, but it's not searchable or appendable. Table format is slower, but is appendable and searchable.
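To make the fixed/table distinction concrete, here is a minimal sketch using Pandas. It assumes the PyTables package (`tables`) is installed, and the column names, key names and values are made up for illustration rather than taken from any real building dataset.

```python
import numpy as np
import pandas as pd  # pandas' HDF5 support also needs the PyTables package ("tables")


def demo_fixed_vs_table(path):
    # Hypothetical measurement data: 48 hourly temperatures for two made-up buildings.
    index = pd.date_range("2021-01-01", periods=48, freq=pd.Timedelta(hours=1))
    df = pd.DataFrame(
        {
            "building_id": np.repeat([1, 2], 24),
            "temperature_c": np.linspace(18.0, 22.0, 48),
        },
        index=index,
    )

    # 'fixed' format: quick to write and to read back whole, but no queries or appends.
    df.to_hdf(path, key="fixed_data", format="fixed", mode="w")

    # 'table' format: slower, but supports appends and on-disk queries.
    # data_columns lists the columns that may be used in a `where` clause.
    df.to_hdf(
        path,
        key="table_data",
        format="table",
        data_columns=["building_id"],
        mode="a",
    )

    # Fixed format: load everything into memory.
    whole = pd.read_hdf(path, key="fixed_data")
    # Table format: only the matching rows are read from disk.
    subset = pd.read_hdf(path, key="table_data", where="building_id == 2")
    return whole, subset
```

Calling `demo_fixed_vs_table("demo.h5")` returns the full frame from the fixed store and only the rows for building 2 from the table store. Passing a `where` clause for the fixed-format key would raise an error, which is the practical meaning of "not searchable".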

If you'd like us to help you decide on HDF5 vs SQL for your application, please could you provide some more info about your project, e.g.:
  • What's the typical read pattern for the data?  e.g. is it mostly searching for a small amount of data?  (e.g. "show me the temperature for house X at time Y").  Or will you be loading the entire dataset into memory?  (e.g. to train machine learning models; or to create an animated heatmap of temperatures across all buildings across time)?
  • What's the typical write pattern?  e.g. Will you be updating the data many times a day from lots of different computers?  Or just updating once a year from one computer?
  • Will you want to join across multiple tables?
  • Will the data be stored in a public cloud storage bucket?  (If so, something like Zarr might be better than HDF5)
  • What software tools will you want to use to access the data?  e.g. custom Python scripts?  Or 'no-code' visualisation tools like Tableau?
  • Roughly how much data will there be in total?  1 GB?  1 TB?  1 PB?!
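For comparison, the "temperature for house X at time Y" read pattern and the join question above might look like this on the SQL side. This is only a sketch: in-memory SQLite from Python's standard library stands in for PostgreSQL, and the table names, column names and rows are all hypothetical.

```python
import sqlite3


def build_demo_db():
    # In-memory SQLite stands in for PostgreSQL; the schema below is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE buildings (id INTEGER PRIMARY KEY, name TEXT, project TEXT);
        CREATE TABLE measurements (
            building_id INTEGER REFERENCES buildings(id),
            ts TEXT,
            temperature_c REAL
        );
        CREATE INDEX idx_meas ON measurements (building_id, ts);
        """
    )
    conn.executemany(
        "INSERT INTO buildings VALUES (?, ?, ?)",
        [(1, "house_x", "project_1"), (2, "house_y", "project_2")],
    )
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        [
            (1, "2021-02-24T09:00", 20.5),
            (1, "2021-02-24T10:00", 21.0),
            (2, "2021-02-24T09:00", 19.0),
        ],
    )
    return conn


def temperature_at(conn, building_name, ts):
    # Point lookup joined across two tables: "temperature for house X at time Y".
    row = conn.execute(
        """
        SELECT m.temperature_c
        FROM measurements AS m
        JOIN buildings AS b ON b.id = m.building_id
        WHERE b.name = ? AND m.ts = ?
        """,
        (building_name, ts),
    ).fetchone()
    return None if row is None else row[0]
```

If most of your reads look like `temperature_at(conn, "house_x", "2021-02-24T10:00")`, with small lookups and joins against building metadata, that tends to favour SQL. If you mostly load whole arrays at once, that tends to favour HDF5.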
Thanks!
Jack

--
You received this message because you are subscribed to the Google Groups "openmod initiative" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openmod-initiat...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/openmod-initiative/AM9PR10MB4184510A13AF3BA5ABA4F43D999F9%40AM9PR10MB4184.EURPRD10.PROD.OUTLOOK.COM.

Robbie Morrison

Feb 24, 2021, 10:30:40 AM
to openmod-i...@googlegroups.com

Hi Berit

PowerSystems.jl uses HDF5 for backroom storage.  Documentation here:

And the Julia code is somewhere on GitHub.

I guess electricity systems and built environment datasets are a bit different in nature?  But then the data model underpinning PowerSystems.jl is quite rich.

HTH, Robbie

-- 
Robbie Morrison
Address: Schillerstrasse 85, 10627 Berlin, Germany
Phone: +49.30.612-87617

Barrows, Clayton

Feb 24, 2021, 10:42:54 AM
to Robbie Morrison, openmod-i...@googlegroups.com

@Robbie Morrison thanks for highlighting this.

Berit,

As Robbie mentioned, we’ve put a lot of effort into optimizing our use of HDF5 for time series data storage in PowerSystems.jl. Actually, the HDF5 data storage is managed by InfrastructureSystems.jl, which is intended to be a general package that handles the code common to specifications of different infrastructure systems. So, if the kind of functionality that PowerSystems.jl enables for power systems is of interest to you, it’s possible to build a specification for buildings (e.g. BuildingSystems.jl). If so, please let me know and we can see if there are opportunities to help.

Thanks,

Clayton

 

