PUDL v0.4.0 (open US energy datasets)

Zane Selvans

Sep 27, 2021, 8:32:54 PM
to openmod initiative
Hey y'all,

This is a bit belated and I suspect many of you saw our Twitter thread, but after more than a year we finally cut a new v0.4.0 release of the PUDL software and data. We integrated both older and newer data for the EIA 860/923, brought in 2019-2020 data for the EPA CEMS hourly emissions and 2019 data for the FERC Form 1, and added experimental integrations of the EIA 861 (2001-2019) and the hourly electricity demand reported in FERC Form 714 (2006-2019).

There are also some new derived data products, including hourly historical electricity demand estimates by state, based on the EIA 861 and FERC 714, along with the collections of counties that make up our best guesses at the historical utility and balancing authority territories.

You can see more details on the new and pretty up-to-date release notes page of our documentation. If you want to follow along and see what's happening as it hits our main branch, that's a good resource to watch.

All the processed outputs are archived on Zenodo and collected within the Catalyst Cooperative Community there. That includes the historical state demand estimates and the main PUDL data release, which contains the bulk data (as SQLite and Parquet) as well as a Docker container with the software environment you might want to use to access the data. The software itself can be installed separately using PyPI or conda (we recommend conda).
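
For anyone who just wants to poke at the SQLite file without installing anything, here is a minimal sketch using only the Python standard library. It opens a database read-only and lists the tables it contains. The function name `list_tables` and the file name `pudl.sqlite` are placeholders for illustration, not something the release is guaranteed to match, so point it at wherever you unpacked the archive.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the sorted names of all user tables in a SQLite database file."""
    # Open read-only via a file: URI so a mistyped path raises an error
    # instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    # Skip SQLite's internal bookkeeping tables (e.g. sqlite_sequence).
    return [name for (name,) in rows if not name.startswith("sqlite_")]


if __name__ == "__main__":
    # "pudl.sqlite" is a hypothetical path; substitute your local copy.
    db_file = "pudl.sqlite"
    if Path(db_file).exists():
        for table in list_tables(db_file):
            print(table)
```

Once you know which tables are there, the same connection can be handed to whatever you normally use for queries. The read-only mode is a deliberate choice here, since an archived data release is not something you want to modify by accident.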

We're trying to get a v0.5.0 out by the end of October that includes all the 2020 data for FERC 1 and EIA 860/923, plus a crosswalk table robustly linking the EIA data to the EPA CEMS (based on the one published recently by EPA). On the back end, it should also include a much simpler ETL process that outputs directly to SQLite and Parquet, and a better system for managing all of the metadata and database schemas, using Pydantic.

Please feel free to open issues in the GitHub repo if you come across anything that seems broken, or if you're having trouble working with the data.


Zane A. Selvans, PhD
Chief Data Wrangler
Catalyst Cooperative
Signal/WhatsApp/SMS: +1 720 443 1363
Twitter: @ZaneSelvans
PGP: 0x64F7B56F3A127B04