Question on datasets


Ines M. Azevedo

Mar 3, 2021, 5:52:50 PM
to openmod-i...@googlegroups.com
Hi all, 

I suspect this has come up in previous emails, but I am looking for a publicly available dataset that covers all current power plants and includes capacity, initial date of operation, country or region, heat rate, and CO2 emissions or emissions rate. Could someone point me to a good data source?

Thank you!

-----------------------------------------------------------------------------------------------
Inês Azevedo
Associate Professor
Energy Resources Engineering 
School of Earth, Energy & Environmental Sciences
Green Earth Sciences Bldg. Rm 062
367 Panama Street, Stanford, CA 94305
Stanford University
phone: 6504970818
----------------------------------------------------------------------------------------------


Robbie Morrison

Mar 4, 2021, 11:22:30 AM
to openmod-i...@googlegroups.com

Hi Inês, all


I can only talk in general terms.  I guess you are looking for global data.  Wikipedia has a list of open energy system databases, some of which span the planet:

Try also:

Not listed on that Wikipedia page are the more recent PowerSystems.jl data libraries, the PowerGenome project, and the Open Energy Outlook relational database, all currently limited to the United States as I understand it.


Maybe there is a project on OpenStreetMap, or perhaps the DBpedia folk can harvest from Wikipedia infoboxes, specifically the  {{infobox power station}}  template?  See for example:

Tom Brown posted to this list two days ago on 2 March 2021 suggesting that this community embark on a data offensive:

I guess that is the real answer.  To roll up our collective sleeves!  There is good infrastructure over at the Open Energy Platform with guaranteed exemplary computer science support for 10 years — although others will know the mechanics of adding and curating data there better than me:

openmod infrastructure that can also be used is listed here (personally I think wikis have a limited role to play):

Someone or some few need to take the lead?


HTH, Robbie

--
You received this message because you are subscribed to the Google Groups "openmod initiative" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openmod-initiat...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/openmod-initiative/BYAPR02MB461370E190C59D003455B989B8989%40BYAPR02MB4613.namprd02.prod.outlook.com.
-- 
Robbie Morrison
Address: Schillerstrasse 85, 10627 Berlin, Germany
Phone: +49.30.612-87617

Robbie Morrison

Mar 4, 2021, 1:07:47 PM
to openmod-i...@googlegroups.com

Hi again


For completeness, I should add the Global Electrification Program (GEP) led by the World Bank and ESMAP, covered in a post from Tom Brown a week back:

Robbie

Robbie Morrison

Mar 5, 2021, 6:20:49 AM
to openmod-i...@googlegroups.com

Hi once again (and sorry for the drip‑feed)


Another good online information resource is PowerExplorer:

PowerExplorer has global coverage and offers a list of power plants, originally sourced from this CSV dataset:

The PowerExplorer partners include the World Resources Institute, Google, and KTH Sweden.  Google provides funding too (30% if I recall correctly).
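
For anyone who wants to work with a plant-level CSV of this kind directly, here is a minimal Python sketch. The column names (`country`, `capacity_mw`, `primary_fuel`, `commissioning_year`) and the three sample rows are assumptions made for illustration, so check the header of the actual file before relying on them. The sketch parses the rows, tolerates a missing commissioning year, and totals capacity by country.

```python
import csv
import io

# Hypothetical excerpt: column names and values are illustrative assumptions,
# not copied from any real dataset.
SAMPLE_CSV = """country,name,capacity_mw,primary_fuel,commissioning_year
DEU,Example Plant A,800.0,Gas,2010
DEU,Example Plant B,1200.0,Coal,1985
FRA,Example Plant C,50.0,Wind,
"""


def load_plants(text):
    """Parse CSV text into a list of dicts with typed numeric fields."""
    plants = []
    for row in csv.DictReader(io.StringIO(text)):
        year = row["commissioning_year"].strip()
        plants.append({
            "country": row["country"],
            "name": row["name"],
            "capacity_mw": float(row["capacity_mw"]),
            "primary_fuel": row["primary_fuel"],
            # Some records carry no commissioning year; keep that explicit.
            "commissioning_year": int(float(year)) if year else None,
        })
    return plants


def capacity_by_country(plants):
    """Sum installed capacity (MW) per country code."""
    totals = {}
    for plant in plants:
        country = plant["country"]
        totals[country] = totals.get(country, 0.0) + plant["capacity_mw"]
    return totals


plants = load_plants(SAMPLE_CSV)
totals = capacity_by_country(plants)
```

For a real file, the same functions should work with the file contents read via `open(path, newline="", encoding="utf-8").read()`, provided the columns match.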


PowerExplorer also harvests from public portals in Europe and unilaterally adds Creative Commons CC‑BY‑4.0 licenses.  So if you expressly need open data covering the European energy sector, this might be a good route to take.


---


For those interested in the legal background, databases in Europe are subject to protection under the 1996 Database Directive 96/9/EC.  This legislation, widely regarded as ill‑considered, provides automatic catch‑22 protection to public databases located specifically within the EEA.  It is a catch‑22 because the thresholds for protection (based on direct investment and exposure to commercial risk) and the resultant limits on download and reuse (extraction and re‑utilization) cannot be determined by users.  Nor have legislators or courts remotely identified what "substantial" might mean in these contexts, even if users had access to all the necessary technical and commercial detail.

Returning to PowerExplorer, a legal department from one of the affected European portals contacted the site over its licensing policy and was rebuffed (sorry, that is all I can say in public).  The United Kingdom also has database protection written into its current copyright legislation, but, post‑Brexit, is now at liberty to remove those provisions.  On the other hand, the UK adopts a "sweat‑of‑the‑brow" threshold for copyright (unlike the United States or Germany), so copyright, rather than database protection, may still apply to public datasets served from within the UK.  In contrast, the United States Copyright Office (2017) explicitly states that it is highly unlikely that publicly‑served datasets are protected by copyright (§727.1 and following):

"The notion of technical databases being creative is largely mutually exclusive.  Orthodox database are highly structured, but they are not much selected and arranged.  Nonorthodox databases, while not highly structured, are similarly even less likely to be selected and arranged."

Moreover, the United States never implemented database protection (despite three attempts). But two further considerations may apply.  There are carve‑outs for scientific research, particularly in Europe.  And quasi‑property rights, like misappropriation, are also available as civil remedies.  Exceptions for science are useful enough within that domain, but researchers cannot then republish using open licenses to facilitate community curation or contribute to the wider public good.


In any case, an entirely workable solution is to apply CC‑BY‑4.0 licenses to datasets and CC0‑1.0 waivers to the associated metadata.  There is a common effort within this community to secure those instruments on public sector information and information under statutory reporting within Europe.  That effort includes liaising with the European Commission, the JRC, ENTSO‑E, and various Regional Security Coordinators.  Please note that this entire email covers only public non‑personal data.  Indeed, private data, whether personal or commercial, introduces a whole raft of other considerations and difficulties.
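
As a rough illustration of that split, the Python sketch below builds a small metadata record that marks the data file as CC‑BY‑4.0 and the metadata itself as CC0‑1.0. The field names (`metadata_license`, `resources`, `license`) and the file name are hypothetical and do not follow any particular metadata standard. Only the two SPDX license identifiers are taken as given.

```python
import json


def describe_dataset(name, data_path):
    """Return a metadata record with separate licenses for data and metadata.

    The field names are illustrative assumptions, not a published schema.
    """
    return {
        "name": name,
        # Waiver on the metadata itself (SPDX identifier).
        "metadata_license": "CC0-1.0",
        "resources": [
            {
                "path": data_path,
                # License on the dataset proper (SPDX identifier).
                "license": "CC-BY-4.0",
            }
        ],
    }


record = describe_dataset("example-power-plants", "plants.csv")
print(json.dumps(record, indent=2))
```

Keeping the two instruments in separate fields means a harvester can reuse the metadata without attribution while still seeing the attribution requirement on the data.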


Finally, having a number of portals collecting and serving overlapping secondary information is normally poor practice.  And I wonder if a crowd‑curated repository like Wikidata might represent a better solution.  Wikidata data is licensed CC0‑1.0.  Those interested in this approach might like to dig into the DBpedia project based in Germany.


with best wishes, Robbie

Matteo De Felice

Mar 6, 2021, 5:40:45 AM
to openmod initiative
Good morning,
At the JRC we have released a database of European power plants: https://zenodo.org/record/3574566#.YENbv9zTWt8

Best regards,

Robbie Morrison

Mar 6, 2021, 12:54:52 PM
to openmod-i...@googlegroups.com

Hi again all

A few complete references follow, some having been mentioned earlier as URLs.

Note that Gotzens et al. (2019) describe a method of stitching together different power plant databases to produce a least wrong version!  Their Python code is on GitHub IIRC.
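
To give a feel for what stitching lists together involves, here is a deliberately naive Python sketch. It is not the method of Gotzens et al. (2019), and the record fields, stop words, and sample entries are all assumptions made for illustration. Records from several sources are grouped on an exact key of country, fuel, and a normalised name, and the median capacity across sources is reported.

```python
import re
from collections import defaultdict
from statistics import median

# Generic words dropped before comparing names (an illustrative choice).
STOP_WORDS = {"power", "plant", "station", "kraftwerk"}


def normalise(name):
    """Lower-case a plant name and drop punctuation and generic words."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in STOP_WORDS)


def merge_sources(sources):
    """Group records sharing (country, fuel, normalised name).

    Returns one merged record per group with the median capacity and the
    list of sources that reported the plant.
    """
    groups = defaultdict(list)
    for source_name, records in sources.items():
        for rec in records:
            key = (rec["country"], rec["fuel"], normalise(rec["name"]))
            groups[key].append((source_name, rec))

    merged = []
    for (country, fuel, name), hits in sorted(groups.items()):
        merged.append({
            "country": country,
            "fuel": fuel,
            "name": name,
            "capacity_mw": median(rec["capacity_mw"] for _, rec in hits),
            "sources": sorted({src for src, _ in hits}),
        })
    return merged


# Hypothetical records from two hypothetical databases.
sources = {
    "db_a": [
        {"name": "Example-Nord Power Station", "country": "DE",
         "fuel": "gas", "capacity_mw": 800.0},
    ],
    "db_b": [
        {"name": "example nord", "country": "DE",
         "fuel": "gas", "capacity_mw": 820.0},
        {"name": "Sample Wind Plant", "country": "DE",
         "fuel": "wind", "capacity_mw": 30.0},
    ],
}
merged = merge_sources(sources)
```

Exact-key matching like this misses spelling variants and plants split into blocks, which is where fuzzy record linkage would come in.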

Just to note that some major portals are two hops from the primary source, with no way of propagating corrections back upstream, and probably no methods in place for pulling in changes from upstream either.  Moreover, researchers will make their own combined and corrected lists and doubtless deposit them somewhere static for reproducibility.  And then someone else will use that particular deposit as a starting point because it represents either the easiest option or the least worst option.  There has to be a better way?

Robbie

References

Byers, Logan (31 May 2018). Release Announcement: Global Power Plant Database (WRI, Google, and many others). Open Energy Modelling Initiative forum. Germany. Creative Commons CC‑BY‑4.0.

Byers, Logan, Johannes Friedrich, Roman Hennig, Aaron Kressig, Xinyue Li, Laura Malaguzzi Valeri, and Colin McCormick (April 2018). A global database of power plants — Technical note. Washington DC, USA: World Resources Institute (WRI). Creative Commons CC‑BY‑4.0.

Gotzens, Fabian, Heidi Heinrichs, Jonas Hörsch, and Fabian Hofmann (1 January 2019). "Performing energy modelling exercises in a transparent way: the issue of data quality in power plant databases". Energy Strategy Reviews. 23: 1–12. ISSN 2211-467X. doi:10.1016/j.esr.2018.11.004. Creative Commons CC-BY-NC-ND-4.0.

Kanellopoulos, Kostis, Matteo De Felice, Ignacio Hidalgo, A Bocin, and A Uihlein (2019). The Joint Research Centre power plant database (JRC-PPDB) — Version 0.9 — EUR 29806 EN. Luxembourg: Publications Office of the European Union. ISBN 978-92-76-08849-3. doi:10.2760/5281. Catalogue number KJ-NA-29806-EN-N. Reuse according to Commission Decision 2011/833/EU.


dan.t...@sheffield.ac.uk

Mar 9, 2021, 5:45:30 PM
to Robbie Morrison, openmod-i...@googlegroups.com

Hi,

 

You might find some of what you are looking for in these data sets from Ember.

https://ember-climate.org/project/global-power-2020/

 

They combine various datasets for monthly and annual outputs.  Ember is part of the Subak accelerator for non-profits, where I do some work, so if you are interested to know more, I could probably put you in touch with relevant people.

 

Regards,

Dan

Robbie Morrison

Mar 10, 2021, 2:03:55 PM
to openmod-i...@googlegroups.com, Dan Travers (Sheffield Uni)

Hi Dan, all

I recorded key parts of this thread on the openmod forum for future reference:

(Note too the cute contents feature on the right side, a new addition but available only for the first post in the topic.)

I have talked to Ember in the past about a closer involvement with this community and about the merits of open data and the use of suitable licensing.  Perhaps @Dan could prod them again on these issues?

For some reason, the United Kingdom seems to be a complete desert when it comes to CC‑BY‑4.0 licenses on energy sector datasets.

cheers, Robbie

Robbie Morrison

Mar 13, 2021, 10:00:40 AM
to openmod-i...@googlegroups.com, Jack Kelly (Open Climate Fix)

Hello all

My rather intemperate remark about the United Kingdom being a CC‑BY‑4.0 licensing desert for energy sector datasets is now being pursued on Twitter and elsewhere:

[twitter-screenshot-1]

If you have something to contribute, either respond to this thread, cheep over at Twitter, or reply to me offlist.

So far there is just one dataset, published in 2017 from a PhD project on household demand disaggregation.  Some cleaned and aggregated distribution transformer block datasets are also in the pipeline for CC‑BY‑4.0 licensing, with the type of licensing to be applied to the raw logs currently under discussion with the relevant network operator.

I will summarize the results when the various searches finish and report back here.

Lack of CC‑BY‑4.0 data licensing is a particular problem for the United Kingdom with its effort‑based (or sweat‑of‑the‑brow) threshold for copyright and the presence of 96/9/EC database protection.  The opposite applies to the United States, where the US Copyright Office stated in 2017 that computer databases are very unlikely to meet the copyright threshold for creativity and where sui generis database protection was never enacted.  European Union countries fall somewhere in between, but also have conflicting rules governing their public sector information.  In contrast, work produced by US federal employees is public domain by default, at least domestically.

If a dataset is legally encumbered and not open licensed, then you, as someone who has downloaded it from a website or portal or otherwise obtained a copy, have no right to re‑publish it in original, modified, or combined form.  Some categories of scientific and other usage may be excepted, but only within limits that do not extend to making material available for anonymous download, which therefore rules out deposits on Zenodo to support open science.  Determining whether a particular dataset is legally encumbered is normally a catch‑22 due to the way copyright legislation is worded.

with best wishes, Robbie
