Presenters: Rich Wolski and Kerem Celik, UCSB
Increasingly, communities focused on climate change are adopting data
contribution models designed to facilitate shared, collaborative research.
Unlike "open data," where data is simply published online with a permissive
license, community science is turning towards a contribution model similar
to the open repository model for code implemented by GitHub or Bitbucket.
To facilitate community data contribution, curation, sharing, and access
control, we discuss Depot: Dependency-Eager Platform of Transformations -- an
open source "data lake" for developing and sharing community-contributed
data. Depot is an experimental study of data lake design
that supports full data provenance and versioning as well as data management and
owner-defined access control policies. In particular, it includes policy
mechanisms for data retention and data quotas that span access groups to
facilitate efficient cloud-based implementation. It also uses dependency tracking
and lazy "data materialization" to optimize storage footprint with the goal of
enabling community sustainability as a long-running service.
We describe the Depot abstractions and provide a short demonstration of
the current Depot prototype service using EIA and other publicly available
data sets.
Zoom details:
https://psu.zoom.us/j/94926867289
Password: MMAtalk
Best regards,
Adam Brazier (for SCiMMA)