DatasetFacets vs InputFacets and OutputFacets


Julien Le Dem

Feb 17, 2021, 7:41:44 PM
to OpenLineage
Following up on a previous discussion:
This proposal and the accompanying PR add the notion of InputFacets and OutputFacets: https://github.com/OpenLineage/OpenLineage/issues/20
In summary, we are collecting metadata about jobs and datasets.
At the job level, metadata that is fairly static (it does not change every run, like the current code version of the job) goes in a JobFacet. Metadata that is dynamic and changes every run (like the scheduled time of the run) goes in a RunFacet.
This proposal adds the same notion at the dataset level: metadata that is static and doesn’t change every run (like the dataset schema) goes in a DatasetFacet. Metadata that is dynamic and changes every run (like the input time interval of the dataset being read, or the statistics of the dataset being written) goes in an InputFacet or an OutputFacet.
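To make the four buckets concrete, here is a rough sketch of where each kind of metadata could sit in a single run event. The key names (`facets`, `inputFacets`, `outputFacets`) follow the naming used in this thread, but the individual facet names, fields, and values below are made up for illustration and are not taken from the spec or the PR.

```python
import json

# Illustrative event layout. Every facet name and value here is an assumption.
event = {
    "job": {
        "name": "daily_orders_rollup",
        # JobFacet: static, same for every run until the job definition changes.
        "facets": {"sourceCodeVersion": {"gitSha": "abc123"}},
    },
    "run": {
        "runId": "run-0001",
        # RunFacet: dynamic, different for every run.
        "facets": {"schedule": {"scheduledTime": "2021-02-17T00:00:00Z"}},
    },
    "inputs": [
        {
            "name": "orders",
            # DatasetFacet: static, describes the dataset itself.
            "facets": {"schema": {"fields": [{"name": "id", "type": "BIGINT"}]}},
            # InputFacet: dynamic, describes what this run read.
            "inputFacets": {
                "timeInterval": {
                    "start": "2021-02-16T00:00:00Z",
                    "end": "2021-02-17T00:00:00Z",
                }
            },
        }
    ],
    "outputs": [
        {
            "name": "orders_daily",
            # DatasetFacet: static, describes the dataset itself.
            "facets": {"schema": {"fields": [{"name": "day", "type": "DATE"}]}},
            # OutputFacet: dynamic, describes what this run wrote.
            "outputFacets": {"statistics": {"rowCount": 1500}},
        }
    ],
}

# The event is plain data, so it serializes to JSON directly.
payload = json.dumps(event, indent=2)
```

The point of the layout is only that the static and dynamic parts of a dataset live under separate keys, the same way job and run metadata already do.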
This separation enables Job and Dataset versioning logic: it keeps track of changes to the definition of something versus changes that only happen at runtime.
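One way such versioning logic could work, as a sketch only: derive a version from the static facets alone, so that per-run metadata never produces a new version. The `static_version` helper and the hashing scheme are hypothetical and are not something the proposal or the PR specifies.

```python
import hashlib
import json


def static_version(dataset: dict) -> str:
    """Hypothetical helper: hash only the static facets of a dataset entry."""
    # Canonical JSON (sorted keys, no extra whitespace) keeps the hash stable
    # regardless of the order in which the facets were written.
    canonical = json.dumps(dataset["facets"], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


schema_v1 = {"schema": {"fields": [{"name": "id", "type": "BIGINT"}]}}
schema_v2 = {
    "schema": {
        "fields": [
            {"name": "id", "type": "BIGINT"},
            {"name": "total", "type": "DOUBLE"},
        ]
    }
}

# Two runs write the same dataset definition with different runtime statistics.
run_a = {"facets": schema_v1, "outputFacets": {"statistics": {"rowCount": 1500}}}
run_b = {"facets": schema_v1, "outputFacets": {"statistics": {"rowCount": 1720}}}
# A third run writes after the schema gained a column.
run_c = {"facets": schema_v2, "outputFacets": {"statistics": {"rowCount": 1720}}}

# Runtime-only changes keep the version; a definition change produces a new one.
same_definition = static_version(run_a) == static_version(run_b)
new_definition = static_version(run_a) != static_version(run_c)
```

The same idea would apply to jobs, hashing the JobFacets and ignoring the RunFacets.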

Julien Le Dem

Feb 19, 2021, 2:37:25 PM
to OpenLineage