Streaming and OpenLineage


Ross Black

May 3, 2021, 2:01:54 AM
to OpenLineage

Hi,

I am only just starting to look at various metadata tools and standards for data governance, lineage, etc.  (so please forgive my lack of understanding).

A significant amount of our data and processing is currently in streaming platforms (Kafka streaming, Apache Flink, Spark, etc.).

Since the Core Model of OpenLineage includes "Run" and "Job", which are more batch-related concepts, I am confused as to how OpenLineage might be applied to streaming data systems.
Is OpenLineage only meant for batch data systems?
Are there ideas/plans on how it would apply to streaming?

Thanks,
Ross


Julien Le Dem

May 4, 2021, 11:32:04 PM
to Ross Black, OpenLineage
Yes, it is planned to cover streaming as well.
Even if we think of a streaming job as running continuously it still has a lifecycle.
A streaming job will still have runs as it gets stopped, upgraded and started again.
The job version will track whether the code has been updated.
A dataset could be a Kafka topic. In that sense you might want to capture metadata like the offsets at which you started or stopped the job.
Facets are meant to allow capturing metadata specific to certain types of jobs or datasets.

The difference is that a batch job usually consumes and produces a predefined amount of data, whereas a streaming job does not.
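
As a rough illustration of the idea above (not an official integration), a streaming job's start and stop could be reported as two events that share one run ID, with the Kafka topic as the input dataset and the offsets attached as a facet. This is only a sketch built from plain dictionaries: the top-level field names follow the OpenLineage run event layout, but the facet name `example_kafkaOffsets`, its contents, the namespaces, the job and topic names, and the producer URL are all made up for the example, and a real custom facet would also need the metadata fields the spec requires.

```python
import json
import uuid
from datetime import datetime, timezone


def make_run_event(event_type, run_id, job_name, topic, offsets):
    """Build a lineage event (a plain dict) for one run of a streaming job."""
    return {
        # "START" when the job is started, "COMPLETE" when it is stopped.
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # The same runId ties the start and stop of one run together.
        "run": {"runId": run_id},
        # Hypothetical namespace; the job name is supplied by the caller.
        "job": {"namespace": "example-streaming", "name": job_name},
        "inputs": [
            {
                # Hypothetical broker address used as the dataset namespace.
                "namespace": "kafka://broker.example:9092",
                "name": topic,
                "facets": {
                    # Hypothetical custom facet: offset per partition
                    # at the moment the event was emitted.
                    "example_kafkaOffsets": {"offsets": offsets},
                },
            }
        ],
        "outputs": [],
        # Hypothetical identifier of the code that produced the event.
        "producer": "https://example.com/hypothetical-producer",
    }


if __name__ == "__main__":
    run_id = str(uuid.uuid4())
    started = make_run_event(
        "START", run_id, "clickstream-enrichment", "clicks", {"0": 1200, "1": 980}
    )
    stopped = make_run_event(
        "COMPLETE", run_id, "clickstream-enrichment", "clicks", {"0": 5400, "1": 4310}
    )
    print(json.dumps(started, indent=2))
    print(json.dumps(stopped, indent=2))
```

Restarting the job after an upgrade would then produce a new run ID under the same job name, which matches the lifecycle described above.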



Yaniv Ben-Hamo

Jun 19, 2022, 3:08:00 PM
to OpenLineage
Hey,
Love OpenLineage.
Same question here. At the moment we have built our own mechanism on top of our message broker, but we would love to integrate OpenLineage in the future instead.
