📣 Community Update
Workflow Orchestration for Biocomputing ~ adopting Flyte @ LatchBio [Kenny Workman - LatchBio]
⚡Life of a Workflow⚡[Katrina Rogan - Union.ai]
Notes (Sandra):
Community update: Slides / Recording
Ketan starts with an update on Engineering Labs, Flyte’s hackathon with the MLOps community. Teams are set, projects are underway, and winners will be acknowledged on April 5th. Save the date for a meetup with MLOps on April 20th!
Ketan then gives a reminder about office hours, a sneak peek at upcoming talks, and a quick summary of the upcoming v0.19.3 release, namely UI, Flytekit, and Flyteadmin improvements.
LatchBio: Recording
Kenny Workman introduces LatchBio, a 10-person company founded out of Berkeley that builds software and data infrastructure for companies in the biotech industry. He explains what they do at LatchBio, how it touches Flyte, and how workflow orchestration is motivated by needs in biology.
Kenny gives an overview of pricing and accessibility trends across classes of computing over time, the performance of file systems, and similarly the cost of “reading” and “writing” DNA, all of which are getting cheaper.
The foundations for LatchBio are in synthetic biology, which reduces biology to constituent building blocks and applies engineering principles to them. In the BioBricks project, for example, genetic components are treated like Legos: taken apart, then later rebuilt. The logical genetic components are then translated into things that exist in organisms.
Kenny talks a bit about the rise of synthetic biology, metabolic engineering, and genetic circuitry. This shift in how biology is thought about is where Latch sits right now.
He then explains more about:
Tools: ways to measure cell properties, create libraries of genetic designs
LatchBio’s Market: Cell & Gene Therapy companies
Biological data is big, heterogeneous, difficult to interpret, and can take days to process.
LatchBio’s platform enables this computation at scale on heterogeneous biological data for these companies. It relies on Flyte’s workflow execution engine for highly scalable, Kubernetes-native, type-safe workflows in order to generate no-code interfaces dynamically from Flytekit for biologists to use directly, expose bioinformatics tools to end users, and provide a rich in-browser suite of visualization and file manipulation tools.
On top of the platform, biologists are given their own toolkits to dynamically generate Latch interfaces. Components include:
A managed data store: when users upload a DNA file to the platform, a Flyte workflow runs automatically to parse it, run quality control, and generate visualizations on top of that file. Users can simply double-click to see the visualizations.
Users’ own file system, with support for their own network protocol and their own absolute path system to pass object files to workflows directly
Compiled type-safe UIs: HTML-native validation for rich client-side type checking
Serverless fine-grained scheduling (which is why Flyte and Kubernetes are so important): genetic assembly operations need per-task control over how tasks are scheduled.
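To make the "compiled type-safe UIs" idea concrete, here is a minimal sketch of deriving form fields from a typed Python function signature. It is not LatchBio's implementation: the `HTML_INPUT_TYPES` mapping, the `form_fields` helper, and the `assemble` signature are all hypothetical names chosen for illustration, and only the standard library is used.

```python
import inspect
from typing import Optional, Union, get_args, get_origin, get_type_hints

# Hypothetical mapping from Python types to HTML input types.
HTML_INPUT_TYPES = {int: "number", float: "number", str: "text", bool: "checkbox"}


def form_fields(fn):
    """Derive one form-field description per parameter of a typed function."""
    hints = get_type_hints(fn)
    fields = []
    for name, param in inspect.signature(fn).parameters.items():
        py_type = hints[name]
        # A parameter with a default value does not have to be filled in.
        required = param.default is inspect.Parameter.empty
        # Optional[X] is Union[X, None]: unwrap it and mark the field optional.
        if get_origin(py_type) is Union and type(None) in get_args(py_type):
            py_type = next(a for a in get_args(py_type) if a is not type(None))
            required = False
        fields.append({
            "name": name,
            "input_type": HTML_INPUT_TYPES.get(py_type, "text"),
            "required": required,
        })
    return fields


def assemble(sequence: str, min_quality: int = 20, label: Optional[str] = None) -> str:
    """Hypothetical task signature used only to exercise form_fields."""
    return sequence


fields = form_fields(assemble)
```

A front end could turn each entry in `fields` into an input element and let the browser enforce the type and required constraints before anything reaches the workflow engine.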
Kenny then shows a rough architecture of the system and how Flyte fits in. Reasons for choosing Flyte were mainly:
Flyte being K8s native
Language-independent type safety
Independently deployed tasks
Flyte’s open-source nature and well-maintained codebase
Kenny also touches on parameter metadata modifications and user-defined construction of biological types, motivating docstring metadata, and gives an example of internal modifications.
Latch’s contributions to Flyte include the following:
Typing metadata - https://github.com/flyteorg/flytekit/pull/759
Docstring metadata - https://github.com/flyteorg/flyte/pull/1856
Union Types - https://github.com/flyteorg/flyte/pull/1926
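The three kinds of information named above (typing metadata, docstring metadata, and union types) can all be read off an ordinary Python signature. The sketch below uses only the standard library and does not reproduce the linked pull requests; `ParamMeta`, `trim_reads`, and `describe` are hypothetical names introduced for illustration.

```python
import inspect
from dataclasses import dataclass
from typing import Annotated, Union, get_args, get_type_hints


@dataclass(frozen=True)
class ParamMeta:
    """Hypothetical metadata record attached to a parameter's type."""
    display_name: str
    file_format: str


def trim_reads(
    reads: Annotated[str, ParamMeta("Reads file", "fastq")],
    adapter: Union[str, int] = "auto",
) -> str:
    """Trim adapter sequences from reads.

    Hypothetical task, used only to show where metadata can live.
    """
    return reads


def describe(fn):
    """Collect the docstring summary, per-parameter metadata, and accepted types."""
    hints = get_type_hints(fn, include_extras=True)
    hints.pop("return", None)
    params = {}
    for name, hint in hints.items():
        # Annotated hints carry their extras in __metadata__.
        metadata = list(getattr(hint, "__metadata__", ()))
        base = get_args(hint)[0] if metadata else hint
        # get_args returns the union members, or () for a plain type.
        accepted = get_args(base) or (base,)
        params[name] = {"accepts": [t.__name__ for t in accepted], "metadata": metadata}
    summary = inspect.getdoc(fn).splitlines()[0]
    return {"summary": summary, "params": params}


info = describe(trim_reads)
```

Here `info["params"]["adapter"]["accepts"]` lists both members of the union, and the `reads` entry carries the `ParamMeta` record that a UI generator could use for labels and file filters.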
Kenny then confirms LatchBio’s confidence in the Flyte model as the de facto workflow orchestration engine, with respect to architecture, deployment of workflows, and scheduling at per-task granularity.
He wraps up with LatchBio’s future plans with Flyte: developing server-side containerization to improve the local Docker construction experience, and releasing specialized container build services as open source soon.
A discussion follows about private/protected data: how the current architecture allows files to be moved without being accessed, and how LatchBio stores the type interface rather than metadata.
⚡Life of a Workflow⚡: Slides / Recording
Katrina dives into what happens to a Flyte workflow once users finish writing it and want to execute it. The typical lifecycle is writing/authoring, registering, and executing a workflow.
Detailed steps include:
Writing/testing Python code
Registering - which is done by a compilation pass (for validity), versioning (to produce an artifact that we can share in our database), and visualization (for a concrete executable definition of the workflow)
Serializing to protobuf, to build container images
Compiling workflow protobuf representations, running static analysis and validation, to produce executable artifacts to be stored in the database
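The registration steps above (compile for validity, serialize, version, store) can be sketched in a few lines. This is a toy model under stated assumptions, not Flyte's code: JSON stands in for protobuf, a plain dict stands in for the database, the version is derived from a content hash purely for illustration, and the workflow layout and function names are hypothetical.

```python
import hashlib
import json

# Hypothetical workflow description. Nodes are listed in execution order,
# so every upstream node appears before the node that consumes its output.
workflow = {
    "name": "qc_pipeline",
    "nodes": [
        {"id": "parse", "upstream": None, "in_type": "file", "out_type": "records"},
        {"id": "qc", "upstream": "parse", "in_type": "records", "out_type": "report"},
    ],
}


def compile_workflow(wf):
    """Validate node wiring and types; raise ValueError on the first problem."""
    out_types = {}
    for node in wf["nodes"]:
        upstream = node["upstream"]
        if upstream is not None:
            if upstream not in out_types:
                raise ValueError(f"{node['id']}: unknown upstream node {upstream!r}")
            if out_types[upstream] != node["in_type"]:
                raise ValueError(
                    f"{node['id']}: expects {node['in_type']}, got {out_types[upstream]}"
                )
        out_types[node["id"]] = node["out_type"]
    return wf


def serialize(wf):
    """Produce a deterministic byte representation (JSON stands in for protobuf)."""
    return json.dumps(wf, sort_keys=True, separators=(",", ":")).encode("utf-8")


def register(wf, database):
    """Compile, serialize, and store the artifact under a content-derived version."""
    payload = serialize(compile_workflow(wf))
    version = hashlib.sha256(payload).hexdigest()[:12]
    database[(wf["name"], version)] = payload
    return version


database = {}
version = register(workflow, database)
```

Validating before storing means a workflow with mismatched types or a dangling reference is rejected at registration time rather than partway through an execution.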
Katrina then demos Flytekit serialize, and an example of a versioned workflow saved in the database. She then demos several examples of executing a Flyte workflow using a Kubernetes extension (Kubernetes CRD), namely:
Python example using FlytePropeller and a Kubernetes pod
SQL query example (remote service example: external service call)
Dynamic workflow: dynamic node
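What sets the dynamic case apart is that part of the graph is only known at run time. The sketch below is a toy executor written for these notes, not FlytePropeller's logic: the `run` function, the `("task", fn)` / `("dynamic", fn)` tuples, and `fan_out` are hypothetical.

```python
def run(nodes, inputs):
    """Execute nodes in order; a dynamic node returns more nodes to run."""
    queue = list(nodes)
    results = []
    while queue:
        kind, fn = queue.pop(0)
        if kind == "dynamic":
            # The sub-graph is only known now, once the inputs are available.
            queue = fn(inputs) + queue
        else:
            results.append(fn(inputs))
    return results


def fan_out(inputs):
    """Hypothetical dynamic node: emit one task per sample in the inputs."""
    # The default argument binds each sample at definition time.
    return [
        ("task", lambda _inputs, s=sample: f"processed {s}")
        for sample in inputs["samples"]
    ]


results = run(
    [("task", lambda _inputs: "start"), ("dynamic", fan_out)],
    {"samples": ["a", "b"]},
)
```

The number of tasks produced by `fan_out` depends on the length of `inputs["samples"]`, which is why this part of the graph cannot be fixed when the workflow is registered.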
Resources:
Understand data flow between tasks
Overview of the workflow state machine
A discussion about backend plugins follows.