Re: Lineage Os


Avery Blaschko

Jul 17, 2024, 9:07:02 AM
to qualphiwunpe

Knowing where your data comes from is key to trusting the data, and knowing who else uses it means you can analyze the impact of changes to data in your environment. The lineage feature in Tableau Catalog helps you do both these things.




When you have a Data Management license and Tableau Catalog enabled, you have access to lineage information for your content. For more information about Tableau Catalog, see "About Tableau Catalog" in the Tableau Server or Tableau Cloud Help.

Lineage shows dependencies in relation to the lineage anchor, which is the selected asset. A lineage anchor can be a database, table, workbook, published data source, virtual connection, virtual connection table, Pulse metric definition, or flow. (In the image above, the anchor is the "Orders (superstore)" data source, and in the image below, the anchor is the "Batters" table.) All the assets below the anchor depend, either directly or indirectly, on the anchor and are called outputs or downstream assets. The assets above the anchor are the ones the anchor depends on, either directly or indirectly, and are called inputs or upstream assets.
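The upstream/downstream split around an anchor can be pictured as a walk over a dependency graph. The following is only a minimal sketch of that idea, not how Tableau Catalog works internally; apart from "Batters", the asset names and the `EDGES` list are made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges (input -> output); names are illustrative only.
EDGES = [
    ("Baseball database", "Batters"),
    ("Batters", "Batting data source"),
    ("Batting data source", "Season workbook"),
]

def build_adjacency(edges):
    """Index the edges in both directions so either side can be walked."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream

def reachable(anchor, adjacency):
    """Return every asset reachable from the anchor, directly or indirectly."""
    seen = set()
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

upstream, downstream = build_adjacency(EDGES)
inputs = reachable("Batters", upstream)      # assets the anchor depends on
outputs = reachable("Batters", downstream)   # assets that depend on the anchor
```

Changing the anchor changes which assets count as inputs and which as outputs, while the underlying edges stay the same.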

Starting in Tableau Cloud June 2023 and Tableau Server 2023.3, lineage pages for data sources include search and filtering (in the top-right of the fields list) that allow you to quickly find fields of interest or relevance.

When you select a field in a data source or a column in a table, the lineage is filtered to show only downstream assets that depend on the field (or column) or upstream inputs to the field (or column), as in this 'Batters' table example that shows the lineage filtered for the 'Games' column:

You can select an upstream or downstream asset in the Lineage pane to see its details. For example, when you select Data Sources, the list of data sources that depend on this table appears to the left of the Lineage pane.

Cube data sources (also known as multidimensional or OLAP data sources) are not supported by Tableau Catalog. Tableau content (such as a data source, view, or workbook) that relies on cube data does not display any cube metadata or cube lineage in Catalog.

When metadata is blocked because of limited permissions, or the asset is in a Personal Space, Catalog still counts the workbook. But instead of seeing some of the sensitive metadata, you see Permissions required. For more information, see Access lineage information.

You can use Unity Catalog to capture runtime data lineage across queries run on Databricks. Lineage is supported for all languages and is captured down to the column level. Lineage data includes notebooks, workflows, and dashboards related to the query. Lineage can be visualized in Catalog Explorer in near real time and retrieved programmatically using the lineage system tables and the Databricks REST API.

Lineage is aggregated across all workspaces attached to a Unity Catalog metastore. This means that lineage captured in one workspace is visible in any other workspace sharing that metastore. Users must have the correct permissions to view the lineage data. Lineage data is retained for 1 year.

You might need to update your outbound firewall rules to allow for connectivity to the Amazon Kinesis endpoint in the Databricks control plane. Typically this applies if your Databricks workspace is deployed in your own VPC or you use AWS PrivateLink within your Databricks network environment. To get the Kinesis endpoint for your workspace region, see Kinesis addresses. See also Configure a customer-managed VPC and Enable private connectivity using AWS PrivateLink.

Because lineage is computed on a one-year rolling window, lineage data collected more than one year ago is not displayed. For example, if a job or query reads data from table A and writes to table B, the link between table A and table B is displayed for one year only. You can filter lineage data by time frame within the one-year window.
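The rolling-window behavior can be sketched as a simple timestamp filter. This is an illustrative assumption about the logic, not Databricks code: the `links` records, the `visible_links` helper, and treating "one year" as 365 days are all hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: "one year" is approximated as 365 days for this sketch.
RETENTION = timedelta(days=365)

def visible_links(links, now, since=None):
    """Keep links inside the one-year window, optionally narrowed by `since`."""
    cutoff = now - RETENTION
    if since is not None:
        # A user-chosen time frame can only narrow the window, never widen it.
        cutoff = max(cutoff, since)
    return [link for link in links if link["event_time"] >= cutoff]

# Hypothetical lineage records: a job read table A and wrote table B, and so on.
links = [
    {"source": "A", "target": "B", "event_time": datetime(2023, 1, 1)},
    {"source": "A", "target": "C", "event_time": datetime(2024, 6, 1)},
]
now = datetime(2024, 7, 17)
recent = visible_links(links, now)
```

Here the A to B link is older than the window and drops out, while the A to C link stays visible.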

Workflows that use the Jobs API runs submit request are unavailable when viewing lineage. Table and column level lineage is still captured when using the runs submit request, but the link to the run is not captured.

Column lineage is supported only when both the source and the target are referenced by table name (Example: select * from ..). Column lineage cannot be captured if the source or the target is addressed by path (Example: select * from delta."s3:///").

Unity Catalog captures lineage from Delta Live Tables pipelines in most cases. However, in some instances, complete lineage coverage cannot be guaranteed, such as when pipelines use the APPLY CHANGES API or TEMPORARY tables.

To view an interactive graph of the data lineage, click See Lineage Graph. By default, one level is displayed in the graph. You can click on the icon on a node to reveal more connections if they are available.

Click on an arrow connecting nodes in the lineage graph to open the Lineage connection panel. The Lineage connection panel shows details about the connection, including source and target tables, notebooks, and workflows.

To show the notebook associated with the dinner table, select the notebook in the Lineage connection panel or close the lineage graph and click Notebooks. To open the notebook in a new tab, click on the notebook name.

Lineage graphs share the same permission model as Unity Catalog. If a user does not have the BROWSE or SELECT privilege on a table, they cannot explore the lineage. Additionally, users can only see notebooks, workflows, and dashboards that they have permission to view. For example, if you run the following commands for a non-admin user userA:

When userA views the lineage graph for the lineage_data.lineagedemo.menu table, they will see the menu table. They will not be able to see information about associated tables, such as the downstream lineage_data.lineagedemo.dinner table. The dinner table is displayed as a masked node in the display to userA, and userA cannot expand the graph to reveal downstream tables from tables they do not have permission to access.
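The masking behavior described here can be modeled as a graph walk that stops at tables the user cannot read. This is a toy model under stated assumptions, not Unity Catalog's implementation: the `lineage_view` helper, the `MASKED` placeholder, and the `dinner_price` table are hypothetical.

```python
MASKED = "<masked>"

def lineage_view(root, downstream, readable):
    """Walk downstream from root, masking and not expanding unreadable tables."""
    if root not in readable:
        return []
    shown = []
    stack = [root]
    visited = {root}
    while stack:
        table = stack.pop()
        for child in downstream.get(table, []):
            if child in readable:
                shown.append((table, child))
                if child not in visited:
                    visited.add(child)
                    stack.append(child)
            else:
                # The edge is shown, but the node is masked and never expanded.
                shown.append((table, MASKED))
    return shown

# menu -> dinner comes from the text; dinner_price is a made-up downstream table.
downstream = {
    "lineage_data.lineagedemo.menu": ["lineage_data.lineagedemo.dinner"],
    "lineage_data.lineagedemo.dinner": ["lineage_data.lineagedemo.dinner_price"],
}
user_a_can_read = {"lineage_data.lineagedemo.menu"}
view = lineage_view("lineage_data.lineagedemo.menu", downstream, user_a_can_read)
```

Because the walk never continues past a masked node, anything downstream of the dinner table stays hidden from userA in this model.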

For more information about managing access to securable objects in Unity Catalog, see Manage privileges in Unity Catalog. For more information about managing access to workspace objects like notebooks, workflows, and dashboards, see Access control lists.

To delete lineage data, you must delete the metastore managing the Unity Catalog objects. For more information about deleting the metastore, see Delete a metastore. Data will be deleted within 90 days.

The data lineage API allows you to retrieve table and column lineage. However, if your workspace is in a region that supports the lineage system tables, you should use system table queries instead of the REST API. System tables are a better option for programmatic retrieval of lineage data. Most regions support the lineage system tables.
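As a rough sketch of what a system table query might look like, the helper below only builds a SQL string; it does not connect to anything. The table name `system.access.table_lineage` and the columns `source_table_full_name`, `target_table_full_name`, and `event_time` are assumptions that should be checked against the system tables reference for your workspace, and `downstream_query` is a hypothetical helper.

```python
import re

# Assumption: the table and column names used here should be verified
# against your workspace's lineage system table schema.
LINEAGE_TABLE = "system.access.table_lineage"

# Accept only three-part names made of letters, digits, and underscores,
# so the value can be placed in the query text safely.
_THREE_PART_NAME = re.compile(r"^[A-Za-z0-9_]+(\.[A-Za-z0-9_]+){2}$")

def downstream_query(table_full_name, days=30):
    """Build a query listing tables written from the given source table."""
    if not _THREE_PART_NAME.match(table_full_name):
        raise ValueError("expected a catalog.schema.table name")
    if not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        "SELECT DISTINCT target_table_full_name\n"
        f"FROM {LINEAGE_TABLE}\n"
        f"WHERE source_table_full_name = '{table_full_name}'\n"
        f"  AND event_time >= current_timestamp() - INTERVAL {days} DAYS"
    )

query = downstream_query("lineage_data.lineagedemo.menu", days=7)
```

The resulting string could then be run from a notebook or SQL editor, provided the assumed names match the actual schema.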

For example, imagine you have an online store where you record every purchase in a single SQL table. To make it easier for your analysts to work with the data, you start running jobs that extract information from this single table and produce smaller tables by region, brand, or sale price. Your analysts then start doing the same: they perform further transformations, merging these smaller tables with other data sources to produce even more tables.

Dataplex works with the Data Lineage API to identify entries whose fully qualified name matches entities recognized by data lineage. For matched Dataplex entries, you can access the Lineage tab on their details page and view the graph.

In its basic form, lineage is a record of data being transformed from sources to targets. Data Lineage API collects that information and organizes it into a hierarchical data model using the concepts of processes, runs, and events.

A run is an execution of a process. Processes can have multiple runs. Runs contain details such as start and end times, state, or additional attributes. For more information, see the run resource reference.

Events contain a list of links that define which entry was the source and which was the target in a particular event. While events are used to compute lineage visualization graphs, they are not directly exposed on the Google Cloud console. You can create, read, and delete (but not update) them using Data Lineage API.

Each execution of that SQL statement would constitute an individual run. Runs contain events, which record which tables were used as sources and which as targets. In this example, the tables customer_year and customers are both sources for the target top_customer table.
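The process, run, and event hierarchy can be pictured with a few plain data classes. This is a simplified sketch, not the Data Lineage API's actual resource schema: the class and field names and the `sources_of` helper are hypothetical, and only the three table names come from the example above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Link:
    source: str
    target: str

@dataclass
class Event:
    links: List[Link]

@dataclass
class Run:
    state: str
    events: List[Event] = field(default_factory=list)

@dataclass
class Process:
    name: str
    runs: List[Run] = field(default_factory=list)

def sources_of(process, target):
    """Collect every source recorded for a target across all runs and events."""
    return sorted({
        link.source
        for run in process.runs
        for event in run.events
        for link in event.links
        if link.target == target
    })

# One process (the SQL statement), one run (one execution), one event
# whose links record both source tables for the target.
process = Process(
    name="build top_customer",
    runs=[
        Run(
            state="COMPLETED",
            events=[Event(links=[
                Link("customer_year", "top_customer"),
                Link("customers", "top_customer"),
            ])],
        )
    ],
)
top_customer_sources = sources_of(process, "top_customer")
```

Running the statement again would add another `Run` to the same `Process`, each with its own events.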

When you enable Data Lineage API, Google Cloud systems that support data lineage start reporting their data movement. Each integrated system can submit lineage information for a different range of data sources. See the following sections for more details on every supported product.

BigQuery copy, query, and load jobs are represented as processes (click the looking-glass icon on the lineage visualization graph to see process details). Each process contains the BigQuery job_id in the attributes list for the most recent BigQuery job.

Dataplex can create visualization graphs for manually recorded lineage if you use fullyQualifiedName values that match the fully qualified names of existing Data Catalog entries. If you want to record lineage for a custom data source, first create a custom Data Catalog entry.

Each process for a custom data source may contain an sql key in the attributes list. The value of this key is used to render code highlighting in the details panel of the data lineage graph. The SQL statement is displayed as it was provided. The user is responsible for filtering out sensitive information. The key name sql is case-sensitive.
