Lineage 2 C6


Liora Putcha


Aug 4, 2024, 9:02:32 PM

to lugibneonat

I have created a new project and everything is working as expected. When I generate the docs and serve them, the lineage looks as I would expect on the documentation page; however, there is no lineage in the editor. It looks blank, like this:

[Screenshot: blank lineage panel in the editor]

I also have a similar problem: when I run some of the SQL, I cannot preview the results. Nothing appears in the Query Results screen. The query itself runs on the database, and dbt run also executes it and generates the table.

Just a heads-up for others who encounter this issue. I shut down VS Code and created a new project, then opened the original project and voilà, it works again: both queries and lineage work. I am not sure of the exact order, but try closing the folder, shutting down VS Code, reopening VS Code, and then opening the folder with your project.

I am looking to get a sense of the data lineage for all of our datasets/dataflows in an attempt to consolidate and properly organize them. Is there a way to get this in list form so I do not have to use the data lineage function in each dataflow individually?

There is a recently released connector called Domo Governance Datasets that will allow you to generate a Dataflow Details report. This creates a dataset listing each of your dataflows, its input datasets, and its output datasets. You can then use this data to create cards or tables to view these relationships instead of using the visual lineage tool.
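As a rough illustration of turning such a report into the list form asked for above, here is a minimal Python sketch. It assumes the exported data has one row per dataflow/input/output combination; the column names (dataflow, input, output) and the sample values are hypothetical and not the connector's actual schema.

```python
from collections import defaultdict

# Hypothetical rows: one (dataflow, input dataset, output dataset) triple per row.
# Column names and values are illustrative, not the connector's actual schema.
rows = [
    {"dataflow": "Sales ETL", "input": "raw_orders", "output": "clean_orders"},
    {"dataflow": "Sales ETL", "input": "raw_customers", "output": "clean_orders"},
    {"dataflow": "Regional Rollup", "input": "clean_orders", "output": "orders_by_region"},
]


def lineage_list(rows):
    """Group inputs and outputs per dataflow into a flat, sorted list."""
    inputs = defaultdict(set)
    outputs = defaultdict(set)
    for row in rows:
        inputs[row["dataflow"]].add(row["input"])
        outputs[row["dataflow"]].add(row["output"])
    return [
        (name, sorted(inputs[name]), sorted(outputs[name]))
        for name in sorted(inputs)
    ]


for name, ins, outs in lineage_list(rows):
    print(f"{name}: {', '.join(ins)} -> {', '.join(outs)}")
```

The same grouping could just as well be done inside the BI tool itself; the point is only that a flat input/output table is enough to reconstruct the relationships.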

I do not know if this is intended behavior of data lineage but for me it is weird.

When I create a view based on two tables, the upstream data lineage looks correct. But when I replace the view so that it uses only one of the tables, the upstream data lineage still tells me that the view is based on both tables.

When I read the documentation, it looks like data lineage is based on history for the view for the last 30 days, but a view does not work with history the same way as a table, so to me this is weird.

Can anyone give some more details on this strange behavior?

This is my script for testing this:

When I checked the view definition under Details after altering the view to depend only on table_1, I found that the definition had been updated to the latest version, but the change is not reflected in the lineage graph.
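One possible explanation, offered only as an assumption based on the 30-day history mentioned above, is that the lineage graph is the union of all dependencies observed in the window rather than a reading of the current view definition. A minimal Python sketch of that model (my_view, table_2, and the timestamps are hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical log of observed dependencies: (timestamp, view, upstream table).
now = datetime(2024, 8, 4)
observed = [
    (now - timedelta(days=10), "my_view", "table_1"),
    (now - timedelta(days=10), "my_view", "table_2"),  # before the view was replaced
    (now - timedelta(days=1), "my_view", "table_1"),   # after the view was replaced
]


def upstream_in_window(observed, view, now, days=30):
    """Union of upstream tables seen for `view` within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return sorted({t for ts, v, t in observed if v == view and ts >= cutoff})


# The 30-day window still covers the old definition, so both tables appear.
print(upstream_in_window(observed, "my_view", now))
# A shorter window only sees the current definition.
print(upstream_in_window(observed, "my_view", now, days=5))
```

If the platform works this way, the old table would drop out of the graph once its last recorded use falls outside the window, but that is a guess and not confirmed behavior.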

For example, imagine you have an online store where you record every purchase in a single SQL table. To make it easier for your analysts to work with the data, you start running jobs that extract information from this single table and produce smaller tables by region, brand, or sale price. Your analysts then start doing the same: they perform further transformations, merging these smaller tables with other data sources to produce even more tables.

Dataplex works with the Data Lineage API to identify entries whose fully qualified name matches entities recognized by data lineage. For matched Dataplex entries, you can access the Lineage tab on their details page and view the graph.

In its basic form, lineage is a record of data being transformed from sources to targets. Data Lineage API collects that information and organizes it into a hierarchical data model using the concepts of processes, runs, and events.

A run is an execution of a process. Processes can have multiple runs. Runs contain details such as start and end times, state, or additional attributes. For more information, see the run resource reference.

Events contain a list of links that define which entry was the source and which was the target in a particular event. While events are used to compute lineage visualization graphs, they are not directly exposed on the Google Cloud console. You can create, read, and delete (but not update) them using Data Lineage API.

Each execution of that SQL statement would constitute an individual run. Runs contain events, which record which tables were used as the sources and which as the targets. In this example, the tables customer_year and customers are both sources for the target top_customer table.
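The process, run, and event hierarchy described above can be sketched with plain Python dataclasses. The class names (Process, Run, Event, Link), the process name, and the state value are simplified stand-ins chosen for illustration; they are not the Data Lineage API's client types or enum values.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    source: str  # entry the data was read from
    target: str  # entry the data was written to


@dataclass
class Event:
    links: list


@dataclass
class Run:
    state: str
    events: list = field(default_factory=list)


@dataclass
class Process:
    name: str
    runs: list = field(default_factory=list)


def edges(process):
    """Collect the distinct (source, target) pairs across all runs of a process."""
    return sorted(
        {(link.source, link.target)
         for run in process.runs
         for event in run.events
         for link in event.links}
    )


def one_run():
    # One execution of the statement: two sources feeding one target.
    return Run(state="COMPLETED", events=[Event(links=[
        Link("customer_year", "top_customer"),
        Link("customers", "top_customer"),
    ])])


process = Process(name="build_top_customer", runs=[one_run(), one_run()])
print(edges(process))
# [('customer_year', 'top_customer'), ('customers', 'top_customer')]
```

Two runs of the same statement yield the same two edges, which is why a graph built from events shows each source-to-target link once no matter how often the job ran.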

When you enable Data Lineage API, Google Cloud systems that support data lineage start reporting their data movement. Each integrated system can submit lineage information for a different range of data sources. See the following sections for more details on every supported product.

BigQuery copy, query, and load jobs are represented as processes (click the looking-glass icon on the lineage visualization graph to see process details). Each process contains the BigQuery job_id in the attributes list for the most recent BigQuery job.

Dataplex can create visualization graphs for manually recorded lineage if you use fullyQualifiedName values that match the fully qualified names of existing Data Catalog entries. If you want to record lineage for a custom data source, first create a custom Data Catalog entry.

Each process for a custom data source may contain a sql key in the attributes list. The value of that key is used to render highlighted code in the details panel of the data lineage graph. The SQL statement is displayed exactly as it was provided, so the user is responsible for filtering out sensitive information. The key name sql is case-sensitive.
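As a small sketch of what filtering out sensitive information might look like before the statement is stored, the Python helper below masks quoted string literals and places the result under the lowercase sql key. The helper name build_attributes and the masking rule are illustrative assumptions; masking literals is only one possible approach and will not catch every kind of sensitive content.

```python
import re

# Matches a single-quoted SQL string literal, including doubled '' escapes.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")


def build_attributes(statement):
    """Return an attributes dict with string literals masked (hypothetical helper)."""
    redacted = _STRING_LITERAL.sub("'***'", statement)
    # The key must be exactly "sql"; "SQL" or "Sql" would not be picked up.
    return {"sql": redacted}


attrs = build_attributes("SELECT * FROM customers WHERE email = 'a@b.com'")
print(attrs["sql"])
# SELECT * FROM customers WHERE email = '***'
```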

If you're already using OpenLineage to collect lineage information from other data sources, you can import OpenLineage events into Dataplex and display these events in the Google Cloud console. For details, see Integrate with OpenLineage.


Knowing where your data comes from is key to trusting the data, and knowing who else uses it means you can analyze the impact of changes to data in your environment. The lineage feature in Tableau Catalog helps you do both these things.

When you have a Data Management license and Tableau Catalog enabled, you have access to lineage information for your content. For more information about Tableau Catalog, see "About Tableau Catalog" in the Tableau Server or Tableau Cloud Help.

Lineage shows dependencies in relationship to the lineage anchor, which is the asset selected. A lineage anchor can be a database, table, workbook, published data source, virtual connection, virtual connection table, Pulse metric definition, or flow. (In the image above, the anchor is the "Orders (superstore)" data source, and in the image below, the anchor is the "Batters" table.) All the assets below the anchor depend, either directly or indirectly, on the anchor and are called outputs or downstream assets. The assets above the anchor are the assets the anchor is either directly or indirectly dependent on and are called inputs or upstream assets.
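The upstream/downstream split around an anchor amounts to a graph traversal in two directions. The Python sketch below shows the idea on a toy dependency graph; it is not how Tableau Catalog is implemented, and apart from the "Batters" table every asset name is hypothetical.

```python
from collections import deque

# Hypothetical edges, each pointing from an input to the asset that depends on it.
edges = [
    ("Baseball DB", "Batters"),
    ("Batters", "Batting data source"),
    ("Batting data source", "Season workbook"),
]


def reachable(anchor, edges, reverse=False):
    """Assets reachable from the anchor, following edges forward or backward."""
    graph = {}
    for src, dst in edges:
        if reverse:
            src, dst = dst, src
        graph.setdefault(src, []).append(dst)
    seen, queue = set(), deque([anchor])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(anchor)  # the anchor is neither upstream nor downstream of itself
    return seen


downstream = reachable("Batters", edges)               # outputs, direct and indirect
upstream = reachable("Batters", edges, reverse=True)   # inputs, direct and indirect
print(sorted(upstream), sorted(downstream))
```

Changing the anchor only changes the starting node; the edges stay the same, which is why the same asset can appear as an input on one lineage page and as an output on another.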

Starting in Tableau Cloud June 2023 and Tableau Server 2023.3, lineage pages for data sources include search and filtering (in the top-right of the fields list) that allow you to quickly find fields of interest or relevance.

When you select a field in a data source or a column in a table, the lineage is filtered to show only downstream assets that depend on the field (or column) or upstream inputs to the field (or column), as in this 'Batters' table example that shows the lineage filtered for the 'Games' column:

You can select an upstream or downstream asset in the Lineage pane to see its details. For example, when you select Data Sources, the list of data sources that depend on this table appears to the left of the Lineage pane.

Cube data sources (also known as multidimensional or OLAP data sources) are not supported by Tableau Catalog. Tableau content (such as a data source, view, or workbook) that relies on cube data does not display any cube metadata or cube lineage in Catalog.

When metadata is blocked because of limited permissions, or the asset is in a Personal Space, Catalog still counts the workbook. But instead of seeing some of the sensitive metadata, you see Permissions required. For more information, see Access lineage information.

I'm new to working at SAS, but not working with SAS. I recently worked with a customer who had a need centered on the lineage of SAS Visual Analytics (SAS VA) reports. I found this great SAS Blog post Discover Visual Analytics Report Paths with REST APIs from my colleague @cindywang. Cindy's article was helpful in learning about making API calls for reports and folders. I used those calls, incorporated them into my SAS code, and extended the functionality to filter columns as needed. The purpose of this article is to outline what I did and provide access to the code I created and used.

You may ask yourself: what exactly do we use lineage for? Consider the relationship between data and its usage to better understand how reports are affected by changes to the underlying source data. Also, most companies like to have insight into when, how, and where their data is used and, even more important, which data isn't used.

Therefore, we will build a complete script here to gather this information from SAS VA. First, one needs to obtain and understand the information that explains dependencies. While column-level lineage is being researched and developed within SAS, a rollout would probably need to wait until the end of the year. As an alternative, one can use the SAS RA. Here I will explain how to create the following report:

Using the REST API to collect report information and content has been the subject of many articles, such as those on how to access information via SAS VA. For instance, use the code below to retrieve the content of a report: