One of the most compelling use cases for erwin Data Modeler lies in its connection to the increasingly important and evolving realm of data governance. erwin Data Modeler Workgroup Edition provides a controlled central repository of data definitions and visibility into where and how properly defined data should be used.
erwin Data Modeler (stylized as erwin, formerly ERwin) is computer software for data modeling. Originally developed by Logic Works, erwin passed through a series of corporate acquisitions before the private equity firm Parallax Capital Partners acquired it and spun it off as a separate entity, erwin, Inc., managed by CEO Adam Famularo.
In April 2016, Parallax Capital Partners, a private equity firm, acquired the software from CA Technologies[13] and appointed Adam Famularo as CEO.[14] The company now operates under a new name stylized as erwin, Inc.[15] In September 2016, erwin announced that it had acquired Corso, a British enterprise architecture service provider.[16] In December of the same year, erwin acquired the business process modeling software Casewise, with a plan to integrate the two.[17] In 2017, erwin released its Data Modeler NoSQL, an enterprise-class data modeling solution for MongoDB. In April 2018, NoSQL data modeling support for Couchbase was added.[18] Also that year, erwin launched a data governance solution with impact analysis and integrations to its business process, enterprise architecture and data modeling suites.[19][20] In January 2018, the company acquired data harvesting technology and data governance consulting services company A&P Consulting.[21]
As customers modernize their data estate with Databricks, they consolidate various data marts and EDWs into a single, scalable lakehouse architecture that supports ETL, BI and AI. One of the first steps in this journey is usually taking stock of the existing data models of the legacy systems, then rationalizing and converting them into the Bronze, Silver and Gold zones of the Databricks Lakehouse architecture. A robust data modeling tool that can visualize, design, deploy and standardize lakehouse data assets greatly simplifies the design and migration journey and accelerates data governance.
We are pleased to announce our partnership with Quest and the integration of erwin Data Modeler with the Databricks Lakehouse Platform to serve these needs. Data modelers can now model and visualize lakehouse data structures with erwin Data Modeler, building logical and physical data models to fast-track migration to Databricks. Data modelers and architects can quickly re-engineer or reconstruct databases and their underlying tables and views on Databricks. You can now access erwin Data Modeler directly from Databricks Partner Connect!
Reverse engineering a data model means creating a data model from an existing database or DDL script. The modeling tool creates a graphical representation of the selected database objects and the relationships between them; this representation can be a logical or a physical model.
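To make this concrete, here is a minimal sketch of the kind of metadata extraction reverse engineering performs, using SQLAlchemy's inspection API in Python. The connection URL is an assumed placeholder, not a real source.

```python
# A minimal sketch of reverse engineering: extracting table and column
# metadata from an existing database with SQLAlchemy's inspection API.
# The connection URL is a placeholder; substitute your own database.
from sqlalchemy import create_engine, inspect

engine = create_engine("sqlite:///existing_database.db")  # hypothetical source
inspector = inspect(engine)

for table_name in inspector.get_table_names():
    print(f"Table: {table_name}")
    for column in inspector.get_columns(table_name):
        print(f"  {column['name']}: {column['type']} "
              f"(nullable={column['nullable']})")
    for fk in inspector.get_foreign_keys(table_name):
        print(f"  FK -> {fk['referred_table']}{fk['referred_columns']}")
```

A modeling tool takes exactly this kind of catalog information and renders it as entities, attributes and relationship lines.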
Overall, reverse engineering is a valuable and foundational step for data modeling. It enables a deeper understanding of an existing system and its components, controlled access to the enterprise design process, full transparency throughout the modeling lifecycle, gains in efficiency, time and cost savings, and better documentation, which in turn supports governance objectives.
The above scenarios assume you are working with a single data source, but most enterprises run several data marts and EDWs to support their reporting needs. Imagine your enterprise fits this description and is now building a Databricks Lakehouse to consolidate those data platforms into one unified cloud platform for BI and AI. In that situation, you can use erwin Data Modeler to convert your existing data models from a legacy EDW into a Databricks data model. In the example below, a data model built for an EDW such as SQL Server, Oracle or Teradata is implemented on Databricks simply by changing the target database to Databricks.
As you can see in the circled area, this model was built for SQL Server. We will now convert this model and migrate its deployment to Databricks by changing the target server. This kind of straightforward conversion helps organizations quickly and safely migrate data models from legacy or on-prem databases to the cloud and govern those data sets throughout their lifecycle.
In the picture above, we converted a legacy SQL Server-based data model to Databricks in a few simple steps. This kind of easy migration path helps organizations quickly and safely move their data and assets to Databricks, encourages remote collaboration, and enhances security.
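Under the hood, a target-server change comes down to remapping each column's data type to the new platform. Purely as an illustration (these pairs are common equivalences, not erwin's actual internal conversion rules), a simplified mapping might look like this:

```python
# Illustrative sketch only: a simplified version of the kind of type
# mapping a target-server conversion performs. These pairs are common
# equivalences, not erwin's actual internal rules.
SQLSERVER_TO_DATABRICKS = {
    "NVARCHAR": "STRING",
    "VARCHAR": "STRING",
    "DATETIME": "TIMESTAMP",
    "BIT": "BOOLEAN",
    "MONEY": "DECIMAL(19,4)",
    "INT": "INT",
}

def convert_column_type(sqlserver_type: str) -> str:
    """Map a SQL Server type name to a Databricks SQL type name."""
    base = sqlserver_type.split("(")[0].upper()
    return SQLSERVER_TO_DATABRICKS.get(base, sqlserver_type)

print(convert_column_type("NVARCHAR(255)"))  # -> STRING
print(convert_column_type("DATETIME"))       # -> TIMESTAMP
```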
Now let's move on to the final part: once the ER model is ready and approved by the data architecture team, you can quickly generate a .sql file from erwin DM, or connect to Databricks and forward engineer the model directly.
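If you take the .sql route, you can also apply the generated script to Databricks programmatically. Below is a sketch using the databricks-sql-connector package; the hostname, HTTP path, access token and file name are all placeholders.

```python
# A sketch of applying an erwin-generated DDL script to Databricks with
# the databricks-sql-connector package (pip install databricks-sql-connector).
# Hostname, HTTP path, token and file name below are placeholders.
from databricks import sql

with open("erwin_generated_model.sql") as f:  # hypothetical output file
    # Naive statement splitting, sufficient for simple DDL scripts.
    statements = [s.strip() for s in f.read().split(";") if s.strip()]

with sql.connect(
    server_hostname="<workspace-hostname>",
    http_path="<sql-warehouse-http-path>",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        for statement in statements:
            cursor.execute(statement)
```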
erwin Data Modeler Mart also supports GitHub, which lets your DevOps team keep scripts in your enterprise source control repository of choice. With Git support, you can easily collaborate with developers and follow version control workflows.
In this blog, we demonstrated how easy it is to create, reverse engineer and forward engineer data models using erwin Data Modeler: building visual data models to migrate your table definitions to Databricks, and reverse engineering data models for data governance and semantic layer creation.
Overview
A data model is a conceptual representation of the data structures (tables) required for a database, and it is a powerful way to express and communicate business requirements.
A data model visually represents the nature of the data, the business rules governing it, and how it will be organized in the database. A data model comprises two parts: the logical design and the physical design.
The data model helps the functional and technical teams design the database. The functional team normally consists of one or more business analysts, business managers, subject matter experts, end users and so on, while the technical team consists of one or more programmers, DBAs and so on. Data modelers are responsible for designing the data model; they work with the functional team to gather the business requirements and with the technical team to implement the database.
Data Modeling Tools
A number of data modeling tools can transform business requirements into a logical data model, and a logical data model into a physical data model. From the physical data model, these tools can generate the SQL code for creating the database, as sketched below.
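As a rough illustration of that last step, this toy sketch renders CREATE TABLE DDL from a structured description of a physical model; the CUSTOMER table is invented for demonstration, and real tools apply far richer rules.

```python
# Toy sketch of what DDL generation from a physical model boils down to:
# rendering CREATE TABLE statements from a structured model description.
# The CUSTOMER definition here is invented for illustration.
physical_model = {
    "CUSTOMER": [
        ("CUSTOMER_ID", "INT", "NOT NULL PRIMARY KEY"),
        ("CUSTOMER_NAME", "VARCHAR(100)", "NOT NULL"),
        ("CREATED_DATE", "DATE", ""),
    ],
}

def generate_ddl(model: dict) -> str:
    statements = []
    for table, columns in model.items():
        cols = ",\n  ".join(
            f"{name} {dtype} {constraint}".strip()
            for name, dtype, constraint in columns
        )
        statements.append(f"CREATE TABLE {table} (\n  {cols}\n);")
    return "\n\n".join(statements)

print(generate_ddl(physical_model))
```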
Tools: What to Learn?
Data modeling tools are the primary means of creating robust data models. The following are the capabilities you should know and learn in a data modeling tool before you start building data models.
DM Tools - Erwin
AllFusion ERwin Data Modeler, commonly known as ERwin, is a powerful and leading data modeling tool from Computer Associates. Computer Associates delivers software for enterprise management, storage management, security, application lifecycle management, data management and business intelligence.
DM Tools - Xcase
Xcase is a powerful and intuitive data modeling tool. It automates database creation and maintenance, making the modeling process simple and visual. The tool provides tight support for the leading DBMSs and an impressive set of capabilities.
Using Xcase, data modelers can create data models from scratch or from existing databases using the Reverse Engineering module. Once the model is complete, Xcase's Forward Engineering module generates all the DDL scripts necessary to create a new database or update an existing one.
Development Cycle
Gathering Business Requirements - First Phase
Data Modelers have to interact with business analysts to get the functional requirements and with end users to find out the reporting needs.
Conceptual Data Modeling (CDM) - Second Phase
This data model includes all major entities and relationships, contains little detail about attributes, and is often used in the INITIAL PLANNING PHASE.
Logical Data Modeling (LDM) - Third Phase
This is the actual implementation of the conceptual model as a logical data model. A logical data model is the version of the model that represents all of the business requirements of an organization.
DM Standards
Standardization Needs in Data Modeling
Several data modelers may work on different subject areas of a data model, and all of them should use the same naming conventions when writing definitions and business rules.
Nowadays, business-to-business (B2B) transactions are quite common, and standardization helps all parties understand the business better. Inconsistent column names and definitions would create chaos across the business.
For example, when a data warehouse is designed, it may get data from several source systems, and each source may have its own names, data types and so on. These anomalies can be eliminated if proper standardization is maintained across the organization, as sketched below.
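As a small illustration of what such standardization involves, the sketch below maps differing source-system column names onto one canonical enterprise name. The synonym map is invented for demonstration.

```python
# Illustration of standardizing column names across source systems.
# The synonym map is invented for demonstration purposes.
CANONICAL_NAMES = {
    "cust_no": "CUSTOMER_ID",
    "customer_number": "CUSTOMER_ID",
    "custid": "CUSTOMER_ID",
    "dob": "DATE_OF_BIRTH",
    "birth_date": "DATE_OF_BIRTH",
}

def standardize(column_name: str) -> str:
    """Map a source-system column name to the enterprise standard name."""
    key = column_name.strip().lower()
    return CANONICAL_NAMES.get(key, key.upper())

# Two source systems naming the same attribute differently both resolve
# to the single standard name:
print(standardize("Cust_No"))           # -> CUSTOMER_ID
print(standardize("customer_number"))   # -> CUSTOMER_ID
```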
Table Names Standardization:
Giving a table a full name indicates what its data is about. In general, do not abbreviate table names, although this may differ according to your organization's standards. If a table name's length exceeds the database's limits, then abbreviate it. General guidelines often recommend standard prefixes or suffixes for tables.
Create a Data Model
These are general guidelines for creating a standard data model; in practice, a data model may not be created in exactly the sequential manner shown below. Based on the enterprise's requirements, some steps may be excluded, or additional steps included.
Data Modeler Role
Business Requirement Analysis:
Interact with Business Analysts to get the functional requirements.
Interact with end users and find out the reporting needs.
Conduct interviews and brainstorming discussions with the project team to gather additional requirements.
Gather accurate data by data analysis and functional analysis.
Modeling Reports
Reports can easily be generated from data modeling tools for technical and business needs. Reports generated from the logical data model are called business reports, and reports generated from the physical data model are called technical reports. Most data modeling tools provide default reports such as subject area reports, entity reports, attribute reports, table reports, column reports, indexing reports, relationship reports and so on. The advantage of these reports is that everybody, technical or non-technical, can understand what is going on within the organization.
Conceptual DM
The conceptual data model includes all major entities and relationships, does not contain much detail about attributes, and is often used in the INITIAL PLANNING PHASE.
Enterprise DM
The development of a common consistent view and understanding of data elements and their relationships across the enterprise is referred to as Enterprise Data Modeling. This type of data modeling provides access to information scattered throughout an enterprise under the control of different divisions or departments with different databases and data models.
Logical DM
This is the actual implementation and extension of a conceptual data model. A logical data model is the version of a data model that represents the business requirements (entire or part) of an organization and is developed before the physical data model.
Physical DM
The physical data model includes all required tables, columns, relationships and database properties for the physical implementation of the database. Database performance, indexing strategy, physical storage and denormalization are important considerations of a physical model, as illustrated below.
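As an illustration, the sketch below expresses a small physical model with SQLAlchemy Core, including a foreign key and an explicit index. The table names and the in-memory SQLite target are assumptions for demonstration.

```python
# A sketch of a physical model in SQLAlchemy Core: tables, columns,
# a foreign key relationship, and an explicit index. Table names and
# the SQLite target are illustrative assumptions.
from sqlalchemy import (
    MetaData, Table, Column, Integer, String, Numeric,
    ForeignKey, Index, create_engine,
)

metadata = MetaData()

customer = Table(
    "customer", metadata,
    Column("customer_id", Integer, primary_key=True),
    Column("customer_name", String(100), nullable=False),
)

orders = Table(
    "orders", metadata,
    Column("order_id", Integer, primary_key=True),
    Column("customer_id", Integer, ForeignKey("customer.customer_id"),
           nullable=False),
    Column("order_total", Numeric(19, 4)),
)

# Indexing strategy belongs to the physical model, not the logical one.
Index("ix_orders_customer_id", orders.c.customer_id)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)  # emits the CREATE TABLE / CREATE INDEX DDL
```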
Logical vs Physical
When a data modeler works with a client, the title may be logical data modeler, physical data modeler, or a combination of both. A logical data modeler designs the data model to suit business requirements, creates and maintains lookup data, compares versions of the data model, maintains the change log, and generates reports from the data model, whereas a physical data modeler must also know the properties of the source and target databases.
A physical data modeler should have the technical know-how to create data models from existing databases, to tune the data models with referential integrity, alternate keys and indexes, and to match indexes to SQL code. It is also good if the physical data modeler knows about replication, clustering and so on.
Relational (OLTP) DM
The relational data model views the real world as entities and relationships. Entities are concepts, real or abstract, about which information is collected. Entities are associated with each other by relationships, and attributes are properties of entities. Business rules determine the relationships between the entities in a data model.
Dimensional DM
Dimensional data modeling comprises one or more dimension tables and fact tables. Good examples of dimensions are location, product, time, promotion and organization. Dimension tables store records related to a particular dimension, and no facts (measures) are stored in these tables. For example, a product dimension table stores information about products (product category, product sub-category, product and product features), and a location dimension table stores information about locations (country, state, county, city, zip). A fact (measure) table contains measures (sales gross value, total units sold) and dimension columns. These dimension columns are actually foreign keys from the respective dimension tables, as in the sketch below.
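Here is a minimal sketch of that star-schema structure, using invented product and sales tables in an in-memory SQLite database:

```python
# Minimal star-schema sketch: one dimension table and one fact table
# whose dimension columns are foreign keys. Names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
  product_key INTEGER PRIMARY KEY,
  product_category TEXT,
  product_sub_category TEXT,
  product_name TEXT
);

CREATE TABLE sales_fact (
  product_key INTEGER REFERENCES product_dim(product_key),
  date_key INTEGER,
  sales_gross_value REAL,
  total_units_sold INTEGER
);
""")
```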
Relational vs Dimensional
Relational data modeling is used in OLTP systems, which are transaction oriented, while dimensional data modeling is used in OLAP systems, which are analytical. In a data warehouse environment, the staging area is designed on OLTP concepts, since data has to be normalized, cleansed and profiled before being loaded into a data warehouse or data mart. In an OLTP environment, lookups are stored as independent, detailed tables, whereas in an OLAP environment like a data warehouse these independent tables are merged into a single dimension.
Dimensions
Dimension Table
A dimension table is one that describes the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations and products. Dimension tables are sometimes called lookup or reference tables.
Location Dimension
In relational data modeling, for normalization purposes, the country, state, county and city lookups are not merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table called the LOCATION DIMENSION, for performance and for slicing data, as sketched below. This location dimension makes it possible to compare sales in one region with another. We may see a good sales profit in one region and a loss in another; if it is a loss, the reasons may be a new competitor in that area, a failure of our marketing strategy, and so on.
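As a sketch of this denormalization, the snippet below merges separate country, state and city lookups into one wide location dimension using pandas; the sample rows are invented.

```python
# Sketch of denormalizing separate lookup tables into a single
# LOCATION dimension, as a star schema would. Sample rows are invented.
import pandas as pd

country = pd.DataFrame({"country_id": [1], "country": ["USA"]})
state = pd.DataFrame({"state_id": [10], "country_id": [1],
                      "state": ["California"]})
city = pd.DataFrame({"city_id": [100], "state_id": [10],
                     "city": ["San Jose"]})

# Normalized lookups are merged into one wide dimension table.
location_dim = (
    city.merge(state, on="state_id")
        .merge(country, on="country_id")
        [["city_id", "country", "state", "city"]]
        .rename(columns={"city_id": "location_key"})
)
print(location_dim)
```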
Slowly Changing Dimensions
Dimensions that change over time are called slowly changing dimensions. For instance, a product's price changes over time, people change their names for various reasons, and country and state names may change over time. These are all examples of slowly changing dimensions, since changes happen to them over a period of time.
Slowly changing dimensions are commonly categorized into three types: Type 1, Type 2 and Type 3. The following section deals with how to capture and handle these changes over time.
The "Product" table mentioned below contains a product named, Product1 with Product ID being the primary key. In the year 2004, the price of Product1 was $150 and over the time, Product1's price changes from $150 to $350. With this information, let us explain the three types of Slowly Changing Dimensions.