A database is a collection of data stored together to serve as many applications as possible. Hence a database is often conceived of as a repository of information needed for running certain functions in a corporation or organization. Such a database permits not only the retrieval of data but also the continuous modification of data needed for the control of operations. It may be possible to search the database to obtain answers to queries or information for planning purposes.
A database should be a repository of data needed for an organization's data processing. That data should be accurate, private, and protected from damage. It should be organized so that diverse applications with different data requirements can employ the data. Different application programmers and various end users have different views of the data, which must be derived from a common overall data structure, and their methods of searching and accessing the data will differ.
As the database may be viewed through three levels of abstraction, a change at any level can affect the schemas at the other levels. Because a database keeps growing, such changes may be frequent, and they should not force a redesign and re-implementation of the database. The concept of data independence proves beneficial in contexts like these.
Neo4j uses a property graph database model. A graph data structure consists of nodes (discrete objects) that can be connected by relationships. Below is an image of a graph with three nodes (the circles) and three relationships (the arrows).
For example, all nodes representing users could be labeled with the label User. With that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all users with a given name.
Since labels can be added and removed during runtime, they can also be used to mark temporary states for nodes. A Suspended label could be used to denote bank accounts that are suspended, and a Seasonal label could denote vegetables that are currently in season.
Relationships always have a direction. However, the direction can be disregarded where it is not useful. This means that there is no need to add duplicate relationships in the opposite direction unless it is needed to describe the data model properly.
To find out which movies Tom Hanks acted in according to the tiny example database, the traversal would start from the Tom Hanks node, follow any ACTED_IN relationships connected to the node, and end up with the Movie node Forrest Gump as the result (see the black lines):
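The traversal described above can be sketched in plain Python. This is a minimal illustration of the property graph idea, not the Neo4j API: the node IDs, dictionaries, and the `movies_for` helper are all hypothetical, but nodes carry labels and properties, relationships have a type and a direction, and the traversal follows `ACTED_IN` relationships from the starting node.

```python
# A minimal sketch of a property graph: nodes have labels and properties,
# relationships have a type and connect a start node to an end node.
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Tom Hanks"}},
    2: {"labels": {"Movie"}, "props": {"title": "Forrest Gump"}},
    3: {"labels": {"Person"}, "props": {"name": "Robert Zemeckis"}},
}
# Each relationship is (start_node_id, relationship_type, end_node_id).
relationships = [
    (1, "ACTED_IN", 2),
    (3, "DIRECTED", 2),
]

def movies_for(person_name):
    """Follow ACTED_IN relationships outward from the named person node."""
    start_ids = {nid for nid, n in nodes.items()
                 if n["props"].get("name") == person_name}
    return [nodes[end]["props"]["title"]
            for start, rel_type, end in relationships
            if start in start_ids and rel_type == "ACTED_IN"]

print(movies_for("Tom Hanks"))  # ['Forrest Gump']
```

In Neo4j itself you would express the same traversal declaratively in Cypher rather than walking the structure by hand, but the underlying idea is the same: start from a node and follow relationships of a given type.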
Software engineering jobs are a good fit for people capable of dealing with diverse concepts. These concepts range from requirements analysis, team leadership, and project management to scripting languages, testing techniques, and continuous integration, just to name a few. Then there are several important database concepts for a software engineer to know: normalization, denormalization, SQL, NoSQL, ERDs, query optimization, etc. The list goes on!
In short, software engineering is for those who can do a little bit of everything while paying a lot of attention and care to each task. If you are one of those people, congratulations! You have a great career full of interesting challenges ahead of you.
Also, a data model needs to fit a set of requirements and serve as the basis for generating a database with good performance and data integrity. Normally, software engineers do not create data models on their own, but they need to be able to sit down with a data modeler, analyze a model, and determine whether it is well designed. That is why ERD knowledge remains among the most critical database concepts for a software engineer.
Entity-relationship diagrams (ERDs) provide a graphical representation of the relationships between the objects that make up a data model. Data modelers use them primarily as tools to document and communicate design decisions. As a software engineer, you should, at a minimum, be able to read an ERD, understand its logic, know what it represents, and determine if it correctly reflects the requirements of the software product being developed.
Having all that information, you can ask the data modelers the reasons for their design decisions and verify if those decisions are the right ones. You also need to detect, by looking at an ERD, if the database designer has misinterpreted a requirement or introduced an error in the data model that may lead to a serious bug in the software.
Concepts related to the interpretation of an ERD you should understand are the cardinality of relationships (one-to-one, one-to-many, or many-to-many), the choice of primary keys, the meaning of certain schema structures such as parent-child relationships, and common data warehousing schema types.
ERDs are presented at three different levels: conceptual, logical, and physical. It is usually sufficient for the software engineer to be able to read and understand the conceptual and logical models since the physical models are derived from them, only adding information necessary to implement the model in a particular database system.
On the other hand, with the aid of an intelligent data modeling tool such as Vertabelo, the physical diagrams can be generated automatically from the logical diagrams with complete confidence they are error-free. For this reason, a software engineer usually does not need to worry about reviewing physical diagrams.
One important thing every software engineer needs to be able to see in an ERD is whether the database schema is normalized and whether it needs to be. This brings us to the next item on our checklist.
In transactional databases, normalization ensures database insert/update/delete operations do not produce anomalies or compromise the quality and integrity of the information. For identifying whether a design is normalized, important database concepts for a software engineer include primary keys, foreign keys, attribute dependencies, and surrogate keys.
An example of the problems associated with a non-normalized database is the potential anomalies that may appear in an e-commerce application. Such problems include the same product appearing twice in a sales report with two different names as if they were two different products.
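That anomaly can be demonstrated concretely. The sketch below uses Python's built-in `sqlite3` module with a hypothetical, non-normalized `sales` table that repeats the product name on every row instead of keeping it in a separate products table; renaming the product in only one row leaves the same product under two names.

```python
import sqlite3

# Hypothetical non-normalized schema: the product name is stored redundantly
# on every sale row instead of in its own products table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_id INTEGER, product_name TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, "Blue T-Shirt", 19.90), (2, "Blue T-Shirt", 19.90)])

# Someone renames the product, but the change only reaches one row:
# a classic update anomaly.
con.execute("UPDATE sales SET product_name = 'Blue Tee' WHERE sale_id = 2")

names = [row[0] for row in
         con.execute("SELECT DISTINCT product_name FROM sales")]
print(names)  # the same product now appears under two different names
```

In a normalized design, the name would live in a single products row referenced by a foreign key, so one UPDATE would rename it everywhere.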
In databases intended for analytical processing rather than transactional processing, you may need to make concessions to normalization to improve the performance of certain queries. These concessions are known as denormalization techniques. They usually involve adding some redundant attributes to avoid an excess of lookup tables. This helps with queries that would otherwise be complex and costly (in time and processing resources) for the database engine to resolve.
Every software engineer needs to have a basic knowledge of SQL (Structured Query Language) for querying databases or for creating or modifying tables, indexes, views, or even a stored procedure or a trigger when needed. This knowledge allows you to perform some basic database tasks without taking time away from a DBA or database programmer.
To make good use of the database engine for writing efficient queries, you need to grasp the logic behind the relationships between the tables in an ERD. This helps you write the JOINs correctly in queries that include multiple tables. As a basic rule of thumb, fields involved in foreign key relationships between two tables are usually best suited for JOINs between them in a SELECT. For example, in the following ERD, you see the tables PAINTINGS and BUYERS are linked by the fields BUYER_NAME in PAINTINGS and NAME in BUYERS.
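A minimal sketch of that JOIN, using Python's built-in `sqlite3` module: the PAINTINGS and BUYERS tables and the BUYER_NAME/NAME fields come from the ERD described above, while the TITLE and CITY columns and the sample rows are hypothetical additions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE BUYERS (NAME TEXT PRIMARY KEY, CITY TEXT);
    CREATE TABLE PAINTINGS (TITLE TEXT, BUYER_NAME TEXT REFERENCES BUYERS(NAME));
    INSERT INTO BUYERS VALUES ('Alice', 'Paris');
    INSERT INTO PAINTINGS VALUES ('Sunflowers', 'Alice');
""")

# The JOIN condition uses the foreign key fields that link the two tables
# in the ERD: PAINTINGS.BUYER_NAME references BUYERS.NAME.
rows = con.execute("""
    SELECT P.TITLE, B.CITY
    FROM PAINTINGS P
    JOIN BUYERS B ON P.BUYER_NAME = B.NAME
""").fetchall()
print(rows)  # [('Sunflowers', 'Paris')]
```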
There are two other commonly used groups of commands. DML (Data Manipulation Language) is used to insert, delete, or update rows in tables. DDL (Data Definition Language) is used to alter the structure of objects in a database. Examples where DDL is used include creating new tables, creating new fields in a table, and creating a view.
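The split between DDL and DML can be shown in a few statements. The sketch below runs against an in-memory SQLite database via Python's `sqlite3` module; the `products` table, its columns, and the `cheap` view are hypothetical names chosen for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: statements that alter the structure of database objects.
con.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("ALTER TABLE products ADD COLUMN price REAL")  # add a new field
con.execute("CREATE VIEW cheap AS SELECT name FROM products WHERE price < 10")

# DML: statements that insert, update, or delete rows.
con.execute("INSERT INTO products (name, price) VALUES ('pen', 2.5)")
con.execute("UPDATE products SET price = 3.0 WHERE name = 'pen'")
con.execute("DELETE FROM products WHERE price > 100")

cheap_rows = con.execute("SELECT name FROM cheap").fetchall()
print(cheap_rows)  # [('pen',)]
```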
Software engineers have too many things to do on a day-to-day basis to make improving the performance of a query a priority. Ideally, they should delegate this task to a SQL programmer, a DBA, a data modeler, or better yet, all of them together.
But even so, it is good to know what optimizing a query consists of and, in particular, how the creation of an index sometimes reduces the time a query takes to execute from hours to seconds. It is also good to be able to assess whether a DBA is telling the truth or just wants to avoid the task when they tell you a query cannot be optimized any further.
Optimizing a query often consists of finding the most time-consuming steps in the query execution plan and creating indexes to speed them up. You can read all about the very basics of index creation and solve some basic database performance problems yourself.
When you analyze a query execution strategy applied by an RDBMS, pay special attention to the steps that require the most work from the RDBMS. These include traversal of all the records in a table (called full table scan) or sequential traversal of the entries in an index (index scan).
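You can see both access paths with SQLite's `EXPLAIN QUERY PLAN`, again through Python's `sqlite3` module. The `orders` table, the `idx_customer` index, and the `plan` helper below are hypothetical; the exact wording of the plan output also varies between SQLite versions, so the comments describe it only loosely.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
con.executemany("INSERT INTO orders (customer) VALUES (?)",
                [("c%d" % i,) for i in range(1000)])

def plan(sql):
    """Return SQLite's query plan as one string (the detail column of each step)."""
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id FROM orders WHERE customer = 'c500'"

before = plan(query)   # without an index: a full table scan of orders
con.execute("CREATE INDEX idx_customer ON orders (customer)")
after = plan(query)    # with the index: a search using idx_customer

print(before)
print(after)
```

Creating the index replaces the full table scan with an index lookup, which is exactly the kind of change that can turn a slow query into a fast one.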
When an application sends data to a database, it commonly sends a sequence of insert, update, and delete operations. For example, recording data for an invoice may involve inserting rows in some tables, updating rows in others, and perhaps deleting rows in others.
All of these operations must be completed in their entirety or not run at all. If an error interrupts the sequence while it is executing, the information in the database can become inconsistent, causing all sorts of data errors.
Transactions avoid this problem by preventing a sequence of interrelated operations from being partially executed. Once a transaction is started, any error in the middle of the sequence causes the database to roll back to its state before the start of the sequence, leaving the data as it was.
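The invoice scenario above can be sketched with Python's `sqlite3` module, whose connection object works as a context manager that commits on success and rolls back on error. The `invoices` and `invoice_lines` tables and the CHECK constraint are hypothetical; the constraint violation stands in for any error occurring mid-sequence.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, total REAL)")
con.execute("CREATE TABLE invoice_lines "
            "(invoice_id INTEGER, amount REAL CHECK (amount > 0))")
con.commit()

try:
    with con:  # transaction: commit on success, roll back on any exception
        con.execute("INSERT INTO invoices VALUES (1, 10.0)")
        # This violates the CHECK constraint, so the sequence fails here...
        con.execute("INSERT INTO invoice_lines VALUES (1, -10.0)")
except sqlite3.IntegrityError:
    pass

# ...and the invoice inserted first is rolled back too: no partial update.
remaining = con.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
print(remaining)  # 0
```

Without the transaction, the first INSERT would have survived the failure, leaving an invoice header with no lines: exactly the inconsistency transactions exist to prevent.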