Data Structures For Graphs

Macedonio Heninger

Aug 4, 2024, 4:49:46 PM

to actothafoo

Graphs are non-linear data structures made up of a finite number of nodes (vertices) and the edges that connect them. They are used to model real-world problems whose domain is naturally a network, such as telephone networks, circuit networks, and social networks. In a telephone network, for example, each user is a node and each telephone link between two users is an edge.

A graph G = (V, E) is a simple graph if every pair of distinct vertices is joined by at most one edge and no edge joins a vertex to itself. Because at most one edge links any two vertices, each edge depicts a single one-to-one relationship between two elements.

An undirected graph comprises a set of nodes and the links connecting them. The edges have no direction, so the order of the two vertices in an edge is irrelevant: an edge between u and v is the same as an edge between v and u. You can form an undirected graph with a finite number of vertices and edges.

A directed acyclic graph (DAG) is a graph with directed edges and no cycles. Each edge is represented as an ordered pair of vertices, since it points from the first vertex to the second, and it may also carry some data.

Graphs in data structures are used to represent the relationships between objects. Every graph consists of a set of points known as vertices or nodes connected by lines known as edges. The vertices in a network represent entities.

The adjacency matrix of a simple labeled graph, also known as the connection matrix, is a matrix whose rows and columns are labeled by the graph's vertices, with a 1 in position (i, j) if vertices i and j are adjacent and a 0 otherwise.
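A minimal sketch of the adjacency matrix just described, assuming the vertices are labeled 0 to n-1 and the graph is given as a list of edge pairs. The function name `adjacency_matrix` and the sample graph are hypothetical, chosen only for illustration.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n matrix of 0s and 1s for vertices labeled 0..n-1."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        if not directed:
            matrix[v][u] = 1  # undirected graphs give a symmetric matrix
    return matrix


# Hypothetical undirected graph: the path 0-1-2-3.
example_matrix = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
for row in example_matrix:
    print(row)
```

Checking whether two vertices are adjacent is a single lookup, `example_matrix[i][j]`, at the cost of n * n stored entries regardless of how many edges exist.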

An adjacency list represents a finite graph as a collection of unordered lists. Each unordered list describes the set of neighbors of a particular vertex in the graph.
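As a rough illustration of the adjacency list, here is a sketch that stores each vertex's neighbors in a Python set, keyed by vertex in a dictionary. The name `adjacency_list` and the sample edges are assumptions made for the example.

```python
def adjacency_list(edges, directed=False):
    """Map each vertex to the set of its neighbors."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set())  # make sure v appears even with no out-edges
        if not directed:
            neighbors[v].add(u)
    return neighbors


# Hypothetical undirected graph with edges a-b and b-c.
example_list = adjacency_list([("a", "b"), ("b", "c")])
print(example_list["b"])  # the neighbors of b: a and c (set order may vary)
```

Memory use grows with the number of vertices plus the number of edges, which is why this form suits sparse graphs.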

The process of visiting or updating each vertex in a graph is known as graph traversal. The order in which the vertices are visited is used to classify such traversals. Tree traversal is a special case of graph traversal.

You most likely use social networking platforms such as Facebook, LinkedIn, Instagram, and others. Social media is a good example of a graph in use: every user is a node, and the connections between users are edges. Google Maps is another application that makes use of graphs: each place is a node, and the roads that connect places are edges.

A graph is a non-linear data structure composed of nodes and edges. Graphs come in a variety of forms, including Finite Graphs, Infinite Graphs, Trivial Graphs, Simple Graphs, Multi Graphs, Null Graphs, Complete Graphs, Pseudo Graphs, Regular Graphs, Labeled Graphs, Directed Graphs (Digraphs), Subgraphs, Connected or Disconnected Graphs, and Cyclic Graphs.

A graph is considered to be complete if there is an edge between every pair of distinct vertices in the graph. In other words, every vertex is connected to every other vertex. A complete graph on n vertices has precisely nC2 = n(n-1)/2 edges and is written as K_n.
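The edge count of a complete graph can be checked with a short sketch that enumerates every unordered pair of vertices. The helper name `complete_graph_edges` is hypothetical; `itertools.combinations` and `math.comb` are from the Python standard library.

```python
from itertools import combinations
from math import comb


def complete_graph_edges(n):
    """Return the edge list of the complete graph on vertices 0..n-1."""
    return list(combinations(range(n), 2))


k5_edges = complete_graph_edges(5)
print(len(k5_edges), comb(5, 2))  # prints "10 10"
```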

In computer science and mathematics, a directed acyclic graph (DAG) is a directed graph that has no cycles. This means that following the edges from any vertex never leads back to that vertex. The edges of a directed graph can only be traversed in one direction. Every DAG admits a topological ordering, in which each node appears before all of the nodes its edges point to.
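One common way to compute such an ordering is Kahn's algorithm, sketched below under the assumption that the graph is given as a dictionary from each vertex to the vertices it points to. The function name `topological_order` and the sample DAG are illustrative only.

```python
from collections import deque


def topological_order(successors):
    """Kahn's algorithm: return the vertices so that every edge points forward.

    Raises ValueError if the graph contains a cycle.
    """
    indegree = {v: 0 for v in successors}
    for targets in successors.values():
        for t in targets:
            indegree[t] = indegree.get(t, 0) + 1

    ready = deque(v for v, d in indegree.items() if d == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for t in successors.get(v, ()):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)

    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


# Hypothetical DAG: a -> b, a -> c, b -> d, c -> d.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(topological_order(dag))  # ['a', 'b', 'c', 'd']
```

If the input has a cycle, some vertices never reach in-degree zero, which is how the sketch detects that the graph is not a DAG.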

A graph is a type of non-linear data structure made up of vertices and edges. Vertices are also known as nodes, while edges are lines or arcs that link any two nodes in the network. In more technical terms, a graph comprises a set of vertices (V) and a set of edges (E). The graph is written as G = (V, E).

A graph is a non-linear data structure made up of vertices (or nodes) linked by edges (or arcs), which can be directed or undirected. Graphs are used in computer science to depict the flow of computation.

Graphs, in the sense of charts, are also a popular way to depict relationships in data visually. The purpose of such a graph is to convey information that is too extensive or intricate to express fully in words, and to do so in less space. However, do not use a graph for small amounts of data that can be stated in a sentence.

Adjacency lists are generally preferred for the representation of sparse graphs, while an adjacency matrix is preferred if the graph is dense; that is, the number of edges |E| is close to the number of vertices squared, |V|^2, or if one must be able to quickly look up if there is an edge connecting two vertices.[5][6]

The parallelization of graph problems faces significant challenges: data-driven computations, unstructured problems, poor locality, and a high ratio of data access to computation.[8][9] The graph representation used for parallel architectures plays a significant role in facing those challenges. Poorly chosen representations may unnecessarily drive up the communication cost of the algorithm, which will decrease its scalability. In the following, shared and distributed memory architectures are considered.

In the case of a shared memory model, the graph representations used for parallel processing are the same as in the sequential case,[10] since parallel read-only access to the graph representation (e.g. an adjacency list) is efficient in shared memory.

In the distributed memory model, the graph has to be partitioned across the processors, and this needs to be done carefully: there is a trade-off between low communication and evenly sized partitions.[11] Partitioning a graph optimally is an NP-hard problem, so computing an optimal partition is not feasible in practice. Instead, the following heuristics are used.

1D partitioning: Every processor gets n/p vertices and the corresponding outgoing edges, where n is the number of vertices and p is the number of processors. This can be understood as a row-wise or column-wise decomposition of the adjacency matrix. For algorithms operating on this representation, this requires an All-to-All communication step as well as O(m) message buffer sizes, where m is the number of edges, as each processing element (PE) potentially has outgoing edges to every other PE.[12]
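The 1D scheme can be sketched on a single machine as follows. This is only an illustration of how vertices and outgoing edges would be assigned, assuming vertices are labeled 0 to n-1 and each processor receives a contiguous block; the function name `partition_1d` and the dictionary layout are hypothetical, and no actual message passing is performed.

```python
def partition_1d(num_vertices, num_processors, edges):
    """Assign roughly n/p consecutive vertices, with their outgoing edges, to each processor."""

    def owner(v):
        # Balanced contiguous blocks: block sizes differ by at most one vertex.
        return v * num_processors // num_vertices

    parts = [{"vertices": [], "out_edges": []} for _ in range(num_processors)]
    for v in range(num_vertices):
        parts[owner(v)]["vertices"].append(v)
    for u, v in edges:
        # An edge lives with the processor that owns its source vertex.
        parts[owner(u)]["out_edges"].append((u, v))
    return parts


# Hypothetical directed graph on 6 vertices split across 3 processors.
parts = partition_1d(6, 3, [(0, 5), (1, 2), (4, 0), (5, 3)])
for rank, part in enumerate(parts):
    print(rank, part["vertices"], part["out_edges"])
```

In the sample, the edge (0, 5) is stored on processor 0 but its target belongs to processor 2, which is the kind of cross-partition edge that forces communication.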

Graphs with trillions of edges occur in machine learning, social network analysis, and other areas. Compressed graph representations have been developed to reduce I/O and memory requirements. General techniques such as Huffman coding are applicable, but the adjacency list or adjacency matrix can be processed in specific ways to increase efficiency.[13]

Breadth-first search (BFS) and depth-first search (DFS) are two closely related approaches that are used for exploring all of the nodes in a given connected component. Both start with an arbitrary node, the "root".[14]
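A minimal sketch of both traversals, assuming the graph is a dictionary that maps each vertex to a list of its neighbors. The names `bfs_order` and `dfs_order` and the sample graph are made up for the example. The only structural difference is the container: BFS uses a queue, DFS uses a stack.

```python
from collections import deque


def bfs_order(neighbors, root):
    """Visit vertices level by level, using a FIFO queue."""
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in neighbors[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order


def dfs_order(neighbors, root):
    """Follow one branch as far as possible first, using an explicit stack."""
    seen = set()
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        # Push in reverse so the first-listed neighbor is explored first.
        stack.extend(reversed(neighbors[v]))
    return order


# Hypothetical graph: a 4-cycle 1-2-4-3-1 plus an isolated vertex 5.
graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3], 5: []}
print(bfs_order(graph, 1))  # [1, 2, 3, 4]
print(dfs_order(graph, 1))  # [1, 2, 4, 3]
```

Vertex 5 is never reached from root 1, which matches the statement that both searches explore only the root's connected component.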

I am currently writing my master's thesis. It's about graphs. My algorithm is ready. But now I have to think about useful data structures to represent the graph and the rest that I need for a good runtime. I am not allowed to use the adjacency matrix because of the large amount of memory. Since I have to check in every iteration whether a certain edge exists, adjacency lists also make no sense.

First I thought about two hash tables nested inside one another. All nodes are stored in the first table and all neighboring nodes in the second. But since I have to be able to choose a random neighbor in my algorithm, that is not optimal either. In addition, I have to be able to save edge weights in every iteration of the algorithm.

For the weights of the edges you have a few options. You can insert tuples into the sets of the adjacency list where, say, the first value is the neighboring node and the second value is the weight of the edge connecting those two nodes. The weight can be updated in O(1) time since both the map and the sets are implemented as hashtables.

You can store the degrees of the nodes in a separate map data structure where the keys (implemented as hashtable) are the nodes and the values are the degrees. This will provide O(1) access/delete/update time to each node's degree.
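Here is one way the nested hash tables from the question could carry weights and degrees. Note that it is a variation, not the exact layout suggested above: instead of (neighbor, weight) tuples inside a set, the inner table maps each neighbor to its weight, so a weight can be overwritten by key, and the degree is simply the size of the inner table rather than a separate map. The class name `WeightedGraph` and its methods are hypothetical, and the graph is assumed to be undirected.

```python
class WeightedGraph:
    """Undirected weighted graph stored as vertex -> {neighbor: weight}."""

    def __init__(self):
        self._adj = {}

    def set_weight(self, u, v, weight):
        """Insert the edge u-v, or overwrite its weight, in expected O(1)."""
        self._adj.setdefault(u, {})[v] = weight
        self._adj.setdefault(v, {})[u] = weight

    def has_edge(self, u, v):
        """Expected O(1) edge-existence test."""
        return v in self._adj.get(u, {})

    def weight(self, u, v):
        return self._adj[u][v]

    def degree(self, u):
        """Number of neighbors; len() of a dict is O(1)."""
        return len(self._adj.get(u, {}))


g = WeightedGraph()
g.set_weight("a", "b", 2.5)
g.set_weight("a", "c", 1.0)
g.set_weight("a", "b", 4.0)  # update an existing weight
print(g.has_edge("b", "a"), g.weight("b", "a"), g.degree("a"))  # True 4.0 2
```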

To achieve O(1) time for picking a random edge connected to a node you need to use an additional data structure. The reason is that the edges stored in the sets (as hashtables) won't give us a way to get a random element in O(1). But if we store each node's edges in a list/array like data structure, we can get a random index between 0 and the length of that array in O(1) and access that edge in O(1).

That's a map from each vertex to an array of its neighboring vertices -- i.e., an adjacency list representation of the graph, ensuring that each adjacency list is in a structure that supports random access.
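The following sketch combines the two ideas: a per-vertex array for O(1) random choice, plus a per-vertex hash table of positions so that edge-existence checks and removals stay O(1) in expectation. The class `RandomAccessGraph` and all of its method names are assumptions made for illustration; the graph is treated as undirected, and removal uses the usual swap-with-last trick.

```python
import random


class RandomAccessGraph:
    """Undirected graph with O(1) expected edge tests and random-neighbor picks."""

    def __init__(self):
        self._neighbors = {}  # vertex -> list of neighbors (random access)
        self._position = {}   # vertex -> {neighbor: index in that list}

    def add_edge(self, u, v):
        for a, b in ((u, v), (v, u)):
            pos = self._position.setdefault(a, {})
            lst = self._neighbors.setdefault(a, [])
            if b not in pos:
                pos[b] = len(lst)
                lst.append(b)

    def has_edge(self, u, v):
        return v in self._position.get(u, {})

    def remove_edge(self, u, v):
        for a, b in ((u, v), (v, u)):
            pos = self._position.get(a, {})
            if b not in pos:
                continue
            lst = self._neighbors[a]
            i = pos.pop(b)
            last = lst.pop()
            if last != b:
                # Move the former last element into the vacated slot.
                lst[i] = last
                pos[last] = i

    def random_neighbor(self, u, rng=random):
        """Uniformly random neighbor; raises IndexError if u has none."""
        return rng.choice(self._neighbors[u])


g = RandomAccessGraph()
for edge in [(1, 2), (1, 3), (1, 4)]:
    g.add_edge(*edge)
g.remove_edge(1, 2)
print(g.has_edge(1, 2), g.has_edge(1, 3), g.random_neighbor(1))
```

The price of this layout is that every edge is stored twice per endpoint, once in the list and once in the position table.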

I've considered creating a Vertices table and an Edges table but would building graphs in memory and traversing sub-graphs require a large number of lookups? I'd like to avoid excessive database reads. Is there any other way of persisting a graph?

Unfortunately, your assessment is right on every point. To convert a graph data structure into a relational one, you have to store the nodes (vertices) in one table and the edges in another, with each edge referencing a FromNode and a ToNode. You are also right that this leads to a large number of lookups, because the data cannot be partitioned into subgraphs that could each be fetched with a single query. You have to traverse from node to edge to node to edge to node and so on, which is a recursive process, whereas SQL works with sets.
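A minimal sketch of the two-table layout, assuming SQLite through Python's standard `sqlite3` module. The table names, column names, and sample rows are hypothetical. Some SQL engines, SQLite among them, support recursive common table expressions, which let the node-to-edge-to-node walk be written as one statement; the engine still performs a join for every step of the walk, so this changes where the lookups happen rather than removing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE edges (
        from_node INTEGER NOT NULL REFERENCES nodes(id),
        to_node   INTEGER NOT NULL REFERENCES nodes(id),
        PRIMARY KEY (from_node, to_node)
    );
    INSERT INTO nodes VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO edges VALUES (1, 2), (2, 3), (3, 1), (4, 1);
""")


def reachable_from(connection, start):
    """Return the ids of all nodes reachable from start, including start itself."""
    rows = connection.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT ?
            UNION  -- UNION removes duplicates, so cycles terminate
            SELECT e.to_node
            FROM edges AS e JOIN reach AS r ON e.from_node = r.id
        )
        SELECT id FROM reach ORDER BY id
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


print(reachable_from(conn, 1))  # [1, 2, 3]
```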

Relational, graph-oriented, object-oriented, and document-based stores are different kinds of data models that meet different requirements. That is what it is all about, and it is why so many different NoSQL databases (most of them simple document stores) came up: it simply makes no sense to organize all big data in a relational way.