Graphs Networks And Algorithms Pdf

1 view

Skip to first unread message

Jonelle Rycroft

unread,

Aug 4, 2024, 5:31:39 PM8/4/24

to brasurenten

Itis defined as the number of shortest paths in the graph that passes through the node divided by the total number of shortest paths. The implemented algorithm is described in the paper "A Faster Algorithm for Betweenness Centrality" by Ulrik Brandes of the University of Konstanz.

Matching in bipartite graphs (bipartite matching) is described as a set of edges that are picked in a way to not share an endpoint. Furthermore, maximum matching is such matching of maximum cardinality of the chosen edge set. The algorithm runs in O(V*E) time, where V represents a set of nodes and E represents a set of edges.

This little tool can become extremely handful when calculating assignments between entities. Assigning stuff between two sets of entities is done in a large number of industries, and therefore having a method to find it can make the developing process easier.

As in the real world, the definition of a bridge in graph theory denotes something that divides an entity into multiple components. Thus, more precisely, the bridge in graph theory denotes an edge that, when removed, divides the graph into two separate components.

The notion of community in a graph represents similarly to what it represents in the real world. Different social circles are examples of such communities. Analogously, in graphs, community represents a partition of a graph, ie a set of nodes. M. Girvan and M. E. J. Newman argue that nodes are more strongly connected within a community, i.e. there are more edges, while on the other hand, nodes are sparsely connected between communities themselves.

In graph theory, a cycle represents a path within the graph where only starting and ending nodes are similar. Furthermore, cycles can be double-connected links between neighboring nodes or self-loops.

The simplest concept of a solution for finding a cycle was defined by Robert W. Floyd in his tortoise and hare algorithm, where a hare moves at twice the speed of a tortoise and thus encounters it if there is a cycle. There are many implementations of cycle detection, and among them, the fastest is Professor Donald B. Johnson from the Pennsylvania State University - Finding all the elementary circuits of a directed graph.

Cycles are not only popular in graph structures but also play an important role in number theory and cryptography. Nevertheless, graph theory concepts are used in other disciplines, and cycle detection is placed as an extremely important tool in initial graph analysis.

Of course, the main part of the problem is in assigning a minimum number of labels. There are greedy algorithms that solve the problem, however not optimal. Using dynamic programming, the fastest algorithm guarantees O(2.44 ^ n) complexity. While on the other hand, there are heuristic applications like the one with genetic algorithms.

The notion of similarity can be described in several different ways, but within graph theory, it encompasses several generally accepted definitions. The similarity of graph nodes is based on a comparison of adjacent nodes or the neighborhood structure. These are traditional definitions that take into account only the layout of the graph. If we extend the definition of a node with additional properties, then it is possible to define comparisons based on these properties as well, but this is not the topic of the methods mentioned here.

The result of this type of algorithm is always a pair of nodes and an assigned value indicating the match measure between them. We will mention only the most popular measures based on neighborhood nodes with their brief explanations.

In the domain of centrality measurements, PageRank is arguably the most popular tool. Today, the most popular search engine in the world, Google, owes its popularity solely to this algorithm, developed in the early days by its founders.

If we present nodes as pages and directed edges between them as links, the PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page.

PageRank can be used as a measure of influence that can be used on a variety of applications, not just on website rankings. A popular type of PageRank is Personalized PageRank, which is extremely useful in recommendation systems.

This is yet another important graph analytics algorithm. By using a disjoint-set - a data structure that keeps track of non-overlapping sets, the algorithm enables the user to quickly check whether a pair of given nodes are in the same or different connected components. The benefit of having this structure is that a check on a pair of nodes is effectively executed in O(1) time.

The implementation of the disjoint-set data structure and its operations uses the union by rank and path splitting optimizations described in "Worst-case Analysis of Set Union Algorithms", developed by Robert Tarjan and Jan van Leeuwen.

Dynamic Node2Vec is a random walk-based method that creates embeddings for every new node added to the graph. For every new edge, there is a recalculation of probabilities (weights) that are used in walk sampling. A goal of the method is to enforce that the embedding of node v is similar to the embedding of nodes with the ability to reach node v across edges that appeared one before the other.

Many methods, like node2vec, DeepWalk, focus on computing the embedding for static graphs which has its qualities but also some big drawbacks.Networks in practical applications are dynamic and evolve constantly over time. For example, new links are formed (when people make new friends on social networks) and old links can disappear. Moreover, new nodes can be introduced into the graph (e.g., users can join the social network) and create new links to existing nodes.Naively applying one of the static embedding algorithms leads to unsatisfactory performance due to the following challenges:

In the domain of estimating the importance of graph nodes, PageRank is arguably the most popular tool. Today, the most popular search engine in the world, Google, owes its popularity solely to this algorithm, developed in the early days by its founders.

If we present nodes as pages and directed edges between them as links the PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page.

The need for its dynamic implementation arose at the moment when nodes and edges arrive in a short period of time. A large number of changes would result in either inconsistent information upon arrival or restarting the algorithm over the entire graph each time the graph changes. Since such changes occur frequently, the dynamic implementation allows the previously processed state to be preserved, and new changes are updated in such a way that only the neighborhood of the arriving entity is processed at a constant time. This saves time and allows us to have updated and accurate information about the new values of the PageRank.

There are also some disadvantages of this approach, and that is that such an approach does not guarantee an explicitly correct solution but its approximation. Such a trade-off is common in computer science but allows fast execution and guarantees that at a large scale such an approximation approaches the correct result.

Valuable work explaining how to quickly calculate these values was developed by Bahmani et. al., engineers from Stanford and Twitter. The paper is worth reading at [Fast Incremental and Personalized PageRank]( -readings/bahmani10pagerank.pdf ?).

To address the hidden relations among the nodes in the graph, especially those not connected directly, community detection can provide help. This familiar graph analytics method is being solved in various different ways. However, demand for scale and speed has increased over the years and therefore led to the construction of dynamic community detection algorithms. To leverage the needs for scalability and speed, community detection lends itself well to dynamic operations for two reasons:

Machine learning methods are based on data. Because of everyday encounters with data that are audio, visual, or textual such as images, video, text, and speech - the machine learning methods that study such structures are making tremendous progress today.

Connection-based data can be displayed as graphs. Such structures are much more complex than images and text due to multiple levels of connectivity in the structure itself which is completely irregular and unpredictable. With the development of neural networks organized in the structure of graphs, the field of graph machine learning is improving.

Applying the same principle used, for example, in images (convolutions) to graphs would be a mistake. Such principles are based on grid structures, while on graphs of arbitrary sizes, complex topologies, and random connections applying the same strategy would result in a disaster.

All convolutional network graph methods are based on message propagation. Such messages carry information through a network composed of nodes and edges of the graph, while each node entity carries its computational unit. The task of each node is to process the information and pass it on to the neighbors.

Link prediction is the process of predicting the probability of connecting the two nodes that were not previously connected in a graph. A wide range of different solutions can be applied to such a problem.

Solving methods range from traditional to machine learning-based. Traditional methods are mostly based either on neighborhood similarity. A link between two nodes is more likely to exist if there are many common neighbors. These are:

On the other hand, such a prediction can be learned from the node endpoints by looking at similarity metrics. The more similar the nodes are, the greater the likelihood of connectivity between them. Cosine similarity and Euclidean distance are one example of such.

Then, there are the most advanced models so far and they are based on graph embeddings, which serve as features for further classification tasks. Likewise, it is possible to drive graph neural network (GNN) methods for the same task.