Graph Theory in Data structure and Algorithms

Vineet RajVineet Raj
16 min read

Graphs ! an important from DSA for interview preparation point of view as it’s assumed to be most complicated topic that most learner fail to recognize patterns of questions or you can say they are not able to relate other topics like tree,matrices etc to correlate it with graphs but but in this article we will be starting from basic and then will be going beyond basics by that i mean will try to cover all important algorithms and problems on graph theory however for problems will be writing problem title and basic idea or you can say an hint to how can you approach it and then will be attaching a link if want to see complete solution. for now solutions are in only c++

Introduction

So here we are introduction to graph a non-linear data structure, or you can say non linear way to represent data

A graph consists of Nodes and Edges

Further graphs are categorized in multiple types

  • Directed Graph : When edges are directed

  • Weighted graph : When edges have a weight assigned to it

  • Cyclic Graph : when nodes are connected such a way that when you start from one node you can reach that node again after traversing all the nodes

  • Acyclic Graph : Opposite to cyclic one i.e when you can’t reach at initial node

graph_component_and_types

In above image it’s shown types of graph along with components


Intuition

Consider it as a state now think of all the cities as node and roads as edge now

  • If you can move on road from going cityX → cityY and cityY → cityX then it could mean graph is undirected graph because of those undirected edges

  • If you can go from cityX → cityY but if cityY → cityX is not possible then it means it’s undirected graph

  • Now if you have to pay taxes to travel on roads then it means a weighted edge or graph is a weighted graph

  • And if you go like cityX → cityY → cityZ → cityX : it means you can reach back to cityX without touching other cities twice A cyclic graph is here ( in above figure there are cycles in undirected graph but for directed one it’s not forming a single cycle )

  • If no cycles then Acyclic graph

So you can see whatever it name suggests just add that attribute to edge or node and graph becomes that

Degree of graph

$$graphDegree = \sum_{i=start}^{end} nodeDegree$$

So here nodeDegree means number of edges connected to ith node

For Directed Graph it could be said as

  • Indegree : for incoming edge to node

  • outdegree : for outgoing edge to node

Example :

How we take input and store graphs ?

Adjacency Matrix : when we store data in a 2d matrix of size n X n where n is number of nodes

 // For Weighted , undirected graph
{
  int numberOfNodes = 0;
  cin >> numberOfNodes;

  vector<int> adjMatrix[numberOfNodes];

  int numberOfEdges = 0;
  cin >> numberOfEdges;

  for (int i = 0; i < numberOfEdges; i++)
  {
    // input in format startingNodeOfEdge , ending node of edge , weight of edge
    int starting, ending, weight;  // for unweighted graph, edge weight is assumed 1
    cin >> starting >> ending >> weight;

    adjMatrix[starting][ending] = weight;
    adjMatrix[ending][starting] = weight; // if it is a directed graph then we will not do this
  }
}

Analysis

Time Complexity : O(E)

Space Complexity : O(N²) {best,avg,optimal case }

Adjacency Lists : Here we store a vector mapped to every node storing the info about which nodes are connected to which node

{
  // undirected and unweighted graph
  int numberOfNodes = 0;
  cin >> numberOfNodes;

  vector<vector<int>> adjList(numberOfNodes);

  int numberOfEdges = 0;
  cin >> numberOfEdges;

  for (int i = 0; i < numberOfEdges; i++)
  {
    // input in format startingNodeOfEdge , ending node of edge
    int starting, ending; 
    cin >> starting >> ending;

    adjList[starting].push_back(ending);
    adjList[ending].push_back(starting);  // if it is a directed graph then we will not do this
  }
}

Analysis

Time Complexity : O(E)

Space Complexity : O(N²) {worst case : when all the nodes are connected to each other }

Implement adjList for weighted graphs

Hint ????

For a weighted graph instead of mapping nodes with each node we could have mapped it with pair of node and it’s weight


Algorithms related to graphs

Note : Just remember if applying graph you would need an Array to track visited nodes a queue or a stack

Traversal Technique : Breadth First Search ( BFS )

Focus on word breadth !

It means

  • if we arrive at any node then we will go through each and every neighbour node first say neighbours are at level 1

  • then we get back to first neighbour

  • Now we apply step one with assuming the first neighbour at level 1 is our starting node

  • We travers all node at level 2 now when done

  • when we traverse back we go back to the second neighbour of level 1

Implementation


   vector<int> bfsOfGraph(int V, vector<int> adj[]) {
        // to keep track of visited nodes
        vector<bool> visited(V,false);
        vector<int> ans;

        // this loop will take care of disconnetcted components
        for(int i = 0; i < V; i++){
            // if a node is not visited means there is one graph component is there which is not traversed
            if(!visited[i]){
                //to cover that one component
                queue<int> q;
                q.push(i); // assumed that one unvisited node as source node
                ans.push_back(i);
                visited[i] = true;

                // now traversing it's negihbouring nodes
                while(!q.empty()){
                    int currNode = q.front();
                    q.pop();

                    for(auto it:adj[currNode]){
                        if(!visited[it]){
                            q.push(it);
                            ans.push_back(it);
                            visited[it] = true;
                        }
                    }
                }
            }
        }

        return ans;
    }

Focus on word Depth

It means we will do something like inorder traversal of tree, i.e

  • Will take a source node

  • go to it’s first neighbouring node which is not visited

  • Will assume it is our source node and then will go to it’s first not visited neighbouring node

  • and will keep going to depth till we don’t get neighbouring nodes

  • and then when we don’t have any node to visit we trace back

  • and try to cover other non visited nodes

    void dfs(
        int node,
        vector<int> adj[],
        vector<bool> &visited,
        vector<int> &traversal
    ){
        visited[node] = true;

        traversal.push_back(node);

        for(auto it:adj[node]){
            if(!visited[it]){
                // calling the recursionjust after finding , first unvisited node
                dfs(it,adj,visited,traversal);
            }
        }
    }

// got number of node and adj list as input to the function
    vector<int> dfsOfGraph(int V, vector<int> adj[]) {
        vector<bool> visited(V,false);
        vector<int> traversal;
        // traversing visited array to track unvisted graph nodes
        for(int i = 0; i < V;i++){
            if(!visited[i]){
                // for unvisted node 
                // taking it as a source and traversing the graph such that it visits all connected nodes
                dfs(i,adj,visited,traversal);
            }
        }
        return traversal;
    }

Topological Sorting : using DFS and stack

Topological order is in which if node u is connected to node v then u must appear before node v, no matter how earlier but must be earlier then node v

Note : It is only possible for a directed acyclic graph ( DAG ) , as for others it will not be possible to print the order

Here we are just

  • using a stack that will keep track of nodes in which they are traversed

  • Last or terminal node will be inserted into stack first as we are inserting node after DFS is complete

  • Later will just omit the stack and will print their order in which stack popped the set of nodes

    void dfs(int node,vector<int> adj[],stack <int> &st,int vis[]){
        vis[node] = 1;
        for(auto it:adj[node]){
            if(!vis[it])  dfs(it,adj,st,vis);
        }
        // normal DFS with this additional stack call
        st.push(node);
    }
    vector<int> dfsTopo(int V,vector<int> adj[]){
        int vis[V] = {0};
        stack<int> st;

        // traversing like normal DFS but with additional argument of stack
        for(int i = 0; i < V; i++){
            if(vis[i] == 0){
                dfs(i,adj,st,vis);
            }
        }

        // storing ther order when Traversal is completed
        vector<int> order;
        while(!st.empty()){
            order.push_back(st.top());
            st.pop();
        }

        return order;
    }

Topological Sorting : Khan’s Algo. ( using BFS )

Topological order is in which if node u is connected to node v then u must appear before node v, no matter how earlier but must be earlier then node v

Note : It is only possible for a directed acyclic graph ( DAG ) , as for others it will not be possible to print the order

Here additional ideas are

  • Store count of indegree edges as if it i zero means there are not any other node on which current node depends

  • And if a node does not depends on any other node then it means

    • The node is root node

    • Parent nodes are already processed

  • Hence we can start processing and recording those nodes with zero indegree

  • rest is simple BFS

 vector<int> khansAlgo(int V,vector<int> adj[]){
        // to track indegree edges to track if a node is depends on other node
        int indegreeEdges[V] = {0};
        int vis[V] = {0};

        for(int i = 0; i < V;i++){
            for(auto node:adj[i]){
                indegreeEdges[node]++;
            }
        }

        vector<int> ans;

        // normal bfs with addition of indegreeEdges array 
        queue<int> q;
        for(int i = 0; i < V; i++){
            if(indegreeEdges[i] == 0){
                q.push(i);
                vis[i] = 1;
            }
        }

        while(!q.empty()){
            int currNode = q.front();

            ans.push_back(currNode);
            q.pop();

            for(int it:adj[currNode]){
                if(!vis[it]){
                    indegreeEdges[it]--;
                    if(indegreeEdges[it] == 0){
                        q.push(it);
                        vis[it]=1;
                    }
                }
            }
        }

        return ans;
    }

Disjoint Set Union

» Why ?

» Because , sometime we don’t have to print path between two nodes we just have to tell that if a path is possible or not. And BFS or DFS takes significant time for doing this like O(E+V) time complexity for processing each of the pair so if n pair then O(n*(E + V ) ) which is waste here because we just had to say yes or no for weather path exist or not

Here Comes the Disjoint Set Union in picture that takes constant time or O( 4 * alpha ) Time complxity where alpha is almost equal to 1 hence Constant Time

How it Works ?

It stores the parent or ultimate parent of each node and say if ultimate parent of both nodes is different than it says path not possible else it’s possible

And for parent it is decided based on rank of each node or size of each node like

  • if size_u > size_v

    • u will be parent of v
  • if size_u == size_v

    • any one can become anyone’s parent but

    • one that became parent , you must increase it’s rank or size by one.

» keep in mind that we never compare parent we compare parents of parent until node becomes parent of itself or you can say ultimate parent of node

» We can even store ultimate parent of node in our parent array as we don’t need to know direct parent we just want to know root of that cluster or tree or whatever data structure it’s forming, and when we store ultimate parent directly it is called Path Compression

How to use it ?

Basically it’s a class that comes with 5 function or methods

  • DisjointSet(int n) : it initializes required arrays and fills in req. values

  • findUPar(int u) : it returns ultimate parent of u

  • find(int u, int v) : it returns a boolean value for weather u and v are connected by a path or not

  • unionByRank(int u, int v) : it joins clusters whose roots are node u and node v based on there rank

  • unionByRank(int u, int v) : it joins clusters whose roots are node u and node v based on there size

class DisjointSet {
    vector<int> rank,parent,size;

public:
    DisjointSet(int n) {
        rank.resize(n+1,0);
        parent.resize(n+1);       
        size.resize(n + 1, 1);
        for(int i = 0;i < n;i++){
            parent[i] = i;
        }
    }

    int findUPar(int node){
        if(node == parent[node])
            return node;

        return parent[node] = findUPar(parent[node]);
    }

    bool find(int u, int v) {
        return (findUPar(u) == findUPar(v));
    }

    void unionByRank(int u, int v) {
        int ulp_u = findUPar(u);
        int ulp_v = findUPar(v);

        if(ulp_u == ulp_v) return;

        if(rank[ulp_u] < rank[ulp_v]){
            parent[ulp_u] = ulp_v;
        }else if(rank[ulp_u] > rank[ulp_v]){
            parent[ulp_v] = ulp_u;
        }else{
            parent[ulp_v] = ulp_u;
            rank[ulp_u]++;
        }
    }

    void unionBySize(int u, int v) {
        int ulp_u = findUPar(u);
       int ulp_v = findUPar(v);

       if (ulp_u == ulp_v) return;

       if (size[ulp_u] < size[ulp_v]) {
           parent[ulp_u] = ulp_v;

           size[ulp_v] += size[ulp_u];
       }
       else {
           parent[ulp_v] = ulp_u;
           size[ulp_u] += size[ulp_v];

       }
    }
};

Spanning Tree

A spanning tree is a subset of a

  • weighted graph

  • in which there are N nodes(i.e. all the nodes present in the original graph) and

  • N-1 edges and

  • all nodes are reachable from each other.

Example

For above graph below is a possible spanning tree

» Note : If we draw all possible spanning trees and compared the sum of there edge weights then spanning tree with minimum sum of edge weights is called a Minimum Spanning Tree or MST


Minimum Spanning Tree : Prim’s Algo.

It is same as doing BFS on a connected graph as a Spanning tree exist only for connected graphs

Difference with BFS is

  • Instead of queue we use priority queue ( min-heap )

  • It helps us to visit a node only when reachable at minimum cost from certain node

 int primsMST(int V, vector<vector<int>> adj[]) {
        // your min heap that will record all the nodes with wt from parent 
       priority_queue<P,vector<P>,greater<P>> pq;  // {weight, node} 
       pq.push({0,0}); // inserted first node as it's wt from parent will be 0 as no parent initally

       int sum = 0;
       vector<bool> vis(V,false); // visited array to keep track of node taht are vsted once
       while(!pq.empty()){
            int wt = pq.top().first;
            int node = pq.top().second;

            pq.pop(); // popping element of smallest wt from it's parent

            // if node is visted previously means we recorded an edge with 
            // comarativly smaller weight and hence we will not record the current edge and will pop it out
            if(vis[node]) continue; 

            // if it was not recorded previously means it's the shortest edge weight possible
            vis[node] = true;
            sum+=wt; // record th weight

            // push neighbour node with wt according to dist between neighours and current node as it's parent
            for(auto it:adj[node]){
                if(!vis[it[0]]){
                    pq.push({it[1],it[0]}); // if neighbour is unvisted then push it in min-heap
                }
            }
       }

       return sum;
    }

Minimum Spanning Tree : Kruskal Algo.

  • For this we use DisjointSet ( already discussed in article )
int kruskalMST(int V,vector<vector<int>> adj[]){
        vector<pair<int, pair<int, int>>> edges;
        for (int i = 0; i < V; i++) {
            for (auto it : adj[i]) {
                int v = it[0]; 
                int wt = it[1];
                int u = i;
                edges.push_back({wt, {u, v}});
            }
        }

        DisjointSet ds(V);
        sort(edges.begin(), edges.end());
        int sum = 0;

        for (auto it : edges) {
            int wt = it.first; 
            int u = it.second.first; 
            int v = it.second.second; 

            if (ds.findUPar(u) != ds.findUPar(v)) {
                sum += wt;
                ds.unionBySize(u, v);
            }
        }

        return sum;
    }

Problems

Number of provinces : code

Same as BFS or DFS

  • Just with additional counter

  • Increment counter when going to traverse new component

Number of islands : code

Simple traversal but with

  • queue that show cell address like {row,col} means queue of pair<int,int>

  • And it’s adj node would be based on delta values from below array

    int dRow[] = {-1, -1, -1, 0, 0, 1, 1, 1};
    int dCol[] = {-1, 0, 1, -1, 1, -1, 0, 1};

Flood fill algorithm : code

Simple traversal

  • Here we don’t need any visited array

  • just use given image matrix it self

  • if given new color is same as initial color means already satisfied so we will just return

  • Allowed directions

  •         int dRow[] = {-1, -1, -1, 0, 0, 1, 1, 1}; 
            int dCol[] = {-1, 0, 1, -1, 1, -1, 0, 1};
    

Number of enclaves : code

Question asks return number of cells from which we can’t get on boundary for any number of steps in 4 direction

  • Just apply BFS or DFS on possible boundary land cells

  • And it will mark all the cell visited, where we can reach from boundary

  • For cells that we can’t return their number

Rotten Oranges : code

Question says if an orange is rotten then it will rot other fresh oranges in horizontal and vertical direction in one minute so return the total time taken to rot all fresh oranges

  • Must be a graph such that in all of his component atleast one rotten orange must exist

  • BFS would be required with source node as rotten oranges at time t = 0

  • It’s queue will be tracking time at which inserted orange was rotten

  • And for BFS insert all the rotten oranges in the queue before processing BFS

  • As every rotten orange at a moment will simultaneously rot other connected fresh oranges

  • At every insertion of rotten orange you have to increment the time at which his parent was rotten

Distance of nearest cell having one : code

Same as Rotten Orange but instead of time here you have to store currDist

  • Firstly insert all the available ones in queue and that’s done

  • now just apply bfs with that queue and a visted array

Surrounded Regions : code

Question says return a matrix after converting all the ‘O’s as ‘X’ if the set from which that ’O’ belong is surrounded by ‘X’

Intiution

Just see boundary ‘O’s are not surrounded by ‘X’ so just consider those ‘O’s and traverse on matrix assuming those ‘O’s as source and for every other ‘O’ that could be approached will be marked as ‘O’s in set which not surrounded by ‘X’ and for other ‘O’s just convert them to ‘X’

Number of distinct Islands : code

In question it says distinct islands by which it means they must be distinct in their orientation too

example

Below matrix has only 3 distinct islands

Below matrix has only 1 distinct islands

Intiution

Unique → use set

  • For Solving this question you can use BFS and insert the node to temp array

  • insert that temp array into a set

  • but before inserting that temp array make sure you converted it’s top - left element as base {2,3} → {0,0} and now relative address of othr cell of that island would be {2,4},{3,3},{3,4} to {0,1},{1,0} and {1,1}

  • Now sort the temp array and then insert it to set

  • return size of set as your ans

Detect a cycle in an undirected graph : code

In undirected graph it is really easy to detect a cycle

  • as if a node going to visit a nnode [nnonde => neighbour node] but if that nnode says it’s already visited but if that nnode was not the parent of the node which means someone else visited the node so it is a cycle

Detect a cycle in an directed graph : code

Now here the confusion comes like if a node is visited and it’s not a parent to current node that does not mean it’s a cycle as here it’s directed from node to nnode right ?

So here we can apply DFS ( as it goes in straight line ( in depth ) )

  • We will be also keeping a pathtraversed array that will keep track if a visited node is from same path on which i am running

  • That is when i am going in depth i have to mark pathTraversed as 1 and when i am backtracking i have to keep converting pathTraversed back to 0

  • and hence if i reach a node which is visited also for which pathTraversed is 1

  • Means i did encountered that node in same path and again i am here so it is a cycle

  • therefore return true


I am Vineet Raj on a journey to complete DSA in depth and I am writing such article for every topic so if you want to revise complete topic from single article you would love my articles !

1
Subscribe to my newsletter

Read articles from Vineet Raj directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Vineet Raj
Vineet Raj