Graph Theory in Data structure and Algorithms

Table of contents
- Introduction
- Algorithms related to graphs
- Problems
- Number of provinces : code
- Number of islands : code
- Flood fill algorithm : code
- Number of enclaves : code
- Rotten Oranges : code
- Distance of nearest cell having one : code
- Surrounded Regions : code
- Number of distinct Islands : code
- Detect a cycle in an undirected graph : code
- Detect a cycle in an directed graph : code

Graphs ! an important from DSA for interview preparation point of view as it’s assumed to be most complicated topic that most learner fail to recognize patterns of questions or you can say they are not able to relate other topics like tree,matrices etc to correlate it with graphs but but in this article we will be starting from basic and then will be going beyond basics by that i mean will try to cover all important algorithms and problems on graph theory however for problems will be writing problem title and basic idea or you can say an hint to how can you approach it and then will be attaching a link if want to see complete solution. for now solutions are in only c++
Introduction
So here we are introduction to graph a non-linear data structure, or you can say non linear way to represent data
A graph consists of Nodes and Edges
Further graphs are categorized in multiple types
Directed Graph : When edges are directed
Weighted graph : When edges have a weight assigned to it
Cyclic Graph : when nodes are connected such a way that when you start from one node you can reach that node again after traversing all the nodes
Acyclic Graph : Opposite to cyclic one i.e when you can’t reach at initial node
In above image it’s shown types of graph along with components
Intuition
Consider it as a state now think of all the cities as node and roads as edge now
If you can move on road from going cityX → cityY and cityY → cityX then it could mean graph is undirected graph because of those undirected edges
If you can go from cityX → cityY but if cityY → cityX is not possible then it means it’s undirected graph
Now if you have to pay taxes to travel on roads then it means a weighted edge or graph is a weighted graph
And if you go like cityX → cityY → cityZ → cityX : it means you can reach back to cityX without touching other cities twice A cyclic graph is here ( in above figure there are cycles in undirected graph but for directed one it’s not forming a single cycle )
If no cycles then Acyclic graph
So you can see whatever it name suggests just add that attribute to edge or node and graph becomes that
Degree of graph
$$graphDegree = \sum_{i=start}^{end} nodeDegree$$
So here nodeDegree
means number of edges connected to ith
node
For Directed Graph it could be said as
Indegree : for incoming edge to node
outdegree : for outgoing edge to node
Example :
How we take input and store graphs ?
Adjacency Matrix : when we store data in a 2d matrix of size n X n
where n is number of nodes
// For Weighted , undirected graph
{
int numberOfNodes = 0;
cin >> numberOfNodes;
vector<int> adjMatrix[numberOfNodes];
int numberOfEdges = 0;
cin >> numberOfEdges;
for (int i = 0; i < numberOfEdges; i++)
{
// input in format startingNodeOfEdge , ending node of edge , weight of edge
int starting, ending, weight; // for unweighted graph, edge weight is assumed 1
cin >> starting >> ending >> weight;
adjMatrix[starting][ending] = weight;
adjMatrix[ending][starting] = weight; // if it is a directed graph then we will not do this
}
}
Analysis
Time Complexity : O(E)
Space Complexity : O(N²) {best,avg,optimal case }
Adjacency Lists : Here we store a vector mapped to every node storing the info about which nodes are connected to which node
{
// undirected and unweighted graph
int numberOfNodes = 0;
cin >> numberOfNodes;
vector<vector<int>> adjList(numberOfNodes);
int numberOfEdges = 0;
cin >> numberOfEdges;
for (int i = 0; i < numberOfEdges; i++)
{
// input in format startingNodeOfEdge , ending node of edge
int starting, ending;
cin >> starting >> ending;
adjList[starting].push_back(ending);
adjList[ending].push_back(starting); // if it is a directed graph then we will not do this
}
}
Analysis
Time Complexity : O(E)
Space Complexity : O(N²) {worst case : when all the nodes are connected to each other }
Implement adjList for weighted graphs
Hint ????
For a weighted graph instead of mapping nodes with each node we could have mapped it with pair of node and it’s weight
Algorithms related to graphs
Note : Just remember if applying graph you would need an
Array to track visited nodes
aqueue
or astack
Traversal Technique : Breadth First Search ( BFS )
Focus on word breadth !
It means
if we arrive at any node then we will go through each and every neighbour node first say neighbours are at level 1
then we get back to first neighbour
Now we apply step one with assuming the first neighbour at level 1 is our starting node
We travers all node at level 2 now when done
when we traverse back we go back to the second neighbour of level 1
Implementation
vector<int> bfsOfGraph(int V, vector<int> adj[]) {
// to keep track of visited nodes
vector<bool> visited(V,false);
vector<int> ans;
// this loop will take care of disconnetcted components
for(int i = 0; i < V; i++){
// if a node is not visited means there is one graph component is there which is not traversed
if(!visited[i]){
//to cover that one component
queue<int> q;
q.push(i); // assumed that one unvisited node as source node
ans.push_back(i);
visited[i] = true;
// now traversing it's negihbouring nodes
while(!q.empty()){
int currNode = q.front();
q.pop();
for(auto it:adj[currNode]){
if(!visited[it]){
q.push(it);
ans.push_back(it);
visited[it] = true;
}
}
}
}
}
return ans;
}
Traversal Technique : Depth First Search
Focus on word Depth
It means we will do something like inorder traversal of tree, i.e
Will take a source node
go to it’s first neighbouring node which is not visited
Will assume it is our source node and then will go to it’s first not visited neighbouring node
and will keep going to depth till we don’t get neighbouring nodes
and then when we don’t have any node to visit we trace back
and try to cover other non visited nodes
void dfs(
int node,
vector<int> adj[],
vector<bool> &visited,
vector<int> &traversal
){
visited[node] = true;
traversal.push_back(node);
for(auto it:adj[node]){
if(!visited[it]){
// calling the recursionjust after finding , first unvisited node
dfs(it,adj,visited,traversal);
}
}
}
// got number of node and adj list as input to the function
vector<int> dfsOfGraph(int V, vector<int> adj[]) {
vector<bool> visited(V,false);
vector<int> traversal;
// traversing visited array to track unvisted graph nodes
for(int i = 0; i < V;i++){
if(!visited[i]){
// for unvisted node
// taking it as a source and traversing the graph such that it visits all connected nodes
dfs(i,adj,visited,traversal);
}
}
return traversal;
}
Topological Sorting : using DFS and stack
Topological order is in which if node u is connected to node v then u must appear before node v, no matter how earlier but must be earlier then node v
Note : It is only possible for a directed acyclic graph ( DAG ) , as for others it will not be possible to print the order
Here we are just
using a stack that will keep track of nodes in which they are traversed
Last or terminal node will be inserted into stack first as we are inserting node after DFS is complete
Later will just omit the stack and will print their order in which stack popped the set of nodes
void dfs(int node,vector<int> adj[],stack <int> &st,int vis[]){
vis[node] = 1;
for(auto it:adj[node]){
if(!vis[it]) dfs(it,adj,st,vis);
}
// normal DFS with this additional stack call
st.push(node);
}
vector<int> dfsTopo(int V,vector<int> adj[]){
int vis[V] = {0};
stack<int> st;
// traversing like normal DFS but with additional argument of stack
for(int i = 0; i < V; i++){
if(vis[i] == 0){
dfs(i,adj,st,vis);
}
}
// storing ther order when Traversal is completed
vector<int> order;
while(!st.empty()){
order.push_back(st.top());
st.pop();
}
return order;
}
Topological Sorting : Khan’s Algo. ( using BFS )
Topological order is in which if node u is connected to node v then u must appear before node v, no matter how earlier but must be earlier then node v
Note : It is only possible for a directed acyclic graph ( DAG ) , as for others it will not be possible to print the order
Here additional ideas are
Store count of indegree edges as if it i zero means there are not any other node on which current node depends
And if a node does not depends on any other node then it means
The node is root node
Parent nodes are already processed
Hence we can start processing and recording those nodes with zero indegree
rest is simple BFS
vector<int> khansAlgo(int V,vector<int> adj[]){
// to track indegree edges to track if a node is depends on other node
int indegreeEdges[V] = {0};
int vis[V] = {0};
for(int i = 0; i < V;i++){
for(auto node:adj[i]){
indegreeEdges[node]++;
}
}
vector<int> ans;
// normal bfs with addition of indegreeEdges array
queue<int> q;
for(int i = 0; i < V; i++){
if(indegreeEdges[i] == 0){
q.push(i);
vis[i] = 1;
}
}
while(!q.empty()){
int currNode = q.front();
ans.push_back(currNode);
q.pop();
for(int it:adj[currNode]){
if(!vis[it]){
indegreeEdges[it]--;
if(indegreeEdges[it] == 0){
q.push(it);
vis[it]=1;
}
}
}
}
return ans;
}
Disjoint Set Union
» Why ?
» Because , sometime we don’t have to print path between two nodes we just have to tell that if a path is possible or not. And BFS or DFS takes significant time for doing this like O(E+V) time complexity for processing each of the pair so if n pair then
O(n*(E + V ) )
which is waste here because we just had to say yes or no for weather path exist or not
Here Comes the Disjoint Set Union in picture that takes constant time or O( 4 * alpha )
Time complxity where alpha is almost equal to 1 hence Constant Time
How it Works ?
It stores the parent or ultimate parent of each node and say if ultimate parent of both nodes is different than it says path not possible else it’s possible
And for parent it is decided based on rank of each node or size of each node like
if size_u > size_v
- u will be parent of v
if size_u == size_v
any one can become anyone’s parent but
one that became parent , you must increase it’s rank or size by one.
» keep in mind that we never compare parent we compare parents of parent until node becomes parent of itself or you can say ultimate parent of node
» We can even store ultimate parent of node in our parent array as we don’t need to know direct parent we just want to know root of that cluster or tree or whatever data structure it’s forming, and when we store ultimate parent directly it is called Path Compression
How to use it ?
Basically it’s a class that comes with 5 function or methods
DisjointSet(int n) : it initializes required arrays and fills in req. values
findUPar(int u) : it returns ultimate parent of u
find(int u, int v) : it returns a boolean value for weather u and v are connected by a path or not
unionByRank(int u, int v) : it joins clusters whose roots are node u and node v based on there rank
unionByRank(int u, int v) : it joins clusters whose roots are node u and node v based on there size
class DisjointSet {
vector<int> rank,parent,size;
public:
DisjointSet(int n) {
rank.resize(n+1,0);
parent.resize(n+1);
size.resize(n + 1, 1);
for(int i = 0;i < n;i++){
parent[i] = i;
}
}
int findUPar(int node){
if(node == parent[node])
return node;
return parent[node] = findUPar(parent[node]);
}
bool find(int u, int v) {
return (findUPar(u) == findUPar(v));
}
void unionByRank(int u, int v) {
int ulp_u = findUPar(u);
int ulp_v = findUPar(v);
if(ulp_u == ulp_v) return;
if(rank[ulp_u] < rank[ulp_v]){
parent[ulp_u] = ulp_v;
}else if(rank[ulp_u] > rank[ulp_v]){
parent[ulp_v] = ulp_u;
}else{
parent[ulp_v] = ulp_u;
rank[ulp_u]++;
}
}
void unionBySize(int u, int v) {
int ulp_u = findUPar(u);
int ulp_v = findUPar(v);
if (ulp_u == ulp_v) return;
if (size[ulp_u] < size[ulp_v]) {
parent[ulp_u] = ulp_v;
size[ulp_v] += size[ulp_u];
}
else {
parent[ulp_v] = ulp_u;
size[ulp_u] += size[ulp_v];
}
}
};
Spanning Tree
A spanning tree is a subset of a
weighted graph
in which there are N nodes(i.e. all the nodes present in the original graph) and
N-1 edges and
all nodes are reachable from each other.
Example
For above graph below is a possible spanning tree
» Note : If we draw all possible spanning trees and compared the sum of there edge weights then spanning tree with minimum sum of edge weights is called a Minimum Spanning Tree or MST
Minimum Spanning Tree : Prim’s Algo.
It is same as doing BFS on a connected graph as a Spanning tree exist only for connected graphs
Difference with BFS is
Instead of queue we use priority queue ( min-heap )
It helps us to visit a node only when reachable at minimum cost from certain node
int primsMST(int V, vector<vector<int>> adj[]) {
// your min heap that will record all the nodes with wt from parent
priority_queue<P,vector<P>,greater<P>> pq; // {weight, node}
pq.push({0,0}); // inserted first node as it's wt from parent will be 0 as no parent initally
int sum = 0;
vector<bool> vis(V,false); // visited array to keep track of node taht are vsted once
while(!pq.empty()){
int wt = pq.top().first;
int node = pq.top().second;
pq.pop(); // popping element of smallest wt from it's parent
// if node is visted previously means we recorded an edge with
// comarativly smaller weight and hence we will not record the current edge and will pop it out
if(vis[node]) continue;
// if it was not recorded previously means it's the shortest edge weight possible
vis[node] = true;
sum+=wt; // record th weight
// push neighbour node with wt according to dist between neighours and current node as it's parent
for(auto it:adj[node]){
if(!vis[it[0]]){
pq.push({it[1],it[0]}); // if neighbour is unvisted then push it in min-heap
}
}
}
return sum;
}
Minimum Spanning Tree : Kruskal Algo.
- For this we use DisjointSet ( already discussed in article )
int kruskalMST(int V,vector<vector<int>> adj[]){
vector<pair<int, pair<int, int>>> edges;
for (int i = 0; i < V; i++) {
for (auto it : adj[i]) {
int v = it[0];
int wt = it[1];
int u = i;
edges.push_back({wt, {u, v}});
}
}
DisjointSet ds(V);
sort(edges.begin(), edges.end());
int sum = 0;
for (auto it : edges) {
int wt = it.first;
int u = it.second.first;
int v = it.second.second;
if (ds.findUPar(u) != ds.findUPar(v)) {
sum += wt;
ds.unionBySize(u, v);
}
}
return sum;
}
Problems
Number of provinces : code
Same as BFS or DFS
Just with additional counter
Increment counter when going to traverse new component
Number of islands : code
Simple traversal but with
queue that show cell address like
{row,col}
means queue of pair<int,int>And it’s adj node would be based on delta values from below array
int dRow[] = {-1, -1, -1, 0, 0, 1, 1, 1};
int dCol[] = {-1, 0, 1, -1, 1, -1, 0, 1};
Flood fill algorithm : code
Simple traversal
Here we don’t need any visited array
just use given image matrix it self
if given new color is same as initial color means already satisfied so we will just return
Allowed directions
int dRow[] = {-1, -1, -1, 0, 0, 1, 1, 1}; int dCol[] = {-1, 0, 1, -1, 1, -1, 0, 1};
Number of enclaves : code
Question asks return number of cells from which we can’t get on boundary for any number of steps in 4 direction
Just apply BFS or DFS on possible boundary land cells
And it will mark all the cell visited, where we can reach from boundary
For cells that we can’t return their number
Rotten Oranges : code
Question says if an orange is rotten then it will rot other fresh oranges in horizontal and vertical direction in one minute so return the total time taken to rot all fresh oranges
Must be a graph such that in all of his component atleast one rotten orange must exist
BFS would be required with source node as rotten oranges at time t = 0
It’s queue will be tracking time at which inserted orange was rotten
And for BFS insert all the rotten oranges in the queue before processing BFS
As every rotten orange at a moment will simultaneously rot other connected fresh oranges
At every insertion of rotten orange you have to increment the time at which his parent was rotten
Distance of nearest cell having one : code
Same as Rotten Orange but instead of time here you have to store currDist
Firstly insert all the available ones in queue and that’s done
now just apply bfs with that queue and a visted array
Surrounded Regions : code
Question says return a matrix after converting all the ‘O’s as ‘X’ if the set from which that ’O’ belong is surrounded by ‘X’
Intiution
Just see boundary ‘O’s are not surrounded by ‘X’ so just consider those ‘O’s and traverse on matrix assuming those ‘O’s as source and for every other ‘O’ that could be approached will be marked as ‘O’s in set which not surrounded by ‘X’ and for other ‘O’s just convert them to ‘X’
Number of distinct Islands : code
In question it says distinct islands by which it means they must be distinct in their orientation too
example
Below matrix has only 3 distinct islands
Below matrix has only 1 distinct islands
Intiution
Unique → use set
For Solving this question you can use BFS and insert the node to temp array
insert that temp array into a set
but before inserting that temp array make sure you converted it’s top - left element as base {2,3} → {0,0} and now relative address of othr cell of that island would be {2,4},{3,3},{3,4} to {0,1},{1,0} and {1,1}
Now sort the temp array and then insert it to set
return size of set as your ans
Detect a cycle in an undirected graph : code
In undirected graph it is really easy to detect a cycle
- as if a node going to visit a nnode [nnonde => neighbour node] but if that nnode says it’s already visited but if that nnode was not the parent of the node which means someone else visited the node so it is a cycle
Detect a cycle in an directed graph : code
Now here the confusion comes like if a node is visited and it’s not a parent to current node that does not mean it’s a cycle as here it’s directed from node to nnode right ?
So here we can apply DFS ( as it goes in straight line ( in depth ) )
We will be also keeping a pathtraversed array that will keep track if a visited node is from same path on which i am running
That is when i am going in depth i have to mark pathTraversed as 1 and when i am backtracking i have to keep converting pathTraversed back to 0
and hence if i reach a node which is visited also for which pathTraversed is 1
Means i did encountered that node in same path and again i am here so it is a cycle
therefore return true
I am Vineet Raj on a journey to complete DSA in depth and I am writing such article for every topic so if you want to revise complete topic from single article you would love my articles !
Subscribe to my newsletter
Read articles from Vineet Raj directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by