Depth First Search: Graph Searching Algorithm in Python

Graph Search Problem

Many problems in real-life can be easily solved by mapping them to the mathematical structure called graphs. Graph Theory and Computational Theory has led to the solution of interesting problems like Traveling Salesman problem, minimum-flow problem, etc.

In this tutorial, we will be studying and solving a very basic problem of graph searching. Consider the road network, it is practical to think if we can reach from say, Cannaught Place in Delhi to Pari Chowk in Noida and when we are at the problem of connectivity, we also want to consider the shortest path to reach it. The problem we are trying to solve is: if there is a path from place A to place B, and to say it in graph terminology; if node A and node B are connected or in general, what are the nodes which are reachable from node A.

Graph structures are ubiquitous in problems where there is an initial state and final state, and other abstract things like sequence of decisions can be considered as paths to go from initial to final state. It’s a broad and interesting topic and this tutorial will give you a flavor of it. We will code the solution to the problem of finding what are the nodes reachable from source node using Depth First Search Algorithm in Python programming language. We will be considering another algorithm called Breadth First Search in another tutorial.

We will be using Stack data-structure and you might want to review its properties and operations before proceeding further.

Depth First Search

 

 

Consider the graph above, we can see that node 5 is unreachable from all other nodes (directed edge is in opposite direction). And any other node except node 2 is unreachable from node 5. Ok so, how are we going to solve this? You won’t believe if I say that we all have solved this problem. Remember maze? Yes, starting from a source node, walk the graph as long as you are not hitting a wall. If there isn’t any forward path, start moving backwards and explore next immediate unexplored path. Simple enough?

Formally, Depth First Search (DFS) explores the graph aggressively only backtracking when necessary.

This means that starting with source node, say node 3, we will follow the nodes reachable from it recursively. A possible scenario might be, from node 3 we chose node 4, and from node 4, the only reachable node is node 2. Now, we’ve hit the wall and will backtrack. Now we are again at node 4 but there are not other nodes reachable from it (we’ve already visited node 2), so we backtrack once again and are now at node 3. It has one unexplored node 1 and we visit it. From node 1, we can reach node 2 but hey! we have already visited it. So, Backtrack.

This procedure of exploring the nodes aggressively is called depth search procedure. We are going deep into the maze and we backtrack once can’t go further. We also remember the nodes we’ve already visited and never visit them again.

This explains search procedure. Now, how are we going to implement it? Those who are aware with recursion may get a hint that all we are doing is a recursive graph traversal. For this tutorial, we are coding up a simple stack based solution as it is simple to understand. Stack is a Last-in First-Out (LIFO) based data-structure. We are using Python Collections object deque as stack implementation.

Refer the code below. Stack is maintained to backtrack. We append the node we are visiting to stack and once we can’t go further, we pop the node from the stack ( we need to pop the last visited node, that’s why stack). Parent Dictionary is maintained as a record for the nodes we have already visited.

 

import sys
from collections import deque
from collections import defaultdict



#function to read the graph as adjacency list from the file
def read(filename):
	graph = defaultdict(list)
	with open(filename, 'r') as f:
		for line in f.readlines():
			graph[int(line.split('\t')[0])] = map(int, line.split('\t')[1:-1])
	return graph


def dfs(graph, source):
	stack = deque()			#stack declaration
	parent  = defaultdict(int)
	stack.append(source)
	parent[source] = 0
	node = source			#source node
	while 1:
		flag = True
		for n in graph[node]:	#for the nodes reachable from node
			if n not in parent.keys(): #if the node is not already visited
				flag = False	   #if we are visiting a new node, mark flag as False
				parent[n] = node   #mark it as visited
				stack.append(n)	   #append it to stack
				node = n	   #next search procedure starts from this new visited node
				break		   
		if flag == True:		#if flag == True, we haven't visited new node, means we can't go further.
			stack.pop()		#remove the last node for backtracking
			if stack:
				node = stack[-1] #new node to start searching from
			else:
				break
			
		
	return parent
	

if __name__ == "__main__":
	args = sys.argv
	filename = args[1]
	source = int(args[2])
	graph = read(filename)
	parent = dfs(graph, source)
	print parent

INPUT
#python codefile graphfile source node
#graph file contains tab-separated adjacency matrix
1	2	
3	1	4	
4	2	
5	2	
! python dfs.py dfs.txt 3
OUTPUT
defaultdict(<type 'int'>, {1: 3, 2: 1, 3: 0, 4: 3})
#Parent Dictionary
#Note that 5 is not there. 
#Also, when source is 5
! python dfs.py dfs.txt 5
OUTPUT
defaultdict(<type 'int'>, {2: 5, 5: 0})

Conclusion

We code up Depth First Search to graph search problem in this tutorial. Depth First Search is basic algorithm applicable to wide number of problems.

 

Leave a Comment

%d bloggers like this: