Graph Search Problem
Many problems in real-life can be easily solved by mapping them to the mathematical structure called graphs. Graph Theory and Computational Theory has led to the solution of interesting problems like Traveling Salesman problem, minimum-flow problem, etc.
In this tutorial, we will be studying and solving a very basic problem of graph searching. Consider the road network, it is practical to think if we can reach from say, Cannaught Place in Delhi to Pari Chowk in Noida and when we are at the problem of connectivity, we also want to consider the shortest path to reach it. The problem we are trying to solve is: if there is a path from place A to place B, and to say it in graph terminology; if node A and node B are connected or in general, what are the nodes which are reachable from node A.
Graph structures are ubiquitous in problems where there is an initial state and final state, and other abstract things like sequence of decisions can be considered as paths to go from initial to final state. It’s a broad and interesting topic and this tutorial will give you a flavor of it. We will code the solution to the problem of finding what are the nodes reachable from source node using Depth First Search Algorithm in Python programming language. We will be considering another algorithm called Breadth First Search in another tutorial.
We will be using Stack data-structure and you might want to review its properties and operations before proceeding further.
Depth First Search
Consider the graph above, we can see that node 5 is unreachable from all other nodes (directed edge is in opposite direction). And any other node except node 2 is unreachable from node 5. Ok so, how are we going to solve this? You won’t believe if I say that we all have solved this problem. Remember maze? Yes, starting from a source node, walk the graph as long as you are not hitting a wall. If there isn’t any forward path, start moving backwards and explore next immediate unexplored path. Simple enough?
Formally, Depth First Search (DFS) explores the graph aggressively only backtracking when necessary.
This means that starting with source node, say node 3, we will follow the nodes reachable from it recursively. A possible scenario might be, from node 3 we chose node 4, and from node 4, the only reachable node is node 2. Now, we’ve hit the wall and will backtrack. Now we are again at node 4 but there are not other nodes reachable from it (we’ve already visited node 2), so we backtrack once again and are now at node 3. It has one unexplored node 1 and we visit it. From node 1, we can reach node 2 but hey! we have already visited it. So, Backtrack.
This procedure of exploring the nodes aggressively is called depth search procedure. We are going deep into the maze and we backtrack once can’t go further. We also remember the nodes we’ve already visited and never visit them again.
This explains search procedure. Now, how are we going to implement it? Those who are aware with recursion may get a hint that all we are doing is a recursive graph traversal. For this tutorial, we are coding up a simple stack based solution as it is simple to understand. Stack is a Last-in First-Out (LIFO) based data-structure. We are using Python Collections object deque as stack implementation.
Refer the code below. Stack is maintained to backtrack. We append the node we are visiting to stack and once we can’t go further, we pop the node from the stack ( we need to pop the last visited node, that’s why stack). Parent Dictionary is maintained as a record for the nodes we have already visited.
import sys from collections import deque from collections import defaultdict #function to read the graph as adjacency list from the file def read(filename): graph = defaultdict(list) with open(filename, 'r') as f: for line in f.readlines(): graph[int(line.split('\t')[0])] = map(int, line.split('\t')[1:-1]) return graph def dfs(graph, source): stack = deque() #stack declaration parent = defaultdict(int) stack.append(source) parent[source] = 0 node = source #source node while 1: flag = True for n in graph[node]: #for the nodes reachable from node if n not in parent.keys(): #if the node is not already visited flag = False #if we are visiting a new node, mark flag as False parent[n] = node #mark it as visited stack.append(n) #append it to stack node = n #next search procedure starts from this new visited node break if flag == True: #if flag == True, we haven't visited new node, means we can't go further. stack.pop() #remove the last node for backtracking if stack: node = stack[-1] #new node to start searching from else: break return parent if __name__ == "__main__": args = sys.argv filename = args[1] source = int(args[2]) graph = read(filename) parent = dfs(graph, source) print parent
INPUT #python codefile graphfile source node #graph file contains tab-separated adjacency matrix 1 2 3 1 4 4 2 5 2 ! python dfs.py dfs.txt 3 OUTPUT defaultdict(<type 'int'>, {1: 3, 2: 1, 3: 0, 4: 3}) #Parent Dictionary #Note that 5 is not there. #Also, when source is 5 ! python dfs.py dfs.txt 5 OUTPUT defaultdict(<type 'int'>, {2: 5, 5: 0})
Conclusion
We code up Depth First Search to graph search problem in this tutorial. Depth First Search is basic algorithm applicable to wide number of problems.