MergeSort: Divide and Conquer Routine for Sorting in Python

In Algorithmic thinking, Divide and Conquer Paradigm is widely applicable to problem solving. As the name says, Divide and Conquer approach works by dividing a large problem recursively into small tractable sub-problems which can be solved with ease and then the solutions are then combined to the actual problem at hand. Enough verbosity, in this post we will explain this approach by working on Sorting Problem using the popular MergeSort Algorithm. We will also get our hands dirty by coding this up in our favorite programming language, Python.

The Sorting Problem:

In algorithmic thinking, we recommend to think in terms of input and output. For the sorting problem, input is array of unsorted numbers, [45, 23, 109,  41,  21, 12] for example and the output is the same array in sorted order [12, 21, 23, 41, 45, 109], order being the numbers in increasing order.

Though there are many algorithms to solve this efficiently (time and space complexity), We will solve this problem using MergeSort Algorithm.

MergeSort and Divide and Conquer:

The three main routines of MergeSort Algorithm are:

  1. Divide the input array into two sub-arrays [Divide]
  2. Recursively sort the left sub-array and the right sub-array [Conquer]
  3. Merge two sub-arrays into one array [Solution]

Refer to the following figure to see the approach in action:

Python Code for MergeSort:

The mergesort function in the code below takes array as input. If the array size is 1 (base case), it returns the array. Else it recursively calls mergesort() function on the left sub-array and right sub-array.  The array is sub-divided till its size reduces to 1 ( the base case ) and then only the merge() sub-routine kicks in. The merge() sub-routine merges the two sub-arrays into a single array.

#Python Code for MergeSort Algorithm
import sys

#read the file to return array
def read(filename=None):
	if filename:
		lines = []
		with open(filename, 'r') as f:
			lines = [int(line.rstrip()) for line in f.readlines()]
		return lines
		
	else:
		arr = []
		size = input("Enter size of the array :")
		while len(arr) < int(size):
			arr.append(int(input()))
		return arr

#merge subroutine to merge the sub-solutions
def merge(a1,a2):
	size = len(a1) + len(a2)
	a = []
	i = 0
	j = 0
	while len(a) < size:

		if a1[i] < a2[j]:
			a.append(a1[i])
			i+=1
		if i == len(a1):
			a.extend(a2[j:])
			break
		if a1[i] > a2[j]:
			a.append(a2[j])
			j+=1
		if j == len(a2):
			a.extend(a1[i:])
			break		
	return a

#MergeSort Routine, recursively get called on left and right parts of array
def mergesort(arr):
	size = len(arr)
	if size == 1:
		return arr
	m1 = mergesort(arr[:size/2])   #mergesort call on left sub-array
	m2 = mergesort(arr[size/2:])   #mergesort call on right sub-array
	m = merge(m1,m2)               #merge call on the two sorted sub-arrays
	return m

#the main function
if __name__ == "__main__":
        args = sys.argv
	filename = args[1]
	arr = read(filename)
	print len(arr)
	sorted_array = mergesort(arr)
	print sorted_array

#Input
The file contents: (Copy this into a file and run the python script with the filename as argument. Eg. 
python mergesort.py filename)
34
21
45
1
90
19
101
100
35
12
#Output
Sorted Array: (The console prints the sorted array)
[1, 12, 19, 21, 34, 35, 45, 90, 100, 101]

Conclusion:

Divide and Conquer Approach is powerful method for problem solving. It works when the sub-problems have the same structure. In MergeSort, the sub-problems are the same type as the original problem, sorting problem. There are other techniques like Greedy Algorithms, Dynamic Programming for solving problems.

Get this running today and see output for yourself.

Tags: Merge Sort in Python, Sorting algorithm in python, ds programs in python, python data structure programs, merge sort program, latest interview ds questions,

Leave a Comment

%d bloggers like this: