
Heapsort: Algorithm and Complexity

Heap sort is one of the most important and interesting algorithms, and its implementation is the basis of many other sophisticated algorithms. We will focus on the basics of heap sorting here. In fact, if you understand what a heap data structure is, heap sort is just an application of some simple, well-known operations on the heap.

Let’s take another look at our heap

[Figure: example max heap]

This is a max heap, so let’s use it to create an array in reverse sorted order (a min heap would give us lowest-to-highest order, and vice versa). Before moving forward, revisit the heap data structure and understand the Insert and Delete operations; after that, the heap sort algorithm is a cakewalk.

We know that a max heap has the property that the element at the top is the highest. Using this property, we can create an algorithm like this:

  1. Create a max heap from the random array
  2. Remove the top element from the heap and add it to the reverse sorted array
  3. Heapify (restore the heap property) the remaining heap elements
  4. Repeat from Step 2 till the heap is empty

Step 1 is nothing but inserting all elements of the array into the heap. One Insert Key operation takes Log N steps, so N elements take N Log N.

Step 2 is order 1, as we are just removing the top element and adding it to our array.

Step 3 is nothing but the Delete Node operation on the heap: take the last leaf node and move it to the root, then compare it with its child nodes and swap it with the larger one. Repeat the process till the heap property is restored, i.e. no child is greater than its parent. This is Log N operations for one heapify.

Step 4 repeats Steps 2 and 3, N times. Step 2 is constant order and hence can be ignored. This means we have N executions of Step 3, which is of order N Log N.

Combining Steps 1 and 4, we can see that heap sort’s complexity is of order N Log N.

Related: http://kamalmeet.com/2013/05/heap-data-structure/

http://kamalmeet.com/2011/07/insertion-sorting/

Heap Data Structure

Understanding the Heap data structure is very important, as it underlies other data structures like the priority queue and algorithms like Dijkstra’s.

Let’s understand the Heap data structure first. A heap is a simple tree-based data structure with a unique property: every parent node is always larger (or smaller) than its children.

[Figure: example max heap]

In the heap above we can see that a parent element is always higher than its child nodes; hence it is called a max heap. Similarly, if every parent is smaller than its children, it is a min heap. The heap shown above is also a binary heap, as each parent node has at most two child nodes.

Let’s talk about the efficiency of the heap data structure in terms of inserting and deleting nodes.

Insert a Key: To insert a new node, the new element is added at the next available leaf position. It is then compared with its parent node and moved up if it is larger (in a max heap).

1. Insert node at next leaf

2. While element is greater than parent

Swap child and parent


An insert operation in a heap is of order log N, where N is the number of nodes.
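As a rough Python sketch, assuming the usual array layout where the parent of index i sits at (i - 1) // 2, the insert steps above look like this (the function name is mine):

```python
def heap_insert(heap, key):
    """Insert key into a max heap stored as a list (root at index 0)."""
    heap.append(key)                       # 1. insert at next leaf
    i = len(heap) - 1
    # 2. while the element is greater than its parent, swap child and parent
    while i > 0 and heap[i] > heap[(i - 1) // 2]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent

# Building a max heap by repeated insertion:
h = []
for x in [5, 3, 7, 2, 6]:
    heap_insert(h, x)
print(h)                                   # → [7, 6, 5, 2, 3]
```

Each swap moves the new key one level up the tree, which is why the cost is bounded by the tree height, log N.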

Delete Node: Deletion always occurs at the root node. The last leaf node is moved to the root and then moved down till it is greater than both its children.

1. Node = root

2. While node < any child

Swap node and largest child

A delete operation in a heap is of order log N, where N is the number of nodes.
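The delete steps can be sketched the same way, again assuming the list-based max heap layout (the function name is my own):

```python
def heap_delete_max(heap):
    """Remove and return the root of a max heap stored as a list."""
    top = heap[0]
    last = heap.pop()                      # take the last leaf node
    if heap:
        heap[0] = last                     # move it to the root
        i = 0
        # while the node is smaller than a child, swap with the largest child
        while True:
            child = 2 * i + 1
            if child >= len(heap):
                break
            if child + 1 < len(heap) and heap[child + 1] > heap[child]:
                child += 1
            if heap[i] >= heap[child]:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return top

h = [7, 6, 5, 2, 3]
print(heap_delete_max(h))                  # → 7
print(h)                                   # → [6, 3, 5, 2]
```

Again, the moved node descends at most one level per swap, giving the log N bound.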

Banker’s Algorithm for Deadlock Avoidance

What is deadlock?

From Wikipedia: A deadlock is a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the “chicken or the egg”. The concept of a Catch-22 is similar.

So what is a deadlock in terms of operating system job scheduling? It is the same: multiple jobs try to gain control of resources and end up in a situation where each process is waiting for some resource which is being held by another process.

How do we avoid such situations? The Banker’s algorithm is the answer. As the name suggests, the algorithm mimics the intelligence shown by a banker in maintaining a safe amount of cash in the bank, so that the bank never runs into a situation where it has no money to fulfill its customers’ needs.

A very simple example: say the bank has 60K. There are 3 customers, A, B and C. A needs 30K for its project, B needs 25K, and C needs 40K. We know the bank can fulfill the needs of all three customers, but not at once. So ideally the bank would fulfill one customer’s need at a time, wait for the customer to return the money (we will ignore interest for simplicity, and assume money is returned as soon as the project is finished, though in the real world the customer might need some time), and then move on to the next. But that is not an effective use of money, as customers must wait while others finish.

In the practical world, not all the money is needed for a project in one go. So the bank can ask customers how much they will need at each stage and when they can return the money, and try to organize its finances so that all customers finish their projects in the best possible time. The same kind of logic is used by the job scheduler of an operating system to make sure all jobs get finished in the best possible time while avoiding deadlock.

Let’s say the customers provide this data to the bank:

Customer A: Need 10K at start

Customer B: Need 10K at start

Customer C: Need 10K at start

The bank sees it can afford to give this money to all the customers, as it will still have enough left to fulfill all of their remaining needs.

In stage 2, all customers again come up with a demand of 10K each. Now the bank sees that if it allocates all the money, it will run out of funds with none of the projects completed, hence no possibility of recovery. So it must refuse some customers. The safest bet is to refuse C and ask it to wait till the bank has enough funds, and to allocate the money to A and B, as the bank will still have enough to see both of those projects through to completion.

So C is put on hold while A and B move on with their projects. Now say A comes up with a demand of 10K, which the bank fulfills; A finishes its project successfully and returns the 30K. Now the bank can provide money to C, as it knows it will have sufficient funds for the other customers.

The same intelligence is used by an OS to make sure the system never runs out of resources.
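The banker’s decision above boils down to a safety check: grant a request only if, afterwards, the customers can still finish one by one in some order. Here is a minimal Python sketch of that check for a single resource type (money), with my own function and parameter names; a full Banker’s algorithm generalizes this to multiple resource types:

```python
def is_safe(available, allocated, max_need):
    """Return True if every customer can still finish in some order.

    available: cash the bank holds right now
    allocated: cash already lent to each customer
    max_need:  total cash each customer's project requires
    """
    need = [m - a for m, a in zip(max_need, allocated)]
    finished = [False] * len(allocated)
    cash = available
    progress = True
    while progress:
        progress = False
        for i, done in enumerate(finished):
            if not done and need[i] <= cash:
                cash += allocated[i]       # customer i finishes and repays
                finished[i] = True
                progress = True
    return all(finished)

# Stage 1 from the example: 60K bank, 10K lent to each of A, B, C
print(is_safe(30, [10, 10, 10], [30, 25, 40]))   # → True
# Granting all three second requests would leave 0K: unsafe
print(is_safe(0, [20, 20, 20], [30, 25, 40]))    # → False
# Granting only A and B keeps the state safe
print(is_safe(10, [20, 20, 10], [30, 25, 40]))   # → True
```

The unsafe state is exactly the situation described above: all the money is out, no project can finish, and nothing will ever be repaid.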

Insertion Sorting

Insertion sorting, as the name suggests, is about inserting the elements of an unsorted array into their correct positions one by one. To make it clearer, let’s say we have an unsorted array. We take another array, which is initially empty, and keep adding numbers one by one into it at their correct positions.

Say the original array A is

5, 3, 7, 2, 6

Now we take the first element and add it at the zeroth position of a new array B, so B now contains 5.

Next we pick 3 from the original array and add it at the first position of B, which becomes 3, 5.

After step 3, B is 3, 5, 7; then 2, 3, 5, 7; and finally 2, 3, 5, 6, 7.

We took two arrays just to make things clear; the sorting can be achieved with one array only. Instead of taking an additional array, we think of the original array as two sub-parts: at any given time, the first N-x elements are sorted and the remaining x are not.

Again taking the original array A

5, 3, 7, 2, 6

Step 1: 5 (sorted till here), 3, 7, 2, 6
Step 2: 3, 5 (sorted till here), 7, 2, 6
Step 3: 3, 5, 7 (S), 2, 6
Step 4: 2, 3, 5, 7, (S) 6
Step 5: 2, 3, 5, 6, 7 – Final
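The in-place steps above can be sketched in Python, growing the sorted prefix one element at a time (a minimal illustration, with my own naming):

```python
def insertion_sort(a):
    """Sort a list in place by inserting each element into the sorted prefix."""
    for i in range(1, len(a)):
        key = a[i]                         # next unsorted element
        j = i - 1
        # shift larger elements of the sorted prefix one slot to the right
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                     # drop the key into its position
    return a

print(insertion_sort([5, 3, 7, 2, 6]))     # → [2, 3, 5, 6, 7]
```

After each pass of the outer loop, the prefix a[0..i] is sorted, exactly as in the steps shown above.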

Complexity:

If there are N elements in the array, then at some point we have N-x sorted and x unsorted. We pick the next element and need to insert it into its proper position among the first N-x+1 elements. So in the worst case we make N-x comparisons (in the best case, just one). This process is repeated N times, once for each element.

So in the worst case, the number of comparisons is

1 + 2 + 3 + … + (N-1) = N*(N-1)/2, which is quadratic, i.e. of order N*N.

In the best case (an already sorted array) we have 1 + 1 + 1 + … N times, which is of order N (linear).

We generally take the worst-case complexity, so we say insertion sort has complexity of order N^2.