Tag Archives: Algorithms

Heapsort: Algorithm and Complexity

Heap sort is one of the most important and interesting algorithms. The implementation is the basis of many other sophisticated algorithms. We will focus on the basics of heap sorting here. In fact, if you understand what a heap data structure is, heap sort is just an application of a few simple operations on the heap.

Let’s relook at our heap

heapsort

This is a max heap, so let's try to create an array in reverse sorted order (a min heap is good for sorting from lowest to highest, and vice versa). Before moving forward, revisit the heap data structure and understand the Insert and Delete operations; after that, the heap sort algorithm is a cakewalk.

We know that a max heap has the property that the element at the top is the highest. Using this property, we can create an algorithm like

  1. Create max heap from the random array
  2. Remove top element from heap and add to reverse sorted array
  3. Heapify (restore heap property) the remaining heap elements
  4. Repeat Steps 2 and 3 till the heap is empty

Step 1 is nothing but inserting all elements of the array into a heap. One Insert Key operation takes Log N steps, so N elements take N Log N.

Step 2 is order of 1, as we are just removing the top element and adding it to our array.

Step 3 is nothing but the Delete Node operation on the heap, i.e. take the last leaf node and make it the root, then compare it with its child nodes and swap with the largest. Repeat the process till the heap property is restored, i.e. each parent is greater than its children. This is Log N operations for one heapify.

Step 4 is the combination of Steps 2 & 3, repeated N times. Step 2 is constant order and hence can be ignored. This means we have N executions of Step 3, or N Log N.

Combining Steps 1 and 4, we find that heap sort's complexity is of N Log N order.
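The four steps above can be sketched in Java. This is an illustrative in-place variant (class and method names are mine, not from the post): because the heap is built inside the array itself, each removed maximum is parked at the end, so the result comes out ascending rather than as a separate reverse-sorted array, but the operations are the same.

```java
public class HeapSortSketch {

    // Sort ascending using a max heap built inside the array itself.
    public static void heapSort(int[] a) {
        int n = a.length;
        // Step 1: build a max heap (sift down each non-leaf node, bottom up)
        for (int i = n / 2 - 1; i >= 0; i--) heapify(a, n, i);
        // Steps 2-4: repeatedly move the max to the end and restore the heap
        for (int end = n - 1; end > 0; end--) {
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp; // remove top element
            heapify(a, end, 0);                          // Step 3: re-heapify
        }
    }

    // Sift a[i] down until both children are smaller (Log N work).
    private static void heapify(int[] a, int size, int i) {
        while (true) {
            int largest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < size && a[l] > a[largest]) largest = l;
            if (r < size && a[r] > a[largest]) largest = r;
            if (largest == i) return;
            int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
            i = largest;
        }
    }
}
```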

Related: http://kamalmeet.com/2013/05/heap-data-structure/

http://kamalmeet.com/2011/07/insertion-sorting/

Heap Data Structure

Understanding the Heap data structure is very important, as it is the basis of other data structures like the priority queue and of algorithms like Dijkstra's.

Let's understand the Heap data structure first. A heap is a simple tree-based data structure with a unique property: a parent node is always larger (or always smaller) than its children.

heapsort

In the heap above we can see the parent element is always higher than its child nodes, hence it is called a max heap. Similarly, if the parent is smaller than its children, it is a min heap. Also, the heap shown above is a binary heap, as each parent node has at most two child nodes.

Let's talk about the efficiency of the heap data structure in terms of inserting and deleting nodes.

Insert a Key: To insert a new node, the new element is added at the next available leaf position. It is then compared with its parent node and moved up if larger.

1. Insert node at next leaf

2. While element is greater than parent

Swap child and parent

The Insert operation in a heap data structure is of order log N, where N is the number of nodes.

Delete Node: Deletion always occurs at the root node. The last leaf node is moved to the root and then moved down till it is greater than its children.

1. Node =root

2. While node <any child

Swap node and largest child

The Delete operation in a heap data structure is of order log N, where N is the number of nodes.
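The Insert and Delete operations above can be sketched as a small array-backed max heap (a minimal illustration; the class and method names are mine):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MaxHeapSketch {
    // parent of index i is (i - 1) / 2; children are 2i + 1 and 2i + 2
    private final List<Integer> heap = new ArrayList<>();

    // Insert a Key: add at the next leaf, then move up while greater than parent.
    public void insert(int key) {
        heap.add(key);
        int i = heap.size() - 1;
        while (i > 0 && heap.get(i) > heap.get((i - 1) / 2)) {
            Collections.swap(heap, i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // Delete Node: take the root out, move the last leaf to the root,
    // then swap it down with its largest child till the heap property holds.
    public int deleteMax() {
        int max = heap.get(0);
        int last = heap.remove(heap.size() - 1);
        if (!heap.isEmpty()) {
            heap.set(0, last);
            int i = 0;
            while (true) {
                int largest = i, l = 2 * i + 1, r = 2 * i + 2;
                if (l < heap.size() && heap.get(l) > heap.get(largest)) largest = l;
                if (r < heap.size() && heap.get(r) > heap.get(largest)) largest = r;
                if (largest == i) break;
                Collections.swap(heap, i, largest);
                i = largest;
            }
        }
        return max;
    }
}
```

Both loops walk one root-to-leaf path at most, which is where the log N bound comes from.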

Insertion Sorting

Insertion sort, as the name suggests, is about inserting the elements of an unsorted array into their correct positions one by one. To make it clearer, let's say we have an unsorted array. We take another array, which is empty initially, and keep adding numbers one by one into it at their correct positions.

Say the original array A is

5, 3, 7, 2, 6

Now we take the first element and add it to the zeroth position of a new array B, so B now contains 5.

Next we pick 3 from the original array and insert it into B at its correct position, so B becomes 3, 5.

After step 3, B is 3, 5, 7; then 2, 3, 5, 7; and finally 2, 3, 5, 6, 7.

We have taken 2 arrays just to make things clear; the sorting can be achieved with 1 array only. Instead of taking an additional array, we think of the original array as 2 sub-parts: at any given time the first N-x elements are sorted and the remaining x are not sorted.

Again taking the original array A

5, 3, 7, 2, 6

Step 1: 5 (sorted till here), 3, 7, 2, 6
Step 2: 3, 5 (sorted till here), 7, 2, 6
Step 3: 3, 5, 7 (S), 2, 6
Step 4: 2, 3, 5, 7 (S), 6
Step 5: 2, 3, 5, 6, 7 – Final

Complexity:

If there are N elements in the array, at any point we have N-x sorted and x unsorted. We pick up the next element and need to insert it into its proper position among the first N-x+1 elements. So in the worst case we need to make N-x comparisons (in the best case, 1 comparison). The process is repeated N times for N elements.

So in the worst case the comparisons add up to

1+2+3+…+(N-1) = N*(N-1)/2, which is quadratic, i.e. of order N*N.

In the best case (an already sorted array) we have 1+1+1+… N times, which is of order N (linear).

We generally quote worst-case complexity, so we say insertion sort has complexity of N^2.
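The one-array version described above can be sketched in Java (a minimal sketch; the class and method names are mine):

```java
public class InsertionSortSketch {

    // Sort in place: grow the sorted prefix one element at a time.
    public static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];   // next unsorted element
            int j = i - 1;
            // shift larger sorted elements one step right to make room for key
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;   // drop key into its correct position
        }
    }
}
```

Running it on the post's example {5, 3, 7, 2, 6} walks through exactly the five steps shown above.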

Rod Cutting problem

Input: A rod of X inches. A price list giving the price of a rod piece of each length n in inches.

Problem: Cut a rod of X inches into pieces so that it can be sold at a maximum price.

Example

Input- 8 inch rod
Price list : 1 inch piece=1 Rs, 2=5, 3=8, 4=9, 5=10, 6=17, 7=17 and so on

Output: maximum price = 22 (a 6 inch + a 2 inch piece, 17 + 5)


/**
 * Problem definition: we need to calculate the max revenue that can be generated by cutting
 * a rod of a given size (in inches)
 * @author kamal
 */
public class CutRod {

    // we will be given a price per length in inches: 0 inch = 0 Rs, 1 = 1, 2 = 5 and so on
    // append zeros where the price is not known, so here we can sell a max rod of 7 inches
    private int prices[] = {0, 1, 5, 8, 9, 10, 17, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    // using dynamic programming, we store any previously calculated max prices
    private int revenue[] = new int[20];

    /**
     * Main method
     * @param s
     */
    public static void main(String s[]) {
        CutRod cr = new CutRod();
        System.out.println(cr.fetchMaxPrice(8));
    }

    /**
     * Recursively calls itself to fetch the max price for a rod of the given size
     * @param size
     * @return max revenue
     */
    private int fetchMaxPrice(int size) {
        int price = 0;
        if (size == 0)
            return 0;
        if (revenue[size] > 0)          // already computed, reuse it
            return revenue[size];
        // brute force: try every first-cut length i and recurse on the remainder
        for (int i = 1; i <= size; i++) {
            price = getMax(price, prices[i] + this.fetchMaxPrice(size - i));
        }
        revenue[size] = price;          // memoize for later calls
        return price;
    }

    /**
     * Returns the maximum of 2 integers
     * @param a
     * @param b
     * @return
     */
    private int getMax(int a, int b) {
        if (a >= b)
            return a;
        else
            return b;
    }
}

Maximum Subarray Problem

Problem Definition: Find the sub-array, of an array of n numbers, whose sum is the highest.

Solution type: Brute force; check all contiguous sub-arrays and pick the one with the maximum sum.

Order of complexity: N^2

Code:
public class maxsubarray {

    public static void main(String s[]) {
        int arr[] = {-1, 2, 4, 60, 1, -3, 2, -6, 7, -9, 1};

        // start with the first element so all-negative arrays are handled too
        int maxsum = arr[0];
        int maxlow = 0;
        int maxhigh = 0;
        // try every starting index i and extend the sub-array one element at a time
        for (int i = 0; i < arr.length; i++) {
            int sum = 0;
            for (int j = i; j < arr.length; j++) {
                sum = sum + arr[j];   // sum of arr[i..j]
                if (sum > maxsum) {
                    maxsum = sum;
                    maxlow = i;
                    maxhigh = j;
                }
            }
        }
        System.out.println("lower bound:" + maxlow);
        System.out.println("higher bound:" + maxhigh);
        System.out.println("maximum sum:" + maxsum);
    }
}

Tortoise and Hare algorithm

In the last post I discussed an algorithm to find the intersection point of 2 lists. A related problem is to figure out whether a list has a cycle. Say there are 10 nodes in a list and the 10th node points back to the 4th node. How will we figure out that the list has such a cycle?

There can be many solutions, like storing previously traversed nodes, or simply adding a flag to each node indicating whether the node has already been traversed. But these solutions need extra space of order n.

A classic solution to the above problem is the 'Tortoise and Hare algorithm', or 'Floyd's cycle-finding algorithm'. The idea is to have two pointers: one traverses the list one node at a time (the tortoise) and the other moves two nodes at a time (the hare). If the list has a cycle, the two pointers are bound to meet at some node; otherwise they will never meet. Try a dry run and you will see why.
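Floyd's algorithm can be sketched as follows (a minimal illustration; the Node class and names are mine):

```java
public class CycleCheckSketch {

    static class Node {
        int val;
        Node next;
        Node(int val) { this.val = val; }
    }

    // Tortoise moves one node per step, hare moves two.
    // They meet if and only if the list has a cycle; no extra space needed.
    public static boolean hasCycle(Node head) {
        Node slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true;  // the hare caught the tortoise
        }
        return false;  // the hare fell off the end: no cycle
    }
}
```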

How to get intersection point of 2 linked lists.

There are 2 lists A and B (we know the root nodes of both). We know that these 2 lists intersect at some point, after which they share a common tail (they merge into each other; think of it as a Y-shaped list).

example: list A->1->2->3->4->5->6->end
List B->11->12->4->5->6->end

Both the list intersect at node 4. This is what we need to find.

Solution-
1. Find length of A
2. Find length of B
3. Find diff=length A-length B
4. Traverse diff nodes into the larger list (after this, the same number of nodes remains in both lists).
5. Now traverse both list simultaneously and find out the common node (this is our required node).
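Steps 1 to 5 above can be sketched in Java (a minimal illustration; the Node class and names are mine):

```java
public class IntersectSketch {

    static class Node {
        int val;
        Node next;
        Node(int val) { this.val = val; }
    }

    // Steps 1 & 2: walk a list once to count its nodes
    private static int length(Node head) {
        int n = 0;
        for (Node p = head; p != null; p = p.next) n++;
        return n;
    }

    // Steps 3-5: skip the extra nodes of the longer list, then walk both together.
    public static Node intersection(Node a, Node b) {
        int la = length(a), lb = length(b);
        while (la > lb) { a = a.next; la--; }   // step 4 on list A
        while (lb > la) { b = b.next; lb--; }   // step 4 on list B
        while (a != b) { a = a.next; b = b.next; }
        return a;  // the common node, or null if the lists never meet
    }
}
```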

More solutions here.

I like solution 5 as well from the solution list. It simply reverses list A (6->5->4->3->2->1).

So when we traverse list B now, we get 11->12->4->3->2->1->end (as the pointers are reversed). So we are actually calculating the length (sum) of the non-common parts, and we already know the lengths of the 2 lists.

length A=A’+C(Common part)
length B=B’+C
and A’+B’=L (length of non-common part we found by traversing list B after reversing list A)

Three unknowns(A’, B’ and C) and three equations. Can’t get simpler than this.

Time complexity of an Algorithm- Basic

Whenever we write a piece of code or an algorithm to solve a problem, we need to understand how much time the algorithm will take to solve it. This is important because, based on the amount of input data, we have to decide whether the given solution is sufficient or needs further improvement. Time complexity is mostly expressed in big O notation, read as "order of".

In simple words, the complexity of code can be judged by checking how many times a particular piece of code gets executed, and how that count depends on the input data.

1. The lowest (hence best) level of time complexity that can be achieved is constant time, or order of 1, O(1). This means the time taken by the algorithm is constant, irrespective of the input data.

e.g. Find the sum of the first N numbers.

No matter what the value of N is, we just need to compute N*(N+1)/2.

Or print a string str 100 times:

for (i=0 to 100){print str;}

In both cases the code takes a constant time, independent of the input.

2. The second best complexity a code can achieve is order of log n, O(log n). If you are familiar with logarithms, you already know what I am talking about; if you are not comfortable with logs, read on. When we say log n, we assume base 2 by default (if not specified otherwise).

So if I say

log (base 2) n = x, which can also be written as: n = 2^x.

For further simplicity, all we are saying is that with every iteration of the code, the dataset from the previous step gets reduced to half (1/2).

A classic example of this is binary search. We have a sorted array and we are trying to find out whether a number is in it.

Array arr={1,3,6,8,11,13,17, 22, 28}

search if x=3 is available or not

Algo-

1. Take 2 variables, start and end, pointing at the start (1) and end (28) of the array.
2. Find the mid of the array (position 5, number 11).
3. Check if it is the desired number x.
4. If the number at the mid position is > x, set end to mid - 1; else set start to mid + 1.
5. Repeat steps 2 to 4 till x is found or start crosses end.

Notice that every time we execute step 4, we reduce the dataset from the previous step to half.
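The steps above can be sketched in Java (a minimal sketch; the class and method names are mine):

```java
public class BinarySearchSketch {

    // Returns the index of x in the sorted array arr, or -1 if x is absent.
    public static int search(int[] arr, int x) {
        int start = 0, end = arr.length - 1;
        while (start <= end) {
            int mid = (start + end) / 2;     // step 2: middle of the current range
            if (arr[mid] == x) return mid;   // step 3: found it
            if (arr[mid] > x) end = mid - 1; // step 4: x can only be in the left half
            else start = mid + 1;            //         x can only be in the right half
        }
        return -1;  // range became empty: x is not in the array
    }
}
```

Each pass of the loop halves the remaining range, which is exactly the log n behaviour described above.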

3. Complexity of order N, O(N), is directly proportional to the size of the input data set. Note that when we say O(N), the actual step count can be 1*N, 2*N, 100*N, or k*N, where k is any constant that does not change with N.

A very simple example here would be searching a number in a random array

Array arr={4, 3, 88, 15, 67, 13, 56}

find x= 15

we just need to loop through the whole array once (in the worst case), from 0 to N, and check if the number is at each position.

4. O(n log n). If you look carefully, we have just multiplied the above 2 complexities, log n and n. So we are talking about code which has complexity of log n and gets executed n times.

A very simple example is tree sort, where we create a binary search tree from a given array. We know insertion into a binary search tree takes log n time. Now instead of adding just one element, we add n elements (the whole array), so we are talking about n*log n complexity.
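Tree sort can be sketched with a minimal binary search tree (a sketch, not a production implementation; names are mine, and the log n per insert assumes a reasonably balanced tree, since a skewed tree degrades to n per insert):

```java
import java.util.ArrayList;
import java.util.List;

public class TreeSortSketch {

    static class Node {
        int val;
        Node left, right;
        Node(int val) { this.val = val; }
    }

    // n inserts of log n each, then an in-order walk reads the values back sorted.
    public static List<Integer> treeSort(int[] a) {
        Node root = null;
        for (int x : a) root = insert(root, x);
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    // standard BST insert: go left for smaller values, right otherwise
    private static Node insert(Node node, int x) {
        if (node == null) return new Node(x);
        if (x < node.val) node.left = insert(node.left, x);
        else node.right = insert(node.right, x);
        return node;
    }

    // in-order traversal visits values in ascending order
    private static void inorder(Node node, List<Integer> out) {
        if (node == null) return;
        inorder(node.left, out);
        out.add(node.val);
        inorder(node.right, out);
    }
}
```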

5. O(n^2). If you have n elements as input and you have to loop over all of them in two nested loops, you are using n*n steps. An example would be bubble or selection sort.

Array arr={4, 3, 88, 15, 67, 13, 56}

Task- sort the array

for ( i=0 to N-1)
for(j=i+1 to N-1)
{
if(arr[i] is greater than arr[j])
swap(arr[i],arr[j]);
}