Find Median of an Array

The simplest way to find the median of an array is to sort it and take the middle element. But any good comparison-based sorting algorithm takes N*logN time. Can we do better than that?

Hint: Quick sort.

In quick sort, we choose a random pivot and place it in its correct sorted position in the array, so that all elements on its left are smaller than the pivot and all elements on its right are larger.

Using the same approach we can find the median of an array without actually sorting it.

1. We need to find the middle element of an array of size N. Say N=101, we have to find the 51st smallest element (for even N, we need both the N/2-th and the (N/2 + 1)-th elements).
2. Choose a random pivot (say the first element) and find its correct place in the array.
3. Of the two parts around the pivot, pick the one that contains the middle element from step 1 (the 51st in this case). Say the first part has 25 elements and the second has 75; we discard the first part.
4. Repeat from step 2 with the chosen part of the array.
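
The steps above can be sketched in Java roughly as follows. This is a minimal sketch, not a tuned implementation: the class and method names (QuickSelectMedian, select, partition, median) are made up for illustration, the pivot is picked at random, and the input is assumed to be non-empty.

```java
import java.util.concurrent.ThreadLocalRandom;

class QuickSelectMedian {

    // Returns the k-th smallest element (0-based) of a. Reorders a in place.
    static int select(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == k) return a[p];
            if (p < k) lo = p + 1;   // target lies in the right part
            else hi = p - 1;         // target lies in the left part
        }
        return a[lo];
    }

    // Moves a random pivot to its sorted position and returns that index.
    static int partition(int[] a, int lo, int hi) {
        int r = ThreadLocalRandom.current().nextInt(lo, hi + 1);
        swap(a, r, hi);
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // Middle element for odd N, mean of the two middle elements for even N.
    static double median(int[] values) {
        int[] a = values.clone();    // keep the caller's array untouched
        int n = a.length;
        if (n % 2 == 1) return select(a, n / 2);
        return ((double) select(a, n / 2 - 1) + select(a, n / 2)) / 2.0;
    }
}
```

For example, QuickSelectMedian.median(new int[] {5, 3, 1, 4, 2}) picks the 3rd smallest element without fully sorting the array.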

Worst-case performance is N^2. With a bad pivot, each pass discards only one or two elements, so we need about N passes over up to N elements each.

Average-case performance is N, as we discard about N/2 elements in each iteration on average:

N + N/2 + N/4 + …

= N(1 + 1/2 + 1/4 + …)
= N*C (for some constant C, at most 2)

1-D Range search

Range search is a very basic algorithm used in many situations, for example: out of a set of 100 employees, find all whose salary falls within a specific range. As we are talking about a single attribute (salary), we will use a 1-D range search approach to solve this problem.

A simple algorithm would use a balanced binary search tree, with the property that, for any subtree, all nodes in the left subtree are smaller than the root node and all nodes in the right subtree are larger. The balancing property makes sure all the leaf nodes are at a distance of X or X-1 from the root, where X is of the order of lg(N), N being the number of nodes in the tree.

Here is an example

Let's say the numbers represent salaries of employees in thousands, and the range we need to find is 23 to 63. We will traverse the tree for both the lower and the upper bound. Orange arrows signify the traversal for 23 and green ones show the traversal for 63.

This leaves us with three types of nodes.

1. Nodes not traversed that lie outside the subtrees between the traversal paths (white nodes)- can be ignored
2. Nodes on the traversal paths (blue nodes)- may or may not be in range
3. Nodes lying between the traversal paths (yellow nodes)- these are in range.

Using the above information, we can deduce a simple algorithm

1. Input the lower bound L and the upper bound H
2. Create an output set S to hold the result
3. Search for L in the tree (if L is smaller than the node, move left, else move right)
3a. If the node being traversed is in range, add it to set S
3b. If L is less than the node value and we are below the node where the paths for L and H split, add the whole right subtree of the current node to set S (as it is in range)
4. Search for H in the tree (if H is smaller than the node, move left, else move right)
4a. If the node being traversed is in range, add it to set S
4b. If H is greater than the node value and we are below the split node, add the whole left subtree of the current node to set S (as it is in range)
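
The same idea can be written as one recursive walk that only descends into subtrees that can still contain values in range. The sketch below is an illustration under a few assumptions: the names (RangeSearchTree, Node, range, collect) are invented here, keys are plain ints, and the tree uses simple unbalanced insertion, so the lg(N) height discussed above only holds if the input order happens to keep it balanced.

```java
import java.util.ArrayList;
import java.util.List;

class RangeSearchTree {

    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    // Plain BST insert without rebalancing; duplicates are ignored.
    void insert(int key) { root = insert(root, key); }

    private Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    // Returns all keys k with low <= k <= high, in ascending order.
    List<Integer> range(int low, int high) {
        List<Integer> out = new ArrayList<>();
        collect(root, low, high, out);
        return out;
    }

    private void collect(Node n, int low, int high, List<Integer> out) {
        if (n == null) return;
        // The left subtree can only hold in-range keys if low < n.key.
        if (low < n.key) collect(n.left, low, high, out);
        if (low <= n.key && n.key <= high) out.add(n.key);
        // The right subtree can only hold in-range keys if n.key < high.
        if (n.key < high) collect(n.right, low, high, out);
    }
}
```

If a ready-made balanced tree is preferred, java.util.TreeSet offers subSet(from, true, to, true), which returns the same kind of inclusive range view.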

A-star Search

While you are playing a game (say Rubik's cube), each state can lead to multiple new states after the user makes a move. Similarly, in graph path finding, one can move to multiple nodes from a given node. Such situations can be represented by a game tree (graph).

Each state can result in multiple new states. The question is which next state to choose. The A* (A-star) search algorithm can help us make that choice.

We need to understand the following terms before we try to understand the algorithm.

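Open set: Set of states discovered but not evaluated yet.
Closed set: Set of states already evaluated.
Score/ Priority: the higher (in some cases lower) the score, the closer we are to the solution. In standard A* the score of a state is f = g + h, the cost paid so far plus a heuristic estimate of the remaining cost, and the state with the lowest score is evaluated first.

The score is calculated based on the problem we are trying to solve. For example in Rubik's cube, it might be the number of boxes/ rows already matched (say a state with score 10, meaning 10 rows of colors matched, gets higher priority than a state with score 7).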

Algorithm

1. Define initial state as current state
2. If current state is solution state, return success and exit. Else move on to step 3.
3. Find all states possible from current state, which are not in closed set already and add to open set
4. Add current state to closed set.
5. Set the open state with the best (here maximum) score as the current state and remove it from the open set.
6. Go to 2.
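
As an illustration of the loop above, here is a small sketch for path finding on a grid. Several things in it are assumptions made for the example rather than part of the description above: the class name GridAStar, the grid encoding (1 means wall), unit cost per move, movement in four directions, and the Manhattan distance as heuristic. It also uses the usual A* convention where the state with the lowest score f = g + h is taken first, not the maximum.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class GridAStar {

    // Length of the shortest 4-directional path from (sr, sc) to (gr, gc),
    // or -1 if the goal cannot be reached. Start and goal are assumed open.
    static int shortestPath(int[][] grid, int sr, int sc, int gr, int gc) {
        int rows = grid.length, cols = grid[0].length;
        int[][] g = new int[rows][cols];            // best known cost from start
        for (int[] row : g) Arrays.fill(row, Integer.MAX_VALUE);
        boolean[][] closed = new boolean[rows][cols];

        // Open set ordered by f = g + h; each entry is {f, row, col}.
        PriorityQueue<int[]> open =
                new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        g[sr][sc] = 0;
        open.add(new int[] {heuristic(sr, sc, gr, gc), sr, sc});

        int[][] moves = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!open.isEmpty()) {
            int[] cur = open.poll();
            int r = cur[1], c = cur[2];
            if (closed[r][c]) continue;             // stale queue entry
            if (r == gr && c == gc) return g[r][c]; // solution state reached
            closed[r][c] = true;
            for (int[] m : moves) {
                int nr = r + m[0], nc = c + m[1];
                if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) continue;
                if (grid[nr][nc] == 1 || closed[nr][nc]) continue;
                int tentative = g[r][c] + 1;
                if (tentative < g[nr][nc]) {        // found a cheaper route
                    g[nr][nc] = tentative;
                    open.add(new int[] {tentative + heuristic(nr, nc, gr, gc), nr, nc});
                }
            }
        }
        return -1;
    }

    // Manhattan distance: never overestimates the cost on a 4-directional grid.
    static int heuristic(int r, int c, int gr, int gc) {
        return Math.abs(r - gr) + Math.abs(c - gc);
    }
}
```

A priority queue stands in for the open set here, so step 5 becomes a single poll() call.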

A more elaborate explanation can be found at:
https://en.wikipedia.org/wiki/A*_search_algorithm
http://web.mit.edu/eranki/www/tutorials/search/

Java Best Practices- String Literal Comparison

Whenever we create a String literal or a String constant, Java searches its String pool, and if the string is found there, the variable gets a reference to the pooled object. Please note this is true only when we create a literal and not a new object, i.e. when you write String str="hello" and not String str=new String("hello");

Another important point is that comparing strings with == might appear to work, because as mentioned above both variables might be referring to the same pooled object. But == compares references, not contents, so use equals to compare contents.

Example

String str="hello "+"kamal";
String str1="hello";
str1+=" kamal";
String str2="hello kamal";
String str3="hello kamal";

System.out.println(str);
System.out.println(str1);
System.out.println(str2);
System.out.println(str3);
// all above prints hello kamal

System.out.println(str==str1); //false
System.out.println(str==str2); //true
System.out.println(str==str3); //true
System.out.println(str.equals(str1)); //true
System.out.println(str.equals(str2)); //true
System.out.println(str.equals(str3)); //true

Java Best Practices- String literals at left

It is a good practice in Java to place the String literal on the left when comparing a String literal with a String variable using equals or equalsIgnoreCase. If you keep the variable on the left and it happens to be null, the call results in a NullPointerException.

For example

String a=null;
System.out.println("0".equals(a)); //works fine
System.out.println(a.equals("0")); //results in null pointer exception
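
When both sides are variables and either may be null, there is no literal to put on the left. One option, shown here as a small sketch, is java.util.Objects.equals, which treats two nulls as equal. The class and method names (NullSafeEquals, isZero, same) are only for illustration.

```java
import java.util.Objects;

class NullSafeEquals {

    // Literal on the left: returns false for null instead of throwing.
    static boolean isZero(String a) {
        return "0".equals(a);
    }

    // Both sides may be null: Objects.equals handles the null checks.
    static boolean same(String a, String b) {
        return Objects.equals(a, b);
    }
}
```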

Initialize a matrix – efficiently

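A simple way to initialize a matrix of N*N elements with the same number is to write 2 nested for loops from 0 to N and set each element to the required number. A slight variation is to fill the matrix ring by ring, setting four elements per iteration. The outer loop must run up to (N + 1) / 2 so that the centre element is also set when N is odd.

for (int i = 0; i < (N + 1) / 2; i++) {
    for (int j = i; j < N - i; j++) {
        arr[i][j] = 0;                 // top edge of ring i
        arr[j][i] = 0;                 // left edge of ring i
        arr[N - 1 - i][N - 1 - j] = 0; // bottom edge of ring i
        arr[N - 1 - j][N - 1 - i] = 0; // right edge of ring i
    }
}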
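
For comparison, here is a runnable sketch of the ring-by-ring fill next to a row-by-row fill using java.util.Arrays.fill. The names (MatrixInit, fillRings, fillRows) are invented for this example, and the ring version loops up to (N + 1) / 2 so the centre cell of an odd-sized matrix is included. Which version is faster in practice is not claimed here; it depends on the JVM and on memory access patterns, so measure before choosing.

```java
import java.util.Arrays;

class MatrixInit {

    // Ring-by-ring fill: four cells per inner iteration, working from the
    // outer border towards the centre of a square matrix.
    static void fillRings(int[][] arr, int value) {
        int n = arr.length;
        for (int i = 0; i < (n + 1) / 2; i++) {
            for (int j = i; j < n - i; j++) {
                arr[i][j] = value;
                arr[j][i] = value;
                arr[n - 1 - i][n - 1 - j] = value;
                arr[n - 1 - j][n - 1 - i] = value;
            }
        }
    }

    // Row-by-row fill using the library helper.
    static void fillRows(int[][] arr, int value) {
        for (int[] row : arr) {
            Arrays.fill(row, value);
        }
    }
}
```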

Using CheckStyle with Eclipse

Checkstyle is a development tool which helps you check that your Java code follows a coding standard. You can create your own set of checks, or simply use one of the configurations provided with Checkstyle.

For Eclipse, a plugin is available and easy to install. Go to Help->Eclipse Marketplace and search for checkstyle.

Once the Eclipse plugin is added, right-clicking a project or a file shows an option to run Checkstyle, which reports its messages in a panel (Window->Show View->Other->search for Checkstyle).

If you want Eclipse to format (Ctrl+Shift+F) the code in the desired style, you can import a formatter XML via Project->Properties->Java Code Style->Formatter->Import. One such XML can be found at: https://github.com/sevntu-checkstyle/sevntu.checkstyle/blob/master/sevntu-checks/CheckstyleFormatterForEclipse.xml

Data Handling in version control systems

Here is a good read on how a Version control system would store different versions of a file.

http://ericsink.com/entries/time_space_tradeoffs.html

My guess is that most systems use a deltas plus keyframes approach.

Some more reads-

https://en.wikipedia.org/wiki/Version_control
http://betterexplained.com/articles/a-visual-guide-to-version-control/

Finding a peak in 2 Dimensional Array

In the last post I talked about finding a peak in a 1-D array.

A peak in a 2-D array is an element whose left, right, top and bottom neighbours are all lower than it. A simple approach is to extend the 1-D array approach. Treat each row (or column) as a 1-D array and apply the 1-D peak finding algorithm, which we know takes lgN operations. Once we find a 1-D peak, we check if it is a 2-D peak as well (a 2-D peak is always a 1-D peak, but not vice versa). If yes, we are done, else we continue with the next row.

For an N*N 2-D array, the above algorithm has a time complexity of N*lgN.

Let's improve the above complexity by introducing a divide and conquer approach.

1. Divide the matrix (2-D array) into 4 equal parts, split on the middle row and the middle column.
2. Find the local peak (highest element) in the middle row and middle column.
3 a. If the local peak is found in the middle row (we know its left and right are smaller), check if its top and bottom are smaller. If yes, the current element is a 2-D peak; if no, choose the sub matrix containing the neighbour (top or bottom) that is higher than the current element.
3 b. If the local peak is found in the middle column (we know its top and bottom are smaller), check if its right and left are smaller. If yes, the current element is a 2-D peak; if no, choose the sub matrix containing the neighbour (left or right) that is higher than the current element.
4. After step 3, we are left with a matrix of size N/2*N/2, a quarter of the initial matrix. Repeat from 1 with the new matrix.

1 2 3 4 5
1 9 7 5 3
2 3 6 5 3
3 2 4 8 1
1 9 2 3 7

7 is found as the local maximum of the middle row and column, but the 9 on its left is larger, so we move to the top-left sub matrix

Step 2
1 2 3
1 9 7
2 3 6

Repeating the steps above gives us a 2-D peak

Complexity
Step 1: Check N+N elements to find the maximum: T(N) = T(N/2) + CN
Step 2: Check N/2+N/2 elements: T(N) = T(N/4) + C(N/2) + CN
= T(N/8) + C(N/4) + C(N/2) + CN
…
= T(1) + CN(1 + 1/2 + 1/4 + 1/8 + … + 1/N)
The geometric series in brackets is bounded by 2.
Hence the overall complexity is O(N)
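
The quadrant recursion above needs careful handling of sub matrix borders, so the sketch below uses a simpler, well-known divide and conquer variant instead: take the largest element of the middle column, compare it with its left and right neighbours, and continue in the half that holds a larger neighbour. Note that this is a swapped-in technique that costs O(N*lgN) for an N*N matrix, not the O(N) derived above. The names (Peak2D, findPeak, isPeak) are illustrative, the matrix is assumed non-empty and rectangular, and a peak is taken to be an element not smaller than its neighbours so that ties are handled.

```java
class Peak2D {

    // Returns {row, col} of an element that is >= each of its existing
    // top, bottom, left and right neighbours.
    static int[] findPeak(int[][] a) {
        int cols = a[0].length;
        int lo = 0, hi = cols - 1;
        while (true) {
            int mid = (lo + hi) / 2;
            // Largest element of the middle column: >= its top and bottom.
            int best = 0;
            for (int r = 1; r < a.length; r++) {
                if (a[r][mid] > a[best][mid]) best = r;
            }
            if (mid > 0 && a[best][mid - 1] > a[best][mid]) {
                hi = mid - 1;            // a larger value lies to the left
            } else if (mid < cols - 1 && a[best][mid + 1] > a[best][mid]) {
                lo = mid + 1;            // a larger value lies to the right
            } else {
                return new int[] {best, mid};
            }
        }
    }

    // Checks the peak property directly; handy for verifying a result.
    static boolean isPeak(int[][] a, int r, int c) {
        int v = a[r][c];
        return (r == 0 || a[r - 1][c] <= v)
            && (r == a.length - 1 || a[r + 1][c] <= v)
            && (c == 0 || a[r][c - 1] <= v)
            && (c == a[0].length - 1 || a[r][c + 1] <= v);
    }
}
```

A matrix can contain several peaks, so the element returned by this variant need not be the one the quadrant walk-through arrives at.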

Finding a peak in 1 Dimensional Array

A peak in a 1-D array is simply an element which has left and right elements smaller than it.

1, 3, 4 ,6 ,9, 11, 14, 12, 7, 3, 2

14, clearly is peak here.

A straightforward algorithm would be to scan the array from left to right and check whether a[index]>a[index+1] and a[index]>a[index-1]; if so, we have found a peak.

The above algorithm has a complexity of O(N).

An improvement over the above approach is to use a divide and conquer approach, like we do in binary search.

1. Go to the middle of the array, Arr[N/2]
2. Check if it is a peak (left and right are smaller); if yes, we are done
3. Check if the left element is larger than the mid element; if yes, NewArr=Arr[0..N/2-1]
4. Else NewArr=Arr[N/2+1..N-1]
5. Repeat from 1 with NewArr
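
The steps above can be sketched as a loop over a shrinking index range instead of copying sub arrays. This is a minimal sketch under a few assumptions: the names (Peak1D, findPeak) are invented, the array is non-empty, and a peak is treated as an element not smaller than its neighbours so that equal values do not break the search. It compares the middle element with its right neighbour only, which is enough to decide which half must contain a peak.

```java
class Peak1D {

    // Returns the index of an element that is >= its existing neighbours.
    static int findPeak(int[] a) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (a[mid] < a[mid + 1]) {
                lo = mid + 1;   // values rise to the right: a peak lies there
            } else {
                hi = mid;       // a[mid] >= a[mid + 1]: a peak is at mid or left of it
            }
        }
        return lo;
    }
}
```

On the sample array above, Peak1D.findPeak returns index 6, the position of 14.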

Complexity

T(N) = T(N/2) + C //at every step we divide the array into 2; C is the constant time taken by the operation
= T(N/4) + C + C
= T(N/8) + C + C + C
…
= T(N/2^k) + C*k
//the total number of steps k is lg(N)
= T(1) + C*lg(N)

or O(lgN) complexity.