Red Black Tree

Sometime back I wrote about the Binary Search Tree (BST) and the AVL tree. An AVL tree keeps search at O(log N) complexity. The price we pay is maintaining the balance of the tree with every insertion or deletion.

A red-black tree is another balanced BST, but not as strictly balanced as AVL. A red-black tree has the following properties.

1. Each node is marked either red or black.
2. The root is always black (this does not actually impact balance – read on).
3. Each leaf node is null and black (it is optional to show the leaves, but it helps for initial understanding).
4. Important – no two red nodes can be directly connected, i.e. a red node cannot have a red parent or a red child.
5. Important – every path from the root to a (null) leaf has the same number of black nodes.

Properties 4 and 5 make sure we always end up with a balanced BST.
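
To make properties 4 and 5 concrete, here is a minimal sketch (my own illustration, not from the original post) of a red-black node and a recursive check that every root-to-leaf path carries the same number of black nodes; the Node and blackHeight names are just illustrative.

// A red-black node is a regular BST node plus a colour bit.
class Node {
    int key;
    boolean isRed;                       // false means black
    Node left, right;                    // null children count as black leaves
    Node(int key, boolean isRed) { this.key = key; this.isRed = isRed; }
}

class RedBlackCheck {
    // Returns the black height of the subtree, or -1 if a property is violated.
    static int blackHeight(Node n) {
        if (n == null) return 1;         // a null leaf counts as one black node
        if (n.isRed && ((n.left != null && n.left.isRed)
                     || (n.right != null && n.right.isRed))) {
            return -1;                   // property 4: a red node cannot have a red child
        }
        int lh = blackHeight(n.left);
        int rh = blackHeight(n.right);
        if (lh == -1 || rh == -1 || lh != rh) return -1;   // property 5: equal black counts
        return lh + (n.isRed ? 0 : 1);
    }
}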

Here is an example red-black tree (image from Wikipedia).

[Image: example red-black tree]

Here is a good illustration of how a red-black tree works – http://www.cs.usfca.edu/~galles/visualization/RedBlack.html

As a red-black tree allows a bit more flexibility than AVL in tree balancing, insertion is faster than in AVL, with fewer tree rotations. The insert speed comes at the cost of slightly slower searches: search is still O(log N), but the tree can be one or two levels deeper than an AVL tree since it is not as strictly balanced.

Further readings

http://www.geeksforgeeks.org/red-black-tree-set-1-introduction-2/


https://en.wikipedia.org/wiki/Red%E2%80%93black_tree

https://www.cs.auckland.ac.nz/software/AlgAnim/red_black.html

http://stackoverflow.com/questions/1589556/when-to-choose-rb-tree-b-tree-or-avl-tree

http://stackoverflow.com/questions/13852870/red-black-tree-over-avl-tree

Binary Search Tree and AVL Tree

In the last post I wrote about the Binary Search Tree. One challenge with a Binary Search Tree, or BST, is that if we insert an already sorted list into it, we end up with a straight list-like structure rather than a tree, i.e. all the nodes get added to one side. For example

[Image: unbalanced BST built from sorted input]

In this case, when we search for the element at the extreme right, we make O(N) comparisons.

To improve this situation, we have balanced binary search trees. AVL is one such balanced BST.

An AVL tree has an additional property over a BST: at any point of time, for every node, the heights of its left and right subtrees can differ by at most 1. In other words, the tree keeps re-balancing itself so that no branch grows much deeper than the others.

An AVL tree maintains this balance using tree rotation. We can rotate a tree (or subtree) to the left or to the right.

[Image: left rotation of a subtree]

The above example shows how a left rotation works (looking at it the other way around, with source and destination swapped, it is a right rotation).
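
As a rough sketch (my own illustration, assuming a simple node class with left and right references, not code from the post), a left rotation can be written like this; the caller re-attaches the returned node where the old subtree root used to hang.

class AvlNode {
    int key;
    AvlNode left, right;
    AvlNode(int key) { this.key = key; }
}

class Rotation {
    // Left-rotate the subtree rooted at x (x must have a right child) and return the new root.
    static AvlNode rotateLeft(AvlNode x) {
        AvlNode y = x.right;   // y moves up to become the new subtree root
        x.right = y.left;      // y's former left child becomes x's right child
        y.left = x;            // x becomes y's left child
        return y;
    }
}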

More on tree rotation- https://en.wikipedia.org/wiki/Tree_rotation

Additional reads
https://en.wikipedia.org/wiki/AVL_tree

http://www.geeksforgeeks.org/avl-tree-set-1-insertion/

http://stackoverflow.com/questions/14670770/binary-search-tree-over-avl-tree

A very good visual comparison of an AVL tree vs. a plain binary search tree. Try creating a tree with nodes 1,2,3,4,5,6,7,8,9 and see the visual difference between the two trees: http://visualgo.net/bst.html#

Binary Search Tree

To start with, let's understand what a tree data structure is. A tree supports a one-way (parent to child) relation between objects of a similar type. A real-world example is the way a file system stores directories and files: a directory can contain files as well as other directories, which in turn can contain more directories and files.

A binary tree puts a restriction on the tree that any node can have at most 2 children.

[Image: example binary search tree]

A binary search tree adds a property to the binary tree: each node on the left of a parent node has a smaller key than the parent, and each node on the right has a higher key. In turn, this also means that at any point of time, for any node, the whole left subtree has lower keys than the node's key and the whole right subtree has higher keys.

The above property helps in the search operation. Let's say we are searching for N (a code sketch follows after these steps).

1. Start at the root node R.
2. If N is equal to R's key, return true.
3. If N > R's key, move to the right child (discard the left subtree).
4. If N < R's key, move to the left child (discard the right subtree).
5. Repeat from step 2; if we fall off a null leaf, N is not present in the tree.
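
Here is a minimal Java sketch of that search (my own illustration; the BstNode class is an assumption, not code from the post). At every node we discard one half of the remaining tree, which is what gives O(log N) on a balanced tree.

class BstNode {
    int key;
    BstNode left, right;
    BstNode(int key) { this.key = key; }
}

class BstSearch {
    static boolean contains(BstNode root, int n) {
        BstNode current = root;
        while (current != null) {
            if (n == current.key) return true;               // step 2: found it
            current = (n > current.key) ? current.right      // step 3: go right
                                        : current.left;      // step 4: go left
        }
        return false;                                        // step 5: fell off a null leaf, not found
    }
}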

Kruskal’s Algorithm for Minimum spanning tree.

In the last post I talked about Prim's algorithm for the minimum spanning tree.

Kruskal's is another algorithm to solve the same problem.

1. To start with, remove all edges from the graph.
2. Add the edges to a set S in increasing order of weight.
3. Take the next edge from set S (the next minimum weight) and add it to the graph if (and only if) it connects nodes that are not already connected. Ignore the edge if its vertices are already connected. (A code sketch of these steps follows below.)
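
Here is a rough Java sketch of these steps (my own illustration, using a simple union-find structure to check whether two vertices are already connected; the class and method names are just illustrative).

import java.util.*;

class Kruskal {
    static class Edge {
        int u, v, weight;
        Edge(int u, int v, int weight) { this.u = u; this.v = v; this.weight = weight; }
    }

    static int[] parent;

    // Find the representative (root) of x's connected component.
    static int find(int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    static List<Edge> minimumSpanningTree(int vertices, List<Edge> edges) {
        parent = new int[vertices];
        for (int i = 0; i < vertices; i++) parent[i] = i;    // step 1: every vertex starts alone
        edges.sort(Comparator.comparingInt(e -> e.weight));  // step 2: increasing order of weight
        List<Edge> mst = new ArrayList<>();
        for (Edge e : edges) {                               // step 3: next minimum-weight edge
            int ru = find(e.u), rv = find(e.v);
            if (ru != rv) {                                  // only keep it if it joins two components
                parent[ru] = rv;                             // union the two components
                mst.add(e);
            }                                                // otherwise ignore the edge
        }
        return mst;
    }
}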

So if I try the same problem for this graph.

[Image: example weighted graph]

These will be the steps

[Image: step-by-step execution of Kruskal's algorithm]

Another interesting depiction from Wikipedia

[Image: Kruskal's algorithm animation, from Wikipedia]

Prim’s Algorithm for Minimum spanning tree.

If we look at Wikipedia for Prim's algorithm, it states

“Prim’s algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph”

https://en.wikipedia.org/wiki/Prim%27s_algorithm

Let’s take this line word by word

Greedy algorithm: this algorithm looks at the best option available at the current stage and moves forward stage by stage.

Weighted graph: a graph where edges have a weight associated with them, for example distance as the weight in a graph showing paths between cities.

Undirected graph: a graph which does not worry about the direction of edges. An edge between vertices A and B means we can travel both ways.

Minimum spanning tree: in a graph, a minimum spanning tree is a tree which contains only the edges required to connect all the nodes, while keeping the total weight minimum.

Now coming to Prim's algorithm, again using Wikipedia's definition

    1. Initialize a tree with a single vertex, chosen arbitrarily from the graph.
    2. Grow the tree by one edge: of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and transfer it to the tree.
    3. Repeat step 2 (until all vertices are in the tree).

Let’s say we have a graph

[Image: example weighted graph]

Here is a step-by-step execution of Prim's algorithm (note that for steps where we have multiple options with the same minimum weight, I am choosing randomly).

[Image: step-by-step execution of Prim's algorithm]
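
For completeness, here is a rough Java sketch of Prim's algorithm (my own illustration, assuming an adjacency-list graph where graph.get(u) holds {neighbour, weight} pairs, and using a priority queue to pick the minimum-weight edge at each step).

import java.util.*;

class Prim {
    static int totalMstWeight(List<List<int[]>> graph) {
        int n = graph.size();
        boolean[] inTree = new boolean[n];
        // queue entries are {vertex, weight of the edge connecting it to the tree}
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[1]));
        pq.add(new int[]{0, 0});                    // step 1: start from an arbitrary vertex
        int total = 0;
        while (!pq.isEmpty()) {                     // step 3: repeat until the queue is drained
            int[] top = pq.poll();                  // step 2: cheapest edge into the tree
            int u = top[0], w = top[1];
            if (inTree[u]) continue;                // vertex already in the tree, skip
            inTree[u] = true;
            total += w;
            for (int[] edge : graph.get(u)) {       // offer edges to vertices not yet in the tree
                if (!inTree[edge[0]]) pq.add(new int[]{edge[0], edge[1]});
            }
        }
        return total;
    }
}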

Search and replace in the Linux vi editor

:%s/oldstring/newstring/gc 

This command will replace all the occurrences of oldstring with newstring, asking for confirmation before each replacement (the c at the end tells vim to ask for confirmation).

More on search replace – http://vim.wikia.com/wiki/Search_and_replace

Some other useful commands are

/ for search 
n find next match (after /)
N find previous match (after /)

:$ last line of file
:0 first line of file

http://www.cs.colostate.edu/helpdocs/vi.html

Priority Queue

A queue is a FIFO data structure, which means objects added first are accessed first. A priority queue adds an element of priority to each object added to the queue, and when elements are accessed, they are accessed based on this priority, in increasing (min-priority queue) or decreasing (max-priority queue) order.

There are multiple ways to implement a priority queue; the basic one is to use a plain queue (linked list) and just add a priority element to each node. A simple queue has O(1) insertion and deletion, as elements are added at one end and removed from the other. But in the case of a priority queue, although we still add elements at the front, we delete/fetch based on priority, so we need to traverse the whole list, hence O(N).

Start->Node1{obj, P3}->Node2{obj, P4}->Node3{obj, P1}->Node4{obj, P2}->End

Another, more efficient way to implement priority queue is using heap data structure.

Read about the heap data structure here – http://kamalmeet.com/algorithms/heap-data-structure/

As the heap data structure automatically maintains priority (min heap – min-priority queue / max heap – max-priority queue), it fits best for implementing a priority queue. As both insertion and deletion of objects in a heap are O(log N) operations, the same holds for a priority queue implemented through a heap.
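
As a quick usage example (my own illustration), java.util.PriorityQueue in the JDK is itself backed by a binary min-heap, so it behaves as a min-priority queue out of the box, with O(log N) offer and poll.

import java.util.PriorityQueue;

class PriorityQueueDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> minPq = new PriorityQueue<>();
        minPq.offer(3);
        minPq.offer(1);
        minPq.offer(4);
        minPq.offer(2);
        while (!minPq.isEmpty()) {
            System.out.print(minPq.poll() + " ");   // prints 1 2 3 4 (lowest priority first)
        }
    }
}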

Dijkstra’s Algorithm

Sometime back I talked about the graph data structure.

One of the important usages of a graph is to find the shortest path from one node to another, for example finding the shortest route between two cities on a road network.

One important algorithm to find the shortest path between two nodes in a graph is Dijkstra's algorithm.

Here is how it works.

[Image: Dijkstra's algorithm animation, from Wikipedia]

1. Mark all node distances as infinity (to mark that the node has not been reached yet).
2. Start from the source node and add all other nodes to a set containing unvisited nodes.
3. Traverse all neighbouring nodes of the current node N and calculate their distance as distance[N] + the edge weight from N to that node, keeping the smaller value if a distance was already set.
4. When all neighbours are calculated, mark the current node as visited.
5. If we have visited the destination node, exit with the distance to the destination node.
6. Pull the next unvisited node with the smallest distance from the set.
7. Go to step 3.
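
Here is a rough Java sketch of these steps (my own illustration, using a priority queue to pull the unvisited node with the smallest tentative distance; the graph is assumed to be an adjacency list where graph.get(u) holds {neighbour, edge weight} pairs).

import java.util.*;

class Dijkstra {
    static int shortestPath(List<List<int[]>> graph, int source, int target) {
        int n = graph.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);            // step 1: all distances start at "infinity"
        dist[source] = 0;
        boolean[] visited = new boolean[n];
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[1]));
        pq.add(new int[]{source, 0});                    // step 2: start from the source
        while (!pq.isEmpty()) {
            int u = pq.poll()[0];                        // step 6: unvisited node with smallest distance
            if (visited[u]) continue;
            visited[u] = true;                           // step 4: mark the current node as visited
            if (u == target) return dist[u];             // step 5: reached the destination
            for (int[] edge : graph.get(u)) {            // step 3: relax all neighbours
                int v = edge[0], w = edge[1];
                if (!visited[v] && dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    pq.add(new int[]{v, dist[v]});
                }
            }
        }
        return dist[target];                             // unreachable nodes stay at "infinity"
    }
}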


Read more here

JVM: Memory management and Garbage collection

If you have looked into server configuration files, you would have seen terms like –

-Xms512m -Xmx1536m -XX:NewSize=512m -XX:MaxNewSize=512m -XX:PermSize=786m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC

These are the parameters we use to configure the heap memory used by the JVM to store objects, and to make sure garbage collection is used effectively.

Garbage Collection:
Garbage collection, or GC, in Java is the way the JVM reclaims space in the heap occupied by objects that are no longer in use. This is achieved with a mark-and-sweep approach: mark the objects that are still reachable, then sweep (delete) the unmarked ones, optionally compacting the surviving objects to bring all available space together for better usage.

Heap Memory: all the objects created by the JVM are allocated on the heap. -Xms512m defines the initial (minimum) heap memory requested by the JVM and -Xmx1536m defines the maximum.
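
As a quick sanity check (my own example, not from the original post), the standard Runtime API lets you see what these flags resolve to inside a running JVM, e.g. when started with java -Xms512m -Xmx1536m HeapSettings.

class HeapSettings {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() is the heap currently reserved, maxMemory() is the -Xmx ceiling
        System.out.println("Current heap size : " + rt.totalMemory() / (1024 * 1024) + " MB");
        System.out.println("Maximum heap size : " + rt.maxMemory() / (1024 * 1024) + " MB");
    }
}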

Young and Old Generation: the heap memory is further divided into 2 blocks – young generation and old generation – for optimum GC. The young generation contains newly created objects and the old generation contains older objects. The idea is that newly created objects are more prone to GC, while objects which have survived a few GC cycles (they are still in use) have a higher probability of surviving future GC cycles. A minor GC runs frequently and cleans up the young generation heap. A major GC runs less frequently and cleans up the whole heap, including the old generation area.

-XX:NewSize=512m and -XX:MaxNewSize=512m define the size of the young generation.

Eden and Survivor: Young generation heap area is further divided into Eden space and survivor space. A newly created object is placed in Eden space and moves to survivor space if it survives a GC. A survivor space object which has survived multiple GCs is then moved to Old Generation area.

Stack memory: apart from heap memory, the JVM also has stack memory, which is allocated per thread of execution and stores local primitive variables and references to objects in the heap for that particular thread. Stack memory works in LIFO fashion and is short-lived. Normally it is very small (~1 MB by default), but it can be set using the -Xss flag.

Error:
OutOfMemoryError is thrown by the JVM when it runs out of heap, and StackOverflowError is thrown when a thread's stack is full.
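
A tiny illustration (my own example) of how unbounded recursion exhausts a thread's stack and raises StackOverflowError; catching an Error like this is only sensible in a demo.

class StackOverflowDemo {
    static int depth = 0;

    static void recurse() {
        depth++;           // each call adds one more frame to the thread's stack
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("Stack overflowed after " + depth + " calls");
        }
    }
}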

Permanent Generation:
Perm Gen is the area where the JVM stores metadata about the application's classes and methods. This is not part of the heap memory. It can be set using -XX:PermSize=786m and -XX:MaxPermSize=1024m.

GC Algorithms:
There are various algorithms which the JVM can use for GC. For example, -XX:+UseConcMarkSweepGC in the above configuration tells the JVM to use the concurrent mark-sweep algorithm, which runs GC in parallel with application execution, hence a lower visible impact on the application – read here. With Java 7, we have the Garbage-First (G1) collector option, which helps further by splitting the heap into multiple regions and doing GC with minimum impact on the application – read here.

Useful reads

http://pubs.vmware.com/vfabric52/index.jsp?topic=/com.vmware.vfabric.em4j.1.2/em4j/conf-heap-management.html

http://www.journaldev.com/2856/java-jvm-memory-model-and-garbage-collection-monitoring-tuning

https://blog.codecentric.de/en/2014/01/useful-jvm-flags-part-8-gc-logging/