Updating Java version in Linux

Recently I upgraded Java from version 9 to 10 on a Linux machine. I had set JAVA_HOME correctly in the .bashrc file, but the version did not change when I checked with java -version on the shell.

The following commands helped change the system’s Java version.

sudo update-alternatives --install "/usr/bin/java" "java" "/path/to/jdk-10.0.1/bin/java" 1

And then

sudo update-alternatives --config java

This showed all the available Java versions, and I was able to choose the correct one, i.e. 10.
I then executed the same commands for the Java compiler.

sudo update-alternatives --install "/usr/bin/javac" "javac" "/path/to/jdk-10.0.1/bin/javac" 1
sudo update-alternatives --config javac

Designing a Solution with AWS

When you go for a cloud-based solution with a provider like Amazon AWS, two things are important. First, you need clarity on what you are trying to achieve, and second, an understanding of the services the provider offers.

Both aspects are equally important. AWS provides a plethora of services, which can amuse and confuse you at the same time. You might be tempted to use services that are not required for your project and that unnecessarily add to the cost. At the same time, services used without proper understanding can backfire in terms of both output and cost. For example, in one of my projects, an incorrect autoscaling implementation ended up running unused servers, adding to the cost instead of saving it.

Additionally, one needs to be aware of all the capabilities of the service provider: for example, which database, backup, caching, and monitoring services are available. Otherwise you will end up putting unnecessary effort into reinventing the wheel.

Here is a good starting point for AWS usage –

Features in Java 9

Java 9 might not add major changes like Java 8, which brought features like lambdas, streams, and default methods in interfaces, but it does have some good additions.

JShell: If you have worked in Python or Ruby, you might be aware of the shell (REPL) these languages provide to quickly run and analyze commands and snippets. Java 9 comes with a similar shell, JShell, where you can evaluate Java snippets interactively.

Java Modules / Jigsaw: Java has taken a further step towards modularization. You can define your application in the form of modules and tell the JVM which modules the current module uses, and which packages it exposes for use by others.
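As an illustrative sketch (the module and package names below are made up), a module descriptor in module-info.java might look like this:

```java
// module-info.java -- declares what this module needs and what it exposes
// (module and package names here are hypothetical)
module com.example.orders {
    requires java.sql;               // modules this module depends on
    exports com.example.orders.api;  // packages other modules may use
}
```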

Private Methods in Interfaces: Java 8 gave us the liberty to add method definitions in the form of static and default implementations. With Java 9, one can also add private methods to an interface, which helps organise shared logic inside it.
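A minimal sketch (the Greeter interface is made up for illustration): two default methods share a private helper, which is not visible to implementing classes:

```java
interface Greeter {
    default String greetFormally(String name) {
        return buildGreeting("Hello", name);
    }
    default String greetCasually(String name) {
        return buildGreeting("Hi", name);
    }
    // Private interface method (Java 9+), shared by the default methods above
    private String buildGreeting(String salutation, String name) {
        return salutation + ", " + name + "!";
    }
}

public class PrivateInterfaceMethodDemo {
    public static void main(String[] args) {
        Greeter g = new Greeter() {}; // anonymous implementation, nothing to override
        System.out.println(g.greetFormally("Alice")); // Hello, Alice!
        System.out.println(g.greetCasually("Bob"));   // Hi, Bob!
    }
}
```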

Additional Reads

http://www.baeldung.com/new-java-9
https://www.pluralsight.com/blog/software-development/java-9-new-features

AWS CloudFormation

When you are setting up an environment on the AWS cloud, you need to go through many steps, like creating IAM roles, security groups, databases, EC2 instances, load balancers, etc. Often one resource depends on another, so you have to create components one by one, which can be time consuming. With CloudFormation scripts one can easily automate the deployment steps. Most importantly, the script is reusable any number of times, so if I want to replicate a staging setup on production, or create another setup in another region, that is easily possible.

One can write a template in JSON or YAML format. The template is submitted to CloudFormation, which executes it and creates the stack, i.e. the actual environment with all the mentioned components.

Another important thing is that you can not only create infrastructure but also do the required setup. For example, I needed to configure an application on EC2, which I was easily able to do with the UserData section.

Here is an example

Resources: 
    AppNode1: 
        Type: AWS::EC2::Instance
        Properties:
            InstanceType: XXXXX # type here
            ImageId: ami-XXXX # any ami here
            KeyName: XXXX # name of the key if already existing, or create a new one
            IamInstanceProfile: !Ref InstanceProfile
            NetworkInterfaces:
            - AssociatePublicIpAddress: true
              DeleteOnTermination: true
              Description: ENI for the app node
              DeviceIndex: '0'
              SubnetId: subnet-XXXXX
              GroupSet:
              - !Ref AppNodeSG
            UserData:  
              "Fn::Base64":
                "Fn::Sub": |
                  #!/bin/bash
                  cd /root/
                  apt-get update
                  apt-get -y install awscli
                  aws s3 cp s3://XXXX/XXXX.XXX ~/some location
                  #One can install servers, download wars and deploy at runtime
    AppNode2: 
        Type: AWS::EC2::Instance
        Properties:
            # create another instance
    AppNodeSG: 
        # Security group to give access to ssh and port 80
        Type: AWS::EC2::SecurityGroup
        Properties: 
            GroupDescription: SecurityGroup for new AppNode
            VpcId: vpc-XXXXX
            SecurityGroupIngress:
            - IpProtocol: tcp
              FromPort: 80
              ToPort: 80
              CidrIp: 0.0.0.0/0
            - IpProtocol: tcp
              FromPort: 22
              ToPort: 22
              CidrIp: 0.0.0.0/0
    InstanceProfile:
        Type: AWS::IAM::InstanceProfile
        Properties: 
            Path: /
            Roles: [S3FullAccess] # S3FullAccess Role created Manually, so that my EC2 instance can access S3.
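Once the template is saved, it can be submitted from the AWS CLI. A sketch (the stack and file names below are placeholders); the --capabilities flag is required because the template creates IAM resources:

```shell
# Create the stack from the template file (names are illustrative)
aws cloudformation create-stack \
    --stack-name my-app-stack \
    --template-body file://app-stack.yml \
    --capabilities CAPABILITY_IAM

# Check provisioning status until it reaches CREATE_COMPLETE
aws cloudformation describe-stacks \
    --stack-name my-app-stack \
    --query "Stacks[0].StackStatus"
```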

Features added in Java 8

I have already talked about the 2 most important features added in Java 8, lambdas and streams, in my last posts. Apart from these, there were a few interesting additions.

Method definitions in interfaces: For a long time, Java avoided multiple inheritance because of the dreaded diamond problem. It gave us interfaces, which carried no method definitions but could be implemented by a class any number of times. Now Java has relaxed the restriction a bit by allowing static and default method definitions in interfaces.
What about the multiple inheritance diamond problem? Well, if the compiler cannot decide which default implementation to use, it raises a compile-time error and forces the class to override the conflicting method.

ForEach: For collections like lists, instead of creating an explicit iterator one can use forEach.
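For example, a minimal sketch:

```java
import java.util.Arrays;
import java.util.List;

public class ForEachDemo {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob", "Carol");
        // No explicit Iterator needed; forEach takes the action as a lambda
        names.forEach(name -> System.out.println("Hello " + name));
    }
}
```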

Optional: There can be situations when a method may return null; in such situations the value can be wrapped in the Optional class, which makes the caller handle the empty case explicitly.
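A small sketch (the findUser lookup is hypothetical):

```java
import java.util.Optional;

public class OptionalDemo {
    // A hypothetical lookup that may not find a value
    static Optional<String> findUser(int id) {
        return id == 1 ? Optional.of("Alice") : Optional.empty();
    }

    public static void main(String[] args) {
        // The caller handles absence explicitly instead of risking a NullPointerException
        System.out.println(findUser(1).orElse("unknown"));  // Alice
        System.out.println(findUser(42).orElse("unknown")); // unknown
    }
}
```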

Good reads: https://www.journaldev.com/2389/java-8-features-with-examples
http://www.baeldung.com/java-8-new-features

Lambda expressions in Java

Lambda expressions in Java are a way to implement functional interfaces. A functional interface is one that has only a single unimplemented method. Lambdas free us from creating a class or anonymous class just to implement it. In addition, lambdas can be used with streams.

Here is an example:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class Java8Lambda {

	interface Operations {
		int operate(int a, int b);
	}

	public static void main(String s[]) {
		System.out.println("Welcome to Lambdas");

		// The simplest form of lambda: one interface, multiple implementations without a class
		Operations sum = (int a, int b) -> a + b;
		Operations multiply = (int a, int b) -> a * b;

		int mysum = sum.operate(2, 3);
		System.out.println(mysum);
		int myproduct = multiply.operate(2, 3);
		System.out.println(myproduct);

		// Another example using Thread and Runnable
		// Earlier you would create a Runnable as an anonymous class
		Runnable myrun = new Runnable() {

			@Override
			public void run() {
				System.out.println("starting:" + Thread.currentThread().getName());
			}
		};

		new Thread(myrun).start();

		// With lambdas we can do away with anonymous classes
		Runnable myrunLambda = () -> {
			System.out.println("starting:" + Thread.currentThread().getName());
		};
		new Thread(myrunLambda).start();

		// Lambdas can be used with streams as well
		List<Integer> intlist = new ArrayList<>();
		intlist.add(2);
		intlist.add(7);
		intlist.add(12);
		intlist.add(17);
		Stream<Integer> stream = intlist.stream();
		stream.forEach(i -> {
			if (i % 2 == 0) {
				System.out.println(i + " is even.");
			}
		});

	}
}

Java Streams

Java introduced streams in Java 8 to speed up development by providing ways to perform operations on streams of data (collections) without writing bulky code.

In short, a stream can be thought of as a pipeline of data on which you need to perform some operations. There are two types of operations: intermediate and terminal. Intermediate operations transform the data in the stream but their output is still a stream, for example the filter and map operations. Terminal operations, on the other hand, are applied collectively to the stream and their output is something other than a stream, for example the sum and reduce methods.

Here is an example usage to make it clearer:

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Java8Streams {

	public static void main(String s[]) {
		System.out.println("Welcome to streams!");

		// There are multiple ways to create streams
		Stream<Integer> stream = Stream.of(new Integer[]{1, 2, 3, 4});
		Stream<Integer> intStream = Stream.of(1, 2, 3, 4);

		// Using streams to find the sum of an integer list
		List<Integer> intlist = new ArrayList<>();
		intlist.add(2);
		intlist.add(7);
		intlist.add(12);
		intlist.add(17);
		Stream<Integer> streamnew = intlist.stream();

		// No more looping through the list
		int sum = intlist.stream().mapToInt(i -> i).sum();
		System.out.println("Sum is :" + sum);

		// Let's check if the list contains an even number, without looping
		boolean anyEven = intlist.stream().anyMatch(i -> i % 2 == 0);
		System.out.println(anyEven);

		// Check if all elements in the list are even
		boolean allEven = intlist.stream().allMatch(i -> i % 2 == 0);
		System.out.println(allEven);

		// Find the first even number in the list
		Optional<Integer> firsteven = intlist.stream().filter(i -> i % 2 == 0).findFirst();
		System.out.println(firsteven);

		// Now let's sum all even numbers
		int sumeven = intlist.stream().filter(i -> i % 2 == 0).mapToInt(i -> i).sum();
		System.out.println(sumeven);

		// Let's use forEach to operate on each element, say add a constant to even numbers and print
		intlist.stream().forEach(i -> {
			if (i % 2 == 0) {
				i = i + 100;
				System.out.print(i + ",");
			}
		});
		System.out.println();
		// A better way to achieve the above
		intlist.stream().filter(i -> i % 2 == 0).forEach(i -> {
			i = i + 100;
			System.out.print(i + ",");
		});

		System.out.println();
		// Similar to filter, we have the map function
		List<Integer> newlist = intlist.stream().map(i -> {
			return i + 100;
		}).collect(Collectors.toList());
		System.out.println(newlist);

		// Use reduce to perform a custom aggregation
		Optional<Integer> output = intlist.stream().map(i -> {
			return i + 10;
		}).reduce((i, j) -> {
			return i * j;
		});
		System.out.println(output);

	}

}

Machine Learning- Additional Concepts

In the last post I talked about Machine Learning basics. Here I will take up a few additional topics.

Before getting into recommendation engines, let's look at a few concepts.
Association rule mining: At times we try to figure out the association or relationship between 2 events. For example, we might say that when someone buys milk, he might buy bread as well.

Support: Support is an indication of how frequently the events/itemset appear together. Say event A is buying milk and event B is buying bread.
Support = number of times bread and milk are bought together / total number of transactions

Confidence: Confidence is an indication of how often the rule has been found to be true.
Confidence = number of times A and B occur together / number of times A occurs = Supp(A ∪ B)/Supp(A)

Lift = Supp(A ∪ B) / (Supp(A) × Supp(B))
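These three measures can be computed directly from transaction counts; a small sketch with made-up numbers:

```java
public class AssociationMetrics {
    public static void main(String[] args) {
        // Hypothetical numbers: 1000 transactions, milk in 200, bread in 300,
        // milk and bread together in 150
        double total = 1000, milk = 200, bread = 300, both = 150;

        double suppMilk = milk / total;    // Supp(A)     = 0.2
        double suppBread = bread / total;  // Supp(B)     = 0.3
        double suppBoth = both / total;    // Supp(A ∪ B) = 0.15

        double confidence = suppBoth / suppMilk;         // 0.75
        double lift = suppBoth / (suppMilk * suppBread); // 2.5 (> 1, positively associated)

        System.out.println("support=" + suppBoth
            + " confidence=" + confidence + " lift=" + lift);
    }
}
```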

Apriori algorithm: This algorithm tries to find items that can be grouped together. It starts from the bottom, i.e. it first finds small itemsets whose items occur together and moves up, based on the principle that if a set of items is frequent, all of its subsets must also be frequent.

Market Basket Analysis: Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items.

Recommendation engine: When we buy something from an ecommerce store or watch a video on a streaming site, we get various recommendations for our next purchase or view. This is done by recommendation engines running behind the scenes.

There are two common types of recommendations used by these engines

User based / collaborative filtering: If you have watched a video on a site like Netflix, the system looks for other users who watched the same video, figures out the video most of those users watched next, and recommends it to you.
Content based recommendation: Instead of user behavior, the engine segregates content based on its properties, for example movie genre, actors, directors, etc. So if you watched an action movie, the engine looks for movies with similar content, actors, directors, plot, etc.

More:
https://www.analyticsvidhya.com/blog/2015/10/recommendation-engines/

Text mining: One will not always get clean, ready-to-use data that can be fed straight to algorithms. A common use case for text mining is reviews: customers write reviews about products or movies in plain English, and it is a tricky task to make a machine analyze these texts and figure out the opinion being shared.

Here are a few common techniques used for text mining.

Bag of words: This is a simple modeling technique which tries to figure out word counts. Data cleanup is done by removing punctuation, stop words (to, a, the, etc.), white space, and so on. After that, the algorithm counts the occurrences of each remaining word. This technique is good for tagging documents by understanding the common word patterns they use.
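As a tiny illustration in Java (the review text and stop-word list are made up):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BagOfWords {
    public static void main(String[] args) {
        String review = "The movie was good, the plot was good too.";
        // Hypothetical tiny stop-word list for illustration
        List<String> stopWords = Arrays.asList("the", "was", "a", "to", "too");

        Map<String, Integer> counts = new HashMap<>();
        // Lower-case, split on non-word characters (drops punctuation),
        // skip stop words, then count each remaining word
        for (String word : review.toLowerCase().split("\\W+")) {
            if (!word.isEmpty() && !stopWords.contains(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        System.out.println(counts); // good=2, movie=1, plot=1 (order may vary)
    }
}
```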

TF-IDF: Term Frequency - Inverse Document Frequency
Term frequency = number of times a term occurs in a document / total terms in the document
Inverse document frequency = log(total number of documents / number of documents containing term T)
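A quick sketch of the computation with made-up counts:

```java
public class TfIdf {
    public static void main(String[] args) {
        // Hypothetical counts: the term appears 3 times in a 100-term document,
        // and in 10 of the 1000 documents in the corpus
        double tf = 3.0 / 100.0;              // term frequency = 0.03
        double idf = Math.log(1000.0 / 10.0); // inverse document frequency
        double tfidf = tf * idf;              // high when a term is frequent here but rare overall
        System.out.println("tf-idf = " + tfidf);
    }
}
```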

Sentiment analysis: Here we try to figure out the sentiment, for example whether a review is positive or negative. If it contains terms such as like, loved, or enjoyed, we might want to consider it a positive review.

Time series:
There are problems where we need to handle time-based data, for example stock prices. We need to store such data as a time series, where each value is associated with a point in time.
When analyzing time series data, there are a few interesting insights one looks for:

Trend: one looks for data movement with respect to time: upward trends, downward trends, horizontal trends, etc.
Seasonality: looks for the same pattern recurring within the year. For example, fruit prices go up during winter every year.
Cyclic patterns: at times a pattern goes beyond the year, e.g. a particular trend is seen every 3 years or so. The pattern might not have a fixed time period.
Stationarity: refers to the stability of the mean value, even though there is no specific pattern.

Reinforcement learning: This is based on a reward and penalty approach. The agent/engine receives constant feedback on whether the decision it took was correct, and can improve future decisions based on that feedback.
Deep learning: This is built on Artificial Neural Networks (ANNs); one can think of an ANN as artificially replicating a Biological Neural Network (BNN).
There are three core areas of a BNN: dendrites (receive signals from neurons), the soma (sums up all the signals), and the axon (through which signals are transmitted). An ANN tries to replicate the BNN by implementing artificial neurons. Read more: https://en.wikipedia.org/wiki/Artificial_neural_network

Finding the cycle node in a list

We need to break the problem into 2 parts: first, find whether the list has a cycle; second, find where the cycle starts.

Finding the cycle can be done using the tortoise and hare algorithm, where we start with two pointers at the head. The first pointer moves at a speed of one node at a time and the second moves at a speed of two. If the two pointers meet at a node, we know there is a cycle.

The next step is to count the elements in the cycle, which is easy once we have found any node inside it, as we did with the tortoise and hare algorithm. All we need to do is take that node and keep moving to the next, counting as we go, until we arrive back at the same node.

The last step is to reposition both pointers at the start, then give one of them a lead equal to the cycle size, N. Finally, move both pointers at a speed of one; the node where they meet is where the cycle starts.

Refer https://www.geeksforgeeks.org/detect-and-remove-loop-in-a-linked-list/

class ListNode {
	int val;
	ListNode next;
	ListNode(int val) { this.val = val; }
}

public ListNode detectCycle(ListNode a) {
	if (a == null) return null;
	// Tortoise and hare: pt1 moves one node at a time, pt2 moves two
	ListNode pt1 = a;
	ListNode pt2 = a.next;
	while (pt1 != pt2 && pt2 != null && pt2.next != null) {
		pt1 = pt1.next;
		pt2 = pt2.next.next;
	}
	if (pt1 != pt2) return null; // no cycle
	// We have a cycle; count the number of nodes in it
	ListNode l = pt1;
	int count = 1;
	pt1 = pt1.next;
	while (pt1 != l) {
		count++;
		pt1 = pt1.next;
	}
	// Reposition both pointers at the head, then give pt2 a lead of 'count' nodes
	pt1 = a;
	pt2 = a;
	for (int i = 0; i < count; i++) {
		pt2 = pt2.next;
	}
	// Move both at a speed of one; they meet at the node where the cycle starts
	while (pt1 != pt2) {
		pt1 = pt1.next;
		pt2 = pt2.next;
	}
	return pt1;
}

Getting Started with R

There are multiple tools to get started with R. I have explored Anaconda, as it gives me the flexibility of using Python in the same IDE. Within Anaconda you can either install RStudio or use a Jupyter notebook.

Once you have installed Anaconda, go to the command prompt and create a new environment:

conda env create -f requirements/my-environment.yml

Then activate the environment:

source activate my-environment

or create a notebook:

jupyter notebook test_R_notebook.ipynb

Once you have R up and running, either in RStudio or a Jupyter notebook, here are a few basic commands to get started.

# Read a file
mydata<-read.csv("path/filename.csv")
# or, keeping the header row explicitly
mydata<-read.csv("path/filename.csv", header=TRUE)
# Print data
mydata

# converting a text data to integers 
employeeDataNum$department<-as.numeric(employeeDataNum$department)

# remove NA values 
mydata<-na.omit(mydata)
mydata

# Replace NA with average
mydata$column[is.na(mydata$column)] <- round(mean(mydata$column, na.rm = TRUE))

# Plot bars for items 
data_subset<-mydata[c(7,8:20)]
data_subset<-ifelse(data_subset=='yes', 1,0)
barplot(data_subset)

# plot a boxplot
boxplot(data$column)


# Check your library paths
Sys.getenv("R_LIBS_USER")

#Install a package 
install.packages("AER")
#with dependencies
install.packages("AER", dependencies=TRUE)
#include a library
library(dplyr)

# Getting specific columns
datanew<-mydata[,c(7,8,9,10)]

# Divide the data set into training and test sets
set.seed(4)
inTraining<-sample(2,nrow(mydata),prob=c(0.7,0.3),replace=T)
trainset<-mydata[inTraining==1,]
testset<-mydata[inTraining==2,]

# Applying an algorithm (here linear regression) on the training data
linermodel<-lm(trainset$Other_players~.,data = trainset)
linermodel

# Predict for the test data
predict<-predict(linermodel,testset)

# plot
testsubset<-testset[1:100,]
plot(testsubset$Other_players[1:100], type="l")
lines(predict[1:100],col="red")

# Finding correlation among columns
correlation <- cor(mydata)
install.packages('corrplot', dependencies=TRUE)
library(corrplot)
corrplot(correlation,type='lower')

# Subsetting data based on some condition
employee_left<-subset(employeeData, left==1)
employee_left

# More plotting
plot(employeeData$salary)
hist(employeeData$last_evaluation)

# Summary
summary(employeeData)

# creating decision tree
library(rpart)
my_tree<-rpart(formula = formulacolumn ~ .,data=traindata)
plot(my_tree, margin=0.1)
text(my_tree,pretty=T,cex=0.7)

# Confusion matrix

predtree<-predict(my_tree,testdata,type="class")
install.packages('e1071', dependencies=TRUE)
library(caret)
confusionMatrix(table(predtree,testdata$left))

# using random forest for analysis
library(randomForest)
employee_forest<-randomForest(left~.,data=traindata)
predforest<-predict(employee_forest,testdata,type="class")
confusionMatrix(table(predforest,testdata$left))

# using naive bayes
library(e1071)
employee_naive<-naiveBayes(left~.,data=traindata)
pred_naive<-predict(employee_naive,testdata,type="class")
confusionMatrix(table(pred_naive,testdata$left))

# using svm
employee_svm<-svm(left~.,data=traindata)
pred_svm<-predict(employee_svm,testdata,type="class")
confusionMatrix(table(pred_svm,testdata$left))