
Getting Started with R

There are multiple tools for getting started with R. I have been using Anaconda, as it gives me the flexibility of using Python in the same environment. Within Anaconda you can either install RStudio or use Jupyter Notebook.

Once you have installed Anaconda, open a command prompt and create a new environment:

conda env create -f requirements/my-environment.yml

Then activate the environment (older versions of conda use source activate my-environment instead):

conda activate my-environment

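If you prefer Jupyter to RStudio, launch a notebook instead (running R in Jupyter requires the R kernel, for example the r-irkernel package, in your environment):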

jupyter notebook test_R_notebook.ipynb

Once R is up and running, either in RStudio or in a Jupyter notebook, here are a few basic commands to get started.

# Read a CSV file (read.csv assumes a header row by default)
mydata<-read.csv("path/filename.csv")
# or, with the header option stated explicitly
mydata<-read.csv("path/filename.csv", header=TRUE)
# Print data
mydata

# Convert a text (categorical) column to integer codes
# (convert to a factor first: as.numeric on plain text returns NA)
employeeDataNum$department<-as.numeric(as.factor(employeeDataNum$department))

# Remove rows that contain NA values
mydata<-na.omit(mydata)
mydata

# Replace NA values in a column with the (rounded) column mean
mydata$column[is.na(mydata$column)] <- round(mean(mydata$column, na.rm = TRUE))

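# Bar plot for yes/no items in columns 7 to 20: 'yes' becomes 1, anything else 0
data_subset<-mydata[,7:20]
data_subset<-ifelse(data_subset=='yes', 1,0)
# each bar's height is the number of 'yes' answers in that column
barplot(data_subset)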

# Box plot for a column
boxplot(mydata$column)


# Check your library paths
.libPaths()
# location of your personal (user) library
Sys.getenv("R_LIBS_USER")

# Install a package
install.packages("AER")
# Install a package together with its dependencies
install.packages("AER", dependencies=TRUE)
# Load a package
library(dplyr)

# Getting specific columns
datanew<-mydata[,c(7,8,9,10)]

# Divide the data set into training (about 70%) and test (about 30%) sets
set.seed(4)
inTraining<-sample(2,nrow(mydata),prob=c(0.7,0.3),replace=TRUE)
trainset<-mydata[inTraining==1,]
testset<-mydata[inTraining==2,]

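# Fit a linear regression model on the training data
# (Other_players is the response; . means all other columns are predictors)
linearmodel<-lm(Other_players~.,data = trainset)
linearmodel

# Predict for the test data
predictions<-predict(linearmodel,testset)

# Plot actual (black) vs predicted (red) values for the first 100 test rows
plot(testset$Other_players[1:100], type="l")
lines(predictions[1:100],col="red")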

# Finding correlation among columns (cor() needs all columns to be numeric)
install.packages('corrplot', dependencies=TRUE)
library(corrplot)
correlation <- cor(mydata)
corrplot(correlation,type='lower')

# Subsetting data based on some condition
employee_left<-subset(employeeData, left==1)
employee_left

# More plotting
plot(employeeData$salary)
hist(employeeData$last_evaluation)

# Summary
summary(employeeData)

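# Creating a decision tree
# (formulacolumn is a placeholder for your target column; method="class" builds a classification tree)
library(rpart)
my_tree<-rpart(formula = formulacolumn ~ .,data=traindata, method="class")
plot(my_tree, margin=0.1)
text(my_tree,pretty=TRUE,cex=0.7)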

# Confusion matrix
predtree<-predict(my_tree,testdata,type="class")
install.packages('caret', dependencies=TRUE)
install.packages('e1071', dependencies=TRUE)
library(caret)
confusionMatrix(table(predtree,testdata$left))

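# using random forest for analysis
# (left must be a factor, otherwise randomForest fits a regression instead of a classifier)
traindata$left<-as.factor(traindata$left)
testdata$left<-as.factor(testdata$left)
library(randomForest)
employee_forest<-randomForest(left~.,data=traindata)
predforest<-predict(employee_forest,testdata)
confusionMatrix(table(predforest,testdata$left))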

# using naive bayes
library(e1071)
employee_naive<-naiveBayes(left~.,data=traindata)
pred_naive<-predict(employee_naive,testdata,type="class")
confusionMatrix(table(pred_naive,testdata$left))

# using svm (svm() also comes from the e1071 package)
employee_svm<-svm(left~.,data=traindata)
pred_svm<-predict(employee_svm,testdata)
confusionMatrix(table(pred_svm,testdata$left))

Tuning Tomcat

You may run into cases where the server is overloaded, or where it is underloaded and yet Tomcat (or any other application/web server) still rejects requests. These issues usually come down to incorrect tuning of the server. Here is an interesting case study: https://medium.com/netflix-techblog/tuning-tomcat-for-a-high-throughput-fail-fast-system-e4d7b2fc163f

From my personal experience, these are a few important parameters to consider. They are attributes of the HTTP Connector in Tomcat's conf/server.xml, and other servers may have similar settings:
maxThreads="50"
maxConnections="50"
acceptCount="100"

maxThreads is the number of worker threads that actually execute requests. Setting it correctly is tricky. A value that is too high means too much concurrent processing, so CPU and memory can choke. A value that is too low means we are not using the server's full capacity, yet requests still wait or get refused because all available threads are busy.
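To make the maxThreads idea concrete, here is a rough analogy in plain Java. It is not Tomcat's connector code, and MaxThreadsDemo and peakConcurrency are hypothetical names. A fixed pool of worker threads plays the role of maxThreads: however many simulated requests are submitted, at most `workers` of them are processed at the same moment, and the rest wait (here in an unbounded queue, so nothing is refused) until a thread becomes free.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Analogy for maxThreads: a fixed pool of worker threads (not Tomcat code). */
public class MaxThreadsDemo {

    /** Runs `requests` simulated requests on `workers` threads and returns the peak concurrency seen. */
    public static int peakConcurrency(int workers, int requests, long workMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger active = new AtomicInteger(); // requests being processed right now
        AtomicInteger peak = new AtomicInteger();   // highest value of `active` observed

        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(workMillis); // simulated request processing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 20 requests on 4 worker threads: never more than 4 run at once.
        System.out.println("peak concurrency: " + peakConcurrency(4, 20, 50));
    }
}
```

Raising the thread count lets more requests run in parallel, but each extra thread also costs CPU time and memory, which is exactly the trade-off described above.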

maxConnections is the maximum number of connections the server will accept and process at the same time. The right value mostly depends on the traffic you expect.

acceptCount comes into play beyond maxConnections: connection requests that cannot be accommodated as new connections wait in a queue whose length is set by acceptCount. Any request received when this queue is also full is refused by the server. (The operating system may ignore this setting and use a different queue size.)

In short, the total number of requests a server can hold at a time is roughly maxConnections + acceptCount, while maxThreads is the number of threads actually fulfilling those requests.
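The rule of thumb above can be written as a tiny model. It is a simplification under stated assumptions (each request holds one connection, keep-alive and timeouts are ignored, and the operating system queue is exactly acceptCount long), and ConnectorCapacity, outcomeFor and totalCapacity are hypothetical names, not part of Tomcat.

```java
/**
 * Simplified model of the rule of thumb (not Tomcat's code): the first
 * maxConnections connections are processed, the next acceptCount wait in
 * the queue, and anything beyond that is refused.
 */
public class ConnectorCapacity {

    public enum Outcome { PROCESSED, QUEUED, REJECTED }

    /** Outcome for the n-th simultaneous connection (1-based). */
    public static Outcome outcomeFor(int n, int maxConnections, int acceptCount) {
        if (n <= maxConnections) {
            return Outcome.PROCESSED;
        }
        if (n <= maxConnections + acceptCount) {
            return Outcome.QUEUED;
        }
        return Outcome.REJECTED;
    }

    /** Total connections held at once: being processed plus waiting in the queue. */
    public static int totalCapacity(int maxConnections, int acceptCount) {
        return maxConnections + acceptCount;
    }

    public static void main(String[] args) {
        int maxConnections = 50;
        int acceptCount = 100; // values from the example above
        System.out.println(totalCapacity(maxConnections, acceptCount));   // 150
        System.out.println(outcomeFor(50, maxConnections, acceptCount));  // PROCESSED
        System.out.println(outcomeFor(150, maxConnections, acceptCount)); // QUEUED
        System.out.println(outcomeFor(151, maxConnections, acceptCount)); // REJECTED
    }
}
```

With the example values, the 151st simultaneous connection is the first one refused.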

More details:
https://tomcat.apache.org/tomcat-7.0-doc/config/http.html
https://www.mulesoft.com/tcat/tomcat-connectors

Using Java class based configuration with Spring

In the last few posts about Spring, I used XML-based configuration. Of late, though, I have found Java-based configuration easier to use. Here is how it is done for a simple Spring MVC application.

1. First, tell web.xml which class to use for configuring Spring:

<servlet>
<servlet-name>dispatcherServlet</servlet-name>
<servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
<init-param>
<param-name>contextClass</param-name>
<param-value>org.springframework.web.context.support.AnnotationConfigWebApplicationContext</param-value>
</init-param>
<init-param>
<param-name>contextConfigLocation</param-name>
<param-value>com.myapp.config.MyConfig</param-value>
</init-param>
<load-on-startup>1</load-on-startup>
</servlet>

<servlet-mapping>
<servlet-name>dispatcherServlet</servlet-name>
<url-pattern>/app/*</url-pattern>
</servlet-mapping>

2. Mark your class with the @Configuration annotation.

3. For an MVC application, add @EnableWebMvc.

4. For component scanning, add @ComponentScan to tell your Spring application where to find components: @ComponentScan(basePackages = "com.app.my com.app.service com.app.aspect")

5. If you are using AOP, add @EnableAspectJAutoProxy.
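Item 4 passes all three packages in one space-separated string. This works because Spring splits each basePackages entry on commas, semicolons, spaces, tabs and newlines (the same delimiter set it uses for config locations such as contextConfigLocation). The sketch below imitates that splitting in plain Java so you can see which packages end up being scanned. BasePackagesSplit is a hypothetical name and this is not Spring's own code. A more explicit alternative is an array: @ComponentScan(basePackages = {"com.app.my", "com.app.service", "com.app.aspect"}).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Imitates how a basePackages value is split into package names (sketch only). */
public class BasePackagesSplit {

    // Comma, semicolon, space, tab and newline all act as separators.
    static final String DELIMITERS = ",; \t\n";

    public static List<String> split(String... basePackages) {
        List<String> result = new ArrayList<>();
        for (String entry : basePackages) {
            StringTokenizer tokens = new StringTokenizer(entry, DELIMITERS);
            while (tokens.hasMoreTokens()) {
                String pkg = tokens.nextToken().trim();
                if (!pkg.isEmpty()) {
                    result.add(pkg);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split("com.app.my com.app.service com.app.aspect"));
        // [com.app.my, com.app.service, com.app.aspect]
    }
}
```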

So a sample configuration class would look like this:

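package com.myapp.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.EnableAspectJAutoProxy;
import org.springframework.web.multipart.commons.CommonsMultipartResolver;
import org.springframework.web.servlet.config.annotation.EnableWebMvc;
import org.springframework.web.servlet.config.annotation.ResourceHandlerRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter;
import org.springframework.web.servlet.view.InternalResourceViewResolver;
import org.springframework.web.servlet.view.JstlView;
import org.springframework.web.servlet.view.UrlBasedViewResolver;

// Spring 4.x style. WebMvcConfigurerAdapter is deprecated in Spring 5,
// where you implement the WebMvcConfigurer interface instead.
@Configuration
@EnableAspectJAutoProxy
@EnableWebMvc
@ComponentScan(basePackages = "com.app.my com.app.service com.app.aspect")
public class MyConfig extends WebMvcConfigurerAdapter {

	// Browser cache period for static resources, in seconds (about one year)
	private static final int ONE_YEAR_IN_SECONDS = 31556926;

	// Maps a view name returned by a controller to a JSP under /WEB-INF/
	@Bean
	public UrlBasedViewResolver urlBasedViewResolver() {
		UrlBasedViewResolver res = new InternalResourceViewResolver();
		res.setViewClass(JstlView.class);
		res.setPrefix("/WEB-INF/");
		res.setSuffix(".jsp");

		return res;
	}

	// Serves static files directly from these folders and lets browsers cache them
	@Override
	public void addResourceHandlers(final ResourceHandlerRegistry registry) {

		registry.addResourceHandler("/fonts/**")
				.addResourceLocations("/fonts/").setCachePeriod(ONE_YEAR_IN_SECONDS);
		registry.addResourceHandler("/css/**").addResourceLocations("/css/")
				.setCachePeriod(ONE_YEAR_IN_SECONDS);
		registry.addResourceHandler("/images/**")
				.addResourceLocations("/images/").setCachePeriod(ONE_YEAR_IN_SECONDS);
		registry.addResourceHandler("/js/**").addResourceLocations("/js/")
				.setCachePeriod(ONE_YEAR_IN_SECONDS);
	}

	// Handles file uploads; needs commons-fileupload on the classpath
	@Bean
	public CommonsMultipartResolver multipartResolver() {
		CommonsMultipartResolver mr = new CommonsMultipartResolver();
		mr.setMaxUploadSize(50000000); // in bytes, i.e. about 50 MB
		return mr;
	}
}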
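The view resolver bean builds the JSP path by wrapping the view name a controller returns with the prefix and suffix configured above. Here is a plain-Java sketch of that string concatenation. ViewPathSketch and the view name "home" are hypothetical, and Spring's UrlBasedViewResolver additionally treats "redirect:" and "forward:" view names specially, which this sketch ignores.

```java
/** Mimics how the configured prefix and suffix turn a view name into a JSP path (sketch only). */
public class ViewPathSketch {

    static final String PREFIX = "/WEB-INF/"; // same as res.setPrefix(...)
    static final String SUFFIX = ".jsp";      // same as res.setSuffix(...)

    public static String resolve(String viewName) {
        return PREFIX + viewName + SUFFIX;
    }

    public static void main(String[] args) {
        // A controller method returning "home" would be rendered from this JSP:
        System.out.println(resolve("home")); // /WEB-INF/home.jsp
    }
}
```

Keeping the JSPs under /WEB-INF/ means the browser cannot request them directly; they are only reachable through a controller.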