Checking postgres server logs

At times you might want to see exactly which queries hit your database, especially while testing and analyzing a solution. A similar need came up recently in a project using a Postgres database. One can simply set the log_statement flag to 'all' in postgresql.conf.

postgres@server:/etc/postgresql/9.5/main$ vi postgresql.conf

Look for

log_statement = 'all' # none, ddl, mod, all
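A few related settings in the same file control where the logs go. The values below are illustrative, not recommendations; check the defaults for your Postgres version:

logging_collector = on                             # background process that captures log output into files
log_directory = 'pg_log'                           # directory for log files, relative to the data directory
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'    # naming pattern for log files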

You can also set log_destination in the same file. By default, the logs will be under:

postgres@server:/var/log/postgresql$ ls

And finally, restart the Postgres server:

/etc/init.d/postgresql restart

Please note that enabling these logs on a production server is not recommended in normal scenarios, as they consume a lot of disk space.

Node vs Java Comparison

A good read on Node vs Java: https://rclayton.silvrback.com/speaking-intelligently-about-java-vs-node-performance. It explains how Java and Node code handle load.

My personal opinion is that a programming language is just a tool to deliver the final solution. Only in specialized cases, where we know a particular language has a proven history of solving similar problems, and hence has sample code, libraries, a knowledge base, and forums to help, should we consider it as part of the solution. Otherwise, non-technical factors like client preference and the team's expertise come into the picture.

SOLID Principles for object oriented design

There are many best practices and principles that developers and architects have figured out for object oriented design. Robert Martin put a subset of these good practices together and gave them the acronym SOLID, which makes them easy to remember.

Single responsibility principle: A class should handle only one responsibility and have only one reason to change. For example, a class "Employee" should not change if there is a change in project or reporting details.

Open Closed principle: Code should be open for extension but closed for modification. If you want to add a new type of report to the system, you should not need to change any existing code. More here

Liskov substitution principle: "objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program." So if we have an Employee class which is extended by Manager, we should be able to use a Manager wherever an Employee is expected, and all the Employee methods like calculate salary or generate annual report should work without any issues. If there is a type like "ContractWorker" that does not support some Employee functions, such as the annual report, one should be careful not to make it a subtype of Employee.

Interface Segregation principle: "no client should be forced to depend on methods it does not use". Coming back to the previous example, if "ContractWorker" does not need to support the annual report, we should not force it to implement an iEmployee interface. We should break up the interfaces, say into iReport and iEmployee: iEmployee can extend iReport, while ContractWorker implements only iReport. iReport can be divided further into reporting types if required.
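A minimal Python sketch of this segregation; the report method names are placeholders I am assuming for illustration:

from abc import ABC, abstractmethod

class IReport(ABC):
    @abstractmethod
    def generate_monthly_report(self):
        pass

class IEmployee(IReport):
    @abstractmethod
    def generate_annual_report(self):
        pass

class ContractWorker(IReport):
    # implements only what it needs; no annual report is forced on it
    def generate_monthly_report(self):
        return "contract worker monthly report"

class Employee(IEmployee):
    def generate_monthly_report(self):
        return "employee monthly report"

    def generate_annual_report(self):
        return "employee annual report"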

Dependency Inversion principle: This one is my favorite, as I have written about it here, here, here and here. It is indeed one of the most important principles for making code loosely coupled and hence more maintainable (the golden rule: low coupling + high cohesion). In traditional programming, when a high level method calls a low level method, it needs to be aware of the low level implementation at compile time. Using dependency injection, the high level code instead depends on an abstraction or interface, and the implementation details are provided at run time, giving us the freedom to choose which implementation to use. Coming back to my previous example, I can have multiple implementations of employee reporting behind iReport: one implementation might produce an Excel report, another a PDF report, and the choice can be made at runtime.
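A hedged sketch of that inversion: the high-level reporting code depends only on an abstraction, and the Excel or PDF implementation is supplied at runtime. The class and method names here are assumptions for the example, not from any particular library:

from abc import ABC, abstractmethod

class ReportGenerator(ABC):
    @abstractmethod
    def generate(self, employee_name):
        pass

class ExcelReportGenerator(ReportGenerator):
    def generate(self, employee_name):
        return employee_name + "-report.xlsx"

class PdfReportGenerator(ReportGenerator):
    def generate(self, employee_name):
        return employee_name + "-report.pdf"

class ReportingService:
    # high-level class: knows only the abstraction, not the implementation
    def __init__(self, generator):
        self.generator = generator

    def report_for(self, employee_name):
        return self.generator.generate(employee_name)

# the concrete implementation is injected at runtime
service = ReportingService(PdfReportGenerator())
print(service.report_for("john"))  # john-report.pdf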

Tuning Tomcat

There can be cases where you face problems like the server being overloaded, or being underutilized yet still rejecting requests, with Tomcat (or any other application/web server). These issues usually boil down to incorrect tuning of the server. Here is an interesting case study: https://medium.com/netflix-techblog/tuning-tomcat-for-a-high-throughput-fail-fast-system-e4d7b2fc163f

From my personal experience, a few important parameters to consider (these are specific to Tomcat, but other servers have similar settings):
maxThreads="50"
maxConnections="50"
acceptCount="100"
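For reference, these attributes go on the Connector element in Tomcat's conf/server.xml. A sketch using the values above; the port, protocol, and timeout are just common defaults, not recommendations:

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="50"
           maxConnections="50"
           acceptCount="100"
           redirectPort="8443" />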

maxThreads is the number of worker threads that will actually execute requests and perform the requested operations. Setting this correctly is tricky: a value too high means a lot of concurrent processing, and CPU and memory can choke; a value too low means we are not using the server's capacity fully, yet still refusing requests because all available threads are busy.

maxConnections is the number of connections the server will accept and process at any given time. This will mostly depend on the traffic you are expecting.

acceptCount comes into play beyond maxConnections. Any request that cannot be accommodated as a new connection waits in a queue whose size is given by acceptCount. A request received when this queue is also full is rejected by the server.

In short, the total number of requests a server can hold at a time is acceptCount + maxConnections, and maxThreads is the number of threads actually fulfilling those requests.

More details- https://tomcat.apache.org/tomcat-7.0-doc/config/http.html
https://www.mulesoft.com/tcat/tomcat-connectors

Types of NoSQL Databases

NoSQL databases can be divided into four major types.

Key-Value: The simplest one; it can be thought of as a hashmap. The data set can grow very large without much impact on performance, as long as your keys are unique.

Example: Redis, Riak
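As a quick illustration of the hashmap feel, here is a sketch using the redis-py client; the keys and values are made up for the example:

import redis

r = redis.Redis(host='localhost', port=6379)

# store and fetch values against unique keys, much like a hashmap
r.set('user:1001:name', 'John')
r.set('user:1001:cart', 'book,pen')
print(r.get('user:1001:name'))  # b'John'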

Document Based: This is an extension of the key-value format that gives a proper structure to the value/document being saved. Metadata is added to make sure documents are tagged and searchable.

Example: MongoDB, CouchDB

Column Based: In contrast to the row-based storage of a normal RDBMS, a column-based store keeps data stored column-wise. This makes searching data by column easy and, at the same time, lets your data grow to very large volumes by supporting distribution of the data.

Good description of Column-based storage: https://en.wikipedia.org/wiki/Column-oriented_DBMS

Example: Cassandra, Vertica

Graph Based: This kind of database is ideal for data whose items are connected to each other in some logical way; in simple words, when you can represent your data in the form of a graph. One good example: A is a friend of B, and B is a friend of C, so we can recommend that A and C become friends. A toy version of this traversal is sketched below.

Example: Neo4J, OrientDB
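The friend recommendation above is easy to see with a toy adjacency list in plain Python; a graph database does this kind of traversal efficiently at a much larger scale:

# each person maps to the set of their friends
friends = {
    'A': {'B'},
    'B': {'A', 'C'},
    'C': {'B'},
}

def recommend(person):
    # friends of my friends who are not already my friends
    suggestions = set()
    for friend in friends[person]:
        suggestions |= friends[friend]
    return suggestions - friends[person] - {person}

print(recommend('A'))  # {'C'}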

Additional reads

https://www.3pillarglobal.com/insights/exploring-the-different-types-of-nosql-databases

http://opensourceforu.com/2017/05/different-types-nosql-databases/

http://www.jamesserra.com/archive/2015/04/types-of-nosql-databases/

https://en.wikipedia.org/wiki/NoSQL

CAP theorem (you can choose two of the three: Consistency, Availability, and Partition tolerance) based analysis of NoSQL databases:

http://blog.nahurst.com/visual-guide-to-nosql-systems

Generating ER diagram from database -2

Some time back I wrote about DBvisualizer for generating a schema ER diagram from a database.

Here is another way, using SchemaSpy:

http://schemaspy.sourceforge.net/

This is a simple Java-based tool/jar file. As per the example given in the link above, all you need to do is run the jar file, providing the database access details:

java -jar schemaSpy.jar -t dbType -db dbName [-s schema] -u user [-p password] -o outputDir 

You might also need to provide the path to the database driver's jar file. For example, for Postgres:

java -jar /home/kamal/pathto/schemaSpy_5.0.0.jar -t pgsql -db dbnamehere -s public -u dbusername -p dbpassword -host localhost -port 5432 -o /home/kamal/outputdir -dp /home/kamal/pathto/postgresql-9.3-1104.jdbc4.jar

Enterprise Architecture- Building the core model

One challenge organizations often face is that IT is reactive rather than guiding operations: you bring in IT once you face a problem and build an IT solution. Once the solution is built, the struggle starts to integrate it with other parts/applications. A lot of time is then spent making the different pieces work together, which could be avoided if proper EA practices were in place.

Ideally, IT should look at the system and come up with opportunities for improving the existing setup, like automating the ordering system, and for adding new services, like moving to mobile platforms.

IT and business can then prioritize the solutions/projects based on value addition. With the big picture in front of them, it is easier to take decisions, and less time is spent making things communicate and work with each other.

Building a core model is important so that newer services can be integrated easily. For example, if centralized data handling and services to share data securely are already in place, getting a new mobile app to market is much easier than in a scenario where no such centralized solution exists.

Creating the core IT model is not easy. You need to take a call on what to keep as core and what should be customizable. As a rule of thumb, identify what is fixed and what can be changed/customized in your business, and based on that, decide which parts of the design are fixed and which are flexible. For example, in a particular business, product information might be centralized while sales processes are customized.

In addition, the core model needs to take care of which processes can be standardized and which data will be centralized and shared. More: http://kamalmeet.com/architecture/enterprise-architecture-manage-your-data-and-processes/

Enterprise Architecture- Manage your data and processes

Any enterprise architecture needs to take care of two important things: processes and data. A standardized process makes sure certain operations are done in a certain way no matter who performs them. Data, of course, is a very important asset for any organization; it helps in every aspect of the business, from fulfilling sales orders and maintaining inventory to making decisions for the future. So it is important that data is shared across units in an effective and secure manner.

Based on the business, the need for data sharing and process standardization will vary. We know a standard process adds predictability, at the cost of flexibility, and might not work where innovation and flexibility are needed, for example in sales or research. So if we are looking at a business like McDonald's, we know each unit needs to follow a similar process; hence the process will be part of my core architecture. But if we are dealing with an insurance sales business, where each unit might need a different strategy, we will not standardize the process to that level of detail.

Similarly, decisions need to be taken on centralization of data. For example, in a car manufacturing and sales company it is important to keep inventory, sales, and production data in sync, whereas an insurance company might need the flexibility of keeping car insurance and personal insurance data separately, though product availability information might still live in a common place. Nonetheless, the data definitions should be strict throughout the organization: a completed sale should mean the same thing everywhere.

Based on the decisions made in the above analysis, we will be able to create our core architecture effectively. We will know what to add to the core and what to keep flexible. A good core design helps the business with maintainability and scalability: how easily a new business unit (or a new product or service) can be added and integrated depends on the readiness of the architecture we have put in place.

Django REST Framework (DRF)- Getting started

This post assumes that you have some background knowledge of Python and Django, and that you know how to set up a virtual environment and get a Django environment up and running.

I will briefly go over setting up the environment first.

Setup Guide

1. Install virtual environment 
pip install virtualenv
2. Set up a project folder 
virtualenv myproj/
OR 
set up the project folder with a specific Python version 
virtualenv myproj --python=/usr/bin/python3.5
3. Activate virtual environment
source myproj/bin/activate

Reference to virtual environment 
http://python-guide-pt-br.readthedocs.io/en/latest/dev/virtualenvs/

4. Once inside the virtual environment, install Django
cd myproj/
pip install django
5. Install rest framework
pip install djangorestframework
6. Optional - Swagger to view APIs
pip install django-rest-swagger
7. Create a django project
django-admin startproject mysite

By now we have a Django project ready: mysite. You can open it up in your favorite editor.

Look for settings.py (under mysite/mysite) and modify INSTALLED_APPS to include the rest framework:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'rest_framework_swagger',
]

If you are planning to use swagger, add the following to urls.py (same folder as settings.py):

from django.conf import settings

at the top.
And

if settings.DEBUG:
    from rest_framework_swagger.views import get_swagger_view
    schema_docs_view = get_swagger_view(title='Mysite API')
    urlpatterns += [
        url(r'^__docs__/$', schema_docs_view),
    ]

at the end.

Let's create a sample app now. Go to the shell and create the app:

cd mysite/
python manage.py startapp employees

If you look in the editor, you will find the employees app with the default folder structure and files added.

You will find an empty models.py. This is where we will define our database entities, or tables. Let's get started and create a simple Employee model:

import uuid
from django.db import models

class Employee(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=256)
    title = models.CharField(max_length=256)
    department = models.CharField(max_length=256)

Next we need to create a serializer. Serializers help us convert model data to (and from) a required format such as JSON. More on serializers: https://docs.djangoproject.com/en/1.11/topics/serialization/

Create a serializers.py in the employees folder (parallel to models.py) and add:

from rest_framework import serializers
from .models import Employee

class EmployeeSerializer(serializers.ModelSerializer):
    class Meta:
        model = Employee
        fields = (
            "id",
            "name",
            "title",
            "department",
        )

You can see we have kept the serializer simple for this example. We are simply telling the serializer to use the Employee model and listing the fields we require.
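To get a feel for what the serializer produces, you can try it in a Django shell (python manage.py shell); the field values here are made up:

from employees.models import Employee
from employees.serializers import EmployeeSerializer

emp = Employee(name='John', title='Engineer', department='IT')
print(EmployeeSerializer(emp).data)
# something like {'id': '3fa8...', 'name': 'John', 'title': 'Engineer', 'department': 'IT'}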

Next we will create a view in views.py:

from rest_framework import viewsets
from .serializers import EmployeeSerializer
from .models import Employee

class EmployeeViewSet(viewsets.ModelViewSet):
    serializer_class = EmployeeSerializer
    queryset = Employee.objects.all()

All we have done here is provide the serializer and queryset. If you want to understand what is happening behind the scenes, you need to look into viewsets.ModelViewSet provided by rest_framework. If you open this class, you will find the following code:

class ModelViewSet(mixins.CreateModelMixin,
                   mixins.RetrieveModelMixin,
                   mixins.UpdateModelMixin,
                   mixins.DestroyModelMixin,
                   mixins.ListModelMixin,
                   GenericViewSet):
    """
    A viewset that provides default `create()`, `retrieve()`, `update()`,
    `partial_update()`, `destroy()` and `list()` actions.
    """
    pass

Let's take a quick look inside one of the mixins (of course you will never modify this code, as it is provided by rest_framework; all we do is use it):

class CreateModelMixin(object):
    """
    Create a model instance.
    """
    def create(self, request, *args, **kwargs):
        serializer = self.get_serializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        self.perform_create(serializer)
        headers = self.get_success_headers(serializer.data)
        return Response(serializer.data, status=status.HTTP_201_CREATED, headers=headers)

    def perform_create(self, serializer):
        serializer.save()

    def get_success_headers(self, data):
        try:
            return {'Location': data[api_settings.URL_FIELD_NAME]}
        except (TypeError, KeyError):
            return {}

You can see that CreateModelMixin provides us the functionality to create a model instance. All it needs from our code is the serializer (which we have provided), and it takes care of the rest.

Also, if you look closely at the mixins included by ModelViewSet, we have all the mixins required for our REST actions:

POST – Create
GET – List/Retrieve
PUT/PATCH – Update
DELETE – Destroy

Further reading on views: http://www.django-rest-framework.org/api-guide/generic-views/
Once we have our view in place, we need to configure the final piece, the URL mapping.

Create a urls.py in the employees folder:

from django.conf.urls import url
from .views import EmployeeViewSet

urlpatterns = [
    url(
        r'^employees/$',
        EmployeeViewSet.as_view({
            'get': 'list',
            'post': 'create',
        }),
        name='employees',
    ),
    url(
        r'^employees/(?P<pk>[a-f0-9-]+)/$',
        EmployeeViewSet.as_view({
            'get': 'retrieve',
            'put': 'partial_update',
            'delete': 'destroy',
        }),
        name='employee-details',
    ),
]

All we have done here is map the REST URLs to our ViewSet methods. You may recollect that we have not actually written the implementation of any of these action-handling methods; they are provided to us by rest_framework.

Note that we have used partial_update for the put action. This means the client need not send all the fields while updating the object. We could have used 'update' instead of 'partial_update' if we always wanted all fields of the object to be updated.

Lastly, we need to tell Django where to look for these URLs. So in mysite/mysite/urls.py, we will add

url(r'^', include('employees.urls')),

in urlpatterns. So it might look like

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'^', include('employees.urls')),
]

and add the imports

from django.conf.urls import url
from django.conf.urls import include

We are done with the coding now. Let's make it work.

Create migration files from the models:
(myproj) kamal@system:~/myproj/mysite$ python manage.py makemigrations

Create the actual database:
(myproj) kamal@system:~/myproj/mysite$ python manage.py migrate

Finally, run the server:
(myproj) kamal@system:~/myproj/mysite$ python manage.py runserver

Access swagger in the browser:
http://localhost:8000/__docs__/
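With the server running, a quick sanity check of the endpoints with curl might look like this; the id in the last call is a placeholder for one returned by the create call:

curl -X POST http://localhost:8000/employees/ -H "Content-Type: application/json" -d '{"name": "John", "title": "Engineer", "department": "IT"}'

curl http://localhost:8000/employees/

curl -X PUT http://localhost:8000/employees/<employee-id>/ -H "Content-Type: application/json" -d '{"title": "Senior Engineer"}'

Since put is mapped to partial_update, the last call can send just the field being changed.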

Understanding Business Intelligence

What is Business Intelligence?

Before jumping into the subject of BI, we need to understand a few related concepts.

Data: Every organization has data, and some have lots of it. For example, an e-commerce site will have loads of data about search history, products viewed by customers, order details, etc.

OLAP: Online Analytical Processing. A member of the BI family that performs multidimensional analytics, calculations, and trend analysis.

ETL: Another member of the BI family. ETL stands for Extract, Transform and Load; an ETL tool performs these three operations on your data.

Data Mining: From heaps of data, one needs to mine the useful part through calculations, selection processes, etc. We can say data mining gets us the information, which a BI tool can then present in a more usable form where the user can slice and dice to get the desired perspective, say, which products are more in demand in a particular age group.

Big Data: With storage getting cheaper, a lot of companies are storing as much information as possible (say, a log of customer activity on an e-commerce site). Big Data techniques help us analyze this huge volume of data. One form of analysis might be data mining on Big Data; another might be simply indexing the data.

Reporting: Another important feature of BI is presenting the data in a dynamic form, where one can view it and slice and dice further to reach a relevant conclusion.

Now that we understand all the related concepts, it is easy to understand BI.

“Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”

http://www.gartner.com/it-glossary/business-intelligence-bi/

“Business Intelligence (BI) comprises the set of strategies, processes, applications, data, technologies and technical architectures which are used by enterprises to support the collection, data analysis, presentation and dissemination of business information.”

https://en.wikipedia.org/wiki/Business_intelligence

Club these definitions with the concepts above, and we see that BI is not a new idea. It was always there, used in one form or another by companies. But with increasing data and competition, the concept has become more relevant now.

So What is BI?

The core idea is simply to use data to make good business decisions: take the data and convert it into information, represented in a form that makes sense and helps a business answer relevant questions. Which products are selling? Why do people choose one product or service over another? What can we expect in future quarters or years?