Types of NoSQL Databases

NoSQL databases can be divided into four major types:

Key-Value: The simplest type; it can be thought of as a hashmap. Because every value is looked up by its unique key, the data can grow very large without much impact on lookup performance.

Example: Redis, Riak
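
To make the hashmap comparison concrete, here is a minimal sketch of a key-value store in Python. The class and method names are hypothetical and only imitate the get/set/delete style of operations such stores usually offer; this is not the API of Redis or Riak.

```python
# A toy in-memory key-value store: a thin wrapper around a Python dict.
# KeyValueStore and its methods are illustrative names, not a real client API.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Writing to an existing key overwrites the old value.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.set("user:1", "Alice")
store.set("user:1", "Alicia")  # same key, so the value is replaced
print(store.get("user:1"))
print(store.get("user:2", "missing"))
```

Every operation goes through the key, which is why lookups stay cheap as the data grows.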

Document Based: An extension of the key-value format in which the value is a document with a defined structure (for example JSON). Metadata is stored with each document so that documents can be tagged and searched.

Example: MongoDB, CouchDB

Column Based: In contrast to the row-based storage of a typical RDBMS, column-based storage keeps data column-wise. This makes it easy to search and aggregate on individual columns, and the data can grow to very large volumes because it can be distributed across nodes.

Good description of Column-based storage: https://en.wikipedia.org/wiki/Column-oriented_DBMS

Example: Cassandra, Vertica
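
The difference between the two layouts can be sketched with plain Python structures. The sample records are made up, and real column stores add compression and distribution on top of this idea.

```python
# The same three records stored two ways (sample data is made up).
rows = [
    {"id": 1, "name": "Asha", "salary": 100},
    {"id": 2, "name": "Ben", "salary": 150},
    {"id": 3, "name": "Chen", "salary": 200},
]

# Column-wise layout: one list per column, positions line up across lists.
columns = {
    "id": [r["id"] for r in rows],
    "name": [r["name"] for r in rows],
    "salary": [r["salary"] for r in rows],
}

# Row layout: every whole record is visited to read one field.
total_from_rows = sum(r["salary"] for r in rows)

# Column layout: only the "salary" column is touched.
total_from_columns = sum(columns["salary"])

print(total_from_rows, total_from_columns)
```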

Graph Based: These databases are ideal for data that is connected in some logical way, or in simple words, data that you can represent as a graph. One good example: A is a friend of B and B is a friend of C, so we can recommend that A become friends with C.

Example: Neo4J, OrientDB
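
The friend recommendation example can be sketched with a plain adjacency map. This only illustrates the traversal idea; the function name is hypothetical, and a graph database would express the same thing in its own query language.

```python
# Undirected friendship graph as an adjacency map (the A/B/C example above).
friends = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
}


def recommend(person, graph):
    """Return friends-of-friends who are not already friends with `person`."""
    direct = graph.get(person, set())
    suggestions = set()
    for friend in direct:
        suggestions |= graph.get(friend, set())
    # Remove the person themselves and anyone they already know.
    return suggestions - direct - {person}


print(recommend("A", friends))
```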

Additional reads

https://www.3pillarglobal.com/insights/exploring-the-different-types-of-nosql-databases

http://opensourceforu.com/2017/05/different-types-nosql-databases/

http://www.jamesserra.com/archive/2015/04/types-of-nosql-databases/

https://en.wikipedia.org/wiki/NoSQL

CAP Theorem based analysis of NoSQL databases (of the three guarantees, Consistency, Availability and Partition tolerance, you can choose two)

http://blog.nahurst.com/visual-guide-to-nosql-systems

Generating ER diagram from database -2

Some time back I wrote about using DbVisualizer to generate an ER diagram from a database schema.

Here is another way, using SchemaSpy.

http://schemaspy.sourceforge.net/

SchemaSpy is a simple Java-based tool distributed as a jar file. As per the example given in the link above, all you need to do is run the jar file with the database access details.

java -jar schemaSpy.jar -t dbType -db dbName [-s schema] -u user [-p password] -o outputDir 

You may also need to give the path to the database driver's jar file. For example, for Postgres:

java -jar /home/kamal/pathto/schemaSpy_5.0.0.jar -t pgsql -db dbnamehere -s public -u dbusername -p dbpassword -host localhost -port 5432 -o /home/kamal/outputdir -dp /home/kamal/pathto/postgresql-9.3-1104.jdbc4.jar

Enterprise Architecture- Building the core model

One challenge organizations often face is that IT is reactive rather than guiding the operations. IT is brought in only once a problem appears, and an IT solution is built for it. Once the solution is built, the struggle to integrate it with other parts/ applications starts. A lot of time is then spent on making the different pieces work together, which can be avoided if proper EA practices are in place.

Ideally, IT should look at the system and come up with opportunities for improving it, such as automating the ordering system, and for adding new services, such as moving to mobile platforms.

IT and Business can then prioritize the solutions/ projects based on value addition. With the big picture in front of them, it is easier to take decisions, and less time is spent making things communicate and work with each other.

Building a core model is important so that newer services can easily get integrated. For example, if centralized data handling and services to share data securely are already in place, getting a new mobile app to market is much easier than in a scenario where no such centralized solution exists.

Creating the core IT model is not easy. You need to take a call on what to keep as core and what should be customizable. As a rule of thumb, identify what is fixed and what can be changed/ customized in your business. Based on this information, decide which parts of the design are fixed and which are flexible. For example, in a particular business, product information might be centralized but sales can be customized.

In addition, the core model needs to take into account which processes can be standardized and what data will be centralized and shared. More- http://kamalmeet.com/architecture/enterprise-architecture-manage-your-data-and-processes/

Enterprise Architecture- Manage your data and processes

Any enterprise architecture needs to take care of two important things: processes and data. A standardized process makes sure certain operations are done in a certain way no matter who is performing them. Data, of course, is a very important part of any organization; it helps in every aspect of the business, from fulfilling sales orders and maintaining inventory to making decisions for the future. So it is important that data is shared across units in an effective and secure manner.

Depending on the business, the need for data sharing and process standardization might vary. We know a standard process adds predictability, but it also makes the business less flexible and less agile, so it might not work where innovation and flexibility are needed, for example in sales or research. So if we are looking at a business like McDonald’s, we know that each unit needs to follow a similar process; hence the process will be part of the core architecture. But if we are dealing with an insurance sales business, where each unit might need a different strategy, we will not standardize the process down to the detail level.

Similarly, decisions need to be taken on centralization of data. For example, in a car manufacturing and sales unit, it is important to keep data on inventory, sales and production in sync. An insurance company, on the other hand, might need the flexibility of keeping car insurance and personal insurance data separate, though it might still keep information on available products in a common place. Nonetheless, the data definitions should be strict throughout the organization: a completed sale should mean the same thing everywhere.

Based on the decisions made in the above analysis, we will be able to create our core architecture effectively. We will know what to add to the core and what to keep flexible. A good core design will help the business with maintainability and scalability. How easily a new business unit (or a new product or service) can be added and integrated into the existing business will depend on the readiness of the architecture we have finalized.

Django REST Framework (DRF)- Getting started

This post assumes that you have some background knowledge of Python and Django, and that you know how to set up a virtual environment and get a Django environment up and running.

I will briefly go about setting up the environment first.

Setup Guide

1. Install virtual environment 
pip install virtualenv
2. Set up a project folder
virtualenv myproj/
OR
set up the project folder with a specific Python version
virtualenv myproj --python=/usr/bin/python3.5
3. Activate virtual environment
source myproj/bin/activate

Reference to virtual environment 
http://python-guide-pt-br.readthedocs.io/en/latest/dev/virtualenvs/

4. Once inside the virtual environment, install Django
cd myproj/
pip install django
5. Install rest framework
pip install djangorestframework
6. Optional - Swagger to view APIs
pip install django-rest-swagger
7. Create a django project
django-admin startproject mysite

By now we have a django project ready- mysite. You can open it up in your favorite editor.

Look for settings.py (in mysite/mysite) and modify INSTALLED_APPS to include the REST framework (and Swagger, if you installed it)

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'rest_framework_swagger',
]

If you are planning to use swagger add these to urls.py (same folder as settings)

from django.conf import settings

at top.
And

if settings.DEBUG:
    from rest_framework_swagger.views import get_swagger_view
    schema_docs_view = get_swagger_view(title='Mysite API')
    urlpatterns += [
        url(r'^__docs__/$', schema_docs_view),
    ]

at the end.

Let’s create a sample app now. Go to shell and create the app

cd mysite/
python manage.py startapp employees

If you look in the editor, you will find the employees app added, with the default folder structure and files.

You will find an empty models.py. This is where we will define our database entities or tables. Let's get started and create a simple Employee model

import uuid
from django.db import models

class Employee(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=256)
    title = models.CharField(max_length=256)
    department = models.CharField(max_length=256)

Next we need to create a serializer. Serializers help us convert model data to (and from) a required format such as JSON. More on serialization in Django: https://docs.djangoproject.com/en/1.11/topics/serialization/

Create a serializers.py in employees folder (parallel to models.py) and add

from rest_framework import serializers
from .models import Employee

class EmployeeSerializer(serializers.ModelSerializer):
    class Meta:
        model = Employee
        fields = (
            "id",
            "name",
            "title",
            "department",
        )

You can see we have kept the serializer simple for this example. We are simply telling the serializer to use the Employee model and listing the fields we require.

Next we will create a view in views.py

from rest_framework import viewsets
from .serializers import EmployeeSerializer
from .models import Employee

class EmployeeViewSet(viewsets.ModelViewSet):
    serializer_class = EmployeeSerializer
    queryset = Employee.objects.all()

All we have done here is provide the serializer and the queryset. To understand what is happening behind the scenes, look at viewsets.ModelViewSet provided by rest_framework. If you open this class you will find the following code.

class ModelViewSet(mixins.CreateModelMixin,
                   mixins.RetrieveModelMixin,
                   mixins.UpdateModelMixin,
                   mixins.DestroyModelMixin,
                   mixins.ListModelMixin,
                   GenericViewSet):
    """
    A viewset that provides default `create()`, `retrieve()`, `update()`,
    `partial_update()`, `destroy()` and `list()` actions.
    """
    pass

Let's take a quick look inside one of the mixins (of course you will never modify this code, as it is provided by rest_framework; all we do is use it).

class CreateModelMixin(object):
    """
    Create a model instance.
    """
    def create(self, request, *args, **kwargs):
        serializer = self.get_serializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        self.perform_create(serializer)
        headers = self.get_success_headers(serializer.data)
        return Response(serializer.data, status=status.HTTP_201_CREATED, headers=headers)

    def perform_create(self, serializer):
        serializer.save()

    def get_success_headers(self, data):
        try:
            return {'Location': data[api_settings.URL_FIELD_NAME]}
        except (TypeError, KeyError):
            return {}

As you can see, CreateModelMixin provides the functionality to create a model instance. All it needs from our code is the serializer (which we have provided), and it takes care of the rest.

Also, if you look closely at the mixins ModelViewSet inherits from, you will see that we have all the mixins required for our REST actions

Post- Create
Get- List/ Retrieve
Put/ Patch – Update
Delete- Destroy
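
The same mapping can be written down as plain Python dicts, in the shape a viewset's as_view() accepts. This is only a sketch of the conventional method-to-action pairing; the constant and function names below are hypothetical and are not part of rest_framework.

```python
# Conventional mapping of HTTP methods to ModelViewSet action names.
# LIST_ROUTE, DETAIL_ROUTE and action_for are illustrative names only.
LIST_ROUTE = {"get": "list", "post": "create"}
DETAIL_ROUTE = {
    "get": "retrieve",
    "put": "update",
    "patch": "partial_update",
    "delete": "destroy",
}


def action_for(method, has_pk):
    """Look up the action name for a method; None means 'not allowed'."""
    route = DETAIL_ROUTE if has_pk else LIST_ROUTE
    return route.get(method.lower())


print(action_for("POST", has_pk=False))
print(action_for("PATCH", has_pk=True))
```

List-style URLs (no primary key) take the first mapping, and detail URLs (with a primary key) take the second.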

Further reading on views: http://www.django-rest-framework.org/api-guide/generic-views/

Once we have our view in place, we need to configure the final piece, that is, the URL mapping.

Create a urls.py in employees

from django.conf.urls import url
from .views import EmployeeViewSet

urlpatterns = [
    url(
        r'^employees/$',
        EmployeeViewSet.as_view({
            'get': 'list',
            'post': 'create',
        }),
        name='employees',
    ),
    url(
        r'^employees/(?P<pk>[a-f0-9-]+)/$',
        EmployeeViewSet.as_view({
            'get': 'retrieve',
            'put': 'partial_update',
            'delete': 'destroy',
        }),
        name='employee-details',
    ),
]

All we have done here is to map REST urls to our ViewSet methods. You may recollect that we have not actually written implementation of any of the methods to handle actions as they are provided to us by rest_framework.

Note that we have used partial_update for the put action. This means the end user need not send all the fields while updating the object. We could have used 'update' instead of 'partial_update' if we always needed to update all fields in the object.
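
The difference between the two kinds of update can be sketched in plain Python. This is not how rest_framework implements it (there the serializer handles validation); the function names and the REQUIRED_FIELDS tuple are assumptions made for illustration.

```python
# Illustrative only: full vs partial update on a plain dict record.
REQUIRED_FIELDS = ("name", "title", "department")


def full_update(record, payload):
    """'update' style: every required field must be present in the payload."""
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return {**record, **payload}


def partial_update(record, payload):
    """'partial_update' style: only the supplied fields change."""
    return {**record, **payload}


employee = {"name": "Asha", "title": "Engineer", "department": "R&D"}
print(partial_update(employee, {"title": "Lead Engineer"}))
```

Calling full_update(employee, {"title": "Lead Engineer"}) raises a ValueError here, because name and department are missing from the payload.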

Lastly, we need to tell django where to look for urls. So in mysite/mysite/urls.py, we will add

url(r'^', include('employees.urls')),

in urlpatterns, and import include along with url from django.conf.urls. So it might look like

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'^', include('employees.urls')),
]

We are done with coding now. Let's make things work.

Create migration files from models
(myproj) kamal@system:~/myproj/mysite$ python manage.py makemigrations
Create the actual database 
(myproj) kamal@system:~/myproj/mysite$ python manage.py migrate
Finally run the server 
(myproj) kamal@system:~/myproj/mysite$ python manage.py runserver 

Access Swagger in the browser
http://localhost:8000/__docs__/

Understanding Business Intelligence

What is Business Intelligence?

Before jumping into the subject of BI, we need to understand a few related concepts.

Data– Every organization has data, and some have lots of it. For example, an e-commerce site will have loads of data about search history, products viewed by customers, order details etc.

OLAP– Online Analytical Processing. It is a member of the BI family that performs multidimensional analytics, calculations and trend analysis.

ETL: Another member of the BI family. ETL stands for Extract, Transform and Load, so an ETL tool performs these three operations on your data.
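
A tiny sketch of the three ETL steps, using only the Python standard library. The CSV content, the sales table and the function names are made up for illustration; real ETL tools do the same thing at a much larger scale.

```python
import csv
import io
import sqlite3

# Made-up source data; note the stray spaces that need cleaning.
RAW_CSV = "product,price,quantity\nPen, 1.50 ,4\nBook,12.00,2\n"


def extract(text):
    """Extract: read raw rows from a CSV source (an in-memory string here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean up values and derive a total per row."""
    cleaned = []
    for row in rows:
        price = float(row["price"].strip())
        quantity = int(row["quantity"])
        cleaned.append((row["product"].strip(), price * quantity))
    return cleaned


def load(rows, conn):
    """Load: write the transformed rows into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (product TEXT, total REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT product, total FROM sales ORDER BY product").fetchall())
```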

Data Mining: From heaps of data, one needs to mine the useful data by doing calculations, selection etc. We can say data mining helps us get information, which a BI tool can then present in a more usable form, where the user can slice and dice it to get the desired perspective, say which products are more in demand in a particular age group.

Big Data: With storage getting cheaper, a lot of companies are storing as much info as possible (say a log of customer activity on an e-commerce site). Big Data techniques help us analyze this huge volume of data. One form of analysis might be data mining on Big Data; another might be just indexing the data.

Reporting: Another important feature of BI is presenting data in a dynamic form, where one can view the data and further slice and dice it to reach relevant conclusions.

Now that we understand all the related concepts, it is easy to understand BI.
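
To give a feel for "slice and dice", here is a small Python sketch that totals units per product, first across all customers and then for a single age group. The order records and the function name are made up for illustration.

```python
from collections import defaultdict

# Made-up order records: (age_group, product, units).
orders = [
    ("18-25", "headphones", 3),
    ("18-25", "laptop", 1),
    ("26-40", "laptop", 4),
    ("26-40", "headphones", 2),
    ("18-25", "headphones", 2),
]


def units_by_product(records, age_group=None):
    """Total units per product, optionally restricted to one age group."""
    totals = defaultdict(int)
    for group, product, units in records:
        if age_group is not None and group != age_group:
            continue  # "slice": keep only one value of the age-group dimension
        totals[product] += units
    return dict(totals)


print(units_by_product(orders))                     # all age groups
print(units_by_product(orders, age_group="18-25"))  # one slice
```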

“Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”

http://www.gartner.com/it-glossary/business-intelligence-bi/

“Business Intelligence (BI) comprises the set of strategies, processes, applications, data, technologies and technical architectures which are used by enterprises to support the collection, data analysis, presentation and dissemination of business information.”

https://en.wikipedia.org/wiki/Business_intelligence

Combining these definitions with the above concepts, we can see that BI is not a new concept. It was always there and was being used in one form or another by companies. But with increasing data and competition, the concept has become more relevant now.

So What is BI?

The core idea is simply to use data to make good business decisions: take the data and convert it into information, represented in a form that makes sense and helps a business answer relevant questions. Which products are selling? Why do people choose one product or service over another? What can we expect in future quarters or years?

Steps for Designing a system from scratch

You are on a new project and have been given some requirements. How would you go about designing the new system?

We find ourselves in this situation a lot. So here I am trying to create a step-by-step guide for going from a requirements doc to a finalized system architecture.

Stage 0: Before getting started with the design process, we need to make sure about the following.

– Do we have a clear understanding of the requirements?
– Are we creating something from scratch or enhancing an existing system? In the latter case we will have design and technology constraints from the previous system.
– Have we identified non-functional requirements: security, performance, availability etc.?
– Have we identified all stakeholders and their roles?
– Have we identified the key players who will help in creating the architecture (architects, business analysts, product owners)?
– Have we decided on the time/ money to be spent on design activities?
– Have we identified reference material? Do we have artifacts for similar design problems, from either in-house or external sources?
– How are we going to maintain the design artifacts (wiki, git, svn, confluence etc.)? We will need to maintain versioning.
– Have we identified any guiding principles for the design, for example that we will use open source software and tools, or that we will deploy on Linux systems?
– Are there any constraints, say the client wants to use specific third party tools or technologies, a specific compliance required by law (multilingual support), or a service availability guarantee?
– Have we identified all third party systems with which our system will interact, and how the interaction will be done?
– Are we creating the system in one go or will it be a phased delivery? Have we identified the value add provided by the various components being built and prioritized the delivery?
– In case of phased delivery, have we identified the scope of each phase?
– Have we identified the risks involved and mitigated them?
– If we are modifying or enhancing an existing system, do we understand which areas can be reused, which enhanced, and which must be built from scratch?
– Better to create a formal document identifying which design artifacts are required.
– Define KPIs (Key Performance Indicators) and SLAs (Service Level Agreements).
– Have we defined acceptance criteria for the design?

Stage 1: Now we need to understand the business and what changes we need.

– Have we understood the organization structure?
– Have we identified the business goals and objectives of the organization and what changes are required?
– Identify all business requirements; for example, "a customer should be able to return a product" is a business requirement.
– Identify and design current business processes (how the current business works; whether it fulfills all the business requirements; if yes, whether we need to change or enhance the way it is done right now, for example the current purchase process is manual and we want to provide online options).
– Identify changes or modifications required in business processes

– Design artifacts to be delivered in this stage
Catalogs
— Organizations
— Actors
— Goals
— Roles
— Business Services
— Locations
— Process / Products
Matrices
— Business interaction
— Actor/ Role
Diagrams
— Business Services
— Functional Decomposition
— Product/ Process lifecycle
— Goal/ Service diagram
— Business Use cases
— Process Flow
— Event diagram

Stage 2: Focus on Data used

– What data is being used in the application? How does it originate and how is it used?
– How is the data shared securely in the enterprise?
– Create common vocabulary and data definitions
– Identify security measures to be taken

– Design artifacts to be delivered in this stage
Catalogs
– Data Entities

Matrices
– Data Entity/ Business function
– Data Entity/ Application matrix

Diagrams
– Conceptual Data Diagram
– Logical Data Diagram
– Physical Data diagram
– Data lifecycle diagram
– Data Security diagram
– Data migration diagram

Stage 3: Which applications are available? What changes are required, and which new ones are to be created?

Application- Core parameters
– Platform independence
– Easy to use
– Identify existing applications and new ones to be created at the logical level, and then map them to the physical level

– Design artifacts to be delivered in this stage
Catalogs
– Application portfolio
– Interface catalog

Matrices
– Application/ Organization
– Role/ Application
– Application/ Function
– Application/ Interactions

Diagrams
— Application communication
— Application and user location
— Application use case
— Application details – components/ modules and services
— Application details – Layered architecture if used

Stage 4: Understand the technology working behind the scenes

Control technical diversity: Minimizes cost of expertise.

Catalogs
— Technology portfolio

Matrices
— Application/ Technology

Diagrams
— Deployment diagram
— Environments and locations
— Communication engineering diagram (firewalls)

Stage 5: Let's consider Non Functional Requirements
— Security
— Performance
— Availability
— Disaster recovery
— Data backups
— Others (Project specific)

Stage 6: Post Design phase:

– Did we identify reusable artifacts and services which can be used by other projects?
– Have we conducted periodic validation that the design and the product being built are in sync?
– Did the design change due to any change requests? Has that been reflected in the design artifacts?
– Have we met all the acceptance criteria that were set initially?

Creating SSH Key

SSH keys are used to establish a secure connection, for example when you need to add your key for secure git connectivity.

To generate an SSH key, use

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

It will prompt for a file location and a passphrase; you can accept the defaults.

For help on the available options: ssh-keygen --help (or man ssh-keygen)

More details: https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/

Understanding Enterprise Architecture- basics

What is an Architecture?
An architecture helps in identifying components and their relationships. It provides basic guidelines for representing the components. It helps in understanding how the system can evolve and be enhanced.

What is Enterprise Architecture (EA)?
Before getting into EA, we need to understand what an enterprise is. An enterprise is a set of companies with a common goal.
Now an enterprise can have different applications and solution components at different levels. EA helps in getting the bigger picture by putting all the elements together. It helps in understanding how these applications interact with each other and how different processes are dependent and related.

Why an organization needs an EA?
EA helps us understand the impact of a change in one part on the whole enterprise, and hence helps in decision making, lowering the cost of operations, sharing resources and capabilities, managing security, change management, make-buy-outsource decisions etc. In addition, there can be regulatory drivers in some government and non-government organizations that require the entity to maintain an EA.

How do Architecture Frameworks help?
An architecture framework provides a common vocabulary so that every stakeholder’s understanding is the same. It provides a set of tools and building blocks that can be used to create the final architecture. In addition, it provides a list of standards so that everybody involved can follow a similar strategy.

What are different Architecture domains?
Common frameworks like TOGAF define four domains:
Business Architecture: Understanding of business processes.
Data Architecture: Structure of logical and physical data.
Application Architecture: Design of application systems to be created and deployed, their interactions and mapping to core business processes.
Technology Architecture: Details on infrastructure, middleware, deployments, communications etc.

Data Modeling at different levels

When you are designing a database for an application, there are three core levels at which you can model it.

1. Conceptual Level: At this level you are only aware of high level entities and their relationships. For example you know that you have “Employee” Entity who “works for” a “Department” and “has” an “Address”. You are not worried about details.

2. Logical Level: You try to add as much detail as possible, without worrying about how it will actually be converted to a physical database structure. So you will provide all the attributes for “Employee”, i.e. Id, FirstName, LastName, AddressId, Salary, and define primary and foreign key relations.

3. Physical Level: This is the actual representation of your database design with exact column names, types etc.
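
As a rough illustration of the physical level, here is the Employee/Address example turned into actual tables, using SQLite from Python. The snake_case column names, the types and the city column are my assumptions; the logical model above only names the attributes.

```python
import sqlite3

# Physical level: concrete tables, column types and keys (SQLite syntax).
# Column names/types and the "city" column are assumptions for illustration.
SCHEMA = """
CREATE TABLE address (
    id INTEGER PRIMARY KEY,
    city TEXT NOT NULL
);
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    salary REAL,
    address_id INTEGER REFERENCES address(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO address (id, city) VALUES (1, 'Pune')")
conn.execute(
    "INSERT INTO employee (first_name, last_name, salary, address_id) "
    "VALUES ('Asha', 'Rao', 1000.0, 1)"
)
row = conn.execute(
    "SELECT e.first_name, a.city "
    "FROM employee e JOIN address a ON a.id = e.address_id"
).fetchone()
print(row)
```

The conceptual "has an Address" relationship ends up as the address_id foreign key column.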

More info- http://www.1keydata.com/datawarehousing/data-modeling-levels.html