Krutchen 4+1 Architectural view model- Agile perspective

I have a strong belief that Solutions Architecture is as much of a science as it is an art. There is no set of fixed rules you can apply to get a final architecture. There are rules, but they can get changed based on kind of project, teams and constraints you are working with.

Coming to 4+1 Architectural view of software architecture, Krutchen shared an interesting way of looking at software architecture in this paper.

Original paper- https://www.cs.ubc.ca/~gregor/teaching/papers/4+1view-architecture.pdf

In crux, the paper suggest that we can look at any software architecture from 4 perspectives or views to get a complete picture.

Logical view: This is end user view of the system. How many entities or classes are there and how they interact, for example, how Employee will be related to Department and Project, what will happen when someone joins or leaves an organization.
UML: Class Diagrams, State Diagrams.

Process View: This talks about how the business works as a process. If you need to open an account in a bank, what process needs to be followed. In addition, we take care of non-functional requirements like scalability, performance etc in this view.
UML: Activity Diagrams

Development View: This is a view for developers, understanding how the system will be implemented (also known as implementation view). How many components will be created and how will they interact with each other.
UML: Component Diagram, Package Diagram

Physical View:
This view explains how this system will be deployed physically. What kind of machines are there and how these are interacting.
UML: Deployment diagram

Scenarios: Scenarios or Use cases are given special attention. Because before getting into any other views, one needs to understand all the use cases we need to handle for the system being developed.

Well the paper explains about these views in details, so here I would like to add my understanding of how to use this model in agile development methodology.

Agile Perspective:
When you are building a software in agile manner, you are taking up one use case at a time, broken down in form of stories. Once you have sorted out what all use cases are you dealing with in current sprint or cycle, you can start by understanding logical view for these cases. Moving on to Process view then Development view and finally Physical view, case by case. So rather than creating the whole picture in one go, we will be creating our architecture as and when we are working on a particular use case.

A few basic statistics for Machine Learning

Machine learning itself is big area, but to get started one needs to be familiar with a few basic concepts of Statistics and Probability.

Here are a few basic concepts given a set of numbers say 12, 5, 7, 17, 22, 5, 10, 2, 5, 17, 2, 11

Mean- Given a set of N numbers, mean is average of these numbers. 115/12 = 9.58

Median- Sort the list, for even items, median is sum of the middle 2 numbers divided by 2. For odd items median is the middle number.
2, 2, 5, 5, 5, 7, 10, 11, 12, 17, 17, 22
7+10= 17/2=8.5

Mode- Number that appears most often in the list. In this case it is 5.

Range- Max number minus minimum number in the list: 22-2= 20

Variance- It is the measure of how the set of number is varying from the mean. This is calculated by finding difference of each number from mean and than squaring.

Taking a simple example in this case, say we have 3 numbers, 1, 2, 3 mean in this case would be 6/3 =2. So variance would be
(1-2)^2+(2-2)^2+(3-2)^2 / 3= 2 /3 = 0.67

Standard Deviation- The variance is figured by squaring the difference of mean and numbers in list. Standard deviation takes the square root of variance to reset the unit of original list. So in case variance was 0.67, standard deviation would be sqtr(0.67) or 0.81.

K-means clustering

Clustering, as the name suggest is simply grouping some random elements. Say, you are given points on a 2-D graph, you need to figure out some kind of relationship or pattern among them. For example you might have crime position in a city or Age vs Mobile Price graph for purchasing habit of an online website.

In Machine Learning, clustering is an important class of algorithms as it helps figure out a pattern in random set of data and helps take decision based on the outcome. For example, an online store needs to send a targeted marketing campaign for new Mobile phone launched. Clustering algorithm can help figure out buying patterns and single out customers who are most likely to buy the mobile phone launched.

There are many clustering algorithms, K-means being one of the simplest and highly used.

1. Place K points randomly on the graph. These will serve as initial center points or centroids of the K clusters.
2. Now assign each item (point) existing in the graph to one K clusters, based on its closeness (shortest distance from centroid) to the centroid of cluster
3. When all items are assigned a cluster, recalculate centroid of the cluster (point from where distance to all the items in cluster is minimum on an average).
4. Repeat step 2 and 3, until there is no scope of improvement (no movement in seen in centroids).

The above algorithm will divide all the items into K clusters. Another challenge will be to come up with optimum value of K. i.e. How many clusters can correctly or adequately represent the data. Well there are many ways to figure that, one of effective way is the elbow method. You start plotting number of clusters on X axis and Sum of squared errors on Y axis. Sum of squared errors is, you take each point in a cluster, find the distance from centroid and square it (to eliminate negative distances), this is the error for this particular point as ideally you want each point near to centroid of cluster to make it perfect. Then sum up all these errors and plot on graph. The resulting graph starts looking like an arm. We find out the elbow, that is, where we see the numbers getting flatter on X axis. This means any additional cluster is not adding too much value to the information we already have.

Java- Call by Reference vs Call by Value

In Java, when we send objects to a method, do they get passed by reference or by value?
Answer is, the reference of object is passed by value. This can be a bit tricky to absorb first. Let us try to understand. Reference in this case is a pointer (just to visualize, as we know there are no pointers in Java) to the object location.

Employee obj = new Employee();
sendToMethod(obj);

When we are sending this obj to a method, the object copy itself does not get sent neither the object reference. It is the copy of obj reference. So think of it like obj is stored at location “HB10”, this “HB10” gets copied and sent to method. Any changes directly made to the object gets reflected back to the calling method, but if reference itself is update, no changes get reflected.

private void sendToMethod(Employee obj)
{
obj.setSalary(100000);// gets reflected
obj = new Exployee(); // we removed reference, so no changes after this will be reflected in calling method.
obj.setDept(“IT”);// does not gets reflected.
}

A more elaborated example code

import java.util.ArrayList;

public class Test {

public static void main(String args[]) {
ArrayList iList = new ArrayList();
iList.add(1);
System.out.println(“\nPrint list Before”);
iList.stream().forEach(x->System.out.print(x+”, “)); // prints 1,
updateMe(iList);
System.out.println(“\nPrint list After”);
iList.stream().forEach(x->System.out.print(x+”, “)); // prints 1, 2,
}
public static void updateMe(ArrayList iList) {
ArrayList jList = new ArrayList();
iList.add(2); //this gets reflected
jList.add(3);
iList = jList; // reference changed, no changes after this are reflected.
iList.add(4);
jList.add(5);
}
}

Understanding Agile

I have heard many versions of people defining Agile development. I met someone who claimed that they use Agile development in their project. When asked how can they say they are doing Agile development, it was told that team meets in the morning everyday to discuss status and hence the project is using Agile development.

I have also heard people talking about Agile development in a manner that it will automatically solve all your development problem, they do not realize if not implemented properly, Agile development methodologies can actually backfire. Others confuse Scrum with Agile development, well they are not completely wrong, but one needs to realize Scrum is a Agile Framework like Kanban, Extreme Programming, SAFe etc.

Keeping it simple, lets take a quick look at Agile manifesto.
http://agilemanifesto.org/

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

And then 12 principles
http://agilemanifesto.org/principles.html

We are talking about focus on customer satisfaction, continuous delivery, accommodate changing requirements, frequent collaboration between Business and development teams, focus on people than on processes, more focus on face to face and frequent interactions, focus on working software, reflect on current processes frequently and self improvements.

One thing to note is Agile development methodologies or manifesto nowhere gives a series of steps or rules. These are guidelines, which one need to apply to ones project. This will definitely help but there is no silver bullet. There are Agile frameworks and methodologies as mentioned above, but you still need to figure out what will work for you rather than blindly following some document or advice of a so called expert. Best way is to look at as many methodologies and frameworks as possible, pick up the practices which you feel will work for you, and keep on validation after a short period what is working and what does not.

Technology Agnostic Design

When developing high level design for a solution, it is not a good idea to think about technology choices. We need to keep a gap between design and implementation.

You should not finalize at this point if the solution is going to be build in Java, Python, NodeJS or PHP at this time. For example, you just define that there will be a employee service to provide employee data, but which language will be used to build it, is not part of high level design.

You should take a call what kind of database is well suited, will it be a RDBMS or a document based database, but we should not take a decision which specific vendor’s database we are going to use at this point. For example, we will take a call that we will use RDBMS, but will it be Oracle, mySQL, postgres or some other vendor provided DB, we will decide when we will think about implementation details.

Similarly, all your vendor and technology decisions will not be part of high level design. The details should be filled at a later point, once you have finalized your high level design and made sure you have all the components, services and communications identified.

Why should we not think about technology choices while building high level design? Because it limits our design and solution. Because strengths and weaknesses of a technology becomes strength and weakness of our solution. Because strengths and weaknesses of technologies change over time, hence our architecture should be independent of that.

Once you bring in vendor and technology at a high level design, you commit too much to that. For example say XYZ RDBMS provider is currently the best in market as they provide fastest operations. You design your architecture around that, you use vendor specific data structures and data types. In future, if there is a vendor providing similar services at a cheaper price, we will figure out making a change is very costly as we have to change too much of code. If we would have thought of RDBMS as just a plug and play provide, we could have made this change easily. Infact, keeping our architecture technology agnostic will force us to think beyond a vendor. We will need to think ways of improving our database performance, think of better indexing, sharding, caching, making our architecture robust and independent of technology and vendors.

https://bigmedium.com/ideas/links/managing-technology-agnostic-design-systems.html

https://www.infoq.com/news/2007/09/technology-agnostic-soa

http://bradfrost.com/blog/post/managing-technology-agnostic-design-systems/

Checking postgres server logs

At times you might want to see what all queries had hit your database, specially while testing and analyzing a solution. Similar need occurred recently in a project using postgres database. One can simply set log_statement flag to ‘all’ in postgresql.conf

postgres@server:/etc/postgresql/9.5/main$ vi postgresql.conf

Look for

log_statement = ‘all’ # none, ddl, mod, all

You can also set log_destination in the same file. By default logs will be at

postgres@server:/var/log/postgresql$ ls

And finally restart postgres server

/etc/init.d/postgresql restart

Please note enable logs on production server is not recommended in normal scenarios as they consume a lot of space.

Node vs Java Comparison

A good read on Node vs Java https://rclayton.silvrback.com/speaking-intelligently-about-java-vs-node-performance. Explains about the way Java and Node code will work to handle load.

My personal opinion is that programming language is just a tool to deliver the final solution. Only in specialized cases, where we know that a particular language has proven history of solving similar kind of problems, hence has sample code, libraries, knowledge base, forums to help, we should consider that as part of solution. Otherwise non-technical reasons like client preferences, team’e expertise will come into picture.

SOLID Principles for object oriented design

There are many best practices and principles figured out by developers and architects for object oriented design. Robert Martin has intelligently put a subset of these good practices together, and gave them acronym SOLID which helps easy remembrance.

Single responsibility principle: A class should handle only one single responsibility and have only one reason for change. For example a class “Employee” should not change if there a change in project or some reporting details.

Open Closed principle: Code should be open for extension but closed for modification. If you want to add a new type of report in the system, you should not be changing any existing code. More here

Liskov substitution principle: “objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program.” So if we have Employee class, which is extended by Manager. We should be able to use Manager instead of Employee and all the Employee methods like calculate Salary, generate annual report etc should work without any issues. Say if there is an object like “ContractWorker” that does not support a few functions of Employee like annual report, one should be careful not to make it subtype of Employee.

Interface Segregation principle: “no client should be forced to depend on methods it does not use”. Coming back to previous example, if “ContractWorker” does not need to support annual report, we should not force it to implement an iEmployee interface. We should break the interfaces say iReport and iEmployee, iEmployee can extent iReport and iContractWorker should implement only iReport. iReport can further be divided into reporting types if required.

Dependency Inversion principle: This one seems to be one of my favorite as I have written about it here, here, here and here. This one indeed is one of the most important design patterns which can be followed to make the code loosely coupled and hence making it more maintainable (golden rule- low coupling + high cohesiveness). In traditional programming, when a high level method calls a low level method, it needs to be aware of the low level method at compile time, whereas using DI we can make high level method depend on an abstraction or interface and details of implementation will be provided at run time, hence giving us freedom to use which implementation to be used. Coming back to my previous example, I can have multiple implementations of Employee Reporting, iReport. Some implementation need and excel report, other might need a PDF reporting, which can be decided at runtime.