Sampling and Estimation

An important tool for decision-making is sampling and estimation. For any promotional campaign, new product launch, or marketing ad campaign, companies need to understand their customers' behavior. Sampling and estimation are powerful tools for analyzing customer data. For example, say a supermarket company wants to understand how profitable an online customer is versus an offline customer. Rather than going through the data for all customers, a random sample is taken and analyzed.

Sample Data snapshot

Data Analysis: Let’s say we are able to calculate the following data

Sample Size: 14998
Profit for Sample: 1665295
Average: 111.03
Standard Deviation: 275.30

Similarly, we can find the same statistics for online vs offline customers

Online Customers

Sample Size: 3830
Profit for Sample: 448509
Average: 117.13
Standard Deviation: 283.91

Offline Customers

Sample Size: 11168
Profit for Sample: 1216786
Average: 108.94
Standard Deviation: 272.27
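
These summary statistics are easy to reproduce in code if the raw sample lives in a CSV export. Below is a minimal pandas sketch; the file name customer_sample.csv and the column names Profit and Channel are hypothetical stand-ins for whatever the actual sheet uses.

import pandas as pd

# Hypothetical file and column names; adjust to the actual export.
df = pd.read_csv("customer_sample.csv")

overall = df["Profit"].agg(["count", "sum", "mean", "std"])
by_channel = df.groupby("Channel")["Profit"].agg(["count", "sum", "mean", "std"])

print(overall)     # sample size, total profit, average, standard deviation
print(by_channel)  # the same statistics split into online vs offline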

Point Estimates vs Interval Estimates

The estimate above is a form of point estimate: we compute a single average and use it as an estimator for the population. For example, our average profit estimate for online customers is 117.13, but it is highly unlikely that the true population average is exactly 117.13. So we instead find a range, or interval, in which the average profit is likely to fall. An important aspect of such an analysis is how confident we are in our range. Normally such an analysis is done at confidence levels of 90%, 95% (the most commonly used), and 99%.

90% Confidence Interval (CI) = (sample mean – 1.645*SD/√sample size, sample mean + 1.645*SD/√sample size)

95% Confidence Interval = (sample mean – 1.96*SD/√sample size, sample mean + 1.96*SD/√sample size)

99% Confidence Interval = (sample mean – 2.576*SD/√sample size, sample mean + 2.576*SD/√sample size)

Let’s solve for the online customers’ CI at the 95% level, using the online sample’s standard deviation (283.91):

117.13 – 1.96*283.91/√3830 , 117.13 + 1.96*283.91/√3830

108.14, 126.12
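
The same z-interval is easy to compute in code. Here is a small Python sketch of the formulas above, applied to the online sample:

from math import sqrt

def confidence_interval(mean, sd, n, z=1.96):
    # z = 1.645 for 90%, 1.96 for 95%, 2.576 for 99%
    margin = z * sd / sqrt(n)
    return (mean - margin, mean + margin)

# Online customers at the 95% level (numbers from the sample above)
low, high = confidence_interval(117.13, 283.91, 3830)
print(f"95% CI: ({low:.2f}, {high:.2f})")   # roughly (108.14, 126.12)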

Simple Regression Analysis

An important tool for analysis is simple regression, where we try to predict a dependent variable based on the value of the independent variable. The equation would look like

y = mx + c

You might recognize this equation as the equation of a line in the 2D plane. So basically we plot all the points on the 2D plane and try to figure out a pattern.

y is the dependent variable
x is the independent variable
m is slope
c is intercept

To solve this, Microsoft Excel provides us with an off-the-shelf Regression tool.

Go to Data -> Data Analysis -> Select Regression -> For Y Range choose the Profit column -> For X Range choose the Online/Offline column -> Select the Labels checkbox since we have included the header row -> Leave the confidence level at its default and press OK

Simple Regression Output

The important values to note here are the coefficients, which we can substitute into our equation

y = mx + c
c= 108.94
m = 8.19

In our case, x represents online vs offline, encoded as 1 or 0.

So for online customers (x = 1), the equation resolves to

Profit for online customers = 8.19 * 1 + 108.94 = 117.13

Profit for offline customers = 8.19 * 0 + 108.94 = 108.94

This is in sync with our earlier calculations.
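
If you prefer code to Excel's Regression tool, the same fit can be sketched in Python with statsmodels. The x and y arrays below are toy stand-ins for the actual sample, so the coefficients will differ; the structure of the output is the point.

import numpy as np
import statsmodels.api as sm

# Toy data: x is the 0/1 online indicator, y the per-customer profit.
x = np.array([1, 0, 1, 0, 0, 1])
y = np.array([120.0, 95.0, 130.0, 110.0, 100.0, 115.0])

X = sm.add_constant(x)               # adds the intercept term c
results = sm.OLS(y, X).fit()         # ordinary least squares: y = m*x + c

c, m = results.params
print(f"intercept c = {c:.2f}, slope m = {m:.2f}")
print("p-values:", results.pvalues)  # large p-value -> weak evidence for the term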

An important value to note here is the p-value. A higher p-value indicates a higher probability of error in the analysis. We define a threshold α (alpha), and we keep the p-value below this threshold. I will discuss the hypothesis threshold α later, but for now we can say that a p-value around 0.11 (i.e. 11%) is a very high probability of error. So this raises the question: can we actually associate profit with the online/offline parameter, or are there other parameters playing a role?

To understand this, let’s introduce another factor, “age”, into the calculation and solve using multiple regression.

Multiple Regression Analysis

Let’s say we introduce the age data into our Excel sheet; the data is not absolute age but a range id (as when a form offers ranges 10-18, 19-24, and so on).

Sample Data

We will repeat the steps to calculate the regression, except that this time we will select both the Online and Age columns for the X range.

Multiple Regression

Once calculated, we will see values like

Multiple Regression

As we can see, the p-values this time are very low, so we can trust our analysis. We have multiple independent variables, so our equation will look like

y = m1x1 + m2x2 + c

or Profit = 27.181 * online + 25.85 * age + 17.080

We can substitute values and find the profit. For example, to find the profit for young people (age group 1):

Profit Online = 27.181 * 1 + 25.85 * 1 + 17.080 = 70.11

Profit Offline = 27.181 * 0 + 25.85 * 1 + 17.080 = 42.93

We can calculate the profit for other age groups as well. We can conclude that both age and mode (online/offline) play a role in profit.
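
The multiple regression can be reproduced the same way; we simply pass both predictors. Again, the arrays are toy stand-ins, not the real sheet.

import numpy as np
import statsmodels.api as sm

# Toy data: the 0/1 online flag and the age-range id per customer.
online = np.array([1, 0, 1, 0, 1, 0, 1, 0])
age    = np.array([1, 1, 2, 2, 3, 3, 4, 4])
profit = np.array([71.0, 42.0, 95.0, 70.0, 123.0, 94.0, 149.0, 120.0])

X = sm.add_constant(np.column_stack([online, age]))  # columns: c, m1, m2
results = sm.OLS(profit, X).fit()

c, m1, m2 = results.params           # Profit = m1*online + m2*age + c
print(f"Profit = {m1:.2f}*online + {m2:.2f}*age + {c:.2f}")
print("p-values:", results.pvalues)  # low p-values -> the terms matter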

Decision Trees for Decision Making

We have talked about the basics of decision making and sensitivity analysis. Next, we will look into the usefulness of decision trees in the process of decision making.

We will go back to our previous example, where we are choosing between 2 product prototypes and have probabilities and outcomes available. We will represent the data in the form of a decision tree and solve the problem.

Before getting into the problem, we need to understand the basic constituents of a decision tree.

Circular nodes: These are outcome (chance) nodes; to make the decision, we calculate each node's value from the values and probabilities of the available outcomes.

Square nodes: These are decision nodes, which show the various options available; here we choose the best one.

While drawing the decision tree, we go from left to right, but when solving the tree, we go from right to left, calculating one layer at a time. Let’s go back to our example and see the decision tree in action.

Decision Tree

We start by creating the tree, listing all the available options and their outcomes should an option be chosen. Then we solve from right to left, updating the values of the circular outcome nodes (values in red). We then move one step backward, and at the decision node the best option is chosen; the sketch below walks through the same calculation.
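
To make the right-to-left procedure concrete, here is a minimal Python sketch. It represents the tree as nested dicts, collapses each circular node to its expected value, and lets the square node pick the best option; the payoffs and 50-50 probabilities are the ones from our earlier example.

def solve(node):
    if node["type"] == "payoff":    # leaf: just return the payoff
        return node["value"]
    if node["type"] == "outcome":   # circular node: expected value
        return sum(p * solve(child) for p, child in node["branches"])
    # square node: value of the best available option
    return max(solve(child) for child in node["options"].values())

tree = {
    "type": "decision",
    "options": {
        "Prototype 1": {"type": "outcome", "branches": [
            (0.5, {"type": "payoff", "value": 200_000}),
            (0.5, {"type": "payoff", "value": -180_000}),
        ]},
        "Prototype 2": {"type": "outcome", "branches": [
            (0.5, {"type": "payoff", "value": 100_000}),
            (0.5, {"type": "payoff", "value": -20_000}),
        ]},
        "Do nothing": {"type": "outcome", "branches": [
            (1.0, {"type": "payoff", "value": 0}),
        ]},
    },
}

print(solve(tree))  # 40000.0, the value of the best option (Prototype 2)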

Sensitivity Analysis in Decision Making

A few days back I wrote about the basics of decision making. Next, we will look into Sensitivity analysis.

Sensitivity Analysis examines how our decision might change with different input data.

We will start with our previous example, where a company is trying to launch a product and they have the following options right now.

Alternative           Success Outcome   Failure Outcome
Go with prototype 1   200,000           -180,000
Go with prototype 2   100,000           -20,000
Do nothing            0                 0

Decision/Payoff Table

Let us say

P = Probability of a favorable market, i.e. success

(1-P) = Probability of an unfavorable market, i.e. failure

Sensitivity Analysis

EMV Prototype 1 = 200000P – 180000(1-P)
= 380000P – 180000

EMV Prototype 2 = 100000P – 20000(1-P)
= 120000P – 20000

EMV Do nothing = 0P – 0(1-P) = 0

sensitivity analysis

Point 1

EMV Do nothing = EMV Prototype 2
0 = 120000P – 20000
P = 20000/120000
P = 0.167

Point 2

EMV Prototype 2 = EMV Prototype 1
120000P – 20000 = 380000P – 180000
P = 160000/260000
P = 0.615

So based on the sensitivity analysis, we can conclude, based on the probability of success (a favorable market) P, that

Do nothing if P < 0.167
Go for Prototype 2 if P >= 0.167 and P < 0.615
Go for Prototype 1 if P >= 0.615
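
A short sketch that sweeps P and picks the alternative with the highest EMV confirms these bands numerically:

def emvs(p):
    return {
        "Do nothing":  0.0,
        "Prototype 2": 120_000 * p - 20_000,
        "Prototype 1": 380_000 * p - 180_000,
    }

for p in [0.10, 0.20, 0.40, 0.70, 0.90]:
    values = emvs(p)
    best = max(values, key=values.get)
    print(f"P = {p:.2f}: best option = {best}")
# P < 0.167 -> Do nothing; 0.167 <= P < 0.615 -> Prototype 2; P >= 0.615 -> Prototype 1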

Decision Making – Fundamentals

What is a good decision?

A good decision is based on logic, considers all available data and possible alternatives, and is obtained through rational analysis of that data and those alternatives.

Does a good decision always result in favorable outcome?

No. Remember, at times good decisions can fail and bad decisions can be a success.

Steps in Decision making

  1. Clearly define the problem: Do we understand the problem, or are we stuck at symptoms?
  2. List the possible alternatives: What options are available? Doing nothing is also an alternative.
  3. Identify the possible outcomes (or states of nature): What outcomes are possible for the alternatives figured out in step 2? Identify each outcome, positive or negative.
  4. List the payoff or profit of each combination of alternatives and outcomes: Create a matrix for each alternative + outcome combination, and figure out payoff.
  5. Select one of the decision theory models
  6. Apply the model and make your decision

Let’s say a book company is planning to launch an ebook reader (like the Kindle). They have 2 prototypes currently in R&D.

Problem statement: Come up with an ebook reader which can boost sales of ebooks.

  • Alternative 1: Launch Prototype 1
  • Alternative 2: Launch Prototype 2
  • Alternative 3: Do not launch a product

Now say for each alternative we can have various outcomes

  • Huge Success (Sales above 100K in a quarter)
  • Moderate Success
  • Failure

After this we will analyze the payoff of each combination of alternative and outcome, for example

Alternative 1 (Prototype 1) + Outcome 1 (Huge Success) = Payoff (Profit 200K, selling 100K readers)

Similarly, a matrix is created for each possible combination.

Before making a decision, one needs to take into account the Risk-taking ability of the person or organization. We can divide risk nature into

  • Risk Averse
  • Risk Neutral
  • Risk Lovers

Also, risk appetite changes with the amount involved; for example, for someone earning 100K, a risk of 1K is low, but the same risk becomes high when it involves 200K.

In addition, one also needs to take Decision making environment into consideration

Decision making Environments

  1. Decision making under certainty: Decision-maker knows with certainty the consequences of every alternative
  2. Decision making under uncertainty: decision-maker does not know probabilities of various outcomes
  3. Decision-making under risk: Decision-maker knows the probabilities of various outcomes.

In short, when a company needs to make a decision, it starts from a decision-under-uncertainty position and tries to move to decision under risk by associating probabilities with the outcomes based on past experience or market research. When the probability is exactly 1 or 0, it is a decision under certainty, which is almost never the case.

Let’s go back to our previous example, and make it simple with just 2 outcomes, and based on past experience company can predict a 50-50 chance of success or failure.

Alternative           Success Outcome   Failure Outcome
Go with prototype 1   200,000           -180,000
Go with prototype 2   100,000           -20,000
Do nothing            0                 0

Decision/Payoff Table

So, treating this as a decision-under-risk scenario, we use a popular method called Expected Monetary Value (EMV) to evaluate the alternatives.

EMV or Expected Monetary Value (alternative i) = (payoff of first outcome) * (probability of first outcome) + (payoff of second outcome) * (probability of second outcome) + ….. + (payoff of Nth outcome) * (probability of Nth outcome)

Going back to our use case, we can say

EMV for prototype 1: (0.5)*(200,000) + (0.5)*(-180,000) = 10,000

EMV for prototype 2: (0.5)*(100,000) + (0.5)*(-20,000) = 40,000

EMV for Do nothing: (0.5)*(0) + (0.5)*(0) = 0

So based on our analysis, we can see prototype 2 has the largest EMV and is the best option to go with under the current circumstances.
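
The EMV formula above translates directly into a small helper function; a minimal Python sketch:

def emv(payoffs, probabilities):
    # Expected Monetary Value: sum of payoff_i * probability_i
    assert abs(sum(probabilities) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(payoffs, probabilities))

print(emv([200_000, -180_000], [0.5, 0.5]))  # Prototype 1 -> 10000.0
print(emv([100_000, -20_000],  [0.5, 0.5]))  # Prototype 2 -> 40000.0
print(emv([0, 0],              [0.5, 0.5]))  # Do nothing  -> 0.0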

GraphQL- Security

We have covered GraphQL Basics, GraphQL Schema, and GraphQL Architecture. Another important aspect one needs to consider is security. Here we will talk about some basic techniques that can be used to implement GraphQL security.

Timeouts: The first and most basic strategy is to implement timeouts. They are easy to implement at the server level and can save one from malformed, complex, and time-consuming queries.

Maximum Query Depth: It is easy for a client to create a complex query with deeply nested, or at times cyclic, relations. One needs to set a limit on the maximum depth we support.

query{
   me{ #Depth 1
      friend{ #Depth 2
         friend{ #Depth 3
            friend{ #Depth 4
               #this could go on
            }
         }
      }
   }
}
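
How a server enforces such a limit depends on the GraphQL library in use, but the core check is a simple recursion over the parsed query. Below is a toy Python sketch that assumes the selection set has already been parsed into nested dicts; a real server would hook equivalent logic into its library's validation phase.

MAX_DEPTH = 3

def depth(selection):
    if not selection:               # leaf field, no subselections
        return 0
    return 1 + max(depth(child) for child in selection.values())

# The query from above, as nested field -> subselection dicts (depth 4)
query = {"me": {"friend": {"friend": {"friend": {}}}}}

if depth(query) > MAX_DEPTH:
    raise ValueError(f"query depth {depth(query)} exceeds limit {MAX_DEPTH}")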

Complexity: Another way to control query execution is to have a complexity limit for the queries that can be executed. By default, every field is given a complexity of 1.

query {
  author(id: "abc") {  # complexity: 1
    posts {            # complexity: 1
      title            # complexity: 1
    }
  }
}

The above query will fail if we set the max complexity for the schema to 2, since its total complexity is 3. We can also override the default complexity for a field; for example, if we feel the posts field should have a complexity of 5, we can set that.

Throttling: Another way to keep clients from overloading the server is throttling. GraphQL servers normally use two types of throttling: server-time based and complexity based. In server-time-based throttling, each client is given a budget of server time it can use, mostly based on the leaky-bucket strategy, where budget is added back while the client is not using the server. Complexity-based throttling imposes a limit on the maximum complexity a client can execute; for example, if the limit is 10 and a client sends 4 queries with complexity 3 each, one of them would be rejected.
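
Here is a toy Python sketch of complexity-based throttling in the leaky-bucket style described above; the capacity and refill rate are illustrative, not prescribed values.

import time

class ComplexityThrottle:
    def __init__(self, capacity=10, refill_per_second=1.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.available = capacity
        self.last = time.monotonic()

    def allow(self, query_complexity):
        now = time.monotonic()
        # budget refills while the client is not using the server
        elapsed = now - self.last
        self.available = min(self.capacity,
                             self.available + elapsed * self.refill_per_second)
        self.last = now
        if query_complexity <= self.available:
            self.available -= query_complexity
            return True
        return False

throttle = ComplexityThrottle(capacity=10)
print([throttle.allow(3) for _ in range(4)])  # [True, True, True, False]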

Disclaimer: This post was originally posted by me in the cloud community – https://cloudyforsure.com/graphql/graphql-security/

GraphQL- Architecture

We have already talked about GraphQL basics and GraphQL schema. As a next step, we will look into GraphQL Architectural patterns to implement a GraphQL based solution.

Before moving ahead, it makes sense that we understand that GraphQL itself is nothing but a specification – http://spec.graphql.org/draft/. One can implement the specification in any language of choice.

Architecture 1: Direct database access

In the first architectural pattern to implement GraphQL, we have a simple GraphQL server setup, which directly accesses the database and returns required data.

GraphQL server with a connected database
image source: https://www.howtographql.com/basics/3-big-picture/

As we can see, this type of implementation is mostly suited to fresh (greenfield) development: when we decide while setting up the system that we want to support GraphQL-based access, we build the system with first-hand support for it.

Architecture 2: Support for existing systems

More often, we come across scenarios where we will need to provide support for existing systems, which are usually built with support for REST and microservices-based access to existing data.

GraphQL layer that integrates existing systems
image source: https://www.howtographql.com/basics/3-big-picture/

The pattern above indicates an additional layer between the actual backend implementation and the client. The client makes a call to the GraphQL server, which in turn connects to actual backend services and gets the required data.

Architecture 3: Hybrid Model

We have talked about 2 patterns so far: one where the GraphQL server has direct database access, and a second where the GraphQL server fetches data from an existing legacy system. There can be a use case where partial implementation is done fresh and some data is being fetched from existing APIs. In such a use case, one can implement a hybrid model.

Hybrid approach with connected database and integration of existing system
image source: https://www.howtographql.com/basics/3-big-picture/

Resolver Functions

The discussion about the various types of GraphQL implementation is not complete without talking about resolver functions. A resolver function is responsible for mapping a query to the implementation, that is, the actual fetching of data. So all the above-mentioned architectures ultimately come down to how the GraphQL resolver functions are written to resolve queries and fetch data.
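
As a concrete illustration, here is a minimal Python sketch of a resolver, following the (parent, info, **args) convention used by Python GraphQL libraries such as graphql-core. The FAKE_DB dict and the person query shape are hypothetical; in architecture 1 the body would query a database, in architecture 2 it would call existing REST services.

# Hypothetical data source standing in for a database or REST backend
FAKE_DB = {
    "abc": {"id": "abc", "name": "Jane", "age": 32},
}

def resolve_person(parent, info, id):
    # Maps a `person(id: ...)` query to the actual data fetch;
    # this lookup could equally be an SQL query or an HTTP call.
    return FAKE_DB.get(id)

print(resolve_person(None, None, id="abc"))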

Disclaimer: This post was originally posted by me in the cloud community – https://cloudyforsure.com/graphql/graphql-architecture/

GraphQL- Schema

In a previous post, I talked about GraphQL basics and how it can help one simplify fetching data for a service over REST. Here, we will take the next step and understand the concept of schema in GraphQL.

When starting to code a GraphQL backend, the first thing one needs to define is a schema. GraphQL provides us with the Schema Definition Language, or SDL, which is used to define the schema.

Let’s define a simple entity

type Person {
  id: ID!
  name: String!
  age: Int!
}

Here we are defining a simple Person model, which has three fields: id, name, which is a string, and age, which is an integer. The “!” mark indicates a required field.

Just defining the Person model does not expose any functionality. To expose functionality, GraphQL provides us with three root types: Query, Mutation, and Subscription.

type Query {
   person(id: ID!): Person
}

The query mentioned above lets a client send an ID and get a Person object in return.

Next, we have mutations, which help one implement the remaining CRUD operations: create, update, and delete.

type Mutation {
  createPerson(name: String!, age: Int!): Person!
  updatePerson(id: ID!, name: String!, age: Int!): Person!
  deletePerson(id: ID!): Person!
}

Finally, one can define subscriptions, which make sure that whenever an event like the creation, update, or deletion of an object happens, the server sends a message to the subscribing clients.

type Subscription {
  newPerson: Person!
  updatedPerson: Person!
  deletedPerson: Person!
}

Disclaimer: This post was originally posted by me in the cloud community – https://cloudyforsure.com/graphql/graphql-schema/

GraphQL- Getting started

GraphQL is a query language for your APIs. It helps us write simple, intuitive, precise queries to fetch data, and it simplifies the way we communicate with APIs.

Let’s take an example API: say we have a products API. In the REST world you would have the API defined like

to get all product details

GET: http://path.to.site/api/products

or

to get single product details

GET: http://path.to.site/api/products/{id}

Now, let’s say Product has a schema like

Product

  • id: ID
  • name: String
  • description: String
  • type: String
  • category: String
  • color: String
  • height: Int
  • width: Int
  • weight: Float

Now by default, the REST API will return all the parameters associated with the entity. But say a client is only interested in the name and description of a product. GraphQL helps us write a simple query here.

query 
{
    product(id:5){
         name
         description
    }
}

This will return a response like

{
    "data": { 
        "product": {
           "name": "Red Shirt"
           "description": "Best cotton shirt available"
        }
    }
}

This looks simple, so let’s make it a bit complicated. Say a product is associated with another entity called reviews. This entity manages product reviews by customers.

Reviews

  • id: ID
  • title: String
  • description: String
  • username: String
  • isVerified: Boolean
  • likesCount: Int
  • commentsCount: Int
  • comments: CommentAssociation

Now you can imagine that in the case of a REST API, the client needs to call an additional API, say

http://path.to.site/api/products/{id}/reviews/

With GraphQL, we can achieve this with a single query

query 
{
    product(id:5){
         name
         description
         reviews{
             title
             description 
         }
    }
}

This will return a response like

{
    "data": { 
        "product": {
           "name": "Red Shirt",
           "description": "Best cotton shirt available",
           "reviews": [
               {
                    "title": "Good Shirt",
                    "description": "Liked the shirt's color" 
               },
               {
                    "title": "Awesome Shirt",
                    "description": "Got it on sale"
               },
               {
                    "title": "Waste of Money",
                    "description": "Clothing is not good"
               }
           ]  
        }
    }
}

We can see that GraphQL solves multiple problems that exist with REST APIs. Two major problems are evident in the examples above.

Overfetching: We can explicitly mention all the data we need from an API. So even if a Product model has 50 fields, we can specify the 4 fields we need as part of the response.

Multiple calls: Instead of making separate calls to Products and Product reviews, we could make a single query call. So instead of calling 5 different APIs and then clubbing the data on the client side, we can use GraphQL to merge the data into a single query.

Why Graph?

As we can see in the above examples, GraphQL helps us look at data in the form of a connected graph. Instead of treating each entity as independent, we recognize the fact that these entities are related, for example, a product and its reviews.

Disclaimer: This post was originally posted by me in the cloud community – https://cloudyforsure.com/graphql/graphql-getting-started/

Business, Government and Society

Normally one only considers market forces that impact business decisions. But there are some important nonmarket forces like government and society that impact business. Hence, every business needs a market strategy as well as a nonmarket strategy to be successful.

Three institutions for the allocation of resources

  • Market (Private Sector): Capitalist system operates on the basis of private property, voluntary exchange, competition, and the profit motive.
  • Government (Public Sector): Centralized mechanism in which government provides public services to citizens on the basis of criteria other than the ability to pay.
  • Community: Decentralized mechanism for allocating resources where the purpose is to meet certain categories of need rather than to make a profit.

Globalization and De-Globalization

Globalization is when local markets open up to global markets, which impacts a country's imports and exports. In certain cases, though, the government makes decisions based on national priorities over globalization, giving rise to de-globalization.

Business Environments: Internal vs External

Every business has to deal with external and internal environments. Internal factors are those the business has control over, for example, its policies, production units, etc. The external environment is the one where the business has little or no control, like government policies, cultural differences between countries, laws in different countries, and so on.

Business flow: Inputs -> Production -> Outputs

Inputs can be land, labor, raw materials, etc. A lot depends on external factors in this case.

Production is the process of converting the inputs into outputs. This is the firm’s internal environment which it can control.

Outputs are what a firm generates in the form of products and services, which it supplies to other firms, the government, and end customers; again, external factors play a role.

The Market Economy

source: https://saylordotorg.github.io/text_macroeconomics-theory-through-applications/s07-03-the-circular-flow-of-income.html

The image above shows the financial flows in a market economy. Households are engaged with firms by working in them and earning wages, investing in firms and earning dividends, renting out land or resources for income, or directly owning firms and making profits. Households in turn buy from the firms, adding to the firms' income.

The government purchases from firms, adding to their revenue. Additionally, the government collects taxes from firms and households, which adds to its income. The government also gives back to households through various schemes, like policies for people below the poverty line.

Firms and households also interact with the rest of the world through imports and exports. This too depends on various government policies.

Market and Non-Market environments in business

The Nonmarket Environment of Business
image source: https://sloanreview.mit.edu/article/what-every-ceo-needs-to-know-about-nonmarket-strategy/

Information about the nonmarket environment is important for any firm. One needs to understand what citizens think about the product, what the media is talking about, what government policies are in place, what issues NGOs are raising, and so on. Firms need to engage and form coalitions with these units, and they need to deal with the uncertainty around these factors.

The market environment includes direct interactions between firms, suppliers, and customers that involve voluntary economic transactions.

The nonmarket environment is composed of social, political, and legal arrangements and interactions between the firm and individuals, interest groups, and government entities.

Nonmarket Strategy

In short, the senior management of any business needs to understand that businesses are not only economic agents but also social and political beings. Firms are impacted by many actors: laws and regulations, social pressure, activism, and public perception. So any business management needs to make sure to form a nonmarket strategy along with its market strategy.

Istio- Getting started

I recently wrote about service mesh and how it helps ease managing and deploying services. Istio is an open-source service mesh that layers transparently onto existing distributed applications. Istio helps create a network of deployed services with load balancing, service-to-service authentication, and monitoring, with no additional coding required. Istio deploys a sidecar for each service, which provides features like canary deployments, fault injection, and circuit breakers off the shelf.

Let’s take a look at Istio at a high level

The overall architecture of an Istio-based application.
Image source: https://istio.io/latest/docs/concepts/what-is-istio/

Data Plane: As we can see in the design above, the data plane uses sidecars (Envoy proxies) to manage traffic between microservices.

Control Plane: The control plane provides centralized control over the network infrastructure, implementing policies and traffic rules.

Control plane functionality (https://istio.io/latest/docs/concepts/what-is-istio/)

Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic.

Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault injection.

A pluggable policy layer and configuration API supporting access controls, rate limits and quotas.

Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress.

Secure service-to-service communication in a cluster with strong identity-based authentication and authorization.

Currently Istio is supported on

  • Service deployment on Kubernetes
  • Services registered with Consul
  • Services running on individual virtual machines

Let’s take a look at the core features provided by Istio.

Traffic management

Istio helps us manage traffic by implementing circuit breakers, timeouts, and retries, and helps us with A/B testing, canary rollouts, and staged rollouts with percentage-based traffic splits.

Security

Another core area where Istio helps is security. It helps with authentication, authorization, and encryption off the shelf.

While Istio is platform-independent, using it with Kubernetes (or infrastructure) network policies, the benefits are even greater, including the ability to secure pod-to-pod or service-to-service communication at the network and application layers.

https://istio.io/latest/docs/concepts/what-is-istio/#security

Observability

Another important aspect Istio helps with is observability. It helps manage tracing, monitoring, and logging. Additionally, it integrates with dashboards like Kiali, Grafana, and Jaeger to help visualize traffic patterns and manage services.

Additional Resources –

https://istio.io/latest/docs/

https://dzone.com/articles/metadata-management-in-big-data-systems-a-complete-1

Disclaimer: This post was originally posted by me in the cloud community – https://cloudyforsure.com/cloud-computing/istio-getting-started/