Category Archives: Cloud Computing

Cloud Native Application Design – Compute

Cloud-native application design rests on three core pillar decisions: Compute, Database, and Storage. Most cloud providers offer multiple options for each.

When starting with application design, the first decision the team needs to make is how and where the application will be deployed.

Virtual Machine: The age-old classic option is to go for a virtual machine, where the development team takes complete control over setting up servers, managing virtual machines, managing the health of machines, scalability, etc. Examples are AWS EC2, Azure Virtual Machines, and Google Compute Engine.

  • Pros– Easiest to get started with, gives the most control over the setup.
  • Cons– The dev team is responsible for managing the machines; the least cost-effective option.

Containers: Lightweight containers are a natural fit for a microservices-based design. A container orchestrator like Kubernetes gives off-the-shelf solutions for managing the health and scalability of a container-based deployment. Most service providers offer managed options such as Azure Kubernetes Service or Amazon Elastic Kubernetes Service.

  • Pros: Best suited for microservice-based solutions; lightweight, hence cost-effective and easy to manage.
  • Cons: Learning curve to ensure correct use of tools like Docker and Kubernetes.

Functions: The next deployment option is Functions as a Service (FaaS). Again a very good fit for microservices, it gives you on-demand execution of code. Service providers give us options like Azure Functions or AWS Lambda to build our code as functions.

  • Pros: Can be cost-effective with a pay-per-execution model, as you only pay for execution time.
  • Cons: Does not fit all use cases; not all scenarios are supported (most vendors have a limit on execution time); and vendor lock-in.
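To make the FaaS model concrete, the sketch below uses the handler shape AWS Lambda expects for Python code behind an API Gateway proxy integration (a function receiving an event and context, returning a status code and body). The query-string field and message are made up for illustration.

```python
import json

def handler(event, context):
    # The platform invokes this function on demand; you pay only for the
    # time this body takes to run.
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

Locally, the same function can be called directly for testing, e.g. `handler({"queryStringParameters": {"name": "cloud"}}, None)`.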

Specialized options: Apart from the options mentioned above, most service providers give you specialized options; for example, Azure gives you App Service and AWS has Elastic Beanstalk, which help you deploy popular technologies like Java, Python, etc. directly.

  • Pros: Being managed services, they free the dev team from managing the underlying infra and let them focus on development. They also provide features like monitoring and scaling off the shelf.
  • Cons: Learning curve to understand the framework, and vendor lock-in, as the deployable is specific to the vendor.

Cloud Native Application Design – Load Balancing

Load Balancing is an important technique in cloud-native application design to achieve scalability, reliability, and availability. The load can be distributed among nodes (physical machines or containers) based on rules like round robin, weighted, performance-based, geographical distribution, etc.

Load Balancing can be achieved at the following levels

DNS Level: DNS-level load balancing distributes incoming network traffic across multiple servers or IP addresses by having DNS (Domain Name System) servers resolve domain names to different IP addresses. You can choose distribution rules based on need; for example, you might want traffic originating from Europe to hit European servers, whereas traffic from North America hits North American servers. While resolving the DNS query, the traffic manager chooses the backend endpoint based on the rules set.

Layer 7 or Application Layer: In Layer 7 load balancing, the load balancer analyzes the content of incoming requests, including the HTTP headers, URLs, and other application-specific data, to determine how to distribute the traffic. For example, we can set rules so that requests matching the /images pattern are routed to one backend, whereas /videos requests go to another. Additionally, one can add features like SSL termination and a WAF (Web Application Firewall, which protects from threats like SQL injection attacks, Cross-Site Scripting or XSS attacks, etc.).

Layer 4 or Transport Layer: Layer 4 load balancers can route traffic based on basic criteria such as source IP address, destination IP address, source port, destination port, and protocol type. At the transport layer, the load balancer does not have access to request data, hence decisions can only be taken at the IP or port level. At the same time, as no parsing of request content is involved, the overall performance is better.
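Two of the distribution rules mentioned above, round robin and weighted, can be sketched in a few lines of Python. The backend names and weights are made up for illustration; real load balancers implement these rules in the data plane, not in application code.

```python
import itertools
import random

BACKENDS = ["app-1", "app-2", "app-3"]

# Round robin: cycle through the backends in a fixed order.
_rr = itertools.cycle(BACKENDS)
def round_robin():
    return next(_rr)

# Weighted: pick a backend with probability proportional to its weight,
# e.g. to send more traffic to larger nodes.
WEIGHTS = {"app-1": 5, "app-2": 3, "app-3": 2}
def weighted():
    return random.choices(list(WEIGHTS), weights=WEIGHTS.values(), k=1)[0]
```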


Cloud Native Application Design – Data Security

Security in the cloud can be broadly categorized at the following three levels

  • Infrastructure Security
  • Application Security
  • Data Security

Data Security

  • Encrypt data at rest and in transit: Cloud service providers provide mechanisms to secure your data at rest and in transit. Encryption is one big tool in your arsenal; for example, the simple step of using HTTPS instead of HTTP ensures your data is encrypted and secured while in transit. Similarly, most cloud service providers offer encryption for disks and databases to help secure data at rest.
  • Data type-specific security: There will also be requirements specific to the type of data you are storing. For example, if you are storing healthcare-related data, you will need to understand HIPAA (Health Insurance Portability and Accountability Act) requirements; for finance-related data you might want to check the PCI DSS (Payment Card Industry Data Security Standard). There may also be region-specific needs, like GDPR (General Data Protection Regulation) for personal data in Europe.
  • Avoid weak encryption: Though most cloud service providers give us options to encrypt our data, filesystems, and disks, it is the responsibility of the architect to make sure strong encryption is implemented. Key vault services can help store encryption keys and avoid manual key handling. Also, all your APIs and pages dealing with important data should use the HTTPS protocol.
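The "encrypt in transit, avoid weak encryption" points can be illustrated from the client side with Python's standard `ssl` module: build a TLS context that verifies certificates and refuses outdated protocol versions, then use it for any HTTPS connection. This is a minimal sketch of the idea, not a complete hardening guide.

```python
import ssl

# create_default_context() already enables certificate and hostname
# verification; we additionally pin a modern protocol floor so weak
# TLS versions are refused outright.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

assert context.verify_mode == ssl.CERT_REQUIRED  # certificates are checked
assert context.check_hostname is True            # hostnames are checked
```

This context can then be passed to, for example, `urllib.request.urlopen(url, context=context)`.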

Cloud Native Application Design – Application Security

Application Security: When deploying your application in the public cloud, you need to take all precautions to safeguard it from unauthorized access and attacks.

  • Infrastructure as Code: Avoid creating and configuring resources manually; use tools like Terraform, Ansible, or cloud-specific options to define infrastructure as code.
  • No direct access: If a resource is not needed to be available externally, make sure all access is blocked. For example, if a database is to be accessed only by a microservice, give access only to that microservice and block all other access.
  • Automated Deployment: Deployments should not be done by manually placing deliverables on target machines; automate the process via continuous delivery pipelines.
  • Layered Security Approach: When implementing security, most cloud service providers encourage a layered approach. That is, implement security rules at different layers like a load balancer, application server, application code, database, and so on. So that even in case one layer is compromised, your core application and data are still secured.
  • API Security (Authentication / Authorization): All APIs should be behind proper authentication and authorization. Note that a service accessed from the internet will have different security than a service that can only be accessed internally.
  • Common Application Threats: Common attacks like Code Injections, SQL Injections, and Cross-Site Scripting (XSS) can be targeted toward the application. It is the responsibility of the architect and development team to make sure best practices are followed while writing the code to tackle these attacks.
  • Perimeter Layer Attacks: DDoS, or Distributed Denial of Service, is a common attack used to bring an application down. Most cloud service providers give you out-of-the-box solutions that help manage these threats.
  • Known Security Holes – OWASP: Make sure to understand and address common threats catalogued by OWASP, like broken access control, insufficient logging, use of old unsecured libraries, etc.
  • Best Practices (API Gateways / Patterns): Use practices like rate limiting, the circuit breaker, and the bulkhead pattern to safeguard your application from attacks. An architectural best practice like an API gateway in front of services ensures there is no direct access to the services, and boilerplate responsibilities like security, HTTPS offloading, and audit logging can be offloaded from the main service.
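Of the patterns listed above, rate limiting is the easiest to sketch. Below is a minimal token-bucket limiter in Python; the capacity and refill rate are illustrative, and a production gateway would implement this with shared, distributed state rather than in-process variables.

```python
import time

class TokenBucket:
    """Allow at most `capacity` burst requests, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_per_sec,
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request admitted
        return False      # request rejected (rate limited)
```

A caller would typically map one bucket per client or API key and return HTTP 429 when `allow()` is false.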

Cloud Native Application Design – Infrastructure Security

When talking about security in the cloud, we can broadly categorize it into the following three areas.

  • Infrastructure Security
  • Application Security
  • Data Security

Infrastructure Security

Infrastructure security is about making sure that the infrastructure we are using is accessed only by authorized personnel. This covers both physical and virtual access. An important aspect of cloud security is understanding that it is a shared responsibility of the public cloud provider and the development team.

  • Physical security: At the lowest level of security, one needs to consider the fact that physical machines can be accessed and tampered with. This risk is greater with on-premise hardware infrastructure than in the cloud. But even when choosing a cloud platform, it makes sense to understand and question the level of physical security implemented by the cloud service provider to avoid any unauthorized access. This aspect is handled by the cloud service providers as part of the shared responsibility.
  • Virtual access to infrastructure/ Role-based access (RBAC): The next level of access is someone gaining virtual access to the machines manually, programmatically, or through malware. Role-based access to make sure only authorized personnel or code can access data, having security groups and firewalls in place, and making sure security patches and antivirus definitions are always updated can help mitigate this threat.
  • Use Virtual Networks: Create Virtual Networks to group together resources needed by an application. For example, if a service API can only be accessed by an API gateway or a database should only be accessed by a particular microservice, we can make sure these components are in a virtual network and cannot be accessed from the outside world.
  • Manual Errors/ Infrastructure as a Code: A misconfiguration causing a VM exposed through unwanted open ports can be another problem. Implementing infrastructure as a code where automated scripts are responsible for creating and maintaining infrastructure can be helpful in avoiding manual errors.
  • Storage/ Data Access: Who can access a service or a filesystem or a database? What kind of access is required? How can the resources be accessed? One needs to answer these questions before getting started with the application development process. Role-based access is an important tool that can help architects ensure proper security. For example, if a user or an application just needs read access on the file system or database, the rules should not allow any write or update access.
  • Audit Tracing: Most cloud service providers allow you to see any changes being made to the infrastructure. You can monitor which resources were updated by whom and when. This is an important tool for teams to keep track of changes.
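The RBAC bullet above boils down to mapping roles to permitted actions and denying everything else. The sketch below shows that shape in Python; the role and permission names are made up for illustration, and real systems delegate this to the cloud provider's IAM service.

```python
# Illustrative role -> permission mapping.
ROLE_PERMISSIONS = {
    "reader": {"read"},
    "writer": {"read", "write"},
    "admin":  {"read", "write", "delete"},
}

def is_allowed(role: str, action: str) -> bool:
    # Deny by default: unknown roles or unlisted actions get no access.
    return action in ROLE_PERMISSIONS.get(role, set())
```

Note the deny-by-default design: access must be granted explicitly, mirroring the "block all other access" advice above.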

Cloud Native Application Design – Capacity Planning

An important aspect of architecting a system is capacity planning. You need to estimate the resources that your software is going to consume. Based on the estimate one can easily calculate the overall cost/ budget needed for the project. Most cloud service providers have pricing estimation tools, where one can provide their requirements and calculate an estimated price for a year or specific time period.

When estimating the price one needs to come up with high-level infrastructural requirements.

The most important areas are

  • Storage
  • Database
  • Compute

There are other areas as well, but these three constitute the major portion and if we are able to estimate these, others should be easy.


For estimating your database needs, you need to understand what entities you will be storing. Estimate the amount of data stored for each entity over a specific time period, for example, a year. The core process is the same for NoSQL and RDBMS databases; for example, in a document-based database you will store a document instead of a row.

Taking RDBMS as the base case, we will try to calculate capacity requirements for a sample table. In practice you will identify a few important tables that will help you estimate the complete requirements.

So let’s say you have a product table. First, we will check how much storage is needed for a single row.

  • Name – Varchar(512) – 512 bytes
  • Description – Varchar(2048) – 2048 bytes
  • Price – Float – 4 bytes
  • Quantity – Number – 4 bytes

Say that, with all columns included, a row comes to roughly 10,000 bytes, or ~0.01 MB, in total.

If we anticipate 1 million records in a year, that translates to
0.01 MB * 1,000,000 = 10,000 MB, or ~10 GB, for this table.

Say we have 10 such tables; we would have a total storage need of ~100 GB (add a buffer for indexing, metadata, etc.)
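The same back-of-envelope estimate, written out as arithmetic (the row size, record count, and table count are the article's own assumptions, not measurements):

```python
ROW_BYTES = 10_000         # ~10 KB per row, all columns included
ROWS_PER_YEAR = 1_000_000  # anticipated records per table per year
TABLES = 10                # similar-sized tables in the schema

table_gb = ROW_BYTES * ROWS_PER_YEAR / 1_000_000_000  # bytes -> GB
total_gb = table_gb * TABLES

print(table_gb)  # 10.0 GB per table
print(total_gb)  # 100.0 GB total, before the index/metadata buffer
```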

The second consideration is memory usage + CPU usage + network bandwidth.
Say I know I will never run queries joining more than 2 tables, each with a max of 10 GB of data; then I know I need RAM to support at least 20 GB.


Storage is easier to calculate: if you are storing files of x MB each and expect n files in a year, you need roughly x * n MB.


Now compute is the most important and complex area for estimation. The battle-tested method for calculating compute requirements is through a load test.

A load test gives you an idea of how much load can be handled by a node (VM or Pod)

Following are the usual steps for Load testing

  • Identify APIs that will be used most frequently
  • Identify APIs that are most heavy in terms of resource usage
  • Come up with a realistic mix of load (historical data helps) and load test the system
  • Let the load run for longer durations (a few days) to get better results (e.g. catching memory leaks)
  • Check performance data at different percentiles: 50, 90, 95, 99, 99.9
  • Check the TPS handled under load
  • Check for error rate and dropped requests
  • Monitor infrastructure performance – CPU, Memory, Request queues, Garbage collection, etc.
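The percentile step above can be sketched with the simple nearest-rank method; the latency samples here are made up, and real load-test tools report these percentiles directly.

```python
def percentile(samples, p):
    """Return the p-th percentile of samples using the nearest-rank method."""
    ordered = sorted(samples)
    k = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[k]

# Illustrative response times in milliseconds from a load-test run.
latencies = [12, 15, 11, 90, 14, 13, 200, 16, 12, 13]
for p in (50, 90, 95, 99):
    print(f"p{p} = {percentile(latencies, p)} ms")
```

Note how the tail percentiles (p95, p99) expose the slow outliers that an average would hide, which is exactly why the steps above look beyond the median.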

Once you have all the data, based on your SLAs you can figure out TPS (Transaction per second) your node can handle. Once you have the TPS number, it is easy to calculate the overall requirement.

For example, if your load test confirms that one node can handle 100 TPS and your overall requirement is 1000 TPS, you can easily see you need at least 10 nodes (plus a buffer).

TPS calculation: number of transactions / time in seconds. Say your load test reveals 10,000 requests were processed in 1 minute; TPS = 10000/60, or ~166.6.
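The two calculations above, written out (the request count, duration, and per-node TPS are the article's example figures):

```python
import math

# Observed throughput from the load test.
requests, seconds = 10_000, 60
node_tps = requests / seconds        # ~166.6 TPS for the tested node

# Node count for a target throughput, from the 100-TPS-per-node example.
target_tps = 1_000
per_node_tps = 100
nodes = math.ceil(target_tps / per_node_tps)
print(nodes)  # 10 nodes, before any disaster-recovery buffer
```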

Additional Considerations: conditions one needs to take into account when finalizing capacity requirements.

Disaster recovery

  • If one or more nodes go down
  • If a complete cloud region is down

Noisy Neighbour

Especially in a SaaS-based system, there is a phenomenon called the noisy neighbor, where one tenant can eat up threads, causing other tenants to wait.

The bulkhead pattern and/or rate limiting are common solutions for handling noisy neighbors. But one needs to consider which threads/infrastructure can realistically be blocked by a noisy neighbor.

Performance Tuning

An important aspect of capacity optimization is to make sure we are using our resources in the best possible manner. For example, most applications or web servers have configurations that one can set to their requirements in order to get the best possible performance. Some examples are

  • Number of concurrent threads
  • Memory settings, e.g. Java heap memory (set the maximum heap size)
  • Compression (true vs false)
  • Request queue (throttling)
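Two of the knobs above, the concurrent-thread count and a bounded request queue, can be sketched together in Python. The worker count, queue depth, and `handle` function are placeholders; application servers expose the equivalent settings in their configuration.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8    # number of concurrent worker threads
QUEUE_DEPTH = 100  # bounded request queue: throttle instead of growing forever

pool = ThreadPoolExecutor(max_workers=MAX_WORKERS)
requests = queue.Queue(maxsize=QUEUE_DEPTH)

def submit(request) -> bool:
    """Enqueue a request; shed load (return False) when the queue is full."""
    try:
        requests.put_nowait(request)
    except queue.Full:
        return False
    pool.submit(lambda: handle(requests.get()))
    return True

def handle(request):
    ...  # placeholder for the application's real request handling
```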

Cloud Native Application Design – Pillars of Cloud Architecture

Operational Excellence

Efficiently deploy, operate, monitor, and manage your cloud workload

  • automate deployments
  • monitoring, alerting, and logging
  • manage capacity and quota
  • plan for scale – peak traffic
  • automate whenever possible


Reliability

Design for a resilient and highly available system

  • automatically recover from failure
  • test recovery procedures
  • scale horizontally to manage the workload


Security

Secure your data and workload, and align with regulatory requirements

  • data security
  • data encryption – at rest and in transit
  • apply security at all layers
  • enable traceability to investigate and take actions automatically
  • no direct interaction with data

Performance Efficiency

Design and tune your resources for the best performance

  • monitor and analyze the performance
  • compute performance optimization
  • optimize storage performance
  • anticipate load and scale
  • best practices like caching, CQRS, sharding, throttling, etc to be used

Cost optimization

Maximize the business value of the infrastructure used

  • Monitor and control cost
  • optimize cost- compute, database, storage, etc
  • identify and free unused and underused resources

Azure API Management

Azure API Management provides a set of services that help users manage the API lifecycle: design, mock, deploy, manage policies, explore, consume, and monitor APIs.

Diagram showing key components of Azure API Management.

As the diagram shows, there are three core components here. The developer portal helps consumers discover, try out, and onboard to services. The management plane helps providers manage API policies and monitor them. The gateway is the interface between consumer clients and provider applications.

API Gateway

The API gateway acts as a facade to the backend services, allowing API providers to abstract API implementations and evolve backend architecture without impacting API consumers. The gateway enables consistent configuration of routing, security, throttling, caching, and observability.

To create an API Management gateway, go to Azure Portal -> API Management Service -> Create.

Management Plane

API providers interact with the service through the management plane, which provides full access to the API Management service capabilities. Customers interact with the management plane through Azure tools including the Azure portal, Azure PowerShell, Azure CLI, a Visual Studio Code extension, or client SDKs in several popular programming languages.

If the gateway is about enforcing policies in real time, the management plane is about helping developers set those policies and interact with analytics dashboards via the portal, the VS Code extension, or other Azure interfaces.

Developer Portal

App developers use the open-source developer portal to discover the APIs, onboard to use them, and learn how to consume them in applications.

The developer portal allows consumers to search for APIs, explore them, consume them and view analytics from the consumer side.

Cloud Native Application Design – Backend For Frontend Pattern

Some applications have a frontend available in more than one medium, for example, a desktop and a mobile application. The scenario gets more complicated when the application has different mobile versions for Android and iOS, and the APIs might also be consumed by third-party services. In short, the same set of APIs has multiple consumers, each with different requirements.

One way to solve the problem is for the API being called to check the source of the call and reply with the required data. For example, for a GET orderdetails call, a mobile client might need just an order history listing, whereas a desktop frontend might want more information, as it can accommodate more on the interface. At the same time, we may want to expose only a limited piece of information to a third-party caller.
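The orderdetails example above can be sketched as one backend record shaped differently per consumer; in a BFF architecture each shaping function would live in its own backend-for-frontend service. All field names and values here are made up for illustration.

```python
# One backend order record (illustrative fields).
ORDER = {
    "id": "ord-42",
    "date": "2023-05-01",
    "total": 59.90,
    "items": [{"sku": "A1", "qty": 2}],
    "shipping_address": "221B Baker Street",
}

def mobile_view(order):
    # Mobile: just enough for an order history listing.
    return {"id": order["id"], "date": order["date"], "total": order["total"]}

def desktop_view(order):
    # Desktop: can accommodate the full record on screen.
    return dict(order)

def third_party_view(order):
    # Third parties: a deliberately limited slice, no customer details.
    return {"id": order["id"], "total": order["total"]}
```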

General Purpose API vs BFF

The image above shows how the BFF pattern helps customize responses for different callers.