Category Archives: Cloud Computing

Cloud Native Application Design – Compute

Cloud-native application design rests on three core decisions: Compute, Database, and Storage. Most cloud providers offer several options for each.

When starting with application design, the first decision the team needs to make is how and where the application will be deployed.

Virtual Machine: The classic option is a virtual machine, where the development team takes complete control over setting up servers, managing the virtual machines, monitoring their health, handling scalability, and so on. Examples are AWS EC2, Azure Virtual Machines, and Google Compute Engine.

Pros: Easiest to get started, gives more control over setup.
Cons: The dev team is responsible for managing the machines, and it is usually the least cost-effective option.

Containers: Lightweight containers are a good fit for a microservices-based design. A container orchestrator such as Kubernetes provides off-the-shelf solutions for managing the health and scalability of container-based deployments. Most cloud providers offer managed services such as Azure Kubernetes Service or Amazon Elastic Kubernetes Service.

Pros: Well suited to microservice-based solutions, lightweight and hence cost-effective, and easy to manage.
Cons: There is a learning curve to using tools like Docker and Kubernetes correctly.

Functions: The next option is to deploy using Functions as a Service. This is again a good fit for microservices and gives you on-demand execution of code. Cloud providers offer options like Azure Functions or AWS Lambda to build our code as functions.

Pros: Can be cost-effective, since you pay only for execution time.
Cons: Does not fit all use cases (most vendors limit how long a single execution can run), and there is a risk of vendor lock-in.
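As a rough illustration of what "code as a function" looks like, here is a minimal sketch in the (event, context) handler style that AWS Lambda uses for Python. The `name` field in the event and the response shape are assumptions made for this example, not something a provider requires.

```python
import json


def handler(event, context):
    """Entry point in the (event, context) style used by AWS Lambda for Python."""
    # "name" is a hypothetical field in the incoming event payload.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```

The platform calls the handler once per request, so there is no server process for the team to manage, and billing follows the time spent inside the function.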

Specialized options: Apart from the options mentioned above, most cloud providers offer specialized services, such as Azure App Service and AWS Elastic Beanstalk, that help you deploy applications built on popular technologies like Java, Python, etc. directly.

Pros: Being managed services, they free the dev team from managing the underlying infrastructure so it can focus on development. They also provide features like monitoring and scaling off the shelf.
Cons: There is a learning curve to understand the framework, and vendor lock-in because the deployable is specific to the vendor.

Cloud Native Application Design – Load Balancing

Load Balancing is an important technique in cloud-native application design to achieve scalability, reliability, and availability. The load can be distributed among nodes (physical or containers), based on rules like round robin, weighted, performance-based, geographical distribution, etc.
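To make two of these rules concrete, here is a small sketch of round-robin and weighted selection. The node names and weights are made up for illustration; a real load balancer would also track node health.

```python
import itertools
import random


def round_robin(nodes):
    """Return an iterator that yields nodes in a repeating fixed order."""
    return itertools.cycle(nodes)


def weighted_choice(weights, rng=random):
    """Pick one node at random; weight 3 is chosen about 3x as often as weight 1."""
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]


# Hypothetical node names, used only for this example.
rr = round_robin(["node-a", "node-b", "node-c"])
first_four = [next(rr) for _ in range(4)]  # wraps back to node-a on the 4th pick
```

Round robin suits nodes of equal size, while weights let a larger node take a proportionally larger share of the traffic.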

Load balancing can be achieved at the following levels:

DNS Level: DNS-level load balancing is a method of distributing incoming network traffic across multiple servers or IP addresses by using DNS (Domain Name System) servers to resolve domain names to IP addresses. You can choose distribution rules based on need. For example, you might want traffic originating from Europe to hit European servers and traffic from North America to hit North American servers. While resolving the DNS name, the traffic manager chooses the backend endpoint based on the rules set.
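The geographic rule described above can be sketched as a simple lookup. The region names, the fallback region, and the IP addresses (taken from the documentation-only ranges) are assumptions for illustration; a real traffic manager derives the client region itself.

```python
# Hypothetical region-to-endpoint table; the IPs are documentation-range addresses.
ENDPOINTS = {
    "europe": "192.0.2.10",
    "north-america": "198.51.100.10",
}
DEFAULT_REGION = "north-america"  # assumed fallback for unlisted regions


def resolve(client_region):
    """Return the backend IP for a client's region, falling back to the default."""
    return ENDPOINTS.get(client_region, ENDPOINTS[DEFAULT_REGION])
```

A client in Europe resolves the name to the European endpoint, while a client from a region with no entry is sent to the default one.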

Layer 7 or Application Layer: In Layer 7 load balancing, the load balancer analyzes the content of the incoming requests, including the HTTP headers, URLs, and other application-specific data, to determine how to distribute the traffic. For example, we can set rules so that requests matching the /images pattern are routed to one backend, whereas requests matching /videos go to another. Additionally, one can implement features like SSL termination and a WAF (Web Application Firewall), which protects against threats like SQL injection and Cross-Site Scripting (XSS) attacks.
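A path-based rule like the /images and /videos example could be sketched as follows. The backend names and the default backend are hypothetical, and real load balancers support richer matching (headers, hostnames, regular expressions).

```python
# Hypothetical routing table: URL path prefix -> backend pool name.
ROUTES = [
    ("/images", "image-backend"),
    ("/videos", "video-backend"),
]
DEFAULT_BACKEND = "web-backend"  # assumed catch-all pool


def pick_backend(path):
    """Return the backend for a request path using prefix matching."""
    for prefix, backend in ROUTES:
        # Match "/images" and "/images/...", but not "/imagesx".
        if path == prefix or path.startswith(prefix + "/"):
            return backend
    return DEFAULT_BACKEND
```

This decision is only possible because the load balancer has parsed the HTTP request, which is what separates Layer 7 from Layer 4.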

Layer 4 or Transport Layer: Layer 4 load balancers route traffic based on basic criteria such as source IP address, destination IP address, source port, destination port, and protocol type. At the transport layer, the load balancer does not have access to request data, hence decisions can only be taken at the IP or port level. On the other hand, as no parsing is involved, the overall performance is better.
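One common way to make a decision from only these fields is to hash them and map the hash onto the list of backends. The sketch below assumes that approach, which is not the only one, and the function and backend names are hypothetical.

```python
import hashlib


def pick_by_connection(src_ip, src_port, dst_ip, dst_port, protocol, backends):
    """Map a connection's 5-tuple onto one backend using a stable hash."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer index into the list.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Because the hash is deterministic, every packet of the same connection lands on the same backend without the load balancer ever looking at the payload.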

Cloud Native Application Design – Application Security

Application Security: When deploying your application in the public cloud, you need to take all precautions to safeguard it from unauthorized access and attacks.

Infrastructure as Code: Avoid accessing and configuring resources manually. Use tools like Terraform, Ansible, or cloud-specific options to build infrastructure as code.
No direct access: If a resource does not need to be available externally, make sure all access is blocked. For example, if a database is to be accessed only by a microservice, give access only to that microservice and block all other access.
Automated Deployment: Deployments should not be done by manually placing deliverables on target machines. Automate the process via continuous delivery scripts.
Layered Security Approach: When implementing security, most cloud service providers encourage a layered approach. That is, implement security rules at different layers like the load balancer, application server, application code, database, and so on, so that even if one layer is compromised, your core application and data are still secured.
API Security (Authentication / Authorization): All APIs should be behind proper authentication and authorization. Note that a service accessed from the internet will have different security than a service that can only be accessed internally.
Common Application Threats: Common attacks like Code Injections, SQL Injections, and Cross-Site Scripting (XSS) can be targeted toward the application. It is the responsibility of the architect and development team to make sure best practices are followed while writing the code to tackle these attacks.
Perimeter Layer Attacks: DDoS, or Distributed Denial of Service, is a common attack used by hackers to bring an application down. Most cloud service providers give you out-of-the-box solutions that can help manage these threats.
Known Security holes- OWASP: Make sure to understand and take care of common threats like broken access control, insufficient logging, use of outdated and vulnerable libraries, etc. https://kamalmeet.com/uncategorized/owasp-top-10-security-threats/
Best Practices (API Gateways / Patterns): Use practices like rate limiting, circuit breakers, and the bulkhead pattern to safeguard your application from attacks. Architectural best practices like an API gateway in front of services make sure there is no direct access to a service, and boilerplate responsibilities like security, HTTPS offloading, audit logging, etc. can be offloaded from the main service.
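As an illustration of the rate limiting practice mentioned above, here is a minimal token bucket sketch. The class name, parameters, and injectable clock are assumptions for this example. In practice this is usually a policy configured on the API gateway rather than hand-written code.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill according to elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would typically keep one bucket per client or API key and return HTTP 429 when `allow()` is false.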

Cloud Native Application Design – Infrastructure Security

When talking about security in the cloud, we can broadly categorize it into the following three areas.

Infrastructure Security
Application Security
Data Security

Infrastructure Security

Infrastructure security is about making sure that the infrastructure we are using is accessed only by authorized personnel. This is about both physical and virtual access. An important aspect when it comes to cloud security is understanding that it is a shared responsibility of the public cloud provider and development team.

Physical security: At the lowest level of security, one needs to consider the fact that physical machines can be accessed and tampered with. This is more likely with on-premise hardware than in the cloud. But even when choosing a cloud platform, it makes sense to understand and question the level of physical security implemented by the cloud service provider to avoid any unauthorized access. This aspect is handled by the cloud service provider as part of the shared responsibility.
Virtual access to infrastructure/ Role-based access (RBAC): The next level of access is someone gaining virtual access to the machines manually, programmatically, or through malware. Role-based access to make sure only authorized personnel or code can access data, having security groups and firewalls in place, and making sure security patches and antivirus definitions are always updated can help mitigate this threat.
Use Virtual Networks: Create Virtual Networks to group together resources needed by an application. For example, if a service API can only be accessed by an API gateway or a database should only be accessed by a particular microservice, we can make sure these components are in a virtual network and cannot be accessed from the outside world.
Manual Errors/ Infrastructure as Code: A misconfiguration that leaves a VM exposed through unwanted open ports can be another problem. Implementing infrastructure as code, where automated scripts are responsible for creating and maintaining infrastructure, helps avoid manual errors.
Storage/ Data Access: Who can access a service or a filesystem or a database? What kind of access is required? How can the resources be accessed? One needs to answer these questions before getting started with the application development process. Role-based access is an important tool that can help architects ensure proper security. For example, if a user or an application only needs read access to the file system or database, the rules should not allow any write or update access.
Audit Tracing: Most cloud service providers allow you to see any changes being made to infrastructure. You can monitor which resources were updated, by whom, and when. This is an important tool for teams to keep track of changes.
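The read-only example from the Storage/ Data Access item can be sketched as a tiny role-to-permission check. The role and action names are hypothetical. Cloud providers implement this through their own identity and access management services.

```python
# Hypothetical roles and the actions each one is granted.
ROLE_PERMISSIONS = {
    "reader": {"read"},
    "writer": {"read", "write"},
    "admin": {"read", "write", "delete"},
}


def is_allowed(role, action):
    """Return True only if the role was explicitly granted the action."""
    # Unknown roles get an empty permission set, i.e. deny by default.
    return action in ROLE_PERMISSIONS.get(role, set())
```

The deny-by-default lookup is the important design choice: access exists only where a rule grants it.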

Cloud Native Application Design – Capacity Planning

An important aspect of architecting a system is capacity planning. You need to estimate the resources that your software is going to consume. Based on the estimate one can easily calculate the overall cost/ budget needed for the project. Most cloud service providers have pricing estimation tools, where one can provide their requirements and calculate an estimated price for a year or specific time period.

When estimating the price one needs to come up with high-level infrastructural requirements.

The most important areas are:

Storage
Database
Compute

There are other areas as well, but these three constitute the major portion and if we are able to estimate these, others should be easy.

Database

For estimating your database needs, you need to understand what entities you will be storing. Estimate the amount of data being stored for each entity for a specific time period, for example, a year. The core process is the same for NoSQL and RDBMS databases. For example, you will store a document instead of a row in a document-based database.

Taking RDBMS as the base case, we will try to calculate capacity requirements for a sample table. In practice, you will identify a few important tables that will help you estimate the complete requirements.

So let’s say you have a product table. First, we will check how much storage will be needed for a single row.

Column	Type	Size
Name	Varchar (512)	512 bytes
Description	Varchar (2048)	2048 bytes
Price	Float	4 bytes
Quantity	Number	4 bytes
..	..	..
..	..	..
..	..	..

Say the total row size comes to 10,000 bytes, or ~0.01 MB.
Say we are anticipating 1 million records in a year. That would translate to
0.01 MB * 1,000,000 = 10,000 MB, or ~10 GB.

Say we have 10 such tables. We would then need a total of 100 GB of storage (plus a buffer for indexing, metadata, etc.).
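The same arithmetic as a small helper, using decimal units (1 GB = 1,000,000,000 bytes) to match the numbers above. The function name and the `buffer_ratio` parameter for index and metadata overhead are assumptions for this sketch.

```python
def table_storage_gb(row_bytes, rows_per_year, buffer_ratio=0.0):
    """Estimate yearly storage for one table in GB (1 GB = 1,000,000,000 bytes)."""
    # buffer_ratio is a hypothetical allowance for indexes and metadata, e.g. 0.3 = 30%.
    total_bytes = row_bytes * rows_per_year * (1 + buffer_ratio)
    return total_bytes / 1_000_000_000


per_table = table_storage_gb(10_000, 1_000_000)  # 10.0 GB for the product table
ten_tables = 10 * per_table                      # 100.0 GB for ten similar tables
```

Passing a non-zero `buffer_ratio` makes the buffer explicit instead of leaving it as a mental note.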

The second consideration is memory usage, CPU usage, and network bandwidth.
Say I will never run queries joining more than 2 tables, each with at most 10 GB of data. Then I know I need RAM to support at least 20 GB.

Storage

Storage is easier to calculate.

If you are storing files of x MB each and expect n files in 1 year, you need

x * n MB
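Written as code, with a hypothetical `years` parameter added for multi-year estimates:

```python
def file_storage_mb(avg_file_mb, files_per_year, years=1):
    """Estimate file storage in MB: average file size times the number of files."""
    return avg_file_mb * files_per_year * years
```

For instance, 500,000 files of 2 MB each come to 1,000,000 MB, or about 1 TB, for the year.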

Compute

Compute is the most important and complex area for estimation. The battle-tested method for calculating compute requirements is a load test.

A load test gives you an idea of how much load can be handled by a node (VM or Pod).

Following are the usual steps for load testing:

Identify APIs that will be used most frequently
Identify APIs that are most heavy in terms of resource usage
Come up with a realistic mix of load (historical data helps) and load test the system
Let the load run for longer durations (a few days) to get better results (for example, to surface memory leaks)
Check performance data at different percentiles: 50, 90, 95, 99, 99.9
Check for TPS handled with load
Check for error rate and dropped requests
Monitor infrastructure performance – CPU, Memory, Request queues, Garbage collection, etc.

Once you have all the data, based on your SLAs you can figure out the TPS (transactions per second) your node can handle. Once you have the TPS number, it is easy to calculate the overall requirement.

For example, if your load test confirms that one node can handle 100 TPS and you have an overall requirement of 1000 TPS, you can easily see that you need 1000 / 100 = 10 nodes.

TPS calculation: TPS = number of transactions / time in seconds. Say your load test reveals that 10,000 requests were processed in 1 minute. Then TPS = 10000 / 60, or ~166.7.

Additional considerations: conditions one needs to take into account when finalizing capacity requirements
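Both calculations can be put into a short sketch. The function names and the `headroom` parameter (extra capacity kept in reserve) are assumptions for illustration.

```python
import math


def tps(requests, seconds):
    """Transactions per second observed during a load test."""
    return requests / seconds


def nodes_needed(target_tps, tps_per_node, headroom=0.0):
    """Number of nodes for a target TPS, rounded up; headroom=0.25 adds 25% spare capacity."""
    return math.ceil(target_tps * (1 + headroom) / tps_per_node)
```

Here `tps(10_000, 60)` gives roughly 166.7 and `nodes_needed(1000, 100)` gives 10. Rounding up matters because a fractional node cannot be provisioned.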

Disaster recovery

If one or more nodes are down in one cloud
If a complete cloud region is down

Noisy Neighbour

Especially in a SaaS-based system, there is a phenomenon of a noisy neighbor where one tenant can eat up threads causing other tenants to wait.

The bulkhead pattern and/or rate limiting are common ways to handle noisy neighbors. But one needs to consider the threads/infrastructure that can realistically be blocked by noisy neighbors.
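A minimal sketch of the bulkhead idea: each tenant gets its own fixed pool of concurrent slots, so one tenant exhausting its slots cannot block the others. The class name, method names, and per-tenant limit are hypothetical.

```python
import threading


class TenantBulkhead:
    """Cap the number of concurrent requests each tenant may have in flight."""

    def __init__(self, max_concurrent_per_tenant):
        self._limit = max_concurrent_per_tenant
        self._slots = {}
        self._lock = threading.Lock()

    def _semaphore(self, tenant_id):
        # Create each tenant's semaphore lazily, guarded against concurrent creation.
        with self._lock:
            if tenant_id not in self._slots:
                self._slots[tenant_id] = threading.BoundedSemaphore(self._limit)
            return self._slots[tenant_id]

    def try_acquire(self, tenant_id):
        """Return True if the tenant has a free slot, False if it is at its limit."""
        return self._semaphore(tenant_id).acquire(blocking=False)

    def release(self, tenant_id):
        """Free one slot when the tenant's request has finished."""
        self._semaphore(tenant_id).release()
```

The per-tenant limit multiplied by the number of active tenants gives an upper bound on the threads that can be held at once, which feeds back into the capacity estimate.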

Performance Tuning

An important aspect of capacity optimization is to make sure we are using our resources in the best possible manner. For example, most applications or web servers have configurations that one can set to their requirements in order to get the best possible performance. Some examples are

Number of concurrent threads
Memory setting e.g. Java Heap Memory (Set to max)
Compression (true vs false)
Request queue (throttling)

Azure API Management

Azure API Management provides a set of services that help users manage the API lifecycle, i.e. design, mock, deploy, policy management, explore, consume, and monitor APIs.

There are three core components here. The developer portal helps consumers discover, try out, and onboard to services. The management plane helps providers manage and monitor API policies. The gateway is the interface between consumer clients and provider applications.

API Gateway

The API gateway acts as a facade to the backend services, allowing API providers to abstract API implementations and evolve backend architecture without impacting API consumers. The gateway enables consistent configuration of routing, security, throttling, caching, and observability.
https://learn.microsoft.com/en-us/azure/api-management/api-management-key-concepts#api-gateway

To create an API gateway, go to Azure Portal -> API Management Service -> Create

Management Plane

API providers interact with the service through the management plane, which provides full access to the API Management service capabilities. Customers interact with the management plane through Azure tools including the Azure portal, Azure PowerShell, Azure CLI, a Visual Studio Code extension, or client SDKs in several popular programming languages.
https://learn.microsoft.com/en-us/azure/api-management/api-management-key-concepts#management-plane

If the gateway is about enforcing policies in real time, the management plane is about helping developers set these policies and interact with analytics dashboards via the portal, the VS Code extension, or other Azure interfaces.

Developer Portal

App developers use the open-source developer portal to discover the APIs, onboard to use them, and learn how to consume them in applications.
https://learn.microsoft.com/en-us/azure/api-management/api-management-key-concepts#developer-portal

The developer portal allows consumers to search for APIs, explore them, consume them, and view analytics from the consumer side.
