When one thinks about deploying an application to the cloud, the first advantage that comes to mind is scalability. A more precise word is elasticity, which implies that the application can both scale out and scale in. When talking about scalability, we can scale an application in two ways:
Vertical Scaling: Also known as scaling up, this means adding more physical resources to existing machines, for example, more RAM or CPU power for the current boxes.
Horizontal Scaling: Also known as scaling out, this means adding more boxes to handle the increased load.
Here are some common design techniques that help us manage the performance and scalability of a system:
Data Partitioning: One traditional performance and scalability problem is around data. As your application grows and your data gets bigger, executing queries against it becomes time-consuming. Partitioning the data logically helps us scale the data tier while keeping performance high. One example is to keep data for different geographies in different data clusters.
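To make the geography example concrete, here is a minimal Python sketch of partition routing. The region names and connection strings are hypothetical; a real system would resolve them from configuration.

```python
# Geography-based partition routing (illustrative names and endpoints).
REGION_CLUSTERS = {
    "eu": "postgresql://eu-cluster.example.com/appdb",
    "us": "postgresql://us-cluster.example.com/appdb",
    "apac": "postgresql://apac-cluster.example.com/appdb",
}

def cluster_for(user_region: str) -> str:
    """Return the data cluster that owns this user's partition."""
    # Fall back to a default partition when the region is unknown.
    return REGION_CLUSTERS.get(user_region, REGION_CLUSTERS["us"])

print(cluster_for("eu"))  # postgresql://eu-cluster.example.com/appdb
```

Each cluster holds only its own geography's data, so queries stay small and the clusters can be scaled independently.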
Caching: Caching is an age-old technique for keeping a system fast. Cloud platforms provide out-of-the-box caching services such as Redis Cache, which can be used off the shelf to improve performance.
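A common way to use such a cache is the cache-aside pattern. The sketch below uses the redis-py client and assumes a Redis instance is reachable on localhost; load_product_from_db is a hypothetical stand-in for the slow database call.

```python
import json

import redis  # redis-py client; assumes a Redis instance on localhost:6379

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_product_from_db(product_id: str) -> dict:
    # Stand-in for the real (slow) database query.
    return {"id": product_id, "name": "example"}

def get_product(product_id: str) -> dict:
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)               # cache hit: skip the database
    product = load_product_from_db(product_id)  # cache miss: do the slow work
    cache.setex(key, 300, json.dumps(product))  # keep the result for 5 minutes
    return product
```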
AutoScaling: Cloud service providers let us autoscale horizontally based on rules. For example, we can set a rule to add a new box when the average CPU usage of the current boxes goes beyond, say, 70%. We can also have scale-in rules, for example, remove a box when the average CPU usage drops below 50%.
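In practice the cloud provider evaluates these rules for you; the sketch below only shows the decision logic, with assumed thresholds and box limits.

```python
# Illustrative autoscaling rules: thresholds and limits are assumptions.
SCALE_OUT_CPU = 70.0   # add a box above 70% average CPU
SCALE_IN_CPU = 50.0    # remove a box below 50% average CPU
MIN_BOXES, MAX_BOXES = 2, 10

def desired_boxes(avg_cpu: float, current_boxes: int) -> int:
    """Apply the scale-out / scale-in rules and return the target box count."""
    if avg_cpu > SCALE_OUT_CPU and current_boxes < MAX_BOXES:
        return current_boxes + 1
    if avg_cpu < SCALE_IN_CPU and current_boxes > MIN_BOXES:
        return current_boxes - 1
    return current_boxes

print(desired_boxes(82.5, 3))  # 4 -> scale out
print(desired_boxes(41.0, 3))  # 2 -> scale in
```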
Background Jobs: Move code that can run asynchronously and independently, such as report generation or AI model execution, to batch or background jobs. This helps protect the performance of the core application features.
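As a minimal sketch, the request handler below hands the slow work to a background worker pool and returns immediately; generate_report is a hypothetical long-running task.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

def generate_report(user_id: str) -> None:
    # Hypothetical slow work: query data, render the report, store it somewhere.
    pass

def handle_report_request(user_id: str) -> str:
    executor.submit(generate_report, user_id)  # run in the background
    # The caller gets an immediate response; the report arrives later.
    return "Report requested; you will be notified when it is ready."
```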
Messaging Infra: Similarly, use messaging or queue-based communication for asynchronous tasks. The services handling the messages can then scale out or in based on need.
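The sketch below uses an in-process queue purely to show the shape of producer/consumer communication; a real deployment would use a managed message broker so the consumers can be scaled independently of the producers.

```python
import queue
import threading

task_queue: "queue.Queue[dict]" = queue.Queue()

def enqueue_order(order_id: int) -> None:
    task_queue.put({"order_id": order_id})  # producer returns immediately

def process_order(message: dict) -> None:
    print("processed order", message["order_id"])

def consumer() -> None:
    while True:
        message = task_queue.get()
        process_order(message)
        task_queue.task_done()

# Start more consumers when the queue grows, fewer when it drains.
threading.Thread(target=consumer, daemon=True).start()
enqueue_order(1)
task_queue.join()
```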
Scale Units: At times, scaling out requires more than just adding virtual machines; for example, an application may use X web servers, Y queues, Z storage accounts, and so on. We can define a unit consisting of the infrastructure we need and scale resources as a unit.
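A scale unit can be described as data, as in the sketch below; the resource counts are illustrative, not a recommendation for any particular workload.

```python
from dataclasses import dataclass

@dataclass
class ScaleUnit:
    web_servers: int = 4
    queues: int = 2
    storage_accounts: int = 1

def capacity_for(units: int) -> ScaleUnit:
    """Scaling by whole units keeps the resource ratios intact."""
    base = ScaleUnit()
    return ScaleUnit(
        web_servers=base.web_servers * units,
        queues=base.queues * units,
        storage_accounts=base.storage_accounts * units,
    )

print(capacity_for(3))  # ScaleUnit(web_servers=12, queues=6, storage_accounts=3)
```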
Monitoring: Have performance monitoring in place to make sure your application and services meet their SLAs. Monitoring also helps identify problem areas, for example, a single service that is slowing down the whole system.
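As a small illustration of the SLA idea, the sketch below records per-service latencies and flags a service whose 95th-percentile latency exceeds an assumed 500 ms target.

```python
import time

SLA_P95_MS = 500.0  # assumed latency target
latencies_ms: dict = {"orders": [], "search": []}

def record(service: str, started: float) -> None:
    """Record how long a request took, in milliseconds."""
    latencies_ms[service].append((time.perf_counter() - started) * 1000)

def breaches_sla(service: str) -> bool:
    """Flag the service if its 95th-percentile latency exceeds the target."""
    samples = sorted(latencies_ms[service])
    if not samples:
        return False
    p95 = samples[int(0.95 * (len(samples) - 1))]  # simple p95 estimate
    return p95 > SLA_P95_MS
```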