Category Archives: Cloud Computing

Common Security Threats

In the last post, I talked about some of the design areas one needs to consider when designing an application for the cloud. Here I will talk about some of the very common threats that an architect should consider when designing the system.

Physical Layer Access: At the lowest level of security, one needs to consider the fact that physical machines can be accessed and tampered with. This risk is higher with on-premises hardware infrastructure than with the cloud. But even when one chooses a cloud platform, it makes sense to understand and question the level of physical security implemented by the cloud service provider to avoid any unauthorized access.

Virtual Access: The next level of access is someone gaining virtual access to the machines manually, programmatically, or through malware. Basic techniques like using a Virtual Network to isolate data and code machines, using Identity Management and role-based access to make sure only authorized personnel or code can access data, having security groups and firewalls in place, and making sure security patches and antivirus definitions are always updated can help mitigate this threat.

Manual Errors: A misconfiguration that leaves a VM exposed through unwanted open ports can be another problem. Implementing infrastructure as code, where automated scripts are responsible for creating and maintaining infrastructure, can help avoid manual errors.

Weak Encryption: Though most cloud service providers give us options to encrypt our data, filesystems, and disks, it is the responsibility of the architect to make sure strong encryption is implemented. Tools like Key Vault services can help store encryption keys to avoid manual handling. Also, all your APIs and pages dealing with important data should use the HTTPS (secured) protocol.
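
As an illustration, here is a minimal sketch of reading a secret from Azure Key Vault at runtime instead of hard-coding it. It assumes the azure-identity and azure-keyvault-secrets Python packages; the vault URL and secret name are placeholders, not a prescribed setup.

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential picks up a managed identity, environment variables,
# or an Azure CLI login, so no key material lives in the code.
credential = DefaultAzureCredential()
client = SecretClient(vault_url="https://my-vault.vault.azure.net", credential=credential)

# Hypothetical secret name; the value never needs to be stored in code or config.
db_password = client.get_secret("db-password").value
```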

Application Layer Attacks: Common attacks like code injection, SQL injection, and cross-site scripting (XSS) can be targeted at the application. It is the responsibility of the architect and development team to make sure best practices are followed while writing the code to guard against these attacks.
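
For example, one of the simplest defenses against SQL injection is to always pass user input as bound parameters rather than concatenating it into the query string. A minimal sketch using Python's built-in sqlite3 module (any DB-API driver works the same way):

```python
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")

user_input = "alice'; DROP TABLE users; --"

# Vulnerable: user input concatenated straight into the SQL string.
# conn.execute("SELECT * FROM users WHERE name = '" + user_input + "'")

# Safe: the driver binds the value as a parameter, never as SQL.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)
```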

Perimeter Layer Attacks: DDoS, or Distributed Denial of Service, is a common attack used by hackers to bring an application down. Most cloud service providers give you out-of-the-box solutions that can help manage these threats.

Designing for Security in Cloud

At times we hear news that data held by a big software company was compromised. The cloud gives us a lot of capabilities, but along with them come certain vulnerabilities. As anyone can now access resources on the cloud, it is important that proper security measures are thought through while designing the system. Let’s take a look at some of the core security areas to be considered when designing for the cloud.

Infrastructure Access: Who can access a service, a filesystem, or a database? What kind of access is required? How can the resources be accessed? One needs to answer these questions before getting started with the application development process. Role-based access is an important tool that helps architects ensure proper security. For example, if a user or an application just needs read access to a file system or database, then the rules should not allow any write or update access.
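
As a sketch of the read-only rule described above, here is how such a policy could be created on AWS with boto3; the bucket and policy names are placeholders, and the exact actions would depend on your application.

```python
import json
import boto3

iam = boto3.client("iam")

# Read operations only; write and update actions are simply not granted.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"],
    }],
}

iam.create_policy(
    PolicyName="ExampleS3ReadOnly",
    PolicyDocument=json.dumps(read_only_policy),
)
```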

Traceability: Most cloud service providers allow you to see any changes made to the infrastructure. You can monitor which resources were updated, by whom, and when.

Layered Approach: When implementing security, most cloud service providers encourage a layered approach, that is, implementing security rules at different layers such as the load balancer, application server, application code, database, and so on, so that even if one layer is compromised, your core application and data remain secure.

Encrypt Data at Rest and in Transit: Cloud service providers offer mechanisms to secure your data at rest and in transit. Encryption is one big tool in your arsenal; for example, the simple step of using HTTPS instead of HTTP will ensure your data is encrypted and secured while in transit. Similarly, most cloud service providers have encryption available for disks and databases to help secure the data at rest.

Data-type-specific security: You also need to understand that there will be certain needs specific to the type of data you are storing. For example, if you are storing healthcare-related data, you will need to understand HIPAA (Health Insurance Portability and Accountability Act) requirements; for finance-related data you might want to check PCI (Payment Card Industry) data standards. There might also be region-specific needs, like GDPR, or the General Data Protection Regulation, for personal data in Europe.

Designing applications for Cloud

The cloud has changed the way we think about application design. We need to make sure we are using cloud capabilities to the fullest so that we can create applications that are more robust and can withstand changing load and unexpected failures. To start with, when designing an application, make sure the following points are addressed.

Scalability: Perhaps the single keyword that has helped make cloud popular. Whenever we start talking about cloud-based systems, the image that comes to our mind is of an elastic infrastructure that can grow or shrink based on user needs, or in more technical terms, the application should be able to scale out and scale in. Microservices based design is synonymous with a scalable system, where each microservice can independently scale irrespective of the rest of the system.

Security: We know that cloud-based systems work on shared responsibility, where part of the responsibility to secure the system lies with the development team. The cloud provides a lot of features that can be used to secure the system, but it is the onus of the development team to understand the capabilities of the cloud and implement security as per application needs. Some of the common best practices are: using Virtual Networks to isolate infrastructure, using firewalls and security groups, using identity features like IAM to control infrastructure access, and using encryption and out-of-the-box tools like DDoS protection from cloud service providers.

Performance: The cloud gives development teams a lot of power in terms of choosing from a wide range of services and infrastructure, but it is the responsibility of the developers to understand their performance needs and choose options accordingly. One needs to understand which operations are critical and what can be done in the background asynchronously. Run performance tests based on the expected load and make sure you have the required capacity assigned.

Availability: The cloud is famous for providing availability of more than 99% for most of its services. At times you have to pay extra for the reliability you need. You will also need to understand the concepts of availability zones and geographies to make sure you are using the capabilities to the fullest. If your application uses multiple services, you need to understand the availability promise of each of them, as you can only commit to availability based on the weakest link in the chain.

Cost Optimization: Often overlooked, but one of the most important factors when using the cloud. With the ease of onboarding new infrastructure and services, developers at times fall into the trap of over-provisioning infrastructure, leaving capacity unused, not freeing up unused resources, and so on. Thankfully, most cloud service providers give you tools to monitor the usage of infrastructure and even tools that can help you optimize your cost.

Automation: Cloud service providers give us tools for setting up infrastructure as code. The idea is to automate the process of setting up all infrastructure needs to avoid human error. This also helps in restoring failed services automatically.

Handling Failures: Moving to the cloud also requires one to understand what can go wrong and how to handle those situations. For example, a microservices-based architecture helps one develop a system that is scalable and maintainable, but it also brings the risk that with multiple microservices we have multiple things that can fail. Does our design take into account what will happen if one or more services are down?

Monitoring: Once an application is deployed to a production environment, you do not have any control over how your code and infrastructure will behave. If any problem is faced, logs are your only friends. In addition, if you have a monitoring and alerting system in place, you can detect and act upon potential issues before end customers start noticing them. Again, cloud service providers give us tools for monitoring and alerting on infrastructure- and code-level logs. We need to make sure we consider which options we can use while designing the solution.

AKF cube for analyzing application scalability

The AKF cube is an interesting concept for analyzing how we can scale software. The cube has three dimensions, x, y, and z, representing three different ways of scaling a software solution.

x-axis or mirroring: This axis represents the simplest form of scaling, that is, creating more copies of the same solution for faster and wider access. For application scaling, we add more servers with a copy of the application and load balance the traffic. This has the obvious challenge of maintaining session-related information, if applicable to the solution. Scaling data can be more challenging, as just creating copies of the database requires you to keep them in sync all the time.

y-axis or functional decomposition: This is more of a service-oriented architecture axis. We divide the application into functions or services. If a particular service is being used heavily, we need to create copies of only that service and scale it independently of the rest of the system.

z-axis or dividing based on logical division (of data): This is scaling based on data. For example, to support multiple geographies, you copy the complete code to different deployments. It is essentially a replica of the same application, divided based on data needs.

References: http://microservices.io/articles/scalecube.html

Amazon AWS Core Services

Recently Amazon conducted a free online training to provide an introduction to the core services provided by Amazon Web Services, or AWS. Follow this link https://aws.amazon.com/events/awsome-day/awsome-day-online/ to get more details about the training. It was divided into 5 modules, which talked at a high level about the major services provided by AWS. The training is very useful for anyone wanting to know more about AWS services, or cloud services in general.

Here is the summary of training:

What is the cloud?
All your infrastructure needs are provided to you off the shelf. You do not need to set up or maintain any infrastructure. You provision what you need when you need it, and pay for only what you use.
The advantage is that you do not need to pre-configure infrastructure and pay for everything upfront. You need not predict the exact capacity you need. The cloud gives you an option to pay only for what you use, and you have easy options to scale up or down based on your needs.

How do you manage the infrastructure?
On-Premise: When you maintain the whole infrastructure on your premises.
Online Cloud or Cloud: You do not maintain any infrastructure and everything is owned by the cloud service providers.
Hybrid: A mix of on-premise and online cloud.

Regions, Availability Zones, Data centers, and Edge location

Regions are the first thing you will need to choose when procuring a resource; typically a region is a geographic location like Singapore, US East, US West, etc. Each region has two or more availability zones (AZs), and each availability zone has one or more data centers. Edge locations are customer-facing locations that can be set as endpoints for CDNs (Content Delivery Networks).

An understanding of regions and AZs is important to guide us on data governance and legal requirements and on proximity to customers (for better performance). Also, not all services are available in all regions, and the cost may vary between regions.

Amazon EC2: Virtual machines are the heart of any cloud system. Amazon provides EC2, or Elastic Compute Cloud, to get compute resources. It has many configurations to choose from based on your needs.

Amazon EBS or Elastic Block Store: You can think of it as hard drives for your EC2. It is available in SSD and HDD types.

Amazon S3 or Simple Storage Service: S3 stores data in the form of objects. It gives 11 9’s or 99.999999999% durability. It is used for backup and storage, application hosting, media hosting and so on. You can secure and control access to your buckets using access policies. Also, you can turn on versioning to manage the version of objects/files. Additionally, you can also set up your bucket as a static website.
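
A minimal sketch of working with S3 from Python using boto3, covering the backup-and-versioning use case mentioned above; the bucket name, region, and key are placeholders.

```python
import boto3

s3 = boto3.client("s3", region_name="ap-southeast-1")

# Create a bucket (outside us-east-1 a LocationConstraint must be supplied).
s3.create_bucket(
    Bucket="example-backup-bucket",
    CreateBucketConfiguration={"LocationConstraint": "ap-southeast-1"},
)

# Keep older versions of overwritten or deleted objects.
s3.put_bucket_versioning(
    Bucket="example-backup-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Upload an object.
s3.put_object(Bucket="example-backup-bucket", Key="backups/db.dump", Body=b"...")
```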

Amazon S3 Glacier: This is low-cost storage for long-term backup.

Amazon VPC or Virtual Private Cloud: You can manage your resources in virtual networks, which give you a way to control access to those resources based on security group rules.

Amazon CloudWatch: CloudWatch is a monitoring service. It gives different forms of usage metrics (CPU, network usage, etc.). One can add alarms and triggers based on events like CPU usage going above a certain limit.
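
As a sketch, an alarm like "CPU usage above a certain limit" could be created with boto3 as follows; the instance ID and SNS topic ARN are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-example",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,                 # look at 5-minute averages
    EvaluationPeriods=2,        # two consecutive breaches before alarming
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:ap-southeast-1:123456789012:ops-alerts"],
)
```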

EC2 Auto Scaling: Auto Scaling allows us to add machines when traffic rises and remove machines when traffic is low. We can add or remove EC2 instances based on events like CPU usage percentage.
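
A minimal sketch of such a rule using a target-tracking policy with boto3; the Auto Scaling group name and target value are placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Add or remove instances automatically to keep average CPU near the target.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="example-web-asg",
    PolicyName="keep-cpu-around-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 60.0,
    },
)
```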

Elastic Load Balancing: Elastic Load Balancing is a highly available managed load balancing service. It provides Application Load Balancer, Network Load Balancer, and Classic Load Balancer options. The Application Load Balancer is a layer 7 (application layer) load balancer which can route based on request content, whereas the Network Load Balancer works at layer 4 (transport layer).

Amazon RDS or Relational Database Service: Supports Postgres, MariaDB, Oracle, MySQL, MS SQL Server, and Amazon Aurora. Aurora is a high-performance database option by AWS that supports MySQL and Postgres and is faster than normal databases, as it is built to take advantage of cloud scaling mechanisms.

Amazon DynamoDB: DynamoDB is a NoSQL database service built for low latency.

AWS CloudFormation: CloudFormation gives us infrastructure as code. You write templates which are then used to deploy resources on AWS.
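
As an illustration, a tiny template (kept inline here as a string for brevity) can be deployed with boto3; the stack and resource names are placeholders.

```python
import boto3

# A tiny template declaring a single versioned S3 bucket.
template = """
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  BackupBucket:
    Type: AWS::S3::Bucket
    Properties:
      VersioningConfiguration:
        Status: Enabled
"""

cloudformation = boto3.client("cloudformation")
cloudformation.create_stack(StackName="example-storage-stack", TemplateBody=template)
```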

AWS Direct Connect: Direct Connect helps connect your on-premises infrastructure to AWS. You can bypass the internet and connect directly to AWS with the help of vendors that support Direct Connect.

Amazon Route 53: Route 53 is the DNS service where you can register and manage your domains.

AWS Lambda: AWS Lambda provides an option to deploy your code in the form of functions directly on the cloud. You can focus on your code without worrying about the infrastructure on which it runs and scales.
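
A minimal sketch of what such a function looks like in Python; the event fields are hypothetical.

```python
import json

def lambda_handler(event, context):
    # 'event' carries the trigger payload; 'context' carries runtime metadata.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```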

Amazon SNS: Simple Notification Service (SNS) is a fully managed pub-sub messaging service for distributed or serverless applications.
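
A minimal sketch of publishing to an SNS topic with boto3; the topic name and message are placeholders.

```python
import boto3

sns = boto3.client("sns")

# create_topic is idempotent and returns the topic's ARN.
topic_arn = sns.create_topic(Name="order-events")["TopicArn"]

# Every subscriber (email, SQS, Lambda, ...) receives a copy of the message.
sns.publish(TopicArn=topic_arn, Subject="OrderCreated", Message='{"orderId": 42}')
```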

Amazon CloudFront: A fast, reliable content delivery network. A customer requesting from India will hit the nearest CDN edge and get the data delivered from there, which is much faster than accessing the actual source, which may be in the U.S. It is a lazily loaded content system, so the first request fetches data from the source, while subsequent requests get locally cached data from the CDN.

Amazon ElastiCache: A fully managed Redis- or Memcached-compatible in-memory data store.

AWS Identity and Access Management (IAM): Manage users, groups, and roles. Users can be created and added as part of groups. Groups and users are given roles through which they access resources. Roles can also be assigned to applications and services like AWS Lambda so that they can access other resources directly.

Amazon Inspector: Analyzes your resources and provides a report on vulnerabilities and best practices.

AWS Shield: It is provided out of the box in free and paid versions and helps protect applications from DDoS (Distributed Denial of Service) attacks.

5 pillars of a good cloud architecture

Amazon recommends understanding 5 pillars of a good cloud architecture – https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf

Azure Load Balancer

A load balancer is a tool that helps us manage traffic coming to a web application. In the simplest form, let’s say the application is deployed on two or more machines; the role of the load balancer is to make sure the incoming request load is evenly distributed across all the machines. Also, if one of the servers is down or not responding, the load balancer is responsible for detecting this failure and redirecting the traffic to healthy machines.
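
To make the idea concrete, here is a toy sketch (not how Azure implements it) of round-robin distribution with a health check; the backend addresses are placeholders. The Azure Load Balancer set up in the steps below does this for us as a managed service.

```python
import itertools
import urllib.request

backends = ["http://10.0.0.4", "http://10.0.0.5"]   # placeholder backend VMs
rotation = itertools.cycle(backends)

def is_healthy(url: str) -> bool:
    """Health probe: the backend is healthy if it answers HTTP 200 quickly."""
    try:
        return urllib.request.urlopen(url, timeout=2).status == 200
    except OSError:
        return False

def pick_backend() -> str:
    """Round-robin over backends, skipping any that fail the health probe."""
    for _ in range(len(backends)):
        candidate = next(rotation)
        if is_healthy(candidate):
            return candidate
    raise RuntimeError("no healthy backends available")
```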

To see the load balancer in action, let’s bring up two (or more, as per convenience) VMs in Azure and install the IIS server.

Create a Resource -> Add Virtual Machine -> Choose the “Windows Server 2016 Datacenter” image -> Add access for RDP (3389) and HTTP (80) ports.

Make sure both the machines are part of the same Availability Set (or Virtual Scale Set).

RDP into the machines; you will see Server Manager (or bring it up).

Choose the option to Add roles and Features, and go ahead and add the IIS server.

Finally, make sure that the Windows firewall allows traffic on port 80. Go to “Windows Firewall with Advanced Security” -> Inbound Rules -> New Rule -> rule type Port -> port number 80.

Once the above steps are done, you can access the IIS server default page when you hit the IP address of these VMs. To distinguish between the two web pages, you can make some modifications to either of them.

Go to C:\inetpub\wwwroot -> update html or image.

The next step is to set up the load balancer. Add a new resource -> Load Balancer. The first thing you will need to provide is the backend pool, for which you choose the availability set containing both the VMs (or the Virtual Machine Scale Set). Next you will need to set up a health probe; as both our VMs are listening on port 80, you can simply set port 80 for the health probe. If the load balancer senses a problem with a machine, based on the interval (seconds after which the load balancer pings the health probe) and the unhealthy threshold (number of failures after which the load balancer treats the node as failed), it will stop sending traffic to that node.

Finally, you will set up a load balancing rule, where all you need to provide is the incoming port on which traffic is expected, the backend pool, and the health probe which we have already set up. Once this is done, you can hit the load balancer URL and see that traffic is directed to the IIS page we set up earlier. If you refresh the page multiple times, you will see that traffic goes to both servers randomly. If one of the servers is shut down, the load balancer keeps working fine, with traffic redirected to the second server.

In addition to load balancing rules, one can also set up NAT rules, which are usually used for forwarding traffic on a port to a specific VM. Here are good references for that:

https://rasmusg.net/2017/11/20/part-1-of-2-port-forwarding-in-azure/

https://rasmusg.azurewebsites.net/2017/11/20/part-2-of-2-port-forwarding-in-azure/

While we are on the topic of load balancers, it is important to note that there are two other ways in which we can control the traffic in Azure. These are Application Gateway and Traffic Manager. Here is a good comparison of different options for load balancing and which to prefer when https://devblogs.microsoft.com/premier-developer/azure-load-balancing-solutions-a-guide-to-help-you-choose-the-correct-option/

Azure Messaging Services

Another important aspect of software development is messaging. With the popularity of microservices and serverless applications for scalable design, message-based communication has received special focus.

Azure does provide us with multiple ways for message-based communication.

Azure Storage Queue: This is a simple form of messaging where one can create a queue under Azure Storage service, send and receive messages from the queue.
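
A minimal sketch of sending and receiving messages with a Storage queue, assuming the azure-storage-queue Python package; the connection string and queue name are placeholders.

```python
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string(
    conn_str="<storage-connection-string>", queue_name="orders"
)

queue.send_message("order-42-created")

for msg in queue.receive_messages():
    print(msg.content)
    queue.delete_message(msg)   # remove the message once it has been processed
```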

Azure Service Bus Queue: If you need more sophisticated queues with more control over data retention, topics with the publisher-subscriber pattern, dead-letter queue support, etc., the Azure Service Bus queue is an option for you.
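
A minimal sketch of sending a message to a Service Bus queue, assuming the azure-servicebus Python package; again the connection string and queue name are placeholders.

```python
from azure.servicebus import ServiceBusClient, ServiceBusMessage

with ServiceBusClient.from_connection_string("<servicebus-connection-string>") as client:
    with client.get_queue_sender(queue_name="orders") as sender:
        sender.send_messages(ServiceBusMessage("order-42-created"))
```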

Azure Storage queue vs Service Bus queue: By this point, it is obvious to ask when one should use the Storage queue and when to use the Service Bus queue. Let’s look at some important points to consider:

  • The Storage queue uses storage infrastructure to provide simple GET/PUT/PEEK operations on queues, whereas Service Bus uses proper message-based infrastructure, with which messages can be received without constant polling by subscribing to queues and topics.
  • Service Bus provides features like FIFO, duplicate detection, “at most once” delivery, etc.
  • The Storage queue provides point-to-point communication, whereas Service Bus can be used for multiple publisher-subscriber designs.
  • Service Bus has a queue size limit of 80 GB, which the Storage queue does not have.

More Detailed comparison- https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted

Azure Relay Service: If you want to expose a service on your local network to the cloud, you can use the Azure Relay service without too much hassle. It uses socket-based communication, and you need not open a firewall port or get into network-level configuration by setting up a VPN gateway.

More- https://docs.microsoft.com/en-us/azure/service-bus-relay/relay-what-is-it

Azure Event Grid: Sometimes we want to message an application or send an alert based on some event happening; for example, when the CPU usage of a virtual machine goes above 80%, we would like to alert the admin or trigger an Azure Function to take some action. Event Grid lets you route such events from sources to handlers.

Azure Event Hub: Event Hub is more relevant for processing larger amounts of data, like telemetry or streaming data. A good example of Event Hub usage is Azure Application Insights, which showcases important information about applications using telemetry data.

Azure Notification Hub: The Azure Notification Hub is a solution that provides the functionality of sending messages to mobile applications and devices. You can send push notifications to millions of devices in one go using the Notification Hub.

Azure Virtual Networks

What is a virtual network?
Often, an application cannot be deployed in isolation on a single machine. There will be multiple servers interacting with each other. There might be multiple backend servers, frontend servers and databases involved. Often it is a requirement that these resources work together for an application to work smoothly. Virtual Network provides a virtual boundary inside which these resources can exist and communicate with each other, at the same time being isolated from the rest of the world.

Creating a Virtual Network

Creating a Virtual Network is pretty straightforward in Azure: you select the Virtual Network resource and add a new one. But during creation you will need to take care of two things, the address space and the subnets.

Address Space is the range of internal IP addresses that can be used for the Virtual Network, and hence determines how many resources can be added to it. The address space is defined in terms of CIDR (Classless Inter-Domain Routing, or supernetting). One needs to be careful when choosing the address space range, especially if we plan to use multiple Virtual Networks that need to connect to each other; in that case we should keep the address ranges unique to avoid overlapping. A small worked example follows below.
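
For example, using Python's standard ipaddress module to check the sizing and overlap of hypothetical ranges:

```python
import ipaddress

vnet = ipaddress.ip_network("10.0.0.0/16")        # hypothetical VNet address space
frontend = ipaddress.ip_network("10.0.1.0/24")    # a subnet carved out of it
peer_vnet = ipaddress.ip_network("10.1.0.0/16")   # a second VNet we may peer with

print(vnet.num_addresses)        # 65536 addresses available in the VNet
print(frontend.num_addresses)    # 256 addresses in the subnet
print(frontend.subnet_of(vnet))  # True: the subnet fits inside the VNet
print(vnet.overlaps(peer_vnet))  # False: safe to connect, no overlapping ranges
```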

A subnet is a smaller network range within a Virtual Network. This is particularly useful when you would like to subgroup elements within a network, for example setting up different subnets for frontend servers and backend servers.

Communicating with on-premise resources

Point to Site
There are times when a user wants to connect to a network, for example, accessing an office network from a personal laptop to access emails. Point to Site Connectivity through a VPN client to VPN Server is the best option in this case.

Site to site
We saw that Point to Site is used when a single point needs to communicate with the virtual network. Similarly, when a whole location or office needs access to a virtual network, we can create a Site to Site connection with a Virtual Network Gateway.

Expressroute
ExpressRoute is a dedicated private connection from your premises to the virtual network. Microsoft provides a set of locations to which users can connect using a dedicated private line and get onboarded to ExpressRoute.

Communicating among VNets

There will be cases when resources in one VNet need to communicate with resources in another VNet on Azure. The best way to achieve this is by using VNet peering.

“Virtual network peering enables you to seamlessly connect Azure virtual networks. Once peered, the virtual networks appear as one, for connectivity purposes. The traffic between virtual machines in the peered virtual networks is routed through the Microsoft backbone infrastructure, much like traffic is routed between virtual machines in the same virtual network, through private IP addresses only. Azure supports:
VNet peering – connecting VNets within the same Azure region
Global VNet peering – connecting VNets across Azure regions”
https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview

Jumpbox Pattern
When accessing and managing Virtual Network resources from outside, the jumpbox pattern is a common mechanism. Basically, one machine in the virtual network is designated as the jumpbox; this jumpbox is accessible from the outside world, but no other resources can be accessed directly. Once the administrator is on the jumpbox machine, they can manage the other resources through it in a controlled manner.

Azure Storage

Storage is one of the most important aspects provided by any cloud service provider. At the end of the day, you need a good storage solution for managing your data, code, backups, executables, and basically everything. You need different types of solutions to manage different types of data: data which you access frequently vs data which is used once a month, data which has sensitive information vs data which all users should be able to access, data which should be stored in a relational database vs data which should be stored in a NoSQL database; the list goes on.

Azure has solutions for all these needs. Let’s take a look at the options you choose when creating an Azure Storage account, and then at the storage services it offers.

Location: You would like to choose a location nearest to your access point for better performance.

Performance: Standard performance is cheaper and will save your data on magnetic drives, whereas Premium storage will save it on solid-state drives and is good for data that needs high performance.

Account kind: Storage V2 and V1 are general-purpose storage accounts; V2 gives you the option of a Cool or Hot access tier, which can be selected based on how frequently the data is used. Another account kind is blob storage, which specializes in storing data in blob form.

Replication:
Locally Redundant Storage (LRS) – Replicated across different racks in a single data center. This handles hardware failure.
Zone Redundant Storage (ZRS) – Replicates data across different zones in a region. This makes sure that even if a data center is down, you don’t lose the data.
Geo-Redundant Storage (GRS) – Data is replicated across geographies. GRS replicates your data to another data center in a secondary region, but that data is available to read only if Microsoft initiates a failover from the primary to the secondary region.
Read-Access Geo-Redundant Storage (RA-GRS) – Based on GRS. RA-GRS replicates your data to another data center in a secondary region and also gives you the option to read from the secondary region. With RA-GRS, you can read from the secondary region regardless of whether Microsoft initiates a failover.

Now with Azure storage, we can use one of the following services

Blobs: Blobs are Binary Large OBjects. Blob storage, also known as object storage, is perfect for storing binary and text data. Media files, images, documents, application installers, etc. are the best fit for this type of storage. The maximum size of a single blob is about 4.77 terabytes. Azure Data Lake Storage works on top of blob storage.
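
A minimal sketch of uploading a media file to blob storage, assuming the azure-storage-blob Python package; the connection string, container, and blob names are placeholders.

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("media")

# Upload a local file as a blob, replacing any existing blob with that name.
with open("photo.jpg", "rb") as data:
    container.upload_blob(name="images/photo.jpg", data=data, overwrite=True)
```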

Files: As the name suggests, this type of storage is best when dealing with files. It also gives us SMB 3.0 protocol support, which means a file share can be directly mounted on local or remote machines. File storage can be attached to VMs and accessed from there.

Tables: This solution should be considered when we need to store data in tabular form.

Queues: We can set up queues for message-based communications. Messages can be published and read from these queues.

Accessing Storage Accounts
Azure provides two core mechanisms to access objects in storage: first, storage access keys, where two secured keys are provided; and second, a shared access signature (SAS), which is used for temporary and limited access.

Securing Data
Data at rest – One can use encryption for securing data at rest. Azure provides transparent data encryption by default (it can be turned off) for databases (the master database is not encrypted).

Data in transit – Data in transit can be secured by using HTTPS and SMB 3.0 protocols.

Data in execution – Azure provides a TEE, or Trusted Execution Environment, and confidential computing with DC-series virtual machines.

Cloud Computing – an Introduction

When I started my career, analyzing and finalizing hardware needs for deployments was a major task and had to be taken up months before actual production deployments. Hardware was costly. Though we had providers that would provide machines virtually, you needed to decide the requirements beforehand, as once you procured a machine, you had to pay at least a month’s rent. And if you decided to upgrade or downgrade the server machine, it was a painful manual task.

Just imagine what a nightmare it would have been to scale up during a surge in requests. You had to foresee it, plan for it, arrange hardware for it (monthly rents).

With the cloud, things have changed for the better. You have a pay-as-you-go model, so you actually pay only for the usage of the hardware. With autoscaling features built into the cloud infrastructure, it is easy to increase or decrease compute power without any human intervention. Setting up databases and scaling them is another area which the cloud takes care of for us. Most cloud service providers support both relational and NoSQL databases in an easy-to-use manner.

Security, access management, monitoring, encryption, and storage are some of the other services provided by cloud service providers off the shelf. Another set of services that has become popular of late is serverless compute. This means one can write code that runs directly as functions on the cloud, without worrying about the deployment details at all. The cloud provider is responsible for scaling and maintaining such functions. This is in sync with the microservices approach, where each function can behave as an independent microservice.

With one’s mind taken off hardware details, it is easier for software engineers to focus on building quality products. But it is important that we design our products in a manner that is capable of taking advantage of cloud services. For example, it will be easier for a microservices-based application to autoscale in the cloud than for a monolithic application. A stateless service is easier to deploy and scale on the cloud than a stateful service. One still needs to be mindful of which services are exposed on the internet and which should be exposed only to internal services. With the ease of deployment, it is also easier to mess up a running service, so proper automated and manual checks need to be implemented.

The cloud, though it makes things easier, requires one to be thoughtful about using its capabilities and to design the system in a manner that makes maximum use of the services being provided.