Category Archives: Architecture

Design: Netflix

Designing or architecting a system is a complex task. One needs to think of various aspects that can impact a system. At a high level, we bucket the requirements into two parts – Functional and Non-Functional. Functional requirements, in simple words, can be thought of as functionalities one needs to build. Non-functional requirements can be complex as they usually will not be called out explicitly and as an architect, you need to figure out after discussions with various stakeholders.

Reference: https://kamalmeet.com/system-design-and-documentation/steps-for-designing-a-system-from-scratch/

In this post, I would try to look at the system design for Netflix. Of course, it is a complex system and it is difficult to cover in one post, but I will try to touch upon important aspects.

Functional Requirements:

  • Account Management: Create Account/ Login/ Manage and Delete the Account
  • Subscription Management
  • Search
  • Watch a Video: View/ Download for offline viewing
  • Recommendations: User-based/ Generic/ Top trends/ Genre
  • Device Synchronization
  • Language Selection: Audio/ Video

Non-Functional Requirements:

  • Performance: Realtime streaming performance
  • Reliability
  • Availability
  • Scalability
  • Durability

Data needed:

  • number of users
  • daily active users
  • the average number of videos watched per day/ per user
  • the average size of the video
  • number of videos total/ uploaded per day

Let me borrow the high-level architecture image

https://www.linkedin.com/pulse/system-design-netflix-narendra-l/?published=t

Microservices-based architecture: Netflix is an early adapter of microservices and helped popularize the use of microservices. Microservices help Netflix manage its critical services by keeping them stateless, secured, scalable, available, and reliable.

CDN or Content Delivery Network: In the image above we see Open Connect, which is Netflix’s CDN. For any application which has consumers across multiple geographies, CDN is an important piece. This helps deliver content like images, videos, JavaScript, and other files from a location nearest to the user helping improve performance. In addition, Netflix provides Open Connect Appliances to ISPS free of cost, which helps ISPs save bandwidth and helps Netflix Cache content for better performance.

Transcoding: Any video getting uploaded to Netflix then gets converted to videos of various resolutions. The video gets uploaded to a queue from where it is taken up by transcoder workers who after converting the video upload them to AWS S3. When a user clicks on a video to be played, the best option is chosen based on the client and bandwidth.

API Gateway: ZUUL is the API gateway used by Netflix, which provides features for gateway like security, authentication, routing, decorating requests, Beta testing (based on routing), etc.

Resiliency: It is a resiliency library by Netflix. It handles scenarios like timeout handling, failing fast by rejecting requests when the thread pool is full, circuit breaker when the error rate is heavy, fallback to default response, etc.

Cache: Netflix uses EV cache to provide performance, reduced latency, better throughput, and reduced overall cost. EV cache is a custom implementation of Memcache, which is not dependent on RAM and can use SSD.

Database: Netflix uses MySQL for data that needs ACID properties, data like user data. Read replicas are used to improve query performances. Cassandra is used for NoSQL, to keep data like browsing and watching history. Older history data can be moved to the compressed cheaper data store.

Logs Management: All log data is sent to Chukwa through Kafka. You can view logs on the dashboard. Finally, logs can be sent to S3 for further retention and usage.

Search: Elastic Search is used for indexing and searching.

Recommendations: Spark is used for data analysis. it helps rank content based on user history as well as using data from users with similar tastes. For example, if two users have given similar ratings to a movie, their tastes might be similar. Also if a user watches comedy content mostly, the recommendation engine might suggest more comedy content.

Refrences:

Cloud Native Application Design – Challenges with Microservices

In the previous post, I talked about the benefits of microservices. but the discussion will be incomplete without talking about some of the important challenges one should expect when going for microservices-based architecture.

Well, microservices provide a lot of benefits, but this definitely is not a silver bullet, and if not architected properly, the design can cause more pain than it will provide benefits. Here are some of the considerations one needs to keep in mind when designing an application with microservices.

Too few/ too many services: The first question one needs to deal with when going for microservices-based architecture is how to break the application into microservices. Too many microservices would mean you are unnecessarily complicating the system and too few would mean you are not getting the benefits of a microservice-based design.

Complex DevOps: Unlike a monolith application where you are deploying just a single application, not you are dealing with dozens of microservices. Each service needs its own compilation and deployment pipeline, which means independent management and tracking.

Monitoring: As multiple services are communicating with each other to make the application work, a single failure can impact the overall success. Hence it is important to monitor all services, which means a complex dashboarding, alerting, and monitoring system in place to keep a check on all pieces.

Multiple Tech Stacks: One of the advantages we get with microservices is the independence you get in choosing a tech stack for each piece, but too many tech stacks would mean difficult intra-team support and low expertise on technology.

Managing Data: Another challenge with microservice-based design is managing the data. As a rule of thumb, each microservice should manage its own data. But this can get tricky as sometimes microservices need to share data. If not managed properly one can run into a problem of duplicate sources of data or performance issues in fetching data from other services.

Design for Accessibility – 2

In the last post, I introduced the concept of accessibility, here I will discuss more on testing your application for accessibility. The good thing is that we have many tools that can help us with basic accessibility testing.

The first level of testing that you would like to do for accessibility would be for Keyboard navigation (does your application support tab/ arrow navigation) and Screen reader. All Operating systems come with inbuilt support for screen readers Windows has Narrator, Mac has VoiceOver, and there are readers like JAWS and NVDA.

Additionally, there are tools to help with accessibility testing, some of the common ones are following

Chromium Developer Tools: Chromium has inbuilt support for checking for accessibility and generating reports for a page. Refer https://developer.chrome.com/docs/devtools/accessibility/reference/

Accessibility insights: Another tool that can help with testing and generating a detailed report for an application. Refer https://accessibilityinsights.io/. It gives not only details but also suggestions on fixes. Here is a sample report

Report generated by Accessibility Insights
https://techblog.topdesk.com/accessibility/testing-accessibility-with-accessibility-insights/

Additional Tools Worth Exploring

Design for Accessibility – 1

As a software developer/ designer/ architect, when you think about software, there are many non-functional aspects of the software that you need to take care of. Accessibility is one such important non-functional aspect, which can get neglected if you are not paying attention.

Before getting into more details, let’s try to understand what accessibility is –

Accessibility enables people with disabilities to perceive, understand, navigate, interact with, and contribute to the web. 

https://medium.com/salesforce-ux/7-things-every-designer-needs-to-know-about-accessibility-64f105f0881b

Four core Accessibility principles – Perceivable, Operable, Understandable, and Robust.

Information about accessibility testing and compliance
https://www.pqatesting.com/2020/10/26/your-accessibility-testing-cheat-sheet/

WCAG or Web Content Accessibility Guidelines, current version 2.1 gives us a detailed idea of areas one needs to consider when working on the accessibility of a website.

Refer: https://www.w3.org/TR/WCAG21/#intro

WCAG 2.1 gives three levels of conformance A, AA, or AAA

WCAG diagram showing overall structure, with principles in the far left column, guidelines in the adjacent column, and success criteria numbers across the following three columns, A, AA and AAA.
https://openclassrooms.com/en/courses/6663451-make-your-web-content-accessible/6912083-get-to-know-the-web-content-accessibility-guidelines-wcag

WAI-ARIA: Web Accessibility Initiative – Accessible Rich Internet Applications or WAI-ARIA is a specification developed by W3C in 2008.

WAI-ARIA, the Accessible Rich Internet Applications Suite, defines a way to make Web content and Web applications more accessible to people with disabilities. It especially helps with dynamic content and advanced user interface controls developed with HTML, JavaScript, and related technologies. 

https://www.w3.org/WAI/standards-guidelines/aria/

Further Readings

Cloud Native Application Design – Benefits of Microservices

In the last post, I introduced the concept of microservices and why it is a default fit for Cloud Native Design. But an important question that should be asked is why microservice-based design has gained so much popularity in recent years. What all advantages microservices gives us over traditional application design?

Let us go over some advantages of microservices-based design here

Ease of Development: The very first thing visible in microservices-based architecture is that we have divided the application into smaller services. This also means we can organize our developers into smaller teams, focusing on one piece of deliverable, less code to understand, less code to the manager, and hence can be more agile.

Scalability: A monolith application is difficult to scale as you are looking at replicating the whole application. In the case of microservices, you can focus on pieces that are actually required to scale. For example, in an eCommerce system, a search microservice is getting a lot of requests and hence can be scaled independently of the rest of the application.

Polyglot programming: As we have divided applications into smaller pieces, we have flexibility in choosing tech stacks for different pieces as per our comfort. For example, if we know a certain Machine learning piece of application/ microservice can be developed easier in python, whereas the rest of my application is in Java, it is easy to make the choice.

Single Responsibility Services: As we have divided the application into smaller services, each service is now focusing on one responsibility only.

Testability: It is easier to test smaller pieces of applications than to test them as a whole. For example, in an e-commerce application, if a change is made in the search feature, testing can be focused on this area only.

Agile Development: As teams are dealing with smaller deployable now, it is easier and a good fit for Continuous Integration (CI) and Continuous Deployment (CD), hence helping achieve less time to market.

Stabler Applications: One advantage microservices architecture gives us is to categorize our microservices on basis of criticality. For example, in our e-commerce system, order placement and shopping cart features will be of higher criticality than say a recommendation service. This way we can focus on the availability, scalability, and maintenance of areas that can directly impact user experience.

Cloud Native Application Design – Microservices

I cannot tell you whether the cloud has popularized microservice-based design or is it vice versa, but both go hand in hand. To understand this, let us take a quick trip to the history of the most popular application designs in past few years.

Monolithic design: If you are in the software Industry for more than a decade, you know that there was a time when the whole application was written as one single deployable unit. This does not sound bad if the application itself is small, but as the size of the application grows, it becomes difficult to manage, test, and enhance.

Some of the important challenges with monolith design were – difficulty to update, even a small change needing complete testing and deployment, being stuck with a tech stack, and difficult scaling.

Service-Oriented Architecture: The challenges in Monolithic design paved the way for an architecture where code was distributed in terms of services, hence called Service-Oriented Architecture or SOA. For example, in an e-commerce platform, we will have services for product management, inventory management, shipping management, and so on. This design helped in better organizing the code, but the final application was still compiled into one deployable and deployed to a single application server. Because of this limitation, this inherited most of the challenges from Monolith design.

Microservices: Though SOA inherited challenges from the Monolith era, one important change it brought was to the mindset of the developers and architects. We started looking at the application not as a single piece but as a set of features coming together to solve a common purpose. With the cloud, Infrastructure related limitations were reduced to a great extent, hence giving us the freedom to further divide the application not only at design time but also at run time.

https://microservices.io/patterns/microservices.html

A major change that has come with Microservices is that you are breaking your application into a smaller set of services (hence the term microservice), which can solve one piece of the problem independently. This microservice is designed, developed, tested, deployed, and maintained independently. This also solves most of the problems we faced in Monolith design, because now you can scale, manage, develop (hence use independent tech stacks), and test these microservices independently.

Cloud Native Application Design – Type of Services

At a high level, cloud-provided services can be categorized into the following – Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Before you design an application for the cloud, it is important to get a basic understanding of these types, to make a call on what application best fits into which kind of service.

Before the cloud, when we owned the infrastructure on which applications were deployed, we were responsible for buying and managing hardware (usually a Server or PC), then deploying Operating Systems, Firewalls, application servers, Databases, and other software for applications to run. Everything needed to be manually handled from the power supply to software patches. This was changed with the introduction of the cloud.

Infrastructure as a Service or IaaS is the basic set of services provided by cloud providers, where you are provided with bare minimum hardware needs. For example, you take a Virtual machine and install OS on that (mostly you will get a VM with pre-installed OS, but you are responsible for managing the patches and upgrades). On top of that, you are responsible for installing and managing any tools and software needed for your applications.

Platform as a Service or PaaS provides a platform on top of which you can deploy your applications. In this case, the only thing you are responsible for is the code or application and data. Amazon Elastic Beanstalk or Azure Web Apps are examples of such services, that help you directly deploy your applications without worrying about underlying hardware or software.

Software as a Service or SaaS, as the name suggests, are services where you get the whole functionality as off the shelf, all you need to do is login and use the software. Any email provider service is a good example, like Gmail. you just need to login and use the service.

https://www.redhat.com/en/topics/cloud-computing/iaas-vs-paas-vs-saas

Cloud Native Application Design – Cloud Computing

I introduced Cloud Native Application Design and Cloud Computing in the last post. As a next step, we will discuss cloud computing a little more. Understanding cloud computing is the first step toward generating cloud-native applications.

So what is cloud computing? Cloud computing is a term we use to collectively call services provided by cloud service providers. These services include storage, compute, networking, security, etc. Let me borrow the official definition from Wikipedia.

Cloud computing is the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct active management by the user. Large clouds often have functions distributed over multiple locations, each location being a data center. Cloud computing relies on sharing of resources to achieve coherence and typically uses a “pay-as-you-go” model, which can help in reducing capital expenses but may also lead to unexpected operating expenses for users.

https://en.wikipedia.org/wiki/Cloud_computing

The definition covers a few important aspects.

On-Demand: You can activate or deactivate the services as per your need. You are renting the services and not buying any hardware. For example, you can activate/rent a Virtual Machine on the cloud, use it, and then kill it (you take compute capacity from the cloud pool and return it back when you are done).

Multiple Types of services: The definition talks about Compute and Storage (these are basic, and one can argue most services are built on these), but if you go to any popular cloud service provider’s portal, you will see a much larger set of services like databases, security services, AI/ ML based, IOT, etc.

Data Centers: Cloud service providers have multiple data centers, spread across various geographical regions in the world, each region has multiple data centers (usually distributed into zones, with one zone having multiple data centers). This kind of setup helps in replicating resources for scalability, availability, and performance.

Shared Resources: As already mentioned, when on the cloud, you do not own resources and share infrastructure with other users (though cloud providers also have an option for separated infrastructure at a higher price). This helps cloud providers manage the resources at scale, hence keeping the prices low.

Pay-as-you-go: Probably one of the most important factors for the popularity of the cloud. You pay for your usage only. Going back to our previous example, you created a VM for X amount of time, so you pay only for that time.

Unexpected Operating Expenses: Though Cloud is popular because it can help reduce overall infrastructure costs, it can also go the other way if you are not careful about managing your resources. For example, unused resources or unused capacity will add to your bills, at the same time, low capability resources will impact services and user experience. So striking the correct balance becomes important and needs expertise, hence adding to operating costs.

Cloud Native Application Design – Basics

Cloud native application, or cloud native design, or cloud native architecture, or .. well the list is endless, and unless you are living under a rock, you hear these terms almost on daily basis. The critical question here is, do we understand what the term “cloud-native” means? Mostly “cloud-native” is confused with anything that is deployed on the cloud. And nothing can be more wrong.

What is “Cloud-Native”?

Cloud-native architecture and technologies are an approach to designing, constructing, and operating workloads that are built in the cloud and take full advantage of the cloud computing model.

https://learn.microsoft.com/en-us/dotnet/architecture/cloud-native/definition

Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

https://www.cncf.io/about/who-we-are/

Let me just say, “Cloud Native” means “Built for Cloud”. You start by keeping in mind what Cloud can do for you while designing your application.

Easier said than done. Most of the time, you will design a solution and then try to fit it into the cloud. Another challenge might be your understanding of the cloud. Cloud (or better we should call it cloud computing) itself can be a complex concept. So let us take a moment to demystify it, and let’s start with how industry leaders define it.

What is Cloud-Computing?

Simply put, cloud computing is the delivery of computing services—including servers, storage, databases, networking, software, analytics, and intelligence—over the Internet (“the cloud”) to offer faster innovation, flexible resources, and economies of scale. You typically pay only for cloud services you use, helping lower your operating costs, run your infrastructure more efficiently and scale as your business needs change.

https://azure.microsoft.com/en-in/resources/cloud-computing-dictionary/what-is-cloud-computing/#benefits

Cloud computing is the on-demand delivery of IT resources over the Internet with pay-as-you-go pricing. Instead of buying, owning, and maintaining physical data centers and servers, you can access technology services, such as computing power, storage, and databases, on an as-needed basis from a cloud provider like Amazon Web Services (AWS).

https://aws.amazon.com/what-is-cloud-computing/

In short, Cloud-Computing is a set of services, in the form of compute capabilities, storage, networking, security, etc. that one can use off-the-shelf. As an extension, we can say Cloud-Native design is designing our system keeping the full advantage of these services.

Design vs Code – The curse of Agile

Found some old notes of mine about agile, almost a decade old (note I am referring to Agile as new 🙂 ), but surprisingly the core question I was pondering upon a decade back, still holds true. The question development teams still struggle to solve is – how much time is sufficient for the design phase when one is following Agile practices.

Agile is new, bold, and sexy. Everyone wants to be Agile. Every resume I have seen of late has “Agile Development” mentioned in one form or another.

But the question is, what is Agile? Now some would say Scrum is Agile, having status meetings every day is Agile, and development in Sprints is agile. Well, if you look at the dictionary, agile is “able to move quickly and easily”. And Scrum, Sprints, TDD, etc are just some tools to get things done faster. I have shared my thoughts on agile development here.

With Agile development, I have seen too often teams trying to jump on development from day one. Design and architecture sound old-fashioned. Why spend time on something that will not show up on the screen as the end product? Why not instead spend time developing something which one can demo to clients, and get appreciated for fast work?

When you are driving, speed is attractive, but it needs better clarity, stability, balance, and most importantly synchronization if we have a team. By pushing the design to the back burner, we are saying “let’s start driving, we will figure out the route map on the way”, and before we know it, every member of the team comes up with a different route map and is sitting miles away from each other.

As would not stop people from developing unless the design is complete, but definitely some ground rules should be set beforehand.