Tag Archives: Design

Clean Code: Error Handling, Testing, and Classes

In the clean code series, I will cover Error Jandling, Testing and Clean classes in this post.

Error Handling

  • Catch only Exceptions meant to be caught, for example, checked exceptions in Java
  • Log as much information as possible when an error or exception occurs for analysis
  • Null objects should not be returned, instead, return empty objects


  • Code Coverage targets should be set and achieved
  • TDD helps write testable code and reduce the number of issues
  • Use the F.I.R.S.T rule for testing:
    • The test is fast-running.
    • The tests are independent of others.
    • The test is repeatable in various environments.
    • The test is self-validating.
    • The test is timely.

Clean Class

  • Single Responsibility
  • Open for extension and closed for Modifications
  • Readability: self-documenting class names, function names, and variable names

Clean Code: Comments, Formatting, and Objects

In continuation with the clean code series, here I am covering Comments, Formatting, and Objects and data structures.


  • Code should be self-explanatory, the purpose of the code is that humans should be able to understand and not that computer is able to execute it
  • Comment only where logic is complex
  • Private API should not have comments
  • Use comments when you want to caution other developers, for example, why List instead of Queue was chosen and should not be modified
  • Comments should answer why (was a decision made) and not how the code works


  • Formatting is a way to communicate with fellow developers
  • Readability = Maintainability = Extensibility
  • Verticle Alignment: Keep connected functions together for better readability
  • Horizontal Alignment: one should never need to scroll right
  • Team Formatting Rules: everyone should follow the same rules, braces, ident size, spaces/tabs

Objects and Data Structures

  • Follow OOPS Principles, e.g. parameters and behavior should be encapsulated
  • Law of Demeter:  M method of an object O can only consume services of the following types of objects:
    • The object itself, O.
    • The M parameters.
    • Any object created or instantiated by M.
    • Direct components of O.
  • Avoid Static methods wherever possible

Clean Code: Naming and Functions

Inspired by Clean Code by Robert C Martin, trying to summarize coding best practices. Starting with Naming and Functions best practices in this post.


  • Names should encode the intent, for example, studentBirthYear.
  • Use Good distinction: Do not use list1, list2, etc
  • Use Pronounceable name: dobmmyy vs dateOfBirthInMonthAndYear
  • use searchable names: int i, j, when you will try to search you will find a lot of them in code
  • Do not add type: phoneString, name String, name and phone should be sufficient
  • Avoid unclear prefixes: m_name vs manager_name
  • nouns for names and verbs for functions: employee for class and paySalary for function
  • Use Consistent concept: controller vs manager
  • Don’t use the same name twice to mean 2 different things. paymentInfo at one place returns bank details and another place user payment
  • Use Domain specific names
  • Avoid too long or too short names: Long is fine if it conveys better information, but not too long that makes it difficult to pronounce


  • Write Small functions, functions larger than 20 lines should be avoided
  • Make sure the function does only one thing
  • Use minimum arguments: max 2, if the function takes too many arguments, it is doing too much
  • DRY, Do not Repeat yourself: IF you are doing the same thing in multiple functions, move it to commonplace
  • Don’t use flag element, parameters of the Boolean type as a parameter already clearly state that it does more than one thing.

Design: Twitter

In the series of exploring designs for popular systems, I will try to come up with Twitter’s system design today.

Functional Requirements

  • User should be able to Tweet
  • User should be able to follow and view tweets of others on their timeline
  • User should be able to search for tweets
  • User should be able to view current trends

Non Functional requirements

  • Availability
  • Performance
  • Eventual Consistency
  • Scale 150 million users with 500 million tweets per day or ~5500 Tweets/Second

30000 view of Architecture

30000 view of Twitter Architecture

There are one or two aspects of the above design which are very interesting. The first one we can see is the user timeline. This can be a complicated piece, whenever a user login into the app, he should see his timeline, which will show all the tweets from people he is following. The user might be following hundreds of accounts, it will not be feasible to calculate tweets from all these accounts at runtime and create timeline data. So a passive approach makes sense here, where we can keep the user timeline data in a cache beforehand.

Say user A is following user B, and user B publishes a new tweet, at that time itself, user A timeline will be updated with a new timeline getting added to user A timeline data. Similarly, if 100 users are following user B, all the timelines get updated (Fanout the tweet and update all timelines).

It can get tricky if user B has millions of followers. A different approach can be used in this case. Assuming there are not many such popular users, we can create a separate bucket for handling these popular users. Say user A is following user C (celebrity), so instead of updating the timeline for C beforehand, tweets for all such celebrity users can be pulled in real-time.

Another important aspect is hashtagging and trends exploration. For all the tweets coming in, the text can be tokenized and tokens can be analyzed for most usage. For example, when a cricket match is going on in India, many people might tweet with the term match or cricket. Again these trends might be geo-location-based as this particular trend is a country-specific one.

Design: Whatsapp

In this series of trying to understand designs for popular systems, I am taking up Whatsapp in this post. Please remember all the designs I am sharing in this series are my personal view for educational purposes and might not be the same as actual implementation.

To get started let us take a look at the requirements

Functional Requirements

  • User should be able to create and manage an account (Covered already)
  • User should be able to send a message to contact
  • User should be able to send a message to a group
  • User should be able to send a media message (image/ video)
  • Message Received and Message Read receipts to sender
  • Voice Calling (Not covering here)

Non Functional Requirements

  • Encryption
  • Scaleability
  • Availability

At a very high level, the design looks very straightforward

The first important thing that we see here is that communication is not one-way like a normal web application where the client sends a request and receives a response. Here the mobile app (client) should be able to receive live messages from the messaging server as well. This kind of communication is called Duplex-Connection where both parties can send messages. To achieve this, one can use long polling or web sockets (preferred).

Communication Management: When a user sends a message, it will be sent to the queue of messages received, from where it will be processed and sent to the queue for messages to be sent to users.

Media Management: Before sending the message for processing, media is uploaded and stored in a storage bucket, and the link is shared with users, which can be used to fetch the actual media file.

Single/ Double/ Blue Tick: When a message is received and processed by the server, the information is sent back to the user and marked single ticked. Similarly, when the message is sent successfully the receiver is marked double tick and finally, when the receiver opens the message, it is blue ticked for the sender.

Additional Perpectives

WhatsApp system architecture design

Design: URL Shortner

The problem we are trying to solve is to create a service that can take a large URL and return a shorter version, for example, say take https://kamalmeet.com/cloud-computing/cloud-native-application-design-12-factor-application/ as input and give me https://myurl.com/xc12B2d, a URL easy to share.

The application looks simple, but it does provide a few interesting aspects.

Database: The Main database will be used to store long URLs, short URLs, created dates, created by, last used, etc. as we can see this will be a read-heavy database and should be able to handle large datasets, a NoSQL document-based database should be good for scalability.

Data Scale:

  • Long URL – 2 KB (2048 chars)
  • Short URL – 7 bytes (7 chars)
  • Created at – 7 bytes (7 chars for epoch time)
  • last used – 7 bytes
  • created by – 16 bytes (userid)
  • Total: ~2KB

2KB * 30 million URLs per month = ~60 GB per month or 7.2 TB in 10 years

Format: The next challenge is to decide the format of the tiny URL. The decision is an easy one, Base 10 URL would give you 10^7 or 10 million combinations for a 7-character string whereas a Base 62 format will give 62^7 or 3.5 trillion combinations for 7 character string.

Short URL Generator: Another challenge to solve is how to choose a random 7 Base 62 string for each URL.

Soln 1: Use MD5 which returns a string of 20+ chars, we can take the first 7 characters. The problem here is taking the first 7 characters might lead to a collision where multiple strings have MD5 with the same first 7 characters

Soln 2: Use a counter-based approach. A counter service will generate the counter which gets converted to Base 62, making sure all requests get a unique Base 62 string. To scale it better, we will have a distributed counter generator.

Design: Dropbox

In the series of exploring design for popular systems, I will look at a file share system like dropbox.

Functional Requirements

  • User is able to upload or download files via a client application or web application
  • User is able to sync and share files
  • User is able to view the history of updates

Non Functional Requirements

  • Performance: Low latency while uploading the files
  • Availability
  • Concurrency: Multiple users are able to update the same file

Scaling Assumptions

  • Average size file – say 200 MB
  • Total user base- 500 million
  • Daily active users- 100 million
  • Daily file creations- 10 per user
  • Total files per user- 100
  • Average Ingress per day: 10 * 100 million * 200 MB = 200 petabytes per day

Services Needed

  • User management Service
  • File Handler Service
  • Notification Service
  • Synchronization Service

File Sync

When Syncing the files we will break the file into smaller chunks, so that only the chunk which has undergone updates will be sent to the server. This architecture is helpful in contrast to sending the file to the server for every update. Say a 40 MB file gets broken into 2 MB chunks each.

This architecture helps solve problems like

  • Concurrency: If two users are updating different chunks, there is no conflict
  • Latency: Faster and parallel upload
  • Bandwidth: Only chunk updated is sent
  • History Storage: New version only need a chunk of data rather than full file space

The most important part of this design is the client component.

  • Watcher: This component keeps an eye on a local folder for any changes. It informs Chunker and Indexer about changes.
  • Chunker: As discussed above, the chunker is responsible for breaking a file into manageable chunks
  • Indexer: On receiving an update from watcher, Indexer updates the internal database with metadata details. It also communicates with Synchrnozation service sending or receiving information on updates happening to files and syncing the latest version.
  • Internal DB: To maintain file metadata locally on the client.

Cloud Storage finally stores the files and updates. Metadata server maintains metadata and helps inform clients about any updates through synchronization service. Synchronization service adds data to the queue which is then picked by various clients based on availability (if a client is offline, it can read messages later and sync up the data). Edge store helps provide details to clients from the nearest location.

REST APIs Naming – Beyond CRUD

Not planning to write yet another article on best practices for REST APIs as the topic is covered multiple times already. What better than Google’s reference document – https://cloud.google.com/apis/design/naming_convention

Here I would like to discuss some cases which are not straightforward. But before going there it makes sense to revise some basic concepts.

REST stands for Representational State Transfer. One can manage state of a resource.

What is a resource?

From https://restfulapi.net/

The key abstraction of information in REST is a resource. Any information that we can name can be a resource. For example, a REST resource can be a document or image, a temporal service, a collection of other resources, or a non-virtual object (e.g., a person).

The state of the resource, at any particular time, is known as the resource representation.

The resource representations are consist of:

  • the data
  • the metadata describing the data
  • and the hypermedia links that can help the clients in transition to the next desired state.

When we say REST can help to manage resources (CRUD operations), it is done by following methods

  • POST for Create
  • GET for Read
  • PUT and Patch for update
  • DELETE for Delete

There are other methods like options and head, but we will focus on the core CRUD operations mentioned above.

To get started let’s take a simple use case, where we have a resource Employee

Generic URL format will look like /{baseurl}/{service or microservice}/{resource}

For example https://api.kamalmeet.com/employee-management/employees

  • GET list of the employees GET /employees
  • Get specific data GET /employees/{id}
  • Create a new object POST /employees
  • Update an employee object Patch or PUT /employees/{id}
  • Delete the object DELETE /employees/{id}

Now that was the easy part

Let us talk about some complex cases now, which are not straightforward to fit into REST naming conventions.

Fetch Related resources for the object


Controller verb for a special operation


Complex resources representation

only get specific orders (dashes are acceptable) 

Fetch only specific columns

/employees/?fields={name, department, salary}

Complex searches (reports)


Complex listing


Above are some of the acceptable practices. Users can modify as per their needs.

Design: Netflix

Designing or architecting a system is a complex task. One needs to think of various aspects that can impact a system. At a high level, we bucket the requirements into two parts – Functional and Non-Functional. Functional requirements, in simple words, can be thought of as functionalities one needs to build. Non-functional requirements can be complex as they usually will not be called out explicitly and as an architect, you need to figure out after discussions with various stakeholders.

Reference: https://kamalmeet.com/system-design-and-documentation/steps-for-designing-a-system-from-scratch/

In this post, I would try to look at the system design for Netflix. Of course, it is a complex system and it is difficult to cover in one post, but I will try to touch upon important aspects.

Functional Requirements:

  • Account Management: Create Account/ Login/ Manage and Delete the Account
  • Subscription Management
  • Search
  • Watch a Video: View/ Download for offline viewing
  • Recommendations: User-based/ Generic/ Top trends/ Genre
  • Device Synchronization
  • Language Selection: Audio/ Video

Non-Functional Requirements:

  • Performance: Realtime streaming performance
  • Reliability
  • Availability
  • Scalability
  • Durability

Data needed:

  • number of users
  • daily active users
  • the average number of videos watched per day/ per user
  • the average size of the video
  • number of videos total/ uploaded per day

Let me borrow the high-level architecture image


Microservices-based architecture: Netflix is an early adapter of microservices and helped popularize the use of microservices. Microservices help Netflix manage its critical services by keeping them stateless, secured, scalable, available, and reliable.

CDN or Content Delivery Network: In the image above we see Open Connect, which is Netflix’s CDN. For any application which has consumers across multiple geographies, CDN is an important piece. This helps deliver content like images, videos, JavaScript, and other files from a location nearest to the user helping improve performance. In addition, Netflix provides Open Connect Appliances to ISPS free of cost, which helps ISPs save bandwidth and helps Netflix Cache content for better performance.

Transcoding: Any video getting uploaded to Netflix then gets converted to videos of various resolutions. The video gets uploaded to a queue from where it is taken up by transcoder workers who after converting the video upload them to AWS S3. When a user clicks on a video to be played, the best option is chosen based on the client and bandwidth.

API Gateway: ZUUL is the API gateway used by Netflix, which provides features for gateway like security, authentication, routing, decorating requests, Beta testing (based on routing), etc.

Resiliency: It is a resiliency library by Netflix. It handles scenarios like timeout handling, failing fast by rejecting requests when the thread pool is full, circuit breaker when the error rate is heavy, fallback to default response, etc.

Cache: Netflix uses EV cache to provide performance, reduced latency, better throughput, and reduced overall cost. EV cache is a custom implementation of Memcache, which is not dependent on RAM and can use SSD.

Database: Netflix uses MySQL for data that needs ACID properties, data like user data. Read replicas are used to improve query performances. Cassandra is used for NoSQL, to keep data like browsing and watching history. Older history data can be moved to the compressed cheaper data store.

Logs Management: All log data is sent to Chukwa through Kafka. You can view logs on the dashboard. Finally, logs can be sent to S3 for further retention and usage.

Search: Elastic Search is used for indexing and searching.

Recommendations: Spark is used for data analysis. it helps rank content based on user history as well as using data from users with similar tastes. For example, if two users have given similar ratings to a movie, their tastes might be similar. Also if a user watches comedy content mostly, the recommendation engine might suggest more comedy content.


Building Reactive Systems

In today’s world, we are always striving for building applications which can adapt to constantly changing needs. We want our systems to flexible, resilient, scalable and withstand end user’s high expectations.

Considering these needs, a Reactive Manifesto was put together, with best practices which will help us build robust applications. Following four pillars makes a strong base of reactive application.

1. Responsive
2. Resilient
3. Elastic
4. Message Driven

You can see none of these concepts are new in nature, and you might be already implementing them in the applications you build. Reactive Manifesto brings them under one umbrella and emphasizes their importance. Let’s take a look at these pillars one by one and see what all well-known patterns we can use to implement each of them.

Responsive: An application is responsive if it responds to the user in a timely manner. A very simple example is you clicked on a button or link in a web application, it does not give you a feedback that button was clicked and the action gets completed after few seconds. Such an application is non-responsive as the user is left guessing if he is performing the right action.

Some of the well-known practices and design patterns which help us make sure if the application is responsive
– Asynchronous communication
– Caching
– Fanout and quickest reply pattern
– Fail-fast pattern

Resilience: An application is called resilient if it can handle failure conditions in a graceful manner.

Some of the patterns that help maintain resilience
– Circuit breaker pattern
– Failure handling pattern
– Bounded Queue Pattern
– Bulkhead Pattern

Elasticity: An application is called elastic if it can stand increase or decrease in load without any major impact on overall working and performance.

Some of the practices and patterns to implement elasticity
– Single responsibility
– Statelessness
– Autoscaling
– Self-containment

Message Driven: An application which uses message driven communication makes sure we are implementing various components and services in a loosely coupled manner. This helps us keep our components scalable and makes failure handling easy.

Practices used to implement Message driven communicaiton
– Event driven
– Publisher Subscriber pattern
– Idempotency pattern

I have covered some of these patterns in details in my book on Design Patterns and Best Practices.