In this series exploring the design of popular systems, I will look at a file-sharing system like Dropbox.
Functional Requirements
- Users can upload or download files via a client application or a web application
- Users can sync and share files
- Users can view the history of updates to a file
Non-Functional Requirements
- Performance: Low latency when uploading files
- Availability
- Concurrency: Multiple users are able to update the same file
Scaling Assumptions
- Average file size: 200 MB (assumed)
- Total user base: 500 million
- Daily active users: 100 million
- Daily file creations: 10 per active user
- Total files per user: 100
- Average ingress per day: 10 files * 100 million users * 200 MB = 200 petabytes per day
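The ingress figure can be reproduced with a quick back-of-the-envelope calculation. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes). The constant and function names are only illustrative, and the total-storage figure is derived from the assumptions above rather than stated in them.

```python
MB = 10**6   # decimal megabyte, in bytes (assumption)
PB = 10**15  # decimal petabyte, in bytes (assumption)

AVG_FILE_SIZE_BYTES = 200 * MB
TOTAL_USERS = 500_000_000
DAILY_ACTIVE_USERS = 100_000_000
FILES_CREATED_PER_USER_PER_DAY = 10
FILES_PER_USER = 100


def daily_ingress_bytes() -> int:
    """Bytes uploaded per day if every active user creates 10 average-size files."""
    return FILES_CREATED_PER_USER_PER_DAY * DAILY_ACTIVE_USERS * AVG_FILE_SIZE_BYTES


def total_storage_bytes() -> int:
    """Bytes stored if every user keeps 100 average-size files (no deduplication)."""
    return TOTAL_USERS * FILES_PER_USER * AVG_FILE_SIZE_BYTES


if __name__ == "__main__":
    print(daily_ingress_bytes() / PB, "PB ingress per day")
    print(total_storage_bytes() / PB, "PB total storage")
```

These numbers are worst-case: they assume whole files are uploaded every time, which is exactly what chunk-level sync tries to avoid.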
Services Needed
- User Management Service
- File Handler Service
- Notification Service
- Synchronization Service
File Sync
When syncing, the client breaks each file into smaller chunks so that only the chunks that have changed are sent to the server, instead of sending the whole file on every update. For example, a 40 MB file is split into twenty 2 MB chunks.
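A minimal sketch of this idea, assuming fixed-size chunks identified by their SHA-256 hash (the hash choice and all function names here are illustrative, not part of the original design). Comparing the per-chunk hashes of the old and new versions tells the client which chunks to upload.

```python
import hashlib

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB chunks (taken as 2 MiB here)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Fingerprint every chunk so versions can be compared without the raw bytes."""
    return [hashlib.sha256(c).hexdigest() for c in split_into_chunks(data, chunk_size)]


def changed_chunks(old_hashes: list[str], new_hashes: list[str]) -> list[int]:
    """Indices of chunks in the new version that are new or differ from the old one."""
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != h
    ]


if __name__ == "__main__":
    old = bytes(40 * 1024 * 1024)          # a 40 MB file of zero bytes
    new = bytearray(old)
    new[5 * CHUNK_SIZE] = 1                # touch one byte inside chunk 5
    print(len(chunk_hashes(old)))          # number of chunks
    print(changed_chunks(chunk_hashes(old), chunk_hashes(bytes(new))))
```

One caveat of fixed-size chunks: inserting bytes near the start of a file shifts every later chunk boundary, so all following chunks look changed. This sketch does not try to handle that case.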
Chunking helps with problems such as:
- Concurrency: If two users update different chunks of the same file, there is no conflict
- Latency: Chunks can be uploaded in parallel, so uploads finish faster
- Bandwidth: Only the updated chunks are sent
- History storage: A new version only needs space for the changed chunks rather than for the full file
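The concurrency and history points can be illustrated with a small in-memory sketch. It assumes a content-addressed store (each distinct chunk kept once, keyed by its SHA-256 hash) and a chunk-level three-way merge. `ChunkStore` and `merge_edits` are hypothetical names, and the merge assumes all versions have the same number of chunks.

```python
import hashlib


class ChunkStore:
    """Content-addressed store: each distinct chunk is kept once, keyed by its hash."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.versions: list[list[str]] = []  # each version is a list of chunk hashes

    def commit(self, chunks: list[bytes]) -> int:
        """Save a new version; only chunks not seen before take up new space."""
        manifest = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            manifest.append(digest)
        self.versions.append(manifest)
        return len(self.versions) - 1

    def read(self, version: int) -> bytes:
        """Reassemble the file contents of a given version."""
        return b"".join(self.chunks[d] for d in self.versions[version])


def merge_edits(base: list[bytes], mine: list[bytes], theirs: list[bytes]) -> list[bytes]:
    """Merge two edits of the same base; conflict only if both changed the same chunk."""
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only merges versions with equal chunk counts")
    merged = []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:
            merged.append(m)      # both sides agree (or neither changed it)
        elif m == b:
            merged.append(t)      # only the other user changed this chunk
        elif t == b:
            merged.append(m)      # only this user changed this chunk
        else:
            raise ValueError(f"conflict in chunk {i}")
    return merged
```

For example, if one user edits the first chunk and another edits the last, `merge_edits` returns a version containing both edits, and committing it adds only the two changed chunks to the store. A real system would still need a policy for the conflict case, such as keeping both copies.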
The most important part of this design is the client, which is made up of four components:
- Watcher: Monitors a local folder for changes and informs the Chunker and the Indexer when something changes.
- Chunker: As discussed above, breaks a file into manageable chunks.
- Indexer: On receiving an update from the Watcher, updates the internal database with the file's metadata. It also communicates with the Synchronization Service, sending and receiving information about file updates and syncing to the latest version.
- Internal DB: Stores file metadata locally on the client.
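The client components above can be sketched roughly as follows. This is a simplified illustration under several assumptions. The Watcher polls the folder and compares modification time and size, where a real client would more likely use OS file-change notifications. The internal DB is SQLite. Communication with the Synchronization Service is left out. All names and the table layout are hypothetical.

```python
import hashlib
import os
import sqlite3

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB chunks (taken as 2 MiB here)


def snapshot(folder: str) -> dict[str, tuple[int, int]]:
    """Watcher: record (mtime_ns, size) for every file under the folder."""
    state = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def changed_paths(before: dict, after: dict) -> list[str]:
    """Watcher: paths that are new or whose (mtime, size) signature differs."""
    return sorted(p for p, sig in after.items() if before.get(p) != sig)


def chunk_file(path: str, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Chunker: read the file chunk by chunk and return one hash per chunk."""
    hashes = []
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes


class Indexer:
    """Indexer plus internal DB: remembers the chunk hashes of each file."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS chunks "
            "(path TEXT, idx INTEGER, hash TEXT, PRIMARY KEY (path, idx))"
        )

    def update(self, path: str, hashes: list[str]) -> list[int]:
        """Store the new hashes and return the chunk indices that need uploading."""
        old = dict(self.db.execute(
            "SELECT idx, hash FROM chunks WHERE path = ?", (path,)))
        dirty = [i for i, h in enumerate(hashes) if old.get(i) != h]
        with self.db:  # one transaction: replace the file's chunk list
            self.db.execute("DELETE FROM chunks WHERE path = ?", (path,))
            self.db.executemany(
                "INSERT INTO chunks VALUES (?, ?, ?)",
                [(path, i, h) for i, h in enumerate(hashes)],
            )
        return dirty
```

A sync loop would call `snapshot` periodically, pass each path from `changed_paths` through `chunk_file`, and send the Synchronization Service only the chunk indices returned by `Indexer.update`.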
Cloud storage holds the file chunks and their updates. The metadata server maintains file metadata and informs clients about updates through the synchronization service. The synchronization service adds update messages to a queue, which clients pick up when they are available; a client that was offline can read the pending messages later and sync up. The edge store serves clients from the nearest location.
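The queue behavior described above, where offline clients catch up later, can be sketched as an append-only log with one read offset per client. This is an in-memory illustration only: `SyncQueue`, `UpdateEvent`, and their fields are hypothetical names, and a real deployment would use a durable message queue.

```python
from dataclasses import dataclass


@dataclass
class UpdateEvent:
    file_id: str
    version: int
    changed_chunks: list[int]


class SyncQueue:
    """Append-only log of updates; each client tracks how far it has read."""

    def __init__(self) -> None:
        self._log: list[UpdateEvent] = []
        self._offsets: dict[str, int] = {}

    def register(self, client_id: str) -> None:
        """A new client starts reading from the current end of the log."""
        self._offsets.setdefault(client_id, len(self._log))

    def publish(self, event: UpdateEvent) -> None:
        """Called by the synchronization service when a file changes."""
        self._log.append(event)

    def poll(self, client_id: str) -> list[UpdateEvent]:
        """Return every event the client has not seen yet and advance its offset."""
        start = self._offsets[client_id]
        pending = self._log[start:]
        self._offsets[client_id] = len(self._log)
        return pending
```

Because each client has its own offset, a laptop that polls after every update and a phone that was offline for several updates both end up seeing the full sequence of events, just at different times.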