The problem we are trying to solve is to build a service that takes a long URL and returns a shorter one. For example, given https://kamalmeet.com/cloud-computing/cloud-native-application-design-12-factor-application/ as input, it returns https://myurl.com/xc12B2d, a URL that is easy to share.
The application looks simple, but it raises a few interesting design questions.
Database: The main database will store long URLs, short URLs, created dates, created by, last used, etc. This will be a read-heavy database that must handle large datasets, so a NoSQL document-based database should be a good fit for scalability.
Data Scale:
- Long URL – 2 KB (2048 chars)
- Short URL – 7 bytes (7 chars)
- Created at – 10 bytes (10 chars for epoch time in seconds)
- Last used – 10 bytes
- Created by – 16 bytes (userid)
- Total: ~2KB
2KB * 30 million URLs per month = ~60 GB per month or 7.2 TB in 10 years
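The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it uses decimal units (1 GB = 1,000,000 KB), and the constant and function names are made up for illustration.

```python
# Back-of-the-envelope storage estimate using the figures from the text.
RECORD_SIZE_KB = 2            # approximate size of one stored record
URLS_PER_MONTH = 30_000_000   # write volume assumed in the text


def storage_gb(months: int) -> float:
    """Approximate storage in GB after the given number of months."""
    total_kb = RECORD_SIZE_KB * URLS_PER_MONTH * months
    return total_kb / 1_000_000  # KB -> GB (decimal units)


print(storage_gb(1))            # storage after one month, in GB
print(storage_gb(120) / 1000)   # storage after ten years, in TB
```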
Format: The next challenge is to decide the format of the tiny URL. The decision is an easy one: a Base 10 URL would give 10^7, or 10 million, combinations for a 7-character string, whereas a Base 62 format (0-9, a-z, A-Z) gives 62^7, or about 3.5 trillion, combinations for the same length.
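To make the comparison concrete, the number of possible codes is just the alphabet size raised to the code length. A small sketch (the function name is illustrative):

```python
def keyspace(alphabet_size: int, length: int = 7) -> int:
    """Number of distinct strings of `length` symbols over the alphabet."""
    return alphabet_size ** length


print(keyspace(10))  # 10000000 (10 million)
print(keyspace(62))  # 3521614606208 (about 3.5 trillion)
```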
Short URL Generator: Another challenge is how to generate a 7-character Base 62 string for each URL.
Soln 1: Use MD5, which returns a 128-bit hash, usually written as a string of 32 hexadecimal characters, and take the first 7 characters. The problem is that truncating to 7 characters can lead to collisions, where different long URLs have MD5 hashes that share the same first 7 characters.
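A minimal sketch of Soln 1, assuming an in-memory dict stands in for the database; the function names `md5_short_code` and `shorten` are hypothetical. It only detects a collision and does not try to resolve it.

```python
import hashlib


def md5_short_code(long_url: str, length: int = 7) -> str:
    """Return the first `length` hex characters of the URL's MD5 digest."""
    return hashlib.md5(long_url.encode("utf-8")).hexdigest()[:length]


def shorten(long_url: str, store: dict[str, str]) -> str:
    """Store the URL under its truncated hash, raising on a collision."""
    code = md5_short_code(long_url)
    existing = store.get(code)
    if existing is not None and existing != long_url:
        # A different URL already owns this 7-character prefix.
        raise ValueError(f"collision on short code {code!r}")
    store[code] = long_url
    return code


store: dict[str, str] = {}
print(shorten("https://kamalmeet.com/cloud-computing/", store))
```

Note that a hex digest uses only 16 symbols, so a 7-character prefix covers 16^7 (about 268 million) values, far fewer than the Base 62 space, which makes collisions more likely as the table grows.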
Soln 2: Use a counter-based approach. A counter service generates a number that is converted to Base 62, so every request gets a unique Base 62 string. To scale better, we can use a distributed counter generator.
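A sketch of Soln 2 in Python. The Base 62 conversion is standard; the `RangeCounter` class is a hypothetical stand-in for a distributed counter, under the assumption that each node is handed its own disjoint range of counter values so that nodes never produce the same ID.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (the ordering is an arbitrary choice)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(n: int) -> str:
    """Convert a non-negative integer to its Base 62 representation."""
    if n < 0:
        raise ValueError("counter value must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class RangeCounter:
    """Hands out IDs from a pre-allocated range [start, end)."""

    def __init__(self, start: int, end: int) -> None:
        self._ids = iter(range(start, end))

    def next_code(self, width: int = 7) -> str:
        # next() raises StopIteration once the range is exhausted; a real
        # node would request a fresh range from a coordinator at that point.
        return to_base62(next(self._ids)).rjust(width, ALPHABET[0])


# Two nodes with disjoint ranges never produce the same code.
node_a = RangeCounter(0, 1_000)
node_b = RangeCounter(1_000, 2_000)
print(node_a.next_code(), node_b.next_code())
```

Because codes come from a counter they are sequential and therefore guessable; whether that matters depends on the use case.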