Recently, I was using a tiny URL service which basically takes a URL from us (henceforth referred to as the long URL) and creates a shortened URL for us. We can then use the generated short URL
in the browser and it will redirect to the longer URL (the original URL). At first glance, this looks like an easy project to build, but as we bring scale and a distributed environment with a lot
of data into the picture, things get complicated very quickly. Let's try and design this system together!
To begin with, let's come up with the functional and non-functional requirements.
Functional Requirements:
1. Given a long URL, the service should return a shortened URL which should be as short as possible.
2. Given a short URL, it should be able to redirect to the longer URL.
Non-functional requirements:
1. The mapping should be persisted for 5 years.
2. The system should be highly available.
3. The system should be scalable and should be able to support 1 billion queries per month.
4. The system should be designed keeping in mind that the workload is read-heavy and not write-heavy, with a read:write ratio of 10:1.
Based on these requirements, let's do some "back of the envelope" calculations to estimate the scale of the system.
Query per second (QPS) calculation:
1 billion queries per month = (1 billion) / (30 * 24 * 60 * 60) ~ 385 QPS. Max QPS ~ 2 * normal QPS = 2 * 385 = 770, rounded up to ~800 QPS
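The QPS estimate above can be sketched as a quick calculation (the 2x peak multiplier is the same assumption used in the text):

```python
# Back-of-the-envelope QPS estimate for ~1 billion queries per month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # ~2.59 million seconds

queries_per_month = 1_000_000_000
avg_qps = queries_per_month / SECONDS_PER_MONTH
peak_qps = 2 * avg_qps  # assume peak traffic is roughly 2x the average

print(f"average QPS ~ {avg_qps:.0f}")  # ~ 386
print(f"peak QPS    ~ {peak_qps:.0f}")  # ~ 772
```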
Storage-based calculations:
Assuming a 1:10 write-to-read ratio, the storage required for 5 years = 5 * 12 * 1 billion * 1/10 * 10 KB (assumed size of each record) ~ 60 TB of space
Max unique URLs needed:
As we are storing the URLs for 5 years, we would need 1 billion * 1/10 * 12 * 5 = 6 billion unique URLs for 5 years.
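The storage and unique-URL estimates above can be checked with the same arithmetic (the 10 KB record size is the assumption stated in the text):

```python
# Storage and unique-URL estimates over the 5-year retention window.
queries_per_month = 1_000_000_000
writes_per_month = queries_per_month // 10   # read:write = 10:1
months = 5 * 12
record_size_kb = 10                          # assumed size of one mapping record

total_writes = writes_per_month * months             # unique URLs to store
storage_tb = total_writes * record_size_kb / 10**9   # KB -> TB

print(f"unique URLs needed ~ {total_writes:,}")    # 6,000,000,000
print(f"storage needed     ~ {storage_tb:.0f} TB")  # 60 TB
```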
URL shortening algorithm:
To make the shortened URL as short as possible, let's try to use base62 encoding (https://en.wikipedia.org/wiki/Base62). We can assign a UUID for each long URL we receive and we can encode the UUID and
generate the short URL using that instead of the long URL. This would ensure that we have no conflicts (the probability of getting the same UUID is almost zero if not exactly zero!).
So if we assume 62 characters (0-9, a-z and A-Z), and we use a short URL of length 7, we can accommodate 62^7 ~ 3.5 trillion URLs! So for our use case, we are good with a 7-character short URL.
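The encoding step can be sketched as below. `encode_base62` is a hypothetical helper name; note that it takes an integer, so in practice we would encode the numeric form of the assigned ID (a full 128-bit UUID encoded in base62 is about 22 characters, so an integer ID or a truncated value is what actually fits in 7 characters):

```python
import string

# Base62 alphabet: 0-9, a-z, A-Z (62 characters in total).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62

def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

# 62^7 distinct 7-character codes:
print(62 ** 7)             # 3521614606208, i.e. ~3.5 trillion
print(encode_base62(125))  # "21", since 125 = 2*62 + 1
```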
Now, based on the above constraints, let us try to come up with each component we would need to build the system. The components would be:
1. Database: Given that the data is highly structured and the space requirements are also not in Petabytes, we may opt for a simple SQL-based database, probably MySQL.
2. Cache: We might want to store the most frequently accessed URLs in a cache. Any key-value-based cache will do. For now, let's go with Redis.
3. We will also need other components like load balancers, servers etc.
Now, with all this information in hand, let's try and come up with the first draft of the design for our system.
Figure 1: A high-level design (HLD) of a simple tiny URL service
In the above image, we have not added load balancers and a properly distributed setup, but these should be there. I leave it to the readers to improve on this and come up with the best diagram for it. The lower-level details are mentioned above. Better approaches and improvements are welcome.
Amrit Raj