As I was scrolling down my Facebook news feeds, I marvelled at how gracefully the system handles the traffic of millions of users and yet does not fail to share the most relevant posts to the correct set of people! I decided to decipher the system and spent some time learning about it from various sources. I must also redirect you to the official Facebook engineering news feed. Let us try to design the system and tackle the challenges for ourselves!
Let us first limit the scope of this design adventure to a few functional and non-functional requirements. The functional requirements could be as follows for our use case:
- The system should allow users to do all sorts of posts which may include text, images and videos.
- The users should be able to see the posts done by their friends whom they have followed. This may also include celebrities.
- Users must get notifications whenever there is a new post from their friends.
- Assumptions :
- We have a graph DB to store who is a friend of whom.
- We have a system in place to keep updating the graph DB.
Similarly, the non-functional requirements can be as follows for our use case:
- The system need not be in a strongly consistent state and eventual consistency will be good enough for us. But the system should be highly available and scalable.
- The system should have very low latency.
- The system should be fault tolerant.
Now, we can start thinking about how we can achieve these objectives by thinking out loud and gathering our thoughts in one place. Looking at the problem at hand, we can easily figure out these points:
- To provide high availability and scalability, we must opt for distributed systems with great horizontal scaling capabilities.
- To provide low latency, we would need caches at various levels and places which should be in sync with the main databases.
- We need to solve the celebrity issue separately.
Based on all these things that we have discussed, we may come up with some back-of-the-envelope calculations for the problem statement:
- Let us assume that we have around 1 billion daily active users (DAU)! Of these, around 1 million are classified as celebrities (with more than 100k friends/followers).
- Of the 1 B DAUs, we can safely assume that the ratio of posting to reading posts is around 1: 100, and hence, our system will daily handle new posts = (1/100 ) * 1B = 10M new posts daily.
- We can assume that on average, a non-celebrity user has 100 friends to which each post must be delivered. This would mean that our system should fan out 10M * 100 = 1B feeds excluding celebrity content.
- Assuming that out of the 10M posts daily, the ratio of text to media content would be 1:1 and hence 5M text and 5M media content would be posted daily. If we assume that an average text post would be around 1KB and an average media content would be around 10MB. This would mean our daily storage capacity requirements would be close to 1KB * 5M + 10,000KB * 5M ~ 50TB daily. For a post to be retained for a year, we would need 50 TB * 365 ~ 19 PB. To store it for around 10 years (which Facebook does!) we would need around 200 PB of storage!
Now, with the numbers in hand and some idea around requirements, let's start thinking about how would we deliver the content as storing content might not be that big of an issue relatively (It is a big issue with such data! But relatively!). There are two models of delivering content. One way is the pull model or the user who wants to see the latest news feed must pull it from our nearest caches or databases by an API call. The other way is the push model where whenever someone creates a new post, we employ a notification service which will push the content to the nearest cache or database and notify the user of the content which will be delivered to the user as soon as they come online without any request. Perhaps we would need both models. Let us understand why. A push-based model is very convenient for users but it involves heavy lifting of transferring posts to all related friends nearest caches. This will involve a lot of data movement especially if the number of friends is too large. Hence this model might not be suitable for celebrities who have many followers. For the celebrities, we may well rely on the good old pull model as we may want to see celebrity posts only after we have seen all our friend's posts and we may afford to make an API call to get them. This will save us a lot of heavy lifting!
Based on the discussions above, I think now we are in a position where we may draft out our component diagram for the whole system. We are assuming that the obvious components like load balancers etc are already taken into account. The diagram may look something like this:
Figure 1: A proposed component diagram of the news feed system
The above component diagram represented in Figure 1 deals with all our use cases very well. Each of the components is scalable at there own level and the system as a whole is also scalable. The system is highly available (thanks to the distributed architecture), highly scalable (thanks to the NoSQL DBs and managed storages like S3 and managed services like Queues etc) and fault tolerant (again thanks to the distributed architecture!). Also, the system allows users to post and view each other's posts in a manner as described above.
This was an attempt to design a news feed system which might be something like what Facebook or Instagram might have! I am still learning and exploring these domains and great architectures which handle PBs of data and millions to billions of daily active users! Feel free to comment, share and provide your feedback!
-Amrit Raj
Comments