System design of Twitter

Requirements
1. Follow others
2. Create Tweets with Images and Videos
3. News feed

Capacity estimation
Around 600 million users are using Twitter, and about 200 million of them are daily active users. If each active user reads 10 tweets per day, that is 200 million × 10 = 2 billion reads per day. If a tweet is about 1 MB on average (including media), the total read volume is about 2 petabytes per day. That's a lot, right?
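The estimate can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the inputs are the figures from the text, the 1 MB tweet size is taken as 1,000,000 bytes, and the per-second rate is a plain average that ignores peak hours.

```python
# Back-of-the-envelope read volume, using the figures from the text.
DAILY_ACTIVE_USERS = 200_000_000
TWEETS_READ_PER_USER = 10
AVG_TWEET_SIZE_BYTES = 1_000_000  # assumption: 1 MB in decimal units

reads_per_day = DAILY_ACTIVE_USERS * TWEETS_READ_PER_USER
bytes_per_day = reads_per_day * AVG_TWEET_SIZE_BYTES
reads_per_second = reads_per_day / 86_400  # average only, peaks are higher

print(f"{reads_per_day:,} reads/day")
print(f"{bytes_per_day / 1e15:.1f} PB/day")
print(f"{reads_per_second:,.0f} reads/second on average")
```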

Design
With 600 million users, the system should be built as microservices that scale horizontally. A load balancer sits in front of them to distribute the incoming requests, and we can use consistent hashing to decide which server each request goes to.
An authentication service has to be there to check each user's request, for example when creating a new tweet or updating one, so that authentication and authorisation are handled in that one service.
For tweets we have a tweet service that handles reading, creating and updating tweets. Creating a tweet takes a tweetId, userId, text, media (image, video) and metadata. That data is stored in a relational database, while the media itself, like images and videos, is saved in object storage such as Amazon S3 or Google Cloud Storage.
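To make the consistent hashing idea concrete, here is a minimal sketch of a hash ring. The class name, the server names and the use of 100 virtual nodes per server are assumptions for illustration, not part of any particular load balancer.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps request keys to servers on a hash ring (illustrative sketch)."""

    def __init__(self, servers, virtual_nodes=100):
        self._virtual_nodes = virtual_nodes
        self._ring = []  # sorted list of (hash, server) pairs
        for server in servers:
            self.add_server(server)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_server(self, server):
        # Each server gets several points on the ring to even out the load.
        for i in range(self._virtual_nodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove_server(self, server):
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def get_server(self, key):
        if not self._ring:
            raise LookupError("no servers in the ring")
        # First point clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if index == len(self._ring):
            index = 0
        return self._ring[index][1]


ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
before = {f"user-{i}": ring.get_server(f"user-{i}") for i in range(1000)}
ring.remove_server("server-c")
after = {key: ring.get_server(key) for key in before}
moved = [key for key in before if before[key] != after[key]]
print(f"{len(moved)} of {len(before)} keys moved after removing server-c")
```

The point of the design is visible in the last lines: when a server leaves, only the keys that were on that server are reassigned, while everything else stays where it was.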
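The tweet service described above could be sketched roughly as follows. The field and method names are hypothetical, and two plain dictionaries stand in for the relational database and the object storage bucket; a real service would call a database driver and a storage client instead.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Tweet:
    tweet_id: str
    user_id: str
    text: str
    media_keys: List[str] = field(default_factory=list)  # object-storage keys, not the bytes
    metadata: Dict[str, str] = field(default_factory=dict)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class TweetService:
    def __init__(self):
        self._rows = {}     # stand-in for the relational tweets table
        self._objects = {}  # stand-in for an object storage bucket

    def create_tweet(self, user_id, text, media=None, metadata=None):
        tweet_id = uuid.uuid4().hex
        media_keys = []
        for name, blob in (media or {}).items():
            key = f"media/{tweet_id}/{name}"
            self._objects[key] = blob  # the media bytes go to object storage
            media_keys.append(key)     # only the key is kept in the row
        tweet = Tweet(tweet_id, user_id, text, media_keys, dict(metadata or {}))
        self._rows[tweet_id] = tweet
        return tweet

    def get_tweet(self, tweet_id):
        return self._rows[tweet_id]

    def update_text(self, tweet_id, text):
        self._rows[tweet_id].text = text
        return self._rows[tweet_id]


service = TweetService()
tweet = service.create_tweet("user-1", "hello", media={"cat.png": b"\x89PNG"})
print(tweet.media_keys)
```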

To access this data we are going to have a caching server, so that the data can be read faster without reaching the database. A read request first hits the cache and checks for the data. If it's there, we get the data faster; if it isn't, we hit the database for it. If that data is accessed frequently we place it in the cache, and when the cache is full an eviction policy such as LRU or MRU decides which entry to drop.
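The read path above (check the cache, fall back to the database, then fill the cache) can be sketched with a small LRU cache. The names `LRUCache` and `read_tweet` are made up for this example, and a dictionary stands in for the database.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used


def read_tweet(tweet_id, cache, database):
    cached = cache.get(tweet_id)
    if cached is not None:
        return cached             # cache hit: no database access
    value = database[tweet_id]    # cache miss: go to the database
    cache.put(tweet_id, value)    # keep it for the next reader
    return value


database = {"t1": "first tweet", "t2": "second tweet", "t3": "third tweet"}
cache = LRUCache(capacity=2)
read_tweet("t1", cache, database)
read_tweet("t2", cache, database)
read_tweet("t1", cache, database)  # t1 becomes the most recently used
read_tweet("t3", cache, database)  # cache is full, so t2 is evicted
```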

To get the data faster we use a CDN for the images and videos, and sometimes it may serve the actual tweet text as well. By using the CDN we can get the tweets quickly. To build the tweet feed we only need the userId: with it we fetch the tweets made by the people the user follows, plus recommended tweets. When fetching, we index on the people the user follows and pull a range of tweets to fill the feed section.
So the system is read-heavy rather than write-heavy: on average a person reads 10 tweets per day, which for 200 million daily active users means 2 billion reads per day. That's a lot, and that's why it's read-heavy.
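The feed lookup by userId can be illustrated with an in-memory SQLite database. The table names, column names and sample rows are assumptions made for this sketch; recommended tweets are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE follows (follower_id TEXT, followee_id TEXT);
    CREATE TABLE tweets (
        tweet_id INTEGER PRIMARY KEY,
        user_id TEXT,
        text TEXT,
        created_at INTEGER
    );
    -- Indexes so the lookup by follower and by author stays cheap.
    CREATE INDEX idx_follows_follower ON follows (follower_id);
    CREATE INDEX idx_tweets_user_time ON tweets (user_id, created_at);
    """
)
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [("alice", "bob"), ("alice", "carol")],
)
conn.executemany(
    "INSERT INTO tweets (user_id, text, created_at) VALUES (?, ?, ?)",
    [("bob", "hello", 100), ("carol", "hi", 200), ("dave", "unseen", 300), ("bob", "again", 400)],
)


def fetch_feed(conn, user_id, limit=20):
    """Newest tweets from the people user_id follows."""
    rows = conn.execute(
        """
        SELECT t.user_id, t.text, t.created_at
        FROM tweets AS t
        JOIN follows AS f ON f.followee_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.created_at DESC
        LIMIT ?
        """,
        (user_id, limit),
    )
    return rows.fetchall()


feed = fetch_feed(conn, "alice")
print(feed)
```

Running a join like this on every feed request is exactly the kind of repeated read that makes the workload read-heavy.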

For this we can maintain a master-slave architecture, where the master takes care of the write operations and the slaves are responsible for the reads. We can have multiple slaves for speedy access by users. The slaves pull data from the master every 5 to 10 seconds to update themselves and stay consistent with each other. This is eventual consistency: a read served by a slave may be around 5 seconds behind the master.
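A small simulation shows what this eventual consistency looks like to a reader. The master is called `Primary` and the slaves `Replica` here; all class names are hypothetical, and the periodic pull is triggered by hand through `sync_replicas()` instead of a 5 to 10 second timer so the behaviour is easy to follow.

```python
import itertools


class Primary:
    """Takes all writes."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


class Replica:
    """Serves reads from a copy that is refreshed by pulling from the primary."""

    def __init__(self, primary):
        self._primary = primary
        self.data = {}

    def pull(self):
        self.data = dict(self._primary.data)

    def read(self, key):
        return self.data.get(key)


class ReplicatedStore:
    def __init__(self, replica_count=2):
        self.primary = Primary()
        self.replicas = [Replica(self.primary) for _ in range(replica_count)]
        self._next_replica = itertools.cycle(self.replicas)  # round-robin reads

    def write(self, key, value):
        self.primary.write(key, value)

    def read(self, key):
        return next(self._next_replica).read(key)

    def sync_replicas(self):
        # In the design above this would run every 5 to 10 seconds.
        for replica in self.replicas:
            replica.pull()


store = ReplicatedStore()
store.write("tweet-1", "hello")
stale = store.read("tweet-1")  # replicas have not pulled yet
store.sync_replicas()
fresh = store.read("tweet-1")  # now the write is visible
print(stale, fresh)
```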

To load feeds instantly, we cache the tweets of each user's followees. This can be limited to users who were active in the last week or 10 days; their feed is stored in the feed cache for quick access. When the user requests the feed, we do not query the database, process the result and then pass it to the user. Instead, whenever a tweet is created or updated we push it to a messaging queue, and the feed service then loads the latest 20 feed items into the feed cache, so that users get their feeds instantly.
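The push-based flow above can be sketched in a few lines. The class and method names are hypothetical, and an in-process `deque` stands in for both the messaging queue and the feed cache; a real system would use a message broker and a cache server.

```python
from collections import defaultdict, deque

FEED_SIZE = 20  # number of feed items kept per user, as in the text


class FeedService:
    def __init__(self, followers, active_users):
        self._followers = followers        # author id -> set of follower ids
        self._active_users = active_users  # users active in the last week or so
        self._queue = deque()              # stand-in for the messaging queue
        self._feed_cache = defaultdict(lambda: deque(maxlen=FEED_SIZE))

    def publish(self, author_id, tweet_id):
        """Called by the tweet service when a tweet is created or updated."""
        self._queue.append((author_id, tweet_id))

    def process_queue(self):
        """Push each queued tweet into the cached feed of every active follower."""
        while self._queue:
            author_id, tweet_id = self._queue.popleft()
            for follower_id in self._followers.get(author_id, ()):
                if follower_id in self._active_users:
                    # Newest first; the oldest item falls off once 20 are cached.
                    self._feed_cache[follower_id].appendleft(tweet_id)

    def get_feed(self, user_id):
        """Served straight from the cache, with no database query."""
        return list(self._feed_cache.get(user_id, ()))


followers = {"bob": {"alice", "erin"}}
service = FeedService(followers, active_users={"alice"})
for tweet_id in range(25):
    service.publish("bob", tweet_id)
service.process_queue()
alice_feed = service.get_feed("alice")
print(alice_feed)
```

Inactive users such as `erin` get no cached feed in this sketch; their feed would have to be built from the database when they come back.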

System Design ( Day - 30 )

Manoj Kumar
