System Design (Day 21)

Manoj Kumar
5 min read

System Design of TikTok

Functionality

1. Upload videos
2. Feed/timeline video distribution
3. Storing the videos

The system should have:
Consistency
High availability
Fault tolerance
Good upload performance
Low latency between upload and visibility to users (a few minutes)
Low latency during video distribution

Active users
10 million daily active users (DAU)
100K video creators

Things to keep in mind
The write-to-read ratio is 1:100: for every 100 viewers we have 1 creator. This is important for designing the DB.
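A quick back-of-envelope calculation makes these numbers concrete. This is a minimal Python sketch that assumes uploads are spread evenly over the day (real traffic will have peaks well above the average) and reuses the 1 and 5 videos-per-creator-per-day scenarios discussed in the Design section.

```python
# Back-of-envelope capacity estimate using the numbers from this post.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_active_users = 10_000_000  # 10M DAU
creators = 100_000               # 100K video creators


def uploads_per_second(videos_per_creator_per_day: int) -> float:
    """Average upload rate if every creator posts this many videos per day (assumes an even spread)."""
    return creators * videos_per_creator_per_day / SECONDS_PER_DAY


creator_to_viewer_ratio = daily_active_users // creators
print(f"viewers per creator: {creator_to_viewer_ratio}")           # 100
print(f"uploads/s at 1 video/day:  {uploads_per_second(1):.2f}")  # 1.16
print(f"uploads/s at 5 videos/day: {uploads_per_second(5):.2f}")  # 5.79
```

Even at five videos per creator the average write rate is only a few uploads per second, but each upload is a large file that needs heavy processing, which is why the write path deserves its own design.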

Design
To upload a video, the creator sends us the user data, the video metadata, and the video itself, and we need to store all three. I'll start with video storage. Here reads need to scale much more than writes, because a single video may be watched by hundreds of thousands of people, so we can use Amazon S3.
For the video metadata, we can use a NoSQL store such as MongoDB and look items up by key. For the user data, we can use MySQL: we know what user data looks like, and it doesn't need much scaling. The write path is the harder part. If each of the 100K creators uploads one video per day, that is 100K uploads per day; if each uploads five, it is 500K. To scale this, the upload API should add each upload to a queue and then immediately acknowledge the user with something like "we got your video and will publish it soon", i.e. an HTTP 202 Accepted response.
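The queue-and-acknowledge step can be sketched in Python as follows. This is a minimal in-process illustration: `queue.Queue` stands in for a real message broker (a managed queue such as Amazon SQS or Kafka would be typical), and `UploadJob`, `handle_upload`, and `worker` are hypothetical names, not part of any real API. The handler returns 202 as soon as the job is enqueued; the worker does the slow processing later.

```python
import queue
import threading
import uuid
from dataclasses import dataclass, field


@dataclass
class UploadJob:
    """Hypothetical job record placed on the upload queue."""
    job_id: str
    user_id: str
    video_bytes: bytes
    metadata: dict = field(default_factory=dict)


# In-process stand-in for a real message broker.
upload_queue: "queue.Queue[UploadJob]" = queue.Queue()
processed: list[str] = []  # job IDs the worker has finished


def handle_upload(user_id: str, video_bytes: bytes, metadata: dict) -> tuple[int, dict]:
    """Enqueue the upload and acknowledge immediately with HTTP 202 Accepted."""
    job = UploadJob(str(uuid.uuid4()), user_id, video_bytes, metadata)
    upload_queue.put(job)
    return 202, {"job_id": job.job_id, "message": "We got your video; it will be published soon."}


def worker() -> None:
    """Consume jobs in the background; validation, transcoding and the S3 upload would go here."""
    while True:
        job = upload_queue.get()
        try:
            processed.append(job.job_id)  # placeholder for the real processing
        finally:
            upload_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

status, body = handle_upload("creator-42", b"\x00" * 1024, {"title": "My first clip"})
print(status)                       # 202
upload_queue.join()                 # block until the worker has drained the queue (demo only)
print(body["job_id"] in processed)  # True
```

The client can use the returned `job_id` to poll for status, so the API server never blocks on transcoding.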
Why S3?
Because it's provided by Amazon, it's reliable, and since we're talking about scale, we can pair S3 with a CDN for high availability and low latency. With a CDN, videos are fetched from the nearest edge location instead of from the origin server, which could be in the US, the UK, or somewhere else. Combining S3 with a CDN lets us scale and serve reads fast.
Why NoSQL for the video metadata?
Because we want flexibility in the schema. Sometimes we want to change a field or add new data, and a NoSQL store makes it easy to add, remove, or modify fields. Simple lookups by key are also fast compared with a SQL database. And if we build a recommendation system in the future, we'll need to store the videos each user watches and build recommendations from that; this data also benefits from a flexible schema, which is another reason for choosing NoSQL.
We can also have a separate service just for uploading videos. Each video needs to be stored in different formats and resolutions because users have different devices and different bandwidths, which makes the storage requirements huge. Storing everything in S3 would be costly, so for cost cutting we could eventually build our own data center.

Upload flow
When a user uploads a video, we have already pushed it into the queue for processing. Processing is broken down into smaller units of work: we categorize the video, validate that its length is within the guidelines, and run any other validations. We can also transcode in parallel: for example, split the video into 4 chunks and convert every chunk into the required formats and resolutions at the same time. Doing this work concurrently improves upload performance. We then upload the resulting formats and resolutions to S3. When storing them, we keep replicas in different regions for high availability and fault tolerance, for example storing each resolution in its own S3 bucket. Based on where users are located, videos are stored in the nearest S3 regions, and the CDN serves them to users quickly from there.
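Here is a minimal Python sketch of the validate-then-transcode-in-parallel step. The 60-second limit, the format and resolution lists, and `transcode_chunk` are hypothetical placeholders; a real pipeline would call an encoder such as ffmpeg and upload each output to S3. The point is the shape: split into 4 chunks, fan out every (format, resolution, chunk) task to a pool, then collect results in chunk order.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

MAX_DURATION_SECONDS = 60                # hypothetical guideline limit
FORMATS = ["mp4", "webm"]                # hypothetical output formats
RESOLUTIONS = ["360p", "720p", "1080p"]  # hypothetical resolution ladder
NUM_CHUNKS = 4


def validate(duration_seconds: float) -> None:
    """Reject videos longer than the guideline allows."""
    if duration_seconds > MAX_DURATION_SECONDS:
        raise ValueError(f"video is {duration_seconds}s; limit is {MAX_DURATION_SECONDS}s")


def split(data: bytes, n: int) -> list[bytes]:
    """Split the raw video into at most n roughly equal chunks."""
    size = max(1, -(-len(data) // n))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]


def transcode_chunk(chunk: bytes, fmt: str, resolution: str) -> bytes:
    """Hypothetical encoder stand-in: tags the chunk instead of re-encoding it."""
    return f"[{fmt}/{resolution}]".encode() + chunk


def process_video(data: bytes, duration_seconds: float) -> dict[tuple[str, str], list[bytes]]:
    validate(duration_seconds)
    chunks = split(data, NUM_CHUNKS)
    with ThreadPoolExecutor() as pool:
        # Submit every (format, resolution, chunk) task up front so the pool can run them concurrently.
        futures = {
            (fmt, res): [pool.submit(transcode_chunk, c, fmt, res) for c in chunks]
            for fmt, res in product(FORMATS, RESOLUTIONS)
        }
        # Collect results per rendition, keeping chunk order so they can be stitched back together.
        return {key: [f.result() for f in fs] for key, fs in futures.items()}


renditions = process_video(b"0123456789abcdef", duration_seconds=30)
print(len(renditions))                 # 6 (2 formats x 3 resolutions)
print(renditions[("mp4", "720p")][0])  # b'[mp4/720p]0123'
```

Threads are enough here because the placeholder work is trivial; for CPU-heavy encoding written in Python itself, `ProcessPoolExecutor` or a fleet of separate worker machines would be the natural swap.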

Accessing the Video Service
We have one service that serves videos from S3 to the device. When making the request, the client sends the video type (format) and the user's current bandwidth, and together with the recommendation system this lets us deliver videos to users as quickly as possible.
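One way the service could use the reported bandwidth is to pick the highest resolution that fits. The bitrate ladder, the 80% headroom factor, and the `cdn.example.com` URL layout below are assumptions for illustration only, not values from the design.

```python
# Hypothetical bitrate ladder, highest first: (resolution, required bandwidth in kbit/s).
LADDER = [
    ("1080p", 5000),
    ("720p", 2500),
    ("480p", 1000),
    ("360p", 600),
]
CDN_BASE = "https://cdn.example.com"  # hypothetical CDN domain


def pick_resolution(bandwidth_kbps: float, headroom: float = 0.8) -> str:
    """Return the highest resolution whose bitrate fits in a safety fraction of the bandwidth."""
    usable = bandwidth_kbps * headroom
    for resolution, required_kbps in LADDER:
        if required_kbps <= usable:
            return resolution
    return LADDER[-1][0]  # nothing fits: fall back to the lowest rendition


def video_url(video_id: str, fmt: str, bandwidth_kbps: float) -> str:
    """Build the CDN URL for the rendition matching the client's format and bandwidth."""
    return f"{CDN_BASE}/videos/{video_id}/{pick_resolution(bandwidth_kbps)}.{fmt}"


print(pick_resolution(8000))             # 1080p (about 6400 kbit/s usable)
print(pick_resolution(3000))             # 480p (about 2400 usable, just under 720p's 2500)
print(pick_resolution(300))              # 360p (fallback)
print(video_url("abc123", "mp4", 3000))  # https://cdn.example.com/videos/abc123/480p.mp4
```

Production players often use adaptive streaming protocols such as HLS or DASH, which repeat this choice per segment as bandwidth changes; this sketch makes a single choice per request.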

Request Flow
When a user wants to see another user's profile, we query the user data and video metadata databases to get that user's details and their list of videos, with thumbnails, URLs, and similar fields. When the user clicks a thumbnail, the request goes to the CDN, which delivers the video directly to the user.
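A minimal sketch of the profile request, with in-memory dictionaries standing in for the MySQL user table and the NoSQL metadata store. The CDN domain, URL paths, and sample records are hypothetical. The profile response only contains URLs; the video bytes themselves are fetched from the CDN when the user taps a thumbnail.

```python
CDN_BASE = "https://cdn.example.com"  # hypothetical CDN domain

# In-memory stand-ins for the MySQL user table and the NoSQL video-metadata store (hypothetical data).
USERS = {"u1": {"user_id": "u1", "name": "Demo Creator", "bio": "Short clips"}}
VIDEO_METADATA = [
    {"video_id": "v1", "owner_id": "u1", "title": "Clip A", "duration_s": 42},
    {"video_id": "v2", "owner_id": "u1", "title": "Clip B", "duration_s": 55},
    {"video_id": "v3", "owner_id": "u2", "title": "Another creator's clip", "duration_s": 20},
]


def get_profile(user_id: str) -> dict:
    """Assemble a profile page: user details plus thumbnail and CDN URLs for each video."""
    user = USERS[user_id]  # user-data lookup (MySQL in the design)
    videos = [
        {
            "video_id": m["video_id"],
            "title": m["title"],
            "thumbnail_url": f"{CDN_BASE}/thumbnails/{m['video_id']}.jpg",
            # Tapping the thumbnail fetches this URL straight from the CDN, not from our servers.
            "video_url": f"{CDN_BASE}/videos/{m['video_id']}/720p.mp4",
        }
        for m in VIDEO_METADATA
        if m["owner_id"] == user_id  # metadata query by owner (NoSQL in the design)
    ]
    return {"user": user, "videos": videos}


profile = get_profile("u1")
print(profile["user"]["name"])                # Demo Creator
print(len(profile["videos"]))                 # 2
print(profile["videos"][0]["thumbnail_url"])  # https://cdn.example.com/thumbnails/v1.jpg
```

The rendition is fixed at 720p here for brevity; in practice it would depend on the device and bandwidth, as described under Accessing the Video Service.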

Cache
The metadata for trending and frequently watched videos can be stored in a cache, so we don't need to query the database again and again: we read it directly from the cache, and the video itself is then fetched from the CDN.
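What this section describes is essentially the cache-aside pattern: check the cache first, and on a miss read from the database and populate the cache. Below is a minimal Python sketch that adds a small LRU eviction policy (an assumption; the post doesn't name one). `load_from_db` is a hypothetical stand-in that counts database reads; in production a shared cache such as Redis or Memcached with an expiry (TTL) would be more typical.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Tiny LRU cache: once full, evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, dict]" = OrderedDict()

    def get(self, key: str) -> Optional[dict]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


db_reads = 0  # counts how often we actually hit the database


def load_from_db(video_id: str) -> dict:
    """Hypothetical metadata-store read."""
    global db_reads
    db_reads += 1
    return {"video_id": video_id, "video_url": f"https://cdn.example.com/videos/{video_id}/720p.mp4"}


cache = LRUCache(capacity=2)


def get_metadata(video_id: str) -> dict:
    """Cache-aside read: serve from the cache, fall back to the database on a miss."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached
    meta = load_from_db(video_id)
    cache.put(video_id, meta)
    return meta


get_metadata("v1")  # miss -> DB read
get_metadata("v1")  # hit
get_metadata("v2")  # miss
get_metadata("v3")  # miss; capacity is 2, so "v1" is evicted
get_metadata("v1")  # miss again
print(db_reads)     # 4
```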

This covers the functional and non-functional requirements listed above.

So this is the high-level architecture of TikTok. If you have suggestions or improvements, let's connect and discuss: Manoj Kumar, www.linkedin.com/in/manoj82202

Thank You
