Distributed Object Storage System (MiniS3)

Introduction
Yesterday, I was directed to a job posting on Amazon’s career website. The position was for a Software Development Engineer matching my skills and experience. I was recently given some advice to build systems and work on projects tailored to specific roles. As my dream is to work at Amazon, I was really excited to begin building a system tailored to this one. I actually started a project yesterday as soon as I saw the posting, but I’ll write about that one tomorrow.
Understanding The Problem
Amazon’s SDE roles require engineers who can independently design scalable, distributed systems. I set out to build one that mimicked real-world complexity, inspired by Amazon S3.
System Highlights (Quantified Results)
Handles over 100 requests per second
Achieves a 99% success rate across upload, retrieval, and delete operations
Features automated object lifecycle management based on access time
Designed to mirror core principles behind Amazon S3 and Glacier
System Architecture
So, after identifying what Amazon was looking for, I came up with the idea of a Distributed Object Storage System. The concept reminded me of Amazon S3, so it felt like an appropriate project for this position. The next step was choosing a tech stack. I already knew I wanted to use MySQL as the database to store metadata (purely because it’s the relational database I’m most familiar with). I ultimately chose Spring Boot for the backend and endpoint logic because it gave me easy integration with MySQL.
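That integration mostly comes down to a few lines of wiring. Here’s a rough sketch of what it could look like in Java config; the connection details are placeholders, and in practice this often lives in application.properties instead:

```java
import javax.sql.DataSource;
import org.springframework.boot.jdbc.DataSourceBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Sketch of the MySQL wiring for the metadata store.
// The connection details below are placeholders, not the real ones.
@Configuration
public class DataSourceConfig {

    @Bean
    public DataSource dataSource() {
        return DataSourceBuilder.create()
                .url("jdbc:mysql://localhost:3306/minis3")
                .username("minis3_user")
                .password("change-me")
                .driverClassName("com.mysql.cj.jdbc.Driver")
                .build();
    }
}
```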
Lifecycle and Node Design
Now that the tech stack was decided, I needed to figure out how the system was going to be designed. I knew I wanted to mimic S3 Lifecycle Policies, so I came up with the idea of creating three classes as the managers for storage, sketched just after this list:
Storage Node - For immediate retrieval and persistent storage of data.
Backup Node - Higher capacity, but data is stored in the MySQL database and retrieval is somewhat slower (similar to S3 Glacier).
Delete Node - Where deleted objects are sent. The user specifies which object they wish to delete; that object is removed from the node it’s currently stored in and sent here, where it’s retained for one additional day before being purged from storage completely.
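Here’s a rough sketch of those three managers. The class names follow the design above, but the fields and the retention constant are my shorthand rather than the exact implementation:

```java
import java.time.Duration;

// Sketches of the three node managers described above.
// Field names are illustrative shorthand, not the exact implementation.

/** Hot tier: objects live here for immediate, persistent retrieval. */
class StorageNode {
    long capacityBytes;
}

/**
 * Cold tier: higher capacity, with object data persisted in the MySQL
 * database, so retrieval is somewhat slower (the S3 Glacier analogue).
 */
class BackupNode {
    long capacityBytes; // larger than the storage node's
}

/** Tombstone tier: deleted objects wait here one day before being purged. */
class DeleteNode {
    static final Duration RETENTION = Duration.ofDays(1);
}
```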
Challenges and Solutions
The first issue I encountered was how object data was going to be stored. I realized that the StorageNode, BackupNode, and DeleteNode classes couldn’t hold both node-level information and all the information about the objects in that node. This is when I came up with the idea of a StorageNodeObject, a BackupNodeObject, and a DeleteNodeObject. That way, objects were associated with the correct node while the list of objects stayed separate from the node information.
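In JPA terms, that separation might look like the sketch below (using Jakarta Persistence imports; older Spring Boot versions use javax.persistence). The class names come from the design above, but the fields are my assumptions, and BackupNodeObject and DeleteNodeObject would mirror the same pattern:

```java
import jakarta.persistence.*;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Node-level information lives on StorageNode, while per-object records
// live on StorageNodeObject. Fields beyond the class names are assumptions.
@Entity
public class StorageNode {

    @Id
    @GeneratedValue
    private Long id;

    private long capacityBytes;

    // The object list is split out into its own entity class...
    @OneToMany(mappedBy = "node", cascade = CascadeType.ALL)
    private List<StorageNodeObject> objects = new ArrayList<>();
}

@Entity
class StorageNodeObject {

    @Id
    @GeneratedValue
    private Long id;

    private String objectKey;
    private long sizeBytes;
    private Instant lastAccessedAt; // drives the lifecycle moves

    // ...while each object still points back at the node holding it.
    @ManyToOne
    private StorageNode node;
}
```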
API Endpoints
I then implemented the /upload, /retrieve, and /delete endpoints. The flow of an object after an upload went as follows (assuming it isn’t accessed along the way): the object is added to the Storage Node → 30 minutes go by → a method is triggered to move the object from the Storage Node to the Backup Node. The data is kept in the Backup Node until it is accessed or deleted. If the data is accessed, it is moved back to the Storage Node.
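Here’s a rough sketch of the controller and the 30-minute move. The endpoint paths are the real ones; everything else, including the ObjectService and its method names, is an assumption (the scheduled method also needs @EnableScheduling on the application class):

```java
import org.springframework.http.ResponseEntity;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

// Hypothetical service layer; the real logic lives behind these methods.
interface ObjectService {
    void storeInStorageNode(MultipartFile file);
    byte[] fetchAndPromote(String key);
    void moveToDeleteNode(String key);
    void moveIdleObjectsToBackup();
}

// Sketch of the three endpoints plus the lifecycle job described above.
@RestController
public class ObjectController {

    private final ObjectService service;

    ObjectController(ObjectService service) {
        this.service = service;
    }

    @PostMapping("/upload")
    public ResponseEntity<String> upload(@RequestParam("file") MultipartFile file) {
        service.storeInStorageNode(file); // new objects land in the Storage Node
        return ResponseEntity.ok("uploaded");
    }

    @GetMapping("/retrieve")
    public ResponseEntity<byte[]> retrieve(@RequestParam String key) {
        // An access promotes the object back to the Storage Node if needed.
        return ResponseEntity.ok(service.fetchAndPromote(key));
    }

    @DeleteMapping("/delete")
    public ResponseEntity<String> delete(@RequestParam String key) {
        service.moveToDeleteNode(key); // held one day, then purged
        return ResponseEntity.ok("deleted");
    }

    // Every 30 minutes, demote objects that haven't been accessed from the
    // Storage Node to the Backup Node (requires @EnableScheduling).
    @Scheduled(fixedRate = 30 * 60 * 1000)
    void demoteIdleObjects() {
        service.moveIdleObjectsToBackup();
    }
}
```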
Load Testing and Performance
Once these endpoints were implemented and the backend was integrated with the database, I was ready to test, debug, and load test. After debugging, I wrote a script that sends upload, retrieve, and delete requests to the backend to verify the correct behavior. Seeing the system handle different variations of data and requests while sustaining over 100 requests per second at a 99% success rate, I was pleased with how scalable and efficient it turned out.
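The original script isn’t included in this post, but here’s a minimal Java load generator in the same spirit; the URL, key naming, and request counts are all assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal load generator: fires concurrent /retrieve requests and reports
// throughput and success rate. URL, keys, and counts are assumptions.
public class LoadTest {

    public static void main(String[] args) throws InterruptedException {
        int total = 1000;
        HttpClient client = HttpClient.newHttpClient();
        ExecutorService pool = Executors.newFixedThreadPool(20);
        AtomicInteger successes = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(total);

        long start = System.nanoTime();
        for (int i = 0; i < total; i++) {
            final int n = i;
            pool.submit(() -> {
                try {
                    HttpRequest request = HttpRequest.newBuilder(
                            URI.create("http://localhost:8080/retrieve?key=obj" + n))
                            .GET()
                            .build();
                    HttpResponse<String> response =
                            client.send(request, HttpResponse.BodyHandlers.ofString());
                    if (response.statusCode() == 200) {
                        successes.incrementAndGet();
                    }
                } catch (Exception ignored) {
                    // a failed request simply doesn't count as a success
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();

        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%.1f requests/s, %.1f%% success%n",
                total / seconds, 100.0 * successes.get() / total);
    }
}
```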
Final Thoughts
This is one of the largest-scale systems I’ve built alone. Hopefully, next time I can build something bigger as part of a team.