Scale file storage system

MUKUL JHA

Where do we store files?

The answer isn't simple, so let's take a deep dive into the question.

There are many ways to store a file.

1. Store files on the server

Local file system

A server is a machine like your PC or laptop, so we can store files on its disk, in the local file system, just as we do on a laptop. This works well when you have a single server and few requests to serve.

When requests grow beyond what a single server can handle, we can add a second server with the same configuration and put a load balancer in front of both to route requests.

Application with two servers

Let’s assume you are uploading two images: image_1.jpg and image_2.jpg.

Request 1 -> UploadImage[image_1.jpg]: the load balancer routes the request to server 1, and server 1 stores the file on its local disk.

Request 2 -> UploadImage[image_2.jpg]: the load balancer routes the request to server 2, and server 2 stores the file on its local disk.

Both files are stored successfully. Now let's try to download a file.
Request 3 -> downloadImage[image_2.jpg]: we know image_2.jpg is stored on server 2.
Depending on the load-balancing algorithm, request 3 may or may not be routed to server 2.
If the request is served by server 2, the client can download the image.
If the request is served by server 1, the download fails because server 1 has no image_2.jpg.
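
The scenario above can be reproduced with a small in-memory simulation. This is only a sketch: `AppServer`, `RoundRobinBalancer`, and `LocalStorageDemo` are hypothetical names, a `HashMap` stands in for each server's local disk, and round-robin is assumed as the load-balancing algorithm.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class LocalStorageDemo {
    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of(new AppServer("server-1"), new AppServer("server-2")));
        lb.route().upload("image_1.jpg", new byte[] {1}); // request 1 goes to server-1
        lb.route().upload("image_2.jpg", new byte[] {2}); // request 2 goes to server-2
        AppServer third = lb.route();                      // request 3 goes to server-1 again
        // Prints "server-1 has image_2.jpg: false"
        System.out.println(third.name() + " has image_2.jpg: "
                + third.download("image_2.jpg").isPresent());
    }
}

// Hypothetical app server that keeps uploads on its own local disk.
class AppServer {
    private final String name;
    private final Map<String, byte[]> localDisk = new HashMap<>(); // stand-in for the local file system

    AppServer(String name) { this.name = name; }

    String name() { return name; }

    void upload(String fileName, byte[] content) { localDisk.put(fileName, content); }

    // Empty when this particular server never received the file.
    Optional<byte[]> download(String fileName) {
        return Optional.ofNullable(localDisk.get(fileName));
    }
}

// Hypothetical round-robin load balancer: each request goes to the next server in turn.
class RoundRobinBalancer {
    private final List<AppServer> servers;
    private int next = 0;

    RoundRobinBalancer(List<AppServer> servers) { this.servers = servers; }

    AppServer route() {
        AppServer chosen = servers.get(next);
        next = (next + 1) % servers.size();
        return chosen;
    }
}
```

The third request lands on server 1, which never saw image_2.jpg, so the download comes back empty even though the upload succeeded.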

  • There are other major drawbacks to storing files on the server.
    What if a server crashes or goes down?
    If it crashes, you may lose the data. If it goes down for a while, you lose access to the files stored on it until it comes back, so clients see inconsistent results.

We need a centralized file store to overcome these drawbacks.

2. Store files in the database

You can also store files in a database. A database cannot hold a file as such, so you store the file's content, typically in a binary large object (BLOB) column alongside metadata such as the file name.

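import jakarta.persistence.Entity; // javax.persistence.* on Spring Boot 2.x and earlier
import jakarta.persistence.Id;
import jakarta.persistence.Lob;

@Entity
public class FileInfo {
    @Id
    private String id;      // primary key
    private String name;    // original file name
    @Lob
    private byte[] content; // file content, mapped to a BLOB column

    // getters and setters
}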

Upload / Download Files using Spring Boot and MySQL.
https://www.javaguides.net/2019/09/spring-boot-file-upload-download-with-hibernate-mysql-database.html
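
To show what "storing the file content" means in practice, here is a minimal sketch that reads a file from disk into the byte array a row would hold. `FileRecord` is a hypothetical plain-Java stand-in for the `FileInfo` entity, written without JPA annotations so it runs without Spring, and the random UUID used as the id is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical plain-Java counterpart of the FileInfo entity.
class FileRecord {
    final String id;
    final String name;
    final byte[] content;

    FileRecord(String id, String name, byte[] content) {
        this.id = id;
        this.name = name;
        this.content = content;
    }

    // Reads the whole file into memory, which is only reasonable for small files.
    static FileRecord fromPath(Path path) {
        try {
            byte[] content = Files.readAllBytes(path);
            String id = UUID.randomUUID().toString(); // assumed id scheme
            return new FileRecord(id, path.getFileName().toString(), content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `FileRecord.fromPath(...)` on an uploaded file gives the id, name, and bytes that would be persisted as one row. Because the whole file sits in memory as a `byte[]`, this approach gets costly as files grow.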

There are drawbacks to using a database for file storage.

  • Reading and writing large files through a database is usually slower than through a file system.

  • Files make the database grow quickly, and storing a high volume of data in a database is expensive.

3. Distributed File System

In a distributed file system, data is stored as files and directories in a file system that spans multiple servers, and the file system metadata is managed by a centralized component, such as a file server or a cluster of servers. Clients access the file system over a network and can read and modify files as if they were stored locally.

Examples of distributed file systems include Network File System (NFS), Server Message Block (SMB), and the Hadoop Distributed File System (HDFS). These systems are commonly used in enterprise environments, cloud computing, and big data processing.

https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
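
The split between centrally managed metadata and data spread over several servers can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyDfs` and `BlockLocation` are hypothetical names, the 4-byte block size is deliberately tiny, blocks are placed round-robin, and there is no replication. It is not how HDFS or any real system is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyDfs {
    static final int BLOCK_SIZE = 4; // tiny on purpose, so short strings span several blocks

    // Records which data node holds one block of a file.
    static final class BlockLocation {
        final int nodeIndex;
        final String blockId;

        BlockLocation(int nodeIndex, String blockId) {
            this.nodeIndex = nodeIndex;
            this.blockId = blockId;
        }
    }

    // Data nodes: each one maps blockId -> bytes.
    private final List<Map<String, byte[]>> dataNodes = new ArrayList<>();
    // Central metadata: file path -> ordered list of block locations.
    private final Map<String, List<BlockLocation>> metadata = new HashMap<>();

    ToyDfs(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) dataNodes.add(new HashMap<>());
    }

    // Splits the content into blocks and spreads them over the nodes round-robin.
    void write(String path, byte[] content) {
        List<BlockLocation> locations = new ArrayList<>();
        for (int offset = 0, block = 0; offset < content.length; offset += BLOCK_SIZE, block++) {
            int end = Math.min(offset + BLOCK_SIZE, content.length);
            int node = block % dataNodes.size();
            String blockId = path + "#" + block;
            dataNodes.get(node).put(blockId, Arrays.copyOfRange(content, offset, end));
            locations.add(new BlockLocation(node, blockId));
        }
        metadata.put(path, locations);
    }

    // Looks up the block locations first, then fetches each block from its node.
    byte[] read(String path) {
        List<BlockLocation> locations = metadata.get(path);
        if (locations == null) throw new IllegalArgumentException("No such file: " + path);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (BlockLocation loc : locations) {
            byte[] block = dataNodes.get(loc.nodeIndex).get(loc.blockId);
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    int blockCount(String path) { return metadata.get(path).size(); }

    public static void main(String[] args) {
        ToyDfs dfs = new ToyDfs(3);
        dfs.write("/docs/a.txt", "hello distributed".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(dfs.read("/docs/a.txt"), StandardCharsets.UTF_8));
    }
}
```

The client only ever names a path. The metadata lookup tells it which nodes to contact, which is why the file appears to be stored locally.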

4. Distributed Object Storage System

Distributed object storage systems are a type of data storage system that stores and manages data as objects, rather than as files or blocks. In an object storage system, an object consists of a file and its associated metadata, which describes the properties and characteristics of the data.
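
As a rough illustration of the object model, the sketch below keeps each object as bytes plus metadata under a single key in a flat namespace. `ToyObjectStore` and `StoredObject` are hypothetical names, and the store is in-memory and single-node. It does not reflect the API of any real object storage service.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ToyObjectStore {
    // An object bundles the raw bytes with descriptive metadata.
    static final class StoredObject {
        final byte[] data;
        final Map<String, String> metadata;

        StoredObject(byte[] data, Map<String, String> metadata) {
            this.data = data.clone();
            this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
        }
    }

    // Flat namespace: every object is addressed by one opaque key, with no directories.
    private final Map<String, StoredObject> objects = new HashMap<>();

    void put(String key, byte[] data, Map<String, String> metadata) {
        objects.put(key, new StoredObject(data, metadata));
    }

    StoredObject get(String key) {
        StoredObject found = objects.get(key);
        if (found == null) throw new IllegalArgumentException("No object with key: " + key);
        return found;
    }

    public static void main(String[] args) {
        ToyObjectStore store = new ToyObjectStore();
        store.put("images/image_1.jpg", "fake-bytes".getBytes(StandardCharsets.UTF_8),
                Map.of("content-type", "image/jpeg"));
        // Prints "image/jpeg"
        System.out.println(store.get("images/image_1.jpg").metadata.get("content-type"));
    }
}
```

The key `images/image_1.jpg` looks like a path, but the slash is just a character in the key. There is no directory to create or traverse.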

Distributed Object Storage System

Some of the key benefits of a distributed object storage system include:

  1. Scalability: Object storage systems can scale horizontally, allowing administrators to add more storage nodes as needed to accommodate growing amounts of data.

  2. High availability: Object storage systems are designed to be highly available, with multiple copies of data stored across multiple nodes so that data stays accessible when a node fails.

  3. Durability: Object storage systems typically use replication, erasure coding, or other techniques to protect data against loss, even if multiple nodes or disks fail.

  4. Cost-effectiveness: Object storage systems can be more cost-effective than traditional file or block storage systems, especially when storing large amounts of unstructured data.
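
The availability point can be made concrete with a toy replicated store. This sketch uses plain replication (full copies on consecutive nodes), not erasure coding, and all names (`ReplicatedStore`, `primaryNodeFor`, `setDown`) are hypothetical. Node failure is simulated with a flag, and writes ignore that flag for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class ReplicatedStore {
    private final List<Map<String, byte[]>> nodes = new ArrayList<>();
    private final boolean[] down; // simulated node failures
    private final int replicas;   // number of copies kept per object

    ReplicatedStore(int nodeCount, int replicas) {
        if (replicas < 1 || replicas > nodeCount) {
            throw new IllegalArgumentException("replicas must be between 1 and nodeCount");
        }
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
        this.down = new boolean[nodeCount];
        this.replicas = replicas;
    }

    // The key's hash picks the first node; further copies go to the following nodes.
    int primaryNodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, byte[] data) {
        int start = primaryNodeFor(key);
        for (int i = 0; i < replicas; i++) {
            nodes.get((start + i) % nodes.size()).put(key, data.clone());
        }
    }

    // Tries each replica in order and skips nodes that are marked as down.
    Optional<byte[]> get(String key) {
        int start = primaryNodeFor(key);
        for (int i = 0; i < replicas; i++) {
            int node = (start + i) % nodes.size();
            byte[] copy = nodes.get(node).get(key);
            if (!down[node] && copy != null) return Optional.of(copy);
        }
        return Optional.empty();
    }

    void setDown(int node, boolean isDown) { down[node] = isDown; }

    public static void main(String[] args) {
        ReplicatedStore store = new ReplicatedStore(3, 2);
        store.put("image_2.jpg", new byte[] {7});
        store.setDown(store.primaryNodeFor("image_2.jpg"), true);
        // Prints "true": the second copy still serves the read.
        System.out.println(store.get("image_2.jpg").isPresent());
    }
}
```

With two copies, the object survives the loss of one node but not both. Raising the replica count trades storage cost for availability.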

Here are some well-known distributed object storage systems.

  • Amazon S3

  • Google Cloud Storage

  • Azure Blob Storage

  • IBM Cloud Object Storage

Object storage has several advantages over a distributed file system.

  • Object storage scales well because it spreads data across many servers in a flat namespace, whereas file storage can hit scalability limits because of its hierarchical structure.

  • Object storage is accessed through APIs, typically over HTTP, which suits cloud-based and web-scale applications. File storage is accessed through network file protocols and is more common in traditional setups.

  • Object storage is easy to integrate and use, whereas a distributed file system takes engineering effort to set up and operate.
