Breaking It Down: A Step-by-Step Guide to Chunked Uploads
"Size actually matters in web"
Uploading a large file from client to server in one go is both risky and bandwidth heavy. Risky in the sense that if the internet connection drops mid-upload, the whole upload has to be restarted, which also wastes bandwidth. In the programming world, anything that is too big is handled by breaking it down into smaller sub-parts. Take the divide-and-conquer approach to algorithms: a bigger problem is divided into smaller sub-problems, each sub-problem is solved, and the solutions are combined bottom-up to solve the bigger problem as a whole.
The same idea applies here: chunking means breaking a large file into smaller chunks, uploading each chunk one by one, saving those chunks while keeping track of their order, and finally combining them back into the original file. This approach has the following major advantages:
Chunks uploaded before a connection loss can be tracked later when the same file is uploaded again. (to be explored later)
In the case of video, the chunks can be kept around for video streaming, where instead of sending the whole video at once, it can be sent in chunks. (to be explored later)
The two advantages above are not fully discussed in this article; they will be explored in future blogs. This article focuses on the basic setup for chunked upload: dividing a file into appropriately sized chunks, uploading them, and finally assembling them in order to recreate the complete file.
Chunked uploading has three major steps:
Creating an upload session
Uploading each chunk of the file and tracking it in the open upload session
Committing the upload and validating whether every chunk of the file was uploaded successfully
if so, assembling the chunks into a single file
otherwise, returning an error response.
Our server has three main endpoints:
const express = require("express");
const multer = require("multer");
// controllers are shown below; adjust the import path to your project layout
const {
  uploadSessionController,
  uploadChunkController,
  uploadCommitController,
} = require("./controllers/upload.controller");

const router = express.Router();
const upload = multer({ storage: multer.memoryStorage() });

// creating upload session route
router.post("/upload-session", uploadSessionController);
// uploading chunk route (multipart/form-data with a "chunk" file field)
router.post(
  "/upload-chunk/:uploadId",
  upload.single("chunk"),
  uploadChunkController
);
// confirming all chunks uploaded route
router.post("/upload-commit/:uploadId", uploadCommitController);
Creating an upload session:
const uploadSessionController = async (req, res, next) => {
  // get filename and total size (in bytes) of the file to be uploaded
  const { filename, totalSize } = req.body;
  // generate a unique identifier for this session
  const uploadId = generateRandomUUID();
  // create a new session with no chunks uploaded yet
  const newSession = new UploadSession({
    filename,
    totalSize,
    uploadId,
    uploadedChunks: [],
  });
  await newSession.save();
  // return uploadId, which is the session id used for uploading
  // all chunks of the current file
  return res
    .status(200)
    .json(new ApiResponse(200, { uploadId }, "Upload session established"));
};
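The generateRandomUUID helper used above isn't shown in the article. Node's built-in crypto module already provides a UUID generator, so a minimal version (an assumption, not necessarily the author's exact helper) could be:

const crypto = require("crypto");

// thin wrapper around Node's built-in UUID generator (Node 14.17+)
const generateRandomUUID = () => crypto.randomUUID();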
Uploading a chunk along with its chunk number, so the order of chunks can be tracked later for assembly:
const uploadChunkController = async (req, res, next) => {
  const { uploadId } = req.params;
  const chunkNumber = Number(req.body.chunkNumber);
  // multer's memory storage puts the uploaded chunk in req.file.buffer
  const chunkData = req.file.buffer;
  try {
    const session = await UploadSession.findOne({
      uploadId: { $eq: uploadId },
    });
    if (!session)
      return res
        .status(404)
        .json(new ApiResponse(404, null, "Upload session not found"));
    // write the chunk to a temporary file named after the session and chunk number
    const filename = generateTempFilename(uploadId, chunkNumber);
    const tempFilePath = path.join("./uploads/tmp/", filename);
    await fs.promises.writeFile(tempFilePath, chunkData);
    // record this chunk number in the session
    session.uploadedChunks.push(chunkNumber);
    await session.save();
    return res
      .status(200)
      .json(new ApiResponse(200, null, "Chunk uploaded successfully"));
  } catch (err) {
    console.error(err);
    return res
      .status(400)
      .json(new ApiResponse(400, null, "Error occurred uploading a chunk"));
  }
};
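Likewise, generateTempFilename just builds a predictable temporary file name from the session id and chunk number. Based on the temp paths shown in the walkthrough below (uploadId followed by the chunk number), a minimal version could be:

// temp chunk files are named "<uploadId>_<chunkNumber>", e.g. "817cc9dc-..._0"
const generateTempFilename = (uploadId, chunkNumber) => `${uploadId}_${chunkNumber}`;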
Committing the upload: validating that all chunks were uploaded and finally reassembling them:
const uploadCommitController = async (req, res, next) => {
  const { uploadId } = req.params;
  const chunkSize = 30000; // must match the chunk size used when splitting the file
  try {
    const session = await UploadSession.findOne({
      uploadId: { $eq: uploadId },
    });
    if (!session)
      return res
        .status(404)
        .json(new ApiResponse(404, null, "Upload session not found"));
    const expectedChunks = Math.ceil(session.totalSize / chunkSize);
    // every chunk number from 0 to expectedChunks - 1 must have been uploaded
    const uploadedSet = new Set(session.uploadedChunks.map(Number));
    const isUploadComplete =
      uploadedSet.size === expectedChunks &&
      [...Array(expectedChunks).keys()].every((n) => uploadedSet.has(n));
    if (!isUploadComplete) {
      return res
        .status(400)
        .json(new ApiResponse(400, null, "Missing chunk while uploading"));
    }
    const finalFilePath = path.join("./uploads", session.filename);
    const writeStream = fs.createWriteStream(finalFilePath);
    for (let chunkNumber = 0; chunkNumber < expectedChunks; chunkNumber++) {
      const tempFileName = generateTempFilename(uploadId, chunkNumber);
      const tempFilePath = path.join("./uploads/tmp/", tempFileName);
      try {
        const chunkData = await fs.promises.readFile(tempFilePath);
        writeStream.write(chunkData);
      } catch (error) {
        console.error(error);
        writeStream.destroy();
        await fs.promises.unlink(finalFilePath); // delete partially assembled file on error
        return res
          .status(500)
          .json(new ApiResponse(500, null, "Error assembling file"));
      } finally {
        await fs.promises.unlink(tempFilePath).catch(() => {}); // clean up temporary chunk file
      }
    }
    // register the finish handler once, then end the stream to flush it to disk
    writeStream.on("finish", async () => {
      await UploadSession.findByIdAndDelete(session._id);
      // after a successful upload, the path of the assembled file
      // could also be returned here for the frontend to use
      return res
        .status(200)
        .json(new ApiResponse(200, null, "File uploaded successfully"));
    });
    writeStream.end();
  } catch (err) {
    console.error(err);
    return res.status(500).json(new ApiResponse(500, null, "Error occurred"));
  }
};
These are all the API endpoints needed for a successful chunked upload and reassembly of the chunks. The frontend implementation will differ depending on the tools you use, but the basic idea is the same three steps, as sketched below.
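As an illustration only (not tied to any particular framework), here is a rough browser-side sketch of those three steps using fetch and Blob.slice. The base URL and the 30000-byte chunk size are assumptions that should match your server setup:

const CHUNK_SIZE = 30000; // must match the server's chunkSize
const BASE_URL = "http://localhost:3000"; // assumed server address and port

async function uploadFileInChunks(file) {
  // 1. create an upload session
  const sessionRes = await fetch(`${BASE_URL}/upload-session`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ filename: file.name, totalSize: file.size }),
  });
  const body = await sessionRes.json();
  // adjust this line to however your ApiResponse wrapper shapes the body
  const uploadId = body.data?.uploadId ?? body.uploadId;

  // 2. upload each chunk along with its chunk number
  const totalChunks = Math.ceil(file.size / CHUNK_SIZE);
  for (let chunkNumber = 0; chunkNumber < totalChunks; chunkNumber++) {
    const start = chunkNumber * CHUNK_SIZE;
    const chunk = file.slice(start, start + CHUNK_SIZE);
    const formData = new FormData();
    formData.append("chunkNumber", chunkNumber);
    formData.append("chunk", chunk);
    await fetch(`${BASE_URL}/upload-chunk/${uploadId}`, {
      method: "POST",
      body: formData,
    });
  }

  // 3. commit the upload so the server validates and assembles the chunks
  await fetch(`${BASE_URL}/upload-commit/${uploadId}`, { method: "POST" });
}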
To test this API and see whether chunk uploading and file assembly work, you can use Postman for now. If you want to do that, follow the steps below:
First, you need chunks of a single file. You can create them with any online tool, or, if you are a Linux user, with a simple command-line tool:
~ ls -l resume_macad.pdf
-rw-r--r-- 1 macad macad 86256 May 30 21:42 resume_macad.pdf
# this shows a file named resume_macad.pdf with a size of 86256 bytes
~ split -b 30000 resume_macad.pdf part # command to split the file
# -b sets the chunk size in bytes (30000 bytes here)
~ ls # list all files
partaa partab partac resume_macad.pdf
# partaa, partab, partac are parts of resume_macad.pdf, each at most 30000 bytes
Create an upload session:
// this is the JSON payload for /upload-session
{
"filename":"resume_macad.pdf",
"totalSize":86256
}
// this returns a response containing
{
"uploadId": "817cc9dc-8079-475a-a349-2f1d76a55778"
}
Upload all three chunks:
// POST /upload-chunk/{uploadId}
// this is the form-data payload for /upload-chunk/817cc9dc-8079-475a-a349-2f1d76a55778
// (chunkNumber as a text field, chunk as a file field)
// first chunk
{
"chunkNumber":0,
"chunk": partaa
}
// second chunk
{
"chunkNumber":1,
"chunk": partab
}
// third chunk
{
"chunkNumber":2,
"chunk": partac
}
Commit the chunk uploads:
// POST /upload-commit/{uploadId}
// this is the request for /upload-commit/817cc9dc-8079-475a-a349-2f1d76a55778
// it checks whether all chunks are uploaded and,
// if so, assembles all the chunks into one file
// before committing
/tmp/102d6429-e8c6-4270-aa4e-cd3b931aa1f0_0 // chunk 0
/tmp/102d6429-e8c6-4270-aa4e-cd3b931aa1f0_1 // chunk 1
/tmp/102d6429-e8c6-4270-aa4e-cd3b931aa1f0_2 // chunk 2
// after committing
/uploads/resume_macad.pdf
// all chunks are reassembled into a file with the original filename
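If you prefer the terminal over Postman, the same three requests can be made with curl. The host and port (localhost:3000) and the route prefix are assumptions that should match where the router above is mounted:

# 1. create the upload session
curl -X POST http://localhost:3000/upload-session \
  -H "Content-Type: application/json" \
  -d '{"filename":"resume_macad.pdf","totalSize":86256}'

# 2. upload each chunk as multipart/form-data (repeat with partab/1 and partac/2)
curl -X POST http://localhost:3000/upload-chunk/817cc9dc-8079-475a-a349-2f1d76a55778 \
  -F "chunkNumber=0" \
  -F "chunk=@partaa"

# 3. commit the upload so the server assembles the file
curl -X POST http://localhost:3000/upload-commit/817cc9dc-8079-475a-a349-2f1d76a55778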
This is the basic implementation of chunked upload, and it works for any type of file. It can be further enhanced, for example by using the stored chunks for video streaming, and much more. You can explore further, and also watch for my future enhancements of this API and its use in a real application.
Till then, stay tuned. Macad