Building a Highly Scalable API for Bulk Data Insertion in Prisma and MongoDB

Hariharan Reddy

Introduction

In this post, I’ll walk you through building a highly scalable API designed to insert large batches of data efficiently using Prisma and MongoDB. This setup allowed me to handle 100,000 entries in a single request while keeping performance and stability under control.

I decided to write this blog because, during the development process, I initially tried using Prisma’s interactive transactions to ensure atomicity and consistency during batch insertions. However, I faced significant issues like transaction timeouts and deadlocks when handling large volumes of data, especially in MongoDB. This led me to rethink my approach and build custom logic to handle the data insertion efficiently, without relying on Prisma's transactions.

By the end of this post, you’ll understand:

  • How to build an API that handles batch insertion efficiently

  • Why Prisma’s built-in transactions might fail under certain conditions and how to overcome those challenges

Step 1: Designing the Data Schema

We first need a robust database schema that can handle different entities and their relationships efficiently. I’ve used Prisma to manage data modeling with MongoDB. Here's an example schema I used for managing projects and associated tasks:

model Project {
  id          String  @id @default(auto()) @map("_id") @db.ObjectId
  name        String  
  description String  
  status      ProjectStatus @default(PENDING)
  taskCount   Int     @default(0)
  ownerId     String  @db.ObjectId

  createdAt   DateTime  @default(now()) @map(name: "created_at")
  updatedAt   DateTime  @default(now()) @updatedAt @map(name: "updated_at")

  owner User @relation(fields: [ownerId], references: [id], onDelete: Cascade)

  Task Task[]

  @@map(name: "projects")
}

enum TaskStatus {
  PENDING
  IN_PROGRESS
  COMPLETED
  CANCELLED
}

model Task {
  id          String  @id @default(auto()) @map("_id") @db.ObjectId
  projectId   String  @db.ObjectId
  name        String  
  status      TaskStatus  @default(PENDING) 
  createdAt   DateTime  @default(now()) @map(name: "created_at")
  updatedAt   DateTime  @default(now()) @updatedAt @map(name: "updated_at")

  project Project @relation(fields: [projectId], references: [id], onDelete: Cascade)

  @@map(name: "tasks")
}

In this example:

Project: Represents a project, with fields like name, description, and status. Each project can have multiple associated tasks.

Task: Represents an individual task, belonging to a project, with fields such as name and status.
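As a quick illustration of how this relation is used from the Prisma Client, here's a minimal sketch (the helper function is mine; note that the relation field is named Task, exactly as declared in the schema above):

import { db } from "@/lib/db"

// Minimal sketch: load a project together with its tasks via the
// relation field `Task` declared in the schema above.
async function getProjectWithTasks(projectId: string) {
  return db.project.findUnique({
    where: { id: projectId },
    include: { Task: true },
  });
}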

Step 2: API Logic for Bulk Data Insertion

The goal is to create an API endpoint that inserts a large number of tasks under a project. Here's a simplified version of how the API works:

  1. Input validation: We use the Zod library to validate the input payload.

  2. Batch insertion: The API is designed to insert tasks in batches of 10,000 to optimize performance.

  3. Error handling: If a failure occurs during the insertion, the entire project is rolled back.

import { TaskStatus } from "@prisma/client"
import { getSession } from "some-auth-library"
import { z } from "zod"

import { db } from "@/lib/db"
import { CustomError, handleApiError } from "@/lib/utils"

const projectSchema = z.object({
  name: z.string().min(1, "Project name is required"),
  description: z.string().min(10, "Description must be at least 10 characters long"),
  tasks: z
    .array(z.object({ name: z.string().min(1, "Task name is required") }))
    .min(1, "At least one task is required.")
    .max(100000, "A maximum of 100,000 tasks can be added.")
});

const BATCH_SIZE = 10000;

export async function POST(req: Request) {
  const startTime = performance.now();
  let projectId = "";

  try {
    const body = await req.json();
    const { name, description, tasks } = projectSchema.parse(body);

    const session = await getSession();

    if (!session) {
      throw new CustomError("Unauthorized", 403);
    }

    const { user } = session;

    // Step 1: Create the project
    const project = await db.project.create({
      data: { name, description, ownerId: user.id }
    });

    projectId = project.id;

    const preparedTasks = tasks.map(task => ({
      projectId,
      name: task.name,
      status: TaskStatus.PENDING
    }));

    // Step 2: Insert tasks in batches with error handling
    for (let i = 0; i < preparedTasks.length; i += BATCH_SIZE) {
      const batch = preparedTasks.slice(i, i + BATCH_SIZE);
      try {
        await db.task.createMany({ data: batch });
      } catch (error) {
        console.error(`Batch insertion failed at index ${i}:`, error);

        // Let the outer catch block perform the rollback so the project
        // is only deleted once.
        throw new CustomError("Failed to insert tasks, project deleted.", 500);
      }
    }

    const endTime = performance.now();
    console.log(`Insertion completed in ${endTime - startTime} milliseconds`);

    return new Response(JSON.stringify({ success: true }), { status: 201 });
  } catch (error) {
    // Rollback: delete the project if anything failed. The onDelete: Cascade
    // on the relation also removes any tasks that were already inserted.
    if (projectId) {
      await db.project.delete({ where: { id: projectId } });
    }
    return handleApiError(error);
  }
}

Step 3: Testing and Optimizing for Performance

Inserting large amounts of data in one go can be risky. To prevent overload, we use batch processing—splitting the data into smaller chunks for insertion.

In this example, I’ve used a batch size of 10,000, but you can adjust it based on your system’s performance.
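If you want to find a batch size that suits your own hardware and cluster, a quick standalone benchmark is usually enough. The script below is a rough, hypothetical sketch (the total of 20,000 tasks, the candidate sizes, and the use of db.user.findFirstOrThrow() to pick an owner are my own assumptions); it simply times createMany for a few batch sizes against the same db client used in the API route:

import { TaskStatus } from "@prisma/client"

import { db } from "@/lib/db"

// Time how long it takes to insert `total` tasks using a given batch size.
async function timeBatchSize(projectId: string, batchSize: number, total = 20000) {
  const tasks = Array.from({ length: total }, (_, i) => ({
    projectId,
    name: `benchmark-task-${i}`,
    status: TaskStatus.PENDING,
  }));

  const start = performance.now();
  for (let i = 0; i < tasks.length; i += batchSize) {
    await db.task.createMany({ data: tasks.slice(i, i + batchSize) });
  }
  const seconds = ((performance.now() - start) / 1000).toFixed(2);
  console.log(`batch size ${batchSize}: ${seconds}s for ${total} tasks`);
}

async function main() {
  // Use any existing user as the owner and a throwaway project so the
  // benchmark data is easy to clean up afterwards.
  const owner = await db.user.findFirstOrThrow();
  const project = await db.project.create({
    data: { name: "benchmark", description: "temporary batch-size benchmark", ownerId: owner.id },
  });

  for (const size of [1000, 5000, 10000, 25000]) {
    await timeBatchSize(project.id, size);
  }

  // onDelete: Cascade on the relation cleans up the benchmark tasks as well.
  await db.project.delete({ where: { id: project.id } });
}

main().finally(() => db.$disconnect());

Running this for a few candidate sizes should make it clear where your setup stops benefiting from larger batches.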

Conclusion

Initially, I considered using Prisma’s interactive transactions to handle the batch insertion and rollback logic in a cleaner, more atomic way. However, I encountered a few key challenges that led me to take a different approach:

  • Transaction Timeout Issues: Prisma's default interactive transaction timeout is 5 seconds, and I quickly ran into the error "Transaction already closed: A query cannot be executed on an expired transaction." Even after increasing the timeout to 20 seconds, I hit similar problems because of how long the large batches took to process; the transaction often expired before the work completed.

  • Deadlocks and Write Conflicts: With the longer timeout, the transactions instead started failing with "Transaction failed due to a write conflict or a deadlock. Please retry your transaction." This error was common when inserting large amounts of data. MongoDB is optimized for highly concurrent operations and isn't well suited to long-running transactions involving a large number of writes. (The sketch after this list shows roughly what this transaction-based version looked like.)

Given these constraints, I had to move away from relying on Prisma’s built-in transaction management and implement a custom batch insertion strategy. This allowed me to:

  1. Process data in smaller chunks: By splitting the insertion into batches, I minimized the risk of transaction timeouts and deadlocks.

  2. Manual Rollback: Instead of Prisma's transaction rollback, I manually handled rollback in case of errors by deleting the project when a batch failed.

This solution may not have the atomic guarantees of interactive transactions, but it provided the flexibility needed to handle high volumes of data with minimal errors and much better performance.

In summary, while Prisma's transactions can work well for smaller operations, handling massive bulk insertions—especially with MongoDB—required a custom approach to overcome limitations like transaction timeouts and write conflicts.
