Building a Subtitle Service for Your App Using AWS Transcribe

Introduction

Over the past two years, I've worked on three different video learning platforms. A recurring requirement across these projects has been subtitles, regardless of the target audience, tech stack, or business goal. Subtitles are essential for improving accessibility, accommodating users in sound-sensitive environments, and enhancing comprehension.

However, integrating subtitles at scale isn't as straightforward as toggling a switch. You need a system that can reliably handle transcription, process different media formats, and keep the architecture maintainable.

In this article, I’ll walk you through a reusable subtitle service architecture built using Amazon Transcribe. By the end, you’ll know how to:

  • Automatically transcribe video/audio content using Amazon Transcribe

  • Store and serve subtitle files securely from S3

  • Build a reusable subtitle service you can plug into any of your applications

Whether you're developing an e-learning app, a video streaming platform, or something in between, this approach will save you time and improve the user experience with minimal effort.

Understanding AWS Transcribe and the Project Goal

Amazon Transcribe is a fully managed automatic speech recognition (ASR) service that makes it easy to add real-time or batch transcription to your applications. It’s capable of converting speech from audio or video files into text, supporting a wide range of languages and input formats such as .mp3, .mp4, and more.

Transcribe outputs its results in a structured .json format, and with a little processing, you can convert this into common subtitle formats such as .vtt or .srt. This flexibility makes it a solid choice for building custom subtitle pipelines.
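For a concrete sense of what that processing involves, here is an abbreviated, illustrative example of the transcript JSON (the real output contains many more fields and one item per word):

{
  "jobName": "transcription-job-example",
  "results": {
    "transcripts": [{ "transcript": "Hello and welcome to the course." }],
    "items": [
      {
        "type": "pronunciation",
        "start_time": "0.0",
        "end_time": "0.54",
        "alternatives": [{ "confidence": "0.99", "content": "Hello" }]
      }
    ]
  },
  "status": "COMPLETED"
}

The start_time and end_time values on each item are what a converter uses to build timed cues. The same sentence as a .vtt cue would look roughly like:

WEBVTT

00:00:00.000 --> 00:00:02.500
Hello and welcome to the course.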

The goal of this article is to build a Node.js-based subtitle service that does the following:

  • Accepts a video or audio file URL stored in Amazon S3

  • Uses AWS Transcribe to generate a transcript

  • Converts the transcription into a .vtt subtitle file

  • Uploads the subtitle file back to S3 with public read access

  • Returns a public URL for use in video players

This modular flow ensures reusability, making it easy to integrate into any application that handles media playback.

Setting Up Your AWS Environment

Before diving into code, you need to set up a few AWS resources. These are essential for securely storing your media files, triggering transcription jobs, and handling the resulting subtitle files.

  1. S3 Bucket Setup

    Create an S3 bucket where you’ll:

    • Upload input media files (video/audio)

    • Store output subtitle files (.vtt or .srt)

Important: The subtitle service we'll build works even if your input media files are stored in a different bucket or a different region. We'll show this in a later section.

Suggested folder structure:

your-subtitle-service-bucket/
├── inputs/
│   └── my-video.mp4
└── outputs/
    └── my-video.vtt

Bucket Policy

You’ll need to attach a bucket policy to:

  • Allow Amazon Transcribe to write subtitle files to your bucket

  • Optionally allow public read access to subtitle files (or serve them via signed URLs)

Sample Bucket Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::transcription-subtitles-files/*"
        },
        {
            "Sid": "AllowTranscribePutObject",
            "Effect": "Allow",
            "Principal": {
                "Service": "transcribe.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::transcription-subtitles-files/*"
        }
    ]
}

This policy lets Amazon Transcribe write its output to the bucket and optionally exposes the subtitle files publicly (adjust this based on your use case). Replace transcription-subtitles-files with your own bucket name.

CORS Configuration

If you plan to use the subtitle files in a frontend app (e.g., loading them into an HTML5 <video> tag), you’ll also need to enable CORS (Cross-Origin Resource Sharing) on your bucket. Without this, the browser will block requests from your frontend.

Sample CORS Configuration:

[
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["HEAD", "GET", "PUT", "POST", "DELETE"],
        "AllowedOrigins": ["*"],
        "ExposeHeaders": ["ETag"]
    }
]

This configuration allows your frontend (from any domain) to fetch subtitle files without CORS errors. You can adjust AllowedOrigins to restrict access to your app's domain for more control.
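For example, a locked-down variant for a single (hypothetical) frontend domain could look like this; for serving subtitle files, GET and HEAD are typically all the browser needs:

[
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET", "HEAD"],
        "AllowedOrigins": ["https://app.example.com"],
        "ExposeHeaders": ["ETag"]
    }
]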

  2. IAM Role

    You’ll also need an IAM role or user with the right permissions to start transcription jobs and access media in S3.

    Required Permissions:

    • transcribe:StartTranscriptionJob

    • transcribe:GetTranscriptionJob

    • s3:GetObject – to fetch media

    • s3:PutObject – to store subtitle files

Minimal IAM Policy Example:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:PutObject"
          ],
          "Resource": "arn:aws:s3:::your-subtitle-service-bucket/*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "transcribe:StartTranscriptionJob",
            "transcribe:GetTranscriptionJob"
          ],
          "Resource": "*"
        }
      ]
    }

Attach this to the process that triggers transcription, such as a backend service, Lambda function, or container task.

  3. Region Considerations

    Amazon Transcribe is not available in all regions. For best results:

    • Choose a supported region like us-east-1, us-west-2, or eu-west-1

    • Ensure your S3 buckets and transcription jobs are in the same region when possible to reduce latency and avoid cross-region errors

Writing the Transcription Service in Node.js

Let’s now build the core logic of our subtitle service using Node.js. The service will accept a media file (hosted in any S3 bucket or region), transcribe it using Amazon Transcribe, convert the result to .vtt, and upload it back to your configured bucket.

We’ll start by looking at the complete code and then explain each part step-by-step.

Full Code: transcriptionService.mjs

// transcriptionService.mjs

import {
  TranscribeClient,
  StartTranscriptionJobCommand,
  GetTranscriptionJobCommand,
} from "@aws-sdk/client-transcribe";
import {
  S3Client,
  PutObjectCommand,
  CopyObjectCommand,
  DeleteObjectCommand,
} from "@aws-sdk/client-s3";
import axios from "axios";
import vttConvert from "aws-transcription-to-vtt";
import { v4 as uuidv4 } from "uuid";

const REGION = "us-west-1"; // Change to your AWS Region
const S3_BUCKET = "transcription-subtitles-files"; // Bucket for storing VTT files and temporary videos
const S3_BASE_URL = `https://${S3_BUCKET}.s3.${REGION}.amazonaws.com`;

const transcribeClient = new TranscribeClient({ region: REGION });
const s3Client = new S3Client({ region: REGION });

async function transcribeAndGenerateSubtitle(videoS3Url) {
  let mediaFileUri = videoS3Url;
  let copiedKey = null;

  // Determine where the video lives by parsing its virtual-hosted S3 URL
  // (e.g. https://bucket.s3.us-east-1.amazonaws.com/key); this assumes the
  // bucket name contains no dots.
  const { hostname, pathname } = new URL(videoS3Url);
  const bucketRegion = hostname.split(".")[2]; // extract the region part, e.g. "us-east-1"

  if (bucketRegion !== REGION) {
    console.log(`[i] Video is from different region (${bucketRegion}), copying to correct region...`);

    // Copy the object into our transcription bucket under transcribed-video/
    const sourceBucket = hostname.split(".")[0]; // get bucket name
    const sourceKey = decodeURIComponent(pathname.slice(1)); // remove leading "/"

    copiedKey = `transcribed-video/${uuidv4()}-${sourceKey.split("/").pop()}`; // Create unique path

    await s3Client.send(new CopyObjectCommand({
      CopySource: `/${sourceBucket}/${sourceKey}`,
      Bucket: S3_BUCKET,
      Key: copiedKey,
    }));

    console.log(`[+] Copied video for transcription: ${copiedKey}`);
    mediaFileUri = `${S3_BASE_URL}/${copiedKey}`;
  }

  const jobId = `transcription-job-${uuidv4()}`;

  const startParams = {
    TranscriptionJobName: jobId,
    LanguageCode: "en-US", // change this to transcribe other languages
    MediaFormat: "mp4", // match the format of your source media (e.g. mp3, wav)
    Media: {
      MediaFileUri: mediaFileUri,
    },
    OutputBucketName: S3_BUCKET,
  };

  // Start transcription job
  await transcribeClient.send(new StartTranscriptionJobCommand(startParams));
  console.log(`[+] Started transcription job: ${jobId}`);

  // Polling transcription job status
  let completed = false;
  let transcriptFileUri = '';
  while (!completed) {
    await new Promise((r) => setTimeout(r, 5000)); // wait 5 seconds
    const { TranscriptionJob } = await transcribeClient.send(
      new GetTranscriptionJobCommand({ TranscriptionJobName: jobId })
    );

    const status = TranscriptionJob.TranscriptionJobStatus;
    console.log(`[i] Job status: ${status}`);

    if (status === "COMPLETED") {
      completed = true;
      transcriptFileUri = TranscriptionJob.Transcript.TranscriptFileUri;
    } else if (status === "FAILED") {
      throw new Error(`Transcription job failed: ${TranscriptionJob.FailureReason}`);
    }
  }

  // Download transcription JSON
  const response = await axios.get(transcriptFileUri);
  const transcriptionJson = response.data;

  // Convert transcription to VTT
  const vttData = vttConvert(transcriptionJson);

  // Upload VTT file back to S3
  const vttKey = `subtitles/${jobId}.vtt`;

  await s3Client.send(new PutObjectCommand({
    Bucket: S3_BUCKET,
    Key: vttKey,
    Body: vttData,
    ContentType: "text/vtt",
  }));

  const vttUrl = `${S3_BASE_URL}/${vttKey}`;
  console.log(`[+] Subtitle uploaded: ${vttUrl}`);

  // delete the copied video if we created one
  if (copiedKey) {
    try {
      await s3Client.send(new DeleteObjectCommand({
        Bucket: S3_BUCKET,
        Key: copiedKey,
      }));
      console.log(`[i] Cleaned up temporary copied video: ${copiedKey}`);
    } catch (cleanupError) {
      console.warn(`[!] Failed to delete temporary video (safe to ignore):`, cleanupError.message);
    }
  }

  return vttUrl;
}

export { transcribeAndGenerateSubtitle };

Code Walkthrough

  1. Setup AWS Clients

     const transcribeClient = new TranscribeClient({ region: REGION });
     const s3Client = new S3Client({ region: REGION });
    
  2. Handle Cross-Region Video Files

    If the input video is in a different region, we copy it into our main S3 bucket to avoid region mismatch errors.

     const { hostname, pathname } = new URL(videoS3Url);
     const bucketRegion = hostname.split(".")[2]; // extract region part like "us-east-1"
    
     if (bucketRegion !== REGION) {
       // Copy logic here...
     }
    
  3. Start Transcription Job

    We generate a unique job ID, configure the job with language and format, and send the request.

     const jobId = `transcription-job-${uuidv4()}`;
     const startParams = { ... };
     await transcribeClient.send(new StartTranscriptionJobCommand(startParams));
    
  4. Poll Until Completion

    We periodically check the job status every 5 seconds until it completes or fails.

     while (!completed) {
         await new Promise((r) => setTimeout(r, 5000)); // wait 5 seconds
         const { TranscriptionJob } = await transcribeClient.send(...);
    
         const status = TranscriptionJob.TranscriptionJobStatus;
    
         if (status === "COMPLETED") {
           completed = true;
           transcriptFileUri = TranscriptionJob.Transcript.TranscriptFileUri;
         } else if (status === "FAILED") {
           throw new Error(`Transcription job failed: ${TranscriptionJob.FailureReason}`);
         }
       }
    
  5. Convert to .vtt Format

    Once the transcription is ready, we fetch the JSON, convert it to VTT using the aws-transcription-to-vtt package, and upload it to S3.

     const vttData = vttConvert(transcriptionJson);
     await s3Client.send(new PutObjectCommand({ ... }));
    
  6. Clean Up Temporary Files

    If we copied the video earlier, we clean it up at the end.

     if (copiedKey) {
       await s3Client.send(new DeleteObjectCommand({ ... }));
     }
    

Design Choices That Make This Reusable

  • Accepts any media URI: Works across buckets and regions by copying to a known location, enabling use across teams or environments.

  • Job name namespacing: Each job uses a unique identifier (transcription-job-<uuid>) to prevent collisions, especially in high-concurrency environments.

  • Formats and language are easy to parameterize: LanguageCode and MediaFormat are hardcoded here, but lifting them into function options supports multilingual subtitles or alternate media types with only a few changes (see the sketch after this list).

  • Scoped resource access: All outputs and temporary assets are stored in a predictable subtitles/ and transcribed-video/ S3 path, making organization and lifecycle management easier.

  • Modular, self-contained logic: The entire lifecycle lives in a single function with no external state, so it can be wrapped in different execution models like Lambda functions, HTTP endpoints, or background queues.

  • Automatic cleanup of temporary files: Temporary copies are deleted after use to minimize cost and clutter.
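To illustrate the parameterization point above, here is a minimal sketch of how the hardcoded values could be lifted into an options object. It assumes the same imports and constants as transcriptionService.mjs, and the option names (languageCode, mediaFormat, outputPrefix) are my own, not part of the code shown earlier:

// Sketch: the same job setup as transcribeAndGenerateSubtitle, with the
// hardcoded values exposed as options. Option names are illustrative.
async function startTranscriptionWithOptions(videoS3Url, options = {}) {
  const {
    languageCode = "en-US", // e.g. "fr-FR" or "es-US" for other locales
    mediaFormat = "mp4",    // e.g. "mp3" or "wav" for audio-only content
    outputPrefix = "subtitles",
  } = options;

  const jobId = `transcription-job-${uuidv4()}`;

  await transcribeClient.send(new StartTranscriptionJobCommand({
    TranscriptionJobName: jobId,
    LanguageCode: languageCode,
    MediaFormat: mediaFormat,
    Media: { MediaFileUri: videoS3Url },
    OutputBucketName: S3_BUCKET,
  }));

  // ...then poll, convert, and upload to `${outputPrefix}/${jobId}.vtt` exactly as before
  return jobId;
}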

Usage and Integration in Applications

Now that we have a reusable transcription service, let’s look at how to integrate it into real-world applications. This service is flexible enough to be used inside:

  • A backend API route (e.g., Express or Fastify)

  • A serverless Lambda function

  • A CLI tool or job processor

Sample Usage: Calling the Service

import { transcribeAndGenerateSubtitle } from "./transcriptionService.mjs";

const videoUrl = "https://some-bucket.s3.us-east-1.amazonaws.com/uploads/my-video.mp4";

transcribeAndGenerateSubtitle(videoUrl)
  .then((subtitleUrl) => {
    console.log("Subtitle available at:", subtitleUrl);
  })
  .catch((err) => {
    console.error("Failed to generate subtitles:", err.message);
  });

Once the function completes, it returns a URL pointing to the .vtt subtitle file stored in your configured S3 bucket. For example:

https://transcription-subtitles-files.s3.us-west-1.amazonaws.com/subtitles/transcription-job-2cdbfd78-a3cc-47f3-a414-dcb27ac163c9.vtt
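If you're exposing the service from a backend API, a minimal Express wrapper might look like the sketch below. The route path, request body shape, and port are assumptions for illustration:

// server.mjs — minimal Express wrapper around the subtitle service (sketch).
import express from "express";
import { transcribeAndGenerateSubtitle } from "./transcriptionService.mjs";

const app = express();
app.use(express.json());

app.post("/subtitles", async (req, res) => {
  const { videoUrl } = req.body;
  if (!videoUrl) {
    return res.status(400).json({ error: "videoUrl is required" });
  }
  try {
    const subtitleUrl = await transcribeAndGenerateSubtitle(videoUrl);
    res.json({ subtitleUrl });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(3000, () => console.log("Subtitle service listening on port 3000"));

Because transcribing long videos can take several minutes, in production you'd usually push the work to a background job and notify the client when the subtitle is ready rather than holding the HTTP request open.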

Using the Subtitle in HTML Video

You can plug the subtitle URL directly into an HTML <video> tag using the <track> element:

<video controls crossorigin="anonymous" width="640">
  <source src="https://some-bucket.s3.amazonaws.com/uploads/my-video.mp4" type="video/mp4" />
  <track
    src="https://transcription-subtitles-files.s3.us-west-1.amazonaws.com/subtitles/transcription-job-2cdbfd78-a3cc-47f3-a414-dcb27ac163c9.vtt"
    kind="subtitles"
    srclang="en"
    label="English"
    default
  />
</video>

This will show a “CC” button that lets users toggle subtitles on/off in supported browsers; no extra libraries are required.

Gotchas & Best Practices

Before you ship this transcription service in production, here are a few key considerations and pitfalls to be aware of:

Region Mismatches

Amazon Transcribe can only access videos in S3 buckets in the same region as the transcription job. As implemented in our code, if the input video is hosted in a different region, we automatically copy it to the bucket in the target region before starting the transcription job. This ensures compatibility without requiring upstream changes to the source video hosting.

Transcribe Limitations

Amazon Transcribe has a few important limitations:

  • Maximum audio/video length: 4 hours

  • Maximum file size: 2 GB

Files exceeding these limits will cause the job to fail.

Tip: Validate media files ahead of time and reject or trim them before attempting transcription.
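One way to handle the size check, assuming the media lives in S3 and your credentials can read the object, is to inspect its size with HeadObject before starting the job. A minimal sketch (the URL parsing assumes the same virtual-hosted format used earlier; the duration limit would still need a media probe such as ffprobe):

import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3";

const MAX_FILE_SIZE_BYTES = 2 * 1024 * 1024 * 1024; // 2 GB batch limit

// Sketch: reject oversized media before starting a transcription job.
async function assertMediaWithinLimits(videoS3Url, region) {
  const { hostname, pathname } = new URL(videoS3Url);
  const bucket = hostname.split(".")[0]; // assumes no dots in the bucket name
  const key = decodeURIComponent(pathname.slice(1));

  const s3 = new S3Client({ region });
  const { ContentLength } = await s3.send(
    new HeadObjectCommand({ Bucket: bucket, Key: key })
  );

  if (ContentLength > MAX_FILE_SIZE_BYTES) {
    throw new Error(`Media is ${ContentLength} bytes, above the 2 GB Transcribe limit`);
  }
}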

Conclusion

In this guide, we built a complete transcription service that:

  • Accepts any S3-hosted video file, even across regions.

  • Automatically transcribes it using Amazon Transcribe.

  • Converts the result into .vtt subtitle format.

  • Uploads the subtitle to a central S3 bucket for easy access.

  • Cleans up temporary files to keep things tidy.

The core function transcribeAndGenerateSubtitle(videoS3Url) handles the full lifecycle. It’s modular, serverless-friendly, and designed to plug into a web app, job queue, or CLI.
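As one example of the serverless option, a Lambda handler wrapping the same function could look like this sketch. The event shape ({ videoUrl }) is an assumption; note that the function polls until the job finishes, so the Lambda timeout must comfortably exceed your longest transcription:

// handler.mjs — sketch of a Lambda entry point around the subtitle service.
import { transcribeAndGenerateSubtitle } from "./transcriptionService.mjs";

export const handler = async (event) => {
  const { videoUrl } = event; // illustrative event shape
  if (!videoUrl) {
    return { statusCode: 400, body: JSON.stringify({ error: "videoUrl is required" }) };
  }
  const subtitleUrl = await transcribeAndGenerateSubtitle(videoUrl);
  return { statusCode: 200, body: JSON.stringify({ subtitleUrl }) };
};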

You can also extend the solution in the following directions:

  • Multi-language support: Pass LanguageCode dynamically to support users in multiple locales.

  • Real-time transcription: Consider integrating Amazon Transcribe Streaming for live captioning in calls or webinars.

You can find the complete code on GitHub: https://github.com/poly4concept/transcription-service

Thank you for reading!

You can follow me on LinkedIn and subscribe to my YouTube channel, where I share more valuable content. Also, let me know your thoughts in the comments section.
