🧠 From Frustration to Automation: My Journey to Transcribing YouTube Member-Only Videos with Whisper and Google Colab

Introduction

As a developer, I often rely on YouTube tutorials to learn new concepts. However, revisiting specific parts of these videos for revision can be time-consuming. I realized that having a transcript would make it easier to review and reinforce what I’d learned. This led me to explore ways to generate transcripts from YouTube videos, especially member-only content.

There are several online tools that can generate transcripts simply by providing a YouTube video URL. These work well for public videos, making them a convenient solution in most cases. However, when it came to member-only videos, I quickly found that these tools fell short. That’s when I decided to get my hands dirty and build something custom that could handle authentication and give me full control over the process.


Phase 1: The Local Setup

Getting Started with yt-dlp and Whisper

My first attempt was to use yt-dlp to download the video audio and transcribe it using OpenAI's Whisper model. I started by installing the required packages:

pip install yt-dlp
pip install git+https://github.com/openai/whisper.git

Member-only videos posed an authentication challenge. Initially, I ran this quick line in the browser console:

console.log(document.cookie);

I saved the result into a cookies.txt file—but got this error:

ERROR: 'cookies.txt' does not look like a Netscape format cookies file

Turns out yt-dlp needs cookies in Netscape format. I solved this using the Cookie-Editor Chrome extension:

  • Open the YouTube video tab

  • Click the Cookie-Editor extension

  • Grant access

  • Click "Export" → cookies are copied to clipboard

  • Paste into cookies.txt

That did the trick.
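In hindsight, the failure was easy to catch programmatically: Python's standard library ships `MozillaCookieJar`, which reads exactly this Netscape format and raises the very same "does not look like a Netscape format cookies file" complaint on a bad export. A minimal sketch for sanity-checking the file before handing it to yt-dlp (the helper name is my own):

```python
from http.cookiejar import MozillaCookieJar

def count_netscape_cookies(path: str) -> int:
    """Load a Netscape-format cookies file and return how many cookies it holds.

    Raises http.cookiejar.LoadError if the file is not in Netscape format,
    which is the same condition that makes yt-dlp reject it.
    """
    jar = MozillaCookieJar(path)
    jar.load()  # a LoadError here means the export step went wrong
    return len(jar)
```

If this raises `LoadError`, yt-dlp would have rejected the file too, so it saves a full download attempt.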

FFmpeg Integration

Next, yt-dlp required FFmpeg for audio extraction. I downloaded FFmpeg from the official site and installed it. But VS Code's terminal still couldn’t recognize the ffmpeg command.

I fixed it by adding the FFmpeg binary path to the integrated terminal’s environment in VS Code’s settings.json:

"terminal.integrated.env.windows": {
  "PATH": "C:\\ffmpeg\\bin;${env:PATH}"
}

System Configuration

All this was done on my Windows laptop:

  • Processor: Intel(R) Core(TM) Ultra 7 155H @ 3.80 GHz

  • RAM: 16 GB

Local Workflow Steps

  1. Add cookies.txt

  2. Use yt-dlp to download the audio

  3. Run Whisper locally to generate the transcript
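For reference, assuming yt-dlp, FFmpeg, and Whisper’s command-line entry point are all on the PATH (and with a placeholder URL), steps 2 and 3 roughly correspond to:

```bash
# Step 2: download the best audio stream, authenticating with cookies.txt,
# and extract it to MP3 via FFmpeg
yt-dlp --cookies cookies.txt -x --audio-format mp3 \
  -o "downloaded_audio.%(ext)s" "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"

# Step 3: transcribe the MP3 with Whisper's CLI
whisper downloaded_audio.mp3 --model medium --output_format txt
```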

Unfortunately, even after all that setup, my local machine could only transcribe 10 minutes of audio before slowing to a crawl. It was time for a better solution.


Phase 2: Moving to Google Colab

Google Colab, with its free access to NVIDIA Tesla T4 GPUs, seemed like a perfect solution—offering power, speed, and zero system strain.

Colab Setup

I began by installing yt-dlp (which downloads the audio), Whisper, and FFmpeg:

!pip install yt-dlp
!pip install git+https://github.com/openai/whisper.git
!apt update && apt install -y ffmpeg

I uploaded the cookies.txt file:

from google.colab import files
uploaded = files.upload()

And then accessed it in another cell using:

cookies_file = list(uploaded.keys())[0]

Then, I used yt-dlp in a separate cell:

import yt_dlp

video_url = "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
audio_file = "downloaded_audio.mp3"  # final name after the FFmpeg postprocessor runs

ydl_opts = {
    'format': 'bestaudio/best',
    # Use %(ext)s so the intermediate download keeps its real extension;
    # the FFmpegExtractAudio step below converts it to downloaded_audio.mp3
    'outtmpl': 'downloaded_audio.%(ext)s',
    'cookiefile': cookies_file,
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'mp3',
        'preferredquality': '192',
    }],
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([video_url])

Transcription and Export

Whisper provides several models with different sizes and capabilities: tiny, base, small, medium, and large. Each one balances speed and accuracy differently. For this project, I selected the medium model because it provides significantly better transcription accuracy than the smaller models while still being lightweight enough to run comfortably on Colab’s T4 GPU.

In a single cell, I transcribed the audio and downloaded the transcript:

import whisper
from google.colab import files

model = whisper.load_model("medium")
result = model.transcribe("downloaded_audio.mp3")

with open("transcript.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])

files.download("transcript.txt")
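Whisper’s result dictionary also carries per-segment start and end times, which makes it easy to produce a timestamped transcript instead of one wall of text. A minimal sketch that converts result["segments"] into SRT subtitle text (the function name and the idea of exporting SRT are my own additions, not part of the original pipeline):

```python
def segments_to_srt(segments):
    """Convert Whisper's result["segments"] list into SRT subtitle text."""
    def ts(t):
        # SRT timestamps look like HH:MM:SS,mmm
        h, rem = divmod(int(t), 3600)
        m, s = divmod(rem, 60)
        ms = round((t - int(t)) * 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

After transcription, `segments_to_srt(result["segments"])` can be written to a `.srt` file and downloaded the same way as the plain transcript.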

Performance Boost

The difference was stunning. While my local setup barely handled 10 minutes, Colab’s free Tesla T4 GPU breezed through the entire video in the same amount of time.

Whisper + Colab = absolute game-changer.


What’s Next

In the upcoming parts of this series, I plan to expand this project beyond just YouTube—potentially supporting other platforms as well. Each post will dig deeper into new layers of functionality, from automating repetitive steps to designing a simple UI for non-technical users.

The setup is working smoothly now, but I have plans to make it even better:

  • 🚀 Automate cookie generation using Selenium (no Chrome extension required)

  • 🖥️ Add a UI with Streamlit for a more user-friendly experience


Final Thoughts

What started as a small idea—to make YouTube studying easier—quickly turned into a hands-on dev journey involving browser automation, audio processing, and GPU-accelerated transcription.

Now, I have a fast, reliable, and repeatable pipeline to generate transcripts from YouTube member-only videos—one that’s entirely powered by free tools and cloud resources.

More updates and experiments coming soon.

Written by

Akash Khandelwal