🧠 From Frustration to Automation: My Journey to Transcribing YouTube Member-Only Videos with Whisper and Google Colab

Introduction
As a developer, I often rely on YouTube tutorials to learn new concepts, but revisiting specific parts of a video for revision is time-consuming. I realized that having a transcript would make it much easier to review and reinforce what I’d learned, which led me to explore ways to generate transcripts from YouTube videos, especially member-only content.
There are several online tools that can generate a transcript from nothing more than a YouTube URL. These work well for public videos, making them a convenient solution in most cases. When it came to member-only videos, however, I quickly found that these tools fell short: they have no way to authenticate as a channel member. That’s when I decided to get my hands dirty and build something custom that could handle authentication and give me full control over the process.
Phase 1: The Local Setup
Getting Started with yt-dlp and Whisper
My first attempt was to use yt-dlp to download the video audio and transcribe it using OpenAI's Whisper model. I started by installing the required packages:
pip install yt-dlp
pip install git+https://github.com/openai/whisper.git
Cookie Handling Woes
Member-only videos posed an authentication challenge. Initially, I ran this quick line in the browser console:
console.log(document.cookie);
I saved the result into a cookies.txt file, but got this error:
ERROR: 'cookies.txt' does not look like a Netscape format cookies file
Turns out yt-dlp needs cookies in Netscape format. I solved this using the Cookie-Editor Chrome extension:
1. Open the YouTube video tab
2. Click the Cookie-Editor extension
3. Grant access
4. Click "Export" → cookies are copied to the clipboard
5. Paste them into cookies.txt
That did the trick.
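For reference, a Netscape-format cookie file starts with a comment header, followed by one tab-separated line per cookie: domain, subdomain flag, path, secure flag, expiry (a Unix timestamp), cookie name, and value. The entries below are placeholders, not real cookie values:
# Netscape HTTP Cookie File
.youtube.com	TRUE	/	TRUE	1767225600	SID	PLACEHOLDER_VALUE
.youtube.com	TRUE	/	FALSE	1767225600	PREF	PLACEHOLDER_VALUE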
FFmpeg Integration
Next, yt-dlp required FFmpeg for audio extraction. I downloaded FFmpeg from the official site and installed it. But VS Code's terminal still couldn’t recognize the ffmpeg command.
I fixed it by editing VS Code’s settings to include the FFmpeg binary path:
"terminal.integrated.env.windows": {
"PATH": "C:\\ffmpeg\\bin;${env:PATH}"
}
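To confirm the terminal could finally see the binary, a quick check from Python works too, since shutil.which resolves a command against PATH the same way the shell does:
import shutil

# Prints the full path to the ffmpeg executable if it is on PATH, else None
print(shutil.which("ffmpeg"))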
System Configuration
All this was done on my Windows laptop:
Processor: Intel(R) Core(TM) Ultra 7 155H @ 3.80 GHz
RAM: 16 GB
Local Workflow Steps
1. Add cookies.txt
2. Use yt-dlp to download the audio
3. Run Whisper locally to generate the transcript
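From the terminal, steps 2 and 3 boil down to two commands. This is a sketch using the two tools' standard CLI flags; the URL is a placeholder:
# Download the audio as MP3, authenticating with the exported cookies
yt-dlp --cookies cookies.txt -x --audio-format mp3 -o downloaded_audio.mp3 "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
# Transcribe it and write a plain-text transcript
whisper downloaded_audio.mp3 --model medium --output_format txt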
Unfortunately, even after all that setup, my local machine could only transcribe 10 minutes of audio before slowing to a crawl. It was time for a better solution.
Phase 2: Moving to Google Colab
Google Colab, with its free access to NVIDIA Tesla T4 GPUs, seemed like a perfect solution—offering power, speed, and zero system strain.
Colab Setup
I began by installing Whisper, FFmpeg, and the yt-dlp library, which handles the audio download:
!pip install yt-dlp
!pip install git+https://github.com/openai/whisper.git
!apt update && apt install -y ffmpeg
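Before going further, it's worth confirming that the notebook is actually attached to a GPU (set via Runtime → Change runtime type). A quick check using torch, which Colab ships preinstalled:
import torch

# True only when the notebook is attached to a GPU runtime
print(torch.cuda.is_available())

# Reports the GPU model, e.g. "Tesla T4" (raises if no GPU is attached)
print(torch.cuda.get_device_name(0))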
I uploaded the cookies.txt file:
from google.colab import files
uploaded = files.upload()
And then accessed it in another cell using:
cookies_file = list(uploaded.keys())[0]
Then, I used yt-dlp in a separate cell:
import yt_dlp

video_url = "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
audio_file = "downloaded_audio.mp3"

ydl_opts = {
    'format': 'bestaudio/best',       # grab the best available audio stream
    'outtmpl': audio_file,            # name of the downloaded file
    'cookiefile': cookies_file,       # authenticates access to the member-only video
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',  # hand the download to FFmpeg
        'preferredcodec': 'mp3',      # convert to MP3
        'preferredquality': '192',    # bitrate in kbps
    }],
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([video_url])
Transcription and Export
Whisper provides several models with different sizes and capabilities: tiny, base, small, medium, and large. Each one balances speed and accuracy differently. For this project, I selected the medium model because it provides significantly better transcription accuracy than the smaller models while still being lightweight enough to run comfortably on Colab’s T4 GPU.
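For a rough sense of scale, the Whisper README lists approximately these sizes (figures rounded):
tiny: ~39M parameters, ~1 GB VRAM
base: ~74M parameters, ~1 GB VRAM
small: ~244M parameters, ~2 GB VRAM
medium: ~769M parameters, ~5 GB VRAM
large: ~1550M parameters, ~10 GB VRAM
At around 5 GB of VRAM, the medium model fits easily within the T4's 16 GB.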
In a single cell, I transcribed the audio and downloaded the transcript:
import whisper
from google.colab import files

# Load the medium model (the weights are downloaded on first use)
model = whisper.load_model("medium")

# Transcribe the audio file produced in the previous cell
result = model.transcribe("downloaded_audio.mp3")

# Save the transcript and download it to the local machine
with open("transcript.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])

files.download("transcript.txt")
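Since my original goal was jumping back to specific parts of a video, timestamps are useful too. Whisper's result also includes per-segment start and end times, so a timestamped transcript is a small extension of the same cell (a sketch reusing the result object above):
# Each segment carries start/end offsets in seconds alongside its text
with open("transcript_timestamped.txt", "w", encoding="utf-8") as f:
    for seg in result["segments"]:
        f.write(f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text'].strip()}\n")

files.download("transcript_timestamped.txt")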
Performance Boost
The difference was stunning. While my local setup barely handled 10 minutes, Colab’s free Tesla T4 GPU breezed through the entire video in the same amount of time.
Whisper + Colab = absolute game-changer.
What’s Next
In the upcoming parts of this series, I plan to expand this project beyond just YouTube—potentially supporting other platforms as well. Each post will dig deeper into new layers of functionality, from automating repetitive steps to designing a simple UI for non-technical users.
This setup is working smoothly now, but I have plans to make it even better, such as:
🚀 Automate cookie generation using Selenium (no Chrome extension required)
🖥️ Add a UI with Streamlit for a more user-friendly experience
Final Thoughts
What started as a small idea—to make YouTube studying easier—quickly turned into a hands-on dev journey involving browser automation, audio processing, and GPU-accelerated transcription.
Now, I have a fast, reliable, and repeatable pipeline to generate transcripts from YouTube member-only videos—one that’s entirely powered by free tools and cloud resources.
More updates and experiments coming soon.