Automate M4A File Transcriptions with Python and OpenAI Whisper

Karl Bolinger

Transcribing audio files can be a tedious task, especially if the files are long or if you have multiple files to process. To tackle this challenge, I developed a Python-based application that automates the transcription of M4A files using the OpenAI Whisper model. This blog post will walk you through the project, highlighting its key components and functionality.

Project Overview

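The project consists of four Python scripts:

  1. setup.py: the py2app build configuration.

  2. app.py: the tkinter GUI and the transcription logic.

  3. split_m4a.py: splits an M4A file into overlapping segments.

  4. setup_transcriber.py: writes the scripts to disk, installs dependencies, and builds the macOS app.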

Together, these scripts form a comprehensive solution for splitting M4A files into manageable segments, transcribing the audio, and presenting the results in a user-friendly GUI. Let's dive into the details of each component.

1. setup.py

The setup.py script is responsible for setting up the application using py2app, a Python setup tool for creating standalone macOS applications.

from setuptools import setup

APP = ['app.py']
DATA_FILES = []
OPTIONS = {
    'argv_emulation': True,
    'packages': ['pydub', 'argparse', 'tkinter'],
    'includes': ['tkinter']
}

setup(
    app=APP,
    data_files=DATA_FILES,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
)

2. app.py

The app.py script is the core of the application, containing the GUI implementation and the main logic for processing M4A files. The application uses tkinter for the GUI and calls OpenAI's audio transcription endpoint through curl, running the work on a background thread so the window stays responsive.

import tkinter as tk
from tkinter import filedialog, messagebox, ttk
import os
import threading
from split_m4a import split_m4a
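import json
import subprocess
import openai

# Read the API key from the environment rather than hard-coding it
openai.api_key = os.getenv("OPENAI_API_KEY")

def transcribe_file(file_path):
    """Send one audio segment to the transcription endpoint and return its text."""
    # curl builds the multipart body and its Content-Type header from --form
    command = [
        'curl',
        '--silent', '--show-error',
        '--request', 'POST',
        '--url', 'https://api.openai.com/v1/audio/transcriptions',
        '--header', f'Authorization: Bearer {openai.api_key}',
        '--form', f'file=@{file_path}',
        '--form', 'model=whisper-1',
        '--form', 'language=en'
    ]
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"curl failed: {result.stderr.strip()}")
    # The endpoint answers with JSON such as {"text": "..."}; keep only the text
    response = json.loads(result.stdout)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "Transcription failed"))
    return response["text"]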

def update_status(progress_bar, status_label, current, total, message):
    # Guard against an empty segment list so the percentage never divides by zero
    progress_bar['value'] = (current / total) * 100 if total else 0
    status_label.config(text=f"{message} ({current}/{total})")

def process_file(progress_bar, status_label):
    file_path = filedialog.askopenfilename(filetypes=[("M4A files", "*.m4a")])
    if not file_path:
        return

    def process():
        try:
            base_name = os.path.splitext(os.path.basename(file_path))[0]
            output_directory = os.path.join(os.path.dirname(file_path), base_name)
            os.makedirs(output_directory, exist_ok=True)

            update_status(progress_bar, status_label, 0, 1, "Splitting audio")
            segments = split_m4a(file_path, output_directory, overlap_seconds=10)

            total_segments = len(segments)
            transcripts = []
            for i, segment in enumerate(segments):
                update_status(progress_bar, status_label, i + 1, total_segments, "Transcribing segments")
                transcripts.append(transcribe_file(segment))

            # Separate segments with a newline so words at the seams do not run together
            transcript = "\n".join(transcripts)
            transcript_path = os.path.join(output_directory, base_name + ".txt")
            with open(transcript_path, "w", encoding="utf-8") as f:
                f.write(transcript)

            update_status(progress_bar, status_label, total_segments, total_segments, "Completed")
            messagebox.showinfo("Success", f"Transcription saved to {transcript_path}")
        except Exception as e:
            messagebox.showerror("Error", str(e))

    threading.Thread(target=process).start()

def create_gui():
    root = tk.Tk()
    root.title(".m4a Transcriber")

    frame = tk.Frame(root, padx=20, pady=20)
    frame.pack(padx=10, pady=10)

    label = tk.Label(frame, text="Select an .m4a file to transcribe:")
    label.pack(pady=5)

    button = tk.Button(frame, text="Select File", command=lambda: process_file(progress_bar, status_label))
    button.pack(pady=5)

    progress_bar = ttk.Progressbar(frame, orient="horizontal", length=300, mode="determinate")
    progress_bar.pack(pady=10)

    status_label = tk.Label(frame, text="Status: Waiting for file selection")
    status_label.pack(pady=5)

    root.mainloop()

if __name__ == "__main__":
    create_gui()
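
The transcription endpoint answers with a JSON body rather than plain text, so the response has to be unpacked before it goes into the transcript file. Below is a minimal sketch of a more defensive parser. The helper name extract_transcript_text is hypothetical, and the error shape it checks for (an "error" object with a "message" field) is an assumption about how failures are reported.

```python
import json


def extract_transcript_text(raw_response: str) -> str:
    """Return the "text" field from a transcription response body.

    Raises ValueError when the body is not JSON, reports an error,
    or has no "text" field.
    """
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response is not valid JSON: {raw_response[:80]!r}") from exc

    if not isinstance(payload, dict):
        raise ValueError("Unexpected response shape")

    if "error" in payload:
        # Assumed error shape: {"error": {"message": "..."}}
        error = payload["error"]
        message = error.get("message", "unknown error") if isinstance(error, dict) else str(error)
        raise ValueError(f"Transcription failed: {message}")

    if "text" not in payload:
        raise ValueError("Response has no 'text' field")

    return payload["text"].strip()
```

Raising an exception here fits the existing design: process() already wraps the whole run in try/except and shows the message in an error dialog.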

3. split_m4a.py

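The split_m4a.py script handles splitting the M4A files into smaller segments to make the transcription process more manageable. This is particularly useful for long audio files. By default each segment is 30 seconds long and overlaps the previous one by 10 seconds, so a new segment starts every 20 seconds.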

from pydub import AudioSegment
import os

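def split_m4a(file_path, output_directory, segment_duration=30000, overlap_seconds=10):
    """Split an M4A file into overlapping segments and return their paths.

    segment_duration is in milliseconds; overlap_seconds is in seconds.
    """
    audio = AudioSegment.from_file(file_path, format="m4a")
    # Distance between consecutive segment starts, in milliseconds
    segment_length = segment_duration - (overlap_seconds * 1000)
    if segment_length <= 0:
        raise ValueError("overlap_seconds must be shorter than segment_duration")
    segments = []

    for i in range(0, len(audio), segment_length):
        segment = audio[i:i + segment_duration]
        segment_path = os.path.join(output_directory, f"segment_{i // segment_length}.m4a")
        # ffmpeg has no "m4a" muxer; "ipod" is the one that writes .m4a files
        segment.export(segment_path, format="ipod")
        segments.append(segment_path)

    return segments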
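
To see which parts of the recording each segment covers, the loop's arithmetic can be reproduced without touching any audio. This is a small sketch that mirrors the range and slice logic of split_m4a; segment_windows is a hypothetical helper introduced only for illustration.

```python
def segment_windows(total_ms, segment_duration=30000, overlap_seconds=10):
    """Return (start_ms, end_ms) pairs matching the slices split_m4a takes."""
    step = segment_duration - overlap_seconds * 1000
    if step <= 0:
        raise ValueError("overlap_seconds must be shorter than segment_duration")
    # Slicing past the end of the audio is clamped, so the end is capped here too
    return [
        (start, min(start + segment_duration, total_ms))
        for start in range(0, total_ms, step)
    ]


if __name__ == "__main__":
    # A 70-second recording with the default settings
    for start, end in segment_windows(70_000):
        print(f"{start / 1000:.0f}s to {end / 1000:.0f}s")
```

For a 70-second file this yields four windows: 0 to 30, 20 to 50, 40 to 70, and 60 to 70 seconds. The last window lies entirely inside the previous one, which is worth keeping in mind when reading the combined transcript.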

4. setup_transcriber.py

The setup_transcriber.py script prepares the environment: it writes the project scripts to disk, installs the required packages, and builds the macOS application with py2app.

import subprocess

# Define the content of the setup.py script
setup_content = """
from setuptools import setup

APP = ['app.py']
DATA_FILES = []
OPTIONS = {
    'argv_emulation': True,
    'packages': ['pydub', 'argparse', 'tkinter', 'subprocess', 'threading', 'openai'],
}

setup(
    app=APP,
    data_files=DATA_FILES,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
)
"""

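# Create the Python scripts
# split_m4a_content and app_content must be defined above as strings, in the
# same way as setup_content, holding the sources of split_m4a.py and app.py
with open('split_m4a.py', 'w') as f:
    f.write(split_m4a_content)

with open('app.py', 'w') as f:
    f.write(app_content)

with open('setup.py', 'w') as f:
    f.write(setup_content)

# Install required packages. argparse and tkinter come with Python; tkinter in
# particular has no PyPI package, so listing it makes the whole install fail
subprocess.run(['pip', 'install', 'pydub', 'py2app', 'openai'], check=True)

# Run py2app to create the macOS app
subprocess.run(['python', 'setup.py', 'py2app'], check=True)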

How It Works

  1. File Selection: The user selects an M4A file through a graphical interface.

  2. Audio Splitting: The selected file is split into smaller segments to ease the transcription process.

  3. Transcription: Each segment is sent to the OpenAI API for transcription using the Whisper model.

  4. Progress Tracking: The application provides real-time feedback on the transcription progress.

  5. Output: The complete transcription is saved as a text file in a folder named after the recording, created next to the original M4A file. The audio segments are written to the same folder.
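
Because consecutive segments overlap by 10 seconds, the same words can show up at the end of one segment's transcript and at the start of the next. The application as shown simply concatenates the segment transcripts. One possible refinement, which is not part of the original code, is to drop the repeated words at each seam. The sketch below is a heuristic under that assumption: merge_overlapping and max_words are hypothetical names, and exact word matching will miss overlaps that the model transcribes differently in the two segments.

```python
def merge_overlapping(previous: str, current: str, max_words: int = 50) -> str:
    """Append `current` to `previous`, dropping words repeated at the seam."""
    prev_words = previous.split()
    curr_words = current.split()
    limit = min(max_words, len(prev_words), len(curr_words))

    def normalize(words):
        # Compare case-insensitively and ignore trailing punctuation
        return [word.lower().strip(".,!?") for word in words]

    # Try the longest candidate overlap first and shrink until one matches
    for size in range(limit, 0, -1):
        if normalize(prev_words[-size:]) == normalize(curr_words[:size]):
            return " ".join(prev_words + curr_words[size:])

    # No shared words at the seam: fall back to plain concatenation
    return " ".join(prev_words + curr_words)
```

Folding this over the list of segment transcripts, for example with functools.reduce, would produce a single transcript with fewer duplicated phrases than plain concatenation.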

Step-by-Step Guide to Create a Dockable macOS Application for a Python Script

Step 1: Prepare Your Environment

  1. Ensure Python and Dependencies are Installed:

    • Open Terminal.

    • Activate your virtual environment:

    source /Users/karl.bolinger/myenv/bin/activate
    • Install the required packages (app.py imports both pydub and openai):

    pip install pydub openai
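
Before building anything on top of this environment, it can help to confirm that the pieces the application relies on are actually present. The following is a minimal sketch of such a check. The function name preflight_problems is hypothetical, and the ffmpeg check rests on the assumption that pydub is using ffmpeg to decode M4A files on your machine.

```python
import importlib.util
import os
import shutil


def preflight_problems(env=None, required_modules=("pydub", "openai")):
    """Return a list of human-readable setup problems (empty when all is well)."""
    env = os.environ if env is None else env
    problems = []

    # app.py reads the API key from this environment variable
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # Assumption: pydub decodes M4A through an ffmpeg binary found on PATH
    if shutil.which("ffmpeg", path=env.get("PATH")) is None:
        problems.append("ffmpeg was not found on PATH")

    # find_spec returns None for a top-level module that is not installed
    for name in required_modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package '{name}' is not installed")

    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("Problem:", problem)
```

Running the file inside the activated virtual environment prints one line per problem and nothing at all when the environment looks complete.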

Step 2: Create a Wrapper Script

  1. Create the Wrapper Script:

    • In Terminal, navigate to the directory containing your app.py script:
    cd /Users/karl.bolinger/Documents
    • Create the run_app.sh script:
    touch run_app.sh
    • Edit run_app.sh to include the following. Calling python3 after activation uses the virtual environment's interpreter, so the packages installed in Step 1 are found:
    #!/bin/bash
    source /Users/karl.bolinger/myenv/bin/activate
    python3 /Users/karl.bolinger/Documents/app.py
    • Make the script executable:
    chmod +x run_app.sh

Step 3: Create an Automator Application

  1. Open Automator:

    • Open Automator from the Applications folder or use Spotlight search.
  2. Create a New Application:

    • Choose "New Document" and select "Application".
  3. Add a Run Shell Script Action:

    • In the search bar, type "Run Shell Script" and drag the action into the workflow pane.

    • Set the shell to /bin/bash.

  4. Enter the Shell Script:

    • Enter the following script to run your run_app.sh script:
    /Users/karl.bolinger/Documents/run_app.sh
  5. Save the Automator Application:

    • Save the Automator application to your Applications folder with a name like "MyPythonApp".

  6. Add the Application to the Dock:

    • Navigate to the saved Automator application in Finder.

    • Drag the application to your Dock.

Final Steps

  • Test the Application:

    • Click the new application icon in the Dock to ensure it launches your Python script correctly.
  • Debugging:

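    • If the application doesn't work as expected, open the Automator application and check the shell script for any typos or incorrect paths. Automator does not load your shell profile, so also confirm that OPENAI_API_KEY and any tools the script needs on PATH are available inside run_app.sh.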

By following these steps, you will have a dockable macOS application that runs your Python script with the necessary environment activated.

Conclusion

This project demonstrates how Python can be used to automate the transcription of audio files, leveraging powerful APIs and providing a user-friendly interface. The modular design allows for easy customization and extension, making it a valuable tool for anyone who frequently works with audio transcriptions. Whether you're a journalist, researcher, or just someone looking to save time, this application can help streamline your workflow and improve productivity.
