5-day Gen AI Intensive Course with Google Capstone Project 2025Q1 by ironclawdevs

Aryan Raj

Project Title: YouTube Shorts Ideation & Scripting using Google Trends Data + Gemini AI

About 5-day Gen AI Intensive Course with Google:

This was a 5-day live event held from March 31 to April 4, 2025, by Google, focusing on the fundamental technologies and techniques of Generative AI. It is now available as a self-paced learning guide covering topics like foundational models, prompt engineering, embeddings, vector stores, AI agents, domain-specific LLMs, and MLOps for Gen AI, structured over five days of content.

Project Idea that tackles the Problem Statement:

Content creators on platforms like YouTube Shorts often struggle to come up with fresh ideas that match what audiences currently care about, and analyzing trends and writing scripts by hand is time-consuming. AI models like Google's Gemini can automate the research into what is trending in a specific niche and help draft scripts for short-form videos such as Shorts and reels. As a bonus, an eBook-based RAG (retrieval-augmented generation) component supplies additional content-creation knowledge.

Project Goal & Objectives:

  • To develop a proof-of-concept system that leverages near real-time trend data from Google Trends and Generative AI (Google Gemini) to automate the ideation, multi-style script generation, and initial evaluation process for YouTube Shorts content.

  • Fetch relevant topics/keywords using available trend data sources (SerpApi Google Trends endpoint).

  • Enrich selected topics with context, sentiment, and audience personas using Gemini.

  • Generate diverse YouTube Shorts scripts (Informative, Comedic, Listicle) tailored to the enriched topic using Gemini.

  • Evaluate the generated scripts automatically using Gemini based on defined criteria.

  • Generate supplementary assets like voice-over audio and thumbnail concepts.

  • Provide an interactive interface (optional) for user input.

Project Flow:

Environment, Tech Stack and Platform:

  • Kaggle Notebook

  • Python and its libraries

  • SerpApi’s Google Trends API

  • Google's Gemini 2.0 Flash model

  • Text-to-Speech (gTTS, Edge-TTS with AsyncIO, and Google Cloud Text-to-Speech API)

  • Text-to-Image API (Google’s Imagen 3 via Vertex AI)

  • LangChain

  • ChromaDB (Vector Database)

Installing Dependencies:

Select Add-ons > Install Dependencies option on Kaggle Notebook to install all the libraries.

Commands to run for installing dependencies:

!pip --quiet install pandas matplotlib seaborn google-generativeai gTTS ipywidgets google-search-results edge-tts google-cloud-texttospeech google-cloud-aiplatform "chromadb>=0.4.20" pymupdf langchain beautifulsoup4 ipython
!pip --quiet install playwright nest_asyncio
!playwright install chromium

Importing Libraries:

import subprocess
import sys
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import google.generativeai as genai
from google.generativeai import types
from gtts import gTTS
import ipywidgets as widgets
from IPython.display import display, Markdown, Audio, Image as IPImage
import os
import time
from datetime import datetime, timedelta
from serpapi import GoogleSearch
from kaggle_secrets import UserSecretsClient
import nest_asyncio
nest_asyncio.apply()
import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup
import re
import json
from tqdm import tqdm
import edge_tts
from google.cloud import texttospeech
from google.cloud import aiplatform
from vertexai.preview.vision_models import ImageGenerationModel
import fitz
import chromadb
from google.api_core import retry
from chromadb import Documents, EmbeddingFunction, Embeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from google.auth import default

Importance of these Libraries:

  • subprocess: Executes shell commands and interacts with system processes from within Python.

  • sys: Provides access to system-specific parameters and functions, including runtime environment control.

  • pandas as pd: Used for data manipulation and analysis through powerful DataFrame structures.

  • matplotlib.pyplot as plt: Plots data and visualizes trends with customizable static charts.

  • seaborn as sns: Enhances Matplotlib with prettier and more complex statistical visualizations.

  • google.generativeai as genai: Interfaces with Google’s Generative AI APIs like Gemini for AI tasks.

  • from google.generativeai import types: Provides access to type definitions and utilities for Google’s generative models.

  • from gtts import gTTS: Converts text to speech through Google Translate's text-to-speech endpoint; no API key is needed.

  • ipywidgets as widgets: Creates interactive UI elements in Jupyter Notebooks for better user interaction.

  • from IPython.display import display, Markdown, Audio, Image as IPImage: Displays rich media outputs (text, audio, images) in Jupyter.

  • os: Interacts with the operating system to handle file paths, directories, and environment variables.

  • time: Offers time-related functions like sleeping, time tracking, and formatting.

  • from datetime import datetime, timedelta: Handles date/time arithmetic and formatting for scheduling or tracking.

  • from serpapi import GoogleSearch: Integrates with SerpApi to run search queries (here, against the Google Trends engine) and retrieve structured result data.

  • from kaggle_secrets import UserSecretsClient: Accesses stored API keys and secrets securely in Kaggle environments.

  • import nest_asyncio: Enables reentrant use of asyncio event loop, useful in environments like Jupyter.

  • nest_asyncio.apply(): Applies the patch to allow nested use of asyncio, avoiding runtime errors.

  • import asyncio: Enables writing concurrent asynchronous code using async/await syntax.

  • from playwright.async_api import async_playwright: Automates browser tasks asynchronously for scraping or testing.

  • from bs4 import BeautifulSoup: Parses and navigates HTML/XML documents for web scraping and data extraction.

  • import re: Provides tools for matching and manipulating strings using regular expressions.

  • import json: Parses and serializes JSON data for configuration, API responses, or data interchange.

  • from tqdm import tqdm: Displays progress bars for loops, useful for tracking long operations.

  • import edge_tts: Uses Microsoft Edge TTS API to convert text into natural-sounding speech.

  • from google.cloud import texttospeech: Uses Google Cloud’s TTS API for converting text to audio.

  • from google.cloud import aiplatform: Interacts with Google Vertex AI for managing and deploying ML models.

  • from vertexai.preview.vision_models import ImageGenerationModel: Generates images using Google Vertex AI's Vision models.

  • import fitz: Interacts with PDFs via PyMuPDF for reading, writing, and editing PDF content.

  • import chromadb: Works with Chroma, a vector database for storing and querying embeddings.

  • from google.api_core import retry: Adds robust retry logic to API calls, improving reliability of cloud interactions.

  • from chromadb import Documents, EmbeddingFunction, Embeddings: Defines document structures and functions for vector storage and retrieval.

  • from langchain.text_splitter import RecursiveCharacterTextSplitter: Splits large texts into manageable chunks for LLM processing.

  • from google.auth import default: Retrieves default Google credentials for authenticating cloud services.

Setting-up API Keys and Configurations in Kaggle:

Select Add-ons > Secrets option on Kaggle Notebook to set your API keys.

Use Add Secret option to add API keys like this.

SerpApi's Google Trends API gives you 100 free API calls per month once you create an account.

In the APIs & Services section of your Google Cloud Console, enable the Generative Language API (for Gemini), the Cloud Text-to-Speech API (for voiceovers), and the Vertex AI API (for the Imagen 3 model), and then create a Service Account.

Note: I am using a Free Tier account. You may incur billing charges, so beware!

To let your code use Google Cloud's Vertex AI services, and with them the text-to-image model Imagen 3, the Service Account your code authenticates with must have the required permissions. This is typically done by granting specific IAM roles to the Service Account within your Google Cloud project.

The key steps involve:

  1. Accessing the Google Cloud Console and selecting your project.

  2. Navigating to "IAM & Admin" > "IAM".

  3. Identifying the specific Service Account email address (the one linked to the JSON key file your code uses for authentication, via GOOGLE_APPLICATION_CREDENTIALS2 in my case).

  4. Editing the permissions for that Service Account.

  5. Adding the role "Vertex AI User" (roles/aiplatform.user).

This role provides the necessary permissions for the Service Account (and thus your code) to interact with Vertex AI, allowing it to utilize models like Imagen for image generation.

Now generate a key for the Service Account; a JSON key file is downloaded.

Copy the entire JSON content of that file, create a new secret with the label “GOOGLE_APPLICATION_CREDENTIAL”, and paste the JSON content as its value.

Your code then uses the JSON content to authenticate its API calls to Vertex AI.

In my case, I worked on voiceovers first and had not yet enabled Vertex AI, so I ended up with two credentials: one for the Text-to-Speech model and one for Imagen 3 via Vertex AI.
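As a rough illustration of the step where the code "uses the JSON content", here is a minimal sketch. It assumes the secret value is the raw service-account JSON string; the helper name write_credentials_file is hypothetical and not part of the original notebook. The Google client libraries look for a key file path in the GOOGLE_APPLICATION_CREDENTIALS environment variable, so the sketch writes the JSON to a temporary file and points that variable at it.

```python
import json
import os
import tempfile


def write_credentials_file(secret_json, env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Write service-account JSON to a temp file and expose its path via env_var."""
    info = json.loads(secret_json)  # fails early if the pasted secret is malformed
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        json.dump(info, handle)
    os.environ[env_var] = path
    return path


# In a Kaggle notebook the secret would be read first (Kaggle-only import):
# from kaggle_secrets import UserSecretsClient
# secret_json = UserSecretsClient().get_secret("GOOGLE_APPLICATION_CREDENTIAL")
# write_credentials_file(secret_json)
```

With two sets of credentials, the same helper could be called once per secret before creating each client, since each call simply repoints the environment variable.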

According to the documentation of SerpApi's Google Trends API, the query parameter 'q' requires a keyword. Our use case needs the keywords that are trending right now, and web scraping was the most viable and common way to get them, so we use the Chrome browser's Inspect option to find the HTML tags that hold the "trending now" keywords.

Note: These tags may change over time, so the scraping code needs to be updated whenever it starts throwing errors.

SerpApi in Action:

After scraping the keywords, use them as queries to fetch insights such as Interest Over Time data, Top Related Queries, and Rising Related Queries. For example, the Interest Over Time request:

interest_params = {
    "q": ",".join(seed_keywords),
    "geo": selected_region_code,
    "hl": selected_language,
    "tz": selected_timezone,
    "date": interest_timeframe,
    "data_type": "TIMESERIES",
    "engine": "google_trends"
}
interest_results = get_serpapi_trends_data(interest_params)
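The helper get_serpapi_trends_data is not shown in this post, so the following is only a sketch of what such a wrapper might look like. The api_key and search_cls parameters, the SERPAPI_API_KEY environment variable, and the timeline_to_rows function are assumptions added for illustration. The response layout used here (interest_over_time > timeline_data > values, each with an extracted_value) is what I would expect from SerpApi's Google Trends engine; check it against a real response.

```python
import os


def get_serpapi_trends_data(params, api_key=None, search_cls=None):
    """Run one SerpApi query and return the response dict, or None on failure."""
    if search_cls is None:
        # Imported lazily so the helper can be exercised with a stand-in class.
        from serpapi import GoogleSearch  # from the google-search-results package
        search_cls = GoogleSearch
    query = dict(params)
    # SERPAPI_API_KEY is a hypothetical variable name for this sketch.
    query["api_key"] = api_key or os.environ.get("SERPAPI_API_KEY", "")
    try:
        results = search_cls(query).get_dict()
    except Exception as exc:
        print(f"SerpApi request failed: {exc}")
        return None
    if "error" in results:
        print(f"SerpApi returned an error: {results['error']}")
        return None
    return results


def timeline_to_rows(results):
    """Flatten the assumed TIMESERIES layout into one row per (date, query)."""
    timeline = (results or {}).get("interest_over_time", {}).get("timeline_data", [])
    rows = []
    for point in timeline:
        for item in point.get("values", []):
            rows.append({
                "date": point.get("date"),
                "query": item.get("query"),
                "value": item.get("extracted_value"),
            })
    return rows
```

The flat rows can be passed straight to pd.DataFrame(rows) for the plots that follow.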

We can also visualize Interest over Time and Top Rising Related Queries (by extracted value):

I used Google’s Gemini 2.0 Flash, the same model used during the 5-day Gen AI Intensive Course:

Keyword Enrichment:

Each keyword is enriched using this engineered prompt:

prompt = f"""
Analyze the search term or topic: "{keyword}"

Assume this term is gaining attention or is relevant within the region of {region_display_name}.

Provide the following information, using Markdown headings for each section exactly as specified below:

## Explanation

Briefly explain why "{keyword}" might be trending or relevant right now, considering potential recent events or general interest.

## Sentiment

Describe the general sentiment (e.g., positive, negative, neutral, mixed) surrounding "{keyword}". Provide a brief reasoning.

## Audience Persona

Describe the likely audience persona interested in "{keyword}" specifically within {region_display_name}. Consider demographics, interests, and potential needs or motivations relevant to this region.

## Sub-Topics

List 3-5 related sub-topics or micro-trends associated with "{keyword}" that could be explored further. List each on a new line, optionally starting with a bullet (* or -).
"""

Output for 1 keyword:

Script Generation:

The enriched data (the generated Explanation, Audience Persona, and Sub-Topics) is used to create 3 scripts in 3 different styles (Informative, Comedic, and Listicle) using this engineered prompt:

prompt = f""" Generate 3 distinct YouTube Shorts script ideas for the topic: "{keyword}".

Goal: Create scripts that are engaging, surprising, "spicy", and avoid generic statements. Include factual information that is specific, verifiable, and less commonly known, particularly relevant to India.

Context for Tailoring:

Topic Explanation (Basic): {context_explanation}

Target Audience (India Focus): {context_persona}

Potential Sub-Topics (for ideas): {context_subtopics}

Task: Create scripts for the following 3 styles, focusing on intrigue and unique angles:

Informative/Myth-Busting: Debunk common myths or reveal surprising statistics/facts about '{keyword}' in India.

Comedic/Satirical Skit: Base humor on specific, perhaps slightly absurd, real-world implications, misunderstandings, or unexpected applications of '{keyword}' observed in India. Avoid generic jokes.

"Secrets"/Listicle: Frame the list as 'hidden truths,' 'unbelievable facts,' 'things they don't tell you,' or 'underrated aspects' about '{keyword}', backed by specific, less common details or data points.

Output Format: Provide the output STRICTLY as a single valid JSON object. Do not include any text, explanation, or markdown formatting before or after the JSON object itself.

The JSON object must have exactly three top-level keys: "Informative", "Comedic", "Listicle".

The value for each style key must be another JSON object containing the following keys:

"hook": A string (approx. 3-5 seconds long) designed to be highly attention-grabbing – use a surprising fact, debunk a myth, ask a provocative question, or present a counter-intuitive statement related to '{keyword}'.

"main_points": A list of 3 strings. Crucially, each point MUST incorporate specific, verifiable, and less commonly known factual information, data, or concrete examples related to '{keyword}', relevant to India where possible. Avoid vague claims.

"cta": A string for the Call to Action, fitting the specific script's tone (e.g., "Think you knew AI? Think again!", "Share if this blew your mind!").

"visual_cues": A list of 3-5 strings suggesting dynamic or specific visual ideas that complement the 'spicy' tone and factual content (e.g., "[Split screen: Myth vs. Reality]", "[Close up on surprising data point chart]", "[Dramatic reveal animation]").

"title_suggestions": A list of exactly 3 intriguing title suggestions reflecting the non-bland angle (e.g., "AI's Secret Weapon in India?", "Bollywood Myth BUSTED!", "Stock Market Hack They DON'T Teach You").

"hashtags": A list of exactly 5 relevant hashtags (including #shorts and perhaps one more specific/niche tag).

Example JSON Structure (Illustrative - follow key names precisely, focus on content quality as described above):
{{
  "Informative": {{
    "hook": "Think AI is stealing jobs in India? The REAL numbers might shock you!",
    "main_points": [
      "Fact: While some roles shift, AI is projected to create X million *new* tech jobs in India by 2028 (Source: NASSCOM/Specific Report).",
      "Surprise: India's AI adoption isn't just in IT! Specific example: AI in [Region Name]'s mango farms increased yield by Y% using [Specific Technique].",
      "Myth Debunked: 'AI needs huge data centers' - Actually, edge AI deployment in India uses Z% less energy for tasks like [Specific Task]."
    ],
    "cta": "Know the facts! Follow for more AI reality checks. #shorts",
    "visual_cues": [
      "[Graph showing job creation vs. displacement]",
      "[Drone footage of AI tech in Indian agriculture]",
      "[Side-by-side comparison: old method vs. Edge AI]",
      "[Fast-paced text overlays with stats]"
    ],
    "title_suggestions": [
      "AI Job MYTHS Busted (India!)",
      "India's SECRET AI Advantage?",
      "Shocking AI Facts (India Focus)"
    ],
    "hashtags": [
      "#shorts",
      "#AIIndia",
      "#AIFacts",
      "#TechMyths",
      "#DigitalIndia"
    ]
  }},
  "Comedic": {{ ... focus on specific Indian context humor ... }},
  "Listicle": {{ ... focus on 'secrets' or 'hidden facts' with data ... }}
}}
Remember to prioritize specific, less common facts and maintain an engaging, non-bland tone throughout all script elements. Ensure the final output adheres strictly to the JSON structure ONLY.
"""

Results:

Generated Script Data:

--- Results for Keyword: 'real madrid - athletic' ---
{
  "Informative": {
    "hook": "Did you know a Basque player has NEVER played for Real Madrid?",
    "main_points": [
      "Fact: Athletic Bilbao's unique 'cantera' policy (Basque-only players) is so strict, no player born outside the Basque Country has ever played for their first team *or* Real Madrid.",
      "Surprise: Despite facing Real Madrid countless times, Athletic's defensive record against them is surprisingly strong. They've conceded fewer goals per game against Real Madrid than against other top La Liga teams in the last decade.",
      "Real Madrid's strategy for beating Athletic focuses on exploiting the wings due to Athletic's typically narrow defensive setup, leading to higher crossing attempts. This tactical adaptation impacts Indian betting patterns on the over/under for crosses in the match."
    ],
    "cta": "Mouth agape? Subscribe for more mind-blowing football facts! #shorts",
    "visual_cues": [
      "[Split screen: Map of Basque Country vs. Real Madrid Stadium]",
      "[Animation showing Athletic Bilbao's defensive formation]",
      "[Statistical graphic comparing goals conceded by Athletic against different teams]",
      "[Overlay text highlighting betting trends in India related to crosses]"
    ],
    "title_suggestions": [
      "Real Madrid vs Athletic: The BASQUE CURSE?!",
      "Shocking La Liga Secret REVEALED!",
      "The One Team Real Madrid Can't Dominate"
    ],
    "hashtags": [
      "#shorts",
      "#RealMadrid",
      "#AthleticBilbao",
      "#LaLiga",
      "#FootballFacts"
    ]
  },
  "Comedic": {
    "hook": "What if Real Madrid tried poaching players from Indian Kabaddi League teams?",
    "main_points": [
      "Imagine Florentino Perez scouting a raider from the Pro Kabaddi League, impressed by their agility. The transfer fee is quoted in Rupees, causing confusion in Madrid's finance department.",
      "Vinicius Junior trying to learn the 'Dubki' move from a PKL expert for improved dribbling. The result? A series of hilarious failed attempts and a viral TikTok trend in India.",
      "A headline in Indian sports news: 'Real Madrid sign kabaddi star! Will this finally break Athletic Bilbao's defense?'"
    ],
    "cta": "Kabaddi + Football = Chaos! Like if this made you laugh! #shorts",
    "visual_cues": [
      "[Florentino Perez bewildered by Rupee exchange rates]",
      "[Vinicius Junior awkwardly attempting the Dubki]",
      "[Humorous animation of a Kabaddi player scoring against Real Madrid]",
      "[Fake Indian news headline graphic]"
    ],
    "title_suggestions": [
      "Real Madrid KABADDI Edition?!",
      "Vinicius Learns KABADDI (GONE WRONG)",
      "When Football Meets Kabaddi..."
    ],
    "hashtags": [
      "#shorts",
      "#RealMadrid",
      "#Kabaddi",
      "#Funny",
      "#India"
    ]
  },
  "Listicle": {
    "hook": "3 hidden truths about Real Madrid vs. Athletic Bilbao!",
    "main_points": [
      "Secret #1: Athletic's youth academy, Lezama, is statistically more likely to produce professional players than Real Madrid's La Fabrica, especially those playing in top European leagues (data from specific football observatory reports).",
      "Secret #2: The economic impact of an Athletic Bilbao victory on the Basque region's GDP is disproportionately higher compared to a Real Madrid win's impact on the Madrid region (cite specific economic study). This is due to Bilbao's unique community ownership.",
      "Secret #3: While Real Madrid dominates social media globally, Athletic Bilbao has a significantly higher per-capita social media engagement rate within the Basque diaspora in India, particularly in regions like Goa (provide example of Basque cultural association in Goa)."
    ],
    "cta": "Mind blown? Share this with your football fanatic friends! #shorts",
    "visual_cues": [
      "[Comparison graphic: Lezama vs. La Fabrica stats]",
      "[Chart showing GDP impact of wins for each team]",
      "[Screenshot of Basque cultural association's social media page in Goa]",
      "[Fast-paced editing with text overlays revealing secrets]"
    ],
    "title_suggestions": [
      "Real Madrid vs. Athletic: Hidden TRUTHS!",
      "La Liga SECRETS They Don't Want You To Know",
      "3 Shocking Facts About This Rivalry!"
    ],
    "hashtags": [
      "#shorts",
      "#RealMadrid",
      "#AthleticBilbao",
      "#LaLigaSecrets",
      "#Football"
    ]
  }
}

Script Evaluation:

This process uses the Gemini API to automatically evaluate the scripts it previously generated. For each script, details are sent to Gemini with an evaluation prompt requesting scores and reasoning on predefined criteria like Engagement, Clarity, and Factual Integration. The parse_gemini_evaluation function then specifically parses Gemini's text response using pattern matching. It extracts the score and reasoning for each criterion, providing a structured evaluation. This allows the code to programmatically assess script quality.

evaluation = {
    'Engagement_Score': None,
    'Engagement_Reasoning': None,
    'Clarity_Score': None,
    'Clarity_Reasoning': None,
    'Structure_Score': None,
    'Structure_Reasoning': None,
    'Hook_Strength_Score': None,
    'Hook_Strength_Reasoning': None,
    'Factual_Integration_Score': None,
    'Factual_Integration_Reasoning': None,
    'Overall_Comments': None
}

criteria_map = {
    "Engagement": "Engagement",
    "Clarity": "Clarity",
    "Structure": "Structure",
    "Hook Strength": "Hook_Strength",
    "Hook": "Hook_Strength",
    "Factual Integration": "Factual_Integration"
}
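The body of parse_gemini_evaluation is not reproduced in this post, so what follows is only a sketch of one way such pattern matching could work, not the notebook's actual code. It assumes each criterion appears on its own line as a name, a colon, and a score (for example "**Hook Strength:** 8/10 - reasoning"), with the reasoning on the same line. It tries longer names first so that "Hook Strength" is not shadowed by "Hook", and treating the text before the first criterion as Overall_Comments is also an assumption.

```python
import re

# Mapping from the article: display name -> key prefix in the result dict.
CRITERIA_MAP = {
    "Engagement": "Engagement",
    "Clarity": "Clarity",
    "Structure": "Structure",
    "Hook Strength": "Hook_Strength",
    "Hook": "Hook_Strength",
    "Factual Integration": "Factual_Integration",
}
_KEY_BY_NAME = {name.lower(): key for name, key in CRITERIA_MAP.items()}
_NAMES = sorted(CRITERIA_MAP, key=len, reverse=True)  # longest names first
_LINE_RE = re.compile(
    r"^[\s*\-•]*(?P<name>" + "|".join(re.escape(n) for n in _NAMES) + r")\b"
    r"[^:\n]*:[\s*]*(?:Score:\s*)?(?P<score>\d{1,2})(?:\s*/\s*10)?"
    r"[ \t*.\-–:]*(?P<reason>.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_gemini_evaluation(text):
    """Extract score and same-line reasoning for each criterion from free text."""
    evaluation = {}
    for key in dict.fromkeys(CRITERIA_MAP.values()):
        evaluation[f"{key}_Score"] = None
        evaluation[f"{key}_Reasoning"] = None
    evaluation["Overall_Comments"] = None
    first_start = None
    for match in _LINE_RE.finditer(text):
        if first_start is None:
            first_start = match.start()
        key = _KEY_BY_NAME[match.group("name").lower()]
        if evaluation[f"{key}_Score"] is None:  # keep the first hit per criterion
            evaluation[f"{key}_Score"] = int(match.group("score"))
            evaluation[f"{key}_Reasoning"] = match.group("reason").strip() or None
    if first_start:
        evaluation["Overall_Comments"] = text[:first_start].strip() or None
    return evaluation
```

Criteria that never match stay None, which makes parsing gaps visible in the results table instead of hiding them.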

Prompt for Evaluation:

prompt = f"""
Please evaluate the following YouTube Shorts script based on the criteria below.

**Script Details:**

* **Style:** {style}
* **Topic Focus:** {keyword}
* **Hook:** {hook}
* **Main Points:** {main_points}
* **Call to Action:** {cta}
* **Visual Cues:** {visual_cues}
* **Title Suggestions:** {title_suggestions}
* **Hashtags:** {hashtags}

**Evaluation Criteria:** Evaluate the script based on the following criteria. For each, provide a score out of 10 and brief reasoning:

* **Engagement (1-10):** How likely is this script to hold viewer attention? Is the angle interesting, surprising, or "spicy"? Does it avoid being generic?
* **Clarity (1-10):** Is the core message clear and easily understandable within the short format?
* **Structure (1-10):** Does the script follow a logical flow (effective hook -> concise points -> clear CTA)?
* **Hook Strength (1-10):** How effective is the hook at _immediately_ grabbing attention and setting up the specific angle or intrigue?
* **Factual Integration (1-10):** How well are _specific, verifiable, and less common_ facts or data points integrated into the main points? (Evaluate more strictly for Informative/Listicle styles).

**Output Format:** Please format your response clearly. List each criterion on a new line, starting with the criterion name (e.g., "**Engagement:**" or "- Engagement:"), followed by its score (e.g., "8/10" or "Score: 8") and then the reasoning for that score.
"""

For every script, the full content is sent back to the Gemini API inside this evaluation prompt, which asks Gemini to score the script out of 10 on five criteria: Engagement, Clarity, Structure, Hook Strength, and Factual Integration. Gemini returns the scores with detailed reasoning as a text response, and an average score is then calculated.

Results:

The evaluation results for all scripts are compiled into a pandas DataFrame. An average score is calculated for each script from its criterion scores. The DataFrame is displayed to give a clear overview of how each script performed, including all individual scores and reasoning. Based on the average scores, the code identifies the best-performing script style for each keyword and lists the top 5 scripts overall, according to Gemini's assessment.

Bug: Factual_Integration_Score, Hook_Strength_Score, Factual_Integration_Reasoning, and Hook_Strength_Reasoning came back as ‘None’ for every keyword, so each Average_Score in the tables below is the mean of the three remaining scores (Clarity, Engagement, and Structure).
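Since some criteria can be missing, the averaging step has to skip None values. The sketch below shows that logic together with the two rankings, written in plain Python for brevity even though the notebook uses a pandas DataFrame. The function names are hypothetical, and ties are resolved in favor of the row that appears first.

```python
SCORE_KEYS = (
    "Engagement_Score",
    "Clarity_Score",
    "Structure_Score",
    "Hook_Strength_Score",
    "Factual_Integration_Score",
)


def average_score(evaluation):
    """Mean of the criterion scores that were actually parsed (None is skipped)."""
    scores = [evaluation[k] for k in SCORE_KEYS if evaluation.get(k) is not None]
    return sum(scores) / len(scores) if scores else None


def best_style_per_keyword(rows):
    """Map each keyword to its highest-scoring row."""
    best = {}
    for row in rows:
        avg = row.get("Average_Score")
        if avg is None:
            continue
        current = best.get(row["keyword"])
        if current is None or avg > current["Average_Score"]:
            best[row["keyword"]] = row
    return best


def top_scripts(rows, n=5):
    """Return the n rows with the highest average score (stable for ties)."""
    scored = [r for r in rows if r.get("Average_Score") is not None]
    return sorted(scored, key=lambda r: r["Average_Score"], reverse=True)[:n]
```

For the "celtics" Informative script, scores of 9, 8, and 9 with two missing criteria average to 26/3, which is the 8.666667 reported in the results.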

Script Evaluation Results:
|  | keyword | style | Average_Score | Clarity_Score | Engagement_Score | Factual_Integration_Score | Hook_Strength_Score | Structure_Score | Clarity_Reasoning | Engagement_Reasoning | Factual_Integration_Reasoning | Hook_Strength_Reasoning | Structure_Reasoning | Overall_Comments |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | real madrid - athletic | Informative | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10 - The core message is relatively clear... | ** 8/10 - The script presents interesting and ... | None | None | ** 8/10 - The script follows a logical structu... | Here's an evaluation of the YouTube Shorts scr... |
| 1 | real madrid - athletic | Comedic | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10. The core message - "What if Real Madr... | ** 8/10. The premise of Real Madrid poaching K... | None | None | ** 8/10. The script follows a logical flow: ho... | Here's an evaluation of the YouTube Shorts scr... |
| 2 | real madrid - athletic | Listicle | 8.666667 | 9 | 8 | None | None | 9 | ** 9/10 - The listicle format with clear "Secr... | ** 8/10 - The "hidden truths" angle is intrigu... | None | None | ** 9/10 - The script follows a logical structu... | Here's an evaluation of the YouTube Shorts scr... |
| 3 | grizzlies vs thunder | Informative | 8.666667 | 9 | 8 | None | None | 9 | ** 9/10\nThe message is very clear: NBA is gro... | ** 8/10\nThe script taps into the surprising g... | None | None | ** 9/10\nThe script follows a well-structured ... | Here's an evaluation of the YouTube Shorts scr... |
| 4 | grizzlies vs thunder | Comedic | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10. The core message – difficulty explain... | ** 8/10. The cultural clash angle (Desi parent... | None | None | ** 8/10. The script follows a logical structur... | Here's an evaluation of the YouTube Shorts scr... |
| 5 | grizzlies vs thunder | Listicle | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10 - The listicle format helps to keep th... | ** 8/10 - The script leverages the unexpected ... | None | None | ** 8/10 - The structure is solid. The hook set... | Here's an evaluation of the YouTube Shorts scr... |
| 6 | celtics | Informative | 8.666667 | 9 | 8 | None | None | 9 | ** 9/10. The script is structured around clear... | ** 8/10. The "hidden connection" angle is intr... | None | None | ** 9/10. The script follows a well-defined str... | Okay, here's the evaluation of the YouTube Sho... |
| 7 | celtics | Comedic | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10 The core message – Celtics mania manif... | ** 8/10 The "Celtics in India" premise is inhe... | None | None | ** 8/10 The script follows a logical progressi... | Here's the evaluation of the YouTube Shorts sc... |
| 8 | celtics | Listicle | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10 - The listicle format ensures clarity.... | ** 8/10 - The "secrets" angle is inherently en... | None | None | ** 8/10 - The script follows a standard listic... | Here's an evaluation of the YouTube Shorts scr... |
| 9 | what is open on easter sunday | Informative | 8.333333 | 8 | 9 | None | None | 8 | ** 8/10 The script is relatively clear, and th... | ** 9/10 The script tackles a counter-intuitive... | None | None | ** 8/10 The structure is generally good: a str... | Here's the evaluation of the YouTube Shorts sc... |
| 10 | what is open on easter sunday | Comedic | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10. The script clearly outlines the core ... | ** 8/10. The script leverages relatable situat... | None | None | ** 8/10. The script follows a logical structur... | Here's an evaluation of the YouTube Shorts scr... |
| 11 | what is open on easter sunday | Listicle | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10 - The listicle format makes the inform... | ** 8/10 - The "secrets" angle is intriguing, a... | None | None | ** 8/10 - The script follows a standard listic... | Okay, here is the evaluation of the YouTube Sh... |
| 12 | nancy mace | Informative | 7.666667 | 7 | 8 | None | None | 8 | ** 7/10 - The main points are relatively clear... | ** 8/10 - The script presents an intriguing, s... | None | None | ** 8/10 - The script follows a standard, effec... | Here's an evaluation of the YouTube Shorts scr... |
| 13 | nancy mace | Comedic | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10 - The concept is easily grasped. Each ... | ** 8/10 - The concept is original and potentia... | None | None | ** 8/10 - The script follows a good structure.... | Okay, here's the evaluation of the YouTube Sho... |
| 14 | nancy mace | Listicle | 8.0 | 9 | 7 | None | None | 8 | ** 9/10 - The script is structured clearly wit... | ** 7/10 - The "India Perspective" angle is int... | None | None | ** 8/10 - The structure follows a logical flow... | Here's an evaluation of the provided YouTube S... |
| 15 | memphis grizzlies vs okc thunder match player ... | Informative | 8.666667 | 9 | 8 | None | None | 9 | ** 9/10. The script breaks down complex game d... | ** 8/10. The script has a good hook that hints... | None | None | ** 9/10. The script follows a well-defined str... | Here's an evaluation of the YouTube Shorts scr... |
| 16 | memphis grizzlies vs okc thunder match player ... | Comedic | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10. The core message – the absurdity of u... | ** 8/10. The premise is bizarre and unexpected... | None | None | ** 8/10. The structure is solid. The hook imme... | Here's an evaluation of the YouTube Shorts scr... |
| 17 | memphis grizzlies vs okc thunder match player ... | Listicle | 8.666667 | 9 | 8 | None | None | 9 | ** 9/10\nThe script is very clear and direct. ... | ** 8/10\nThe "hidden truths" angle is inherent... | None | None | ** 9/10\nThe listicle structure is well-suited... | Okay, here's an evaluation of the YouTube Shor... |
| 18 | nancy mace news | Informative | 8.666667 | 9 | 8 | None | None | 9 | ** 9/10. The main points are concise and relat... | ** 8/10. The "twist" hook is good. Highlightin... | None | None | ** 9/10. The script follows a logical structur... | Here's the evaluation of the YouTube Shorts sc... |
| 19 | nancy mace news | Comedic | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10. The script clearly outlines the comed... | ** 8/10. The "fish out of water" scenario of N... | None | None | ** 8/10. The script follows a logical structur... | Here's an evaluation of the YouTube Shorts scr... |
| 20 | nancy mace news | Listicle | 8.0 | 7 | 8 | None | None | 9 | ** 7/10 - The points are relatively clear and ... | ** 8/10 - The "hidden truths" angle is general... | None | None | ** 9/10 - The script follows a classic listicl... | Okay, here is the evaluation of the Nancy Mace... |
| 21 | is ross open on easter sunday | Informative | 8.666667 | 9 | 8 | None | None | 9 | ** 9/10 - The core message – that holiday reta... | ** 8/10 - The script offers a surprising compa... | None | None | ** 9/10 - The script follows a strong, logical... | Okay, here's an evaluation of the YouTube Shor... |
| 22 | is ross open on easter sunday | Comedic | 9.333333 | 10 | 9 | None | None | 9 | ** 10/10. The concept is straightforward and e... | ** 9/10. The script leverages a highly relatab... | None | None | ** 9/10. The script follows a standard, effect... | Here's an evaluation of the YouTube Shorts scr... |
| 23 | is ross open on easter sunday | Listicle | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10 - The listicle format makes the points... | ** 8/10 - The "secrets they don't tell you" fr... | None | None | ** 8/10 - The script follows a standard listic... | Here's an evaluation of the YouTube Shorts scr... |
| 24 | nancy mace election | Informative | 8.0 | 7 | 8 | None | None | 9 | ** 7/10. The core message of Mace having a mor... | ** 8/10. The hook is intriguing and uses the "... | None | None | ** 9/10. The script follows a logical and effe... | Here's an evaluation of the YouTube Shorts scr... |
| 25 | nancy mace election | Comedic | 8.333333 | 9 | 8 | None | None | 8 | ** 9/10. The premise and plot points are easy ... | ** 8/10. The Bollywood-inspired concept is une... | None | None | ** 8/10. The script follows a standard narrati... | Okay, here's the evaluation of the YouTube Sho... |
| 26 | nancy mace election | Listicle | 8.666667 | 9 | 8 | None | None | 9 | ** 9/10 The listicle format ensures a structur... | ** 8/10 The angle of focusing on the Indian-Am... | None | None | ** 9/10 The script follows a classic listicle ... | Here's an evaluation of the provided YouTube S... |
| 27 | nancy mace video | Informative | 8.666667 | 8 | 9 | None | None | 9 | ** 8/10. The script breaks down complex topics... | ** 9/10. The script uses a trending topic (Nan... | None | None | ** 9/10. The script follows a strong structure... | Here's an evaluation of the provided YouTube S... |
| 28 | nancy mace video | Comedic | 8.333333 | 8 | 9 | None | None | 8 | ** 8/10. The script presents three distinct, a... | ** 9/10. The premise is inherently absurd and ... | None | None | ** 8/10. The script follows a classic hook-rev... | Here's an evaluation of the YouTube Shorts scr... |
| 29 | nancy mace video | Listicle | 7.666667 | 7 | 8 | None | None | 8 | ** 7/10 - The listicle format inherently aids ... | ** 8/10 - The script uses intrigue and promise... | None | None | ** 8/10 - The script follows a standard and ef... | Here's an evaluation of the YouTube Shorts scr... |

Best Performing Style per Keyword (based on Avg Score):

|   | keyword | style | Average_Score |
|---|---------|-------|---------------|
| 0 | celtics | Informative | 8.666667 |
| 1 | grizzlies vs thunder | Informative | 8.666667 |
| 2 | is ross open on easter sunday | Comedic | 9.333333 |
| 3 | memphis grizzlies vs okc thunder match player ... | Informative | 8.666667 |
| 4 | nancy mace | Comedic | 8.333333 |
| 5 | nancy mace election | Listicle | 8.666667 |
| 6 | nancy mace news | Informative | 8.666667 |
| 7 | nancy mace video | Informative | 8.666667 |
| 8 | real madrid - athletic | Listicle | 8.666667 |
| 9 | what is open on easter sunday | Informative | 8.333333 |

Top 5 Best Performing Scripts (based on Average_Score):

|   | keyword | style | Average_Score |
|---|---------|-------|---------------|
| 0 | is ross open on easter sunday | Comedic | 9.333333 |
| 1 | memphis grizzlies vs okc thunder match player ... | Informative | 8.666667 |
| 2 | real madrid - athletic | Listicle | 8.666667 |
| 3 | grizzlies vs thunder | Informative | 8.666667 |
| 4 | nancy mace video | Informative | 8.666667 |
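
The two rankings above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the notebook most likely does this with a pandas DataFrame, and the function names `best_style_per_keyword` and `top_n` are hypothetical. The sample rows reuse scores from the evaluation table.

```python
def best_style_per_keyword(rows):
    """Keep the highest-scoring row for each keyword (first one wins on ties)."""
    best = {}
    for row in rows:
        current = best.get(row["keyword"])
        if current is None or row["Average_Score"] > current["Average_Score"]:
            best[row["keyword"]] = row
    return best


def top_n(rows, n=5):
    """Return the n rows with the highest Average_Score, best first."""
    return sorted(rows, key=lambda r: r["Average_Score"], reverse=True)[:n]


# Sample rows copied from the evaluation results for two keywords.
rows = [
    {"keyword": "is ross open on easter sunday", "style": "Informative", "Average_Score": 8.666667},
    {"keyword": "is ross open on easter sunday", "style": "Comedic", "Average_Score": 9.333333},
    {"keyword": "is ross open on easter sunday", "style": "Listicle", "Average_Score": 8.333333},
    {"keyword": "nancy mace election", "style": "Informative", "Average_Score": 8.0},
    {"keyword": "nancy mace election", "style": "Comedic", "Average_Score": 8.333333},
    {"keyword": "nancy mace election", "style": "Listicle", "Average_Score": 8.666667},
]

best = best_style_per_keyword(rows)
for keyword, row in best.items():
    print(keyword, "->", row["style"], row["Average_Score"])
```

Because ties keep the first row seen, the order of the input rows decides which style is reported when two styles share the same average.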

Generating Voiceovers:

The unique scripts across the two lists above (best performing style per keyword and top 5 best performing scripts) are selected, and voiceovers are generated for them with three text-to-speech options: the gTTS library, the Edge-TTS library (with AsyncIO), and the Google Cloud Text-to-Speech API.

Comparison:

| Parameter | gTTS (Google Text-to-Speech) | Edge TTS (with asyncio) | Google Cloud Text-to-Speech API |
|-----------|------------------------------|-------------------------|---------------------------------|
| Library/Tool | gtts | edge_tts | google.cloud.texttospeech |
| Provider | Google | Microsoft Edge (uses Azure Speech Services) | Google Cloud |
| API Key Required | ❌ No | ❌ No (uses Edge browser endpoint) | ✅ Yes (Google Cloud project & credentials required) |
| Asynchronous Support | ❌ No (synchronous only) | ✅ Yes (built on asyncio) | ❌ No native asyncio, but can be used in async environments |
| Voices Available | Limited (around 30–50 voices in few languages) | ✅ 400+ voices across 140+ languages and locales | ✅ 380+ voices across 50+ languages and variants |
| Languages Supported | ✅ ~50 | ✅ ~140+ | ✅ ~50 |
| Voice Quality | Basic, robotic | Natural, neural voices | Advanced, WaveNet, Studio-quality voices |
| Custom Voice Support | ❌ No | ❌ No | ✅ Yes (via Google Cloud custom voice model) |
| SSML (Speech Synthesis Markup Language) | ❌ No | ✅ Yes (limited) | ✅ Full SSML support for prosody, pitch, breaks, etc. |
| Background Execution | ✅ Yes (simple to use in scripts) | ✅ Requires asyncio environment setup | ✅ Yes (can be integrated into services) |
| Installation Simplicity | ✅ Easy (pip install gtts) | ✅ Easy (pip install edge-tts) | ⚠️ Requires setup (pip install google-cloud-texttospeech) and credentials |
| Speed | ✅ Fast | ✅ Fast (depends on connection to Microsoft services) | ✅ Fast but slightly longer due to secure API call |
| Output Format | MP3 | MP3 | MP3, LINEAR16, OGG_OPUS, MULAW, ALAW |
| Audio Customization | ❌ No | ⚠️ Limited (some prosody features) | ✅ Extensive (pitch, speed, volume gain, speaking style) |
| Offline Support | ❌ No | ❌ No | ❌ No (all require internet connection) |
| Free Tier Availability | ✅ Yes (unofficial, uses Google's free service) | ✅ Yes (unofficial, no auth needed) | ✅ Yes (1 million characters/month free in trial) |
| Rate Limits / Quotas | Unspecified, may hit CAPTCHA if overused | Unspecified, may change anytime | ✅ Clear limits, scalable billing after free tier |
| Commercial Use | ⚠️ Not officially allowed | ❌ Not officially permitted | ✅ Yes (enterprise-grade licensing) |
| Ideal Use Cases | Personal projects, quick scripts | Real-time apps, chatbots needing async TTS | Production-grade systems, high-fidelity TTS, compliance-heavy apps |

I experimented with all three methods and found the Google Cloud Text-to-Speech API to be the most effective. However, if ease of use and avoiding billing charges are priorities, Edge-TTS with AsyncIO is the most suitable option for generating voiceovers.
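
For the Edge-TTS route, the flow could look roughly like the sketch below: merge the two ranked lists, drop duplicate (keyword, style) pairs, and synthesize one MP3 per remaining script. The helper names, the `script_text` field, and the voice `en-US-AriaNeural` are assumptions for illustration and are not taken from the notebook. `edge_tts` is imported lazily so the selection helpers work even where the package is not installed.

```python
import asyncio
import re


def unique_scripts(*groups):
    """Merge lists of script rows, keeping the first row per (keyword, style)."""
    seen = set()
    merged = []
    for group in groups:
        for row in group:
            key = (row["keyword"], row["style"])
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


def audio_filename(keyword, style, ext="mp3"):
    """Build a filesystem-safe file name such as 'nancy_mace_video_informative.mp3'."""
    slug = re.sub(r"[^a-z0-9]+", "_", f"{keyword}_{style}".lower()).strip("_")
    return f"{slug}.{ext}"


async def synthesize_all(rows, voice="en-US-AriaNeural"):
    """Generate one MP3 per row concurrently (needs network access and edge-tts)."""
    import edge_tts  # third-party: pip install edge-tts

    tasks = [
        edge_tts.Communicate(row["script_text"], voice).save(
            audio_filename(row["keyword"], row["style"])
        )
        for row in rows
    ]
    await asyncio.gather(*tasks)
```

In a standalone script this would be started with `asyncio.run(synthesize_all(rows))`. In a notebook, where an event loop is already running, `await synthesize_all(rows)` is the usual form.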

Generating Thumbnail Concepts:

This process focuses on generating creative thumbnail concepts for the evaluated scripts using the Gemini API. It iterates through the selected scripts, extracting key details like the hook, main points summary, title suggestions, and existing visual cues for each. These details are then used to construct a specific prompt for Gemini, instructing it to devise a YouTube Shorts thumbnail concept. The prompt asks Gemini to describe the required visual elements, suggest a text overlay, define the emotional tone, and provide style notes (color, font) for a click-optimized thumbnail. The Gemini API is called with this prompt, specifically requesting a JSON response. The output is then parsed to store the structured thumbnail concept for each script.

Prompt for Thumbnail Concept Generation:

prompt = f""" Generate a compelling YouTube Shorts thumbnail concept based on the following script details:

Script Context:

Topic/Keyword: {keyword}

Style: {style}

Hook: "{hook}"

Core Idea Snippet: "{main_point_summary}"

Potential Title: "{title_suggestion}"

Existing Visual Cue Ideas: [{visual_cues_str}]

Task: Describe a YouTube Shorts thumbnail concept designed to maximize clicks and accurately represent the script's essence (intrigue, information, humor). Optimize for clarity and impact on small mobile screens. Build upon or incorporate the 'Existing Visual Cue Ideas' where appropriate.

Output Format: Provide the output STRICTLY as a single valid JSON object. Do not include any text, explanation, or markdown formatting before or after the JSON object itself.

The JSON object must have the following keys:

"visual_elements": (String) Describe the main imagery. Be specific (e.g., "Split screen: Left side shows confusing math symbols, Right side shows a bright AI brain graphic with connecting lines"). Reference the 'Existing Visual Cue Ideas'.

"text_overlay": (String) Suggest short, bold, attention-grabbing text (max 3-5 words) that complements the visual and hook/title. Ensure high readability.

"emotion": (String) The primary emotion or mood the thumbnail should evoke (e.g., "Curiosity", "Urgency", "Surprise", "Humor", "Intrigue").

"style_notes": (String) Brief notes on color palette (e.g., "High contrast: Bright colors on dark background"), font style (e.g., "Bold, impactful sans-serif font"), or overall aesthetic (e.g., "Clean and modern", "Grungy and energetic").

Ensure that the Thumbnail Concept is safe for the Imagen 3 model and doesn't trip any safety filters. Also ensure that the final output adheres strictly to this JSON structure ONLY. """
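
Even with a strict prompt, a model reply can arrive wrapped in a markdown fence or with a key missing, so the parsing step benefits from a small guard. The following is a minimal sketch of such a parser. `parse_concept` and `REQUIRED_KEYS` are hypothetical names, and the notebook's actual parsing code may differ.

```python
import json
import re

REQUIRED_KEYS = ("visual_elements", "text_overlay", "emotion", "style_notes")

# Three backticks, built programmatically so this example stays fence-safe.
FENCE = "`" * 3
FENCED_JSON = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def parse_concept(raw_text):
    """Return the thumbnail concept as a dict, or None if the reply is unusable."""
    text = raw_text.strip()
    # Unwrap the reply if the model put the JSON inside a markdown code fence.
    match = FENCED_JSON.match(text)
    if match:
        text = match.group(1)
    try:
        concept = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(concept, dict):
        return None
    # Every required key must be present and hold a string value.
    if any(not isinstance(concept.get(key), str) for key in REQUIRED_KEYS):
        return None
    return concept
```

Returning `None` instead of raising lets the loop over scripts skip a bad reply and continue with the next one.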

Results:

Concept for Keyword: 'what is open on easter sunday', Style: 'Informative':
Concept JSON:
{
  "visual_elements": "Split screen: Left - Cartoon Easter Bunny looking confused in front of the Taj Mahal. Right - Vibrant image of a bustling Indian marketplace with shops open and people shopping, overlaid with a simplified map of India highlighting Goa, Kerala, and Tamil Nadu in a bright color. Include a small cricket bat and ball in the marketplace image.",
  "text_overlay": "Easter? INDIA's Open!",
  "emotion": "Surprise",
  "style_notes": "High contrast, vibrant colors. Bold sans-serif font. Clean, modern aesthetic with slight humoristic cartoon elements."
}

Concept Breakdown:
Visual Elements: Split screen: Left - Cartoon Easter Bunny looking confused in front of the Taj Mahal. Right - Vibrant image of a bustling Indian marketplace with shops open and people shopping, overlaid with a simplified map of India highlighting Goa, Kerala, and Tamil Nadu in a bright color. Include a small cricket bat and ball in the marketplace image.

Text Overlay: Easter? INDIA's Open!

Emotion/Mood: Surprise

Style Notes: High contrast, vibrant colors. Bold sans-serif font. Clean, modern aesthetic with slight humoristic cartoon elements.

Generating Thumbnails via Imagen 3 Model:

'imagen-3.0-generate-002' # Imagen 3 Model ID used

This process generates the actual thumbnail images based on the creative concepts developed in the previous step, utilizing Google Cloud's Imagen 3 model via the Vertex AI API. It begins by initializing the Vertex AI client and loading the specific Imagen model. The code then iterates through each generated thumbnail concept dictionary. For every concept, it constructs a detailed image generation prompt using the concept's elements like visual description, text overlay, emotion, and style notes, ensuring the required 9:16 aspect ratio. This detailed prompt is sent to the imagen_model.generate_images method. If the API successfully returns an image, it is saved locally as a PNG file, and the file path is recorded for later use.
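
As an illustration of the prompt-building step, the sketch below turns one concept dictionary into a single image prompt. The wording of the prompt and the function name `build_image_prompt` are assumptions. Only the four concept keys and the 9:16 requirement come from the text above, and the call to the Imagen model itself is left out.

```python
def build_image_prompt(concept):
    """Combine a thumbnail concept dict into one text-to-image prompt."""
    overlay = concept["text_overlay"]
    parts = [
        "Vertical 9:16 YouTube Shorts thumbnail.",
        f"Visuals: {concept['visual_elements']}",
        f'Text overlay: "{overlay}".',
        f"Mood: {concept['emotion']}.",
        f"Style: {concept['style_notes']}",
    ]
    return " ".join(parts)


concept = {
    "visual_elements": "Split screen: confused cartoon Easter Bunny on the left, busy marketplace on the right.",
    "text_overlay": "Easter? INDIA's Open!",
    "emotion": "Surprise",
    "style_notes": "High contrast, vibrant colors. Bold sans-serif font.",
}
print(build_image_prompt(concept))
```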

Caution: Key issues include hitting Vertex AI usage quotas for the Imagen model (Error 429), which halts further generation. The Imagen API might also return no images even if the call is technically successful, potentially due to the prompt triggering safety filters or moderation.
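
One way to soften both issues is to retry quota errors with exponential backoff and to treat an empty result as a skipped concept rather than a crash. The sketch below is generic and does not call Vertex AI. `call_with_backoff` and `first_image_or_none` are hypothetical helpers, and the `is_retryable` check is something the caller supplies (for Google client libraries, a 429 is typically raised as `google.api_core.exceptions.ResourceExhausted`, which is an assumption about the notebook's setup).

```python
import time


def call_with_backoff(fn, is_retryable, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**k seconds and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying cannot fix.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def first_image_or_none(images):
    """Return the first generated image, or None when the API returned nothing."""
    return images[0] if images else None
```

A call site might wrap the generation call as `call_with_backoff(lambda: generate(prompt), is_quota_error)`, where `generate` and `is_quota_error` stand in for the notebook's own functions, and then skip the concept if `first_image_or_none` returns `None`.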

Results:

Imagen 3 generates pretty accurate thumbnails! 🎉

BONUS - Creating eBooks based RAG:

eBooks used:

  • Building a StoryBrand: Clarify Your Message So Customers Will Listen by Donald Miller

  • Contagious: Why Things Catch On by Jonah Berger

  • Content Machine: Use Content Marketing to Build a 7-Figure Business With Zero Advertising by Dan Norris

  • Copywriting Secrets: How Everyone Can Use the Power of Words to Get More Clicks, Sales, and Profits by Jim Edwards

  • Everybody Writes: Your Go-To Guide to Creating Ridiculously Good Content by Ann Handley

  • Steal Like an Artist: 10 Things Nobody Told You About Being Creative by Austin Kleon

  • The Content Fuel Framework: How to Generate Unlimited Story Ideas by Melanie Deziel

These ebooks offer a powerful toolkit for anyone looking to master branding, content creation, and persuasive communication. From Donald Miller’s clarity-driven storytelling approach to Melanie Deziel’s framework for generating endless content ideas, each book brings practical, actionable insights. Whether you’re a marketer, entrepreneur, or creative professional, they guide you through the art of engaging audiences, building trust, and standing out in the digital space. Together, they form a must-read collection for growing influence and impact through compelling content.

Process:

  • Document Loading: The system identifies and loads text content from all PDF files in a specified directory (e.g., /kaggle/input/ebooks).

  • Text Extraction: It extracts the raw text from each page of the PDFs using a library like PyMuPDF (fitz).

  • Document Chunking: The long extracted text is split into smaller, manageable sections or "chunks" using a RecursiveCharacterTextSplitter. This ensures chunks are small enough for embedding and processing but large enough to retain context.

  • Metadata Assignment: Each chunk is associated with metadata, typically including the source filename and its index within the original document.

  • Embedding Model Selection: A text embedding model is selected (in this case, a Gemini embedding model like models/text-embedding-004) capable of converting text into numerical vectors.

  • Embedding Function: A custom embedding function (GeminiEmbeddingFunction) is defined to interface the chosen embedding model with the vector database, handling tasks like document embedding and query embedding.

  • Vector Database Setup: A persistent vector database is initialized (ChromaDB), and a collection is created within it specifically for storing the document embeddings.

  • Document Embedding and Indexing: All the prepared document chunks are sent to the vector database in batches. The database uses the embedding function to convert each chunk's text into a vector and stores the vector along with the chunk's text and metadata. This builds the searchable index.

  • Interactive Query Loop: The system enters a loop, prompting the user to ask questions about the documents.

  • Query Embedding: When the user enters a question, the system uses the embedding model to convert the query into a vector, using the "retrieval_query" task type.

  • Vector Search (Retrieval): The system performs a similarity search in the ChromaDB collection using the query vector to find the 'N' most relevant document chunks (based on vector similarity).

  • Context Provision: The text content of the retrieved relevant chunks is extracted and formatted.

  • Prompt Construction: A prompt is dynamically created for the Generative Model (Gemini), incorporating the user's original question and the retrieved document chunks as context.

  • Answer Generation: The Generative Model processes this prompt and synthesizes a response that aims to answer the user's question using only the information found in the provided document chunks.

  • Answer Display: The final answer generated by the model is presented to the user, often with citations to the source documents.
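
Three of the steps above (chunking, metadata assignment, and prompt construction) can be sketched without any external service. This is a simplified stand-in, not the notebook's code: `chunk_text` uses a fixed-size window with overlap instead of RecursiveCharacterTextSplitter, and the chunk size, overlap, record layout, and prompt wording are all assumptions.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping fixed-size chunks (simplified splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def build_records(filename, text, chunk_size=1000, overlap=200):
    """Attach the source file name and chunk index to every chunk."""
    return [
        {
            "id": f"{filename}-{index}",
            "text": chunk,
            "metadata": {"source": filename, "chunk_index": index},
        }
        for index, chunk in enumerate(chunk_text(text, chunk_size, overlap))
    ]


def build_grounded_prompt(question, retrieved):
    """Build a prompt that restricts the answer to the retrieved chunks."""
    context = "\n\n".join(
        f"[Source: {r['metadata']['source']}, Chunk: {r['metadata']['chunk_index']}]\n{r['text']}"
        for r in retrieved
    )
    return (
        "Answer the question using only the context below. "
        "Cite the source and chunk for each point.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Carrying the source and chunk index in the metadata is what makes the citations in the final answer possible, since the model can only cite labels that appear in its context.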

Results:

Please enter your question about the documents (type 'quit' to exit):  list all the elements of a viral content.

Searching collection for: 'list all the elements of a viral content.' (returning 3 results)
Generating embeddings for 1 documents (Task: retrieval_document)...

--- Retrieved Chunks (for context) ---

Chunk 1 (Distance: 0.2228):
Source: Contagious_ Why Things Catch On.pdf, Chunk Idx: 438
--------------------
critical details and left out the extraneous ones.
—————
If you want to craft contagious content, try to build your own Trojan
Horse. But make sure you think about valuable virality. Make sure the
information you want people to remember and transmit is critical to the
narrative. Sure, you can make your narrative funny, surprising, or
entertaining. But if people don’t connect the content back to you, it’s not
going to help you very much. Even if it goes viral.
So build a Social Currency–laden, Tr...
--------------------

Chunk 2 (Distance: 0.2405):
Source: Everybody Writes_ Your Go-To Guide to Creating Ridiculously.pdf, Chunk Idx: 146
--------------------
Note: Adapted in part from “Create Stuff They've Just Gotta
Read: How to Write for #SocialNetworks” presentation at
MarketingProfs by Paul Gillin.
--------------------

Chunk 3 (Distance: 0.2407):
Source: Contagious_ Why Things Catch On.pdf, Chunk Idx: 501
--------------------
that helpful. Buying a Super Bowl ad or hiring Bono will help increase
the chance that content gets viewed and shared. Most people, however,
don’t have the funding or personal connections to make those things
happen. Instead, we focused on aspects of the content itself that were
linked to sharing.
More useful articles: A full description of our research on The New York
Times Most E-Mailed list, as well as our findings, can be found in
Berger, Jonah, and Katherine Milkman (2012), “What Makes Onli...
--------------------
Generating Final Answer (Improved Prompt):

============================== FINAL ANSWER ==============================
If you're looking to create viral content, here's a breakdown of the key elements to consider:

Social Currency: Build content that makes people look good when they share it. (Source: Contagious_ Why Things Catch On.pdf, Chunk: 438)
Triggers: Associate your content with frequently occurring stimuli to keep it top of mind. (Source: Contagious_ Why Things Catch On.pdf, Chunk: 438)
Emotion: Evoke an emotional response in your audience, as this increases the likelihood of sharing. (Source: Contagious_ Why Things Catch On.pdf, Chunk: 438)
Public: Make your content visible and easily observable by others. (Source: Contagious_ Why Things Catch On.pdf, Chunk: 438)
Practical Value: Ensure your content offers useful or valuable information to the audience. (Source: Contagious_ Why Things Catch On.pdf, Chunk: 438)
Essentially, aim to create a "Trojan Horse" where your message is so integral to the narrative that people can't retell the story without it. (Source: Contagious_ Why Things Catch On.pdf, Chunk: 438)

===========================================================================

Please enter your question about the documents (type 'quit' to exit):  how to dominate marketing field using content?

Searching collection for: 'how to dominate marketing field using content?' (returning 3 results)
Generating embeddings for 1 documents (Task: retrieval_document)...

--- Retrieved Chunks (for context) ---

Chunk 1 (Distance: 0.1739):
Source: Content Machine_ Use Content Marketing to Build a 7-figure.pdf, Chunk Idx: 75
--------------------
Don’t be picky about whether each piece of content is generating leads. Just
create as much value as you can for the most amount of people in your
--------------------

Chunk 2 (Distance: 0.1771):
Source: Content Machine_ Use Content Marketing to Build a 7-figure.pdf, Chunk Idx: 16
--------------------
made content marketing work, and then give you the tools to do the same
for yourself.
In short, regardless of your experience with online content right now, my
goal is for you to finish this book and be in a position to build a high-
growth business without spending a cent on advertising.
Let’s get started.
--------------------

Chunk 3 (Distance: 0.1797):
Source: Content Machine_ Use Content Marketing to Build a 7-figure.pdf, Chunk Idx: 14
--------------------
First, you will fully understand what content marketing is and how a
working content marketing strategy is put together. You may decide it’s not
for you, or you may jump in headfirst, screeching with excitement. Either
way, you need to understand it before you decide.
Second, you will learn a simple three-step framework for content marketing
success. It will help you focus on the right things, as opposed to simply
“writing every day.”
On top of that, I’ll make your life easier by providing a num...
--------------------
Generating Final Answer (Improved Prompt):

============================== FINAL ANSWER ==============================
If you're looking to dominate the marketing field using content, here's some advice based on the text:

Focus on Value: Create as much value as you can for the largest possible audience, rather than focusing on whether each piece of content immediately generates leads. (Source: Content Machine_ Use Content Marketing to Build a 7-figure.pdf, Chunk: 75)

Understand Content Marketing: Before diving in, make sure you fully understand what content marketing is and how a successful strategy is structured. (Source: Content Machine_ Use Content Marketing to Build a 7-figure.pdf, Chunk: 14)

Use a Framework: Use a simple framework (provided in the book) to focus your efforts and ensure your content marketing strategy is effective, rather than just creating content randomly. (Source: Content Machine_ Use Content Marketing to Build a 7-figure.pdf, Chunk: 14)

Aim for High-Growth Without Ads: The book aims to equip you with the tools to build a high-growth business through content marketing, without relying on paid advertising. (Source: Content Machine_ Use Content Marketing to Build a 7-figure.pdf, Chunk: 16)

In summary, concentrate on providing value, understanding the fundamentals of content marketing, using a structured approach, and aiming for organic growth.

===========================================================================

Please enter your question about the documents (type 'quit' to exit):  quit
Exiting the query tool. Goodbye!

The results above demonstrate good usability for a document-based Q&A system over a specific corpus.

  • Clear Interaction: The interactive loop is straightforward, prompting the user clearly and providing an exit command.

  • Relevant Retrieval: For the example queries ("elements of viral content", "how to dominate marketing using content"), the system successfully retrieved highly relevant chunks, primarily from "Contagious" and "Content Machine" respectively, which align with the known topics of those books.

  • Contextual Answering: Gemini effectively synthesized answers directly from the retrieved chunks, providing actionable advice and key elements mentioned in the documents.

  • Citation: The final answers include citations to the source files, adding transparency and allowing the user to locate the original information.

  • Informative Debugging: Printing the retrieved chunks for context (with distance and source) is helpful for understanding why the model answered the way it did and debugging retrieval issues.

Overall, for questions that can be directly answered or synthesized from the content of the indexed ebooks, the system appears user-friendly and effective in retrieving relevant information and generating coherent, context-grounded responses.
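
To make the "Distance" values in the retrieved-chunk output more concrete, here is a toy sketch of ranking chunks by distance to a query vector. It uses cosine distance on hand-made two-dimensional vectors. The metric actually used depends on how the ChromaDB collection was configured, which is not stated here, so both the metric and the function names are assumptions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_n_chunks(query_vec, indexed, n=3):
    """Return the n (distance, chunk_id) pairs closest to the query, nearest first."""
    scored = [(cosine_distance(query_vec, vec), chunk_id) for chunk_id, vec in indexed.items()]
    return sorted(scored)[:n]


# Toy index: chunk ids mapped to made-up embedding vectors.
indexed = {"chunk_a": [1.0, 0.0], "chunk_b": [0.0, 1.0], "chunk_c": [1.0, 1.0]}
for distance, chunk_id in top_n_chunks([1.0, 0.0], indexed, n=2):
    print(f"{chunk_id} (Distance: {distance:.4f})")
```

Smaller distances mean closer matches, which is why the chunks in the output above are listed in ascending order of distance.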

Future Aspects:

  • AI Agent based Workflow: Refer to Agent based Workflow Idea Draft

  • Streamlit UI: Implement a simple, interactive web interface using Streamlit to replace the notebook/command-line interaction, providing a user-friendly way to view results and control the workflow for demonstrations.

  • Web Application: Build a more robust and scalable application using a full web framework to support multiple users, handle data persistence more effectively, and enable production deployment.

  • Necessary Optimizations: Focus on improving performance by parallelizing API calls, reducing costs associated with paid services, enhancing error handling for reliability, and ensuring the system can scale to handle larger workloads.

Link to Kaggle Notebook: Kaggle Notebook

Link to Youtube Video: Video

Link to Github: ironclawdevs27
