AI Tool Summarizes YouTube Videos

In this article, we'll explore the AI Agent, Telegram, and Discord nodes on n8n by building an AI-powered YouTube Video Summarizer automation workflow. This solution is ideal for people who enjoy learning from YouTube videos but don't have time to watch full-length content. The workflow generates video summaries that help users decide whether the complete video is worth watching.

Prerequisites

YouTube Data API credentials
AI model API key
Notion API integration
n8n installation
Telegram API key
Heroku account (to host the transcript server)

Setup

Here's how the workflow functions: when a user sends a YouTube video URL to a Telegram bot, it triggers the workflow. First, an Information Extractor LLM node parses the video ID from the chat message. Using this ID, we retrieve the video's transcript by sending an HTTP request to a separate server that fetches it. Then, an AI model analyzes and summarizes the transcript content. The summary is saved to a Notion database, and finally, we send the result back to the user through the Telegram API.

Here is the complete picture of the workflow.

Setting up Telegram Bot

Setting up the Telegram bot is a straightforward process through Telegram’s BotFather. n8n has a native Telegram node, so we only need to provide the API key to start using it.

Setting up Notion

n8n also has a native Notion node. We need to create a new Notion database to store the video summaries. Don’t forget to activate the API connection for that database so that n8n can access it through the API.

Setting up AI Prompts

For this setup, I am using a Gemini AI model. The workflow is not limited to Gemini, however; it is easy to swap in any other AI model.

I created two AI nodes in my workflow. The first is a simple LLM node that extracts the YouTube video ID from the chat message. The second is an AI Agent node, which lets us specify the model, memory, and tools. It also uses a Gemini model, this time to summarize the video from the transcript content passed in by the previous node. The node is also connected to the YouTube API tool so that it can retrieve the video’s title and description, which improves the result.

Here is the prompt I used for the first node.

You are an expert extraction algorithm.
Extract only the YouTube video ID from the provided text.
If you do not know the value of an attribute asked to extract, you may omit the attribute's value.
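As a point of comparison for the extraction prompt above, the same step can also be done deterministically. The following is a minimal Python sketch, not part of the author's workflow: it swaps the LLM for a regular expression. The function name extract_video_id is hypothetical, and the pattern assumes that video IDs are 11 characters long and that the message uses one of the common URL shapes listed in the comment.

```python
import re

# Assumed URL shapes: youtube.com/watch?v=ID, youtu.be/ID,
# youtube.com/shorts/ID, youtube.com/embed/ID, youtube.com/live/ID.
# Assumption: a video ID is 11 characters from [A-Za-z0-9_-].
_VIDEO_ID_PATTERN = re.compile(
    r"(?:youtube\.com/(?:watch\?(?:[^\s#]*&)?v=|shorts/|embed/|live/)|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})"
)


def extract_video_id(message):
    """Return the first YouTube video ID found in a chat message, or None."""
    match = _VIDEO_ID_PATTERN.search(message)
    return match.group(1) if match else None
```

A check like this could run before the LLM node as a cheap first pass, with the LLM kept as a fallback for messages the pattern does not recognize.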

Here is the second prompt.

Role: You are an expert AI assistant specialized in analyzing video transcripts and generating concise, informative summaries.

Goal: Your primary goal is to create a clear and accurate summary of the provided video transcript, presented as a bulleted list. The summary should capture the main topics and key points of the video content, including references to important timestamps.

Context: You will receive a block of text representing the full transcript or captions of a YouTube video. This transcript may contain timestamps (e.g., [00:01:23], 0:05:10.234 --> 0:08:567) and the spoken words.

Instructions:

Get information regarding the Youtube Video. Use the Youtube API tool to retrieve the video title and description. 

Identify Main Topics: Read through the entire transcript to understand the core subjects discussed.

Extract Key Points: Pinpoint the most important statements, arguments, findings, or conclusions presented in the video.

Identify Key Timestamps: Note the approximate starting timestamps associated with the key points or main topic shifts you identify. Use the timestamps provided in the transcript.

Synthesize Information: Combine the main topics and key points into a coherent summary.

Include Timestamps: When mentioning a key point or topic in the summary, include its corresponding approximate start timestamp in parentheses (e.g., (0:02:15)).

Maintain Neutrality: Summarize the content objectively, without adding personal opinions or interpretations unless explicitly asked.

Focus on Content: Ignore conversational filler words (e.g., "um," "uh," "like," "you know") and repetitive phrases where possible, unless they are crucial to the meaning or tone. Focus on the substance of what is being said.

Desired Length (Optional - Adapt): Adjust the number of bullet points based on the desired level of detail (e.g., 3-5 points for a standard summary, more for detailed).

Output Format: Starts from the video Title followed by summarized video description. Then present the summary as a bulleted list. Each bullet point should represent a key topic or finding. Ensure timestamps are included parenthetically where relevant points are mentioned. Start directly with the bulleted list.

Input Data:

The video transcript will be provided below, enclosed in triple backticks or following a specific marker (e.g., "TRANSCRIPT START").

{{ $json.transcript_data }}

(Adjust the placeholder {{ $json.transcript_data }} based on how the transcript data is passed from the previous node in your n8n workflow. It might be $json.text, $input.item.json.transcript, etc.)

Example Interaction:

Input: (A full video transcript about making pasta, with timestamps)

Output (Bulleted List Summary):

The video provides a step-by-step guide on making fresh pasta from scratch (0:00:15).

It details the necessary ingredients, primarily flour and eggs (0:01:05).

The process involves mixing, kneading (0:02:30), and resting the dough (0:05:10).

Key techniques like using the right flour and sufficient kneading are emphasized for texture (0:04:00).

Demonstrates how to roll and cut the pasta into desired shapes, such as fettuccine (0:06:45).

Concludes by showing the final cooked pasta served with sauce (0:08:20).

Now, analyze the following transcript and generate the summary based on these instructions.

Retrieving YouTube Video Transcript

Retrieving a YouTube video's transcript is quite challenging. The YouTube Data API only lets you download captions for videos you own, so it cannot be used for arbitrary public videos. I came across some older solutions, but they no longer work, so I needed to build my own.

One approach is to use Gemini directly by giving it the video URL and asking the model to summarize it. This works in chat mode but not via an API call. For some reason, when called through the API the Gemini model returns a summary of the wrong video, and I don’t quite understand why. So I had to find another solution.

Fortunately, there are libraries that can retrieve transcripts or captions given a video ID: one written in JavaScript and another in Python. Because these are external libraries, they are not easy to use from n8n's Code node. The Python library cannot be used there at all, because n8n runs Python through Pyodide, which limits the packages that are available. The JavaScript library can be used, but only on a self-hosted n8n instance, and it requires changing environment variables and the Dockerfile, which is fairly technical.

So I came up with the idea of hosting a small server on Heroku with the Python library installed. This approach requires no changes to the n8n instance, and it works with both self-hosted and cloud-hosted n8n.
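To make the idea of a separate transcript server concrete, here is a minimal sketch using only the Python standard library. It is not the author's implementation. The /transcript route, the video_id query parameter, and the fetch_transcript placeholder are all hypothetical, and the sketch assumes the transcript library returns a list of segments with text and start (in seconds) fields. The transcript_data key is chosen only to match the placeholder used in the summary prompt.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def fetch_transcript(video_id):
    """Hypothetical placeholder: call your transcript library here.

    Assumed to return a list of dicts such as {"text": "...", "start": 12.3}.
    """
    raise NotImplementedError("plug in a transcript library here")


def format_timestamp(seconds):
    """Convert seconds to H:MM:SS, the timestamp style used in the summary prompt."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def build_response(path, fetch=fetch_transcript):
    """Return (status_code, payload) for a path like /transcript?video_id=ID."""
    query = parse_qs(urlparse(path).query)
    video_id = query.get("video_id", [""])[0]
    if not video_id:
        return 400, {"error": "missing video_id"}
    try:
        segments = fetch(video_id)
    except Exception as exc:  # report upstream failures instead of crashing
        return 502, {"error": str(exc)}
    lines = [f"[{format_timestamp(s['start'])}] {s['text']}" for s in segments]
    return 200, {"video_id": video_id, "transcript_data": "\n".join(lines)}


class TranscriptHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = build_response(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Start the server; Heroku supplies the port in the PORT environment variable."""
    port = int(os.environ.get("PORT", "8000"))
    HTTPServer(("0.0.0.0", port), TranscriptHandler).serve_forever()
```

With a server along these lines running (by calling serve()), the HTTP Request node in n8n would send a GET request to the /transcript route with the extracted video ID and pass the returned transcript_data field on to the AI Agent node.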

Another challenge is that YouTube bans IP addresses that send too many requests. Fortunately, we can use a proxy server as a workaround. One proxy provider recommended by the library’s author is WebShare, so I decided to try it, and it works quite well.

If you want to set up this workflow yourself, you should try WebShare. I'd appreciate it if you used my referral link.

If you want to know the implementation details, you can check my repository here.

Conclusion

The AI-powered YouTube Video Summarizer workflow built with n8n offers a practical solution for people who want to learn from YouTube videos efficiently without watching them in full. By integrating Telegram, Notion, and AI models, the workflow automates extracting, summarizing, and storing video content. This saves time, and the stored summaries remain available for future reference. The system is also flexible enough to be customized and extended for other creative applications.

What’s Next

We now have an automation workflow that gets a video summary and stores it in a Notion database. There are lots of possibilities for what you can do next. You can of course consume the information yourself, or feed it to another AI agent or n8n workflow to perform other creative tasks, like recommending topics for your next social media posts or generating new articles. The sky is the limit here.

Let me know if you have some other ideas in the comments section below. If you have suggestions for new automation workflows that could be useful, I’d love to hear them.
