Introduction

In today’s digital age, planning a vacation often involves extensive online research. YouTube has a lot of travel videos that give you helpful information about popular places to visit. However, sorting through hours of content to find relevant information can be a time-consuming task.

Here are some challenges with doing research on YouTube:

Information Overload: YouTube videos often contain a ton of information, making it difficult to identify key points.
Accessibility: Re-watching long videos repeatedly for specific details can be inefficient.
Organization: Converting unstructured video content into a structured travel guide requires significant effort.

To make travel planning easier, I suggest using AI agents to create travel guides from YouTube videos. With LLMs, we can automate the following tasks:

Video Transcription: AI agents can accurately transcribe the audio content of YouTube videos, converting spoken words into text.
Keyword and Information Extraction: LLMs can identify key terms and phrases related to travel destinations, attractions, accommodations, and experiences, and extract relevant information from the transcribed text, such as locations, recommendations, and tips.
Content Summarization: AI agents can summarize the key points of each video, providing concise overviews of the content.
Guide Generation: Based on the extracted information, the AI agent can generate travel guides, including itineraries, recommendations, and tips.

In this blog, we will learn to build an AI agent that creates a travel guide from YouTube videos. We will use CrewAI, a framework well suited to building such agents. At the core of our AI agent lies the GPT-3.5-Turbo-0125 LLM, which processes natural language to analyze YouTube video content.

What is CrewAI?

CrewAI is an open-source multi-agent orchestration framework designed to facilitate the collaboration of autonomous AI agents. This Python-based platform allows agents to work together as a cohesive unit, or “crew,” to accomplish complex tasks efficiently. Each agent within CrewAI assumes a specific role, enabling them to delegate tasks, communicate, and utilize both existing and custom tools to achieve their goals. This framework aims to simplify the automation of multi-agent workflows, making it accessible for developers and researchers seeking to harness the power of collaborative AI.

Utilizing CrewAI for building AI agents offers several advantages that enhance the development and deployment of intelligent systems. Its modular design allows for easy customization and integration of various tools, enabling developers to create agents that can perform a wide range of tasks. The framework excels in facilitating seamless communication among agents, which is crucial for collaborative problem-solving and efficient task management. Additionally, CrewAI supports the use of open-source large language models (LLMs), allowing agents to leverage advanced reasoning capabilities.

CrewAI Core Concepts

Agents: These are independent units designed to perform tasks, make decisions, and communicate with other agents. They can use tools, ranging from simple search functions to complex integrations with other systems, APIs, etc.
Tasks: Tasks are jobs or assignments that an AI agent needs to complete. They can also include details like which agent should do it and what tools they might need.
Crew: A crew is a team of agents, each with a specific role, working together to achieve a common goal. Forming a crew involves gathering agents, defining their tasks, and setting up the order in which tasks will be done.
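
To make the relationship between the three concepts concrete, here is a minimal sketch in plain Python. The classes SketchAgent, SketchTask, and SketchCrew are hypothetical stand-ins invented for illustration; they are not CrewAI's classes and only mirror the idea that a crew groups agents with an ordered list of tasks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchAgent:
    """Hypothetical stand-in for an agent: a role plus the tools it may use."""
    role: str
    goal: str
    tools: List[str] = field(default_factory=list)


@dataclass
class SketchTask:
    """Hypothetical stand-in for a task: what to do and who should do it."""
    description: str
    agent: Optional[SketchAgent] = None


@dataclass
class SketchCrew:
    """Hypothetical stand-in for a crew: agents plus an ordered task list."""
    agents: List[SketchAgent]
    tasks: List[SketchTask]

    def plan(self) -> List[str]:
        # Describe the order in which tasks would run and who owns each one.
        return [
            f"{i}. {t.agent.role if t.agent else 'unassigned'}: {t.description}"
            for i, t in enumerate(self.tasks, start=1)
        ]


guide = SketchAgent(role="City Guide", goal="Describe attractions", tools=["video_search"])
writer = SketchAgent(role="Travel Writer", goal="Write the blog post")
crew = SketchCrew(
    agents=[guide, writer],
    tasks=[
        SketchTask("List tourist attractions", agent=guide),
        SketchTask("Write the travel guide", agent=writer),
    ],
)
print("\n".join(crew.plan()))
```

The real framework adds the LLM calls, tool execution, and delegation on top of this basic shape.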

AI Agent Code Walkthrough

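The full code also imports ChatOpenAI from langchain_openai, so install langchain-openai too if it is not already pulled in with crewai in your environment. Otherwise, the project needs only these two packages: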

pip install crewai crewai-tools
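
Because the agents call OpenAI models, an API key has to be available before the crew runs. The sketch below assumes the key is supplied through the OPENAI_API_KEY environment variable, which is the usual convention for OpenAI clients; the helper names require_env and masked are hypothetical and not part of CrewAI.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def masked(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so a key can be logged safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Typical use at the top of the script (left commented out so the sketch
# does not fail on import when the key is absent):
# api_key = require_env("OPENAI_API_KEY")
# print("Using OpenAI key", masked(api_key))
```

Failing early with a clear message is easier to debug than an authentication error raised in the middle of a crew run.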

We’ve built four AI agents: three are tasked with extracting key information from YouTube videos, while the fourth, a writer agent, summarizes their findings into an inspiring travel guide in blog format.

The first extractor agent will gather information about the city’s tourist attractions from the video. The second agent will gather information on restaurants, while the third will focus on the city’s top shopping destinations.

Agent Code

city_local_guide = Agent(
   role='City Guide',
   goal='Provide in-depth information on the tourist attractions in the city {city_name}',
   backstory="A knowledgeable local guide with extensive information "
       "about the city, its attractions, and the best places to visit.",
   tools=[attraction_video_search],
   verbose=True
)

food_concierge = Agent(
   role='Food Concierge',
   goal='Provide in-depth information on the best restaurants in the city {city_name}',
   backstory="A food enthusiast with a refined palate and extensive knowledge "
       "of the local food scene.",
   tools=[food_video_search],
   verbose=True
)

shopping_guide = Agent(
   role='Shopping Guide',
   goal='Suggest the best shopping destinations in the city {city_name}',
   backstory="A shopping expert with an eye for the latest trends and the best deals.",
   tools=[shopping_video_search],
   verbose=True
)

travel_writer = Agent(
   role='Travel Writer',
   goal='Write a creative and inspiring travel blog post on the city {city_name}',
   backstory="An experienced travel writer with a passion for exploring new destinations "
       "and sharing the experience with others.",
   allow_delegation=False,
   verbose=True
)

Let’s try to understand each of the Agent attributes:

role: Defines the agent’s function within the crew. It determines the kind of tasks the agent is best suited for.
goal: The individual objective that the agent aims to achieve. It guides the agent’s decision-making process.
backstory: Provides context to the agent’s role and goal, enriching the interaction and collaboration dynamics.
tools: Set of capabilities or functions that the agent can use to perform tasks.
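allow_delegation: Controls whether the agent may delegate tasks or questions to other agents, so that the most suitable agent handles each one. The default is True in the version used here; defaults can change between releases, so set it explicitly when delegation matters.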
verbose: Setting this to True configures the internal logger to provide detailed execution logs, aiding in debugging and monitoring. The default is False.

Agent Definitions

City Local Guide: This agent embodies the role of an experienced city guide. Their primary goal is to share comprehensive information about tourist attractions in the specified city. The agent has a rich backstory as a long-time resident who has accumulated extensive knowledge about the city’s history, culture, and hidden gems. They’re passionate about showcasing the best the city has to offer, from iconic landmarks to off-the-beaten-path locations that only locals know about.
Food Concierge: Taking on the role of a culinary expert, this agent aims to guide visitors through the city’s gastronomic landscape. Their goal is to provide detailed insights into the best dining experiences the city has to offer. The agent’s backstory paints them as a passionate food lover with a sophisticated palate, who has spent years exploring the local food scene. They’re familiar with everything from high-end restaurants to beloved hole-in-the-wall eateries and can recommend dishes that truly capture the city’s flavors.
Shopping Guide: This agent serves as a knowledgeable shopping companion. Their goal is to direct visitors to the most rewarding shopping destinations in the city. The agent’s backstory describes them as a fashion-forward individual with a keen eye for trends and value. They’ve explored every corner of the city’s shopping districts, from luxury boutiques to local markets, and can offer advice on where to find the best deals, unique items, and latest fashion trends.
Travel Writer: This agent embodies the role of an experienced travel journalist. Their goal is to craft creative and inspiring blog posts about the specified city, capturing its essence and allure. The agent’s backstory portrays them as a seasoned globetrotter with a passion for storytelling. They’ve traversed diverse landscapes and cultures, developing a keen eye for the unique character of each destination. Drawing from their wealth of experiences, they weave engaging narratives that blend personal anecdotes with insightful observations, aiming to spark wanderlust in their readers and showcase the city’s hidden gems and vibrant atmosphere.

Agent Tool Definitions

All three extraction agents are equipped with a YouTube Video Search Tool, each assigned to different videos relevant to their specific extraction topics. Below is the definition of the YoutubeVideoSearchTool used.

attraction_video_search = YoutubeVideoSearchTool(
    youtube_video_url="https://www.youtube.com/watch?v=wlKic6yTUUs")
food_video_search = YoutubeVideoSearchTool(
    youtube_video_url="https://www.youtube.com/watch?v=W6W1vxdH9IE")
shopping_video_search = YoutubeVideoSearchTool(
    youtube_video_url="https://www.youtube.com/watch?v=CuJ531KNmxs")

YoutubeVideoSearchTool is part of the crewai_tools package and is designed to perform semantic searches within YouTube video content, utilizing Retrieval-Augmented Generation (RAG) techniques. Users can target their search on a specific YouTube video by providing its video URL in the youtube_video_url argument.
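
The retrieval half of RAG can be sketched without any external service. The toy example below splits a made-up transcript into sentences, turns each into a word-count vector, and ranks the sentences by cosine similarity to a query. This is only an illustration of the idea: the real tool relies on learned embeddings and a vector store, and the function names and transcript here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Split a transcript into sentence-sized chunks."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real tool uses a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(transcript: str, query: str, top_k: int = 1) -> List[Tuple[float, str]]:
    """Return the top_k chunks most similar to the query, best first."""
    q = vectorize(query)
    scored = [(cosine(vectorize(c), q), c) for c in split_sentences(transcript)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


transcript = (
    "The old souk sells spices and gold near the creek. "
    "For dinner try the grilled seafood restaurant by the marina. "
    "The observation deck offers skyline views at sunset."
)
best_score, best_chunk = search(transcript, "Where is a good seafood restaurant for dinner?")[0]
print(best_chunk)
```

In a full RAG setup, the retrieved chunks are then passed to the LLM as context for generating the answer.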

Now let’s discuss the tasks that have been assigned to each of these agents.

Task Code

city_tourist_info_task = Task(
   description='Provide a point-wise list of names and descriptions of the tourist attractions in the city {city_name}',
   expected_output='A detailed summary of 1000 words on the top tourist attractions in the city, including historical sites, museums, and landmarks.',
   agent=city_local_guide
)

food_recommendation_task = Task(
   description='Provide a point-wise list of names and descriptions of the best recommended restaurants in the city {city_name}',
   expected_output='A detailed summary of 1000 words on the curated list of the top restaurants in the city, including local favorites and hidden gems.',
   agent=food_concierge
)

shopping_recommendation_task = Task(
   description='Provide a point-wise list of names and descriptions of the best shopping destinations in the city {city_name}',
   expected_output='A detailed summary of 1000 words on the list of the best shopping destinations in the city, including malls, markets, and boutiques.',
   agent=shopping_guide
)

travel_writing = Task(
   description='Write an exhaustive, engaging travel blog post on the city {city_name}, based on the information provided by the local guide, food concierge, and shopping guide.',
   expected_output='A captivating 1500-word travel blog post formatted in markdown that highlights the unique attractions, food, and shopping experiences in the city.',
   agent=travel_writer,
   context=[city_tourist_info_task, food_recommendation_task, shopping_recommendation_task],
   output_file='blog-posts/dubai_tour_guide.md'
)

Let’s first go through the task attributes:

description: A clear, concise statement of what the task entails.
expected_output: A detailed description of what the task’s completion looks like.
agent: The agent responsible for the task, assigned either directly or by the crew’s process.
output_file: Saves the task output to a file.
context: Specifies tasks whose outputs are used as context for this task.

Task Definitions

Let’s examine each task in detail.

City Tourist Info Task: This task focuses on providing comprehensive information about tourist attractions in the city. The agent responsible is the City Local Guide. The task requires a point-wise list of attractions with descriptions, aiming for a 1000-word detailed summary covering top sites, including historical places, museums, and landmarks.
Food Recommendation Task: Assigned to the Food Concierge agent, this task involves creating a curated list of the city’s best restaurants. It calls for a point-wise compilation of restaurant names and descriptions, culminating in a 1000-word summary that highlights both popular eateries and hidden culinary gems.
Shopping Recommendation Task: The Shopping Guide agent handles this task, which involves listing and describing the best shopping destinations in the city. The expected output is a 1000-word detailed summary covering various shopping venues, from malls to markets and boutiques.
Travel Writing Task: This task is assigned to the Travel Writer agent and aims to create an engaging travel blog post about the city. It synthesizes information from the previous three tasks (tourist info, food, and shopping recommendations). The expected output is a captivating 1500-word post in markdown format, offering a comprehensive guide to the city’s attractions, dining scene, and shopping experiences. The final product will be saved as a markdown file.

Note the context attribute in the Travel Writing Task. Its value is a list of three tasks: city_tourist_info_task, food_recommendation_task, and shopping_recommendation_task.

The context field makes sure that the outputs of all three tasks are used as context for the writing task. If the context attribute is not set, the resulting guide is incomplete: it contains extracted information from only one task.

In CrewAI, the output of one task serves as the context for the next by default, so unless the context is set explicitly, a task receives only the output of the task immediately before it.
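
The difference between the default behaviour and an explicit context list can be sketched in a few lines of plain Python. The function build_context and the task names are hypothetical, and the sketch assumes, as described above, that a task without explicit context sees only the output of the task immediately before it.

```python
from typing import Dict, List, Optional


def build_context(
    task: str,
    order: List[str],
    outputs: Dict[str, str],
    context: Optional[List[str]] = None,
) -> str:
    """Return the text a task would see from earlier tasks.

    Assumption for this sketch: with no explicit context, only the output of
    the task immediately before `task` is passed along.
    """
    if context is None:
        position = order.index(task)
        context = order[position - 1:position] if position > 0 else []
    return "\n".join(outputs[name] for name in context)


order = ["attractions", "food", "shopping", "writing"]
outputs = {
    "attractions": "Attractions notes",
    "food": "Food notes",
    "shopping": "Shopping notes",
}

# Without explicit context the writer sees only the shopping notes.
default_view = build_context("writing", order, outputs)

# With explicit context the writer sees all three extraction outputs.
full_view = build_context(
    "writing", order, outputs, context=["attractions", "food", "shopping"]
)
print(default_view)
print(full_view)
```

This is why the writing task lists all three extraction tasks in its context attribute.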

Defining Crew

Next, we need to integrate the agents and their respective tasks. To do that, we define a crew. A crew is a coordinated group of agents working collectively to complete a series of tasks. Each crew outlines the strategy for task execution, agent collaboration, and the overall workflow.

city_guide_crew = Crew(
   agents=[city_local_guide, food_concierge, shopping_guide, travel_writer],
   tasks=[city_tourist_info_task, food_recommendation_task, shopping_recommendation_task, travel_writing],
   manager_llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7),
   process=Process.hierarchical,
   verbose=True
)

Let’s first understand the crew attributes:

agents: A list of agents that are part of the crew.
tasks: A list of tasks assigned to the crew.
manager_llm: The language model used by the manager agent in a hierarchical process. Required when using a hierarchical process.
process: The process flow (e.g., sequential, hierarchical) the crew follows.
verbose: The verbosity level for logging during execution.

Two things need discussion here. The first is the process. CrewAI currently supports two primary process types: sequential and hierarchical. In a sequential process, tasks are executed one after another in a linear flow. In a hierarchical process, a manager agent coordinates the crew, delegating tasks and validating outcomes before proceeding.

Our crew uses a hierarchical process, which emulates a corporate hierarchy. CrewAI either accepts a custom manager agent or creates one automatically; in the latter case a manager language model (manager_llm) must be specified. The manager oversees task execution, including planning, delegation, and validation. Tasks are not pre-assigned; the manager allocates them to agents based on their capabilities, reviews outputs, and assesses task completion.

The second is the manager: it is mandatory to define either manager_llm or manager_agent when using a hierarchical process. In our example, we use GPT-3.5-turbo as the manager_llm.
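
The contrast between the two process types can be illustrated with a toy sketch. Everything here is hypothetical: the workers are plain functions, and the manager is reduced to a keyword rule (choose) and a validation check (accept), whereas in CrewAI those decisions are made by the manager LLM.

```python
from typing import Callable, Dict, List

Worker = Callable[[str], str]


def run_sequential(
    tasks: List[str], workers: Dict[str, Worker], assignment: Dict[str, str]
) -> List[str]:
    """Run tasks in list order, each handled by its pre-assigned worker."""
    return [workers[assignment[task]](task) for task in tasks]


def run_hierarchical(
    tasks: List[str],
    workers: Dict[str, Worker],
    choose: Callable[[str], str],
    accept: Callable[[str], bool],
    max_attempts: int = 2,
) -> List[str]:
    """A manager picks a worker per task and validates each result before moving on."""
    results = []
    for task in tasks:
        worker = workers[choose(task)]
        for _ in range(max_attempts):
            output = worker(task)
            if accept(output):
                break
        else:
            # No attempt produced an output the manager accepted.
            raise RuntimeError(f"No accepted output for task: {task}")
        results.append(output)
    return results


workers = {
    "food": lambda t: f"[food] {t}",
    "shopping": lambda t: f"[shopping] {t}",
}


def choose(task: str) -> str:
    # Stand-in for the manager's decision about who should handle the task.
    return "food" if "restaurant" in task else "shopping"


tasks = ["find restaurants", "find malls"]
sequential = run_sequential(
    tasks, workers, {"find restaurants": "food", "find malls": "shopping"}
)
hierarchical = run_hierarchical(tasks, workers, choose, accept=lambda out: len(out) > 0)
print(sequential)
print(hierarchical)
```

Both runs produce the same results here; the difference lies in who decides the assignment and whether outputs are checked before the next task starts.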

Executing Crew

Once the crew is assembled, we can initiate the workflow with the kickoff() method. This starts the execution process according to the defined process flow.

city_info = {
   'city_name': 'Dubai'
}


city_guide_crew.kickoff(inputs=city_info)

The kickoff function accepts an inputs dictionary as an argument, which includes the city name for which we want to build the travel guide.
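
The {city_name} placeholder that appears in the agent goals and task descriptions is filled from this dictionary. A rough sketch of that substitution, using Python's built-in str.format, is shown below; the helper interpolate is hypothetical, and CrewAI's own substitution logic may differ in its details.

```python
from typing import Dict


def interpolate(template: str, inputs: Dict[str, str]) -> str:
    """Fill {placeholders} in a prompt template from the inputs dictionary."""
    return template.format(**inputs)


city_info = {"city_name": "Dubai"}
goal = "Suggest the best shopping destinations in the city {city_name}"
print(interpolate(goal, city_info))
```

With this pattern, the same agents and tasks can be reused for another city by changing only the inputs dictionary, provided the video URLs given to the search tools are updated to match.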

Response

The travel guide generated by the crew is saved to blog-posts/dubai_tour_guide.md, the path set in output_file. Note that the guide is based solely on the YouTube video information. It does not use information from any other sources.

Application Code

Below is the full code of this application.

import os
from crewai import Agent, Task, Crew, Process
from crewai_tools import YoutubeVideoSearchTool
from langchain_openai import ChatOpenAI

os.environ["OPENAI_MODEL_NAME"] = 'gpt-3.5-turbo-0125'

attraction_video_search = YoutubeVideoSearchTool(youtube_video_url="https://www.youtube.com/watch?v=wlKic6yTUUs")
food_video_search = YoutubeVideoSearchTool(youtube_video_url="https://www.youtube.com/watch?v=W6W1vxdH9IE")
shopping_video_search = YoutubeVideoSearchTool(youtube_video_url="https://www.youtube.com/watch?v=CuJ531KNmxs")

city_local_guide = Agent(
   role='City Guide',
   goal='Provide in-depth information on the tourist attractions in the city {city_name}',
   backstory="A knowledgeable local guide with extensive information "
       "about the city, its attractions, and the best places to visit.",
   tools=[attraction_video_search],
   verbose=True
)

food_concierge = Agent(
   role='Food Concierge',
   goal='Provide in-depth information on the best restaurants in the city {city_name}',
   backstory="A food enthusiast with a refined palate and extensive knowledge "
       "of the local food scene.",
   tools=[food_video_search],
   verbose=True
)

shopping_guide = Agent(
   role='Shopping Guide',
   goal='Suggest the best shopping destinations in the city {city_name}',
   backstory="A shopping expert with an eye for the latest trends and the best deals.",
   tools=[shopping_video_search],
   verbose=True
)

travel_writer = Agent(
   role='Travel Writer',
   goal='Write a creative and inspiring travel blog post on the city {city_name}',
   backstory="An experienced travel writer with a passion for exploring new destinations "
       "and sharing the experience with others.",
   allow_delegation=False,
   verbose=True
)


city_tourist_info_task = Task(
   description='Provide a point-wise list of names and descriptions of the tourist attractions in the city {city_name}',
   expected_output='A detailed summary of 1000 words on the top tourist attractions in the city, including historical sites, museums, and landmarks.',
   agent=city_local_guide
)

food_recommendation_task = Task(
   description='Provide a point-wise list of names and descriptions of the best recommended restaurants in the city {city_name}',
   expected_output='A detailed summary of 1000 words on the curated list of the top restaurants in the city, including local favorites and hidden gems.',
   agent=food_concierge
)

shopping_recommendation_task = Task(
   description='Provide a point-wise list of names and descriptions of the best shopping destinations in the city {city_name}',
   expected_output='A detailed summary of 1000 words on the list of the best shopping destinations in the city, including malls, markets, and boutiques.',
   agent=shopping_guide
)

travel_writing = Task(
   description='Write an exhaustive, engaging travel blog post on the city {city_name}, based on the information provided by the local guide, food concierge, and shopping guide.',
   expected_output='A captivating 1500-word travel blog post formatted in markdown that highlights the unique attractions, food, and shopping experiences in the city.',
   agent=travel_writer,
   context=[city_tourist_info_task, food_recommendation_task, shopping_recommendation_task],
   output_file='blog-posts/dubai_tour_guide.md'
)


# With the hierarchical process, a manager (driven by manager_llm) delegates the tasks to the agents and reviews their outputs.
city_guide_crew = Crew(
   agents=[city_local_guide, food_concierge, shopping_guide, travel_writer],
   tasks=[city_tourist_info_task, food_recommendation_task, shopping_recommendation_task, travel_writing],
   manager_llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7),
   process=Process.hierarchical,
   verbose=True
)

city_info = {
   'city_name': 'Dubai'
}

city_guide_crew.kickoff(inputs=city_info)

Conclusion

This blog has explored the potential of AI to revolutionize travel planning. By leveraging the CrewAI framework and the GPT-3.5-Turbo-0125 language model, we have demonstrated how an AI agent can effectively extract, summarize, and organize travel-related information from YouTube videos.

The YoutubeVideoSearchTool, a key component of this system, enables semantic searches within YouTube video content, helping the agents retrieve relevant passages. The agents' ability to extract information and generate detailed summaries of tourist attractions, restaurants, and shopping destinations offers a valuable resource for travelers seeking personalized recommendations.

As AI technology continues to advance, we can expect even more sophisticated and innovative applications in the travel industry. By harnessing the power of AI, we can create personalized, efficient, and enjoyable travel experiences.

If you have any further questions or would like to explore this topic in more depth, please feel free to leave a comment below.

I frequently create content on building LLM applications, AI agents, vector databases, and other RAG-related topics. If you’re interested in similar articles, consider subscribing to my blog.

If you’re in the Generative AI space or LLM application domain, let’s connect on LinkedIn! I’d love to stay connected and continue the conversation. Reach me at: linkedin.com/in/ritobrotoseth

Building Travel Guides from YouTube using CrewAI Multi-Agent
