Demystifying AI Agents: Building One from Scratch with Gemini Flash


• What Is an AI Agent?
An AI agent is not just a chatbot — it’s a software entity that can perceive, reason, act, and adapt based on its environment and goals. Imagine Siri, Google Assistant, or Iron Man’s JARVIS — these aren’t just answering questions, they’re taking actions for you.
An AI agent:
‣ Understands what you want (intent)
‣ Makes a plan to achieve it
‣ Uses tools (like files, APIs, or commands)
‣ Responds with results
• Why Do We Need AI Agents?
We live in a world filled with repetitive digital tasks: creating files, organizing folders, running commands, writing code, etc. Now imagine having an assistant that can:
‣ Understand your natural language
‣ Break it down step-by-step
‣ Use tools like terminal commands or file editors
‣ Give you structured, predictable results
That’s what an AI agent can do for you.
• Key Advantages of AI Agents
Feature | Benefit |
Autonomous Actions | Executes tasks without manual input |
Natural Language Input | Speak normally — it understands |
Reasoning | Plans before acting |
Tool Usage | Can run code, commands, and create files |
Chain-of-Thought | Transparent steps: Start → Plan → Action → Observe → Output |
• How I Built My Own AI Agent Using Gemini API
Yes — I built a custom AI assistant (think JARVIS) that can reason, run terminal commands, create web applications, and write files using structured step-based thinking, powered by Gemini 1.5 Flash.
• Tools I Used
‣ Python
‣ Gemini 1.5 Flash API
‣ google.generativeai
‣ Terminal (for executing commands)
‣ Custom tool functions like write_file and run_command
• Step-by-Step Breakdown
- Setting Up the Gemini Model
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel(model_name='gemini-1.5-flash')
chat = model.start_chat(history=[
    {
        "role": "user",
        "parts": [f"You are a helpful assistant. Here's your role: {SYSTEM_PROMPT}"]
    }
])
The SYSTEM_PROMPT tells the model to think and respond like a smart assistant, breaking responses into:
‣ Start: Understanding
‣ Plan: Strategy
‣ Action: Tool usage
‣ Observe: Feedback
‣ Output: Final response
→ Here’s a sample SYSTEM_PROMPT
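The author's exact prompt isn't reproduced in this text, so here is a minimal illustrative sketch of what such a SYSTEM_PROMPT might look like — the wording below is my assumption, not the original:

```python
# Illustrative SYSTEM_PROMPT sketch (not the original author's exact wording).
SYSTEM_PROMPT = """
You are a helpful AI assistant that solves tasks step by step.
For every user request, respond ONLY as a list of JSON objects, one per line,
each prefixed with "- ". Walk through these steps in order:
- {"step": "Start", "content": "<restate what the user wants>"}
- {"step": "Plan", "content": "<how you will achieve it>"}
- {"step": "Action", "function": "<tool name>", "input": "<tool input>"}
- {"step": "Observe", "content": "<result of the action>"}
- {"step": "Output", "content": "<final answer for the user>"}

Rules:
- Perform one Action at a time and wait for its Observe result.
- Only use the tools listed under "Available Tools".
"""
```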
- Designing the Chain-of-Thought Format
Here's what a Gemini response looks like:
- {"step": "Start", "content": "User asked to create a folder"}
- {"step": "Plan", "content": "Use mkdir command"}
- {"step": "Action", "function": "run_command", "input": "mkdir myFolder"}
- {"step": "Observe", "content": "Folder created"}
- {"step": "Output", "content": "Your folder is ready!"}
This format is amazing for transparency and debugging!
- Parsing Responses Using Python
import json

def parse_input_string(s):
    parsed_data = []
    current_obj = None
    lines = s.splitlines()
    for line in lines:
        stripped_line = line.strip()
        if stripped_line.startswith('- {'):
            if current_obj is not None:
                # Process the completed object
                fixed = current_obj.replace('\n', '\\n')
                parsed_data.append(json.loads(fixed))
            current_obj = stripped_line[1:].strip()
        else:
            if current_obj is not None:
                current_obj += '\n' + line
    # Process the last object
    if current_obj is not None:
        fixed = current_obj.replace('\n', '\\n')
        parsed_data.append(json.loads(fixed))
    return parsed_data
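To sanity-check the parser, we can feed it a response in the chain-of-thought format shown earlier (the function is repeated here so the snippet runs standalone):

```python
import json

def parse_input_string(s):
    # Same parser as above: accumulate each "- {...}" object,
    # escaping embedded newlines so json.loads accepts them.
    parsed_data = []
    current_obj = None
    for line in s.splitlines():
        stripped = line.strip()
        if stripped.startswith('- {'):
            if current_obj is not None:
                parsed_data.append(json.loads(current_obj.replace('\n', '\\n')))
            current_obj = stripped[1:].strip()
        elif current_obj is not None:
            current_obj += '\n' + line
    if current_obj is not None:
        parsed_data.append(json.loads(current_obj.replace('\n', '\\n')))
    return parsed_data

sample = '''- {"step": "Start", "content": "User asked to create a folder"}
- {"step": "Plan", "content": "Use mkdir command"}
- {"step": "Action", "function": "run_command", "input": "mkdir myFolder"}'''

steps = parse_input_string(sample)
print([d["step"] for d in steps])  # ['Start', 'Plan', 'Action']
```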
- Building Tool Functions
My assistant can:
‣ Create files using write_file(path, content)
‣ Run terminal commands using run_command(command)
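The bodies of these two tools aren't shown in the article; a plausible minimal sketch (my assumption of their shape, based on how they are called later in the chat loop) could be:

```python
import subprocess

def write_file(path: str, content: str) -> str:
    # Write multi-line content to the given file path.
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return f"Wrote {len(content)} characters to {path}"

def run_command(command: str) -> str:
    # Run a shell command and return its combined output.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr
```

Returning a string from each tool lets the agent feed the result straight back into an Observe step.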
All mapped neatly:
available_tools = {
    "write_file": write_file,
    "run_command": run_command,
}
We can add more tools whenever we want: just write a function for the utility and register it in the available_tools dictionary.
Example: to fetch the current weather for any city, we create a get_weather(city) function using any weather API and map it in available_tools.
import requests

def get_weather(city: str):
    url = f"https://wttr.in/{city}?format=%C+%t"
    headers = {
        "User-Agent": "Mozilla/5.0"
    }
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return f"The weather in {city} is {response.text.strip()}."
        return f"❌ Couldn't fetch weather for {city} (Status code: {response.status_code})"
    except Exception as e:
        return f"⚠️ Error fetching weather: {e}"

available_tools = {
    "write_file": write_file,
    "run_command": run_command,
    "get_weather": get_weather,
}
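With the mapping in place, tool dispatch is just a dictionary lookup on the Action step. A standalone illustration — the tool is stubbed out here (my addition) so the snippet doesn't hit the network:

```python
# Stub tool so the dispatch runs without network calls
# (the real agent maps write_file, run_command and get_weather here).
def fake_get_weather(city: str) -> str:
    return f"The weather in {city} is Sunny +25°C."

available_tools = {
    "get_weather": fake_get_weather,
}

# An "Action" step as parsed from the model's response
action = {"step": "Action", "function": "get_weather", "input": "Kolkata"}

if action["function"] in available_tools:
    observed = available_tools[action["function"]](action["input"])
else:
    observed = f"{action['function']} tool NOT AVAILABLE!"

print(observed)  # The weather in Kolkata is Sunny +25°C.
```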
- Briefing the assistant on the available tools
Now we need to add short descriptions of our tools to the SYSTEM_PROMPT.
*** Available Tools:
- "run_command": Executes terminal/command-line instructions on the user's system.
- "write_file": Writes multi-line content to a specified file.
- "get_weather": Fetches current weather data for a given city.
- Core Chat Loop
# Start conversation
print("🚩 Your assistant is READY!")
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit", "bye", "see you soon", "brb", "see ya", "sleep"]:
        print("👋 Assistant: See you soon!")
        break
    response = chat.send_message(user_input)
    response_string = response.text
    # Print the raw response
    print("🤖 Assistant:\n", response_string)
    parsed_data = parse_input_string(response_string)
    print(parsed_data)
    observed_output = None
    for data in parsed_data:
        if data['step'] == 'Start':
            print('▶ Start: ', data['content'])
        elif data['step'] == 'Plan':
            print('💡 Plan: ', data['content'])
        elif data['step'] == 'Observe':
            observed_output = data['content']
            print("👀 Observe: ", observed_output)
        elif data['step'] == 'Action':
            # Tool calling
            if data['function'] in available_tools:
                print(f"🚀 Action: Calling {data['function']} tool!")
                tool_function = available_tools[data['function']]
                if data['function'] == 'write_file':
                    file_path = data['input'].get('path')
                    if observed_output:
                        write_file(path=file_path, content=observed_output)
                    else:
                        observed_output = data['input'].get('content')
                        write_file(path=file_path, content=observed_output)
                else:
                    tool_input = data['input']
                    observed_output = tool_function(tool_input)
            else:
                print(f"{data['function']} tool NOT AVAILABLE!")
        elif data['step'] == 'Output':
            print("🎯 Final Output:", data['content'])
• Example Interaction
‣ You: Create a folder named blog-agent and inside it create an index.html file.
‣ Agent:
▶ Start: The user wants to create a blog-agent folder and an HTML file.
💡 Plan: Create folder → create file
🚀 Action: mkdir blog-agent
👀 Observe: Folder created.
🚀 Action: write index.html inside blog-agent
👀 Observe: File created.
🎯 Output: blog-agent/index.html created!
Watch the demo video of the Gemini AI Agent in action:
• What Can This Agent Do?
‣ Create folders and files
‣ Run shell commands
‣ Customize outputs
‣ Respond in chain-of-thought reasoning
‣ Integrate with Git or Node.js setups
• Want to Try It Yourself?
I’ll soon publish the full source code on my GitHub. Until then, follow these steps and build your own agent. Trust me — it's one of the most satisfying AI projects you’ll do.
• Let’s Connect!
Found this useful?
Drop a comment or DM me on LinkedIn.
I love chatting about Generative AI, Machine Learning, Data Science, and real-world agent builds.
Written by Monideep Mistry