Demystifying AI Agents: Building One from Scratch with Gemini Flash

Monideep Mistry
5 min read

• What Is an AI Agent?

An AI agent is not just a chatbot — it’s a software entity that can perceive, reason, act, and adapt based on its environment and goals. Imagine Siri, Google Assistant, or Iron Man’s JARVIS — these aren’t just answering questions, they’re taking actions for you.

An AI agent:

‣ Understands what you want (intent)

‣ Makes a plan to achieve it

‣ Uses tools (like files, APIs, or commands)

‣ Responds with results


• Why Do We Need AI Agents?

We live in a world filled with repetitive digital tasks: creating files, organizing folders, running commands, writing code, etc. Now imagine having an assistant that can:

‣ Understand your natural language
‣ Break it down step-by-step
‣ Use tools like terminal commands or file editors
‣ Give you structured, predictable results

That’s what an AI agent can do for you.

• Key Advantages of AI Agents

‣ Autonomous Actions: executes tasks without manual input

‣ Natural Language Input: speak normally, it understands

‣ Reasoning: plans before acting

‣ Tool Usage: can run code, commands, and create files

‣ Chain-of-Thought: transparent steps (Start → Plan → Action → Observe → Output)

• How I Built My Own AI Agent Using Gemini API

Yes — I built a custom AI assistant (think JARVIS) that can reason, run terminal commands, create web applications, and write files using structured step-based thinking, powered by Gemini 1.5 Flash.

• Tools I Used

‣ Python

‣ Gemini 1.5 Flash API

‣ google.generativeai

‣ Terminal (for executing commands)

‣ Custom tool functions like write_file and run_command

• Step-by-Step Breakdown

  1. Setting Up the Gemini Model
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(model_name='gemini-1.5-flash')
chat = model.start_chat(history=[
    {
        "role": "user",
        "parts": [f"You are a helpful assistant. Here's your role: {SYSTEM_PROMPT}"]
    }
])

The SYSTEM_PROMPT tells the model to think and respond like a smart assistant — breaking responses into:

‣ Start: Understanding

‣ Plan: Strategy

‣ Action: Tool usage

‣ Observe: Feedback

‣ Output: Final response

→ Here’s a sample SYSTEM_PROMPT
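For illustration, a stripped-down version of such a prompt might look like this. This is my own sketch of the idea, not the full prompt used by the agent:

```python
# A minimal, illustrative SYSTEM_PROMPT (sketch only; a real prompt
# would also include tool descriptions and stricter formatting rules).
SYSTEM_PROMPT = """
You are a helpful assistant that solves tasks step by step.
For every user request, respond ONLY with a list of JSON objects,
one per line, each prefixed with "- ", using these steps in order:
- {"step": "Start", "content": "<restate what the user wants>"}
- {"step": "Plan", "content": "<how you will achieve it>"}
- {"step": "Action", "function": "<tool name>", "input": "<tool input>"}
- {"step": "Observe", "content": "<result of the action>"}
- {"step": "Output", "content": "<final answer for the user>"}
Only use the tools listed under "Available Tools".
"""
```

The key point is that the prompt forces the model into a fixed, machine-parsable step format, which is what makes the rest of the pipeline possible.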

  2. Designing the Chain-of-Thought Format

Here's what a Gemini response looks like:

- {"step": "Start", "content": "User asked to create a folder"}
- {"step": "Plan", "content": "Use mkdir command"}
- {"step": "Action", "function": "run_command", "input": "mkdir myFolder"}
- {"step": "Observe", "content": "Folder created"}
- {"step": "Output", "content": "Your folder is ready!"}

This format is amazing for transparency and debugging!

  3. Parsing Responses Using Python
import json

def parse_input_string(s):
    parsed_data = []
    current_obj = None
    lines = s.splitlines()

    for line in lines:
        stripped_line = line.strip()
        if stripped_line.startswith('- {'):
            if current_obj is not None:
                # Process completed object
                fixed = current_obj.replace('\n', '\\n')
                parsed_data.append(json.loads(fixed))
            current_obj = stripped_line[1:].strip()
        else:
            if current_obj is not None:
                current_obj += '\n' + line

    # Process last object
    if current_obj is not None:
        fixed = current_obj.replace('\n', '\\n')
        parsed_data.append(json.loads(fixed))

    return parsed_data
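To sanity-check the parser, here is how it behaves on a made-up response in the same format (the sample text below is mine, not actual model output):

```python
import json

def parse_input_string(s):
    # Same parser as above: collects "- {...}" lines into JSON objects,
    # escaping raw newlines inside multi-line objects before json.loads.
    parsed_data = []
    current_obj = None
    for line in s.splitlines():
        stripped_line = line.strip()
        if stripped_line.startswith('- {'):
            if current_obj is not None:
                parsed_data.append(json.loads(current_obj.replace('\n', '\\n')))
            current_obj = stripped_line[1:].strip()
        elif current_obj is not None:
            current_obj += '\n' + line
    if current_obj is not None:
        parsed_data.append(json.loads(current_obj.replace('\n', '\\n')))
    return parsed_data

# Hypothetical Gemini-style response, used only to demo the parser
sample = (
    '- {"step": "Start", "content": "User asked to create a folder"}\n'
    '- {"step": "Plan", "content": "Use mkdir command"}\n'
    '- {"step": "Output", "content": "Done!"}'
)

for step in parse_input_string(sample):
    print(step["step"], "->", step.get("content"))
```

Each line becomes its own dictionary, so the chat loop can dispatch on `step` one object at a time.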
  4. Building Tool Functions

My assistant can:

‣ Create files using write_file(path, content)

‣ Run terminal commands using run_command(command)

All mapped neatly:

available_tools = {
    "write_file": write_file,
    "run_command": run_command,
}
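The tool bodies themselves are simple. Here's a minimal sketch of how write_file and run_command could be implemented (my version, using the standard library; your implementations may differ):

```python
import os
import subprocess

def write_file(path, content):
    # Create parent folders if needed, then write the content to the file
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        f.write(content)
    return f"Wrote {len(content)} characters to {path}"

def run_command(command):
    # Run a shell command and return its combined stdout + stderr
    result = subprocess.run(command, shell=True,
                            capture_output=True, text=True)
    return result.stdout + result.stderr
```

Returning a string from every tool is deliberate: whatever the tool produces can be fed straight back to the model as the next Observe step.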

We can add more tools whenever we want: just write a function for the new utility and map it in the available_tools dictionary.

Example: to fetch the current weather for any city, we just create a get_weather(city) function using any weather API and map it in the available_tools dictionary.

import requests

def get_weather(city: str):
    url = f"https://wttr.in/{city}?format=%C+%t"
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return f"The weather in {city} is {response.text.strip()}."
        return f"❌ Couldn't fetch weather for {city} (Status code: {response.status_code})"
    except Exception as e:
        return f"⚠️ Error fetching weather: {e}"

available_tools = {
    "write_file": write_file,
    "run_command": run_command,
    "get_weather": get_weather,
}
  5. Briefing the Assistant on the Available Tools

Now, in the SYSTEM_PROMPT, we need to add a short description of each tool.

*** Available Tools:
- "run_command": Executes terminal/command-line instructions on the user's system.
- "write_file": Writes multi-line content to a specified file.
- "get_weather": Fetches current weather data for a given city.
  6. Core Chat Loop
# Start conversation
print("🚩 Your assistant is READY!")

while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit", "bye", "see you soon", "brb", "see ya", "sleep"]:
        print("👋 Assistant: See you soon!")
        break

    response = chat.send_message(user_input)
    response_string = response.text

    # Print the raw structured response
    print("🤖 Assistant:\n", response_string)

    parsed_data = parse_input_string(response_string)
    print(parsed_data)

    observed_output = None
    for data in parsed_data:
        if data['step'] == 'Start':
            print('▶ Start: ',data['content'])

        elif data['step'] == 'Plan':
            print('💡 Plan: ',data['content'])

        elif data['step'] == 'Observe':
            observed_output = data['content']
            print("👀 Observe: ",observed_output)

        elif data['step'] == 'Action':
            # Tool Calling
            if data['function'] in available_tools:
                print(f"🚀 Action: Calling {data['function']} tool!")
                tool_function = available_tools[data['function']]
                if data['function'] == 'write_file':
                    file_path = data['input'].get('path')
                    if observed_output:
                        write_file(path=file_path,content=observed_output)
                    else:
                        observed_output = data['input'].get('content')
                        write_file(path=file_path,content=observed_output)
                else:
                    tool_input = data['input']
                    observed_output = tool_function(tool_input)
            else:
                print(f"{data['function']} tool NOT AVAILABLE!")

        elif data['step'] == 'Output':
            print("🎯 Final Output:", data['content'])

• Example Interaction

You: Create a folder named blog-agent and inside it create an index.html file.

Agent:

▶ Start: The user wants to create a blog-agent folder and an HTML file.
💡 Plan: Create folder → create file
🚀 Action: mkdir blog-agent
👀 Observe: Folder created.
🚀 Action: write index.html inside blog-agent
👀 Observe: File created.
🎯 Output: blog-agent/index.html created!

Watch the demo video of the Gemini AI Agent in action:

View demo on Google Drive

• What Can This Agent Do?

‣ Create folders and files
‣ Run shell commands
‣ Customize outputs
‣ Respond in chain-of-thought reasoning
‣ Integrate with Git or Node.js setups

• Want to Try It Yourself?

I’ll soon publish the full source code on my GitHub. Until then, follow these steps and build your own agent. Trust me — it's one of the most satisfying AI projects you’ll do.

• Let’s Connect!

Found this useful?
Drop a comment or DM me on LinkedIn.
I love chatting about Generative AI, Machine Learning, Data Science, and real-world agent builds.

