Building Your Own LLM Agent From Scratch

Nischal Nikit

👋 Intro

Let’s be real for a second. It takes effort to learn something from scratch. All kinds of thoughts hit you at the same time: is what you’re learning worth the effort, will you even finish what you started? The list is endless. With something like AI, things are moving so fast each day that you feel way behind before you’ve even started.

I had all the same feelings before starting this, and well, those feelings are still there haha, but I can now build a few things with AI. Let me bring you onto the same boat and let’s build something for real :)

There are a few prerequisites before we begin:

  • Clone this starter-kit repo.

  • Install all the node_modules with npm install (Node.js v20+ is preferred).

  • You need an OpenAI account. You’ll need to add a minimum of $5 in credits to get started.

  • Enable gpt-4o-mini: in your OpenAI dashboard, go to Limits -> Allowed models.

That’s pretty much it. Let’s get the proverbial 🍞.


📜 Primer

Let’s get down to the basics:

An LLM (Large Language Model) is a deep learning model trained on vast amounts of text data. It learns patterns in language with one clear objective: to predict the next most likely token (a word or piece of a word) in a sequence.

LLMs break text into tokens (sub-words/characters), with typical context windows ranging from roughly 2k to 120k tokens; the larger the context window, the more text the model can consider at once. During training it learns the statistical patterns in that text, and at inference time it takes new, unseen input, processes it through its transformer layers, and uses those learned patterns to generate tokens as the response.
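
To make “tokens” and “context window” a little more concrete, here’s a tiny back-of-the-envelope sketch. The 4-characters-per-token figure is only a rough rule of thumb for English text with OpenAI-style tokenizers, and estimateTokens is a made-up helper, not a real tokenizer:

```typescript
// Rough intuition only: real tokenizers (BPE) split text differently,
// but ~4 characters per token is a common rule of thumb for English.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4)

console.log(estimateTokens('Building Your Own LLM Agent From Scratch')) // roughly 10 tokens
```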

The transformer, the deep learning architecture behind GPT-style LLMs, builds on the attention mechanism (proposed around 2014) and was introduced in this paper:
"Attention Is All You Need" (the original Transformer paper)

Enough talk. Let’s code :)


🌎 Hello World

Let’s write our first bit of LLM code, akin to the classic “hello world”. This should be fun :)

  1. Add the API key in your .env file.

     OPENAI_API_KEY='[your api key]'
    
  2. Create a one-off LLM call function called runLLM. This function simply takes the conversation messages and returns the LLM response.

    The one new thing here is the temperature parameter, which controls the randomness of the text the LLM generates. It ranges from 0 to 2; lower values give more focused, deterministic output, higher values more varied output.

     import type { AIMessage } from '../types'
     import { openai } from './ai'
    
     export const runLLM = async ({
       model = 'gpt-4o-mini',
       messages,
       temperature = 0.1,
     }: {
       messages: AIMessage[]
       temperature?: number
       model?: string
     }) => {
       const response = await openai.chat.completions.create({
         model,
         messages,
         temperature
       })
    
       return response.choices[0].message
     }
    
  3. Now inside the index.ts file, let’s use this runLLM function:

     import 'dotenv/config'
     import {runLLM} from './src/llm'
    
     const userMessage = process.argv[2]
    
     if (!userMessage) {
       console.error('Please provide a message')
       process.exit(1)
     }
    
     const response = await runLLM({
         messages: [{role: 'user', content: userMessage}]
     })
    
     console.log(response)
    
  4. Time for some action! Head to your terminal of choice (iTerm, Command Prompt, or whatever you like; I personally use Warp) and navigate to the project directory. Run the following command:

     npm run start "hello"
    

Voila! If everything went well, you will see your first LLM response coming back through the API.
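
The object we log is the raw assistant message returned by the API, so it should look roughly like this (the exact fields can vary a bit by SDK version):

{
  role: 'assistant',
  content: 'Hello! How can I help you today?'
}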


💭 Chat Memory

So far so good. We managed to build a one-off chat LLM response. However, its limitations show up pretty quickly the moment you need context from earlier messages in the conversation.

This is what’s called a chat-based interaction. Here the LLM doesn’t just do text completion; it maintains conversation history and understands context from previous messages. It’s the same principle that ChatGPT and Claude operate on.

Now, you may be thinking: if an LLM needs to maintain context by saving the chat history, how does it store all the messages from the beginning, and does it need to differentiate between messages based on who sent them?

I’m so glad you asked :)

This is where chat memory comes in. Its primary job is to maintain the collection of previous messages in the conversation, keeping context intact across multiple interactions with the LLM.

It uses the following message roles to maintain the conversation context. The ‘user’ and ‘assistant’ roles are self-explanatory; we’ll get to the ‘system’ and ‘tool’ roles pretty soon:

// System message - Sets behavior
{
  role: 'system',
  content: 'You are a helpful assistant...'
}

// User messages - Human inputs
{
  role: 'user',
  content: 'What is the weather like?'
}

// Assistant messages - LLM responses
{
  role: 'assistant',
  content: 'Let me check the weather for you',
  tool_calls: [...]
}

// Tool messages - Function results
{
  role: 'tool',
  content: '{"temp": 72, "conditions": "sunny"}',
  tool_call_id: 'call_123'
}
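
On every call, the whole conversation is sent to the model as one ordered array (oldest message first), so a single request might carry something like:

[
  { role: 'system', content: 'You are a helpful assistant...' },
  { role: 'user', content: 'What is the weather like?' },
  { role: 'assistant', content: 'Let me check the weather for you', tool_calls: [...] },
  { role: 'tool', content: '{"temp": 72, "conditions": "sunny"}', tool_call_id: 'call_123' }
]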

Let’s add some memory to our runLLM function now:

  1. Add the following code in the memory.ts file. We are using lowdb here; it gives us a hassle-free local JSON database for Node projects. The functions we are interested in here are addMessages and getMessages.

     import { JSONFilePreset } from 'lowdb/node'
     import type { AIMessage } from '../types'
     import { v4 as uuidv4 } from 'uuid'
    
     export type MessageWithMetadata = AIMessage & {
       id: string
       createdAt: string
     }
    
     export const addMetadata = (message: AIMessage): MessageWithMetadata => ({
       ...message,
       id: uuidv4(),
       createdAt: new Date().toISOString(),
     })
    
     export const removeMetadata = (message: MessageWithMetadata): AIMessage => {
       const { id, createdAt, ...messageWithoutMetadata } = message
       return messageWithoutMetadata
     }
    
     type Data = {
       messages: MessageWithMetadata[]
     }
    
     const defaultData: Data = { messages: [] }
    
     export const getDb = async () => {
       const db = await JSONFilePreset<Data>('db.json', defaultData)
    
       return db
     }
    
     export const addMessages = async (messages: AIMessage[]) => {
       const db = await getDb()
       db.data.messages.push(...messages.map(addMetadata))
       await db.write()
     }
    
     export const getMessages = async () => {
       const db = await getDb()
       return db.data.messages.map(removeMetadata)
     }
    
  2. Update the index.ts file to incorporate the memory feature now.

     import 'dotenv/config'
     import { runLLM } from './src/llm'
     import { addMessages, getMessages } from './src/memory'
     import { showLoader } from './src/ui'

     const userMessage = process.argv[2]

     if (!userMessage) {
       console.error('Please provide a message')
       process.exit(1)
     }

     // save the user's message before calling the LLM
     await addMessages([
       {
         role: 'user',
         content: userMessage,
       },
     ])

     const loader = showLoader('Thinking...')

     // send the full history so the model keeps the context
     const history = await getMessages()

     const response = await runLLM({
       messages: history,
     })

     // save the assistant's reply for the next run
     await addMessages([response])

     loader.stop()
     console.log(response)
    
  3. If everything went well, you can now ask the LLM multi-part questions, with context maintained from previous interactions. Kudos, our LLM is a tad bit smarter now :)
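
A quick way to check is two back-to-back runs (the second run assumes the db.json from the first run is still around); if memory is working, the second answer should mention football:

     npm start "My name is Nischal and I love football"
     npm start "What sport do I love?"

You can also peek at the db.json file lowdb created in the project root to see the stored messages, each with the id and createdAt that addMetadata attaches.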


🤖 An LLM with the ability to make decisions

We made our LLM able to retain the context of our conversation. However, it’s still not intelligent enough. Don’t believe me? Ask something like “What is the weather today?” and see for yourself.

This is primarily because making decisions or driving toward an end goal isn’t an LLM’s job on its own. That is precisely where an agent comes in.

An LLM agent is, simply put, an LLM that can make decisions. It makes those decisions by keeping the context from previous interactions, choosing the right external tool whenever it needs help to reach an end goal, and running in a loop until it gets there.
It wouldn’t be wrong to call it an LLM on steroids :)
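
In code, the core of an agent boils down to a loop. Here’s a simplified preview of what we’ll build over the next steps; it only reuses the runLLM and memory helpers we’ve already written, and it is a sketch, not the final runAgent:

```typescript
// Conceptual sketch only; the real runAgent is built step by step below.
import { runLLM } from './llm'
import { addMessages, getMessages } from './memory'

export const agentLoopSketch = async () => {
  while (true) {
    // ask the LLM with the full conversation history
    const response = await runLLM({ messages: await getMessages() })
    await addMessages([response])

    if (response.content) return getMessages() // plain-text answer -> goal reached
    if (response.tool_calls) {
      // the LLM asked for a tool: run it, save its result, then loop again (next section)
    }
  }
}
```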

Some real-world examples of AI agents that you can check out right now:

  • HubSpot's Service Hub

  • Intercom's Resolution Bot

  • Zendesk Answer Bot

Time to build an agent around our runLLM function:

  1. Add the following code in our index.ts file:

    ```typescript
    import 'dotenv/config'
    import { runAgent } from './src/agent'

    const userMessage = process.argv[2]

    if (!userMessage) {
      console.error('Please provide a message')
      process.exit(1)
    }

    const messages = await runAgent({
      userMessage,
    })
    ```


2. Create a `runAgent` function in our `agent.ts` file:

    ```typescript
    import type { AIMessage } from '../types'
    import { runLLM } from './llm'
    import { z } from 'zod'
    import { addMessages, getMessages, saveToolResponse } from './memory'
    import { logMessage, showLoader } from './ui'

    export const runAgent = async ({
      userMessage,
    }: {
      userMessage: string
    }) => {
      await addMessages([
        {
          role: 'user',
          content: userMessage,
        },
      ])

      const loader = showLoader('Thinking...')


      const history = await getMessages()
      const response = await runLLM({
        messages: history,
      })

      await addMessages([response])

      logMessage(response)
      loader.stop()
      return getMessages()
    }
    ```

  3. Modify the runAgent function to enable the agent loop. This makes sure our agent keeps running until it reaches an end goal, which is especially needed when the agent has to perform multiple functions and the input of one function depends on the output of a previous one. It’s engineered to solve problems such as these:

    Example: "Book me a trip to New Delhi"

    • Search flights.

    • Check hotel availability.

    • Compare prices.

    • Make bookings.

    • Send confirmations.

This task requires multiple operations: gathering information, processing the results, making a decision, and then taking further action. We’ll get to functions in a moment.

Modify the agent.ts file:

    import type { AIMessage } from '../types'
    import { runLLM } from './llm'
    import { z } from 'zod'
    import { runTool } from './toolRunner'
    import { addMessages, getMessages, saveToolResponse } from './memory'
    import { logMessage, showLoader } from './ui'

    export const runAgent = async ({
      turns = 10,
      userMessage,
      tools = [],
    }: {
      turns?: number
      userMessage: string
      tools?: { name: string; parameters: z.AnyZodObject }[]
    }) => {
      await addMessages([
        {
          role: 'user',
          content: userMessage,
        },
      ])

      const loader = showLoader('Thinking...')

      while (true) {
        const history = await getMessages()
        const response = await runLLM({
          messages: history,
          tools,
        })

        await addMessages([response])

        logMessage(response)

        if (response.content) {
          loader.stop()
          return getMessages()
        }

        if (response.tool_calls) {
          //tooling will come here...
        }
      }
    }

🛠️ Function Calling and Tooling

Now that we have our end-to-end LLM structure built, let’s add the missing piece of the puzzle: function calling. We observed earlier that agents can decide which external helpers to use to solve a particular query and generate a response; function calling is precisely the mechanism that makes this possible. An LLM uses function calling to get help from external tools when it infers that it can’t solve the problem with just itself and the context provided so far.

Function calling allows LLMs to:

  • Convert natural language into structured function calls.

  • Select appropriate functions based on user intent.

  • Format parameters according to function specifications.

Here is the process an LLM goes through to decide on the “correct” external tool with function calling (we’ll look at what the resulting tool call actually looks like right after this list):

  1. Intent Recognition

    • LLM analyses user message for action intent.

    • Matches intent against function descriptions.

    • Evaluates parameter availability.

  2. Function Matching

    • Clear matches: "What's the weather?" → get_weather.

    • Ambiguous matches: LLM chooses based on context.

    • No matches: Regular response without function call.
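
To make that concrete: when the model decides to call a function, the assistant message it returns has no plain text content; instead it carries a tool_calls array shaped roughly like this per the OpenAI Chat Completions API (the id and arguments here are made-up examples):

{
  role: 'assistant',
  content: null,
  tool_calls: [
    {
      id: 'call_abc123',
      type: 'function',
      function: {
        name: 'get_weather',
        arguments: '{"city": "New Delhi"}'
      }
    }
  ]
}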

To connect the dots: AI tools are precisely those “external” functions that fulfil a specific purpose and are called upon by the LLM whenever the need comes up. Let’s see it in action:

  1. Add the following code in the llm.ts file. This enables our runLLM function to accept tools as a parameter.

    ```typescript
    import { zodFunction } from 'openai/helpers/zod'
    import { z } from 'zod'
    import type { AIMessage } from '../types'
    import { openai } from './ai'

    export const runLLM = async ({
      model = 'gpt-4o-mini',
      messages,
      temperature = 0.1,
      tools,
    }: {
      messages: AIMessage[]
      temperature?: number
      model?: string
      tools?: { name: string; parameters: z.AnyZodObject }[]
    }) => {
      const formattedTools = tools?.map((tool) => zodFunction(tool))

      const response = await openai.chat.completions.create({
        model,
        messages,
        temperature,
        tools: formattedTools,
        tool_choice: 'auto',
        parallel_tool_calls: false,
      })

      return response.choices[0].message
    }
    ```


2. Modify the `agent.ts` file to enable function calling.

    ```typescript
    import type { AIMessage } from '../types'
    import { runLLM } from './llm'
    import { z } from 'zod'
    import { runTool } from './toolRunner'
    import { addMessages, getMessages, saveToolResponse } from './memory'
    import { logMessage, showLoader } from './ui'

    export const runAgent = async ({
      turns = 10,
      userMessage,
      tools = [],
    }: {
      turns?: number
      userMessage: string
      tools?: { name: string; parameters: z.AnyZodObject }[]
    }) => {
      await addMessages([
        {
          role: 'user',
          content: userMessage,
        },
      ])

      const loader = showLoader('Thinking...')

      while (true) {
        const history = await getMessages()
        const response = await runLLM({
          messages: history,
          tools,
        })

        await addMessages([response])

        logMessage(response)

        if (response.content) {
          loader.stop()
          return getMessages()
        }

        if (response.tool_calls) {
          const toolCall = response.tool_calls[0]
          loader.update(`executing: ${toolCall.function.name}`)

          const toolResponse = await runTool(toolCall, userMessage)
          await saveToolResponse(toolCall.id, toolResponse)

          loader.update(`executed: ${toolCall.function.name}`)
        }
      }
    }
    ```

  3. We also need to add a new method in the memory.ts file. This enables storing the tool response in our local db.

     export const saveToolResponse = async (
       toolCallId: string,
       toolResponse: string
     ) => {
       return await addMessages([
         { role: 'tool', content: toolResponse, tool_call_id: toolCallId },
       ])
     }
    
  4. Let’s create a tool called generateImage inside a new file called generateImage.ts, inside a new folder called tools. Add the following code in that file:

     import type { ToolFn } from '../../types'
     import { openai } from '../ai'
     import { z } from 'zod'
    
     export const generateImageToolDefinition = {
       name: 'generate_image',
       parameters: z
         .object({
           prompt: z
             .string()
             .describe(
               'The prompt to use to generate the image with a diffusion model image generator like Dall-E'
             ),
         })
         .describe('Generates an image and returns the url of the image.'),
     }
    
     type Args = z.infer<typeof generateImageToolDefinition.parameters>
    
     export const generateImage: ToolFn<Args, string> = async ({
       toolArgs,
       userMessage,
     }) => {
       const response = await openai.images.generate({
         model: 'dall-e-3',
         prompt: toolArgs.prompt,
         n: 1,
         size: '1024x1024',
       })
    
       const imageUrl = response.data[0].url!
    
       return imageUrl
     }
    
  5. Now, we will need to create a runTool method inside the toolRunner.ts file. This is where the tool-selection logic sits; our runAgent function uses it to execute the requested tool and get its response.

     import type OpenAI from 'openai'
     import { generateImage } from './tools/generateImage'
    
     export const runTool = async (
       toolCall: OpenAI.Chat.Completions.ChatCompletionMessageToolCall,
       userMessage: string
     ) => {
       const input = {
         userMessage,
         toolArgs: JSON.parse(toolCall.function.arguments),
       }
       switch (toolCall.function.name) {
         case 'generate_image':
           const image = await generateImage(input)
           return image
         default:
           throw new Error(`Unknown tool: ${toolCall.function.name}`)
       }
     }
    
  6. Inside the tools folder, we will need an index.ts file to collect all our tool definitions. Since generate_image is the only tool we’ve built in this guide, it looks like this:

     import { generateImageToolDefinition } from './generateImage'
    
     export const tools = [generateImageToolDefinition]
    
  7. Final step! Let’s pass our tool definitions list to the runAgent call in index.ts:

     import 'dotenv/config'
     import { runAgent } from './src/agent'
     import { tools } from './src/tools'
    
     const userMessage = process.argv[2]
    
     if (!userMessage) {
       console.error('Please provide a message')
       process.exit(1)
     }
    
     const messages = await runAgent({
       userMessage,
       tools,
     })
    
  8. Time to take it for a spin! Go ahead and paste the following command in your terminal.

     npm start "Hi! Create an image of ancient romans learning chat-gpt"
    

    Did you get a response with an image URL? With LLMs you get all sorts of results, but it should be thereabouts.


✨ Conclusion and Why this matters

And that’s pretty much it! Through this read, we were (hopefully) able to build an AI agent that:

  • Is chat-based.

  • Maintains context.

  • Works in a loop with the goal of solving the problem.

  • Leverages tools with function calling.

There are tons of ways to improve this agent flow and take it to the next level: RAG (Retrieval-Augmented Generation), advanced memory management, human-in-the-loop function calling, evals (evaluation frameworks), advanced tool usage, and many more.

More of that later :)

I really hope this guide helped you create your first AI agent. If you have any questions, feel free to ask me in the comments or on my socials and share your thoughts! Keep building.
