AI Agent Discussion - Core Elements of Local Agents

Table of contents
- Video Link
- Series Article Catalog
- Foreword
- 1. The Philosophy of Choosing Large Language Models (LLM)
- 2. Understanding the Importance of "Token" and "Role Prompt"
- 3. LLM Combined with Other Technologies: Building Small Smart Robots
- 4. Humans and AI: A Symbiotic Future
- 5. Labor Market Transformation and Preparation
- Conclusion
Video Link
AI Agent Discussion - Opening Remarks and AI Agent Related Terms
Series Article Catalog
A total of six articles
AI Agent Discussion -- Core Elements of Local Agents
AI Agent Discussion -- Where We Go From Here: Toward the Universe
AI Agent Discussion -- Let Politics Be a Stable Cornerstone (To be continued)
AI Agent Discussion -- Irreplaceable Art Creation (To be continued)
AI Agent Discussion -- Self-Help for the Labor Market Transformation (To be continued)
AI Agent Discussion -- How to Realize Gundam Haro (To be continued)
Tips: LLM, AI Agent, Token, Prompt, RAG, MCP
Foreword
[AI Agent] ←→ (LLM + RAG + MCP)
This article explores the concept and constituent elements of an AI Agent, and in particular how to implement an intelligent application locally. It introduces how Large Language Models (LLM), Retrieval Augmented Generation (RAG), and the Model Context Protocol (MCP) work together to build a capable local AI Agent. Such an Agent can run directly on a personal computer without cloud dependence, handling voice interaction, understanding semantics, and executing tasks, opening up possibilities for future intelligent living.
| Term | Definition |
| --- | --- |
| AI Agent | An intelligent entity capable of perceiving the environment, making decisions, and performing actions to complete specific tasks. |
| LLM (Large Language Model) | The "brain" of the Agent, responsible for understanding natural language, generating responses, and performing reasoning. |
| RAG (Retrieval Augmented Generation) | A method that combines information retrieval with language generation, allowing the LLM to consult external knowledge bases and produce more accurate responses. |
| MCP (Model Context Protocol) | An interface the Agent uses to communicate with external devices or services, such as voice input/output and hardware control. |
| Token | The basic unit a language model uses to process text; it can be a word, subword, or character. |
| Prompt | Instructions or context provided to the language model to guide it toward a specific type of response. |
AI Agent = LLM + RAG + MCP
The core of AI Agent is LLM, which provides language understanding and generation capabilities. RAG enables LLM to have a richer knowledge background, especially in scenarios requiring precise information (although RAG may not be needed in simple ordering applications). MCP is the "hands and feet" of the Agent, enabling it to interact with various tools and interfaces in the real world, such as speech recognition, speech synthesis, and even future hardware control.
This architecture allows developers to build a fully functional AI Agent locally, realizing human-computer interaction, automated task processing, and laying the foundation for the popularization of personalized intelligent assistants in the future.
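The division of labor above can be sketched in code. The following is a minimal, hypothetical sketch (every function and name here is invented for illustration; a real agent would wrap an actual local model and a real MCP client): the LLM decides, RAG supplies facts, and an MCP-style tool acts on the world.

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for a local LLM: picks a canned reply by keyword."""
    if "dumpling" in prompt:
        return "CALL_TOOL:checkout:dumplings"
    return "Sorry, I did not understand."

def retrieve(query: str, knowledge: dict) -> str:
    """Tiny RAG stand-in: naive keyword lookup in a local knowledge base."""
    for key, fact in knowledge.items():
        if key in query:
            return fact
    return ""

def checkout_tool(item: str) -> str:
    """Stand-in for an MCP tool, e.g. a cash-register service."""
    return f"Checked out: {item}"

def agent(user_text: str) -> str:
    knowledge = {"dumpling": "Pork dumplings, 10 pieces, $6."}
    tools = {"checkout": checkout_tool}            # MCP-style tool registry
    context = retrieve(user_text, knowledge)       # RAG step: fetch facts
    reply = fake_llm(f"{context}\n{user_text}")    # LLM step: decide what to do
    if reply.startswith("CALL_TOOL:"):             # tool step: act on the world
        _, name, arg = reply.split(":")
        return tools[name](arg)
    return reply

print(agent("I want a dumpling set"))
```

Swapping `fake_llm` for a real local model and `checkout_tool` for a real MCP tool leaves the loop itself unchanged; that loop is the essence of the Agent.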
1. The Philosophy of Choosing Large Language Models (LLM)
In the development of local AI Agents, choosing the right LLM is a crucial step. There are many different AI models on the market, such as GPT, Claude, and Gemini. These models not only differ in speed, but more importantly, their overall style, capabilities, and way of thinking are also different.
You can imagine each model as an animal with different "genes," because the data, architecture, and goals used in their training are different, so they have developed unique "personalities."
For example: (personal opinion)
- "Fanged and Clawed" Hunter-type Model: Good at quickly searching for data and grasping key points.
- "Winged" Bird-type Model: Capable of divergent thinking, extending imagination, and developing creativity.
- "Flexible Hands and Feet" Tool Master-type Model: Proficient in calling APIs, executing processes, and acting as an assistant.
Therefore, in the future, when choosing a model, we should not just ask "which is the strongest," but rather ask: "I want to do this, which model is most suitable?" This is a very important aspect of AI tool application. In addition, many optimized small language models for professional use will emerge in the future, and even with only 1-2B parameters, they can fully meet specific needs.
Take the voice-ordering demo of a dumpling-shop AI Agent as an example. Since it needs to call MCP tools, we must choose a local model that is good at tool use; and to keep responses fast, we want a smaller model that still supports tools. At present, small models with MCP tool support sit around 7-8B parameters, such as LLaMA and Qwen.
Although both can seemingly call tools through the MCP protocol to perform tasks, in practice their styles differ greatly.
- LLaMA is more like someone who gets things done: As soon as it receives a task, it first thinks "what tools can I use to solve it?" It has strong initiative, like growing tentacles, and can immediately operate the system and call tools to execute tasks.
- Qwen is like a very creative brain: It is good at expanding ideas, extending semantics, and handling open-ended questions, like flying freely in the language sky, suitable for tasks that require imagination or creativity. (Personal opinion, for reference only)
This illustrates: Different AI models have differences in design philosophy, and they are suitable for different application scenarios. We cannot expect to solve all problems with one model, but rather to "choose the right tool for the right job."
When developing the voice ordering system, we chose LLaMA as the main model because this scenario does not require the model to "use its imagination." We only need it to stably call the correct tools to complete the task and allow customers to order smoothly. At this time, LLaMA's style is particularly suitable.
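The "stably call the correct tool" behavior can be illustrated with a small dispatcher. This is a hedged sketch, not LLaMA's actual tool-call format: we simply assume the model emits a JSON object naming a tool and its arguments, and the agent looks the tool up in a registry and executes it. All tool names and prices are invented.

```python
import json

# Hypothetical menu prices and tool registry for the ordering agent.
PRICES = {"dumplings": 6, "soup": 3}
TOOLS = {
    "add_item": lambda order, item: order + [item],
    "total": lambda order: sum(PRICES[i] for i in order),
}

def dispatch(model_output: str, order: list):
    """Parse a (simulated) tool call emitted by the model and run it."""
    call = json.loads(model_output)   # e.g. {"tool": "add_item", "args": ["dumplings"]}
    fn = TOOLS[call["tool"]]
    return fn(order, *call["args"])

order = []
order = dispatch('{"tool": "add_item", "args": ["dumplings"]}', order)
order = dispatch('{"tool": "add_item", "args": ["soup"]}', order)
total = dispatch('{"tool": "total", "args": []}', order)
print(order, total)
```

The point of the design: the model only *names* the action; deterministic code performs it, which is exactly the "gets things done" style we want from the ordering agent.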
2. Understanding the Importance of "Token" and "Role Prompt"
❓"Is it better to set more Tokens?"
When using large language models (LLM), we often hear a keyword called "token." You can imagine it as a small unit that AI uses to understand sentences.
For example, if you enter this sentence: 👉"The weather is refreshing today"
AI may break it down into several tokens, like this:
- Today
- Weather
- Is
- Refreshing

This totals 4 tokens. Each token occupies the AI's "thinking space" and adds to its computing cost.
So a question arises: is it better to set more tokens? For servers, usually yes. But for local AI Agent applications, the answer is: not necessarily. Local models are usually smaller and therefore more prone to hallucinations, and a local AI Agent must also weigh energy consumption and efficiency:
- More tokens → slower processing → more resource consumption.
- Hallucinations not only produce incorrect information but also waste energy.
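To make the cost concrete, here is a toy tokenizer. Real LLM tokenizers use subword schemes such as BPE, so their counts differ from both this whitespace split and the article's 4-token breakdown; the only point is that longer input means more tokens and more work.

```python
def toy_tokenize(text: str) -> list:
    """Naive whitespace tokenizer; real models use subword schemes like BPE."""
    return text.split()

sentence = "The weather is refreshing today"
tokens = toy_tokenize(sentence)
print(tokens, len(tokens))

# More text -> more tokens -> more compute (and energy) per response.
longer = " ".join([sentence] * 10)
print(len(toy_tokenize(longer)))
```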
"Role Prompt": Defining AI's Work and Behavior
In addition to tokens, there is another super important thing called "role prompt."
Simply put, a prompt is what you tell the AI its job is at the beginning. It's like writing an "onboarding manual" for it, clearly telling it:
👉"You are now at the counter, responsible for taking customer orders. Remember to speak kindly, understand what the customer wants to eat, and then correctly help them check out."
This prompt will determine the AI's tone, style, and how it handles tasks, which is very crucial!
For example, when developing a voice counter ordering machine, the most common problem encountered is: The customer ordered a set meal, but the AI calculated the price as if it were à la carte, and the amount was completely wrong...
At first, you might suspect: "Is there a problem with the RAG setting? Is something not connected properly?"
Sometimes the issue is that the prompt wasn't clear enough. The AI didn’t know when to apply the set meal price versus individual item pricing, because it wasn’t explicitly told which pricing method to use in each situation.
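One way to fix that pricing bug is to spell the rule out in the role prompt and keep the arithmetic in deterministic code. Below is a hedged sketch: the prompt text, menu, and `price_order` helper are all invented for illustration, but they show the pattern of stating the set-meal rule explicitly and computing the total outside the model.

```python
# Hypothetical role prompt: tells the model its job AND the pricing rule.
SYSTEM_PROMPT = (
    "You are the counter clerk of a dumpling shop. Speak kindly. "
    "If the customer orders a set meal, charge the set price, NOT the "
    "sum of the individual items. Call the checkout tool with the total."
)

A_LA_CARTE = {"dumplings": 6, "soup": 3, "tea": 2}
SET_MEALS = {("dumplings", "soup", "tea"): 9}   # set price beats item sum (11)

def price_order(items: list) -> int:
    """Deterministic pricing: prefer a matching set meal over a la carte."""
    key = tuple(sorted(items))
    for combo, price in SET_MEALS.items():
        if tuple(sorted(combo)) == key:
            return price
    return sum(A_LA_CARTE[i] for i in items)

print(price_order(["dumplings", "soup", "tea"]))  # set-meal price, not item sum
print(price_order(["dumplings", "tea"]))          # a la carte
```

Because the model never does the arithmetic, the "set meal billed as à la carte" failure cannot happen even when the prompt is imperfect.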
3. LLM Combined with Other Technologies: Building Small Smart Robots
Many developments are now moving in one direction: LLM + RAG + MCP = Small Smart Robot.
Taking "Local AI Smart Ordering Machine" as an example, this concept is easy to understand:
LLM (Large Language Model) is the "brain" of the entire system, responsible for understanding what customers say and how to respond.
RAG (Knowledge Retrieval) is used to help LLM find information, just like fragments of the brain's memory. However, if it's just ordering, RAG may not necessarily be needed, because you can directly provide the "menu" as a prompt to the LLM beforehand, and it will know what meals can be ordered.
MCP (Model Context Protocol) is the channel for "external communication," such as:
- Listening to voice: Using speech recognition tools to convert what the customer says into text.
- Playing voice: Asking the AI to read out the response.
- Cash register MCP: Directly helping customers check out.
When these three parts (LLM + RAG + MCP) are integrated, it is essentially a local small AI robot, except that it cannot walk or wave, but it can interact with people, understand voice, and help process tasks.
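The ordering-machine pipeline can be sketched end to end. Everything below is a stand-in: `listen` and `speak` would be real speech-recognition and text-to-speech MCP tools, `cash_register` a real checkout MCP tool, and `brain` a real local LLM; the menu is handed over directly, illustrating the "no RAG needed for simple ordering" point.

```python
def listen() -> str:
    """Stand-in for a speech-recognition MCP tool."""
    return "two orders of dumplings please"

def brain(text: str, menu: dict):
    """Stand-in for the LLM: parse quantity and item, compute the total."""
    qty = 2 if "two" in text else 1
    item = next(name for name in menu if name in text)
    return item, qty * menu[item]

def speak(text: str) -> str:
    """Stand-in for a text-to-speech MCP tool."""
    return f"[speaker] {text}"

def cash_register(total: int) -> str:
    """Stand-in for a cash-register MCP tool."""
    return f"charged ${total}"

menu = {"dumplings": 6}          # menu given in the prompt, so no RAG step
heard = listen()                 # MCP: voice in
item, total = brain(heard, menu) # LLM: understand and decide
print(speak(f"That is {item}, total ${total}."))  # MCP: voice out
print(cash_register(total))      # MCP: checkout
```

Replace each stub with a real tool and the same four-step loop becomes the "small AI robot" described above: it cannot walk or wave, but it listens, understands, speaks, and checks out.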
4. Humans and AI: A Symbiotic Future
Now that AI language models are progressing rapidly, many people worry: "Will robots replace humans in the future?" But I want to say:
Humans themselves are super-efficient designs: Whether you believe we were created by God or aliens, or evolved naturally—the human body has long been "optimized" in terms of energy consumption, flexibility, and self-repair, making it the most cost-effective "full-function creature" on Earth today.
So, rather than saying AI or robots will "replace humans," it's better to say that they are indispensable technologies for civilization to move towards interstellar travel, helping us complete more repetitive and cumbersome tasks.
5. Labor Market Transformation and Preparation
With the evolution of the times, the transformation of work patterns seems inevitable. Currently, what will directly affect jobs is not the walking and jumping robots, but the AI assistants that handle miscellaneous tasks. This kind of AI can help send emails, arrange schedules, and even automatically write reports or process data. One person with an AI assistant can complete the work that used to require two or three people. Therefore, companies are likely to reduce labor demand as a result.
However, there's no need to worry too much. This is part of the progress of the times. As long as we understand how to cooperate and work with these AIs, we will become more competitive. Moreover, in the next article, we will tell you that in the future, there will actually be more job opportunities than now, but we need to learn different skills.
Conclusion
AI Agents are evolving rapidly, no longer confined to the cloud but becoming localized intelligent partners around us. Imagine an AI Agent on a laptop that can help take orders; its speed still needs improvement today, but the next few years will bring significant progress. These Agents are not feats of complex engineering: they are built by integrating existing software tools, and can be assembled even without a programming background.
Currently, these local AI Agent systems are like toddlers—not fast enough, and may misunderstand intentions, but their potential is infinite. The popularization of this technology heralds a key shift: AI Agent is becoming a fundamental technology, not just a tool. It will profoundly affect our careers, industries, politics, and even art. This "AI Agent Discussion" series aims to delve into these changes, depicting a future where AI and robots not only do not replace humans, but instead become indispensable technologies that drive civilization towards greater goals, helping us cope with complex and arduous challenges.