AI Security: Vulnerabilities and Challenges

Not a lot of people are talking about it, the security of a large percentage of agentic systems is weak.

The truth is that, at this point, only a few people care about the security of AI systems and workflows. The tricky part is that to compromise these systems, most of the time, you don’t have to be too technical or too smart, all you need is the ability to use words well to manipulate.

AI systems being vulnerable to prompt injection is actually a critical issue that is quite overlooked. As interconnectivity expands, vulnerability becomes more impactful across different areas. The other day, we explored how tool poisoning could be used to hijack MCP-based systems and be used to exfiltrate data.

The danger that vulnerability to prompt injection presents is unprecedented, and it will be the downfall of many AI systems. It is similar to SQLi injections, which caused a lot of damage to infrastructure in the late 2000s. Prompt injection is actually easier and more dangerous. It can be used to hijack systems and get them to carry out malicious intent.

Why This Matters Now

AI is rapidly being integrated into healthcare, finance, defense, education, and critical infrastructure. With the surge in agentic AI and autonomous systems, the cost of failure has skyrocketed. It’s no longer just about chatbots misbehaving, it’s about entire systems being manipulated to leak confidential data, approve fraudulent transactions, or give false diagnoses. The silence on this issue is worrying.

How Do We Protect This Achilles’ Heel?

I did a lot of digging and brainstorming trying to figure out the best way to protect against the vulnerabilities in AI systems, and here are some tips you could apply to protect yourself:

1. NEVER GIVE THE SYSTEM EXPLICIT ACCESS TO SENSITIVE DATA:

Try your best to ensure your AI system doesn’t have direct access to sensitive data like API keys, database credentials, user tokens, private documents, internal tools, or any privileged commands. The reason is simple: AI systems can be manipulated. If a malicious prompt gets through and it will; it can convince the system to reveal, forward, or act on that data in ways you didn’t anticipate.

Don’t assume that just because your model is running “server-side” or “behind the scenes,” it’s safe. Prompt injection works because it abuses the logic of the system, not just the access level. Even something as seemingly harmless as asking the AI to “summarize recent logs” could be exploited to make it leak confidential info if those logs include tokens or personal data.

Use indirection wherever possible. Instead of handing the AI the key, build middleware APIs or isolated services that require validation or limit what can be exposed. Keep AI in a read-only context with strict filters and redactions, especially when dealing with anything production-facing or sensitive.

At the end of the day, your AI should act like someone on a need-to-know basis and most times, it doesn’t need to know.

2. Flag Malicious Prompts with Classifiers:

Classifiers are systems that are used to determine if a prompt is malicious or not but here’s the tricky part: you can’t always distinguish a malicious prompt from a normal one, and this can dwarf the efficiency of the entire system. When genuine prompts get blocked, it annoys genuine users and locks them out from using the system.

Using Classifiers is one sure fire way to keep malicious prompts from being executed by your AI system, but there has to be a balance to where you use it. So, you don’t end up pissing off your actual users for genuine prompts.

3. Sandbox Actions:

Sandbox everything, every prompt, especially those that lead to the system taking actions. Those actions should be executed in a sandbox. That way, the system’s internal structure will be protected if the prompt turns out to be malicious.

We need to start building with security in mind not as an afterthought. Developers, startups, and researchers must adopt the mindset that every AI system is a potential attack surface. Just like how web developers learned the hard way to sanitize input fields, AI builders need to assume prompts can be hostile.

This requires new tooling, new awareness, and perhaps most importantly, a community-wide shift in how we think about AI capabilities vs. AI safety.

Security is the Achilles’ heel of AI systems.

It’s not flashy, it’s not trendy but it’s what will determine whether these systems thrive or collapse under the weight of exploitation. We’re building the future, but unless we start securing it, we’re also building the cracks it will fall through.

Let’s not repeat the same mistakes we made with the early web.

Security: The Achilles’ Heel of AI Systems

Why This Matters Now

How Do We Protect This Achilles’ Heel?

Subscribe to my newsletter

Paul Fruitful

Paul Fruitful