Context Engineering for AI Agents: Key Lessons from Manus

Context engineering is emerging as one of the most critical disciplines in AI development, yet it remains largely experimental. Unlike traditional software engineering, where best practices have been established over decades, context engineering is still in its "Wild West" phase—full of trial, error, and hard-won insights.

On July 18, 2025, Yichao “Peak” Ji, Co‑Founder and Chief Scientist of Manus AI, shared Manus's production experience building real-world AI agents, offering a rare glimpse into the practical realities of context engineering at scale. These insights, earned through four complete rebuilds of the agent framework, provide valuable lessons for anyone serious about building production AI systems.

The Performance Reality: KV-Cache as the North Star

Perhaps the most striking insight from Manus is their emphasis on KV-cache hit rate as the single most important metric for production AI agents. This isn't just a technical optimization—it's a fundamental architectural constraint that shapes everything.

Why KV-Cache Matters More Than You Think

In production AI agents, the context grows with every step while outputs remain relatively short. Manus reports an average input-to-output token ratio of around 100:1—dramatically different from typical chatbot scenarios. This makes prefix caching not just useful, but essential for economic viability.

The numbers are stark: with Claude Sonnet, cached tokens cost $0.30/MTok while uncached tokens cost $3.00/MTok—a 10x difference. For a system processing millions of interactions, this isn't just about performance; it's about survival.
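
To make the economics concrete, here is a back-of-the-envelope sketch. Only the two per-token rates come from the figures above; the blended-cost formula and the sample hit rates are our own illustration:

```python
# Blended input cost per million tokens at a given KV-cache hit rate,
# using the Claude Sonnet rates quoted above.
CACHED, UNCACHED = 0.30, 3.00  # $ per million input tokens

def input_cost_per_mtok(hit_rate: float) -> float:
    return hit_rate * CACHED + (1.0 - hit_rate) * UNCACHED

for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: ${input_cost_per_mtok(rate):.2f}/MTok")
# hit rate 0%: $3.00/MTok; 50%: $1.65/MTok; 90%: $0.57/MTok
```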

Three KV-Cache Principles

Manus's approach reveals three core principles:

  1. Stable Prefixes: Even a single token difference can invalidate the entire cache downstream. Avoid dynamic elements like timestamps in system prompts.

  2. Append-Only Context: Never modify previous actions or observations. Ensure deterministic serialization—even JSON key ordering matters.

  3. Explicit Cache Breakpoints: When manual cache management is required, carefully place breakpoints to account for cache expiration patterns.

This represents a shift in thinking: context engineering isn't just about what information to include, but how to structure it for maximum reusability.
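
A minimal sketch of what the first two principles look like in practice. The names (SYSTEM_PROMPT, serialize_step) are illustrative, not Manus's code:

```python
import json

SYSTEM_PROMPT = "You are an agent with tools: browser_open, shell_exec."  # stable prefix: no timestamps

def serialize_step(action: dict, observation: dict) -> str:
    # Deterministic serialization: sort_keys plus fixed separators yield
    # byte-identical output across runs, so JSON key ordering can never
    # silently invalidate the cached prefix.
    return json.dumps({"action": action, "observation": observation},
                      sort_keys=True, separators=(",", ":"))

# Append-only: new steps are concatenated; earlier context is never edited.
context = SYSTEM_PROMPT
steps = [({"tool": "browser_open", "url": "https://example.com"},
          {"status": 200, "title": "Example Domain"})]
for action, observation in steps:
    context += "\n" + serialize_step(action, observation)
```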

Tool Management: The "Explosion" Problem

As AI agents become more capable, they naturally accumulate more tools. Manus highlights what they call the "tool explosion" problem—where an agent's expanding toolkit actually makes it less effective, not more.

The Paradox of Choice

The core insight is counterintuitive: more tools can make your agent dumber. As the action space grows, models are more likely to select suboptimal actions or take inefficient paths. This is particularly problematic in systems that allow user-configurable tools.

Masking vs. Removal

Manus's solution is elegant: instead of dynamically removing tools (which breaks KV-cache), they mask tool availability using logits manipulation. This approach:

  • Preserves cache coherence by keeping tool definitions stable
  • Prevents confusion from referring to undefined tools
  • Allows fine-grained control over action spaces based on context

Their use of consistent tool prefixes (browser_*, shell_*) enables efficient group-based masking without complex state management.
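
Here is a hypothetical sketch of the idea. The tool names and the state-to-prefix mapping are invented for illustration; the -100 bias follows the OpenAI-style logit_bias convention, where a large negative value effectively bans a token:

```python
ALL_TOOLS = ["browser_open", "browser_click", "shell_exec", "shell_read"]

def allowed_tools(state: str) -> set[str]:
    # Group-based masking via consistent name prefixes.
    prefix = {"browsing": "browser_", "local": "shell_"}.get(state)
    return {t for t in ALL_TOOLS if prefix is None or t.startswith(prefix)}

def tool_logit_bias(tool_token_ids: dict[str, int], state: str) -> dict[int, float]:
    # Disallowed tool-name tokens get a large negative bias at decode time.
    # Their definitions stay in the prompt, so the KV-cache is untouched.
    allowed = allowed_tools(state)
    return {tid: -100.0 for name, tid in tool_token_ids.items()
            if name not in allowed}

# During browsing, shell_* tools are masked but remain defined in context.
bias = tool_logit_bias({"browser_open": 101, "shell_exec": 202}, state="browsing")
```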

Memory Architecture: Beyond Context Windows

Even with 128K+ context windows, Manus discovered that traditional context management isn't sufficient for complex, multi-step tasks. Their solution treats the file system as the ultimate context—unlimited, persistent, and directly manipulable by the agent.

Recoverable Compression

Rather than irreversible context truncation, Manus implements "recoverable compression" strategies:

  • Web page content can be dropped if the URL is preserved
  • Document contents can be omitted if file paths remain accessible
  • All compression maintains the ability to restore information when needed

This approach recognizes a fundamental truth: you can't predict which piece of information will become critical ten steps later.
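
A minimal sketch of the pattern, with an invented observation schema (url, path, and content are our field names, not Manus's):

```python
def compress_observation(obs: dict, max_len: int = 200) -> dict:
    # Drop bulky payloads but always keep the handle needed to restore them.
    out = dict(obs)
    if len(out.get("content", "")) > max_len:
        if "url" in out:
            out["content"] = f"<dropped; re-fetch from {out['url']}>"
        elif "path" in out:
            out["content"] = f"<dropped; re-read from {out['path']}>"
    return out

page = {"url": "https://example.com/long-article", "content": "lorem " * 5000}
compressed = compress_observation(page)  # keeps the URL, sheds the tokens
```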

Attention Management: The Art of Recitation

One of Manus's most interesting discoveries involves attention manipulation through recitation. Their agents create and continuously update todo.md files—not just for organization, but as a deliberate mechanism to guide model attention.

Fighting "Lost in the Middle"

With an average of 50 tool calls per task, maintaining focus becomes critical. By reciting objectives at the end of the context, Manus pushes the global plan into the model's recent attention span, reducing goal drift and misalignment.

This technique demonstrates how natural language can be used to bias model behavior without architectural changes—a form of "soft attention control."
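
In code, the recitation step can be as simple as re-reading the plan file and appending it to the tail of the prompt. A sketch, with invented file contents and heading text:

```python
from pathlib import Path

# A living plan the agent updates as it works.
Path("todo.md").write_text("- [x] fetch data\n- [ ] summarize findings\n", encoding="utf-8")

def recite(history: str, todo_path: str = "todo.md") -> str:
    # Re-append the current plan at the end of the context so the global
    # objective sits inside the model's most recent attention span.
    plan = Path(todo_path).read_text(encoding="utf-8")
    return history + "\n\nCurrent plan (recited):\n" + plan

prompt = recite("...many earlier tool calls and observations...")
```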

Error Handling: Embracing Failure

Perhaps counterintuitively, Manus advocates for keeping error information in context rather than cleaning it up. Failed actions and stack traces provide crucial learning signals that help models avoid repeating mistakes.
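
A minimal sketch of the policy (run_tool and the simulated shell_exec are our illustrations, not Manus's API): on failure, the stack trace is appended to the context rather than scrubbed.

```python
import traceback

def run_tool(tool, context: list[str], **kwargs) -> None:
    # Failures stay visible: the trace becomes evidence the model can
    # condition on, instead of silently retrying from a cleaned-up state.
    try:
        result = tool(**kwargs)
        context.append(f"OK {tool.__name__}: {result}")
    except Exception:
        context.append(f"ERROR {tool.__name__}:\n{traceback.format_exc()}")

def shell_exec(cmd: str) -> str:
    raise RuntimeError(f"command not found: {cmd}")  # simulated failure

context: list[str] = []
run_tool(shell_exec, context, cmd="gti status")  # typo'd command
print(context[-1])  # the RuntimeError trace, preserved for the next step
```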

Error Recovery as Intelligence Indicator

Manus argues that error recovery is "one of the clearest indicators of true agentic behavior," yet it's underrepresented in academic benchmarks that focus on success under ideal conditions. This highlights a gap between research and production realities.

Pattern Breaking: The Few-Shot Trap

A surprising insight involves the dangers of excessive few-shot prompting in agent contexts. While few-shot examples improve individual LLM outputs, they can create harmful patterns in multi-step agent scenarios.

The Rhythm Problem

Language models are excellent pattern matchers. If the context contains many similar action-observation pairs, the model may fall into a "rhythm," repeating actions because that's what it sees, even when suboptimal.

Manus's solution involves structured variation—introducing controlled randomness in serialization templates, phrasing, and formatting to break potentially harmful patterns.
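
A sketch of structured variation (the templates are invented): rotate among semantically equivalent renderings when a step is first appended. Because the context is append-only, each rendering is chosen once and then frozen, so the randomness varies future appends without disturbing the cached prefix.

```python
import random

TEMPLATES = (
    "Action: {action}\nObservation: {obs}",
    "-> {action}\n<- {obs}",
    "[step] {action} => {obs}",
)

def render_step(action: str, obs: str, rng: random.Random) -> str:
    # Equivalent templates, rotated at append time to break the "rhythm"
    # of many identically formatted action-observation pairs.
    return rng.choice(TEMPLATES).format(action=action, obs=obs)

rng = random.Random(42)  # seeded so this sketch is reproducible
print(render_step("browser_open(url=...)", "HTTP 200", rng))
```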

The Meta-Lesson: Context Engineering as Experimental Science

Beyond specific techniques, Manus's experience reveals that context engineering is fundamentally an experimental science. Their team rebuilt their framework four times, each iteration revealing new insights about how to shape context effectively.

They jokingly term their approach "Stochastic Graduate Descent" (a pun on stochastic gradient descent): a combination of architecture searching, prompt refinement, and empirical testing. It isn't elegant, but it reflects the current reality of the field.

Implications for the Industry

Several broader lessons emerge:

  1. Performance First: Production context engineering must prioritize cache efficiency and cost optimization from day one.

  2. Stability Over Flexibility: Consistent, predictable structures often outperform dynamic, "intelligent" systems.

  3. Embrace Messiness: Real-world agent behavior includes errors, repetition, and suboptimal paths—design for this reality.

  4. Memory Externalization: Traditional context windows, no matter how large, need supplementation with external memory systems.

  5. Attention is Architecture: How you structure information is as important as what information you include.

Looking Forward: The Maturation of Context Engineering

Manus's experiences point toward context engineering evolving from an art into a science. Their systematic approach to identifying and solving production challenges provides a roadmap for others building serious AI systems.

Key areas for continued development include:

  • Standardized Metrics: Beyond task success rates to include cache efficiency, attention management, and error recovery
  • Tool Architecture: Better patterns for managing large, dynamic tool ecosystems
  • Memory Systems: More sophisticated approaches to external memory and context compression
  • Performance Optimization: Techniques that balance capability with computational efficiency

The Path Forward

The transition from experimental AI demos to production-grade agents requires this kind of systematic thinking about context engineering. Manus's willingness to share their hard-won insights accelerates the entire field's learning curve.

For teams building their own AI agents, these lessons offer a starting point for avoiding common pitfalls. More importantly, they demonstrate that context engineering success comes from careful measurement, systematic experimentation, and willingness to rebuild when better approaches emerge.

The future of AI agents will be built by teams that understand these production realities. Context engineering may still be experimental, but it's no longer optional.


The field is young, the challenges are real, and the opportunities are enormous. The question isn't whether context engineering will become critical—it's whether you'll learn these lessons through experimentation or through others' experience.
