Building Your First Model Context Protocol Server in C#

TJ Gokken

It’s everywhere. Social media is full of MCP experts, cheat sheets, new servers, servers that serve other servers… There is really no escape from it.

You may have a vague idea of what this thing is. It seems to be the shiny new kid on the block, but you still do not know how to make use of it.

Don't worry, I've got you covered.

Think of MCP as the smart cousin of regular APIs – it's like giving your AI a really good memory and the ability to ask for exactly what it needs, when it needs it. Did that make sense? No? Let’s try another way.

Picture this: you're at a restaurant, and you realize none of the items on the menu sound good. You kind of wish you could tell the waiter, or better yet the chef, “Hey, I'm on a diet, I'm allergic to nuts, I had Italian food yesterday, and I'm feeling like something spicy but not too spicy because I have a meeting in an hour,” and let the chef prepare the perfect meal just for you.

You see, our regular APIs are like the menu - especially a drive-through menu. You order whatever is on the menu and you do not get to customize any of the items.

MCP is like going to the restaurant and telling the chef what you want and getting a perfect and customized meal - just for you. Simple enough, right? But it goes a bit beyond this.

I understand if you suddenly feel like a snack. Go ahead, take that well-deserved break and make yourself a nice, healthy MCP-style sandwich. You being the chef in this case.

Oh good, you’re back. Let’s continue…

I Know What You Did Last Summer

If you're building anything using AI these days (and let's be honest, who isn't?), you've probably run into this problem: the AI models are incredibly smart, but your application using them ends up being forgetful. It's like having a brilliant consultant who can't remember what you discussed in your last meeting.

The issue is in how we integrate AI into our applications, not with the AI models themselves. LLMs are "forgetful" because of token limits and cost constraints, not because they're poorly designed.

You see, most LLMs have context windows (like 4K, 8K, 32K, or even 128K tokens). Once you hit that limit, older conversation history gets truncated.

Even with larger context windows, sending your entire conversation history with every API call gets expensive fast. If you're paying per token, a long conversation can cost serious money. Larger contexts also mean slower response times and higher computational costs.

Due to these limits, we can't afford to send the full context every time, so we lose older parts of the conversation. It’s all about balancing cost against performance, so we design our apps to be stateless to save money.

At this point, you may say, “Huh! ChatGPT remembers old conversations now.”

Well, I am glad you said that. You are right and here is what happens with ChatGPT:

  • Selective Summarization: Instead of keeping full conversation transcripts, they extract and store key facts, preferences, and important context in a compressed format. So instead of storing "I told you last week that I'm a software developer who works primarily in C# and prefers clean, readable code...", they store something like "User: software dev, prefers C#, values code readability." This is much cheaper (token-wise) than recalling the entire dialog, and it can be updated or overwritten as needed.

  • Vector Embeddings: They convert important conversation snippets into embeddings and store them in a vector database (turn text into numbers that capture meaning). When you start a new conversation, they can quickly search for relevant past context without loading everything. It’s like a smart memory that remembers the vibe, not the exact words.

  • Tiered Memory: They do multi-level recall: recent conversation history, medium-term summaries (key points from recent chats), and long-term facts (persistent user preferences and important info).

    Each layer balances cost against usefulness.

  • Smart Retrieval: When you mention something that might connect to past conversations, they do a similarity search to pull in relevant memories rather than loading everything. In other words, rather than always pulling in memory, the system uses triggers — like keywords, names, or question types — to know when to grab related stuff. It’s like a "just-in-time context loader." So, when you say, “I’m working on my project XYZ again”, the system says “Oh, hey, I remember Project XYZ from earlier - let me bring that context back.”

All of this making sense yet? Good, because conceptually MCP is very similar. There are some differences in intent and architecture.

Let’s take a look.

Model Context Protocol (MCP): TL;DR

Model Context Protocol is designed to manage and reuse context across calls to LLMs — especially when you have large tasks or conversations that won’t fit neatly into one prompt.

It’s basically a context management system that lets you:

  • Store relevant memory (as structured chunks)

  • Retrieve just what's needed at call time

  • Reduce prompt size

  • Keep model calls stateless but still feel stateful

Its architecture is different from how ChatGPT remembers past conversations. Without going into too much technical detail, here are some of the differences:

| Concept | ChatGPT | Model Context Protocol (MCP) |
| --- | --- | --- |
| Memory Format | Key facts, user prefs, embeddings | Structured content blocks (often JSON or Markdown chunks) |
| Where Stored | Internal memory, sometimes vector DB | External vector DB or context store (like Pinecone, Weaviate, Chroma) |
| When Used | At each session start or when triggered | Dynamically retrieved per model call |
| Trigger for Retrieval | User prompt context | Manual control (you decide what's relevant per call) |
| Tiered? | Yes: short/mid/long-term | Usually yes, but you control it explicitly |

If you are using MCP with a local or API-based LLM, you define your context chunks and tag them with metadata like topic, keywords, etc. When you want an output from your LLM, you say something like “Give me the 5 most relevant contexts for this query,” retrieve them, and inject them into the prompt, for example under a ### Context: section.

This is semantic recall plus prompt engineering — exactly like what ChatGPT does internally, just made explicit.

It’s basically ChatGPT-style memory, but under your control.
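To make this concrete, here is a rough sketch in C#. ContextStore and ContextChunk are made-up stand-ins for whatever vector database or context store you use, and the keyword match is a crude placeholder for a real embedding similarity search:

using System.Text;

// Hypothetical types for illustration only - a real store would wrap
// a vector database such as Pinecone, Weaviate, or Chroma.
public record ContextChunk(string Topic, string[] Keywords, string Content);

public class ContextStore
{
    private readonly List<ContextChunk> _chunks = new();

    public void Add(ContextChunk chunk) => _chunks.Add(chunk);

    // Naive keyword scoring stands in for embedding similarity search.
    public IEnumerable<ContextChunk> GetMostRelevant(string query, int count) =>
        _chunks
            .OrderByDescending(c => c.Keywords.Count(k =>
                query.Contains(k, StringComparison.OrdinalIgnoreCase)))
            .Take(count);
}

public static class PromptBuilder
{
    // Retrieve the most relevant contexts and inject them into the prompt.
    public static string Build(ContextStore store, string userQuery)
    {
        var sb = new StringBuilder();
        sb.AppendLine("### Context:");
        foreach (var chunk in store.GetMostRelevant(userQuery, count: 5))
            sb.AppendLine($"- [{chunk.Topic}] {chunk.Content}");
        sb.AppendLine();
        sb.AppendLine("### Question:");
        sb.AppendLine(userQuery);
        return sb.ToString();
    }
}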

flowchart TD
    A[User Message] --> B[Client App]
    B --> C[MCP Server]
    C --> D[SessionManager: Find/Create Session]
    D --> E[Retrieve Relevant Contexts]
    E --> F[Build Prompt]
    F --> G[Call LLM]
    G --> H[Response]
    H --> I[Return to User]

    CS[(Context Store)]
    E --> CS

Figure: High-Level Flow of an MCP-Based Request

OK, but show me the money code!

Of course. All this will make much more sense once we build something.

So, with that in mind, I'm going to show you how to build a simple MCP server using ASP.NET Core. We'll create a weather and news service because, apparently, that's what everyone does in tutorials these days.

Our solution consists of two projects: an MCP server, which is basically an ASP.NET Core Web API app, and a console app that acts as a client and makes API calls to the server.

Alright, let’s go.

(Cue montage music. Maybe some drum roll. After all, we are building a server.)

Setting Up the Server

Other than the models, we have one controller and two services in our server app.

Let’s take a look at the controller:

using Microsoft.AspNetCore.Mvc;
using NotAnotherMCP.Models;
using NotAnotherMCP.Services;
using NotAnotherMCP.SessionManagement;

namespace NotAnotherMCP.Controllers;

[ApiController]
[Route("api/[controller]")]
public class McpController : ControllerBase
{
    private readonly NewsService _newsService;
    private readonly SessionManager _sessionManager;
    private readonly WeatherService _weatherService;

    public McpController(SessionManager sessionManager, WeatherService weatherService, NewsService newsService)
    {
        _sessionManager = sessionManager;
        _weatherService = weatherService;
        _newsService = newsService;
    }

    [HttpPost("context")]
    public async Task<IActionResult> RequestContext([FromBody] McpRequest request)
    {
        // Validate request
        if (request == null || string.IsNullOrEmpty(request.Type))
            return BadRequest(new McpResponse
            {
                Status = "error",
                Content = "Invalid request format"
            });

        // Process different context types
        var response = request.Type.ToLower() switch
        {
            "weather" => await _weatherService.GetWeatherContext(request),
            "news" => await _newsService.GetNewsContext(request),
            _ => new McpResponse
            {
                Status = "error",
                Content = $"Unknown context type: {request.Type}"
            }
        };

        response.ResponseId = Guid.NewGuid().ToString();
        return Ok(response);
    }

    [HttpPost("message")]
    public async Task<IActionResult> ProcessMessage([FromBody] McpMessage message)
    {
        // Validate message
        if (message == null || string.IsNullOrEmpty(message.Content))
            return BadRequest(new McpResponse
            {
                Status = "error",
                Content = "Invalid message format"
            });

        // Get or create session
        var session = _sessionManager.GetOrCreateSession(message.SessionId);

        // Add message to session
        session.Messages.Add(new SessionMessage
        {
            Role = message.Role,
            Content = message.Content,
            Timestamp = DateTime.UtcNow
        });

        // Process message
        var response = new McpResponse
        {
            ResponseId = Guid.NewGuid().ToString(),
            Status = "success",
            Content = await ProcessMessageContent(message.Content, session)
        };

        return Ok(response);
    }

    private async Task<string> ProcessMessageContent(string content, Session session)
    {
        // This is a very simplified "AI" response generator

        var contentLower = content.ToLower();

        if (contentLower.Contains("weather"))
        {
            var weatherRequest = new McpRequest
            {
                Type = "weather",
                Query = "current",
                Parameters = new Dictionary<string, string> { { "location", "Sydney" } }
            };

            var weatherContext = await _weatherService.GetWeatherContext(weatherRequest);
            return $"Based on the weather data I've retrieved: {weatherContext.Content}";
        }

        if (contentLower.Contains("news"))
        {
            var newsRequest = new McpRequest
            {
                Type = "news",
                Query = "latest",
                Parameters = new Dictionary<string, string> { { "category", "technology" } }
            };

            var newsContext = await _newsService.GetNewsContext(newsRequest);
            return $"Here's the latest news I found: {newsContext.Content}";
        }

        return "I understand your message. To demonstrate the MCP functionality, try asking about the weather or news.";
    }
}

This controller handles our MCP requests. There is not much to this code. It’s pretty much your standard controller that calls services.
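By the way, the McpRequest, McpResponse, and McpMessage models are just thin DTOs. Reconstructed from how the controller uses them (the canonical versions are in the repo), they look roughly like this:

namespace NotAnotherMCP.Models;

public class McpRequest
{
    public string Type { get; set; } = string.Empty;   // e.g. "weather" or "news"
    public string Query { get; set; } = string.Empty;  // e.g. "current" or "latest"
    public Dictionary<string, string> Parameters { get; set; } = new();
}

public class McpResponse
{
    public string ResponseId { get; set; } = string.Empty;
    public string Status { get; set; } = string.Empty; // "success" or "error"
    public string Content { get; set; } = string.Empty;
}

public class McpMessage
{
    public string SessionId { get; set; } = string.Empty;
    public string Role { get; set; } = string.Empty;   // e.g. "user"
    public string Content { get; set; } = string.Empty;
}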

The two services, WeatherService and NewsService, simulate weather and news APIs. In real life, you'd call something like OpenWeatherMap, but for our demo we'll generate fake data. I don't know where you live, but the randomness in our fake data is probably more reliable than the actual weather forecast.
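To give you an idea, here is a minimal sketch of what the weather service could look like (the full versions of both services are in the repo):

using NotAnotherMCP.Models;

namespace NotAnotherMCP.Services;

public class WeatherService
{
    private static readonly string[] Conditions =
        { "sunny", "cloudy", "rainy", "windy", "stormy" };

    public Task<McpResponse> GetWeatherContext(McpRequest request)
    {
        // Pull the requested location out of the parameters, with a default.
        var location = request.Parameters.TryGetValue("location", out var loc)
            ? loc
            : "Sydney";

        // Fake but reliably random data.
        var temperature = Random.Shared.Next(5, 35);
        var condition = Conditions[Random.Shared.Next(Conditions.Length)];

        return Task.FromResult(new McpResponse
        {
            Status = "success",
            Content = $"It's {temperature}°C and {condition} in {location}."
        });
    }
}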

Up until here, it is pretty much standard API stuff. A controller, a couple of services. But here is where MCP gets interesting.

We have a class called SessionManager. Remember how we were complaining about AI forgetting the context of our conversations? We use this class to maintain conversation state so the AI can remember what happened before. Here is the code for this class:

using System.Collections.Concurrent;
using NotAnotherMCP.Models;

namespace NotAnotherMCP.SessionManagement;

public class SessionManager
{
    private readonly Timer _cleanupTimer;
    private readonly ConcurrentDictionary<string, Session> _sessions = new();

    public SessionManager()
    {
        // Clean up old sessions every hour
        _cleanupTimer = new Timer(CleanupOldSessions, null,
            TimeSpan.FromHours(1), TimeSpan.FromHours(1));
    }

    public Session GetOrCreateSession(string sessionId)
    {
        if (string.IsNullOrEmpty(sessionId)) sessionId = Guid.NewGuid().ToString();

        if (!_sessions.TryGetValue(sessionId, out var session))
        {
            session = new Session { Id = sessionId };
            _sessions[sessionId] = session;
        }

        return session;
    }

    private void CleanupOldSessions(object? state)
    {
        var cutoff = DateTime.UtcNow.AddHours(-24);
        var oldSessions = _sessions.Where(kvp =>
                kvp.Value.Messages.LastOrDefault()?.Timestamp < cutoff)
            .Select(kvp => kvp.Key)
            .ToList();

        foreach (var sessionId in oldSessions)
            _sessions.TryRemove(sessionId, out _);
    }
}

and of course, the corresponding model:

namespace NotAnotherMCP.Models;

public class Session
{
    public string Id { get; set; } = string.Empty;
    public List<SessionMessage> Messages { get; set; } = new();
    public Dictionary<string, object> Context { get; set; } = new();
}
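SessionMessage isn't shown above, but reconstructed from how the controller populates it, it looks like this:

namespace NotAnotherMCP.Models;

public class SessionMessage
{
    public string Role { get; set; } = string.Empty;
    public string Content { get; set; } = string.Empty;
    public DateTime Timestamp { get; set; }
}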

Then in our controller, when we are processing the message, we have something like this:

// Add message to session
session.Messages.Add(new SessionMessage
{
    Role = message.Role,
    Content = message.Content,
    Timestamp = DateTime.UtcNow
});

Bam! Our code remembers.

In our simple approach, we keep the conversation state in memory, inside a class that lives for the lifetime of the application. In a real-world application, you would use persistent storage: a database, something like a Redis cache, or a combination of both.

While our implementation has limitations, it will do nicely for our purposes. In any case, we added a cleanup routine that removes old sessions every hour.
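One detail worth spelling out: for the in-memory sessions to survive across requests, SessionManager has to be registered as a singleton in the DI container. A minimal Program.cs for this setup could look like this (the repo has the real one):

using NotAnotherMCP.Services;
using NotAnotherMCP.SessionManagement;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();

// Singleton: one SessionManager for the lifetime of the app,
// so the session dictionary persists across requests.
builder.Services.AddSingleton<SessionManager>();
builder.Services.AddSingleton<WeatherService>();
builder.Services.AddSingleton<NewsService>();

var app = builder.Build();
app.MapControllers();
app.Run();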

💡
One thing to watch out for: if you're running this in a web app with concurrent requests, you need to think about thread safety. That's why the code above uses a ConcurrentDictionary - otherwise you'd need to protect every access to the dictionary yourself.
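On the other side of the wire, the console client is little more than an HttpClient posting JSON to the message endpoint. Here is a rough sketch (the full client is in the repo; the port is whatever your server happens to run on):

using System.Net.Http.Json;

var client = new HttpClient { BaseAddress = new Uri("https://localhost:7001") }; // adjust the port
var sessionId = Guid.NewGuid().ToString();

Console.WriteLine("Type a message (blank line to quit):");

while (true)
{
    Console.Write("You: ");
    var input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input)) break;

    // Reuse the same session ID so the server remembers the conversation.
    var httpResponse = await client.PostAsJsonAsync("api/mcp/message", new
    {
        SessionId = sessionId,
        Role = "user",
        Content = input
    });

    var mcpResponse = await httpResponse.Content.ReadFromJsonAsync<McpResponse>();
    Console.WriteLine($"Server: {mcpResponse?.Content}");
}

// Minimal local mirror of the server's response model.
public record McpResponse(string ResponseId, string Status, string Content);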

How This Is Different from Regular APIs

Here's the thing that makes MCP special (and why I'm not just wasting your time):

Regular API conversation:

  • Client: "Give me weather data."

  • Server: "Here's a JSON blob with 47 fields you don't need. I also added planetary positions and some astrology data for good measure."

  • Client: "Cool, I'll ignore 46 of them and the rest. I really just needed the temperature."

MCP conversation:

  • Client: "I need weather info for planning a picnic in Sydney"

  • Server: "It's 25°C and sunny – perfect picnic weather! Want me to suggest some good parks?"

  • Client: "Yes, and remember I asked about this for next time"

  • Server: “Here are some good parks for you…”

  • Client: “Great - can you also find me some good takeaway places?”

See the difference? MCP is like having a conversation with someone who actually listens and remembers what you care about.

Real-World Applications

This isn't just academic stuff. MCP is perfect for:

  • Customer support systems that need to remember what the customer called about last week

  • Personal assistants that actually know your preferences

  • Content management systems that understand the context of what you're working on

  • Any AI system that needs to be more than just a fancy search engine

Wrapping Up

MCP isn't going to replace regular APIs anytime soon, but it's a great tool when you need your AI systems to be more contextually aware. It's like the difference between a vending machine and a barista – both serve their purpose, but one actually cares about what you want.

The full code is available on my GitHub, and if you build something cool with it, let me know! I love seeing what people create, especially when it's more creative than my weather service example.

Now go forth and build something that actually remembers what you told it five minutes ago. Your users will thank you for it.

Here is the source code in case you missed it in the text above: https://github.com/tjgokken/NotAnotherMCP


Appendix: Give Me Tokens

One of the biggest challenges in building conversational AI systems is token usage. If you are not careful and send the entire conversation history every time, you can easily hit 1,000+ tokens per request for the history alone. Then add your new message and context on top of that, and then the response tokens.

At $0.01 per 1K tokens, a 20-message conversation (at roughly 100 tokens per message, that's about 2K tokens of history) can cost $0.02+ per request. That adds up fast!

You can deal with this issue in a few ways:

  1. Sliding Window (most common): Only send the last n messages.

  2. Conversation Summarization: Summarize older parts of the conversation.

  3. Semantic Compression: Keep only the most relevant parts.

  4. Context Truncation Strategy: Set a token budget and trim accordingly.

They all have pros and cons. Popular LLM engines use a combination of these approaches.

We can demonstrate this concept in our demo as:

private async Task<string> ProcessMessageContent(string content, Session session)
{
    // Smart context building to manage tokens.
    // GatherRelevantContext and _llmService are placeholders here -
    // this method is a sketch, not the full implementation.
    var relevantContext = await GatherRelevantContext(content);
    var conversationSummary = BuildConversationSummary(session, maxMessages: 3);

    var response = await _llmService.GenerateResponseAsync(
        userMessage: content,
        context: relevantContext,
        conversationHistory: conversationSummary
    );

    return response;
}

private string BuildConversationSummary(Session session, int maxMessages)
{
    // Sliding window: keep only the last maxMessages messages.
    var recentMessages = session.Messages.TakeLast(maxMessages);
    return string.Join("\n", recentMessages.Select(m => $"{m.Role}: {m.Content}"));
}

The code above is just a representation - you can find the full implementation in the source code.
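If you want to experiment with strategy 4 instead, here is a purely illustrative token-budget trimmer. The characters-per-token heuristic is a rough assumption; a real system would use a proper tokenizer:

// Rough heuristic: about 4 characters per token for English text.
private static int EstimateTokens(string text) => text.Length / 4;

private string BuildHistoryWithinBudget(Session session, int tokenBudget)
{
    var included = new List<string>();
    var used = 0;

    // Walk backwards from the newest message, keeping whatever fits the budget.
    foreach (var message in session.Messages.AsEnumerable().Reverse())
    {
        var line = $"{message.Role}: {message.Content}";
        var cost = EstimateTokens(line);
        if (used + cost > tokenBudget) break;

        included.Add(line);
        used += cost;
    }

    included.Reverse(); // restore chronological order
    return string.Join("\n", included);
}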
