I finally built my first MCP Server. I expected magic, I got glorified API endpoints


It is 2025. Everybody's releasing their MCP Server. I, on the other hand, was dodging it like Neo dodged bullets. But just like in The Matrix, you can only dodge inevitability for so long. So I built one. For BigQuery. Even though Google's MCP Toolbox existed and had support for BigQuery. Why? Let's dive in…
WTF even is MCP?
I'll admit it - until I started building one myself, I didn't fully understand what MCP was trying to solve. Talk about living under a rock. Mostly because it seemed like such a large mountain to climb, and honestly, the documentation made my brain hurt.
Here's what finally made it click for me: MCP (Model Context Protocol) is basically Anthropic's way of saying "hey, let's standardize how AI assistants talk to your stuff." Your databases, your APIs, your tools - everything.
Picture this: You're one of those people who has very specific requirements for everything. Your morning routine involves exactly 2.5 tablespoons of this specific coffee grind, steamed oat milk at precisely the right temperature, stirred counterclockwise three times (don't judge). You decide to hire an assistant to handle this.
Your assistant is brilliant but has no clue how to use any of your gadgets. The fancy espresso machine, the smart home system, the automated sock-sorting device (yes, that's a thing in my hypothetical world). Without MCP, you'd have to teach each AI assistant how to use each tool individually - different interfaces, different authentication, different everything.
With MCP, you create one "universal manual" that any AI assistant can read. Suddenly, Claude, ChatGPT, or whatever AI assistant comes next can all use your tools through the same standardized interface.
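To make that "universal manual" concrete, here's a tiny sketch of an MCP server built with the official Python SDK's FastMCP helper. The coffee tool is just a toy stand-in for the analogy above, not anything from my actual server:

```python
# A toy MCP server: one tool, exposed over the standard protocol so any
# MCP-aware assistant can discover and call it the same way.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("morning-routine")

@mcp.tool()
def brew_coffee(tablespoons: float = 2.5) -> str:
    """Brew coffee with the requested amount of grounds."""
    return f"Brewing with {tablespoons} tablespoons, stirred counterclockwise three times."

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

Any client that speaks MCP can list that tool, read its signature, and call it - no bespoke integration per assistant.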
That's when the lightbulb went off: this wasn't just another tech buzzword - it was solving a real integration nightmare that's only getting worse as more AI assistants show up to the party.
Why I decided to reinvent the wheel (spoiler: Google's version has issues)
Now, before you start typing "just use Google's MCP Toolbox, you fool" in the comments, let me explain why I went down this rabbit hole despite a perfectly functional solution already existing.
Google's BigQuery MCP connector works... but it's like that friend who's helpful but comes with a lot of baggage.
First, Google themselves treat it like a science experiment. They literally plaster "BETA" warnings everywhere with the classic "expect breaking changes in future versions" disclaimer. Nothing says "production-ready" like a vendor telling you they might completely change everything tomorrow. I've been burned by Google deprecating products before (cough Google Podcasts cough).
Second, it's about as granular as a sledgehammer. Want to give your AI access to sales data but not employee records? Tough luck. The AI gets access to ALL tables the service account can see. There's no way to say "hey, only show the marketing tables to this user, and only the finance tables to that user."
Third, the data leakage potential gave me anxiety. The only workaround is spinning up multiple MCP instances with different service accounts. So instead of one clean connector, you're managing what I like to call "a zoo of MCP servers." Elegant? About as elegant as duct tape.
But here's what really pushed me over the edge: When you have lots of tables (and BigQuery projects are basically table hoarders), the AI consistently picks the wrong ones. And this isn't Google's fault - it's a fundamental quirk of how LLMs work.
The more options you give them, the worse they get at picking the right one. It's like having a really smart friend who always grabs the first tool they see in your toolbox, even when what they need is a screwdriver and what they grabbed is a hammer. Your "customer_analytics_2024" table gets ignored while "test_backup_data_2019" gets selected just because it appeared first in the list.
No amount of clever sorting fixes this - alphabetical, by importance, by creation date, even random order. The model will still lean toward the first few options like a kid reaching for candy.
That's when I thought, "You know what? I bet I can solve this differently."
What I built instead
A quick note: I won't be sharing code for this work as it belongs to my employer. I can share the high-level concepts and architecture, but the specifics are proprietary.
Instead of throwing every table at the AI and hoping it picks the right one, I broke down the BigQuery interaction into five distinct tools with a focus on intelligent filtering and user-specific access control.
Tool 1: IAM-Based Content Visibility Controller
This tool integrates with our organization's Identity and Access Management system to dynamically determine what datasets and tables a user can access. Rather than relying on service account permissions (which are static), it pulls the authenticated user's identity and cross-references it with our data governance policies in real-time. The tool queries our IAM provider's API to get group memberships and role assignments, then filters the available BigQuery resources accordingly. A user in the marketing group sees marketing datasets, finance users see finance data, and cross-functional analysts get broader access based on their role matrix.
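For illustration only (the real thing is proprietary), the filtering boils down to something like this - the IAM call and the policy mapping are hypothetical stand-ins:

```python
# Illustrative sketch, not the actual implementation. DATASET_POLICY and
# fetch_group_memberships are placeholders for our governance store and IAM API.
DATASET_POLICY = {
    "marketing_events": {"marketing", "analytics"},
    "finance_ledger": {"finance"},
    "customer_analytics": {"marketing", "finance", "analytics"},
}

def fetch_group_memberships(user_email: str) -> set[str]:
    """Ask the IAM provider for the authenticated user's groups (stubbed here)."""
    raise NotImplementedError("wire this to your IAM provider")

def visible_datasets(user_email: str) -> list[str]:
    """Return only the BigQuery datasets this user's groups are allowed to see."""
    groups = fetch_group_memberships(user_email)
    return [ds for ds, allowed in DATASET_POLICY.items() if groups & allowed]
```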
Tool 2: Semantic Dataset Discovery
This tool addresses the core table selection problem through semantic search over dataset metadata. It maintains an indexed representation of all dataset names, descriptions, and table schemas, then uses vector similarity search to match user queries with relevant datasets. Instead of presenting 200+ datasets alphabetically, it ranks them by semantic relevance to the user's intent. The key insight: this only works because our organization has invested in proper data cataloging with meaningful dataset names and comprehensive metadata.
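Conceptually (again, not our actual code), this is an embed-and-rank over the catalog. The model name and catalog entries below are made up for the sketch:

```python
# Illustrative sketch of semantic ranking over dataset metadata.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def rank_datasets(query: str, catalog: dict[str, str], top_k: int = 5) -> list[str]:
    """Rank dataset names by cosine similarity between the query and their descriptions."""
    names = list(catalog)
    doc_vecs = model.encode([catalog[n] for n in names], normalize_embeddings=True)
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ query_vec  # cosine similarity on normalized vectors
    order = np.argsort(scores)[::-1][:top_k]
    return [names[i] for i in order]

catalog = {
    "customer_analytics_2024": "Customer behaviour, churn, and engagement metrics",
    "test_backup_data_2019": "Old backup of staging tables, do not use",
}
print(rank_datasets("which customers are likely to churn?", catalog))
```

The AI then only ever sees the handful of top-ranked candidates instead of the full 200+ list.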
Tool 3: Table Metadata Fetcher
Once relevant datasets are identified, this tool retrieves table-level metadata including creation dates, last modified timestamps, row counts, and table descriptions. It essentially wraps BigQuery's INFORMATION_SCHEMA queries but filters results based on the datasets identified in Tool 2, preventing information overload.
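A bare-bones version of that wrapper might look like this, scoped to one dataset that Tool 2 surfaced (project and dataset names are placeholders):

```python
# Illustrative sketch: list tables in one dataset via INFORMATION_SCHEMA.
# Row counts would come from the dataset's __TABLES__ view or TABLE_STORAGE.
from google.cloud import bigquery

client = bigquery.Client()

def list_tables(project: str, dataset: str) -> list[dict]:
    """Return table names, types, and creation times for a single dataset."""
    sql = f"""
        SELECT table_name, table_type, creation_time
        FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES`
        ORDER BY creation_time DESC
    """
    return [dict(row) for row in client.query(sql).result()]
```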
Tool 4: Schema and Documentation Inspector
This tool fetches detailed schema information including column names, data types, constraints, and most importantly, column descriptions and business definitions when available. It pulls from both BigQuery's native schema metadata and our organization's data documentation system to provide context about what each field actually represents.
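The BigQuery half of that is straightforward with the official client; the merge with our internal documentation system is omitted from this sketch:

```python
# Illustrative sketch: pull column-level schema and descriptions for one table.
from google.cloud import bigquery

client = bigquery.Client()

def describe_table(table_id: str) -> list[dict]:
    """Return name, type, mode, and description for every column in a table."""
    table = client.get_table(table_id)  # e.g. "project.dataset.table"
    return [
        {
            "name": field.name,
            "type": field.field_type,
            "mode": field.mode,
            "description": field.description,
        }
        for field in table.schema
    ]
```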
Tool 5: Query Executor
The final tool executes the SQL query with the full context from previous tools. It includes safety mechanisms like query cost estimation and result size limits based on the user's permissions.
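The cost estimation is essentially a dry run before the real query, plus a byte cap enforced at execution time. The 1 GiB limit below is an arbitrary example, not our actual policy:

```python
# Illustrative sketch of the safety checks: dry-run cost estimate + byte cap.
from google.cloud import bigquery

client = bigquery.Client()
MAX_BYTES = 1 * 1024**3  # example per-query cap, not a real policy value

def run_query(sql: str) -> list[dict]:
    # Dry run: BigQuery reports how many bytes the query would scan, at no cost.
    dry = client.query(
        sql, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    )
    if dry.total_bytes_processed > MAX_BYTES:
        raise PermissionError(
            f"Query would scan {dry.total_bytes_processed:,} bytes - over the cap"
        )
    # Real run, with a hard ceiling enforced server-side as a second line of defence.
    job = client.query(
        sql, job_config=bigquery.QueryJobConfig(maximum_bytes_billed=MAX_BYTES)
    )
    return [dict(row) for row in job.result(max_results=1000)]
```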
What Makes This Different From Google's MCP Toolbox
Yes, Google's BigQuery MCP connector also uses multiple tools for dataset discovery, schema inspection, and query execution. But there are crucial architectural differences:
1. User-Centric vs. Service Account-Centric Access Control
Google's connector operates under a single service account's permissions - what the service account can see, everyone can see. My implementation integrates with organizational IAM at request time, providing user-specific data access without spinning up multiple MCP instances. This isn't just a feature difference; it's a fundamental security model change.
2. Semantic vs. Alphabetical Discovery
Google's tool discovery presents datasets in essentially alphabetical or chronological order. The AI has to pick from a massive list, leading to the position bias problem I mentioned earlier. My semantic search pre-filters and ranks datasets by relevance, dramatically reducing the cognitive load on the AI model.
3. Context-Aware Tool Chaining
While both approaches use multiple tools, Google's connector treats each tool call independently. My implementation maintains context across the tool chain - Tool 3 only fetches metadata for datasets identified by Tool 2, Tool 4 only inspects schemas for tables the user can actually access, etc. This reduces the total information space the AI has to process.
4. Custom Data Governance Integration
Google's connector only knows about BigQuery's native metadata. My implementation pulls from our organization's broader data governance infrastructure - business glossaries, data lineage systems, and custom documentation platforms - providing richer context for query generation.
The Reality Check
Here's the honest truth that took me way too long to realize - building this taught me that MCP servers really are just fancy API orchestration wearing a protocol costume. Remember my title about expecting magic and getting glorified API endpoints? Yeah, that's exactly what happened.
The "magic" I was looking for doesn't exist in the MCP protocol itself. It's just a standardized way to wrap your APIs so AI assistants can call them. The real magic (if you can call it that) is in how thoughtfully you design those API interactions and what external systems you decide to integrate with.
All that complexity I was trying to avoid with Google's solution? Well, I didn't eliminate it, I just moved it around and painted it a different color. Instead of dealing with their tool selection issues, I built my own tool selection logic. Instead of managing multiple service accounts, I built complex IAM integration. The work didn't disappear; it just got redistributed to different parts of the system.
But here's the thing - that redistribution actually mattered for our specific use case. We needed user-specific access control and intelligent dataset discovery more than we needed simplicity. Your organization might have completely different priorities, and Google's sledgehammer approach might be exactly what you need.
So did I reinvent the wheel? Absolutely. Was it worth it? That depends on whether you think a wheel with better tires and custom rims is worth the engineering effort, or if you just need something that rolls.