Building an IT Support Assistant with Retrieval-Augmented Generation, Gemini-2.0-flash, and FAISS


For the capstone project of the Kaggle/Google 5-day intensive AI course, I set out to build something practical — not just a chatbot that spins stories about servers, but an IT assistant that gives users accurate answers to the kinds of queries they would normally have to raise with their IT department.
Working in IT support, you're constantly solving the same classes of problems. The terminology changes. The context shifts. But the core issue — and its resolution — is usually something we've dealt with before. And yet, when someone asks a question, you still need to give a clear, accurate, and human-readable response. That’s where the idea for this project started.
The goal? Create a Retrieval-Augmented Generation (RAG) system that could serve as a safe, accurate IT support assistant — one that knows what it knows and doesn’t pretend otherwise.
The Hallucination Problem
My early experiments used Gemini-2.0-flash to directly answer support-style prompts. The results were occasionally brilliant — but also occasionally made-up. That’s a problem in IT. It’s not enough for something to sound plausible — it has to be correct.
During initial testing, Gemini-2.0-flash told me some entirely made-up facts, and that was when I knew I had to change approach. I needed a way to anchor the LLM's responses in trusted, internal knowledge — a dataset of real Q&A pairs we already knew were correct.
That’s when I moved to a RAG architecture.
The RAG-Based Approach
Instead of letting Gemini-2.0-flash generate answers freely, I decided to retrieve similar questions from a curated Q&A dataset, and instruct Gemini-2.0-flash to use only those retrieved entries when composing its reply.
Here’s the high-level workflow (a minimal wiring sketch follows the list):
1. Load a CSV containing known Q&A pairs.
2. Encode the questions using a sentence transformer model.
3. Store the vectors in a FAISS index for similarity search.
4. When a new user query comes in:
   - Encode it.
   - Search the FAISS index for top matches.
   - If the best match is close enough, build a prompt with the top few matches.
   - Send the prompt to Gemini-2.0-flash via the API.
   - Return the grounded response from Gemini-2.0-flash.
   - Log everything.
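To make the flow concrete, here is a minimal wiring sketch using the two functions defined in the snippets below. The CSV filename, log path, and GEMINI_API_KEY variable are placeholders, not the actual names from my notebook.

# Minimal wiring sketch: file names and the API-key variable are placeholders.
GEMINI_API_KEY = "YOUR_API_KEY"  # replace with a real key

df, index, model = load_qa_knowledge_base("it_support_qa.csv")

result = grounded_gemini_answer_verbose(
    "How do I check free disk space from the command line?",
    df, index, model,
    api_key=GEMINI_API_KEY,
    log_path="qa_log.csv",
)

print(result["match_quality"])
print(result["answer"])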
Code Snippets
Load and Index the Knowledge Base
import pandas as pd
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

# =============================================
# ✅ Function: Load and Index Q&A Knowledge Base
# =============================================
def load_qa_knowledge_base(csv_path):
    """
    Loads a CSV file with 'question' and 'answer' columns,
    generates sentence embeddings, and stores them in a FAISS index.
    """
    df = pd.read_csv(csv_path, quotechar='"', encoding='utf-8', on_bad_lines='skip')
    model = SentenceTransformer("all-MiniLM-L6-v2")
    df["embedding"] = model.encode(df["question"].tolist(), convert_to_numpy=True).tolist()

    # Build FAISS index for fast nearest-neighbor search
    embeddings = np.vstack(df["embedding"].values).astype("float32")
    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(embeddings)

    return df, index, model
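A quick design note: IndexFlatL2 ranks matches by raw Euclidean distance. If you would rather rank by cosine similarity, one common alternative (not what I used here) is to L2-normalise the embeddings and switch to an inner-product index, for example:

# Alternative sketch (not used in this project): cosine similarity via an
# inner-product index over L2-normalised embeddings.
embeddings = np.vstack(df["embedding"].values).astype("float32")
faiss.normalize_L2(embeddings)                     # normalise rows in place
ip_index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine on unit vectors
ip_index.add(embeddings)

With that variant, higher scores mean closer matches, so the threshold logic below would need to flip.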
Define the Distance Threshold
FAISS calculates the L2 distance between embeddings: the smaller the value, the more semantically similar the text. If the best match's distance is above our threshold, we don't generate a response at all and instead drop to a fallback message.
MAX_DISTANCE = 0.9 # Don't trust anything beyond this distance
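Picking the threshold was largely empirical. A rough way to calibrate it is to probe the index with a few queries you expect to match and a few you expect to miss, then look at where the distances land. The probe queries below are illustrative only:

# Rough calibration sketch: the probe queries are made up for illustration.
probes = [
    "How do I reset my password?",     # should land close to a known question
    "What's the best pizza topping?",  # should land well beyond the threshold
]
for q in probes:
    vec = model.encode([q], convert_to_numpy=True).astype("float32")
    distances, _ = index.search(vec, 1)
    print(f"{q!r} -> closest distance {distances[0][0]:.4f}")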
Query + Generation Pipeline
import json
import requests

# ====================================================
# ✅ Function: Answer Questions Using Gemini + Grounding
# ====================================================
def grounded_gemini_answer_verbose(user_query, df, index, model, api_key,
                                   log_path=None, top_k=3, max_distance=0.9):
    """
    Retrieves similar Q&A entries based on the user query and uses Gemini 2.0 Flash
    (via the REST API) to generate a response grounded only in those matches.
    """
    # Convert query into a vector and search the index
    query_vec = model.encode([user_query], convert_to_numpy=True).astype("float32")
    distances, match_indices = index.search(query_vec, top_k)

    # Pick the best match
    best_idx = match_indices[0][0]
    best_dist = distances[0][0]
    best_q = df.iloc[best_idx]["question"]

    # Fallback if nothing similar enough
    if best_dist > max_distance:
        fallback_msg = "⚠️ Sorry, I couldn't find a relevant answer to that question in the knowledge base."
        if log_path:
            log_qa_interaction(log_path, user_query, best_q, best_dist, fallback_msg, status="Fallback")
        return {
            "match_quality": f"🔍 Closest match distance: {best_dist:.4f} (above threshold {max_distance})",
            "top_match": best_q,
            "answer": fallback_msg
        }

    # Build context prompt from top matches
    context = "\n\n".join([
        f"Q: {df.iloc[idx]['question']}\nA: {df.iloc[idx]['answer']}"
        for idx in match_indices[0]
    ])

    prompt = f"""
You are an experienced IT support assistant.
Your task is to answer the user's question using only the provided Q&A Context section.
You may paraphrase and adapt information from similar questions — you are allowed to interpret if the meaning is clearly close.
However, do not guess or invent new information. If the context clearly does not answer the user's question, say: "I'm sorry, I don't have enough information to answer that."
Respond in a friendly, helpful, and professional tone, using full sentences like you would in a support ticket response.

Q&A Context:
{context}

User's question:
"{user_query}"
"""

    # Call Gemini 2.0 Flash (v1beta) via the REST API
    url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent"
    headers = {"Content-Type": "application/json"}
    params = {"key": api_key}
    payload = {
        "contents": [
            {
                "role": "user",
                "parts": [{"text": prompt}]
            }
        ]
    }
    response = requests.post(url, headers=headers, params=params, data=json.dumps(payload))

    if response.status_code == 200:
        answer = response.json()["candidates"][0]["content"]["parts"][0]["text"]
    else:
        answer = f"❌ Gemini API error {response.status_code}: {response.text}"

    # Log the generated (or error) response
    if log_path:
        log_qa_interaction(log_path, user_query, best_q, best_dist, answer, status="Generated")

    return {
        "match_quality": f"🔍 Closest match distance: {best_dist:.4f}",
        "top_match": best_q,
        "answer": answer.strip()
    }
Example Output
User Query: "Free disk space via CLI?"
Closest Match: "How do I check available disk space using the Windows command line?"
Distance: 0.2057
Response: “Run the WMIC command: wmic logicaldisk get size,freespace,caption to display disk usage statistics.”
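The log_qa_interaction helper used above isn't shown in this post. As an assumption of what it could look like, here is a minimal sketch that appends one CSV row per interaction so every answer stays auditable:

# Minimal logging sketch: my assumption of the helper's shape, not the original code.
import csv
from datetime import datetime
from pathlib import Path

def log_qa_interaction(log_path, user_query, top_match, distance, answer, status):
    """Append one row per interaction: timestamp, status, query, match, distance, answer."""
    write_header = not Path(log_path).exists()
    with open(log_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["timestamp", "status", "user_query", "top_match", "distance", "answer"])
        writer.writerow([datetime.now().isoformat(), status, user_query, top_match,
                         f"{float(distance):.4f}", answer])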
Lessons from Iteration
Like most GenAI projects, this one went through several rewrites:
- Initially there was no fallback logic, and Gemini-2.0-flash would answer confidently even with a poor match.
- Early versions logged nothing, which made debugging difficult.
- Tuning the match quality meant rewriting the questions and answers in a way that was easier for the system to analyse semantically, which took some time.
Correcting each mistake made the system more robust. Logging, similarity thresholding, and prompt grounding turned it from a guesser into a dependable assistant.
Capstone Requirements Met
This project checks all the boxes:
✅ Uses multiple GenAI techniques: Embeddings, Vector Search, Prompt Engineering, RAG
✅ Addresses a real IT use case with practical value
✅ Produces safe, auditable responses
✅ Clean, functional end-to-end notebook
Potential Next Steps
- Add a simple Streamlit or Gradio UI.
- Let users rate answers with 👍/👎.
- Handle follow-up questions (multi-turn support).
- Use function calling to create support tickets or link to internal systems.
Conclusion
This project taught me that you don’t need to fine-tune a language model to get reliable results — not if you ground it properly. With FAISS, good embeddings, and a well-written prompt, you can get consistent, safe answers from a powerful model like Gemini-2.0-flash.
I built this to solve a real problem I face every day, and now that it works, I'm thinking about what else it could do.
Project developed as part of the Kaggle & Google Generative AI Capstone, 2025.