I Built a File Assistant and Now I’m Scared of How Dumb I Used to Search


A few days back, I was looking for my AWS certification score report. I knew it was on my laptop somewhere - I just didn’t remember where I saved it. I vaguely remembered sending it to someone on WhatsApp, so I opened the chat and started scrolling.
Then I opened Google Drive. Then my downloads folder. Then my own brain.
After 20 minutes of aimless clicking, I found the file. But by then, I had lost 10% of my will to live and 30% of my respect for my folder organization.
So I thought - what if I could just ask my laptop something like:
“Where is my AWS certification score report?”
…and it would understand what I meant, then point me to the right file?
Not by filename. Not by modified date.
But by actual meaning.
And that’s how I ended up building Query - a local AI file assistant.
The two brains behind Query
Sentence Transformer
When I typed, “Where’s my AWS score report?” - my laptop blinked at me, “what’s an AWS?”
To be fair, computers don’t understand language the way we do. They don’t know what “certification” is. Or that a “score report” might be hiding in a file called AWS_Final-3.pdf.
The only language they know is numbers.
They need help. And that help comes in the form of a Sentence Transformer.
So what does it actually do?
It takes a sentence - like “My AWS certification result” - and turns it into a fixed-size vector: a long list of numbers like [0.23, -0.67, 0.12, ...].
This list is called an embedding, and it captures the overall meaning of the sentence.
But how do we get there? Let’s break down the architecture.
Tokenization
The sentence is split into smaller pieces called tokens.
Most tokens are words, but weird ones like certiFyyy_aws2024 get broken down into subword chunks like certi, Fyyy, _aws, 2024.
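You can peek at this yourself using the tokenizer that ships with all-MiniLM-L6-v2, the embedding model used later in this post. A quick sketch (the exact subword splits are illustrative - they depend on the model’s vocabulary):

```python
from transformers import AutoTokenizer

# The tokenizer bundled with the embedding model used later in this post.
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

print(tokenizer.tokenize("My AWS certification result"))
print(tokenizer.tokenize("certiFyyy_aws2024"))
# The second one comes back as subword chunks, something like:
# ['cert', '##ify', '##y', '##y', '_', 'aws', '##20', '##24']
```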
The goal? Break text into understandable building blocks.
Transformer Encoding (with Self-Attention)
These tokens are passed through a Transformer encoder - the model’s main engine. Unlike older models that read left to right, transformers read the entire sentence at once.
Using self-attention, the model figures out which words influence each other and by how much. So it knows that in “AWS exam score,” the word “score” is connected to “exam” - not floating in space.
Contextual Embedding Generation
After self-attention does its thing, each token is converted into a vector - called a contextual embedding.
These embeddings don’t just represent the word - they represent the word in context. So “score” in “exam score” and “score” in “music score” get totally different vectors.
Pooling
Now we’ve got a bunch of token-level vectors - but we need just one for the whole sentence. Most Sentence Transformers use mean pooling: they take the average of all token vectors.
This gives us one single vector - the sentence embedding - that captures the sentence’s overall meaning.
And that’s what gets stored, compared, searched, and matched.
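If you want to see the pooling step in isolation, here’s a toy sketch - random numbers stand in for real token embeddings, just to show the shapes:

```python
import numpy as np

# Pretend the encoder gave us 7 tokens, each as a 384-dimensional
# contextual embedding (random numbers standing in for the real thing).
token_vectors = np.random.rand(7, 384)

# Mean pooling: average across tokens to get one sentence embedding.
sentence_embedding = token_vectors.mean(axis=0)
print(sentence_embedding.shape)  # (384,)
```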
If two sentence embeddings are close in this high-dimensional space, their meanings are similar.
If they’re far apart? Not even close, buddy.
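You can see this “close in meaning = close in space” idea in a few lines of Python. This sketch uses the sentence-transformers library with all-MiniLM-L6-v2 (the same model the assistant uses); the example sentences are just for illustration:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

a, b, c = model.encode([
    "My AWS certification result",
    "AWS exam score report",
    "grandma's lasagna recipe",
])

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(a, b))  # high - these two mean roughly the same thing
print(cosine(a, c))  # much lower - lasagna is not a cloud certification
```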
FAISS
Alright. You’ve got a vector. You’ve got a giant collection of other vectors. Now the goal is to find the closest ones. Sounds simple, right?
Except these vectors aren’t 2D or 3D. They’re living in 384-dimensional space (or more).
Try drawing that. You can’t. Your brain will melt.
That’s where FAISS comes in - it’s built to handle high-dimensional similarity search efficiently. FAISS stands for Facebook AI Similarity Search, and it’s basically a high-speed search engine - but instead of searching text, it searches through vectors.
Here’s what it actually does:
It builds an index structure
You don’t want to compare your query vector to every single file vector one-by-one. That’s too slow.
So FAISS builds an index - a data structure that organizes all your vectors in a way that makes them faster to search through. There are different kinds of indexes:
IndexFlatL2: The simplest one. It checks every vector (brute force), but it’s super optimized, so it’s still fast for smaller datasets.
Other indexes (like IVF, HNSW, PQ) use clustering and approximations to speed things up for large-scale data.
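Building one of these takes only a few lines. A minimal sketch with IndexFlatL2 (random vectors standing in for real file embeddings):

```python
import faiss
import numpy as np

dim = 384  # matches all-MiniLM-L6-v2's embedding size
index = faiss.IndexFlatL2(dim)  # brute-force index using L2 distance

# Stand-in for real file embeddings: FAISS expects float32.
vectors = np.random.rand(1000, dim).astype("float32")
index.add(vectors)

print(index.ntotal)  # 1000 vectors indexed
```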
Distance-Based Search
When you give FAISS a new vector (like the one for your query), it looks for the closest vectors in the index using metrics like:
L2 distance (a.k.a. Euclidean): straight-line distance between two points.
Cosine similarity: the angle between two vectors. More useful when we care about direction, not magnitude. (In FAISS, cosine similarity is typically done by L2-normalizing the vectors and using an inner-product index like IndexFlatIP.)
Efficient Nearest Neighbor Search
FAISS is optimized to find the top-k nearest neighbors - the k most similar vectors to your input.
And it does this really fast - even on CPU - using highly efficient mathematical tricks (like optimized matrix operations and index structures). So when I ask a question, FAISS doesn’t skim through every file.
It finds the best semantic matches immediately, based on vector proximity.
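Here’s what that top-k search looks like in code, again with stand-in vectors so the sketch runs on its own:

```python
import faiss
import numpy as np

dim = 384
index = faiss.IndexFlatL2(dim)
index.add(np.random.rand(1000, dim).astype("float32"))  # stand-in corpus

query = np.random.rand(1, dim).astype("float32")  # stand-in query embedding
distances, ids = index.search(query, 5)  # top-5 nearest neighbors

print(ids[0])        # positions of the 5 closest vectors in the index
print(distances[0])  # their squared L2 distances (smaller = closer)
```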
How does the assistant work?
Let’s talk code.
The assistant is basically three parts:
One file walks through my laptop and finds stuff.
One file turns that stuff into numbers.
And one file searches those numbers when I ask a question.
Meet the trio
📂 file_indexer.py – The file crawler
This is the one that actually goes through my drive.
It walks through the entire D: drive (or whatever base path you give it), skips junk folders like __pycache__, .git, and System Volume Information, and looks for files with supported extensions - .pdf, .docx, .txt, .py, etc.
For every valid file, it builds a readable sentence like:
“File: AWS_Score_Report.pdf in folder: Certificates at path Certificates/AWS_Score_Report.pdf”
That’s our first rough idea of what the file is. Not based on its content - just based on the filename and where it lives. And trust me, for a lot of cases, that’s enough to work with.
These descriptions + file paths get returned and passed on to the next stage.
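The repo has the real thing, but a minimal sketch of the idea looks something like this (folder names and extensions taken from the description above; the function name is mine):

```python
import os

SKIP_DIRS = {"__pycache__", ".git", "System Volume Information"}
SUPPORTED = {".pdf", ".docx", ".txt", ".py"}

def crawl(base_path):
    """Walk base_path and build a plain-English description for every file."""
    descriptions, paths = [], []
    for root, dirs, files in os.walk(base_path):
        # Prune junk folders in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for name in files:
            if os.path.splitext(name)[1].lower() in SUPPORTED:
                full_path = os.path.join(root, name)
                folder = os.path.basename(root) or base_path
                rel_path = os.path.relpath(full_path, base_path)
                descriptions.append(
                    f"File: {name} in folder: {folder} at path {rel_path}"
                )
                paths.append(full_path)
    return descriptions, paths
```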
📂 vector_store.py – The embedder + indexer
Here’s where the magic happens (well, it’s really just extremely optimized math).
This file takes those file descriptions and runs them through a Sentence Transformer model - specifically all-MiniLM-L6-v2. Each description becomes a vector - a 384-dimensional list of numbers representing the sentence’s meaning.
All those vectors are added to a FAISS index (IndexFlatL2, in my case).
This index is what lets us later search fast - even across thousands of vectors - based on similarity.
I also store the file paths in a separate JSON file (because FAISS doesn’t store metadata).
So when I search later, I know which vector belongs to which file.
End result?
A data/index.faiss file containing all your file vectors
A data/file_map.json file that links each vector back to its file path
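Put together, the embedding-and-indexing step might look roughly like this (a sketch of the shape, not the repo’s exact code - the function name is mine):

```python
import json
import os

import faiss
from sentence_transformers import SentenceTransformer

def build_index(descriptions, paths):
    model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = model.encode(descriptions)  # shape: (num_files, 384), float32

    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)

    os.makedirs("data", exist_ok=True)
    faiss.write_index(index, "data/index.faiss")

    # FAISS only stores vectors, so keep the vector-to-path mapping ourselves.
    with open("data/file_map.json", "w") as f:
        json.dump({i: p for i, p in enumerate(paths)}, f)
```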
📂 query_engine.py – The search engine
This file handles the actual question-answering part.
When I type a query like:
“Where’s my AWS certification result?”
query_engine.py takes that sentence, passes it through the same Sentence Transformer, and gets a vector for it.
Then it asks FAISS:
“Hey, which vectors in the index are closest to this one?”
FAISS does its thing and returns the top-k matches.
query_engine.py then maps those vector indices back to the actual file paths - and gives me the list of results.
Closest vector = most likely file I was looking for.
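And the search side, sketched out (same caveat - this is the shape of it, not the repo verbatim):

```python
import json

import faiss
from sentence_transformers import SentenceTransformer

def search(question, k=5):
    model = SentenceTransformer("all-MiniLM-L6-v2")
    index = faiss.read_index("data/index.faiss")
    with open("data/file_map.json") as f:
        file_map = json.load(f)  # JSON turns the integer keys into strings

    query_vector = model.encode([question])  # shape: (1, 384)
    distances, ids = index.search(query_vector, k)
    return [file_map[str(i)] for i in ids[0]]

print(search("Where's my AWS certification result?"))
```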
That’s a wrap (for now)
So yeah - I built myself a file-search assistant that actually understands what I mean, not just what I type. It scans my files, speaks vector, runs offline, and doesn’t roll its eyes when I forget what I named things.
All it took was a Sentence Transformer, FAISS, some Python, and way too many files named final_final_2_reallyfinal.pdf.
But this is just v1.
Right now, it knows where things are.
Next, it’ll know what’s inside them.
Here’s the link to the repo!