Making LLMs Efficient for Survey Cleaning: My Journey from Arrays to Choice Maps


Problem Statement
I was building an AI pipeline to clean survey responses. The data structure was like this:
Sample Question:
{
  "id": 3271,
  "text": "How satisfied are you with our service?",
  "choices": [
    { "id": 1, "label": "Very Satisfied" },
    { "id": 2, "label": "Neutral" },
    { "id": 3, "label": "Dissatisfied" }
  ]
}
Sample Response:
{
  "responseId": 1001,
  "responses": [
    { "questionId": 3271, "choiceId": "2" }
  ]
}
Simple, right? The user selected choiceId 2, meaning "Neutral".
Now, when sending batches of survey responses to an LLM for cleaning and fraud detection, I had one big question: how do I send questions and responses efficiently, without wasting tokens or slowing the model down?
My Thought Process
Initially, I thought: "Come on, just send the full questions array and the full responses array. Simple."
So I was packing:
- Full questions (with the choices array)
- Full responses (with choiceIds)
But slowly I realised...
Every batch was sending the same choices again and again. For every user response, the LLM had to read the question's choices, scan the array, and match the choiceId.
Even a small survey was eating 2k-3k tokens easily just for system context!
Then I thought:
"What if instead of sending same data again and again, I somehow make the choice lookup easier for the model?"
I explored three options.
Option 1: Keep Choices as Array (Default)
Each question has a choices: [{ id, label }] array. The response uses a choiceId. The LLM scans the array to find the match.
Pros: Tiny initial payload.
Cons:
- The model has to do an O(n) array scan.
- Slow reasoning.
- Wastes attention and tokens as the survey grows.
(Imagine scanning 10 choices manually every time... ugh.)
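To see the waste, here is a rough sketch of what a single batch payload looks like under Option 1. The field names mirror my samples above; the exact batch shape is illustrative:

// Option 1: the full choices array rides along in every single batch.
const batchPayload = {
  questions: [
    {
      id: 3271,
      text: "How satisfied are you with our service?",
      choices: [
        { id: 1, label: "Very Satisfied" },
        { id: 2, label: "Neutral" },
        { id: 3, label: "Dissatisfied" }
      ]
    }
    // ...every other question, choices and all, resent per batch
  ],
  responses: [
    { responseId: 1001, questionId: 3271, choiceId: "2" }
  ]
};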
Option 2: Expand Label Inside Every Response
Instead of sending the choiceId, I replace it with "Neutral", "Dissatisfied", etc. Responses become directly readable by the model.
Pros: Fast LLM understanding.
Cons:
- Response size doubles or triples.
- Huge token waste.
- Not good for 10k+ response batches.
(At small scale it's OK, but at big scale... RIP tokens! 🪦)
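For contrast, here is what Option 2's responses would look like. A sketch; the answer field name is my own illustrative choice:

// Option 2: labels are inlined into each response, so strings repeat endlessly.
const expandedResponses = [
  { responseId: 1001, questionId: 3271, answer: "Neutral" },
  { responseId: 1002, questionId: 3271, answer: "Dissatisfied" }
  // At 10k+ responses, these repeated label strings dwarf a one-time choice map.
];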
Option 3: Prebuilt Choice Map per Question
Build a map like:
{
  "3271": {
    "1": "Very Satisfied",
    "2": "Neutral",
    "3": "Dissatisfied"
  }
}
The response stays as a choiceId ("2").
The LLM just does an O(1) lookup in the map.
Pros:
- Small one-time cost.
- Fastest reasoning.
- Smallest token usage long term.
- Bulletproof at 100k or even 1M responses.
Cons:
- Slightly more work backend-side to generate the map.
(But hey, once it's done, it's clean and scalable!)
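Generating the map backend-side really is only a few lines. A minimal TypeScript sketch, assuming the question shape from my samples above (the name buildChoiceMap is mine, not from any library):

// Shapes matching the sample question JSON above.
interface Choice { id: number; label: string; }
interface Question { id: number; text: string; choices: Choice[]; }

// Build { questionId: { choiceId: label } } once, before any LLM call.
function buildChoiceMap(questions: Question[]): Record<string, Record<string, string>> {
  const map: Record<string, Record<string, string>> = {};
  for (const q of questions) {
    const labelsById: Record<string, string> = {};
    for (const c of q.choices) {
      labelsById[String(c.id)] = c.label;
    }
    map[String(q.id)] = labelsById;
  }
  return map;
}

With that in place, buildChoiceMap(questions)["3271"]["2"] resolves straight to "Neutral", with no scanning.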
Final Flow
Survey Questions (choices array)
↓
Preprocess into Choice Map (one time)
↓
Store Choice Map in System Context
↓
Send Responses with choiceId only
↓
LLM does O(1) lookup from Map
↓
Efficient fraud detection and response validation
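To make the flow concrete, here is a hedged sketch of how the prompt could be assembled, reusing the Question type and buildChoiceMap from the sketch above. The prompt wording and helper names are illustrative, not a fixed API:

// One-time: embed the choice map in the system context.
// `questions` is the same Question[] used in the sketch above.
const choiceMap = buildChoiceMap(questions);
const systemContext =
  "You are cleaning survey responses for fraud. " +
  "Resolve every choiceId through this map (questionId -> choiceId -> label):\n" +
  JSON.stringify(choiceMap);

// Per batch: responses stay lean; no choices travel with them.
type SlimResponse = { responseId: number; questionId: number; choiceId: string };

function buildUserMessage(batch: SlimResponse[]): string {
  return "Validate these responses and flag suspicious patterns:\n" + JSON.stringify(batch);
}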
So, what are the advantages?
- No duplicate choices in every batch.
- No ballooning response size.
- No array-scanning overhead for the LLM.
Key Benefits
| Approach | Token Usage | LLM Speed | Scale Readiness |
| --- | --- | --- | --- |
| Choices as Array | Medium | Medium | OK only for small surveys |
| Expanded Labels | High | Fast | Very costly at scale |
| Prebuilt Choice Map | Low | Fastest | Best for 100k+ responses |
💡 Final Thought
Sometimes, small design decisions, like whether to send a list vs. a map, matter A LOT when you want to scale cleanly.
I learned this by thinking deeply from the angle of:
- Token cost
- LLM cognitive load
- Real-world scaling to hundreds of thousands of survey responses
TL;DR
This idea is not only for surveys! It can be applied wherever structured choices are involved.
Some real examples:
- Auto-grading MCQ exams at scale (education apps).
- Screening candidate forms in HRTech startups.
- Cleaning healthcare intake forms efficiently.
- Processing ecommerce customer feedback forms cheaply.
- Analyzing product satisfaction surveys in SaaS platforms.
Main benefits of using Maps in AI pipelines:
✅ Save massive tokens.
✅ Make the LLM think faster.
✅ Scale to millions of records easily.
✅ Keep backend and API payloads clean and simple.
Thanks for reading!
If you're building AI pipelines like this, comment your thoughts and approaches.