What I Learned About Caching While Building Real APIs and AI Systems


“From in-memory caching and Redis to edge AI optimizations — real-world lessons that helped me speed up systems and avoid performance bottlenecks.”
What’s the sum of all integers from 1 to 100?
If you instantly said 5050, either you know the Gauss summation trick, or you remembered it from the last time someone asked you.
Now imagine I ask you the same question again… Would you calculate it again or just reuse the answer?
What Is Caching?
Caching is the practice of temporarily storing the result of expensive operations so you can serve future requests faster without doing the same work again.
In REST API development, this often means avoiding unnecessary database queries or redundant API calls.
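The Gauss question at the top is exactly this pattern. Here's a toy sketch in JavaScript (the naive loop stands in for any expensive computation):

const sumCache = new Map();

function sumUpTo(n) {
  if (sumCache.has(n)) {
    return sumCache.get(n); // cache hit: reuse the stored answer
  }
  let result = 0;
  for (let i = 1; i <= n; i++) result += i; // the "expensive" work
  sumCache.set(n, result); // remember it for next time
  return result;
}

sumUpTo(100); // computes 5050 the slow way
sumUpTo(100); // instantly served from the cache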
Why Is Caching Important?
Imagine your /api/products endpoint hits the database every time someone visits your e-commerce homepage. If 10,000 users hit it in a minute, your DB is going to suffer. With caching, you store the result once and reuse it, saving time and resources.
Where Can You Apply Caching?
- In-memory (e.g. node-cache, memory-cache)
- Distributed (e.g. Redis, Memcached)
- Browser-level (Cache-Control headers; see the snippet after this list)
- CDNs (Cloudflare, Akamai, etc.)
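As a quick taste of browser-level caching before we move on, here's a minimal sketch of setting a Cache-Control header in Express (the route, the max-age, and fetchCategoriesFromDB are all illustrative):

app.get('/api/categories', async (req, res) => {
  // Tell the browser (and any CDN in between) it may reuse
  // this response for 5 minutes without re-requesting it.
  res.set('Cache-Control', 'public, max-age=300');
  res.json(await fetchCategoriesFromDB()); // hypothetical DB helper
});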
Let’s focus on in-memory and Redis for API development in this article. I can write posts on others if you want me to.
In-Memory Cache
import express from 'express';
import NodeCache from 'node-cache';

const app = express();
const cache = new NodeCache({ stdTTL: 60 }); // default time-to-live in seconds; entries expire after that

app.get('/api/products', async (req, res) => {
  const cacheKey = 'products';
  const cachedData = cache.get(cacheKey);
  if (cachedData) {
    return res.json({ source: 'cache', data: cachedData });
  }
  const products = await fetchProductsFromDB();
  cache.set(cacheKey, products);
  res.json({ source: 'db', data: products });
});

// Stand-in for a real database query
async function fetchProductsFromDB() {
  return [{ id: 1, name: 'Laptop' }, { id: 2, name: 'Phone' }];
}

app.listen(3000);
Redis Cache
import express from 'express';
import Redis from 'ioredis';

const app = express();
const redis = new Redis(); // connects to localhost:6379 by default

app.get('/api/users', async (req, res) => {
  const cacheKey = 'users';
  const cached = await redis.get(cacheKey);
  if (cached) {
    return res.json({ source: 'redis', data: JSON.parse(cached) });
  }
  const users = await fetchUsersFromDB();
  await redis.set(cacheKey, JSON.stringify(users), 'EX', 120); // expires in 120 seconds
  res.json({ source: 'db', data: users });
});

// Stand-in for a real database query
async function fetchUsersFromDB() {
  return [{ id: 1, name: 'Ada' }, { id: 2, name: 'Linus' }];
}

app.listen(3000);
When Should You Cache (and When Shouldn't You)?
✅ Data doesn’t change often
✅ Reads > Writes
❌ Highly dynamic or user-specific data (e.g. a shopping cart; see the sketch below for the usual compromise)
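If you genuinely need to cache user-specific data, the usual compromise is to key the cache per user and keep the TTL short. Here's a sketch building on the Redis setup above (req.userId and computeRecommendations are hypothetical; the ID would come from your auth middleware):

app.get('/api/recommendations', async (req, res) => {
  const cacheKey = `recs:${req.userId}`; // per-user key, never a shared one
  const cached = await redis.get(cacheKey);
  if (cached) {
    return res.json({ source: 'redis', data: JSON.parse(cached) });
  }
  const recs = await computeRecommendations(req.userId);
  await redis.set(cacheKey, JSON.stringify(recs), 'EX', 30); // short TTL limits staleness
  res.json({ source: 'db', data: recs });
});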
Cache Invalidation — The Hard Part
“There are only two hard things in computer science: cache invalidation and naming things.” — Phil Karlton
Cache invalidation is hard, but picking the right strategy makes it manageable. You'll need a way to refresh or clear the cache when the underlying data changes. In short, the common strategies are:
- TTL expiry: let entries age out automatically (what the examples above do)
- Explicit invalidation: delete or overwrite the cached entry whenever the data is written
- Event-driven invalidation: publish change events and evict stale entries when they arrive
I'll write a detailed post on caching and invalidation in the coming days.
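Meanwhile, here's what explicit invalidation looks like in practice, building on the Redis example above (the PUT route and the updateUserInDB helper are illustrative):

app.put('/api/users/:id', async (req, res) => {
  // assumes app.use(express.json()) is registered for body parsing
  const user = await updateUserInDB(req.params.id, req.body);
  await redis.del('users'); // drop the stale cached list...
  res.json(user); // ...so the next GET /api/users rebuilds it fresh from the DB
});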
A Classic Edge AI Use Case
Caching works beyond the web and APIs.
Before mid-to-high-powered Jetson devices became common, most Edge AI use cases ran in resource-constrained environments (a Raspberry Pi or embedded devices), where every CPU cycle matters for performance and energy efficiency.
Here’s a simple example in C++ that demonstrates caching inference results of a vision model (like object detection) for frames with little to no change, which is a common optimization.
Imagine you’re running a pressure gauge detection model on a camera feed. If two consecutive frames are nearly identical, you can skip re-running inference and reuse the previous result — that’s caching at the edge.
We'll cache the last processed frame and its detection result.
If the new frame is "similar enough", we'll skip the model inference and reuse the result. On the edge, skipping AI inference for even a few frames saves CPU/GPU, battery, and latency.
// frame_handler.cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include <string>

cv::Mat last_frame;
std::string last_result;
int frame_threshold = 10; // per-pixel intensity threshold for frame difference

// Simulate an expensive model inference
std::string runInference(const cv::Mat& frame) {
    // Pretend this runs a heavy AI model
    return "Normal readings detected";
}

// Compare two frames for similarity
bool isSimilar(const cv::Mat& a, const cv::Mat& b, int threshold) {
    if (a.empty() || b.empty()) return false;
    cv::Mat diff, gray;
    cv::absdiff(a, b, diff);
    cv::cvtColor(diff, gray, cv::COLOR_BGR2GRAY); // countNonZero needs a single channel
    return cv::countNonZero(gray > threshold) < 1000; // heuristic: fewer than 1000 changed pixels
}

std::string processFrame(const cv::Mat& frame) {
    if (isSimilar(frame, last_frame, frame_threshold)) {
        std::cout << "Cache hit: Reusing previous result\n";
        return last_result;
    }
    std::cout << "Cache miss: Running inference\n";
    last_result = runInference(frame);
    last_frame = frame.clone();
    return last_result;
}
// In action - main.cpp
int main() {
    cv::VideoCapture cap(0); // camera input
    while (true) {
        cv::Mat frame;
        cap >> frame;
        if (frame.empty()) break;
        std::string result = processFrame(frame);
        std::cout << "Result: " << result << "\n";
        cv::imshow("Edge AI Frame", frame);
        if (cv::waitKey(30) >= 0) break; // exit on any key press
    }
    return 0;
}
You Use Caching Every Day
Caching is everywhere. We rely on it daily without realizing it:
Mobile Apps: Ever noticed how apps like Instagram, YouTube, or Twitter load faster the second time you open them? That’s local caching at work — images, videos, and layouts are stored temporarily so the app doesn’t have to download everything again.
Web Browsing: When you revisit a website, it often loads faster. That’s because your browser caches static assets like CSS, JavaScript, and images. No need to re-download the entire site on every visit.
Your Brain (Seriously!): Remember the answer to "What's the sum of 1 to 100?" from earlier? If you don't recalculate it and instead just recall it, that's mental caching. Passwords, frequently used routes, and common answers in conversations are cached responses too.
Cooking: Let’s say you prepare a big batch of curry and refrigerate it. You don’t cook it from scratch every day — you reheat it. In a sense, that’s caching in the kitchen.
Caching is a universal principle of optimizing effort, whether it’s computers, phones, or your own brain.
Final Thoughts
Whether you’re building REST APIs, edge AI systems, or microservices — caching is one of those tools that offers massive returns for minimal effort when used right.
From in-memory and Redis caching to frame-level optimization in computer vision, the underlying idea is the same:
Don’t do expensive work more than once if you can avoid it.
If you’re just getting started, pick one high-traffic endpoint or heavy computation, add caching, and measure the impact. You’ll be surprised how much faster and leaner your system feels.
Have you implemented caching in your systems?
What strategies worked best for your use case?
Have you ever struggled with stale data or cache invalidation?
Any fun edge caching hacks you’ve tried?
Drop your thoughts, war stories, or questions in the comments — let’s share notes and level up together.