Building a Scalable System for Real-Time Waste Verification


Hell, I need to finish this up real quick; my GitHub repo still has that old boilerplate code I wrote for this project. Today I sat down to work on it, wanted to wrap it up, and then realised: oh, I need to know how things would work in production. So I started by designing the system for Regen…
For your intro and my recap, let's revisit the initial project idea and what I was building.
Recap -
This will work with the help of AI. We can use the image-processing capability of AI and train it to detect what kind of waste is in the picture, and to decide which waste pictures to approve and which to reject. This would help keep a check on places with over-smart people trying to game the system.
Also, since the collectors now need to mark attendance at each house, this would help keep track of which houses are done, ensuring no area is missed and that everyone hands over their waste properly. The system records segregation accuracy and collection timestamps. Waste is categorized as wet, dry, or hazardous and stored in a database.
So this part of the project was to build a portal where users upload a picture of their waste, and this picture is then sent to the backend for verification…
Designing the System
First thing we need to figure out is how the client (user) will stay connected to the backend. One way is to use a POST request to send the image and then a GET request to fetch the result. That’s the good old REST API way.
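Just to make that concrete, here's a rough sketch of the REST version, assuming an Express backend and a placeholder verifyWaste() call; every name here is just for illustration, not what's actually in my repo:

```typescript
// Hypothetical REST version: one POST to submit, one GET the client has to keep hitting.
import express from "express";
import { randomUUID } from "crypto";

const app = express();
// In-memory result store just for the sketch; real storage comes later.
const results = new Map<string, { status: "pending" | "done"; approved?: boolean }>();

app.post("/waste", express.raw({ type: "image/*", limit: "10mb" }), (req, res) => {
  const taskId = randomUUID();
  results.set(taskId, { status: "pending" });
  // Kick off verification in the background; the client has to come back for the result.
  verifyWaste(req.body).then((approved) => results.set(taskId, { status: "done", approved }));
  res.json({ taskId });
});

app.get("/waste/:taskId", (req, res) => {
  res.json(results.get(req.params.taskId) ?? { status: "unknown" });
});

// Placeholder for the actual AI verification call.
async function verifyWaste(image: Buffer): Promise<boolean> {
  return true;
}

app.listen(3000);
```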
But it’s painfully slow and inefficient for this use case. You really don’t want your user staring at the screen, refreshing it again and again just to see if the result came in. And if the response logic crashes somewhere? I’ll have to write tons of code just to explain that something went wrong while my server was trying to look at your garbage… lol.
Or we could do what LeetCode does when you submit a problem—polling. It sends your submission to the backend and then makes a bunch of GET requests every X milliseconds:
“Is it ready? Is it ready now? How about now?”
Since this verification process is asynchronous, those early polls just come back with a pending status until, finally, the result is ready.
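On the client, that polling loop would look something like this, reusing the hypothetical endpoints from the sketch above:

```typescript
// Client-side polling sketch: submit once, then keep asking until the status flips.
async function submitAndPoll(image: Blob): Promise<boolean> {
  const { taskId } = await (await fetch("/waste", { method: "POST", body: image })).json();

  // "Is it ready? Is it ready now? How about now?"
  while (true) {
    const result = await (await fetch(`/waste/${taskId}`)).json();
    if (result.status === "done") return result.approved;
    await new Promise((r) => setTimeout(r, 2000)); // wait 2s and ask again
  }
}
```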
But yeah, this is super time-consuming and resource-hungry. I’d be loading my server with a ton of useless GET requests for no reason. Not worth it.
The better solution? WebSocket—a persistent, two-way connection between the client and the server.
This way, the user sends the image once, the process kicks off on the backend, and when the result is ready, the server pushes it back through the same connection.
No refreshing. No polling. No unnecessary traffic.
Minimal resources. Clean AF.
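A bare-bones sketch of that, assuming the ws package on the server and a placeholder verifyWaste() for the real AI call:

```typescript
// WebSocket sketch: the client sends the image once, the server verifies it
// asynchronously and pushes the result back on the same socket.
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.on("message", async (data) => {
    const approved = await verifyWaste(data as Buffer); // placeholder AI call
    socket.send(JSON.stringify({ type: "verification-result", approved }));
  });
});

async function verifyWaste(image: Buffer): Promise<boolean> {
  return true; // stand-in for the real model
}
```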
Scaling this thought…
This is the high-level overview of what the connection architecture might look like in the future… and based on the needs of the product, the best option is of course the WebSocket one. But this was a single-user-centric plan, and now we need to think big about how the connections will scale. Here's what I thought:
See, a single WebSocket server can't handle all these users, nor should it ideally. So what we need to do is create multiple WS servers and connect different users to different ones. This can be done on the basis of location… people from one city might end up on the same server, or people from 10 different societies might share one. Each user has their own user or connection ID, which can be used to work out, of the thousands of people on that server, which one actually sent something.
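To make that a bit more concrete, here's a tiny sketch of location-based sharding plus the connection-ID lookup; the server URLs and the hashing trick are assumptions, not a final layout:

```typescript
// Pick a WS server by city, and keep a connectionId -> socket map on each
// server so we know exactly who sent what.
import { WebSocket } from "ws";

const WS_SERVERS = [
  "wss://ws-1.regen.example",
  "wss://ws-2.regen.example",
  "wss://ws-3.regen.example",
];

// Same city always lands on the same server.
function pickServer(city: string): string {
  let hash = 0;
  for (const ch of city) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return WS_SERVERS[hash % WS_SERVERS.length];
}

// On each WS server: remember which socket belongs to which connection ID.
const connections = new Map<string, WebSocket>();

function register(connectionId: string, socket: WebSocket) {
  connections.set(connectionId, socket);
  socket.on("close", () => connections.delete(connectionId));
}
```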
Now there is the image handling and verification logic… Ideally, none of this heavy logic should run on the main backend; the main backend should only keep the website up, because if anything goes wrong with this, the entire website goes down. So I think assigning this work to temporary workers is the better option: they handle these tasks.
Now, since multiple people send data simultaneously, we can maintain one or more queues, like pub-sub systems. Say multiple people send work to the backend; it pushes each job onto a queue, and the worker handling these tasks picks up one job at a time from the queue and performs the desired action.
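Here's a rough sketch of that queue, assuming a Redis list via ioredis; the queue name and job shape are made up for the example:

```typescript
// The backend pushes verification jobs onto a Redis list; a worker blocks on
// the queue and handles one job at a time.
import Redis from "ioredis";

const redis = new Redis();

interface VerificationJob {
  taskId: string;
  connectionId: string;
  wsServerId: string;
  imageKey: string; // where the image sits in the temporary S3 bucket
  attempts: number;
}

// Producer: called by the main backend when an image comes in.
async function enqueue(job: VerificationJob) {
  await redis.lpush("verification-queue", JSON.stringify(job));
}

// Consumer: the worker loop.
async function workerLoop() {
  while (true) {
    const entry = await redis.brpop("verification-queue", 0); // blocks until a job exists
    if (!entry) continue;
    const job: VerificationJob = JSON.parse(entry[1]);
    await handleJob(job);
  }
}

async function handleJob(job: VerificationJob) {
  // placeholder; the retry / fail-rate logic is sketched in the next section
}
```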
Queue System and PubSubs…
Now this desired action will have two outcomes: either the verification is true or it is false. Whatever the response, instead of sending it directly to the WebSocket server, the worker sends the response along with the connection ID, WS server ID, and task ID to the main server. The main server looks up the server ID, routes the response to that particular WS server, and the user finds out the result. In case of failure, the main server receives something that makes it clear the task failed due to an internal error, so it pushes the task back to the end of the queue… and the process continues. But we also need to make sure that if a task fails X number of times, it gets discarded. Let's call this the fail rate: the number of times we'll retry verification before giving up.
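A minimal sketch of that retry logic, with verifyWaste() and postResultToMainServer() as stand-ins for the real calls:

```typescript
// Retry a job up to MAX_RETRIES times, then discard it.
const MAX_RETRIES = 3; // the "fail rate": retries before we give up

interface Job {
  taskId: string;
  connectionId: string;
  wsServerId: string;
  imageKey: string;
  attempts: number;
}

async function handleJob(job: Job, requeue: (job: Job) => Promise<void>) {
  try {
    const approved = await verifyWaste(job.imageKey);
    // The main server uses wsServerId to route this to the right WS server,
    // and connectionId to find the exact user on it.
    await postResultToMainServer({
      taskId: job.taskId,
      connectionId: job.connectionId,
      wsServerId: job.wsServerId,
      approved,
    });
  } catch {
    if (job.attempts + 1 >= MAX_RETRIES) return; // fail rate hit: discard
    await requeue({ ...job, attempts: job.attempts + 1 }); // back to the end of the queue
  }
}

// Placeholders for the real verification call and the callback to the main server.
async function verifyWaste(imageKey: string): Promise<boolean> { return true; }
async function postResultToMainServer(result: object): Promise<void> {}
```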
Now let's talk about what the server will receive and how this will be managed. The user needs to upload a picture of waste, which means the server will receive an image. One option is to store this picture temporarily on the backend using Multer, but that's not feasible here since people across the country would be uploading pics. A better option is to run an S3 bucket for temporary storage of pics, and then send the image for AI verification from there. The verification returns yes or no. If it is yes, move the image to a permanent bucket; if it's no, keep it in the temporary bucket until the fail rate is hit. This way, we don't flood our buckets with unwanted images.
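Sketching the "move on yes" part, assuming the AWS SDK v3; the bucket names are made up for the example:

```typescript
// Images land in a temporary bucket first; on a "yes" from verification they
// get copied to the permanent bucket and removed from the temporary one.
import { S3Client, CopyObjectCommand, DeleteObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "ap-south-1" });
const TEMP_BUCKET = "regen-waste-temp";
const PERMANENT_BUCKET = "regen-waste-verified";

async function promoteImage(key: string) {
  // Copy from the temporary bucket into the permanent one...
  await s3.send(new CopyObjectCommand({
    Bucket: PERMANENT_BUCKET,
    CopySource: `${TEMP_BUCKET}/${key}`,
    Key: key,
  }));
  // ...then clean up the temporary copy.
  await s3.send(new DeleteObjectCommand({ Bucket: TEMP_BUCKET, Key: key }));
}
```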
So people upload a pic, which gets stored in S3, and the metadata is extracted by the worker and cached in Redis. Then the worker continues with verification, and if the result is yes, it follows its path and sends the result to the backend and then to the WS connection. The metadata is also written from Redis to Postgres, which fills in the waste-collected entry for this house, and the driver gets to know that this house's waste has been collected.
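And a rough sketch of that metadata hand-off, assuming ioredis and node-postgres; the table and column names are assumptions, not the real schema:

```typescript
// The worker caches metadata in Redis while the job is in flight, then writes
// the final record to Postgres once verification passes.
import Redis from "ioredis";
import { Pool } from "pg";

const redis = new Redis();
const pg = new Pool();

async function cacheMetadata(taskId: string, houseId: string, category: string, imageKey: string) {
  await redis.hset(`task:${taskId}`, { houseId, category, imageKey, uploadedAt: Date.now() });
}

async function recordCollection(taskId: string) {
  const meta = await redis.hgetall(`task:${taskId}`);
  // Mark this house's waste as collected so the driver's view updates.
  await pg.query(
    "INSERT INTO collections (house_id, category, image_key, collected_at) VALUES ($1, $2, $3, now())",
    [meta.houseId, meta.category, meta.imageKey]
  );
  await redis.del(`task:${taskId}`);
}
```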
Here’s how the System Design Looks
The Awesome User flow
AI Generated - Eraser
Ending and Final Thoughts -
Now, although the system has started to look complex and seems theoretically ready to handle production-level hits… there's still a lot to take care of. But that's just a story for another blog; I'll keep sharing whatever happens, however it goes…
Thanks…