Chai aur Code, Chaat & Chatbots: Understanding Parallel Query Retrieval (Fan Out) in GenAI

Hello Everyone!

Last year I visited a chaat stall with my friends.

There were 5 of us, and I was ordering chaat and pani puri for everyone. Did I need to order one plate at a time, waiting for each plate before asking for the next?

Of course not!

“Bhaiya, 5 plates of chaat and pani puri!” That’s what Parallel Query Retrieval is: getting multiple things done at once instead of waiting for each task to finish before starting the next, much like asynchronous code in JavaScript.

So what is Fan Out? It means splitting a big task into smaller pieces and processing them simultaneously.

It helps deliver faster results with less waiting, just like the Flash.

The Dabbawala System

As we know, Mumbai’s dabbawalas deliver thousands of tiffins daily.

How is that possible? This is where fan out comes into the picture:

  1. Split tiffins by area.

  2. Assign each dabbawala a route.

  3. Deliver all tiffins at the same time.

In the same way, GenAI systems use parallel processing to handle multiple requests together rather than handling them one by one.
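The three dabbawala steps can be sketched in Python with asyncio. This is only an illustration: the tiffin IDs, the area names, and the helper functions `split_by_area`, `deliver_route`, and `deliver_all` are all made up for this example, and the delivery itself is simulated with a short sleep.

```python
import asyncio
from collections import defaultdict

# Hypothetical sample data: (tiffin_id, area) pairs invented for illustration.
TIFFINS = [
    ("T1", "Andheri"), ("T2", "Dadar"), ("T3", "Andheri"),
    ("T4", "Churchgate"), ("T5", "Dadar"),
]

def split_by_area(tiffins):
    """Step 1: group tiffins by delivery area."""
    routes = defaultdict(list)
    for tiffin_id, area in tiffins:
        routes[area].append(tiffin_id)
    return dict(routes)

async def deliver_route(area, tiffin_ids):
    """Step 2: one dabbawala covers one route (delivery simulated with a sleep)."""
    await asyncio.sleep(0.01 * len(tiffin_ids))
    return area, len(tiffin_ids)

async def deliver_all(tiffins):
    """Step 3: fan out one task per route and wait for all of them."""
    routes = split_by_area(tiffins)
    results = await asyncio.gather(
        *(deliver_route(area, ids) for area, ids in routes.items())
    )
    return dict(results)

delivered = asyncio.run(deliver_all(TIFFINS))
print(delivered)  # number of tiffins delivered per area
```

Each route becomes its own task, so the total delivery time is close to that of the longest route rather than the sum of all routes.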

Let’s See How Parallel Query Retrieval Works

Here I implemented a simulator with 3 AI workers (A, B, C). They handle 3 different customers in parallel, and all 3 customers have ordered chai. Let’s see what happens next.

Step 1: Import Libraries

import asyncio  
import time

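Step 2: Define a chai-brewing function for the workers

async def make_chai(name, time_to_brew):
    """Simulate one worker brewing chai for time_to_brew seconds."""
    print(f"{name} started brewing chai ☕...")
    # Non-blocking wait: other workers keep brewing while this one sleeps
    await asyncio.sleep(time_to_brew)
    print(f"{name}'s chai is ready in {time_to_brew} seconds! 🍵")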

Step 3: Fan Out Tasks with Asyncio

async def main():
    # Create one coroutine per worker; nothing runs yet
    tasks = [
        make_chai("A", 3),
        make_chai("B", 2),
        make_chai("C", 4),
    ]
    # Fan out: gather schedules all three concurrently and waits for every one to finish
    await asyncio.gather(*tasks)

# Run the program and measure the total wall-clock time
start = time.time()
asyncio.run(main())
print(f"Total time: {time.time() - start:.2f} seconds")

Output

Notice one thing here: instead of taking 9s (3 + 2 + 4 = 9s), as brewing one chai after another would, the concurrent version finishes in ~4s, the time of the slowest brew (C).
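To check the 9s versus ~4s claim yourself, here is a small comparison sketch. The helper names (`brew`, `one_by_one`, `fan_out`, `timed`) are invented for this example, and the brew times are scaled down to 0.3, 0.2 and 0.4 seconds so it runs quickly. Note that asyncio interleaves tasks on a single thread, so the gain comes from overlapping the waiting, which is exactly the situation when an app waits on network or model calls.

```python
import asyncio
import time

async def brew(seconds):
    """Simulated brewing: just a non-blocking wait."""
    await asyncio.sleep(seconds)

async def one_by_one(durations):
    """Sequential baseline: each brew starts only after the previous one ends."""
    for seconds in durations:
        await brew(seconds)

async def fan_out(durations):
    """Fan out: all brews wait concurrently."""
    await asyncio.gather(*(brew(seconds) for seconds in durations))

def timed(coro):
    """Run a coroutine to completion and return the elapsed wall-clock time."""
    start = time.perf_counter()
    asyncio.run(coro)
    return time.perf_counter() - start

durations = [0.3, 0.2, 0.4]  # the article's 3, 2 and 4 seconds, scaled down by 10
sequential_time = timed(one_by_one(durations))
concurrent_time = timed(fan_out(durations))
print(f"One by one: {sequential_time:.2f}s, fan out: {concurrent_time:.2f}s")
```

The sequential time should land near the sum of the durations, and the fan-out time near the largest single duration.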

Why Fan Out in GenAI?

  1. Speed: It helps us get much faster responses; with 10 independent sub-tasks of similar latency, the speedup can approach 10x.

  2. Efficiency: Handle thousands of requests, like Zomato’s servers.

  3. Scalability: Add more workers (like hiring more dabbawalas).
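One common way to handle many requests with a fixed number of workers is to cap the concurrency with `asyncio.Semaphore`. The sketch below is an assumption about how that could look, not the code from this article: `handle_request`, `serve`, and the `stats` dictionary are hypothetical names, and each request is simulated with a short sleep.

```python
import asyncio

async def handle_request(request_id, semaphore, stats):
    """Handle one request, but only when a worker slot is free."""
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulated I/O, e.g. a model or database call
        stats["active"] -= 1
        return f"response-{request_id}"

async def serve(num_requests, max_workers):
    """Fan out all requests, with at most max_workers running at once."""
    semaphore = asyncio.Semaphore(max_workers)
    stats = {"active": 0, "peak": 0}
    responses = await asyncio.gather(
        *(handle_request(i, semaphore, stats) for i in range(num_requests))
    )
    return responses, stats["peak"]

responses, peak = asyncio.run(serve(num_requests=20, max_workers=5))
print(len(responses), "responses, peak concurrency:", peak)
```

Raising `max_workers` is the code version of hiring more dabbawalas: more requests are in flight at once, as long as the downstream service can take the load.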

Real-World Applications

  • Chatbots: Fetch answers from 5 AI models at once.
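Here is a minimal sketch of the chatbot case, asking 5 models at once. Everything model-related is invented for illustration: the model names, their latencies, and `ask_model` (which only sleeps and returns a string) stand in for real API calls. A per-call timeout keeps one slow model from delaying the whole answer.

```python
import asyncio

# Hypothetical model names with simulated latencies in seconds.
MODELS = {
    "model-a": 0.02,
    "model-b": 0.01,
    "model-c": 0.03,
    "model-d": 0.5,   # deliberately slower than the timeout below
    "model-e": 0.02,
}

async def ask_model(name, latency, question):
    """Stand-in for a real model call: wait, then return a canned answer."""
    await asyncio.sleep(latency)
    return f"{name} answer to: {question}"

async def ask_all(question, timeout=0.2):
    """Fan the question out to every model and keep the answers that arrive in time."""
    calls = [
        asyncio.wait_for(ask_model(name, latency, question), timeout)
        for name, latency in MODELS.items()
    ]
    # return_exceptions=True: a timeout in one call does not cancel the others
    results = await asyncio.gather(*calls, return_exceptions=True)
    return {
        name: result
        for name, result in zip(MODELS, results)
        if not isinstance(result, Exception)
    }

answers = asyncio.run(ask_all("Which chai is best?"))
print(sorted(answers))  # names of the models that answered in time
```

The slow model is dropped after the timeout, and the chatbot can still respond using the answers that did come back.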

Parallel Query Retrieval is a superpower that makes GenAI apps lightning-fast, just like the Flash. Like Mumbai’s dabbawalas, fan out your tasks and conquer the world! 🚀

Check out my code on GitHub.


Written by

SANKALP HARITASH

Hey 👋🏻, I am Sankalp Haritash, a Software Engineer from India. I am interested in, write about, and develop (open source) software solutions for and with JavaScript and ReactJs. 📬 Get in touch: Twitter: https://x.com/SankalpHaritash Blog: https://sankalp-haritash.hashnode.dev/ LinkedIn: https://www.linkedin.com/in/sankalp-haritash/ GitHub: https://github.com/SankalpHaritash21 📧 Sign up for my newsletter: https://sankalp-haritash.hashnode.dev/newsletter