MongoDB Aggregation Pipeline

Harsh Sharma

Why Aggregation?

Imagine you’re building a platform like Coursera or Udemy. As your platform grows, so does your database — thousands of users, enrollments, courses, interests, and more.

Now you face real-world data questions like:

  • How many students are from India?

  • What's the average age of users?

  • Which course has the most enrollments?

  • How many users are active or interested in frontend?

This is where the MongoDB Aggregation Pipeline becomes your superpower. 💪
It helps you analyze, transform, and summarize large datasets efficiently — all within the database, without writing external logic.


What is Aggregation in MongoDB?

Aggregation in MongoDB is a powerful way to process data records and return computed results.

Think of it like building a data processing pipeline – each step (called a stage) does something specific, like filtering, grouping, sorting, etc.


What is a Pipeline?

A pipeline is an array of stages, where each stage transforms the data in some way and passes the output to the next stage — just like an assembly line in a factory.

[
  { $match: { country: "India" } },
  { $group: { _id: "$course", count: { $sum: 1 } } }
]

Each stage is an object keyed by a stage operator that starts with $, such as $match, $group, $project, $sort, or $limit.
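To make the assembly-line idea concrete, here is a minimal plain-JavaScript sketch that models each stage as a function over an array of documents. The helpers `match`, `groupCount`, and `runPipeline` are hypothetical names that only mimic `$match` and `$group` with `$sum: 1`; this is not how MongoDB is implemented, and the sample documents are made up for illustration.

```javascript
// Hypothetical stand-ins for $match and $group, for illustration only.
const match = (field, value) => (docs) => docs.filter((d) => d[field] === value);

const groupCount = (field) => (docs) => {
  const counts = new Map();
  for (const d of docs) counts.set(d[field], (counts.get(d[field]) || 0) + 1);
  return [...counts].map(([key, count]) => ({ _id: key, count }));
};

// Each stage receives the previous stage's output, like an assembly line.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

// Made-up sample documents.
const users = [
  { name: "Amit", country: "India", course: "React" },
  { name: "Raj", country: "India", course: "Node" },
  { name: "Neha", country: "India", course: "React" },
  { name: "Emily", country: "USA", course: "React" },
];

const result = runPipeline(users, [match("country", "India"), groupCount("course")]);
console.log(result);
```

With a real database you would pass the pipeline array itself to `db.users.aggregate(...)` in mongosh, assuming a collection named `users`.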


What is a Stage?

A stage is a single step in the aggregation process. Each stage:

  • Takes in documents

  • Performs a transformation (e.g., filter, group, count)

  • Passes transformed output to the next stage

For example:

  • $match: Filters documents like a query

  • $group: Groups by a field and performs aggregation like sum, avg

  • $project: Selects fields or creates new ones

  • $sort, $limit, $skip: For sorting, pagination

  • $unwind: Flattens arrays

  • $lookup: Joins collections


❓ Why Use Aggregation?

Without aggregation, you'd have to:

  • Pull all documents to your server

  • Write complex logic to compute results

  • Struggle with performance issues

With aggregation, MongoDB does the heavy lifting inside the database engine: only the computed results travel over the network, and an early $match stage can use indexes, which is usually much faster on large datasets.


What Will This Blog Cover?

This blog will take you from zero to pro using 21 carefully crafted, real-world aggregation examples. Each one includes:

  • Real-world use case

  • Sample dataset

  • Aggregation pipeline query

  • Final output

  • Step-by-step breakdown


Topics and Questions You’ll Learn to Answer

Topic | Questions Answered
$match | How to filter documents based on conditions?
$group + $sum, $avg | How to count users per group, calculate averages?
$project | How to include or exclude fields from documents?
$sort, $limit, $skip | How to paginate results or find top records?
$unwind | How to handle arrays inside documents?
$lookup | How to join two collections like SQL?
$count | How to get a total count of filtered results?
$in, $all | How to match documents based on array contents?
$push, $addFields, $size, $ifNull | How to reshape and enrich your documents?

Example 1: Find All Users from India

Real-World Use Case: List users in India for targeted emails.

Dataset:

[
  { "name": "Amit", "profile": { "country": "India" } },
  { "name": "Emily", "profile": { "country": "USA" } }
]

Pipeline:

[ { $match: { "profile.country": "India" } } ]

Output:

[ { "name": "Amit", "profile": { "country": "India" } } ]

Breakdown:

  • $match filters only documents where profile.country === "India"

Example 2: Count of Users per Country

Real-World Use Case: Show user distribution in admin dashboard.

Dataset:

[
  { "name": "Amit", "profile": { "country": "India" } },
  { "name": "John", "profile": { "country": "USA" } },
  { "name": "Raj", "profile": { "country": "India" } }
]

Pipeline:

[{
  $group: {
    _id: "$profile.country",
    userCount: { $sum: 1 }
  }
}]

Output:

[ { _id: "India", userCount: 2 }, { _id: "USA", userCount: 1 } ]

Breakdown:

  • $group groups documents by country

  • $sum: 1 adds 1 for every document in a group, so it counts the users in each country. $sum is an accumulator.
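As a rough mental model of what the accumulator does, the same count can be written in plain JavaScript. This is only a sketch of the logic, not MongoDB's implementation, and the variable names are hypothetical.

```javascript
const users = [
  { name: "Amit", profile: { country: "India" } },
  { name: "John", profile: { country: "USA" } },
  { name: "Raj", profile: { country: "India" } },
];

// Mirrors { $group: { _id: "$profile.country", userCount: { $sum: 1 } } }
const counts = {};
for (const user of users) {
  const key = user.profile.country;      // the _id expression
  counts[key] = (counts[key] || 0) + 1;  // $sum: 1 adds one per document
}

const output = Object.entries(counts).map(([_id, userCount]) => ({ _id, userCount }));
console.log(output);
```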


Example 3: Average Age of Users

Real-World Use Case: Display average user age in reports.

Dataset:

[
  { "name": "Amit", "age": 25 },
  { "name": "Raj", "age": 30 }
]

Pipeline:

[{
  $group: {
    _id: null,
    avgAge: { $avg: "$age" }
  }
}]

Output:

[ { _id: null, avgAge: 27.5 } ]

Breakdown:

  • _id: null means no grouping key; aggregate all

  • $avg calculates average age


Example 4: Students Per Course

Real-World Use Case: Know popularity of each course.

Dataset:

[
  { "userId": 1, "courseId": "React" },
  { "userId": 2, "courseId": "Node" },
  { "userId": 3, "courseId": "React" }
]

Pipeline:

[{
  $group: {
    _id: "$courseId",
    numberOfStudents: { $sum: 1 }
  }
}]

Output:

[ { _id: "React", numberOfStudents: 2 }, { _id: "Node", numberOfStudents: 1 } ]

Example 5: Most Enrolled Course

Real-World Use Case: Show the trending course.

Dataset: Same enrollments as in Example 4.

Pipeline:

[
  { $group: { _id: "$courseId", numberOfStudents: { $sum: 1 } } },
  { $sort: { numberOfStudents: -1 } },
  { $limit: 1 }
]

Output:

[ { _id: "React", numberOfStudents: 2 } ]
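The three stages can be traced by hand with a small plain-JavaScript sketch over the Example 4 enrollments. It only imitates the logic of `$group`, `$sort`, and `$limit`; the variable names are hypothetical.

```javascript
const enrollments = [
  { userId: 1, courseId: "React" },
  { userId: 2, courseId: "Node" },
  { userId: 3, courseId: "React" },
];

// $group: count enrollments per courseId
const counts = new Map();
for (const e of enrollments) {
  counts.set(e.courseId, (counts.get(e.courseId) || 0) + 1);
}
const grouped = [...counts].map(([_id, numberOfStudents]) => ({ _id, numberOfStudents }));

// $sort: { numberOfStudents: -1 } orders descending; $limit: 1 keeps the first document
const topCourse = grouped
  .sort((a, b) => b.numberOfStudents - a.numberOfStudents)
  .slice(0, 1);

console.log(topCourse);
```

The order matters: sorting before limiting is what makes the single remaining document the most enrolled course.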

Example 6: Filter Users Age > 30

Real-World Use Case: Senior user group campaign.

Dataset:

[
  { "name": "Amit", "profile": { "age": 28 } },
  { "name": "Raj", "profile": { "age": 35 } }
]

Pipeline:

[{
  $match: { "profile.age": { $gt: 30 } }
}]

Output:

[ { "name": "Raj", "profile": { "age": 35 } } ]

Example 7: Students Interested in Frontend

Real-World Use Case: Recommend frontend jobs or events.

Dataset:

[
  { "name": "Amit", "interests": ["Frontend", "React"] },
  { "name": "John", "interests": ["Backend"] }
]

Pipeline:

[{
  $match: { interests: { $in: ["Frontend"] } }
}]

Output:

[ { "name": "Amit", "interests": ["Frontend", "React"] } ]
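When the field being matched is an array, `$in` matches a document if at least one array element appears in the given list. A minimal plain-JavaScript sketch of that rule, with a hypothetical helper named `matchIn`:

```javascript
const users = [
  { name: "Amit", interests: ["Frontend", "React"] },
  { name: "John", interests: ["Backend"] },
];

// Mirrors { interests: { $in: wanted } } for an array field:
// a document matches when at least one of its interests is in `wanted`.
const matchIn = (docs, wanted) =>
  docs.filter((doc) => doc.interests.some((interest) => wanted.includes(interest)));

console.log(matchIn(users, ["Frontend"]));            // only Amit
console.log(matchIn(users, ["Frontend", "Backend"])); // Amit and John
```

For a single value, the shorter form `{ interests: "Frontend" }` matches the same documents, which is the style Example 18 uses for tags.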

Example 8: Show Name & Email Only

Real-World Use Case: Only needed data for newsletters.

Dataset:

[
  { "name": "Amit", "email": "amit@mail.com", "age": 30 }
]

Pipeline:

[{
  $project: { name: 1, email: 1, _id: 0 }
}]

Output:

[ { "name": "Amit", "email": "amit@mail.com" } ]
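In plain JavaScript terms, this `$project` behaves like a `map` that copies only the listed fields. The sketch below adds an `_id` value to the sample document as an assumption, since MongoDB assigns one automatically and `_id: 0` is what drops it.

```javascript
// _id: 1 is an assumed value; the sample dataset does not show it.
const users = [{ _id: 1, name: "Amit", email: "amit@mail.com", age: 30 }];

// Mirrors { $project: { name: 1, email: 1, _id: 0 } }:
// keep only the listed fields and drop _id, which is included unless set to 0.
const projected = users.map(({ name, email }) => ({ name, email }));

console.log(projected);
```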

Example 9: Count Students per Interest (Using $unwind)

Real-World Use Case: Measure popularity of interests.

How $unwind Works (Step-by-Step)

Given a document like:

{ "name": "Amit", "interests": ["Frontend", "React"] }

After applying:

{ $unwind: "$interests" }

It becomes:

{ "name": "Amit", "interests": "Frontend" }
{ "name": "Amit", "interests": "React" }

So instead of 1 document with 2 interests, we now have 2 documents, each with 1 interest. This allows us to then group, count, or filter by individual interests.

Dataset:

[
  { "name": "Amit", "interests": ["Frontend", "React"] },
  { "name": "John", "interests": ["Frontend"] }
]

Pipeline:

[
  { $unwind: "$interests" },
  { $group: { _id: "$interests", studentCount: { $sum: 1 } } }
]

Output:

[ { _id: "Frontend", studentCount: 2 }, { _id: "React", studentCount: 1 } ]

Step-by-Step Breakdown:

  1. $unwind: "$interests"

    • Converts array of interests into separate documents:
    { "name": "Amit", "interests": "Frontend" }
    { "name": "Amit", "interests": "React" }
    { "name": "John", "interests": "Frontend" }

  2. $group stage

    • Now that every interest is in a separate document, we can group by interests:
    {
      _id: "$interests",
      studentCount: { $sum: 1 }
    }

  3. Final Output:

     [
       { "_id": "Frontend", "studentCount": 2 },
       { "_id": "React", "studentCount": 1 }
     ]
    

Why $unwind Is Important Here

Without $unwind, MongoDB would treat the entire interests array as a single value, and grouping wouldn’t work as expected.

For example, without $unwind, the $group would see:

{ "_id": ["Frontend", "React"] } // not split

So we must unwind arrays if we want to:

  • Count individual items inside them

  • Filter down to individual array elements (a plain $match already finds documents that contain a value, as in Example 7)

  • Group by array values
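A close plain-JavaScript analogy for `$unwind` is `flatMap`: each document is expanded into one copy per array element. The sketch below follows Example 9 under that analogy; it is an illustration of the logic, not MongoDB's implementation.

```javascript
const users = [
  { name: "Amit", interests: ["Frontend", "React"] },
  { name: "John", interests: ["Frontend"] },
];

// $unwind: "$interests" -> one output document per array element
const unwound = users.flatMap((user) =>
  user.interests.map((interest) => ({ ...user, interests: interest }))
);

// $group by the now-scalar interests field, counting with $sum: 1
const counts = {};
for (const doc of unwound) {
  counts[doc.interests] = (counts[doc.interests] || 0) + 1;
}

console.log(unwound.length); // 3 documents after unwinding
console.log(counts);         // { Frontend: 2, React: 1 }
```

Like `$unwind` with its default settings, this sketch emits nothing for a document whose array is empty.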

Example 10: Join Users with Enrollments using $lookup

Real-World Use Case: Display how many courses each user has enrolled in.


Datasets:

Users Collection:

[
  { "_id": 1, "name": "Amit" }
]

Enrollments Collection:

[
  { "userId": 1, "course": "React" },
  { "userId": 1, "course": "Node" }
]

Aggregation Pipeline:

[
  {
    $lookup: {
      from: "enrollments",            // collection to join
      localField: "_id",              // field from the users collection
      foreignField: "userId",         // field from the enrollments collection
      as: "enrollments"               // name of the new array field in output
    }
  },
  {
    $project: {
      name: 1,
      numberOfCourses: { $size: "$enrollments" },
      _id: 0
    }
  }
]

Step-by-Step Breakdown

Step 1: $lookup

This stage joins the users collection with the enrollments collection as a left outer join, using a foreign-key-like relationship.

  • localField: "_id" → from users

  • foreignField: "userId" → from enrollments

So for:

{ "_id": 1, "name": "Amit" }

MongoDB looks in the enrollments collection for documents where userId == 1.

It finds:

[
  { "userId": 1, "course": "React" },
  { "userId": 1, "course": "Node" }
]

Then adds them to a new array field called enrollments.

Intermediate Output After $lookup:

[
  {
    "name": "Amit",
    "_id": 1,
    "enrollments": [
      { "userId": 1, "course": "React" },
      { "userId": 1, "course": "Node" }
    ]
  }
]

Step 2: $project

Now we reshape the document using $project.

{
  name: 1,                              // include name
  numberOfCourses: { $size: "$enrollments" },  // count number of enrollments
  _id: 0                                // exclude _id
}
  • $size returns the length of the enrollments array

  • numberOfCourses = 2 (since there are 2 courses)

Final Output:

[
  {
    "name": "Amit",
    "numberOfCourses": 2
  }
]
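The join can be pictured in plain JavaScript as a `map` over users with a `filter` over enrollments. This is a sketch of the logic only; the second user, Neha, is a hypothetical addition to show that a user with no enrollments still appears with a count of 0.

```javascript
const users = [
  { _id: 1, name: "Amit" },
  { _id: 2, name: "Neha" }, // hypothetical user with no enrollments
];
const enrollments = [
  { userId: 1, course: "React" },
  { userId: 1, course: "Node" },
];

// $lookup: attach every enrollment whose userId equals the user's _id.
// Users without a match still appear, with an empty array (left outer join).
const joined = users.map((user) => ({
  ...user,
  enrollments: enrollments.filter((e) => e.userId === user._id),
}));

// $project: keep name and compute numberOfCourses from the array length ($size)
const output = joined.map((doc) => ({
  name: doc.name,
  numberOfCourses: doc.enrollments.length,
}));

console.log(output);
```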

Example 11: Pagination (Skip & Limit)

Real-World Use Case: Show next 5 users after skipping 10.

Dataset:

[
  { "name": "User1", "age": 20 },
  { "name": "User2", "age": 21 },
  { "name": "User3", "age": 22 },
  ...
]

Pipeline:

[ { $sort: { age: 1 } }, { $skip: 10 }, { $limit: 5 } ]

Output:
5 users from the 11th youngest onwards.
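In an application, the skip value is usually derived from a page number rather than hard-coded. A minimal sketch, assuming 1-based page numbers and a hypothetical helper named `buildPagePipeline`:

```javascript
// Hypothetical helper: builds the pipeline for a 1-based page number.
function buildPagePipeline(page, pageSize) {
  return [
    { $sort: { age: 1 } },             // same ordering for every page
    { $skip: (page - 1) * pageSize },  // documents on earlier pages
    { $limit: pageSize },              // documents on this page
  ];
}

const pipeline = buildPagePipeline(3, 5);
console.log(JSON.stringify(pipeline));
// Page 3 with 5 per page skips 10 documents, matching the pipeline above.
```

You could then run `db.users.aggregate(buildPagePipeline(3, 5))` in mongosh, assuming a collection named `users`. If several users share the same age, adding a unique field such as `_id` to the sort keeps the page boundaries consistent between requests.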


Example 12: Count Active Users

Real-World Use Case: Show count of currently active users.

Dataset:

[
  { "name": "Amit", "isActive": true },
  { "name": "John", "isActive": false },
  { "name": "Neha", "isActive": true }
]

Pipeline:

[ { $match: { isActive: true } }, { $count: "activeUsers" } ]

Output:

[ { "activeUsers": 2 } ]

Example 13: Average Age by Gender

Real-World Use Case: Know average age for gender-based user insights.

Dataset:

[
  { "name": "Amit", "gender": "Male", "age": 25 },
  { "name": "Sara", "gender": "Female", "age": 30 }
]

Pipeline:

[{
  $group: { _id: "$gender", averageAge: { $avg: "$age" } }
}]

Output:

[ { _id: "Male", averageAge: 25 }, { _id: "Female", averageAge: 30 } ]

Example 14: 5 Most Common Favorite Fruits

Real-World Use Case: Popular fruits for eCommerce or recommendation.

Dataset:

[
  { "favoriteFruit": "Apple" },
  { "favoriteFruit": "Banana" },
  { "favoriteFruit": "Apple" }
]

Pipeline:

[
  { $group: { _id: "$favoriteFruit", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  { $limit: 5 }
]

Output:

[ { _id: "Apple", count: 2 }, { _id: "Banana", count: 1 } ]

Example 15: Country with Most Users

Real-World Use Case: Show highest user distribution by country.

Dataset:

[
  { "company": { "location": { "country": "India" } } },
  { "company": { "location": { "country": "USA" } } },
  { "company": { "location": { "country": "India" } } }
]

Pipeline:

[
  { $group: { _id: "$company.location.country", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  { $limit: 1 }
]

Output:

[ { _id: "India", count: 2 } ]

Example 16: Unique Eye Colors

Real-World Use Case: Display distinct options for filters in a UI.

Dataset:

[
  { "eyeColor": "blue" },
  { "eyeColor": "green" },
  { "eyeColor": "blue" }
]

Pipeline:

[ { $group: { _id: "$eyeColor" } } ]

Output:

[ { _id: "blue" }, { _id: "green" } ]

Example 17: Average Number of Tags Per User

Real-World Use Case: How active users are with tagging.

Dataset:

[
  { "tags": ["js", "react"] },
  { "tags": ["node"] },
  { "tags": [] }
]

Pipeline:

[
  { $addFields: { numberOfTags: { $size: { $ifNull: [ "$tags", [] ] } } } },
  { $group: { _id: null, avgTags: { $avg: "$numberOfTags" } } }
]

Output:

[ { _id: null, avgTags: 1 } ]

Step-by-Step Breakdown:

Step 1: $addFields

{
  numberOfTags: {
    $size: { $ifNull: ["$tags", []] }
  }
}
  • $ifNull: If tags is null or missing, treat it as an empty array []

  • $size: Counts the number of elements in tags

Intermediate Output:

[
  { "tags": ["js", "react"], "numberOfTags": 2 },
  { "tags": ["node"], "numberOfTags": 1 },
  { "tags": [], "numberOfTags": 0 }
]

Step 2: $group

{
  _id: null,
  avgTags: { $avg: "$numberOfTags" }
}
  • Groups all documents into one group

  • Calculates the average of the numberOfTags field

Output:

[ { "_id": null, "avgTags": 1 } ]
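The `$ifNull` guard matters because `$size` expects an array and raises an error otherwise. The plain-JavaScript sketch below mirrors the two stages and adds one hypothetical document with no tags field to show the fallback in action; note that this extra document changes the average.

```javascript
const users = [
  { tags: ["js", "react"] },
  { tags: ["node"] },
  { tags: [] },
  { name: "NoTags" }, // hypothetical document with no tags field at all
];

// $addFields with $ifNull: fall back to [] when tags is null or missing,
// then $size takes the array length.
const withCounts = users.map((user) => ({
  ...user,
  numberOfTags: (user.tags ?? []).length,
}));

// $group with _id: null and $avg: average over every document
const total = withCounts.reduce((sum, user) => sum + user.numberOfTags, 0);
const avgTags = total / withCounts.length;

console.log(avgTags); // 0.75 for these four documents
```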

Example 18: Active Users With Tag "valit"

Real-World Use Case: Target active users who have a specific tag.

Dataset:

[
  { "name": "Amit", "tags": ["valit"], "isActive": true },
  { "name": "John", "tags": ["react"] }
]

Pipeline:

[
  { $match: { isActive: true, tags: "valit" } },
  { $project: { name: 1, age: 1, _id: 0 } }
]

Output:

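[ { "name": "Amit" } ]

The $project stage asks for age, but the matching document has no age field, so it is simply absent from the output.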

Example 19: Group Users by Fruit and Push Names

Real-World Use Case: Create clusters of users by taste.

Dataset:

[
  { "name": "Amit", "favoriteFruit": "Apple" },
  { "name": "John", "favoriteFruit": "Apple" },
  { "name": "Neha", "favoriteFruit": "Banana" }
]

Pipeline:

[{
  $group: {
    _id: "$favoriteFruit",
    users: { $push: "$name" }
  }
}]

Output:

[
  { _id: "Apple", users: ["Amit", "John"] },
  { _id: "Banana", users: ["Neha"] }
]

Step-by-Step Breakdown

Step 1: $group

{
  _id: "$favoriteFruit",
  users: { $push: "$name" }
}

This tells MongoDB:

  • Group documents by favoriteFruit

    • All documents with "Apple" will go into one group

    • All documents with "Banana" will go into another

  • In each group, collect a list of names using $push


What $push Does:

$push: "$name"
  • Adds the value of "name" field into an array

  • It doesn't remove duplicates (use $addToSet if you want that)

  • It lets you build a list of values per group
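The difference between `$push` and `$addToSet` is easiest to see with a repeated value. The plain-JavaScript sketch below imitates both accumulators; the duplicate "Amit" entry is a hypothetical addition to the sample data.

```javascript
const users = [
  { name: "Amit", favoriteFruit: "Apple" },
  { name: "John", favoriteFruit: "Apple" },
  { name: "Amit", favoriteFruit: "Apple" }, // hypothetical duplicate name
  { name: "Neha", favoriteFruit: "Banana" },
];

const pushed = {}; // like users: { $push: "$name" }
const unique = {}; // like users: { $addToSet: "$name" }

for (const { name, favoriteFruit } of users) {
  if (!pushed[favoriteFruit]) {
    pushed[favoriteFruit] = [];
    unique[favoriteFruit] = new Set();
  }
  pushed[favoriteFruit].push(name); // keeps every value, duplicates included
  unique[favoriteFruit].add(name);  // keeps each distinct value once
}

console.log(pushed.Apple);      // [ 'Amit', 'John', 'Amit' ]
console.log([...unique.Apple]); // [ 'Amit', 'John' ]
```

One caveat: a JavaScript Set preserves insertion order, while MongoDB does not guarantee the order of elements produced by `$addToSet`.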


Intermediate Grouping:

Conceptually, MongoDB builds groups like:

{
  "Apple": [ "Amit", "John" ],
  "Banana": [ "Neha" ]
}

Final Output:

[
  { "_id": "Apple", users: ["Amit", "John"] },
  { "_id": "Banana", users: ["Neha"] }
]

Example 20: Users with Tags "id" and "enim"

Real-World Use Case: Strict filtering by multiple tags.

Dataset:

[
  { "tags": ["id", "enim"] },
  { "tags": ["id"] }
]

Pipeline:

[ { $match: { tags: { $all: ["id", "enim"] } } } ]

Output:

[ { "tags": ["id", "enim"] } ]
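To see why `$all` is the strict option, compare it with `$in` on the same data. The helpers `matchAll` and `matchAny` below are hypothetical plain-JavaScript imitations of the two operators, not MongoDB code.

```javascript
const users = [
  { tags: ["id", "enim"] },
  { tags: ["id"] },
];

// Mirrors { tags: { $all: required } }: every required value must be present.
const matchAll = (docs, required) =>
  docs.filter((doc) => required.every((tag) => doc.tags.includes(tag)));

// Mirrors { tags: { $in: candidates } }: one matching value is enough.
const matchAny = (docs, candidates) =>
  docs.filter((doc) => candidates.some((tag) => doc.tags.includes(tag)));

console.log(matchAll(users, ["id", "enim"]).length); // 1
console.log(matchAny(users, ["id", "enim"]).length); // 2
```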

Example 21: Companies in USA and Their User Count

Real-World Use Case: Show how many users work at each USA company.

Dataset:

[
  { "company": { "title": "TechCorp", "location": { "country": "USA" } } },
  { "company": { "title": "DevSoft", "location": { "country": "USA" } } },
  { "company": { "title": "TechCorp", "location": { "country": "USA" } } }
]

Pipeline:

[
  { $match: { "company.location.country": "USA" } },
  { $group: { _id: "$company.title", userCount: { $sum: 1 } } }
]

Output:

[ { _id: "TechCorp", userCount: 2 }, { _id: "DevSoft", userCount: 1 } ]

Conclusion

Aggregation in MongoDB is not just a feature — it's a superpower for backend developers. Whether you're building analytics dashboards, filtering large datasets, or transforming documents for the frontend, the aggregation pipeline gives you the flexibility and performance to do it all — within the database itself.

In this blog, we explored 21 real-world examples — from basic filtering to joining collections and calculating insights — without writing complex backend code.

If you're new to MongoDB, start using these pipelines today in your projects. And if you're already using MongoDB but haven’t used aggregations yet — now is the time!

One Query. Many Insights. That’s the power of MongoDB Aggregation.

#MongoDB #AggregationPipeline #Mongoose #Backend #ChaiAurCode #Hashnode
