MongoDB Aggregation Pipeline

Why Aggregation?

Imagine you’re building a platform like Coursera or Udemy. As your platform grows, so does your database — thousands of users, enrollments, courses, interests, and more.

Now you face real-world data questions like:

How many students are from India?
What's the average age of users?
Which course has the most enrollments?
How many users are active or interested in frontend?

This is where the MongoDB Aggregation Pipeline becomes your superpower. 💪
It helps you analyze, transform, and summarize large datasets efficiently — all within the database, without writing external logic.

What is Aggregation in MongoDB?

Aggregation in MongoDB is a powerful way to process data records and return computed results.

Think of it like building a data processing pipeline – each step (called a stage) does something specific, like filtering, grouping, sorting, etc.

What is a Pipeline?

A pipeline is an array of stages, where each stage transforms the data in some way and passes the output to the next stage — just like an assembly line in a factory.

[
  { $match: { country: "India" } },
  { $group: { _id: "$course", count: { $sum: 1 } } }
]

Each stage is an object whose single key is a stage operator starting with $, such as $match, $group, $project, $sort, or $limit.
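To make the assembly-line idea concrete, here is a minimal plain-JavaScript sketch. It is not how MongoDB implements pipelines; it only models each stage as a function that takes an array of documents and returns a new array. The helper names `runPipeline`, `matchStage`, and `countByStage`, and the sample documents, are made up for illustration.

```javascript
// Hypothetical stand-ins for $match and $group; each takes docs and returns docs.
const matchStage = (field, value) => (docs) =>
  docs.filter((doc) => doc[field] === value);

const countByStage = (field) => (docs) => {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) ?? 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
};

// Feed the output of each stage into the next one, in array order.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Assumed sample data, shaped to fit the two-stage pipeline above.
const students = [
  { name: "Amit", country: "India", course: "React" },
  { name: "Raj", country: "India", course: "React" },
  { name: "Emily", country: "USA", course: "Node" },
];

const pipelineResult = runPipeline(students, [
  matchStage("country", "India"),
  countByStage("course"),
]);
console.log(pipelineResult); // [ { _id: 'React', count: 2 } ]
```

Because the stages run in order, putting the filter first means the grouping step only ever sees the documents that survived it.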

What is a Stage?

A stage is a single step in the aggregation process. Each stage:

Takes in documents
Performs a transformation (e.g., filter, group, count)
Passes transformed output to the next stage

For example:

$match: Filters documents like a query
$group: Groups by a field and performs aggregation like sum, avg
$project: Selects fields or creates new ones
$sort, $limit, $skip: For sorting, pagination
$unwind: Flattens arrays
$lookup: Joins collections

❓ Why Use Aggregation?

Without aggregation, you'd have to:

Pull all documents to your server
Write complex logic to compute results
Struggle with performance issues

With aggregation, MongoDB does the heavy lifting inside the database engine: only the computed result travels over the network, and an early $match stage can use indexes, which usually makes it much faster on large datasets.

What Will This Blog Cover?

This blog will take you from zero to pro using 21 carefully crafted, real-world aggregation examples. Each one includes:
Real-world use case
Sample dataset
Aggregation pipeline query
Final output
Step-by-step breakdown

Topics and Questions You’ll Learn to Answer

| Topic | Questions Answered |
| --- | --- |
| `$match` | How to filter documents based on conditions? |
| `$group` + `$sum`, `$avg` | How to count users per group, calculate averages? |
| `$project` | How to include or exclude fields from documents? |
| `$sort`, `$limit`, `$skip` | How to paginate results or find top records? |
| `$unwind` | How to handle arrays inside documents? |
| `$lookup` | How to join two collections like SQL? |
| `$count` | How to get a total count of filtered results? |
| `$in`, `$all` | How to match documents based on array contents? |
| `$push`, `$addFields`, `$size`, `$ifNull` | How to reshape and enrich your documents? |

Example 1: Find All Users from India

Real-World Use Case: List users in India for targeted emails.

Dataset:

[
  { "name": "Amit", "profile": { "country": "India" } },
  { "name": "Emily", "profile": { "country": "USA" } }
]

Pipeline:

[ { $match: { "profile.country": "India" } } ]

Output:

[ { "name": "Amit", "profile": { "country": "India" } } ]

Breakdown:

$match keeps only the documents where profile.country equals "India"

Example 2: Count of Users per Country

Real-World Use Case: Show user distribution in an admin dashboard.

Dataset:

[
  { "name": "Amit", "profile": { "country": "India" } },
  { "name": "John", "profile": { "country": "USA" } },
  { "name": "Raj", "profile": { "country": "India" } }
]

Pipeline:

[{
  $group: {
    _id: "$profile.country",
    userCount: { $sum: 1 }
  }
}]

Output:

[ { _id: "India", userCount: 2 }, { _id: "USA", userCount: 1 } ]

Breakdown:

$group groups documents by country
$sum: 1 counts the number of users in each country. $sum is an accumulator: it adds 1 for every document that lands in the group.

Example 3: Average Age of Users

Real-World Use Case: Display average user age in reports.

Dataset:

[
  { "name": "Amit", "age": 25 },
  { "name": "Raj", "age": 30 }
]

Pipeline:

[{
  $group: {
    _id: null,
    avgAge: { $avg: "$age" }
  }
}]

Output:

[ { _id: null, avgAge: 27.5 } ]

Breakdown:

_id: null means no grouping key; aggregate all
$avg calculates average age
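If it helps to see the arithmetic, the sketch below reproduces the same result in plain JavaScript. It is only an illustration of what `_id: null` plus `$avg` compute, not MongoDB's implementation. Like `$avg`, it ignores values that are not numbers and falls back to null when there is nothing numeric to average.

```javascript
// Sample documents copied from the dataset above.
const users = [
  { name: "Amit", age: 25 },
  { name: "Raj", age: 30 },
];

// _id: null puts every document in one group, so a single running total is enough.
const ages = users.map((u) => u.age).filter((age) => typeof age === "number");
const avgAge = ages.length > 0
  ? ages.reduce((sum, age) => sum + age, 0) / ages.length
  : null; // nothing numeric to average

console.log([{ _id: null, avgAge }]); // [ { _id: null, avgAge: 27.5 } ]
```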

Example 4: Students Per Course

Real-World Use Case: Know the popularity of each course.

Dataset:

[
  { "userId": 1, "courseId": "React" },
  { "userId": 2, "courseId": "Node" },
  { "userId": 3, "courseId": "React" }
]

Pipeline:

[{
  $group: {
    _id: "$courseId",
    numberOfStudents: { $sum: 1 }
  }
}]

Output:

[ { _id: "React", numberOfStudents: 2 }, { _id: "Node", numberOfStudents: 1 } ]

Example 5: Most Enrolled Course

Real-World Use Case: Show the trending course. This example reuses the enrollments dataset from Example 4.

Pipeline:

[
  { $group: { _id: "$courseId", numberOfStudents: { $sum: 1 } } },
  { $sort: { numberOfStudents: -1 } },
  { $limit: 1 }
]

Output:

[ { _id: "React", numberOfStudents: 2 } ]
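The three stages map neatly onto three array operations. The sketch below walks the Example 4 dataset through them in plain JavaScript; it is an illustration of the logic only, and the variable names are my own.

```javascript
// Dataset from Example 4.
const enrollments = [
  { userId: 1, courseId: "React" },
  { userId: 2, courseId: "Node" },
  { userId: 3, courseId: "React" },
];

// $group: count enrollments per courseId.
const perCourse = {};
for (const { courseId } of enrollments) {
  perCourse[courseId] = (perCourse[courseId] ?? 0) + 1;
}

const topCourse = Object.entries(perCourse)
  .map(([_id, numberOfStudents]) => ({ _id, numberOfStudents }))
  .sort((a, b) => b.numberOfStudents - a.numberOfStudents) // $sort: -1 (descending)
  .slice(0, 1);                                            // $limit: 1

console.log(topCourse); // [ { _id: 'React', numberOfStudents: 2 } ]
```

One thing to keep in mind: if two courses tie for first place, which one survives `$limit: 1` is not guaranteed unless you add a second key to `$sort` (for example `_id: 1`) as a tie-breaker.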

Example 6: Filter Users Age > 30

Real-World Use Case: Senior user group campaign.

Dataset:

[
  { "name": "Amit", "profile": { "age": 28 } },
  { "name": "Raj", "profile": { "age": 35 } }
]

Pipeline:

[{
  $match: { "profile.age": { $gt: 30 } }
}]

Output:

[ { "name": "Raj", "profile": { "age": 35 } } ]

Example 7: Students Interested in Frontend

Real-World Use Case: Recommend frontend jobs or events.

Dataset:

[
  { "name": "Amit", "interests": ["Frontend", "React"] },
  { "name": "John", "interests": ["Backend"] }
]

Pipeline:

[{
  $match: { interests: { $in: ["Frontend"] } }
}]

Output:

[ { "name": "Amit", "interests": ["Frontend", "React"] } ]
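When the field is an array, `$in` matches a document if at least one array element appears in the list you pass. The plain-JavaScript sketch below mimics that rule; `matchesIn` is a hypothetical helper name, not a MongoDB function.

```javascript
// Sample documents copied from the dataset above.
const users = [
  { name: "Amit", interests: ["Frontend", "React"] },
  { name: "John", interests: ["Backend"] },
];

// Mimics { interests: { $in: wanted } } for an array field:
// keep the document if at least one element appears in `wanted`.
const matchesIn = (doc, wanted) =>
  doc.interests.some((interest) => wanted.includes(interest));

const frontendFans = users.filter((u) => matchesIn(u, ["Frontend"]));
const frontendOrBackend = users.filter((u) => matchesIn(u, ["Frontend", "Backend"]));

console.log(frontendFans.map((u) => u.name));      // [ 'Amit' ]
console.log(frontendOrBackend.map((u) => u.name)); // [ 'Amit', 'John' ]
```

With a single value, the shorter form `{ interests: "Frontend" }` matches the same documents; Example 18 uses that style with `tags: "valit"`.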

Example 8: Show Name & Email Only

Real-World Use Case: Fetch only the fields needed for newsletters.

Dataset:

[
  { "name": "Amit", "email": "amit@mail.com", "age": 30 }
]

Pipeline:

[{
  $project: { name: 1, email: 1, _id: 0 }
}]

Output:

[ { "name": "Amit", "email": "amit@mail.com" } ]

Example 9: Count Students per Interest (Using `$unwind`)

Real-World Use Case: Measure popularity of interests.

How `$unwind` Works (Step-by-Step)

Given a document like:

{ "name": "Amit", "interests": ["Frontend", "React"] }

After applying:

{ $unwind: "$interests" }

It becomes:

{ "name": "Amit", "interests": "Frontend" }
{ "name": "Amit", "interests": "React" }

So instead of 1 document with 2 interests, we now have 2 documents, each with 1 interest. This allows us to then group, count, or filter by individual interests.
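In plain JavaScript, this behaves a lot like `flatMap`. The sketch below is a rough model of `$unwind` with its default options, not MongoDB's implementation; `unwindInterests` and the extra document for Neha are assumptions added to show one edge case.

```javascript
// Mimics { $unwind: "$interests" } with default options.
const unwindInterests = (docs) =>
  docs.flatMap((doc) => {
    if (doc.interests == null) return [];            // missing or null: dropped
    if (!Array.isArray(doc.interests)) return [doc]; // single value: passed through
    // One output document per element; an empty array yields no documents.
    return doc.interests.map((interest) => ({ ...doc, interests: interest }));
  });

const unwound = unwindInterests([
  { name: "Amit", interests: ["Frontend", "React"] },
  { name: "Neha", interests: [] }, // hypothetical extra document
]);

console.log(unwound);
// [ { name: 'Amit', interests: 'Frontend' }, { name: 'Amit', interests: 'React' } ]
```

Notice that Neha disappears: by default `$unwind` drops documents whose array is empty, missing, or null. If you need to keep them, use the long form `{ $unwind: { path: "$interests", preserveNullAndEmptyArrays: true } }`.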

Dataset:

[
  { "name": "Amit", "interests": ["Frontend", "React"] },
  { "name": "John", "interests": ["Frontend"] }
]

Pipeline:

[
  { $unwind: "$interests" },
  { $group: { _id: "$interests", studentCount: { $sum: 1 } } }
]

Output:

[ { _id: "Frontend", studentCount: 2 }, { _id: "React", studentCount: 1 } ]

Step-by-Step Breakdown:

$unwind: "$interests"
- Converts array of interests into separate documents:

    { "name": "Amit", "interests": "Frontend" }
    { "name": "Amit", "interests": "React" }
    { "name": "John", "interests": "Frontend" }

$group stage
- Now that every interest is in a separate document, we can group by interests:

    {
      _id: "$interests",
      studentCount: { $sum: 1 }
    }

Final Output:

 [
   { "_id": "Frontend", "studentCount": 2 },
   { "_id": "React", "studentCount": 1 }
 ]

Why `$unwind` Is Important Here

Without $unwind, MongoDB would treat the entire interests array as a single value, and grouping wouldn’t work as expected.

For example, without $unwind, the $group would see:

{ "_id": ["Frontend", "React"] } // not split

So we must unwind arrays if we want to:

Count individual items inside them
Filter by individual array elements
Group by array values

Example 10: Join Users with Enrollments using `$lookup`

Real-World Use Case:

Display how many courses each user has enrolled in.

Datasets:

Users Collection:

[
  { "_id": 1, "name": "Amit" }
]

Enrollments Collection:

[
  { "userId": 1, "course": "React" },
  { "userId": 1, "course": "Node" }
]

Aggregation Pipeline:

[
  {
    $lookup: {
      from: "enrollments",            // collection to join
      localField: "_id",              // field from the users collection
      foreignField: "userId",         // field from the enrollments collection
      as: "enrollments"               // name of the new array field in output
    }
  },
  {
    $project: {
      name: 1,
      numberOfCourses: { $size: "$enrollments" },
      _id: 0
    }
  }
]

Step-by-Step Breakdown

Step 1: `$lookup`

This stage joins the users collection with the enrollments collection using a foreign key-like relationship.

localField: "_id" → from users
foreignField: "userId" → from enrollments

So for:

{ "_id": 1, "name": "Amit" }

MongoDB looks in the enrollments collection for documents where userId == 1.

It finds:

[
  { "userId": 1, "course": "React" },
  { "userId": 1, "course": "Node" }
]

Then adds them to a new array field called enrollments.

Intermediate Output After $lookup:

[
  {
    "name": "Amit",
    "_id": 1,
    "enrollments": [
      { "userId": 1, "course": "React" },
      { "userId": 1, "course": "Node" }
    ]
  }
]

Step 2: `$project`

Now we reshape the document using $project.

{
  name: 1,                              // include name
  numberOfCourses: { $size: "$enrollments" },  // count number of enrollments
  _id: 0                                // exclude _id
}

$size returns the length of the enrollments array
numberOfCourses = 2 (since there are 2 courses)

Final Output:

[
  {
    "name": "Amit",
    "numberOfCourses": 2
  }
]
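Here is the same join sketched in plain JavaScript, purely to show the logic. `$lookup` behaves like a left outer join, so a user with no matching enrollments still comes through with an empty array. The second user, Neha, is a hypothetical addition to show that case.

```javascript
const users = [
  { _id: 1, name: "Amit" },
  { _id: 2, name: "Neha" }, // hypothetical user with no enrollments
];
const enrollments = [
  { userId: 1, course: "React" },
  { userId: 1, course: "Node" },
];

const courseCounts = users
  // $lookup: attach every enrollment whose userId equals this user's _id.
  .map((user) => ({
    ...user,
    enrollments: enrollments.filter((e) => e.userId === user._id),
  }))
  // $project: keep name, derive numberOfCourses like $size, drop _id.
  .map(({ name, enrollments: joined }) => ({
    name,
    numberOfCourses: joined.length,
  }));

console.log(courseCounts);
// [ { name: 'Amit', numberOfCourses: 2 }, { name: 'Neha', numberOfCourses: 0 } ]
```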

Example 11: Pagination (Skip & Limit)

Real-World Use Case: Show next 5 users after skipping 10.

Dataset:

[
  { "name": "User1", "age": 20 },
  { "name": "User2", "age": 21 },
  { "name": "User3", "age": 22 },
  ...
]

Pipeline:

[ { $sort: { age: 1 } }, { $skip: 10 }, { $limit: 5 } ]

Output:
5 users from the 11th youngest onwards.
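In a real app you usually start from a page number rather than a raw skip value. A small helper like the one below can build the three stages for you. `buildPageStages` is a hypothetical name and the 1-based page numbering is my assumption; skipping 10 and taking 5 then corresponds to page 3 with a page size of 5.

```javascript
// Hypothetical helper: builds the pagination stages for a 1-based page number.
function buildPageStages(page, pageSize, sortSpec = { age: 1 }) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return [
    { $sort: sortSpec },              // sort first so every page sees the same order
    { $skip: (page - 1) * pageSize }, // documents that belong to earlier pages
    { $limit: pageSize },             // documents on this page
  ];
}

const pageThree = buildPageStages(3, 5);
console.log(JSON.stringify(pageThree));
// [{"$sort":{"age":1}},{"$skip":10},{"$limit":5}]
```

The returned array can be passed to `aggregate()` as-is, or appended to other stages such as a leading `$match`.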

Example 12: Count Active Users

Real-World Use Case: Show count of currently active users.

Dataset:

[
  { "name": "Amit", "isActive": true },
  { "name": "John", "isActive": false },
  { "name": "Neha", "isActive": true }
]

Pipeline:

[ { $match: { isActive: true } }, { $count: "activeUsers" } ]

Output:

[ { "activeUsers": 2 } ]

Example 13: Average Age by Gender

Real-World Use Case: Know average age for gender-based user insights.

Dataset:

[
  { "name": "Amit", "gender": "Male", "age": 25 },
  { "name": "Sara", "gender": "Female", "age": 30 }
]

Pipeline:

[{
  $group: { _id: "$gender", averageAge: { $avg: "$age" } }
}]

Output:

[ { _id: "Male", averageAge: 25 }, { _id: "Female", averageAge: 30 } ]

Example 14: 5 Most Common Favorite Fruits

Real-World Use Case: Popular fruits for eCommerce or recommendation.

Dataset:

[
  { "favoriteFruit": "Apple" },
  { "favoriteFruit": "Banana" },
  { "favoriteFruit": "Apple" }
]

Pipeline:

[
  { $group: { _id: "$favoriteFruit", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  { $limit: 5 }
]

Output:

[ { _id: "Apple", count: 2 }, { _id: "Banana", count: 1 } ]

Example 15: Country with Most Users

Real-World Use Case: Show highest user distribution by country.

Dataset:

[
  { "company": { "location": { "country": "India" } } },
  { "company": { "location": { "country": "USA" } } },
  { "company": { "location": { "country": "India" } } }
]

Pipeline:

[
  { $group: { _id: "$company.location.country", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  { $limit: 1 }
]

Output:

[ { _id: "India", count: 2 } ]

Example 16: Unique Eye Colors

Real-World Use Case: Display distinct options for filters in a UI.

Dataset:

[
  { "eyeColor": "blue" },
  { "eyeColor": "green" },
  { "eyeColor": "blue" }
]

Pipeline:

[ { $group: { _id: "$eyeColor" } } ]

Output:

[ { _id: "blue" }, { _id: "green" } ]

Example 17: Average Number of Tags Per User

Real-World Use Case: Measure how actively users tag.

Dataset:

[
  { "tags": ["js", "react"] },
  { "tags": ["node"] },
  { "tags": [] }
]

Pipeline:

[
  { $addFields: { numberOfTags: { $size: { $ifNull: [ "$tags", [] ] } } } },
  { $group: { _id: null, avgTags: { $avg: "$numberOfTags" } } }
]

Output:

[ { _id: null, avgTags: 1 } ]

Step-by-Step Breakdown:

Step 1: `$addFields`

{
  numberOfTags: {
    $size: { $ifNull: ["$tags", []] }
  }
}

$ifNull: If tags is null or missing, treat it as an empty array []
$size: Counts the number of elements in tags

Intermediate Output:

[
  { "tags": ["js", "react"], "numberOfTags": 2 },
  { "tags": ["node"], "numberOfTags": 1 },
  { "tags": [], "numberOfTags": 0 }
]

Step 2: `$group`

{
  _id: null,
  avgTags: { $avg: "$numberOfTags" }
}

Groups all documents into one group
Calculates the average of the numberOfTags field

Output:

[ { "_id": null, "avgTags": 1 } ]
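The `$ifNull` guard matters because `$size` raises an error when its argument is not an array. The sketch below mirrors both stages in plain JavaScript and adds one hypothetical document with no tags field at all, so the average comes out different from the blog output above.

```javascript
const users = [
  { tags: ["js", "react"] },
  { tags: ["node"] },
  { tags: [] },
  { name: "NoTags" }, // hypothetical document with no tags field
];

// $addFields: numberOfTags = length of tags, falling back to [] like $ifNull.
const withCounts = users.map((u) => ({
  ...u,
  numberOfTags: (u.tags ?? []).length,
}));

// $group with _id: null: one average over all documents.
const avgTags =
  withCounts.reduce((sum, u) => sum + u.numberOfTags, 0) / withCounts.length;

console.log(avgTags); // 0.75
```

The document without tags counts as zero tags and still takes part in the average, which is exactly what the `$ifNull` fallback buys you.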

Example 18: Users With Tag "valit"

Real-World Use Case: Target active users who have a specific tag.

Dataset:

[
  { "name": "Amit", "tags": ["valit"], "isActive": true },
  { "name": "John", "tags": ["react"] }
]

Pipeline:

[
  { $match: { isActive: true, tags: "valit" } },
  { $project: { name: 1, age: 1, _id: 0 } }
]

Output:

[ { "name": "Amit" } ]
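You may wonder why age is missing from the output even though the `$project` stage asks for it. An inclusion projection only copies fields that exist in the document, and the sample data has no age. A plain-JavaScript sketch of that behaviour, with `pick` as a hypothetical helper:

```javascript
// Sample documents copied from the dataset above.
const users = [
  { name: "Amit", tags: ["valit"], isActive: true },
  { name: "John", tags: ["react"] },
];

// Inclusion projection: copy only the listed fields that actually exist.
const pick = (doc, fields) =>
  Object.fromEntries(fields.filter((f) => f in doc).map((f) => [f, doc[f]]));

const activeValit = users
  .filter((u) => u.isActive === true && u.tags.includes("valit")) // $match
  .map((u) => pick(u, ["name", "age"]));                          // $project

console.log(activeValit); // [ { name: 'Amit' } ]
```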

Example 19: Group Users by Fruit and Push Names

Real-World Use Case: Create clusters of users by taste.

Dataset:

[
  { "name": "Amit", "favoriteFruit": "Apple" },
  { "name": "John", "favoriteFruit": "Apple" },
  { "name": "Neha", "favoriteFruit": "Banana" }
]

Pipeline:

[{
  $group: {
    _id: "$favoriteFruit",
    users: { $push: "$name" }
  }
}]

Output:

[
  { _id: "Apple", users: ["Amit", "John"] },
  { _id: "Banana", users: ["Neha"] }
]

Step-by-Step Breakdown

Step 1: `$group`

{
  _id: "$favoriteFruit",
  users: { $push: "$name" }
}

This tells MongoDB:

Group documents by favoriteFruit
- All documents with "Apple" will go into one group
- All documents with "Banana" will go into another
In each group, collect a list of names using $push

What `$push` Does:

$push: "$name"

Adds the value of "name" field into an array
It doesn't remove duplicates (use $addToSet if you want that)
It lets you build a list of values per group

Intermediate Grouping:

MongoDB internally creates groups like:

{
  "Apple": [ "Amit", "John" ],
  "Banana": [ "Neha" ]
}

Final Output:

[
  { "_id": "Apple", "users": ["Amit", "John"] },
  { "_id": "Banana", "users": ["Neha"] }
]
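To see the difference between `$push` and `$addToSet`, here is a small plain-JavaScript sketch. It only imitates the two accumulators; the duplicate "Amit" document is a hypothetical addition to the dataset.

```javascript
const users = [
  { name: "Amit", favoriteFruit: "Apple" },
  { name: "John", favoriteFruit: "Apple" },
  { name: "Amit", favoriteFruit: "Apple" }, // hypothetical duplicate name
  { name: "Neha", favoriteFruit: "Banana" },
];

const pushed = {}; // like $push: keeps every value, duplicates included
const asSet = {};  // like $addToSet: keeps each distinct value once
for (const { name, favoriteFruit } of users) {
  (pushed[favoriteFruit] ??= []).push(name);
  (asSet[favoriteFruit] ??= new Set()).add(name);
}

console.log(pushed.Apple);     // [ 'Amit', 'John', 'Amit' ]
console.log([...asSet.Apple]); // [ 'Amit', 'John' ]
```

A JavaScript Set keeps insertion order, but MongoDB does not promise any particular element order for `$addToSet`, so don't rely on it.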

Example 20: Users with Tags "id" and "enim"

Real-World Use Case: Strict filtering by multiple tags.

Dataset:

[
  { "tags": ["id", "enim"] },
  { "tags": ["id"] }
]

Pipeline:

[ { $match: { tags: { $all: ["id", "enim"] } } } ]

Output:

[ { "tags": ["id", "enim"] } ]
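`$all` is the strict cousin of `$in`: every listed value must be present, but order doesn't matter and extra elements are fine. The sketch below mimics that rule in plain JavaScript; `matchesAll` is a hypothetical helper and the third document is an assumption added to show the extra-elements case.

```javascript
const docs = [
  { tags: ["id", "enim"] },
  { tags: ["id"] },
  { tags: ["enim", "sit", "id"] }, // hypothetical: extra tag, different order
];

// Mimics { tags: { $all: required } }: every required value must be present.
const matchesAll = (doc, required) =>
  required.every((tag) => doc.tags.includes(tag));

const strictMatches = docs.filter((d) => matchesAll(d, ["id", "enim"]));
console.log(strictMatches.length); // 2
```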

Example 21: Companies in USA and Their User Count

Real-World Use Case: Show how many users work at each USA company.

Dataset:

[
  { "company": { "title": "TechCorp", "location": { "country": "USA" } } },
  { "company": { "title": "DevSoft", "location": { "country": "USA" } } },
  { "company": { "title": "TechCorp", "location": { "country": "USA" } } }
]

Pipeline:

[
  { $match: { "company.location.country": "USA" } },
  { $group: { _id: "$company.title", userCount: { $sum: 1 } } }
]

Output:

[ { _id: "TechCorp", userCount: 2 }, { _id: "DevSoft", userCount: 1 } ]

Conclusion

Aggregation in MongoDB is not just a feature — it's a superpower for backend developers. Whether you're building analytics dashboards, filtering large datasets, or transforming documents for the frontend, the aggregation pipeline gives you the flexibility and performance to do it all — within the database itself.

In this blog, we explored 21 real-world examples — from basic filtering to joining collections and calculating insights — without writing complex backend code.

If you're new to MongoDB, start using these pipelines today in your projects. And if you're already using MongoDB but haven’t used aggregations yet — now is the time!

One Query. Many Insights. That’s the power of MongoDB Aggregation.

#MongoDB #AggregationPipeline #Mongoose #Backend #ChaiAurCode #Hashnode

Harsh Sharma
