MongoDB Aggregation Pipeline


Why Aggregation?
Imagine you’re building a platform like Coursera or Udemy. As your platform grows, so does your database — thousands of users, enrollments, courses, interests, and more.
Now you face real-world data questions like:
How many students are from India?
What's the average age of users?
Which course has the most enrollments?
How many users are active or interested in frontend?
This is where the MongoDB Aggregation Pipeline becomes your superpower. 💪
It helps you analyze, transform, and summarize large datasets efficiently — all within the database, without writing external logic.
What is Aggregation in MongoDB?
Aggregation in MongoDB is a powerful way to process data records and return computed results.
Think of it like building a data processing pipeline – each step (called a stage) does something specific, like filtering, grouping, sorting, etc.
What is a Pipeline?
A pipeline is an array of stages, where each stage transforms the data in some way and passes the output to the next stage — just like an assembly line in a factory.
[
{ $match: { country: "India" } },
{ $group: { _id: "$course", count: { $sum: 1 } } }
]
Each stage begins with a $ operator like $match, $group, $project, $sort, $limit, etc.
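To make the assembly-line idea concrete, here is a minimal plain-JavaScript sketch that treats each stage as a function from an array of documents to a new array. The names runPipeline, matchCountry, and countByCourse are hypothetical helpers invented for illustration, and the sample users are made up; this is only a mental model, not how MongoDB executes a pipeline internally.

```javascript
// Hypothetical in-memory model: a stage is a function (docs) => docs.
const matchCountry = (country) => (docs) =>
  docs.filter((doc) => doc.country === country);

const countByCourse = (docs) => {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.course, (counts.get(doc.course) ?? 0) + 1);
  }
  // Shape the result like $group output: one document per key.
  return [...counts].map(([course, count]) => ({ _id: course, count }));
};

// Feed the output of each stage into the next one, in array order.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const users = [
  { name: "Amit", country: "India", course: "React" },
  { name: "Emily", country: "USA", course: "React" },
  { name: "Raj", country: "India", course: "Node" },
  { name: "Neha", country: "India", course: "React" },
];

const result = runPipeline(users, [matchCountry("India"), countByCourse]);
console.log(result);
```

Swapping the order of the two functions would change the answer, which is the same reason stage order matters in a real pipeline.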
What is a Stage?
A stage is a single step in the aggregation process. Each stage:
Takes in documents
Performs a transformation (e.g., filter, group, count)
Passes transformed output to the next stage
For example:
$match: Filters documents like a query
$group: Groups by a field and performs aggregation like sum, avg
$project: Selects fields or creates new ones
$sort, $limit, $skip: For sorting, pagination
$unwind: Flattens arrays
$lookup: Joins collections
❓ Why Use Aggregation?
Without aggregation, you'd have to:
Pull all documents to your server
Write complex logic to compute results
Struggle with performance issues
With aggregation, MongoDB does the heavy lifting inside the database engine, which avoids transferring every document to your server and is typically faster on large datasets.
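For contrast, this is roughly what the "pull everything and compute it yourself" approach looks like for the average-age question. It is a sketch in plain JavaScript: allUsers stands in for documents already fetched from the database, and averageAge is a hypothetical helper, not part of any MongoDB API.

```javascript
// Stand-in for every user document after it has been fetched to the server.
const allUsers = [
  { name: "Amit", age: 25 },
  { name: "Raj", age: 30 },
  { name: "Neha" }, // no age recorded
];

// Application-side logic: loop, skip documents without a numeric age, divide.
function averageAge(users) {
  let total = 0;
  let counted = 0;
  for (const user of users) {
    if (typeof user.age === "number") {
      total += user.age;
      counted += 1;
    }
  }
  return counted === 0 ? null : total / counted;
}

// The same question expressed as a pipeline that the database runs for you.
const averageAgePipeline = [
  { $group: { _id: null, avgAge: { $avg: "$age" } } },
];

console.log(averageAge(allUsers)); // 27.5
console.log(JSON.stringify(averageAgePipeline));
```

With the pipeline, only the single result document has to cross the network instead of every user document.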
What Will This Blog Cover?
This blog will take you from zero to pro using 21 carefully crafted, real-world aggregation examples. Each one includes:
Real-world use case
Sample dataset
Aggregation pipeline query
Final output
Step-by-step breakdown
Topics and Questions You’ll Learn to Answer
Topic | Questions Answered |
$match | How to filter documents based on conditions? |
$group + $sum, $avg | How to count users per group, calculate averages? |
$project | How to include or exclude fields from documents? |
$sort, $limit, $skip | How to paginate results or find top records? |
$unwind | How to handle arrays inside documents? |
$lookup | How to join two collections like SQL? |
$count | How to get a total count of filtered results? |
$in, $all | How to match documents based on array contents? |
$push, $addFields, $size, $ifNull | How to reshape and enrich your documents? |
Example 1: Find All Users from India
Real-World Use Case: List users in India for targeted emails.
Dataset:
[
{ "name": "Amit", "profile": { "country": "India" } },
{ "name": "Emily", "profile": { "country": "USA" } }
]
Pipeline:
[ { $match: { "profile.country": "India" } } ]
Output:
[ { "name": "Amit", "profile": { "country": "India" } } ]
Breakdown:
$match keeps only documents where profile.country === "India"
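The dot notation in "profile.country" walks into the embedded profile document. A minimal sketch of that lookup in plain JavaScript, assuming simple equality on nested objects only: the hypothetical helpers getPath and matchEquals ignore arrays and query operators, both of which real MongoDB matching handles.

```javascript
// Resolve a dotted path such as "profile.country" against a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Keep documents whose value at `path` strictly equals `expected`.
function matchEquals(docs, path, expected) {
  return docs.filter((doc) => getPath(doc, path) === expected);
}

const users = [
  { name: "Amit", profile: { country: "India" } },
  { name: "Emily", profile: { country: "USA" } },
  { name: "Zoe" }, // no profile at all: never matches
];

const fromIndia = matchEquals(users, "profile.country", "India");
console.log(fromIndia);
```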
Example 2: Count of Users per Country
Real-World Use Case: Show user distribution in admin dashboard.
Dataset:
[
{ "name": "Amit", "profile": { "country": "India" } },
{ "name": "John", "profile": { "country": "USA" } },
{ "name": "Raj", "profile": { "country": "India" } }
]
Pipeline:
[{
$group: {
_id: "$profile.country",
userCount: { $sum: 1 }
}
}]
Output:
[ { _id: "India", userCount: 2 }, { _id: "USA", userCount: 1 } ]
Breakdown:
$group groups documents by country
$sum: 1 counts the number of users in each country. It is an accumulator: it adds 1 for every document in the group.
Example 3: Average Age of Users
Real-World Use Case: Display average user age in reports.
Dataset:
[
{ "name": "Amit", "age": 25 },
{ "name": "Raj", "age": 30 }
]
Pipeline:
[{
$group: {
_id: null,
avgAge: { $avg: "$age" }
}
}]
Output:
[ { _id: null, avgAge: 27.5 } ]
Breakdown:
_id: null means no grouping key; aggregate all documents together
$avg calculates the average age
Example 4: Students Per Course
Real-World Use Case: Know popularity of each course.
Dataset:
[
{ "userId": 1, "courseId": "React" },
{ "userId": 2, "courseId": "Node" },
{ "userId": 3, "courseId": "React" }
]
Pipeline:
[{
$group: {
_id: "$courseId",
numberOfStudents: { $sum: 1 }
}
}]
Output:
[ { _id: "React", numberOfStudents: 2 }, { _id: "Node", numberOfStudents: 1 } ]
Example 5: Most Enrolled Course
Real-World Use Case: Show trending course.
Dataset: Same enrollments as in Example 4.
Pipeline:
[
{ $group: { _id: "$courseId", numberOfStudents: { $sum: 1 } } },
{ $sort: { numberOfStudents: -1 } },
{ $limit: 1 }
]
Output:
[ { _id: "React", numberOfStudents: 2 } ]
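Stage order matters here: $limit has to come after $sort, otherwise you keep whichever document happens to be first rather than the top one. A small plain-JavaScript sketch of the last two stages, using hypothetical helpers sortDesc and limit on counts that have already been grouped:

```javascript
// Counts as they might come out of the $group stage (order is not guaranteed).
const grouped = [
  { _id: "Node", numberOfStudents: 1 },
  { _id: "React", numberOfStudents: 2 },
];

// Model of { $sort: { field: -1 } }: copy, then sort descending.
const sortDesc = (docs, field) =>
  [...docs].sort((a, b) => b[field] - a[field]);

// Model of { $limit: n }: keep the first n documents.
const limit = (docs, n) => docs.slice(0, n);

const sortThenLimit = limit(sortDesc(grouped, "numberOfStudents"), 1);
const limitThenSort = sortDesc(limit(grouped, 1), "numberOfStudents");

console.log(sortThenLimit); // [ { _id: 'React', numberOfStudents: 2 } ]
console.log(limitThenSort); // [ { _id: 'Node', numberOfStudents: 1 } ]
```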
Example 6: Filter Users Age > 30
Real-World Use Case: Senior user group campaign.
Dataset:
[
{ "name": "Amit", "profile": { "age": 28 } },
{ "name": "Raj", "profile": { "age": 35 } }
]
Pipeline:
[{
$match: { "profile.age": { $gt: 30 } }
}]
Output:
[ { "name": "Raj", "profile": { "age": 35 } } ]
Example 7: Students Interested in Frontend
Real-World Use Case: Recommend frontend jobs or events.
Dataset:
[
{ "name": "Amit", "interests": ["Frontend", "React"] },
{ "name": "John", "interests": ["Backend"] }
]
Pipeline:
[{
$match: { interests: { $in: ["Frontend"] } }
}]
Output:
[ { "name": "Amit", "interests": ["Frontend", "React"] } ]
Example 8: Show Name & Email Only
Real-World Use Case: Only needed data for newsletters.
Dataset:
[
{ "name": "Amit", "email": "amit@mail.com", "age": 30 }
]
Pipeline:
[{
$project: { name: 1, email: 1, _id: 0 }
}]
Output:
[ { "name": "Amit", "email": "amit@mail.com" } ]
Example 9: Count Students per Interest (Using $unwind)
Real-World Use Case: Measure popularity of interests.
How $unwind Works (Step-by-Step)
Given a document like:
{ "name": "Amit", "interests": ["Frontend", "React"] }
After applying:
{ $unwind: "$interests" }
It becomes:
{ "name": "Amit", "interests": "Frontend" }
{ "name": "Amit", "interests": "React" }
So instead of 1 document with 2 interests, we now have 2 documents, each with 1 interest. This allows us to then group, count, or filter by individual interests.
Dataset:
[
{ "name": "Amit", "interests": ["Frontend", "React"] },
{ "name": "John", "interests": ["Frontend"] }
]
Pipeline:
[
{ $unwind: "$interests" },
{ $group: { _id: "$interests", studentCount: { $sum: 1 } } }
]
Output:
[ { _id: "Frontend", studentCount: 2 }, { _id: "React", studentCount: 1 } ]
Step-by-Step Breakdown:
$unwind: "$interests" - Converts the array of interests into separate documents:
{ "name": "Amit", "interests": "Frontend" }
{ "name": "Amit", "interests": "React" }
{ "name": "John", "interests": "Frontend" }
$group stage - Now that every interest is in a separate document, we can group by interests:
{
_id: "$interests",
studentCount: { $sum: 1 }
}
Final Output:
[ { "_id": "Frontend", "studentCount": 2 }, { "_id": "React", "studentCount": 1 } ]
Why $unwind Is Important Here
Without $unwind, MongoDB would treat the entire interests array as a single value, and grouping wouldn’t work as expected.
For example, without $unwind, the $group would see:
{ "_id": ["Frontend", "React"] } // not split
So we must unwind arrays if we want to:
Count individual items inside them
Filter the unwound elements one by one (simply matching documents that contain a value, as in Example 7, does not need $unwind)
Group by array values
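The flattening step can be modelled in a few lines of plain JavaScript. This is a sketch under the default $unwind options: unwind is a hypothetical helper, and the third sample user is added only to show that a document with an empty array produces no output.

```javascript
// Model of { $unwind: "$field" }: one output document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (!Array.isArray(value)) {
      // Missing or null fields are dropped by default; a single
      // non-array value is treated like a one-element array.
      return value == null ? [] : [doc];
    }
    // An empty array maps to no documents, so the input document disappears.
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

const users = [
  { name: "Amit", interests: ["Frontend", "React"] },
  { name: "John", interests: ["Frontend"] },
  { name: "Sara", interests: [] }, // produces no output documents
];

const unwound = unwind(users, "interests");
console.log(unwound);
```

If users like Sara should still appear in the result, $unwind accepts a preserveNullAndEmptyArrays option for that purpose.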
Example 10: Join Users with Enrollments using $lookup
Real-World Use Case: Display how many courses each user has enrolled in.
Datasets:
Users Collection:
[
{ "_id": 1, "name": "Amit" }
]
Enrollments Collection:
[
{ "userId": 1, "course": "React" },
{ "userId": 1, "course": "Node" }
]
Aggregation Pipeline:
[
{
$lookup: {
from: "enrollments", // collection to join
localField: "_id", // field from the users collection
foreignField: "userId", // field from the enrollments collection
as: "enrollments" // name of the new array field in output
}
},
{
$project: {
name: 1,
numberOfCourses: { $size: "$enrollments" },
_id: 0
}
}
]
Step-by-Step Breakdown
Step 1: $lookup
This stage joins the users collection with the enrollments collection using a foreign key-like relationship.
localField: "_id" → from users
foreignField: "userId" → from enrollments
So for:
{ "_id": 1, "name": "Amit" }
MongoDB looks in the enrollments collection for documents where userId == 1.
It finds:
[
{ "userId": 1, "course": "React" },
{ "userId": 1, "course": "Node" }
]
Then adds them to a new array field called enrollments.
Intermediate Output After $lookup:
[
{
"name": "Amit",
"_id": 1,
"enrollments": [
{ "userId": 1, "course": "React" },
{ "userId": 1, "course": "Node" }
]
}
]
Step 2: $project
Now we reshape the document using $project.
{
name: 1, // include name
numberOfCourses: { $size: "$enrollments" }, // count number of enrollments
_id: 0 // exclude _id
}
$size returns the length of the enrollments array
numberOfCourses = 2 (since there are 2 courses)
Final Output:
[
{
"name": "Amit",
"numberOfCourses": 2
}
]
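The join-then-count logic can be sketched in plain JavaScript as follows. The lookup helper is hypothetical and only models the localField/foreignField form of $lookup; the second user, Neha, is an added assumption to show that a user with no enrollments still appears, with an empty array.

```javascript
const users = [
  { _id: 1, name: "Amit" },
  { _id: 2, name: "Neha" }, // has no enrollments
];
const enrollments = [
  { userId: 1, course: "React" },
  { userId: 1, course: "Node" },
];

// Model of $lookup with localField/foreignField: attach all matching
// foreign documents as an array under the `as` field name.
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((local) => ({
    ...local,
    [as]: foreignDocs.filter((f) => f[foreignField] === local[localField]),
  }));
}

const joined = lookup(users, enrollments, {
  localField: "_id",
  foreignField: "userId",
  as: "enrollments",
});

// Model of the $project stage: keep name, compute the array length.
const summary = joined.map((user) => ({
  name: user.name,
  numberOfCourses: user.enrollments.length,
}));

console.log(summary);
```

This mirrors the left-outer-join behavior of $lookup: unmatched users are kept rather than dropped.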
Example 11: Pagination (Skip & Limit)
Real-World Use Case: Show next 5 users after skipping 10.
Dataset:
[
{ "name": "User1", "age": 20 },
{ "name": "User2", "age": 21 },
{ "name": "User3", "age": 22 },
...
]
Pipeline:
[ { $sort: { age: 1 } }, { $skip: 10 }, { $limit: 5 } ]
Output:
5 users from the 11th youngest onwards.
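In an application you usually start from a page number rather than a raw skip value. A minimal sketch of that conversion, assuming 1-based page numbers: paginationStages is a hypothetical helper, and the extra _id sort key is an addition to the pipeline above, included so that users with the same age keep a stable order between pages.

```javascript
// Build the pagination stages for a 1-based page number.
function paginationStages(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  return [
    { $sort: { age: 1, _id: 1 } }, // _id as a tie-breaker keeps pages stable
    { $skip: (page - 1) * pageSize },
    { $limit: pageSize },
  ];
}

// Page 3 with 5 users per page skips the first 10 users.
console.log(JSON.stringify(paginationStages(3, 5)));
```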
Example 12: Count Active Users
Real-World Use Case: Show count of currently active users.
Dataset:
[
{ "name": "Amit", "isActive": true },
{ "name": "John", "isActive": false },
{ "name": "Neha", "isActive": true }
]
Pipeline:
[ { $match: { isActive: true } }, { $count: "activeUsers" } ]
Output:
[ { "activeUsers": 2 } ]
Example 13: Average Age by Gender
Real-World Use Case: Know average age for gender-based user insights.
Dataset:
[
{ "name": "Amit", "gender": "Male", "age": 25 },
{ "name": "Sara", "gender": "Female", "age": 30 }
]
Pipeline:
[{
$group: { _id: "$gender", averageAge: { $avg: "$age" } }
}]
Output:
[ { _id: "Male", averageAge: 25 }, { _id: "Female", averageAge: 30 } ]
Example 14: 5 Most Common Favorite Fruits
Real-World Use Case: Popular fruits for eCommerce or recommendation.
Dataset:
[
{ "favoriteFruit": "Apple" },
{ "favoriteFruit": "Banana" },
{ "favoriteFruit": "Apple" }
]
Pipeline:
[
{ $group: { _id: "$favoriteFruit", count: { $sum: 1 } } },
{ $sort: { count: -1 } },
{ $limit: 5 }
]
Output:
[ { _id: "Apple", count: 2 }, { _id: "Banana", count: 1 } ]
Example 15: Country with Most Users
Real-World Use Case: Show highest user distribution by country.
Dataset:
[
{ "company": { "location": { "country": "India" } } },
{ "company": { "location": { "country": "USA" } } },
{ "company": { "location": { "country": "India" } } }
]
Pipeline:
[
{ $group: { _id: "$company.location.country", count: { $sum: 1 } } },
{ $sort: { count: -1 } },
{ $limit: 1 }
]
Output:
[ { _id: "India", count: 2 } ]
Example 16: Unique Eye Colors
Real-World Use Case: Display distinct options for filters in a UI.
Dataset:
[
{ "eyeColor": "blue" },
{ "eyeColor": "green" },
{ "eyeColor": "blue" }
]
Pipeline:
[ { $group: { _id: "$eyeColor" } } ]
Output:
[ { _id: "blue" }, { _id: "green" } ]
Example 17: Average Number of Tags Per User
Real-World Use Case: How active users are with tagging.
Dataset:
[
{ "tags": ["js", "react"] },
{ "tags": ["node"] },
{ "tags": [] }
]
Pipeline:
[
{ $addFields: { numberOfTags: { $size: { $ifNull: [ "$tags", [] ] } } } },
{ $group: { _id: null, avgTags: { $avg: "$numberOfTags" } } }
]
Output:
[ { _id: null, avgTags: 1 } ]
Step-by-Step Breakdown:
Step 1: $addFields
{
numberOfTags: {
$size: { $ifNull: ["$tags", []] }
}
}
$ifNull: If tags is null or missing, treat it as an empty array []
$size: Counts the number of elements in tags
Intermediate Output:
[
{ "tags": ["js", "react"], "numberOfTags": 2 },
{ "tags": ["node"], "numberOfTags": 1 },
{ "tags": [], "numberOfTags": 0 }
]
Step 2: $group
{
_id: null,
avgTags: { $avg: "$numberOfTags" }
}
Groups all documents into one group
Calculates the average of the numberOfTags field
Output:
[ { "_id": null, "avgTags": 1 } ]
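The fallback matters because $size raises an error when its argument is not an array. The following plain-JavaScript sketch models both steps; the fourth document, which has no tags field at all, is an added assumption to show the fallback in action.

```javascript
const users = [
  { tags: ["js", "react"] },
  { tags: ["node"] },
  { tags: [] },
  {}, // no tags field: counted as 0 thanks to the fallback
];

// Model of { $size: { $ifNull: ["$tags", []] } }.
const numberOfTags = (doc) => (doc.tags ?? []).length;

// Model of { $avg: "$numberOfTags" } over a single group.
const counts = users.map(numberOfTags);
const avgTags = counts.reduce((sum, n) => sum + n, 0) / counts.length;

console.log(counts);  // [ 2, 1, 0, 0 ]
console.log(avgTags); // 0.75
```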
Example 18: Users With Tag "valit"
Real-World Use Case: Target active users who have a specific tag.
Dataset:
[
{ "name": "Amit", "tags": ["valit"], "isActive": true },
{ "name": "John", "tags": ["react"] }
]
Pipeline:
[
{ $match: { isActive: true, tags: "valit" } },
{ $project: { name: 1, age: 1, _id: 0 } }
]
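Output (the sample document has no age field, so only name is returned even though the projection asks for age):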
[ { "name": "Amit" } ]
Example 19: Group Users by Fruit and Push Names
Real-World Use Case: Create clusters of users by taste.
Dataset:
[
{ "name": "Amit", "favoriteFruit": "Apple" },
{ "name": "John", "favoriteFruit": "Apple" },
{ "name": "Neha", "favoriteFruit": "Banana" }
]
Pipeline:
[{
$group: {
_id: "$favoriteFruit",
users: { $push: "$name" }
}
}]
Output:
[
{ _id: "Apple", users: ["Amit", "John"] },
{ _id: "Banana", users: ["Neha"] }
]
Step-by-Step Breakdown
Step 1: $group
{
_id: "$favoriteFruit",
users: { $push: "$name" }
}
This tells MongoDB:
Group documents by favoriteFruit
All documents with "Apple" will go into one group
All documents with "Banana" will go into another
In each group, collect a list of names using $push
What $push Does:
$push: "$name"
Adds the value of the "name" field into an array
It doesn't remove duplicates (use $addToSet if you want that)
It lets you build a list of values per group
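The difference between the two accumulators is easiest to see with a duplicate in the data. A plain-JavaScript sketch, assuming a hypothetical groupNames helper and a sample set where the name "Amit" deliberately appears twice:

```javascript
const users = [
  { name: "Amit", favoriteFruit: "Apple" },
  { name: "Amit", favoriteFruit: "Apple" }, // duplicate name on purpose
  { name: "Neha", favoriteFruit: "Banana" },
];

// Group names by fruit; `unique` switches between $push-like and
// $addToSet-like behavior.
function groupNames(docs, unique) {
  const groups = new Map();
  for (const { name, favoriteFruit } of docs) {
    const names = groups.get(favoriteFruit) ?? [];
    if (!unique || !names.includes(name)) names.push(name);
    groups.set(favoriteFruit, names);
  }
  return [...groups].map(([fruit, names]) => ({ _id: fruit, users: names }));
}

console.log(groupNames(users, false)); // Apple -> ["Amit", "Amit"]
console.log(groupNames(users, true));  // Apple -> ["Amit"]
```

One difference from this sketch: MongoDB does not guarantee the order of elements in an $addToSet result.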
Intermediate Grouping:
Conceptually, MongoDB builds groups like:
{
"Apple": [ "Amit", "John" ],
"Banana": [ "Neha" ]
}
Final Output:
[
{ "_id": "Apple", "users": ["Amit", "John"] },
{ "_id": "Banana", "users": ["Neha"] }
]
Example 20: Users with Tags "id" and "enim"
Real-World Use Case: Strict filtering by multiple tags.
Dataset:
[
{ "tags": ["id", "enim"] },
{ "tags": ["id"] }
]
Pipeline:
[ { $match: { tags: { $all: ["id", "enim"] } } } ]
Output:
[ { "tags": ["id", "enim"] } ]
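It helps to compare $all with $in from Example 7: $in accepts a document if any listed value is present, while $all requires every one of them. A plain-JavaScript sketch of the two checks on an array field, where matchesIn and matchesAll are hypothetical helpers:

```javascript
// Model of { tags: { $in: wanted } } on an array field: at least one match.
const matchesIn = (doc, wanted) =>
  wanted.some((tag) => (doc.tags ?? []).includes(tag));

// Model of { tags: { $all: wanted } }: every wanted tag must be present.
const matchesAll = (doc, wanted) =>
  wanted.every((tag) => (doc.tags ?? []).includes(tag));

const users = [{ tags: ["id", "enim"] }, { tags: ["id"] }];
const wanted = ["id", "enim"];

console.log(users.filter((u) => matchesIn(u, wanted)).length);  // 2
console.log(users.filter((u) => matchesAll(u, wanted)).length); // 1
```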
Example 21: Companies in USA and Their User Count
Real-World Use Case: Show how many users work at each USA company.
Dataset:
[
{ "company": { "title": "TechCorp", "location": { "country": "USA" } } },
{ "company": { "title": "DevSoft", "location": { "country": "USA" } } },
{ "company": { "title": "TechCorp", "location": { "country": "USA" } } }
]
Pipeline:
[
{ $match: { "company.location.country": "USA" } },
{ $group: { _id: "$company.title", userCount: { $sum: 1 } } }
]
Output:
[ { _id: "TechCorp", userCount: 2 }, { _id: "DevSoft", userCount: 1 } ]
Conclusion
Aggregation in MongoDB is not just a feature — it's a superpower for backend developers. Whether you're building analytics dashboards, filtering large datasets, or transforming documents for the frontend, the aggregation pipeline gives you the flexibility and performance to do it all — within the database itself.
In this blog, we explored 21 real-world examples — from basic filtering to joining collections and calculating insights — without writing complex backend code.
If you're new to MongoDB, start using these pipelines today in your projects. And if you're already using MongoDB but haven’t used aggregations yet — now is the time!
One Query. Many Insights. That’s the power of MongoDB Aggregation.
Written by

Harsh Sharma