Understanding the MongoDB Aggregation Pipeline: A Beginner's Guide
What is the Aggregation Pipeline in MongoDB?
The aggregation pipeline in MongoDB is a powerful framework used to process and transform data. It processes data in multiple stages, with each stage performing a specific operation (like filtering, grouping, or sorting). The output of one stage becomes the input for the next, similar to how a SQL GROUP BY or JOIN works. Think of it as a data pipeline, where data flows through several stages, with each stage transforming it before passing it to the next.
How the Aggregation Pipeline Works?
An aggregation pipeline consists of multiple stages, each of which applies a specific transformation to the data. Data flows through these stages one after the other. The syntax looks like this:
db.collection.aggregate([
{ stage1 },
{ stage2 },
{ stage3 },
...
]);
Each stage is an object where you specify an operation (e.g., $match
, $group
, $sort
) and the parameters needed for that operation.
Sample Document:
Let's say we have the following user document in the users
collection:
{
"_id": ObjectId("6357f4c4e4b0a5c95f9d4e5a"),
"name": "Alice",
"age": 25,
"gender": "female",
"hobbies": ["coding", "reading", "swimming"],
"orders": [
{ "product": "Laptop", "price": 1000 },
{ "product": "Phone", "price": 500 }
]
}
Important Aggregation Operators and How They Work.
- $match – Filter Documents (Similar to
WHERE
in SQL)
It works like the
find()
query. This stage selects documents that match the specified conditions.Use it early in the pipeline to reduce the amount of data processed in later stages.
$match
is used to filter documents based on certain conditions.// Query: const result = await User.aggregate([ { $match: { age: { $gte: 18 } } } ]);
// Output: [ { "_id": "6357f4c4e4b0a5c95f9d4e5a", "name": "Alice", "age": 25, "gender": "female", "hobbies": ["coding", "reading", "swimming"], "orders": [ { "product": "Laptop", "price": 1000 }, { "product": "Phone", "price": 500 } ] } ]
This returns documents where
age >= 18
.
- $project - Select Specific Fields
Use
$project
to shape the output by including or excluding certain fields.// Query : const result = await User.aggregate([ { $project: { name: 1, age: 1, totalOrders: { $size: "$orders" } } } ]);
// Output : [ { "_id": "6357f4c4e4b0a5c95f9d4e5a", "name": "Alice", "age": 25, "totalOrders": 2 } ]
This includes the
name
,age
, and the number of orders.
- $unwind - Flatten an Array Field
Use
$unwind
to break down an array field (likeorders
orhobbies
) into multiple documents.// Query : const result = await User.aggregate([ { $unwind: "$hobbies" } ]);
// Output : [ { "_id": "6357f4c4e4b0a5c95f9d4e5a", "name": "Alice", "hobbies": "coding" }, { "_id": "6357f4c4e4b0a5c95f9d4e5a", "name": "Alice", "hobbies": "reading" }, { "_id": "6357f4c4e4b0a5c95f9d4e5a", "name": "Alice", "hobbies": "swimming" } ]
Each hobby in the hobbies array becomes a separate document.
- $addFields - Add New Fields to the Output
Use
$addFields
to add computed fields to documents.// Query : const result = await User.aggregate([ { $addFields: { isAdult: { $gte: ["$age", 18] } } } ]);
// Output : [ { "_id": "6357f4c4e4b0a5c95f9d4e5a", "name": "Alice", "age": 25, "isAdult": true } ]
Some fields were missing from your output after using the
$addFields
operator. Don't worry, this isn't a mistake but a result of how MongoDB's aggregation pipeline works. By default, each stage only passes along the fields specified or modified in that stage.To ensure all original fields are included along with your new field (like
isAdult
), you can use the$project
operator to explicitly list the fields you want to keep.const result = await User.aggregate([ { $addFields: { isAdult: { $gte: ["$age", 18] } // Adding a new field } }, { $project: { // Explicitly include all fields name: 1, age: 1, gender: 1, hobbies: 1, orders: 1, isAdult: 1 } } ]);
// Output: [ { "_id": "6357f4c4e4b0a5c95f9d4e5a", "name": "Alice", "age": 25, "gender": "female", "hobbies": ["coding", "reading", "swimming"], "orders": [ { "product": "Laptop", "price": 1000 }, { "product": "Phone", "price": 500 } ], "isAdult": true } ]
- $lookup - Join with Another Collection
The
$lookup
operator in MongoDB's aggregation pipeline allows you to perform joins between collections, similar to SQL joins. It is used to merge data from one collection with another based on a matching field, which helps in retrieving related data from multiple sources.Syntax of
$lookup
:{ $lookup: { from: "<foreignCollection>", localField: "<localField>", foreignField: "<foreignField>", as: "<outputField>" } };
from
: The name of the foreign collection (the collection you want to join).localField
: The field from the current collection used to match with theforeignField
.foreignField
: The field from the foreign collection to match with thelocalField
.as
: The new field's name in the output will contain the matched documents from the foreign collection.
How $lookup
Works:
$lookup
performs a left outer join:
If there is a match, the documents from the foreign collection are embedded as an array.
If there is no match, the array will be empty.
Sample products
Collection:
[
{ "_id": ObjectId("1"), "productName": "Laptop", "category": "Electronics" },
{ "_id": ObjectId("2"), "productName": "Phone", "category": "Electronics" }
]
Sample users Collection:
{
"_id": ObjectId("6357f4c4e4b0a5c95f9d4e5a"),
"name": "Alice",
"age": 25,
"gender": "female",
"hobbies": ["coding", "reading", "swimming"],
"orders": [
{ "product": "Laptop", "price": 1000 },
{ "product": "Phone", "price": 500 }
]
};
Now we Perform $lookup like this :
const result = await User.aggregate([
{
$lookup: {
from: "products", // Name of the collection to join
let: { orderProducts: "$orders.product" }, // Define the products from orders array
pipeline: [
{
$match: {
$expr: { $in: ["$productName", "$$orderProducts"] } // Match productName with products in orders
}
}
],
as: "orderDetails" // Store the matched products as 'orderDetails'
}
},
{
$project: {
name: 1,
age: 1,
gender: 1,
orders: 1,
orderDetails: 1
}
}
]).toArray();
console.log(result);
Stage 1: $lookup
The $lookup
stage performs a left outer join to a specified collection to filter documents from the products
collection.
from: "products"
:This specifies the name of the collection to join with. In this case, the
products
collection contains details about various products.let: { orderProducts: "$orders.product" }
:This creates a variable
orderProducts
, which stores theproduct
field from each document’sorders
array. This variable is accessible within the$lookup
pipeline.pipeline: [...]
:This section allows you to define a series of operations to execute on the documents from the
products
collection. Here, it contains a$match
stage.Inside the Pipeline:
$match
: Filters the documents from theproducts
collection to include only those where theproductName
matches one of the products in theorderProducts
array.$expr: { $in: ["$productName", "$$orderProducts"] }
:$in
: This operator checks if theproductName
(from theproducts
collection) is present in theorderProducts
array (defined earlier). This effectively allows for filtering based on the order's products.as: "orderDetails"
:This defines the name of the new field that will contain the matched documents from the
products
collection. The results of the$lookup
will be stored in theorderDetails
field of the output documents.
Stage 2: $project
The $project
stage specifies which fields to include or exclude in the output documents.
$project: {...}
:This stage shapes the output document by including specific fields from the input documents.
Field specifications:
name: 1
: Include thename
field.age: 1
: Include theage
field.gender: 1
: Include thegender
field.orders: 1
: Include theorders
field.orderDetails: 1
: Include theorderDetails
field that was populated by the$lookup
.
// Output of the Aggregation Query
[
{
"_id": ObjectId("6357f4c4e4b0a5c95f9d4e5a"),
"name": "Alice",
"age": 25,
"gender": "female",
"orders": [
{ "product": "Laptop", "price": 1000 },
{ "product": "Phone", "price": 500 }
],
"orderDetails": [
{ "_id": ObjectId("1"), "productName": "Laptop", "category": "Electronics" },
{ "_id": ObjectId("2"), "productName": "Phone", "category": "Electronics" }
]
}
]
$sort - Sort Documents
Use
$sort
to order documents by a field.
// Query :
const result = await User.aggregate([
{ $sort: { age: -1 } } // Sort by age in descending order
]);
// Output :
[
{
"_id": "6357f4c4e4b0a5c95f9d4e5a",
"name": "Alice",
"age": 25
}
]
This sorts documents by age in descending order.
$limit - Limit the Number of Results
Use
$limit
to restrict the number of results returned.// Query: const result = await User.aggregate([ { $limit: 1 } ]);
// Output : [ { "_id": "6357f4c4e4b0a5c95f9d4e5a", "name": "Alice", "age": 25 } ] // This limits the output to one document.
$count - Count Documents
Use
$count
to get the total number of documents in the pipeline.// Query: const result = await User.aggregate([ { $match: { age: { $gte: 18 } } }, { $count: "totalAdults" } ]);
// Output: [ { "totalAdults": 1 } ] // This counts the number of adults in the collection.
Aggregation Pipeline Optimization Tips:
Use
$match
early: Filtering early reduces the data processed in later stages.Use indexes: Queries that use
$match
and$sort
benefit from appropriate indexes.Avoid large
$lookup
joins: Joins with large collections can impact performance.
Summary:
The MongoDB aggregation pipeline is a powerful way to transform and analyze data within MongoDB. With operators like $match
, $group
, $sort
, $unwind
, and $lookup
, you can perform a wide range of operations on your data efficiently. Learning to use these operators effectively helps you write optimized and meaningful queries for real-world use cases.
Subscribe to my newsletter
Read articles from AMIT directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by