Understanding the MongoDB Aggregation Pipeline: A Beginner's Guide

AMITAMIT
8 min read

What is the Aggregation Pipeline in MongoDB?

The aggregation pipeline in MongoDB is a powerful framework used to process and transform data. It processes data in multiple stages, with each stage performing a specific operation (like filtering, grouping, or sorting). The output of one stage becomes the input for the next, similar to how a SQL GROUP BY or JOIN works. Think of it as a data pipeline, where data flows through several stages, with each stage transforming it before passing it to the next.

How the Aggregation Pipeline Works?

An aggregation pipeline consists of multiple stages, each of which applies a specific transformation to the data. Data flows through these stages one after the other. The syntax looks like this:

db.collection.aggregate([
    { stage1 },
    { stage2 },
    { stage3 },
    ...
]);

Each stage is an object where you specify an operation (e.g., $match, $group, $sort) and the parameters needed for that operation.

Sample Document:

Let's say we have the following user document in the users collection:

{
  "_id": ObjectId("6357f4c4e4b0a5c95f9d4e5a"),
  "name": "Alice",
  "age": 25,
  "gender": "female",
  "hobbies": ["coding", "reading", "swimming"],
  "orders": [
    { "product": "Laptop", "price": 1000 },
    { "product": "Phone", "price": 500 }
  ]
}

Important Aggregation Operators and How They Work.

  1. $match – Filter Documents (Similar to WHERE in SQL)
  • It works like the find() query. This stage selects documents that match the specified conditions.

  • Use it early in the pipeline to reduce the amount of data processed in later stages.

  • $match is used to filter documents based on certain conditions.

      // Query:
      const result = await User.aggregate([
        { $match: { age: { $gte: 18 } } }
      ]);
    
      // Output:
      [
        {
          "_id": "6357f4c4e4b0a5c95f9d4e5a",
          "name": "Alice",
          "age": 25,
          "gender": "female",
          "hobbies": ["coding", "reading", "swimming"],
          "orders": [
            { "product": "Laptop", "price": 1000 },
            { "product": "Phone", "price": 500 }
          ]
        }
      ]
    

    This returns documents where age >= 18.

  1. $project - Select Specific Fields
  • Use $project to shape the output by including or excluding certain fields.

      // Query :
      const result = await User.aggregate([
        { $project: { name: 1, age: 1, totalOrders: { $size: "$orders" } } }
      ]);
    
      // Output :
      [
        {
          "_id": "6357f4c4e4b0a5c95f9d4e5a",
          "name": "Alice",
          "age": 25,
          "totalOrders": 2
        }
      ]
    

    This includes the name, age, and the number of orders.

  1. $unwind - Flatten an Array Field
  • Use $unwind to break down an array field (like orders or hobbies) into multiple documents.

      // Query :
      const result = await User.aggregate([
        { $unwind: "$hobbies" }
      ]);
    
      // Output :
      [
        { "_id": "6357f4c4e4b0a5c95f9d4e5a",
           "name": "Alice",
           "hobbies": "coding"
        },
        { "_id": "6357f4c4e4b0a5c95f9d4e5a",
           "name": "Alice",
           "hobbies": "reading"
        },
        { "_id": "6357f4c4e4b0a5c95f9d4e5a",
           "name": "Alice",
           "hobbies": "swimming"
        }
      ]
    

    Each hobby in the hobbies array becomes a separate document.

  1. $addFields - Add New Fields to the Output
  • Use $addFields to add computed fields to documents.

      // Query :
      const result = await User.aggregate([
        { $addFields: { isAdult: { $gte: ["$age", 18] } } }
      ]);
    
      // Output : 
      [
        {
          "_id": "6357f4c4e4b0a5c95f9d4e5a",
          "name": "Alice",
          "age": 25,
          "isAdult": true
        }
      ]
    

    Some fields were missing from your output after using the $addFields operator. Don't worry, this isn't a mistake but a result of how MongoDB's aggregation pipeline works. By default, each stage only passes along the fields specified or modified in that stage.

    To ensure all original fields are included along with your new field (like isAdult), you can use the $project operator to explicitly list the fields you want to keep.

      const result = await User.aggregate([
        {
          $addFields: {
            isAdult: { $gte: ["$age", 18] } // Adding a new field
          }
        },
        {
          $project: {  // Explicitly include all fields
            name: 1,
            age: 1,
            gender: 1,
            hobbies: 1,
            orders: 1,
            isAdult: 1
          }
        }
      ]);
    
      // Output: 
      [
        {
          "_id": "6357f4c4e4b0a5c95f9d4e5a",
          "name": "Alice",
          "age": 25,
          "gender": "female",
          "hobbies": ["coding", "reading", "swimming"],
          "orders": [
            { "product": "Laptop", "price": 1000 },
            { "product": "Phone", "price": 500 }
          ],
          "isAdult": true
        }
      ]
    
  1. $lookup - Join with Another Collection
  • The $lookup operator in MongoDB's aggregation pipeline allows you to perform joins between collections, similar to SQL joins. It is used to merge data from one collection with another based on a matching field, which helps in retrieving related data from multiple sources.

  • Syntax of $lookup:

      {
        $lookup: {
          from: "<foreignCollection>",
          localField: "<localField>",
          foreignField: "<foreignField>",
          as: "<outputField>"
        }
      };
    
  • from: The name of the foreign collection (the collection you want to join).

  • localField: The field from the current collection used to match with the foreignField.

  • foreignField: The field from the foreign collection to match with the localField.

  • as: The new field's name in the output will contain the matched documents from the foreign collection.

How $lookup Works:

$lookup performs a left outer join:

  • If there is a match, the documents from the foreign collection are embedded as an array.

  • If there is no match, the array will be empty.

Sample products Collection:

[
  { "_id": ObjectId("1"), "productName": "Laptop", "category": "Electronics" },
  { "_id": ObjectId("2"), "productName": "Phone", "category": "Electronics" }
]

Sample users Collection:

{
  "_id": ObjectId("6357f4c4e4b0a5c95f9d4e5a"),
  "name": "Alice",
  "age": 25,
  "gender": "female",
  "hobbies": ["coding", "reading", "swimming"],
  "orders": [
    { "product": "Laptop", "price": 1000 },
    { "product": "Phone", "price": 500 }
  ]
};

Now we Perform $lookup like this :

const result = await User.aggregate([
  {
    $lookup: {
      from: "products",  // Name of the collection to join
      let: { orderProducts: "$orders.product" },  // Define the products from orders array
      pipeline: [
        {
          $match: {
            $expr: { $in: ["$productName", "$$orderProducts"] }  // Match productName with products in orders
          }
        }
      ],
      as: "orderDetails"  // Store the matched products as 'orderDetails'
    }
  },
  {
    $project: {
      name: 1,
      age: 1,
      gender: 1,
      orders: 1,
      orderDetails: 1
    }
  }
]).toArray();

console.log(result);

Stage 1: $lookup

The $lookup stage performs a left outer join to a specified collection to filter documents from the products collection.

  • from: "products":

  • This specifies the name of the collection to join with. In this case, the products collection contains details about various products.

  • let: { orderProducts: "$orders.product" }:

  • This creates a variable orderProducts, which stores the product field from each document’s orders array. This variable is accessible within the $lookup pipeline.

  • pipeline: [...]:

  • This section allows you to define a series of operations to execute on the documents from the products collection. Here, it contains a $match stage.

  • Inside the Pipeline:

  • $match: Filters the documents from the products collection to include only those where the productName matches one of the products in the orderProducts array.

  • $expr: { $in: ["$productName", "$$orderProducts"] }:

  • $in: This operator checks if the productName (from the products collection) is present in the orderProducts array (defined earlier). This effectively allows for filtering based on the order's products.

  • as: "orderDetails":

  • This defines the name of the new field that will contain the matched documents from the products collection. The results of the $lookup will be stored in the orderDetails field of the output documents.

Stage 2: $project

The $project stage specifies which fields to include or exclude in the output documents.

  • $project: {...}:

  • This stage shapes the output document by including specific fields from the input documents.

  • Field specifications:

  • name: 1: Include the name field.

  • age: 1: Include the age field.

  • gender: 1: Include the gender field.

  • orders: 1: Include the orders field.

  • orderDetails: 1: Include the orderDetails field that was populated by the $lookup.

// Output of the Aggregation Query

[
  {
    "_id": ObjectId("6357f4c4e4b0a5c95f9d4e5a"),
    "name": "Alice",
    "age": 25,
    "gender": "female",
    "orders": [
      { "product": "Laptop", "price": 1000 },
      { "product": "Phone", "price": 500 }
    ],
    "orderDetails": [
      { "_id": ObjectId("1"), "productName": "Laptop", "category": "Electronics" },
      { "_id": ObjectId("2"), "productName": "Phone", "category": "Electronics" }
    ]
  }
]
  1. $sort - Sort Documents

    Use $sort to order documents by a field.

// Query :
const result = await User.aggregate([
  { $sort: { age: -1 } } // Sort by age in descending order
]);
// Output :
[
  {
    "_id": "6357f4c4e4b0a5c95f9d4e5a",
    "name": "Alice",
    "age": 25
  }
]

This sorts documents by age in descending order.

  1. $limit - Limit the Number of Results

    Use $limit to restrict the number of results returned.

     // Query: 
     const result = await User.aggregate([
       { $limit: 1 }
     ]);
    
     // Output :
     [
       {
         "_id": "6357f4c4e4b0a5c95f9d4e5a",
         "name": "Alice",
         "age": 25
       }
     ]
    
     // This limits the output to one document.
    
  2. $count - Count Documents

    Use $count to get the total number of documents in the pipeline.

     // Query: 
     const result = await User.aggregate([
       { $match: { age: { $gte: 18 } } },
       { $count: "totalAdults" }
     ]);
    
     // Output: 
     [
       { "totalAdults": 1 }
     ]
    
     // This counts the number of adults in the collection.
    

Aggregation Pipeline Optimization Tips:

  1. Use $match early: Filtering early reduces the data processed in later stages.

  2. Use indexes: Queries that use $match and $sort benefit from appropriate indexes.

  3. Avoid large $lookup joins: Joins with large collections can impact performance.

Summary:

The MongoDB aggregation pipeline is a powerful way to transform and analyze data within MongoDB. With operators like $match, $group, $sort, $unwind, and $lookup, you can perform a wide range of operations on your data efficiently. Learning to use these operators effectively helps you write optimized and meaningful queries for real-world use cases.

0
Subscribe to my newsletter

Read articles from AMIT directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

AMIT
AMIT