NXM 201

1. Mongodb - advance queries [https://students.masaischool.com/assignments/25108/problems/27000/121343]

6 Questions

Checkout the sample database Here [https://pastebin.com/raw/W1NgY8XQ]
Import this json into your mongodb collection.
Write following queries:
1. Find all the countries in region "Asia"

    db['mongodb-advance-queries'].find({
    "region":"Asia"
     });
----------------------------------------------------------------
2. Find all the countries with currency: "EUR"

          db['mongodb-advance-queries'].find({
            "currency":"EUR"
             })
-----------------------------------------------------------------------
3. Find all the countries whose timezone's gmtOffset is 3600
   -> Documentation reference - https://www.mongodb.com/docs/manual/tutorial/query-array-of-documents/

     db['mongodb-advance-queries'].find({
     "timezones.gmtOffset" : {$eq:3600}
      })
-------------------------------------------------------------------
4. Find all the countries whose timezone's gmtOffset is 3600 AND 
 timezone name is Central European Time (or CET)

// 4. Find all the countries whose timezone's gmtOffset is 3600 AND 
//  timezone name is Central European Time (or CET)
db['mongodb-advance-queries'].find({
   timezones: {
    $elemMatch:{
        tzName:"Central European Time",
        gmtOffset:3600
    }
   }
})

Why we should use $eleMatch ? 
To Ensure Both Conditions Are True for the Same Element:
You need to use the $elemMatch operator. This operator ensures 
that both conditions are true within the same element of the array.

Explanation of $elemMatch:
$elemMatch ensures that both tzName and gmtOffset are matched 
within the same element of the timezones array.

WHAT NOT TO DO HERE

You cannot query here using the below query

db['mongodb-advance-queries'].find({
    "timezones.tzName": { $eq: "Central European Time" },
    "timezones.gmtOffset": { $eq: 3600 }
})

because 
No, your query does check both conditions together,
 but it applies them independently across all elements of
 the timezones array. MongoDB's 
query engine treats arrays in a way that requires some clarification:

This query matches documents where:

At least one element in the timezones array has tzName 
equal to "Central European Time".
At least one element in the timezones array has gmtOffset equal to 3600.
These two conditions do not necessarily need to be true in
the same array element. For example, the query could return 
a document where:

One array element has tzName: "Central European Time" but a different 
gmtOffset.
Another array element has gmtOffset: 3600 but a different tzName.

-------------------------------------------------------------------------
5. Find All the countries with more than 1 time zone (hint: size of array)

// 5. Find All the countries with more than 1 time zone (hint: size of array)
db['mongodb-advance-queries'].find({
    $expr: {
        $gt: [{ $size: "$timezones" }, 1]
    }
})
--------------
 Explaination of the above query (5th query)
`db['mongodb-advance-queries']`

This part specifies the database collection you are querying. 
In this case, it is called mongodb-advance-queries.
The db object represents the database connection, and the
 collection is accessed using the bracket notation.

.find({...}):
The find method is used to retrieve documents from the 
specified collection that match the criteria defined 
in the query object.

$expr:
This operator allows you to use aggregation expressions
 within the query. It enables you to perform operations 
on the fields of the documents and evaluate them dynamically.
Without $expr, you would not be able to use aggregation 
operators like $size directly in the query.
$gt:

This is a comparison operator that stands for "greater than." 
It takes two arguments and returns true if the first argument 
is greater than the second.In this case, it is used to compare
 the size of the timezones array to the number 1.

$size:
This operator returns the number of elements in an array. In this query, it is applied to the timezones field of each document.
The expression {$size: "$timezones"} evaluates to the count 
of elements in the timezones array for each document.

[{ $size: "$timezones" }, 1]:
This array contains two elements: the first is the size of
 the timezones array for the current document, and the second is the
 number 1.The $gt operator will check if the size of the 
timezones array is greater than 1.


Summary of the Query's Purpose
The overall purpose of this query is to find and retrieve all
 documents (countries) in the mongodb-advance-queries collection
 that have more than one time zone. It does this by checking if 
the length of the timezones array for each document is greater 
than 1. If it is, that document will be included in the results 
returned by the query.


Example Scenario
For example, if you have a country document like this:

json
{
    "name": "Russia",
    "timezones": [
        {"zoneName": "Europe/Moscow"},
        {"zoneName": "Asia/Vladivostok"},
        {"zoneName": "Asia/Yekaterinburg"}
    ]
}
This document would match the query because the timezones 
array has three elements (zones), which is greater than one.

In contrast, a country document with a single time zone would not match, as its array size would not exceed one.
---------------------------------------
-----------------------------------------------------------------
6. Find All the countries with "Korean" translation (KR)

// 6.Find All the countries with "Korean" translation (KR)
db['mongodb-advance-queries']
.find({
    "translations.kr": {$exists: true}
})


---------------------------------------------------------
one sample data

  {
        "id": 1,
        "name": "Afghanistan",
        "iso3": "AFG",
        "iso2": "AF",
        "numeric_code": "004",
        "phone_code": "93",
        "capital": "Kabul",
        "currency": "AFN",
        "currency_name": "Afghan afghani",
        "currency_symbol": "؋",
        "tld": ".af",
        "native": "افغانستان",
        "region": "Asia",
        "subregion": "Southern Asia",
        "timezones": [
            {
                "zoneName": "Asia\/Kabul",
                "gmtOffset": 16200,
                "gmtOffsetName": "UTC+04:30",
                "abbreviation": "AFT",
                "tzName": "Afghanistan Time"
            }
        ],
        "translations": {
            "kr": "아프가니스탄",
            "pt-BR": "Afeganistão",
            "pt": "Afeganistão",
            "nl": "Afghanistan",
            "hr": "Afganistan",
            "fa": "افغانستان",
            "de": "Afghanistan",
            "es": "Afganistán",
            "fr": "Afghanistan",
            "ja": "アフガニスタン",
            "it": "Afghanistan",
            "cn": "阿富汗",
            "tr": "Afganistan"
        },
        "latitude": "33.00000000",
        "longitude": "65.00000000",
        "emoji": "🇦🇫",
        "emojiU": "U+1F1E6 U+1F1EB"
    }

Here are more datasets:
https://data.world/datasets/json
https://github.com/neelabalan/mongodb-sample-dataset
Bonus:

refer official docs for following operators.
Update the data using:
$set
$rename
$push $pop
$min $max
$unset
Feel free to update any data since this is bonus part

2. Mongo Aggregation-1

1. Mongo Aggregation-1 [5 question] [https://students.masaischool.com/assignments/25109/problems/27030/121344]

Go to https://media.mongodb.org/zips.json and download the large dataset of zips
https://www.mongodb.com/docs/manual/tutorial/aggregation-zip-code-data-set/
sample data 
 {
    "_id": "01001",
    "city": "AGAWAM",
    "loc": [
      -72.622739,
      42.070206
    ],
    "pop": 15338,
    "state": "MA"
  }
Write aggregation queries for following:

1. find largest population city in every state
//1. find largest population city in every state
    db.zips.aggregate([
       {
        $group: {
          _id: "$state",
          maxPop: { $max: "$pop" }
        }
       }
    ]);

-------------------------------------------------------------------
2. find average population per state by cities
  meaning of the question
Let’s break it down further:

The line "find average population per state by cities" means:

Group by state: Each state (state field) will have its own group.
Calculate the average population of the cities in that state: For each state, take the population (pop field) of all its cities and compute the average.
Example:
Data:
json

[
  { "state": "CA", "city": "Los Angeles", "pop": 100000 },
  { "state": "CA", "city": "San Diego", "pop": 50000 },
  { "state": "TX", "city": "Dallas", "pop": 80000 },
  { "state": "TX", "city": "Austin", "pop": 70000 }
]
Goal:
For California (CA), calculate the average of 100000 and 50000 → Result: 75000.
For Texas (TX), calculate the average of 80000 and 70000 → Result: 75000.
So the output will look like:

json
Copy code
[
  { "_id": "CA", "avgPop": 75000 },
  { "_id": "TX", "avgPop": 75000 }
]
The line emphasizes that you are averaging at the city level within each state.


// find average population per state by cities
// answer quer
db.zips.aggregate([
   {
    $group: {
      _id: "$state",
      averagePopulation: { $avg: "$pop" }
    }
   }
]);


---------------------------------------------------
3. find the lowest population city
    db.zips.aggregate([
        {
         $sort:{
          pop:1
         }
        },{
          $limit:1
        }
    ]);


Self Practice:
Learn about $project aggragation stage yourself
Learn about $count stage yourself

2. mongodb_aggregation-II 10 Question

[https://students.masaischool.com/assignments/25109/problems/30013/121345]

Instructions

dataset :link

Write an aggregation pipeline to get the total number of orders for each customer.

 // i wrote this (it works)

 db.orders.aggregate( [ { $unwind : "$orders" },{
   $group: {
     _id: "$name",
     totalQuantity : {$sum: "$orders.quantity"}
   }
 } ] )

 //chatgpt gave me this (didn't understand the nested $sum)
 db.orders.aggregate([
   { 
     $group: { 
       _id: "$name", 
       totalQuantity: { $sum: { $sum: "$orders.quantity" } }
     } 
   }
 ]);

Write an aggregation pipeline to get the total price of all orders for each customer.

  db.orders.aggregate([
   {$unwind: "$orders"},{
     $group: {
       _id: "$name",
       total: {$sum : { $multiply: [ "$orders.price", "$orders.quantity" ] }}
     }
   }
 ])

  // output
 /*
 [
   {
     "_id": "John",
     "total": 3000
   },
   {
     "_id": "Jane",
     "total": 1300
   },
   {
     "_id": "Bob",
     "total": 1000
   }
 ]
 */

Write an aggregation pipeline to get the average price of all orders for each customer.

 // query

 db.orders.aggregate([
   {
     $unwind: "$orders"
   },
   {
     $group: {
       _id: "$name",
       price: {
         $sum: {
           $multiply: [
             "$orders.price",
             "$orders.quantity"
           ]
         }
       },
       total_quantity: {
         $sum: "$orders.quantity"
       }
     }
   },
   {
     "$addFields": {
       average_price_per_user: {
         $divide: [
           "$price",
           "$total_quantity"
         ]
       }
     }
   }
 ])

 // output
 [
   {
     "_id": "Bob",
     "average_price_per_user": 333.3333333333333,
     "price": 1000,
     "total_quantity": 3
   },
   {
     "_id": "John",
     "average_price_per_user": 1500,
     "price": 3000,
     "total_quantity": 2
   },
   {
     "_id": "Jane",
     "average_price_per_user": 433.3333333333333,
     "price": 1300,
     "total_quantity": 3
   }
 ]

 /// updated query to restrict average_price_per_user to 2 decimal place
 db.orders.aggregate([
   {
     $unwind: "$orders"
   },
   {
     $group: {
       _id: "$name",
       price: {
         $sum: {
           $multiply: [
             "$orders.price",
             "$orders.quantity"
           ]
         }
       },
       total_quantity: {
         $sum: "$orders.quantity"
       }
     }
   },
   {
     $addFields: {
       average_price_per_user: {
         $round: [
           { $divide: ["$price", "$total_quantity"] }, 2
         ]
       }
     }
   }
 ]);
 // result
 [
   {
     "_id": "Bob",
     "average_price_per_user": 333.33,
     "price": 1000,
     "total_quantity": 3
   },
   {
     "_id": "John",
     "average_price_per_user": 1500,
     "price": 3000,
     "total_quantity": 2
   },
   {
     "_id": "Jane",
     "average_price_per_user": 433.33,
     "price": 1300,
     "total_quantity": 3
   }
 ]

Write an aggregation pipeline to get the highest price of all orders for each customer.

 //Write an aggregation pipeline to get
 //the highest price of all orders for each customer.
 db.collection.aggregate([
   {
     "$unwind": "$orders"
   },
   {
     "$group": {
       "_id": "$name",
       "maxTotalAmount": {
         "$max": {
           "$multiply": [
             "$orders.price",
             "$orders.quantity"
           ]
         }
       }
     }
   }
 ])

 //output
 [
   {
     "_id": "John",
     "maxTotalAmount": 2000
   },
   {
     "_id": "Jane",
     "maxTotalAmount": 1000
   },
   {
     "_id": "Bob",
     "maxTotalAmount": 800
   }
 ]

Write an aggregation pipeline to get the lowest price of all orders for each customer.

 //Write an aggregation pipeline to get the
 //lowest price of all orders for each customer.
 db.collection.aggregate([
   {
     "$unwind": "$orders"
   },
   {
     "$group": {
       "_id": "$name",
       "minOrderAmount": {
         "$min": {
           "$multiply": [
             "$orders.price",
             "$orders.quantity"
           ]
         }
       }
     }
   }
 ])

 // output
 [
   {
     "_id": "John",
     "minOrderAmount": 1000
   },
   {
     "_id": "Jane",
     "minOrderAmount": 300
   },
   {
     "_id": "Bob",
     "minOrderAmount": 200
   }
 ]

Write an aggregation pipeline to get the total number of orders and the total price of all orders for each customer.

 //Write an aggregation pipeline to get the
 // total number of orders and the total price of all orders for each customer.
 db.collection.aggregate([
   {
     "$unwind": "$orders"
   },
   {
     "$group": {
       "_id": "$name",
       "total_no_of_orders": {
         "$sum": 1 //Each unwinded document represents one order
       },
       "total_price_of_orders": {
         "$sum": {
           "$multiply": [
             "$orders.price",
             "$orders.quantity"
           ]
         }
       }
     }
   }
 ])

 // output
 [
   {
     "_id": "Bob",
     "total_no_of_orders": 2,
     "total_price_of_orders": 1000
   },
   {
     "_id": "Jane",
     "total_no_of_orders": 2,
     "total_price_of_orders": 1300
   },
   {
     "_id": "John",
     "total_no_of_orders": 2,
     "total_price_of_orders": 3000
   }
 ]

Write an aggregation pipeline to get the total number of orders and the average price of all orders for each customer.

 //Write an aggregation pipeline to get the
 //total number of orders and the average price of all orders for each customer.
 db.collection.aggregate([
   {
     "$unwind": "$orders"
   },
   {
     "$group": {
       "_id": "$name",
       "total_no_of_orders": {
         "$sum": 1
       },
       "total_order_price": {
         "$sum": {
           "$multiply": [
             "$orders.price",
             "$orders.quantity"
           ]
         }
       }
     }
   },
   {
     "$addFields": {
       "average_price_for_each_customer": {
         "$divide": [
           "$total_order_price",
           "$total_no_of_orders"
         ]
       }
     }
   }
 ])
 // output
 [
   {
     "_id": "John",
     "average_price_for_each_customer": 1500,
     "total_no_of_orders": 2,
     "total_order_price": 3000
   },
   {
     "_id": "Jane",
     "average_price_for_each_customer": 650,
     "total_no_of_orders": 2,
     "total_order_price": 1300
   },
   {
     "_id": "Bob",
     "average_price_for_each_customer": 500,
     "total_no_of_orders": 2,
     "total_order_price": 1000
   }
 ]

Write an aggregation pipeline to get the highest price and the lowest price of all orders for each customer.

 //Write an aggregation pipeline to get
 //the highest price and the lowest price of all orders for each customer.
 db.collection.aggregate([
   {
     "$unwind": "$orders"
   },
   {
     "$group": {
       "_id": "$name",
       "highest_price_of_order": {
         "$max": "$orders.price"
       },
       "lowest_price_of_order": {
         "$min": "$orders.price"
       }
     }
   }
 ])

 // Output

 [
   {
     "_id": "John",
     "highest_price_of_order": 2000,
     "lowest_price_of_order": 1000
   },
   {
     "_id": "Jane",
     "highest_price_of_order": 500,
     "lowest_price_of_order": 300
   },
   {
     "_id": "Bob",
     "highest_price_of_order": 800,
     "lowest_price_of_order": 100
   }
 ]

Write an aggregation pipeline to get the total number of orders and the highest price of all orders for each customer.

 //Write an aggregation pipeline to get
 //the total number of orders and the highest price of all orders for each customer.
 db.collection.aggregate([
   {
     "$unwind": "$orders"
   },
   {
     "$group": {
       "_id": "$name",
       "total_number_of_orders": {
         "$sum": 1
       },
       "highest_price_of_all_orders": {
         "$max": {
           "$multiply": [
             "$orders.price",
             "$orders.quantity"
           ]
         }
       }
     }
   }
 ])

 // output
 [
   {
     "_id": "Bob",
     "highest_price_of_all_orders": 800,
     "total_number_of_orders": 2
   },
   {
     "_id": "Jane",
     "highest_price_of_all_orders": 1000,
     "total_number_of_orders": 2
   },
   {
     "_id": "John",
     "highest_price_of_all_orders": 2000,
     "total_number_of_orders": 2
   }
 ]

Write an aggregation pipeline to get the total number of orders, the total price, and the average price of all orders for each customer.

//Write an aggregation pipeline to get the
//total number of orders, the total price, and the average price of all orders for each customer
db.collection.aggregate([
  {
    "$unwind": "$orders"
  },
  {
    "$group": {
      "_id": "$name",
      "total_number_of_orders": {
        "$sum": 1
      },
      "total_price_of_all_orders": {
        "$sum": {
          "$multiply": [
            "$orders.price",
            "$orders.quantity"
          ]
        }
      }
    }
  },
  {
    "$addFields": {
      "average_price_of_all_orders": {
        "$divide": [
          "$total_price_of_all_orders",
          "$total_number_of_orders"
        ]
      }
    }
  }
])

// OUTPUT
[
  {
    "_id": "Bob",
    "average_price_of_all_orders": 500,
    "total_number_of_orders": 2,
    "total_price_of_all_orders": 1000
  },
  {
    "_id": "John",
    "average_price_of_all_orders": 1500,
    "total_number_of_orders": 2,
    "total_price_of_all_orders": 3000
  },
  {
    "_id": "Jane",
    "average_price_of_all_orders": 650,
    "total_number_of_orders": 2,
    "total_price_of_all_orders": 1300
  }
]

For each of the above questions, provide the MongoDB Aggregation Pipeline query that solves the problem.

3. Aggregations - II

1. Aggregation-Multiple-lookup-stages 6 Question

[https://students.masaischool.com/assignments/25110/problems/29014/121346]

Instructions

Create a sample dataset in a MongoDB collection that contains the following documents:

// users
{
   _id: 1,
   name: "John",
   age: 25,
   address: "123 Main St"
},
{
   _id: 2,
   name: "Jane",
   age: 35,
   address: "456 Main St"
},
{
   _id: 3,
   name: "Bob",
   age: 45,
   address: "789 Main St"
},
{
   _id: 4,
   name: "Alice",
   age: 55,
   address: "246 Main St"
}

Create a separate collection that contains information about the zip codes for each address. The structure of the documents in this collection should be as follows:

// zips
{
   _id: 1,
   address: "123 Main St",
   zipcode: "12345"
},
{
   _id: 2,
   address: "456 Main St",
   zipcode: "54321"
},
{
   _id: 3,
   address: "789 Main St",
   zipcode: "98765"
},
{
   _id: 4,
   address: "246 Main St",
   zipcode: "24680"
}

Create a third collection that contains information about the city and state for each zip code. The structure of the documents in this collection should be as follows:

// zipcodeinfo
{
   _id: 1,
   zipcode: "12345",
   city: "New York",
   state: "NY"
},
{
   _id: 2,
   zipcode: "54321",
   city: "Chicago",
   state: "IL"
},
{
   _id: 3,
   zipcode: "98765",
   city: "Los Angeles",
   state: "CA"
},
{
   _id: 4,
   zipcode: "24680",
   city: "Miami",
   state: "FL"
}

Create a fourth collection that contains information about the country for each state. The structure of the documents in this collection should be as follows:

// stateinfo
{
   _id: 1,
   state: "NY",
   country: "USA"
},
{
   _id: 2,
   state: "IL",
   country: "USA"
},
{
   _id: 3,
   state: "CA",
   country: "USA"
},
{
   _id: 4,
   state: "FL",
   country: "USA"
},
{
   _id: 5,
   state: "MH",
   country: "India"
},
{
   _id: 6,
   state: "DL",
   country: "India"
}

Use the $lookup operator to perform a join on the first and second collections, so that the resulting documents contain both the personal information from the first collection and the zip code information from the second collection.

 // QUERY
 db.users.aggregate([
   {$lookup: {
     from: "zips",
     localField: "address",
     foreignField: "address",
     as: "zips"
   }}
 ])

 // output
 [
   {
     "_id": 1,
     "name": "John",
     "age": 25,
     "address": "123 Main St",
     "zips": [
       {
         "_id": 1,
         "address": "123 Main St",
         "zipcode": "12345"
       }
     ]
   },
   {
     "_id": 2,
     "name": "Jane",
     "age": 35,
     "address": "456 Main St",
     "zips": [
       {
         "_id": 2,
         "address": "456 Main St",
         "zipcode": "54321"
       }
     ]
   },
   {
     "_id": 3,
     "name": "Bob",
     "age": 45,
     "address": "789 Main St",
     "zips": [
       {
         "_id": 3,
         "address": "789 Main St",
         "zipcode": "98765"
       }
     ]
   },
   {
     "_id": 4,
     "name": "Alice",
     "age": 55,
     "address": "246 Main St",
     "zips": [
       {
         "_id": 4,
         "address": "246 Main St",
         "zipcode": "24680"
       }
     ]
   }
 ]

Chain another $lookup again and again, so that the resulting document contains the personal info from the first collection, zip code info from the second collection, state from third collection and country from the fourth collection

//QUERY
db.users.aggregate([
  {$lookup: {
    from: "zips",
    localField: "address",
    foreignField: "address",
    as: "zips"
  }},{
    $lookup: {
      from: "zipcodeinfo",
      localField: "zips.zipcode",
      foreignField: "zipcode",
      as: "zipCodeInfo"

    }
  },{
    $lookup: {
      from: "stateinfo",
      localField: "zipCodeInfo.state",
      foreignField: "state",
      as: "stateAndCountryInfo"
    }
  }
])

// OUTPUT
[
  {
    "_id": 1,
    "name": "John",
    "age": 25,
    "address": "123 Main St",
    "zips": [
      {
        "_id": 1,
        "address": "123 Main St",
        "zipcode": "12345"
      }
    ],
    "zipCodeInfo": [
      {
        "_id": 1,
        "zipcode": "12345",
        "city": "New York",
        "state": "NY"
      }
    ],
    "stateAndCountryInfo": [
      {
        "_id": 1,
        "state": "NY",
        "country": "USA"
      }
    ]
  },
  {
    "_id": 2,
    "name": "Jane",
    "age": 35,
    "address": "456 Main St",
    "zips": [
      {
        "_id": 2,
        "address": "456 Main St",
        "zipcode": "54321"
      }
    ],
    "zipCodeInfo": [
      {
        "_id": 2,
        "zipcode": "54321",
        "city": "Chicago",
        "state": "IL"
      }
    ],
    "stateAndCountryInfo": [
      {
        "_id": 2,
        "state": "IL",
        "country": "USA"
      }
    ]
  },
  {
    "_id": 4,
    "name": "Alice",
    "age": 55,
    "address": "246 Main St",
    "zips": [
      {
        "_id": 4,
        "address": "246 Main St",
        "zipcode": "24680"
      }
    ],
    "zipCodeInfo": [
      {
        "_id": 4,
        "zipcode": "24680",
        "city": "Miami",
        "state": "FL"
      }
    ],
    "stateAndCountryInfo": [
      {
        "_id": 4,
        "state": "FL",
        "country": "USA"
      }
    ]
  },
  {
    "_id": 3,
    "name": "Bob",
    "age": 45,
    "address": "789 Main St",
    "zips": [
      {
        "_id": 3,
        "address": "789 Main St",
        "zipcode": "98765"
      }
    ],
    "zipCodeInfo": [
      {
        "_id": 3,
        "zipcode": "98765",
        "city": "Los Angeles",
        "state": "CA"
      }
    ],
    "stateAndCountryInfo": [
      {
        "_id": 3,
        "state": "CA",
        "country": "USA"
      }
    ]
  }
]

2. Aggregation sale transactions 2 Question [https://students.masaischool.com/assignments/25110/problems/29015/121347]

3. Aggregation Graph Lookup 1 Question

[https://students.masaischool.com/assignments/25110/problems/30037/121348]

4. Aggregation aggregation_population 6 Question

[https://students.masaischool.com/assignments/25110/problems/30034/121349]

Instructions

Dataset : link

Assume you are working on a project that requires you to analyze the population of cities in Massachusetts. Using the given dataset, write a MongoDB aggregation query that performs the following operations:

Filter out the cities with a population less than or equal to 5000.

 db.zips.aggregate([
   {
     $match: {
       pop: { $gte: 5000 }
     }
   }
 ]);

 // sample output
 [
   {
     "_id": "01001",
     "city": "AGAWAM",
     "loc": [
       -72.622739,
       42.070206
     ],
     "pop": 15338,
     "state": "MA"
   },
   {
     "_id": "01002",
     "city": "CUSHMAN",
     "loc": [
       -72.51565,
       42.377017
     ],
     "pop": 36963,
     "state": "MA"
   },
   {
     "_id": "01007",
     "city": "BELCHERTOWN",
     "loc": [
       -72.410953,
       42.275103
     ],
     "pop": 10579,
     "state": "MA"
   }]

Group the remaining cities by state and calculate the total population for each state.

 db.zips.aggregate([
   {
     $match: {
       pop: { $gte: 5000 }
     }
   },{
     $group: {
       _id: "$state",
       totalPopulation: {
         $sum: "$pop"
       },

     }
   }
 ]);

 // output
 [
   {
     "_id": "NH",
     "totalPopulation": 817338
   },
   {
     "_id": "MD",
     "totalPopulation": 4395935
   },
   {
     "_id": "VT",
     "totalPopulation": 292677
   },
   {
     "_id": "OH",
     "totalPopulation": 9723043
   },
   {
     "_id": "SD",
     "totalPopulation": 371091
   },..........]

Sort the resulting documents by the total population in descending order.

 db.zips.aggregate([
   {
     $match: {
       pop: { $gte: 5000 }
     }
   },{
     $group: {
       _id: "$state",
       totalPopulation: {
         $sum: "$pop"
       },

     }
   },{
     $sort: {
       totalPopulation: -1

     }
   }
 ]);

 // OUTPUT

 [
   {
     "_id": "CA",
     "totalPopulation": 29009632
   },
   {
     "_id": "NY",
     "totalPopulation": 16462946
   },
   {
     "_id": "TX",
     "totalPopulation": 15415031
   },
   {
     "_id": "FL",
     "totalPopulation": 12251762
   },
   {
     "_id": "PA",
     "totalPopulation": 10225244
   },......]

Skip the first 2 documents in the sorted list.

 db.zips.aggregate([
   {
     $match: {
       pop: { $gte: 5000 }
     }
   },{
     $group: {
       _id: "$state",
       totalPopulation: {
         $sum: "$pop"
       },

     }
   },{
     $sort: {
       totalPopulation: -1

     }
   },{
     $skip: 2
   }
 ]);

 // OUTPUT
 [
   {
     "_id": "CA",
     "totalPopulation": 29009632
   },
   {
     "_id": "NY",
     "totalPopulation": 16462946
   },
   {
     "_id": "TX",
     "totalPopulation": 15415031
   },
   {
     "_id": "FL",
     "totalPopulation": 12251762
   },
   {
     "_id": "PA",
     "totalPopulation": 10225244
   },
   {
     "_id": "IL",
     "totalPopulation": 10139378
   },...]

Limit the output to 2 documents.

 db.zips.aggregate([
   {
     $match: {
       pop: { $gte: 5000 }
     }
   },{
     $group: {
       _id: "$state",
       totalPopulation: {
         $sum: "$pop"
       },

     }
   },{
     $sort: {
       totalPopulation: -1

     }
   },{
     $limit: 2
   }
 ]);
 // OUTPUT
 [
   {
     "_id": "CA",
     "totalPopulation": 29009632
   },
   {
     "_id": "NY",
     "totalPopulation": 16462946
   }
 ]

Project the output to include only the state and total population fields.

 db.zips.aggregate([
   {
     $match: {
       pop: { $gte: 5000 }
     }
   },{
     $group: {
       _id: "$state",
       totalPopulation: {
         $sum: "$pop"
       },

     }
   },{
     $sort: {
       totalPopulation: -1

     }
   },{
     $limit: 2
   }, {
     $project: {
       _id: 1, // Include the state field (_id holds the state)
       totalPopulation: 1 // Include the totalPopulation field
     }
   }
 ]);
 // OUTPUT
 [
   {
     "_id": "CA",
     "totalPopulation": 29009632
   },
   {
     "_id": "NY",
     "totalPopulation": 16462946
   }
 ]

Write the MongoDB aggregation query that accomplishes the above operations.

4 MongoDB Indexing

1. MongoDB Indexing 8 Questions

[https://students.masaischool.com/assignments/25313/problems/29070/121532]

Create a new collection.

Insert at least 10 documents into the collection, with each document containing at least 3 fields (e.g. name, age, location).

  db.location.insertMany([
    { name: "Alice", age: 25, location: "New York" },
    { name: "Bob", age: 30, location: "Los Angeles" },
    { name: "Charlie", age: 28, location: "Chicago" },
    { name: "Diana", age: 32, location: "Houston" },
    { name: "Ethan", age: 29, location: "San Francisco" },
    { name: "Fiona", age: 27, location: "Seattle" },
    { name: "George", age: 35, location: "Miami" },
    { name: "Hannah", age: 26, location: "Boston" },
    { name: "Ian", age: 31, location: "Atlanta" },
    { name: "Jasmine", age: 33, location: "Dallas" }
  ]);

Create a compound index on the collection that includes at least 2 fields (e.g. age and location).
```
  age:1, location:1
```
Use the explain() function to analyze the performance of a query that uses the compound index.
Create a single field index on a field that is not included in the compound index.
Use the explain() function to compare the performance of a query using the single field index to the performance of a query using the compound index.
Drop the single field index and create a multi-key index on an array field in one of the documents.

```json "Create a multi-key index on an array field in one of the documents":

A multi-key index is an index created on fields that store arrays. MongoDB uses a multi-key index to index each element in the array, which allows efficient queries on array contents.

For example, if one document contains a field tags with an array value:

{ name: "Alice", tags: ["developer", "teacher", "writer"] }

You can create a multi-key index on the tags field:

db.collection.createIndex({ tags: 1 });

This index allows queries like: db.collection.find({ tags: "developer" });


* Use the explain() function to analyze the performance of a query that uses the multi-key index.


2\. Aggregation\_date\_range 7 Questions

\[[https://students.masaischool.com/assignments/25313/problems/30245/121533](https://students.masaischool.com/assignments/25313/problems/30245/121533)\]

---

5\. Timers and Internals

1.aggregation\_advanceQueries-II - 5 Questions

\[[https://students.masaischool.com/assignments/25817/problems/30639/122146](https://students.masaischool.com/assignments/25817/problems/30639/122146)\]

### **Timers and Internals**

**assignmentInstructions**

Note : Write the schemas for all the queries and solve them on VSCODE.

1. Write an aggregation query that calculates the average number of items ordered per customer, broken down by country and year. Only include orders from the years 2019-2021.


```json
// SAMPLE DATA
{
  "_id": ObjectId("..."),
  "customer_id": ObjectId("..."),
  "items": [
    { "product_id": ObjectId("..."), "quantity": 2, "price": 10.00 },
    { "product_id": ObjectId("..."), "quantity": 1, "price": 20.00 }
  ],
  "timestamp": ISODate("2021-05-01T10:00:00Z"),
  "country": "USA"
}

// SOLUTION

Write an aggregation query that calculates the total revenue for each product category, broken down by quarter. Include the product category, quarter, and total revenue in the output.

{
  "_id": ObjectId("..."),
  "product_id": ObjectId("..."),
  "customer_id": ObjectId("..."),
  "quantity": 5,
  "price": 100.00,
  "transaction_date": ISODate("2022-02-15T10:00:00Z"),
  "category": "Electronics"
}

Write an aggregation query that calculates the average salary for each department, broken down by gender. Only include employees who were hired in the last 5 years.

{
  "_id": ObjectId("..."),
  "name": "John Doe",
  "department": "Marketing",
  "gender": "Male",
  "salary": 75000,
  "hire_date": ISODate("2020-01-01T00:00:00Z"),
  "termination_date": ISODate("2022-03-15T00:00:00Z")
}

Write an aggregation query that calculates the total value of inventory for each product, broken down by warehouse and year. Only include inventory items that have a quantity greater than 0.

{
  "_id": ObjectId("..."),
  "product_id": ObjectId("..."),
  "warehouse_id": ObjectId("..."),
  "quantity": 100,
  "unit_price": 5.00,
  "timestamp": ISODate("2022-03-01T00:00:00Z")
}

Write an aggregation query that calculates the average response time of each API endpoint, broken down by hour of the day and day of the week. Only include logs from the last 30 days.

{
  "_id": ObjectId("..."),
  "endpoint": "/api/users",
  "response_time": 200,
  "timestamp": ISODate("2022-04-20T10:00:00Z")
}

6. CI/CD assignment - 4 Question

[https://students.masaischool.com/assignments/26136?tab=assignmentDetails]

NXM 301

7.SQL Assignment Day 2 [https://students.masaischool.com/assignments/25056?tab=assignmentDetails]

1. BANK Problem -> 20 Problems

Insert the data for the following

Retrieve all the records from the table BORROW where the amount in between 2000 and 3000
Retrieve the details from the table DEPOSIT
Retrieve the customer name, account no of DEPOSIT
Retrieve the name of the customer living in NAGPUR.
Retrieve the name of the customers who opened accounts after 17-NOV-95.
Retrieve the account number and amount of the customer having an account opened between 01-12-95 and 01-06.96.
Retrieve all the records from the table DEPOSIT where CNAME begins with C.
Retrieve all the records from the table BORROW where the 2nd character of CNAME is U.
Retrieve all the records from the table DEPOSIT where the branch name is CHANDNI or MG ROAD.
Retrieve all the records from the table DEPOSIT where the branch name is not CHANDNI or MG ROAD.
Retrieve all the records from DEPOSIT where the amount > 1000 and arrange the customer name in ascending order.
Retrieve all the records from DEPOSIT where the amount > 1000 and arrange the customer name in descending order.
Retrieve customer details from the BORROW table where the third character of the customer name is either ‘A’ or ‘D’.
Retrieve all the records from the table BORROW where the amount is not between 2000 and 8000.
Find out the unique records from the table DEPOSIT
Update the amount of all depositors in the deposit table by giving them 10% interest (i.e. amount=amount * 0.1) where the branch is VRCE.
Update the amount of all depositors in the deposit table by giving them 10% interest where the branch is VRCE and the customer name is ANIL.