How to Find the Missing Number in an Array

In an array containing numbers from 1 to N, where one number is missing, the goal is to find the missing number. Here, we'll discuss four different methods to achieve this, ranging from brute force to optimal approaches.

This approach involves checking for each number from 1 to N whether it exists in the array.

Implementation:

// Solution-1: Brute Force Approach (using Nested Loop & Linear Search)
// Time Complexity: O(n*n)
// Space Complexity: O(1)
int findMissingNumber(vector<int> &arr, int n)
{
    // Outer loop that runs from 1 to N
    for (int i = 1; i <= n; i++)
    {
        // `flag` variable to check if an element exists
        bool flag = false;

        // Search the element using Linear Search
        for (int j = 0; j < n - 1; j++)
        {
            if (arr[j] == i)
            {
                flag = true;
                break;
            }
        }

        // Check if the element is missing
        if (!flag)
        {
            return i;
        }
    }

    // If loop completes without finding a missing number (shouldn't happen)
    // It is just to avoid warnings.
    return -1;
}

Logic:

  1. Outer Loop: Iterate through numbers from 1 to N.

  2. Linear Search: For each number, check if it exists in the array using a nested loop.

  3. Flag Check: Use a flag to indicate if the number was found. If not, return the number as it is the missing one.

Time Complexity: O(n²)

  • Explanation: The outer loop runs from 1 to N and for each iteration of the outer loop, the inner loop runs up to N-1. This results in a time complexity of O(n * (n-1)), which simplifies to O(n²).

Space Complexity: O(1)

  • Explanation: Only a few additional variables (like flag) are used, which do not depend on the size of the input array.

Example:

  • Input: arr = [1, 2, 4, 6, 3, 7, 8], n = 8

  • Output: 5

  • Explanation: The number 5 is missing from the array.


Solution 2: Better Approach (Using Hashing)

This approach uses a hash table to track the presence of numbers in the array.

Implementation:

// Solution-2: Better Approach (using Hashing)
// Time Complexity: O(n) + O(n) ~ O(2n)
// Space Complexity: O(n)
int findMissingNumber(vector<int> &arr, int n)
{
    // Hash table to store the presence of numbers
    vector<int> hash(n + 1, 0);

    // Mark the presence of elements in the hash table
    for (int i = 0; i < n - 1; i++)
    {
        hash[arr[i]] = 1;
    }

    // Find the missing number by iterating through the hash table
    for (int i = 1; i <= n; i++)
    {
        if (hash[i] == 0)
        {
            return i;
        }
    }

    // If loop completes without finding a missing number (shouldn't happen)
    // It is just to avoid warnings.
    return -1;
}

Logic:

  1. Initialize Hash Table: Create a hash table to track numbers from 1 to N.

  2. Mark Presence: Iterate through the array, marking the presence of each number in the hash table.

  3. Find Missing Number: Iterate through the hash table to find the number that is not marked.

Time Complexity: O(n)

  • Explanation: The first loop runs through the array of size N-1 to populate the hash table and the second loop runs through the hash table of size N to find the missing number. Thus, the overall time complexity is O(n) + O(n-1), which simplifies to O(n).

Space Complexity: O(n)

  • Explanation: An additional hash table of size N is used to keep track of the presence of each number.

Example:

  • Input: arr = [1, 2, 4, 6, 3, 7, 8], n = 8

  • Output: 5

  • Explanation: The number 5 is not present in the hash table.


Solution 3: Optimal Approach 1 (Summation Approach)

This approach uses the sum formula for the first N natural numbers to find the missing number.

Implementation:

// Solution-3: Optimal Approach 1 (Summation Approach)
// Time Complexity: O(n)
// Space Complexity: O(1)
int findMissingNumber(vector<int> &arr, int n)
{
    // Sum of all numbers from 1 to N
    int sumOfNNums = (n * (n + 1)) / 2;

    int sumOfArr = 0;

    // Calculate the actual sum of elements in the array
    for (int i = 0; i < n - 1; i++)
    {
        sumOfArr += arr[i];
    }

    int missingNum = sumOfNNums - sumOfArr;

    return missingNum;
}

Logic:

  1. Sum of N Numbers: Calculate the sum of numbers from 1 to N using the formula sum = (n * (n + 1)) / 2.

  2. Sum of Array: Calculate the sum of elements in the array.

  3. Find Missing Number: Subtract the sum of the array from the sum of N numbers to get the missing number.

Time Complexity: O(n)

  • Explanation: The loop runs through the array once to calculate the sum of the elements, resulting in a time complexity of O(n).

Space Complexity: O(1)

  • Explanation: Only a few additional variables (like sumOfNNums and sumOfArr) are used, which do not depend on the size of the input array.

Example:

  • Input: arr = [1, 2, 4, 6, 3, 7, 8], n = 8

  • Output: 5

  • Explanation: The sum of numbers from 1 to 8 is 36 and the sum of the array is 31. Thus, the missing number is 36 - 31 = 5.


Solution 4: Optimal Approach 2 (XOR Approach)

This approach uses the XOR operation to find the missing number efficiently.

Implementation:

// Solution-4: Optimal Approach 2 (XOR Approach)
// Time Complexity: O(n)
// Space Complexity: O(1)
int findMissingNumber(vector<int> &arr, int n)
{
    int xor1 = 0;
    int xor2 = 0;

    for (int i = 0; i < n - 1; i++)
    {
        // XOR of array elements
        xor2 ^= arr[i];

        // XOR up to [1...N-1]
        xor1 ^= (i + 1);
    }

    // XOR up to [1...N]
    xor1 ^= n;

    // The missing number
    return xor1 ^ xor2;
}

Logic:

  1. XOR Array Elements: Compute the XOR of all elements in the array.

  2. XOR Up to N: Compute the XOR of all numbers from 1 to N.

  3. Find Missing Number: XOR the results from steps 1 and 2. The result will be the missing number.

Time Complexity: O(n)

  • Explanation: The first loop runs through the array of size N-1 to calculate the XOR of the elements and the second loop runs up to N to calculate the XOR of the range from 1 to N. Thus, the overall time complexity is O(n-1) + O(n), which simplifies to O(n).

Space Complexity: O(1)

  • Explanation: Only a few additional variables (like xor1 and xor2) are used, which do not depend on the size of the input array.

Example:

  • Input: arr = [1, 2, 4, 6, 3, 7, 8], n = 8

  • Output: 5

  • Explanation: XOR of all array elements and XOR of numbers from 1 to 8 will result in the missing number, which is 5.


Comparison

  • Brute Force Approach:

    • Pros: Simple and easy to understand.

    • Cons: Inefficient with O(n²) time complexity.

  • Hashing Approach:

    • Pros: More efficient with O(n) time complexity.

    • Cons: Uses extra space for the hash table.

  • Summation Approach:

    • Pros: Very efficient with O(n) time complexity and O(1) space complexity.

    • Cons: Might face integer overflow for very large values of N.

  • XOR Approach:

    • Pros: Efficient with O(n) time complexity and O(1) space complexity.

    • Cons: Slightly more complex to understand compared to summation approach.

Edge Cases

  • Empty Array: If the input array is empty, the missing number should be 1.

  • Single Element Missing: When the array contains only one element, either 1 (missing 2) or 2 (missing 1), handle by checking and returning the missing number.

  • All Elements Present: If the array contains all elements from 1 to N without any missing number (should not happen if constraints guarantee a missing number), return a sentinel value like -1.

  • Duplicated Elements: This scenario is not considered because the problem constraints specify that all elements are distinct.

Additional Notes

  • Integer Overflow: For the summation approach, consider the risk of integer overflow when calculating the sum of the first N natural numbers. In practical implementations, ensure the language or environment can handle large integer values.

  • Data Types: Choose appropriate data types to handle the range of input values, especially for large N.

  • Performance Considerations: While the brute force approach is easy to implement, it is not suitable for large arrays due to its O(n²) time complexity. The hashing and XOR approaches are preferable for large datasets.

  • In-place Modifications: For space efficiency, the XOR approach modifies the input array in place. Ensure that this behavior is acceptable in the context of the problem or use case.

  • Understanding XOR: The XOR approach leverages the property that a ^ a = 0 and a ^ 0 = a. It effectively cancels out duplicate elements and isolates the missing number. This method is both efficient and elegant but requires understanding of bitwise operations.

Conclusion

Finding the missing number in an array of size N-1 can be achieved using various methods, from simple brute force to optimal XOR-based solutions. Depending on the constraints and requirements, one can choose the most suitable approach.


0
Subscribe to my newsletter

Read articles from Mahbub Alam Masum directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Mahbub Alam Masum
Mahbub Alam Masum