Drowsiness Detection using OpenCV & Computer Vision

Introduction
Imagine you're driving late at night on a quiet highway, feeling exhausted after a long day. Your eyelids grow heavier, and before you know it, your focus starts to waver. Drowsy driving is one of the leading causes of road accidents globally, responsible for thousands of injuries and fatalities every year. In fact, studies suggest that drowsy driving is just as dangerous as drunk driving, yet it often goes unnoticed until it's too late.
This alarming problem inspired me to create a real-time drowsiness detection system, designed to alert drivers before they lose focus. By combining the power of machine learning and computer vision, I developed a system that uses live webcam feeds to monitor a driver’s eye movements and detect signs of fatigue in real time.
In this blog, I’ll take you through the entire process of building this project—from understanding the problem and selecting the right technology to implementing the solution and overcoming challenges. Whether you're interested in machine learning, computer vision, or real-world problem-solving, this blog will give you valuable insights into creating impactful AI applications.
Problem Statement
Drowsy driving is a critical issue that endangers not only the driver but also passengers and other road users. According to studies, driver fatigue accounts for a significant percentage of road accidents globally, with devastating consequences. Unlike drunk driving, which can be tested with a breathalyzer, drowsiness is harder to detect, making it a silent but equally dangerous threat.
Current methods for combating drowsy driving include monitoring head nods or lane deviations, but these can be imprecise or slow to respond. This inspired me to explore a more proactive and accurate solution—detecting drowsiness directly through facial features, specifically the eyes.
The goal of this project is simple yet powerful: to develop a real-time drowsiness detection system that leverages machine learning and computer vision to detect signs of fatigue early and issue timely alerts. Such a system could be deployed in vehicles, workplaces, or other settings where fatigue could lead to dangerous outcomes.
Overview of the Approach
To tackle the problem of drowsiness detection, I divided the project into three main stages:
Data Preparation:
The foundation of any machine learning project is a well-prepared dataset. For this project, I used an existing dataset containing labeled images of open and closed eyes. To ensure the model could handle various real-world scenarios, I applied data augmentation techniques such as flipping, rotation, and scaling to enhance the diversity of the training data.
Model Selection and Training:
For detecting drowsiness, I chose MobileNet, a lightweight convolutional neural network (CNN) optimized for real-time applications. MobileNet's architecture is efficient yet powerful, making it ideal for systems with limited resources. The model was trained to classify eye states (open or closed) with high accuracy.
Real-Time Integration:
To make the system functional in real-world scenarios, I integrated the trained model with OpenCV, a popular computer vision library. OpenCV processes live webcam feeds, detects facial landmarks, and identifies the eyes. The model continuously monitors the driver's eye state, triggering an alert if drowsiness is detected.
By combining these components, I built a robust pipeline capable of real-time drowsiness detection. Below, I’ll dive deeper into each stage, starting with the dataset and preprocessing.
Dataset and Preprocessing
For this project, I used the MRL Eye Dataset, a publicly available collection of human eye images categorized as either open or closed, captured under varying conditions. It includes data from 37 different individuals, ensuring diversity in eye shapes, sizes, and facial features. The dataset is freely available online.
Data Organization
To streamline the training process, I manually separated the dataset into two folders:
One containing images of open eyes.
The other containing images of closed eyes.
This manual categorization helped ensure the dataset was well-structured for training the machine learning model. It also made it easier to balance the classes and apply augmentation techniques effectively.
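To make the expected layout concrete, this is the folder structure the loading code later in the walkthrough assumes (dataset_path and the file names are placeholders):

dataset_path/
    Closed/            # images of closed eyes
        image_001.png
        ...
    Open/              # images of open eyes
        image_001.png
        ...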
Key Steps in Preprocessing:
Resizing and Normalization:
All images were resized to 224×224 pixels to maintain uniformity. Pixel values were normalized to a range of 0 to 1 to improve the model's convergence during training.
Data Augmentation:
To improve the model's ability to generalize to real-world scenarios, I applied augmentation techniques such as (a code sketch follows this list):
Horizontal flipping.
Random rotation and zooming.
Adjustments to brightness and contrast.
Balancing the Dataset:
After organizing the dataset into separate folders, I ensured there was an equal number of images for both open and closed eyes to prevent class imbalance issues.
Splitting Data:
The organized dataset was split into training, validation, and test sets in a 70-20-10 ratio. This ensured a fair evaluation of the model's performance on unseen data.
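As a concrete illustration of the augmentation step, here is a minimal sketch using Keras's ImageDataGenerator. The parameter values are assumptions for illustration, not the exact settings used in this project:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation pipeline mirroring the techniques listed above.
# All parameter values are illustrative; tune them for your data.
augmenter = ImageDataGenerator(
    rescale=1.0 / 255,            # Normalize pixel values to [0, 1]
    horizontal_flip=True,         # Horizontal flipping
    rotation_range=15,            # Random rotation, in degrees
    zoom_range=0.1,               # Random zoom
    brightness_range=(0.7, 1.3),  # Brightness adjustment
)

# Stream augmented batches straight from the two class folders
# ('dataset_path' is a placeholder for your actual dataset location).
train_gen = augmenter.flow_from_directory(
    'dataset_path',
    target_size=(224, 224),
    color_mode='grayscale',
    class_mode='categorical',
    batch_size=32,
)

Contrast adjustments are not built into ImageDataGenerator, so those would go in a custom preprocessing_function.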
Challenges Faced:
Manual Categorization: Organizing the images into separate folders was time-consuming but crucial for creating a structured dataset.
Hardware Limitations:
Due to limited GPU capacity, my computer was unable to train the model on the entire dataset. To address this, I selected a subset of the dataset for training and testing, ensuring the sample was diverse enough to retain meaningful insights while keeping computation manageable (one way to do this is sketched after this list).
Real-World Variability:
Despite the dataset's diversity, real-world scenarios like individuals wearing glasses or low-light environments introduced additional challenges. Data augmentation techniques helped the model generalize better to such scenarios.
Model Deviation:
During testing, I observed that the model occasionally detected other objects (such as facial features or background elements) and incorrectly classified them, leading to a slight deviation in predictions. I was unable to fully address this issue within the scope of this project, but I plan to explore solutions, such as fine-tuning the model or additional preprocessing techniques, in future iterations.
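For anyone hitting the same hardware ceiling, here is one minimal way to draw a balanced random subset from the two class folders. The helper name and the per-class sample size are illustrative, not taken from the original project:

import os
import random

def sample_balanced_subset(directory, per_class=2000, seed=42):
    """Return an equal-sized random sample of image paths from the
    'Closed' and 'Open' subfolders (per_class is illustrative)."""
    random.seed(seed)
    subset = {}
    for category in ['Closed', 'Open']:
        path = os.path.join(directory, category)
        files = [os.path.join(path, f) for f in os.listdir(path)]
        subset[category] = random.sample(files, min(per_class, len(files)))
    return subset

# Example usage ('dataset_path' is a placeholder):
# subset = sample_balanced_subset('dataset_path', per_class=2000)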
Model Training and Results
Training the model involved a stack of convolutional layers to classify eye states as open or closed, following the lightweight design philosophy that motivated choosing MobileNet. Due to limited GPU capacity, I trained the model on a subset of the MRL Eye Dataset. Despite this limitation, the model achieved 91% accuracy on the validation set.
Performance Context
Although 91% accuracy may seem moderate compared to published benchmarks, it is noteworthy given that only a fraction of the dataset was used for training. This result demonstrates the model's ability to learn meaningful patterns even with limited data and computational resources.
Code Walkthrough
Step 1: Importing Required Libraries
import os
import numpy as np
import cv2
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
Step 2: Loading and Preprocessing the Dataset
def get_data(directory):
    """
    Loads the dataset from the given directory.
    - Reads images in grayscale
    - Resizes them to 224x224 pixels
    - Labels images as 'Closed' (0) or 'Open' (1)
    (Pixel normalization happens later, in Step 3.)
    Parameters:
        directory (str): Path to the dataset
    Returns:
        numpy array: Preprocessed image data and corresponding labels
    """
    categories = ['Closed', 'Open']
    data = []
    for category in categories:
        path = os.path.join(directory, category)
        label = categories.index(category)
        for img in os.listdir(path):
            img_path = os.path.join(path, img)
            try:
                img_arr = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)  # Read as grayscale
                resized_arr = cv2.resize(img_arr, (224, 224))  # Resize to 224x224 pixels
                data.append([resized_arr, label])
            except Exception as e:
                print(f"Error loading image {img_path}: {e}")
    return np.array(data, dtype=object)
# Load dataset
data = get_data('dataset_path') # Replace with the actual dataset path
Step 3: Splitting the Data into Train, Validation, and Test Sets
# Prepare features (X) and labels (y)
X, y = [], []
for feature, label in data:
    X.append(feature)
    y.append(label)

# Convert lists to numpy arrays and normalize pixel values to [0, 1]
X = np.array(X).reshape(-1, 224, 224, 1) / 255.0
y = np.array(y)

# Split data into training (70%), validation (20%), and test (10%) sets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=1/3, random_state=42)

# Convert labels to categorical format (one-hot encoding)
y_train = to_categorical(y_train, 2)
y_val = to_categorical(y_val, 2)
y_test = to_categorical(y_test, 2)
Step 4: Defining the CNN Model
def build_model():
    """
    Builds a Convolutional Neural Network (CNN) for eye state classification.
    Returns:
        model (Sequential): Compiled CNN model
    """
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(2, activation='softmax')  # Output layer for binary classification
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Build the model
model = build_model()
Step 5: Training the Model
# Train the CNN model
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=10,  # Adjust based on performance
    batch_size=32
)
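If you want training to stop automatically once validation loss plateaus, Keras callbacks slot straight into the fit call above. This is an optional sketch; the patience value and epoch cap are assumptions to tune:

from tensorflow.keras.callbacks import EarlyStopping

# Stop when validation loss has not improved for 3 consecutive epochs,
# and restore the best weights seen during training.
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=30,  # Upper bound; early stopping usually ends training sooner
    batch_size=32,
    callbacks=[early_stop]
)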
Step 6: Evaluating the Model
# Evaluate on the test dataset
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy:.2f}")
# Plot training history
plt.figure(figsize=(12,5))
# Loss plot
plt.subplot(1,2,1)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training & Validation Loss')
plt.legend()
# Accuracy plot
plt.subplot(1,2,2)
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training & Validation Accuracy')
plt.legend()
plt.show()
Step 7: Making Predictions
def predict_eye_state(image_path, model):
    """
    Predicts whether an eye is open or closed using the trained CNN model.
    Parameters:
        image_path (str): Path to the image
        model (Sequential): Trained CNN model
    Returns:
        str: Prediction result ("Open" or "Closed")
    """
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (224, 224))
    img = img.reshape(1, 224, 224, 1) / 255.0  # Normalize
    prediction = model.predict(img)
    class_index = np.argmax(prediction)
    return "Open" if class_index == 1 else "Closed"
# Example prediction
sample_image = "path_to_test_image.jpg" # Replace with an actual image path
result = predict_eye_state(sample_image, model)
print(f"Predicted Eye State: {result}")
How to Improve Model Performance
You can improve model performance by:
Data Augmentation: Adding more variability to your dataset.
Tuning Hyperparameters: Adjusting batch size, learning rate, or the number of epochs.
Adding More Layers: Including additional convolutional or dense layers.
Transfer Learning: Using a pre-trained model such as MobileNet for better results (a sketch follows this list).
Dataset Split: Experimenting with train/validation/test ratios beyond the 70-20-10 used here.
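Since MobileNet is mentioned throughout, here is a minimal transfer-learning sketch using MobileNetV2 from tensorflow.keras.applications. Note the assumptions: the ImageNet weights expect 3-channel input, so the grayscale arrays would first be repeated across three channels, and the variable names here are illustrative:

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Pre-trained backbone; ImageNet weights require (224, 224, 3) input,
# so grayscale data would first be expanded: X_rgb = np.repeat(X, 3, axis=-1)
base = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # Freeze the backbone, train only the new head

x = GlobalAveragePooling2D()(base.output)
outputs = Dense(2, activation='softmax')(x)  # Dense was imported in Step 1
transfer_model = Model(inputs=base.input, outputs=outputs)
transfer_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Freezing the backbone keeps training cheap on limited hardware, which fits the GPU constraints described earlier; unfreezing the top few layers later (fine-tuning) often buys extra accuracy.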
Conclusion
In this project, I built a simple yet effective Convolutional Neural Network for classifying eye states (open vs. closed). While the model performed well, real-world scenarios introduce challenges such as varying lighting conditions, occlusions (e.g., glasses, reflections), and diverse facial structures. Future improvements could include integrating temporal models like LSTMs or GRUs to differentiate between natural blinks and prolonged eye closure, refining the dataset with more diverse samples, and optimizing for edge devices.
This model has practical applications in real-time systems, such as drowsiness detection for drivers or accessibility tools for visually impaired users. With further tuning and data augmentation, the accuracy and robustness can be significantly enhanced. If you're interested in improving or deploying this system, feel free to experiment and contribute!