Make Your First Computer Vision Project: A Step-by-Step Guide

Have you ever wondered how computers can "see" and process images? Computer vision is the exciting field that gives machines the ability to interpret and understand visual information such as photos and video. In this beginner-friendly guide, I'll walk you through creating your very first computer vision project using Python.

What You'll Build

In this tutorial, we'll create a simple computer vision application that can:

  • Load and display images

  • Convert images to grayscale

  • Apply Gaussian blur for noise reduction

  • Detect edges in images

  • Detect faces in photographs

By the end, you'll have a working application that you can use to process your own images and even deploy online for others to use!

Prerequisites

  • Basic Python knowledge

  • A computer with Python installed

  • Enthusiasm to learn!

Setting Up Your Environment

Let's start by setting up our project environment. We'll create a folder for our project and set up a virtual environment to keep our dependencies organized.

  1. Create a project folder and navigate to it:

     mkdir computer_vision_project
     cd computer_vision_project
    
  2. Create and activate a virtual environment (on macOS/Linux, use python3 instead of python if the python command is not found):

     python -m venv venv
    

    On Windows:

     venv\Scripts\activate
    

    On macOS/Linux:

     source venv/bin/activate
    
  3. Install required libraries:

     pip install opencv-python numpy matplotlib
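Before writing any code, you can optionally confirm that the virtual environment really contains these packages. The script below is a small sketch; the file name check_env.py and the check_modules helper are hypothetical, not part of the project. It uses the standard library's importlib.util.find_spec to look each module up without importing it. Note that the opencv-python package is imported under the module name cv2.

```python
import importlib.util

# Module names as used in import statements (opencv-python provides "cv2")
REQUIRED_MODULES = ["cv2", "numpy", "matplotlib"]


def check_modules(names):
    """Return {module_name: True/False} depending on whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Run with the virtual environment activated: python check_env.py
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:<12} {'OK' if found else 'MISSING (re-run pip install)'}")
```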
    

Building the Basic Image Processing Script

Now, let's create our first script that will handle basic image processing techniques.

  1. Create a new file called image_processing.py.

  2. Add the following code:

     import cv2
     import numpy as np
     import matplotlib.pyplot as plt
    
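     # Load an image
     def load_image(image_path):
         img = cv2.imread(image_path)
         # cv2.imread returns None instead of raising when the file is missing or unreadable
         if img is None:
             raise FileNotFoundError(f"Could not read image: {image_path}")
         # Convert from BGR to RGB (OpenCV loads images in BGR format)
         img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
         return img
    
     # Display the image
     def display_image(image, title="Image"):
         plt.figure(figsize=(10, 8))
         # Single-channel (grayscale/edge) images need cmap="gray"; otherwise matplotlib applies its default colour map
         plt.imshow(image, cmap="gray" if image.ndim == 2 else None)
         plt.title(title)
         plt.axis('off')
         plt.show()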
    
     # Apply grayscale conversion
     def convert_to_grayscale(image):
         return cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    
     # Apply Gaussian blur
     def apply_gaussian_blur(image, kernel_size=(5, 5)):
         return cv2.GaussianBlur(image, kernel_size, 0)
    
     # Apply edge detection
     def detect_edges(image, threshold1=100, threshold2=200):
         return cv2.Canny(image, threshold1, threshold2)
    
     # Main function
     def main():
         # Path to your image
         image_path = "sample_image.jpg"
    
         # Load the image
         original_image = load_image(image_path)
    
         # Display original image
         display_image(original_image, "Original Image")
    
         # Convert to grayscale
         gray_image = convert_to_grayscale(original_image)
         display_image(gray_image, "Grayscale Image")
    
         # Apply Gaussian blur
         blurred_image = apply_gaussian_blur(gray_image)
         display_image(blurred_image, "Blurred Image")
    
         # Detect edges
         edges = detect_edges(blurred_image)
         display_image(edges, "Edge Detection")
    
     if __name__ == "__main__":
         main()
    
  3. Download a sample image from the internet or use one of your own photos, and save it as sample_image.jpg in your project folder.

  4. Run the script:

     python image_processing.py
    

    You should see four images, one window at a time (close each window to move on to the next): the original image, the grayscale version, the blurred version, and the edge-detection result.
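Because plt.show() blocks until its window is closed, the script shows the stages one by one. If you would rather compare them side by side, here is a sketch of a hypothetical show_pipeline helper (not part of the original script) that draws every stage in a single matplotlib figure. The demo at the bottom uses a synthetic gradient image, so it runs without a photo.

```python
import numpy as np
import matplotlib.pyplot as plt


def show_pipeline(images, titles):
    """Draw each processing stage side by side in one figure and return the figure."""
    # squeeze=False always returns a 2-D array of axes, even for a single image
    fig, axes = plt.subplots(1, len(images), figsize=(4 * len(images), 4), squeeze=False)
    for ax, img, title in zip(axes[0], images, titles):
        # 2-D (single-channel) arrays need cmap="gray"; RGB arrays ignore cmap
        ax.imshow(img, cmap="gray" if img.ndim == 2 else None)
        ax.set_title(title)
        ax.axis("off")
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    # Synthetic demo data: a red ramp and a single-channel version of it
    rgb = np.zeros((64, 64, 3), dtype=np.uint8)
    rgb[..., 0] = np.linspace(0, 255, 64).astype(np.uint8)  # red increases left to right
    gray = rgb.mean(axis=2).astype(np.uint8)
    show_pipeline([rgb, gray], ["Colour", "Single channel"])
    plt.show()
```

In image_processing.py you could call show_pipeline([original_image, gray_image, blurred_image, edges], ["Original", "Grayscale", "Blurred", "Edges"]) followed by plt.show() at the end of main() instead of the separate display_image calls.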

Adding Face Detection

Now, let's add face detection capabilities to our project:

  1. Create a new file called face_detection.py.

  2. Add the following code:

     import cv2
     import numpy as np
     import matplotlib.pyplot as plt
    
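     def load_image(image_path):
         img = cv2.imread(image_path)
         # cv2.imread returns None instead of raising when the file is missing or unreadable
         if img is None:
             raise FileNotFoundError(f"Could not read image: {image_path}")
         img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
         # Return both: BGR for OpenCV processing, RGB for matplotlib display
         return img, img_rgb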
    
     def detect_faces(image):
         # Load the pre-trained face detector
         face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    
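         # Convert to grayscale for face detection (image is the BGR array returned by cv2.imread)
         gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    
         # Detect faces: scaleFactor=1.1 rescales the image by 10% per pass,
         # minNeighbors=4 requires 4 overlapping detections before accepting a face
         faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)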
    
         return faces
    
     def draw_faces(image, faces):
         # Create a copy of the image
         image_with_faces = image.copy()
    
         # Draw rectangles around detected faces; the image is RGB, so (0, 0, 255) is blue
         for (x, y, w, h) in faces:
             cv2.rectangle(image_with_faces, (x, y), (x+w, y+h), (0, 0, 255), 2)
    
         return image_with_faces
    
     def display_image(image, title="Image"):
         plt.figure(figsize=(10, 8))
         plt.imshow(image)
         plt.title(title)
         plt.axis('off')
         plt.show()
    
     def main():
         # Path to your image with faces
         image_path = "people_image.jpg"
    
         # Load the image
         original_image, image_rgb = load_image(image_path)
    
         # Display original image
         display_image(image_rgb, "Original Image")
    
         # Detect faces
         faces = detect_faces(original_image)
    
         # Draw rectangles around faces
         image_with_faces = draw_faces(image_rgb, faces)
    
         # Display image with detected faces
         display_image(image_with_faces, "Detected Faces")
    
         print(f"Found {len(faces)} faces!")
    
     if __name__ == "__main__":
         main()
    
  3. Download or find an image with faces and save it as people_image.jpg in your project folder.

  4. Run the face detection script:

     python face_detection.py
    

    You should see two images displayed: the original image and the same image with blue rectangles drawn around detected faces.
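The two numbers passed to detectMultiScale control the search. scaleFactor=1.1 means the image is scaled down by 10% between passes, so a fixed-size detection window effectively covers progressively larger faces; minNeighbors=4 means a candidate rectangle is kept only if at least four overlapping detections support it, which filters out isolated false positives. The sketch below lists the face sizes that get checked, assuming the 24 x 24 pixel base window of the stock frontal-face cascade. window_sizes is a hypothetical helper for illustration; OpenCV's real search also depends on minSize/maxSize and its own rounding.

```python
def window_sizes(image_size, base=24, scale_factor=1.1):
    """Face sizes (in pixels) checked by a cascade with a base x base window.

    Shrinking the image by scale_factor at each pass is equivalent to growing
    the window by the same factor, until the window no longer fits.
    """
    if scale_factor <= 1:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    size = float(base)
    while size <= image_size:
        sizes.append(round(size, 1))
        size *= scale_factor
    return sizes


if __name__ == "__main__":
    # Assumption: base=24 matches the window size of haarcascade_frontalface_default.xml
    print(len(window_sizes(100)), window_sizes(100)[:4])
```

For a 100-pixel image, window_sizes(100) returns 15 sizes with scale_factor=1.1 but only 6 with 1.3: a larger factor is faster, but faces whose size falls between two steps are more likely to be missed.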

Deploying with Streamlit

Now that we have our basic functionality working, let's create a web application using Streamlit to make our project interactive and shareable.

  1. Install Streamlit:

     pip install streamlit
    
  2. Create a new file called app.py.

  3. Add the following code:

     import streamlit as st
     import cv2
     import numpy as np
     from PIL import Image
    
     def load_image(image_file):
         # convert("RGB") drops PNG alpha/palette channels so every image has exactly 3 channels
         img = Image.open(image_file).convert("RGB")
         return np.array(img)
    
     def convert_to_grayscale(image):
         return cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    
     def apply_gaussian_blur(image, kernel_size=5):
         return cv2.GaussianBlur(image, (kernel_size, kernel_size), 0)
    
     def detect_edges(image, threshold1=100, threshold2=200):
         if len(image.shape) == 3:
             image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
         return cv2.Canny(image, threshold1, threshold2)
    
     def detect_faces(image):
         # Load the pre-trained face detector
         face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    
         # Convert to grayscale for face detection
         if len(image.shape) == 3:
             gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
         else:
             gray = image
    
         # Detect faces
         faces = face_cascade.detectMultiScale(gray, 1.1, 4)
    
         return faces
    
     def draw_faces(image, faces):
         # Create a copy of the image
         image_with_faces = image.copy()
    
         # Draw rectangles around detected faces
         for (x, y, w, h) in faces:
             cv2.rectangle(image_with_faces, (x, y), (x+w, y+h), (255, 0, 0), 2)
    
         return image_with_faces
    
     # Set up the Streamlit app
     st.set_page_config(page_title="Computer Vision Explorer", layout="wide")
    
     st.title("Computer Vision Explorer")
     st.write("Upload an image to apply different computer vision techniques!")
    
     # File uploader
     uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
    
     if uploaded_file is not None:
         # Load and display the original image
         image = load_image(uploaded_file)
    
         st.subheader("Original Image")
         st.image(image, caption="Uploaded Image")
    
         # Create a sidebar for options
         st.sidebar.title("Options")
    
         # Select image processing technique
         technique = st.sidebar.selectbox(
             "Select a technique",
             ["Grayscale", "Gaussian Blur", "Edge Detection", "Face Detection"]
         )
    
         # Parameters based on selected technique
         if technique == "Gaussian Blur":
             # Starting at 1 with step=2 keeps the kernel size odd, as GaussianBlur requires
             kernel_size = st.sidebar.slider("Kernel Size", 1, 15, 5, step=2)
         elif technique == "Edge Detection":
             threshold1 = st.sidebar.slider("Threshold 1", 0, 300, 100)
             threshold2 = st.sidebar.slider("Threshold 2", 0, 300, 200)
    
         # Apply the selected technique
         st.subheader(f"Result: {technique}")
    
         if technique == "Grayscale":
             result_image = convert_to_grayscale(image)
             st.image(result_image, caption="Grayscale Image")
    
         elif technique == "Gaussian Blur":
             result_image = apply_gaussian_blur(image, kernel_size)
             st.image(result_image, caption="Blurred Image")
    
         elif technique == "Edge Detection":
             result_image = detect_edges(image, threshold1, threshold2)
             st.image(result_image, caption="Edge Detection")
    
         elif technique == "Face Detection":
             faces = detect_faces(image)
             result_image = draw_faces(image, faces)
             st.image(result_image, caption=f"Detected {len(faces)} faces")
    
  4. Run the Streamlit app:

     streamlit run app.py
    

    Your browser should open at http://localhost:8501 (Streamlit's default address), showing your interactive computer vision application.
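A natural next step for the web app is letting users save the processed image. Streamlit's st.download_button takes raw bytes, so the NumPy array has to be encoded first. This is a sketch under that assumption, using a hypothetical helper named to_png_bytes; it encodes with Pillow (which Streamlit already depends on) into an in-memory buffer.

```python
import io

import numpy as np
from PIL import Image


def to_png_bytes(image):
    """Encode a uint8 NumPy image (H x W grayscale or H x W x 3 RGB) as PNG bytes."""
    buffer = io.BytesIO()
    # fromarray picks mode "L" for 2-D arrays and "RGB" for 3-channel arrays
    Image.fromarray(image).save(buffer, format="PNG")
    return buffer.getvalue()


if __name__ == "__main__":
    png = to_png_bytes(np.zeros((8, 8, 3), dtype=np.uint8))
    print(len(png), png[:4])
```

Inside the if uploaded_file is not None: block, after the technique branches, you could then add st.download_button("Download result", data=to_png_bytes(result_image), file_name="result.png", mime="image/png"), making sure io is imported in app.py.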

Deploying to Streamlit Cloud

Let's deploy our app to Streamlit Cloud so others can use it:

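  1. Create a requirements.txt file that lists the packages app.py needs. Use opencv-python-headless here: Streamlit Cloud runs on a Linux server without a display, and the regular opencv-python package depends on GUI libraries (such as libGL) that may be missing there, which makes import cv2 fail:

     streamlit
     opencv-python-headless
     numpy
     pillow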
    
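  2. Create an empty repository on GitHub, then push your code (replace yourusername with your GitHub username). The first command keeps the virtual environment folder out of the repository:

     echo venv/ > .gitignore
     git init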
     git add .
     git commit -m "Initial commit"
     git branch -M main
     git remote add origin https://github.com/yourusername/computer-vision-project.git
     git push -u origin main
    
  3. Sign up for Streamlit Cloud at https://streamlit.io/cloud and connect your GitHub account.

  4. Deploy your app:

    • Click "New app"

    • Select your repository

    • Choose the main branch

    • Set the main file path to app.py

    • Click "Deploy"

  5. Share your app with the provided URL.

What's Happening Behind the Scenes?

Let's break down what's happening in our code:

Image Processing Basics

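  • Grayscale Conversion: Converting a color image to grayscale collapses its three color channels into a single brightness channel, computed as a weighted sum of red, green, and blue. It is a common first step because algorithms such as Canny edge detection and Haar cascades work on brightness alone.

  • Gaussian Blur: Each pixel is replaced by a weighted average of its neighborhood, with weights that fall off along a Gaussian (bell) curve. This suppresses noise and fine texture that would otherwise show up as spurious edges. The kernel size must be odd, and passing 0 as the standard deviation, as our code does, lets OpenCV derive it from the kernel size.

  • Edge Detection: The Canny edge detector computes the intensity gradient, thins candidate edges to one-pixel-wide lines, and then applies two thresholds: pixels whose gradient exceeds the higher threshold (200 by default in our code) are definite edges, pixels below the lower threshold (100) are discarded, and pixels in between are kept only if they connect to a definite edge.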
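To make these three operations concrete, here is a plain-NumPy sketch of each. It is illustrative, not a copy of OpenCV's implementation: to_gray uses the luma weights OpenCV documents for RGB-to-gray conversion (OpenCV uses fixed-point arithmetic, so results can differ by one), gaussian_kernel derives sigma with the formula OpenCV documents for sigma <= 0 (for small kernels OpenCV may use precomputed tables instead), and hysteresis implements only Canny's final thresholding step on an already computed gradient magnitude. All three function names are hypothetical.

```python
import numpy as np


def to_gray(rgb):
    """Weighted sum of R, G, B: 0.299 R + 0.587 G + 0.114 B."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.clip(np.rint(rgb[..., :3] @ weights), 0, 255).astype(np.uint8)


def gaussian_kernel(ksize, sigma=0.0):
    """2-D Gaussian kernel (ksize must be odd), normalised so its weights sum to 1."""
    if sigma <= 0:
        # Sigma formula OpenCV documents for getGaussianKernel when sigma <= 0
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    x = np.arange(ksize) - (ksize - 1) / 2  # offsets from the centre, e.g. -2..2
    k1d = np.exp(-(x ** 2) / (2 * sigma ** 2))
    k1d /= k1d.sum()
    # The Gaussian is separable: the 2-D kernel is the outer product of two 1-D kernels
    return np.outer(k1d, k1d)


def hysteresis(magnitude, low, high):
    """Canny's last step: keep strong pixels plus weak pixels connected to them."""
    strong = magnitude >= high  # definite edges
    weak = magnitude >= low     # candidates (includes the strong pixels)
    edges = strong.copy()
    rows, cols = edges.shape
    while True:
        # Mark every pixel that has an edge pixel among its 8 neighbours (or is one)
        padded = np.pad(edges, 1)
        near_edge = np.zeros_like(edges)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                near_edge |= padded[1 + dy:1 + dy + rows, 1 + dx:1 + dx + cols]
        grown = edges | (weak & near_edge)
        if np.array_equal(grown, edges):
            return edges
        edges = grown
```

The hysteresis loop keeps growing the edge set into connected weak pixels until nothing changes, which is why a faint segment survives only when it touches a strong one.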

Face Detection

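We're using a pre-trained Haar cascade classifier that ships with OpenCV (the Viola-Jones method). It was trained on many face and non-face images to respond to Haar-like features: simple patterns of light and dark rectangles, such as the eye region being darker than the cheeks below it. The detector slides a window across the image at several scales and passes each window through a cascade of increasingly strict stages; most non-face windows are rejected by the first few cheap stages, which is what makes the method fast.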
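The trick that makes Haar-like features cheap to evaluate is the integral image: after one pass over the image, the sum of any rectangle takes four lookups, however large the rectangle is. The NumPy sketch below shows the idea together with one hypothetical two-rectangle feature; OpenCV provides the same table through cv2.integral, and the trained cascade combines many such features selected during training.

```python
import numpy as np


def integral_image(gray):
    """Return ii with ii[y, x] = sum of gray[:y, :x] (one extra zero row and column)."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left corner is (x, y): four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


def two_rect_feature(ii, x, y, w, h):
    """Hypothetical Haar-like feature: bottom half minus top half of a w x h window.

    It is strongly positive when a dark band sits above a brighter one,
    e.g. the eye region above the cheeks.
    """
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return bottom - top


if __name__ == "__main__":
    # Synthetic 4 x 4 patch: dark top half, bright bottom half
    patch = np.vstack([np.zeros((2, 4)), np.full((2, 4), 10)])
    print(two_rect_feature(integral_image(patch), 0, 0, 4, 4))
```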

Streamlit Web App

Streamlit turns our Python script into a web application with minimal code. It handles:

  • File uploads

  • User interface components (sliders, dropdowns)

  • Image display

  • Interactive parameter adjustment

GitHub Repo

The complete code for this project is available in the author's GitHub repository.

Conclusion

Congratulations! You've successfully built your first computer vision project that can process images in various ways and even detect faces. You've also deployed it as a web application that anyone can use.

This project provides a foundation for more advanced computer vision applications. You could extend it by:

  • Adding more image processing techniques

  • Implementing object detection

  • Creating a real-time webcam version

  • Training your own custom models

The field of computer vision is vast and growing rapidly. I hope this tutorial has given you a taste of what's possible and inspired you to explore further!


Happy coding!


Written by

Abhishek Mukherjee

I'm Abhishek Mukherjee, a full-stack web developer who's passionate about exploring new technologies and learning new skills. With a solid foundation in HTML, CSS, JavaScript, Bootstrap, React.js, Node.js, and MongoDB, I've developed a wide range of web applications and have become proficient in both front-end and back-end development. Recently, I've been diving into the world of DevOps and automation, learning Python and Bash scripting and familiarizing myself with popular DevOps tools like Ansible, Jenkins, and Docker. As a strong believer in the power of open-source software, I'm always looking for opportunities to contribute to projects on GitHub and other platforms. In addition to my technical skills, I'm also a passionate writer and enjoy sharing my knowledge and experience with others through blog posts and tutorials. I find it incredibly rewarding to help others learn and grow in their own development journeys. When I'm not coding or writing, I can usually be found exploring new technologies, attending tech meetups, or practicing my photography skills. I believe that continuous learning and self-improvement are key to success in any field, and I'm excited to continue growing my skills and contributing to the world of web development and DevOps.