How do computers see us?

Tanush Golwala

'...........' go the self-driving cars these days...

With the introduction of technological innovations in the automotive sector, one is frequently left wondering how these vehicles manage to operate.

You guessed it: ‘COMPUTER VISION’ is the answer.

Introduction

Computer Vision is a field of computer science that enables computers to identify and understand objects and patterns in images and videos. It is a branch of Artificial Intelligence that allows machines to interpret and analyze the visual world.

Since its inception, Computer Vision has found its way into many facets of day-to-day human activity: detecting intrusions in surveillance video in Israel, monitoring mine equipment in China, and performing rapid face detection in Japan.

The image shows how Israel has used Computer Vision to develop an intrusion detection system.

What does the past look like?

Remarkably, the concept of computer vision originated more than half a century ago. In 1963, Larry Roberts, known as the "Father of Computer Vision", discussed the possibility of extracting 3D geometrical data from 2D perspective images.

However, the real breakthrough in Computer Vision came in 2001, when researchers Paul Viola and Michael Jones introduced the well-known ‘Viola–Jones algorithm’, which revolutionized face detection and broadened the applications of CV.

In recent decades, computer vision has made further strides in both development and application as tech giants like Google, Meta, Apple, and Tesla tap into its immense capabilities. Governments of many nations, especially developing nations like India, are increasingly supporting its development and deployment, turning it into a race.

Now, let us have a look at one of the most widely used libraries in CV.

OpenCV

OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library. It was created to expedite the use of machine perception in commercial products and to offer a common infrastructure for computer vision applications.

The library has more than 2,500 optimized algorithms and can be used from several languages, including Python, JavaScript, C++, and Java. It has been under active development since its first release in 2000 and constantly receives updates.

What can you do with OpenCV?

Image Processing

Before getting into image processing, we shall look at how images are perceived by the OpenCV library.

Each digital image can be represented as a 3-dimensional NumPy array: height, width, and three color channels. In RGB format, each channel stores a value ranging from 0 to 255, and the combination of the three values at each pixel is what gives the human eye a perception of color.

Representation of an image in terms of an array representing RGB values
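As a small sketch, such an array can be built by hand with NumPy (the pixel values here are illustrative, not from any real photo):

```python
import numpy as np

# A 2x2 image as a 3-D array: height x width x 3 color channels.
# Each channel value ranges from 0 to 255.
img = np.array([
    [[255, 0, 0], [0, 255, 0]],      # a red pixel, a green pixel
    [[0, 0, 255], [255, 255, 255]],  # a blue pixel, a white pixel
], dtype=np.uint8)

print(img.shape)   # (2, 2, 3): rows, columns, channels
print(img[0, 0])   # the top-left pixel's three RGB channel values
```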

Note that when you read an image with the OpenCV library, it stores the channel data in BGR (blue, green, red) order by default. This can be corrected with the cvtColor function provided by OpenCV, using its built-in flag for converting BGR to RGB.

Notice how the index correction helps CV interpret the same image differently

Blurring and Sharpening

Blurring and sharpening are achieved in computer vision by creating a kernel that acts as a filter over the original image; a Gaussian blurring filter is a common example. In image processing, a kernel (or mask) is a small matrix used for blurring, sharpening, embossing, and edge detection.

To put it simply, a kernel defines each output pixel in terms of its neighboring pixels. The kernel's values and size determine how much blurring or sharpening occurs. Applying a kernel to an image means sliding it over each pixel, carrying out the convolution operation, and substituting the weighted sum for the original pixel value. Repeating this procedure for every pixel in the original image produces the blurred result.

Let us make an image using OpenCV, create a kernel for it, and apply the kernel to achieve the blurring effect.

This demonstrates how the sharpness of a picture can be altered by performing basic mathematical operations on its pixel values.

While blurring and sharpening are among the most widely used image-processing applications in computer vision, a lot more can be done with the highly optimized functions in the OpenCV library.

Edge detection works by detecting discontinuities in brightness, and it enhances object tracking and detection when processing video with OpenCV.

Video Processing

In the context of video, computer vision allows computers and systems to extract useful information from digital images, videos, and other visual inputs. Based on that information, they can then act or offer recommendations.

Typically, OpenCV is used in conjunction with another library when analyzing videos. The two most popular are TensorFlow and MediaPipe.

MediaPipe:

MediaPipe is a cross-platform library developed by Google that provides ready-to-use ML solutions for computer vision tasks. It offers a flexible collection of pre-built modules, including face detection, pose estimation, object tracking, and hand tracking. By providing ready-made solutions to challenging visual problems, these modules speed up application development and save developers the time and work of building everything from the ground up.

Let us take the example of MediaPipe Hands.

For every recognition run, the recognizer produces a detection result object. It contains the hand landmarks in image coordinates, the handedness (left/right hand), and the hand-gesture categories of the detected hands.

MediaPipe reports x, y, and z coordinates for each of the 21 hand landmarks. The z coordinate represents landmark depth, with the depth at the wrist serving as the origin: the smaller the value, the closer the landmark is to the camera.
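Once the landmarks are extracted, simple geometry on them already supports gesture logic. A sketch with hypothetical landmark values (the coordinates below are made up for illustration, not real MediaPipe output):

```python
import numpy as np

# Hypothetical (x, y, z) values for the 21 hand landmarks, in the
# normalized image coordinates MediaPipe Hands reports.
# Landmark 0 is the wrist, 4 the thumb tip, 8 the index-finger tip.
landmarks = np.zeros((21, 3))
landmarks[4] = [0.40, 0.35, -0.05]  # thumb tip (z < 0: closer to camera)
landmarks[8] = [0.42, 0.33, -0.04]  # index-finger tip

# A small thumb-to-index distance is a simple "pinch" heuristic.
pinch_distance = np.linalg.norm(landmarks[4] - landmarks[8])
is_pinching = pinch_distance < 0.05
```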

Representation of HandLandMark Recognition with OpenCV and Mediapipe

Similarly, in a video stream, an object detection model can identify which of a known set of objects are present and provide information about their positions within each frame.

Vehicle Identifiers used in streets of New York and Chicago

What does the Future hold?

Healthcare

In medical imaging, computer vision will be essential for early disease detection. Vital-sign monitoring and medical-scan analysis are two of its key applications. Companies like AINexus Healthcare are continually improving their models and have already begun employing computer vision for diagnoses.

Security

Improvements in object tracking and facial recognition technologies will strengthen security operations, helping businesses and public-safety organizations identify and mitigate threats more effectively. China's sophisticated facial recognition system, for instance, collects data from a huge network of cameras covering most of its population.

Smart cities:

With the help of computer vision in traffic management, garbage collection, and infrastructure maintenance, urban environments can be made safer and more efficient. Google subsidiaries like ‘Sidewalk Labs’ have already started developing an ecosystem for such cities.

