Pixels and Promises: The Ongoing Saga of Computer Vision
How awesome are driverless cars? Teslas can park themselves in tricky situations and navigate city streets on their own. We have also marveled at futuristic technology like this in spy and sci-fi films, from Mission Impossible's Ethan Hunt to James Bond. Google Photos automatically recognizes you and your friends in your albums, and you can unlock your phone just by looking at it. Sounds fun, right?
But have you ever given any thought to how these features work behind the scenes? How do machines see the world, and how does that differ from the way we see it?
I will explain how it all works. But rather than diving into the technicalities, in this blog we will look at how it all started, where we are now, and how far we still have to go.
So what is Computer Vision, the eyes of machines?
Computer vision is like giving magical eyes to computers. Just as we use our eyes to understand the world, computer vision enables machines to “see” and make sense of visual information: images, videos, and more.
It is a field within artificial intelligence (AI) that focuses on enabling computers to extract meaningful information from visual inputs (like images and videos) and to understand the environment they depict.
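To make this concrete, here is a minimal sketch of what a machine actually “sees” when you hand it a photo: not a scene, but a grid of numbers. The file name photo.jpg and the choice of Pillow and NumPy are my own illustrative assumptions, not part of any particular system.

```python
# A minimal sketch: to a computer, an image is just an array of numbers.
# Assumes Pillow and NumPy are installed and a local file "photo.jpg" exists.
import numpy as np
from PIL import Image

image = Image.open("photo.jpg")   # load the picture from disk
pixels = np.array(image)          # turn it into a grid of numbers

print(pixels.shape)   # e.g. (480, 640, 3): height, width, RGB channels
print(pixels[0, 0])   # the top-left pixel, e.g. [112  87  54]
```

Everything computer vision does, from face unlock to self-driving, starts from arrays like this one.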
But what was the need to develop it?
Humans rely heavily on vision to understand the world. We can recognize objects, interpret scenes, and make decisions based on what we see. For machines to interact with the world effectively, they needed a similar ability: to “see” and understand visual information.
With the spread of digital cameras, cell phones, and surveillance systems, we now produce enormous volumes of visual data every day. Many tasks involve visual data: sorting products on assembly lines, inspecting medical images, or identifying defects in manufacturing. Automating these tasks using computer vision improves efficiency, reduces errors, and saves time.
So where did it all begin?
The Past: Pioneering Era
1700s-1900s: Early Developments in Light and Vision:
From the 1700s to the early 1900s, scientists were captivated by light and its behavior. Photography emerged as a powerful tool for studying motion, capturing stars, and unraveling the mysteries of vision.
In 1888, Kodak introduced its first camera, marking a significant milestone in visual technology.
1950s: Hubel and Wiesel: Pioneers of Visual Perception and Computer Vision:
Neurophysiologists David Hubel and Torsten Wiesel’s groundbreaking work in the 1950s and 1960s revealed key principles of early visual processing.
Their discoveries, including specialized neurons, hierarchical feature processing, and receptive fields, directly influenced the development of computer vision algorithms.
1959: The VIDICON Tube: Pioneering Digital Scanning for Computer Vision:
By converting optical images into electrical signals, the vidicon tube enabled computers to digitize visual information.
This breakthrough laid the groundwork for Computer Vision, powering applications like object recognition and pattern analysis.
1963: Lawrence G. Roberts: Pioneering 3D Reconstruction in Computer Vision:
Lawrence G. Roberts developed the “Blockworld” program, an early exploration of deriving 3D representations from 2D images.
By using edge detection and hypothesis testing, Roberts laid the groundwork for essential Computer Vision concepts.
Edge detection, 3D reconstruction, and hypothesis-driven approaches remain cornerstones of modern Computer Vision.
1967: The Secret History of Facial Recognition:
Woodrow W. Bledsoe and I. Kanter developed a facial recognition system using edge detection and feature matching.
Their work marked an early success in computer vision-based face recognition.
1966: Marvin Minsky’s Impact on Computer Vision:
His later book “Perceptrons” (1969, co-authored with Seymour Papert) highlighted the limitations of single-layer neural networks.
The critique ultimately pushed researchers toward multilayer networks and paved the way for modern deep learning in Computer Vision.
1979: Neocognitron: Pioneering Neural Networks for Computer Vision:
Kunihiko Fukushima introduced the Neocognitron, a neural network inspired by the human visual system.
It excelled at local feature extraction and introduced translation invariance.
1980s-1990s: Object Recognition with Machine Learning:
During this period, researchers delved into object recognition and scene understanding using machine learning techniques.
Key milestones included the “Cascade-Correlation” neural network, the Scale-Invariant Feature Transform (SIFT) algorithm, and the popularity of Gaussian Mixture Models (GMM) for visual data modeling.
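As a small aside, the SIFT algorithm mentioned above still ships with modern libraries. The snippet below is a hedged sketch of extracting SIFT keypoints with OpenCV; the image file scene.jpg is a placeholder, and it assumes an OpenCV build that includes SIFT (version 4.4 or later).

```python
# A rough sketch of SIFT keypoint extraction with OpenCV (assumed >= 4.4).
# "scene.jpg" is a placeholder for any grayscale-readable image.
import cv2

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                              # detector + descriptor
keypoints, descriptors = sift.detectAndCompute(gray, None)

print(f"{len(keypoints)} keypoints, descriptors shape: {descriptors.shape}")
# Each descriptor is a 128-dimensional vector that stays stable under scale
# and rotation changes, which is what made SIFT so useful for matching.
```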
2000s: SVMs and Viola-Jones Algorithm:
Support Vector Machines (SVMs) rose to prominence for object recognition.
The Viola-Jones framework, powered by AdaBoost, revolutionized real-time face detection.
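To show what the Viola-Jones idea looks like in practice, here is a minimal sketch using the Haar cascade that OpenCV bundles; the file names group.jpg and group_faces.jpg are my own placeholders.

```python
# A minimal Viola-Jones-style face detection sketch using OpenCV's bundled
# Haar cascade. "group.jpg" is a placeholder input image.
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("group.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Slide a boosted cascade of Haar features over the image at several scales.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("group_faces.jpg", image)   # save the image with boxes drawn
```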
2000s-2010s: The Deep Learning Revolution:
Convolutional Neural Networks (CNNs) emerged as superheroes in computer vision. They rocked image classification tasks, thanks to their ability to learn intricate features from pixels.
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) acted as the grand stage for showcasing these neural wonders. It pushed the boundaries of what was possible in visual recognition.
Iconic architectures like AlexNet, VGGNet, and ResNet strutted onto the scene, each with its own swagger and impact.
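To give a feel for what “learning intricate features from pixels” means, here is a toy CNN in PyTorch. It is my own minimal sketch, not AlexNet, VGGNet, or ResNet, but it shares their basic pattern: stacked convolutions that extract features, followed by a classifier.

```python
# A toy CNN sketch (my own illustration, not one of the famous architectures).
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level edges, colors
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-level patterns
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):                      # x: (batch, 3, 32, 32) images
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

logits = TinyCNN()(torch.randn(1, 3, 32, 32))  # a random 32x32 RGB "image"
print(logits.shape)                            # torch.Size([1, 10])
```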
The Present: Where Have We Reached?
2010s-Present: Advancements in Deep Learning and Beyond:
Transfer learning techniques, such as fine-tuning pre-trained models, are becoming prevalent in computer vision (a minimal fine-tuning sketch follows this list).
Generative Adversarial Networks (GANs) are used to generate realistic images and videos.
Attention mechanisms, as in Transformer models, have been applied to computer vision tasks.
Specialized hardware, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), has accelerated the training of deep neural networks for computer vision.
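The fine-tuning point above is easy to sketch in code. Below is a hedged example of transfer learning with a pre-trained ResNet-18 from torchvision, reusing its ImageNet features for a hypothetical 5-class task; the data loading and training loop are omitted.

```python
# A minimal transfer-learning sketch: reuse ImageNet features, train a new head.
# The 5-class task is hypothetical; dataset and training loop are omitted.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pre-trained

for param in model.parameters():             # freeze the pre-trained backbone
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 5)   # new classification head

# Only the new head's parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```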
The Future: Where to Go, and How Far?
As we peer into the horizon of technological possibility, computer vision stands tall—a sentinel of pixels and algorithms. Its promise? Nothing short of transformative:
Hardware Marvels: Imagine GPUs and TPUs as cosmic engines, propelling us faster and deeper into the visual cosmos. These silicon marvels will fuel our quest for clarity, speed, and precision.
AI Fusion: Picture a grand ballroom where computer vision waltzes with natural language processing. They’ll converse, decode, and harmonize, bridging the gap between pixels and prose. Together, they’ll unravel mysteries and create symphonies of understanding.
Interdisciplinary Symphony: Across fields, experts gather, an orchestra of engineers, artists, and ethicists. Their instruments? Curiosity, collaboration, and creativity. They’ll compose safety nets, design ethical frameworks, and sculpt a world where pixels serve humanity.
Guided by Ethical Stars: In this celestial voyage, we navigate through the constellations of ethics and regulation. No reckless leaps; just purposeful strides. Computer vision, like a celestial navigator, charts a course toward safety, efficiency, and accessibility.
So, my friend, keep your eyes wide open. The canvas of tomorrow awaits, and computer vision holds the brush.
Written by
ADITYA BHATTACHARYA
A fervent and passionate fellow of Computer Science and research, trying to bridge the gap between real life and code life. From print("Hello World!") to writing scalable code, I am still discovering the emerging future of AI in our modern, technology-driven lives.