The Simple Math Powering Machine Learning: Euclidean Distance
Draw two points on a piece of paper and label them A and B.
Use a ruler to connect them with a straight line. That line is the shortest path between the two points, and its length is the Euclidean distance.
This simple idea is used in everything from navigating a city to machines recognizing images. In this article, we'll explore how this basic concept powers many of the technologies we use daily, from finding similar images to making movie recommendations.
A Walk Along a Line
Let's say that you're on a straight road. You and your friend are each standing next to a mile (or kilometer) marker, depending on which country you are in.
You are at marker 2.
Your friend is at marker 8.
To find out how far apart you are, you subtract your position from your friend's:
Distance = |8 - 2| = 6 miles (or kilometers)
So, you're 6 miles (or kilometers) apart.
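In code, this one-dimensional case is just the absolute difference between the two marker positions. Here is a minimal Python sketch. The function name `distance_1d` is our own, chosen for illustration:

```python
def distance_1d(a: float, b: float) -> float:
    """Distance between two positions on a straight road (one dimension)."""
    # abs() makes the order irrelevant: |8 - 2| and |2 - 8| are both 6
    return abs(b - a)


you, friend = 2, 8
print(distance_1d(you, friend))  # 6
```

Taking the absolute value is what lets you subtract in either order and still get a positive distance.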
Navigating a City Grid
Imagine you're in a city laid out like a grid (see diagram below). Streets run east-west, and avenues run north-south.
You're at the corner of 1st Street and 1st Avenue.
Your destination is at 4th Street and 4th Avenue.
To get there, you need to:
Count how many blocks north or south you need to go: From 1st Street to 4th Street is 3 blocks south.
Count how many blocks east or west you need to go: From 1st Avenue to 4th Avenue is 3 blocks east.
Now, imagine a straight line cutting diagonally across the blocks from your starting corner to your destination: this line represents the Euclidean distance.
Even though you can't walk through buildings to take this diagonal path, the Euclidean distance tells you the shortest possible distance between your starting point and your destination. This concept has many other uses.
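The length of that diagonal comes from the Pythagorean theorem: 3 blocks in one direction and 3 in the other give √(3² + 3²) ≈ 4.24 blocks. Compare that with the 3 + 3 = 6 blocks you would actually walk along the streets (the walking route is often called the Manhattan distance). The Python sketch below treats the avenue and street numbers as (x, y) coordinates, which is an assumption made for illustration. It uses `math.dist`, available since Python 3.8:

```python
import math

start = (1, 1)        # (avenue, street): 1st Avenue and 1st Street
destination = (4, 4)  # 4th Avenue and 4th Street

# Straight-line (Euclidean) distance: the diagonal across the blocks
straight_line = math.dist(start, destination)

# Walking along the grid: blocks east/west plus blocks north/south
walking = sum(abs(a - b) for a, b in zip(start, destination))

print(f"{straight_line:.2f} blocks as the crow flies")  # 4.24
print(f"{walking} blocks on foot")                      # 6
```

The straight-line figure is always less than or equal to the walking figure. That is why the Euclidean distance is a lower bound on any route you can actually take.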
For example, in another article we looked at how to use Azure to search for and find similar images: https://tjgokken.com/vector-image-search-with-azure-ai
We used Euclidean distance to determine what type of image we were looking at. The two images we compared were of forests.
So, how is the Euclidean distance calculated?
Image Vector Search
In our Image Search application, we extracted tags from the images. The first image might have tags like "forest," "trees," and "outdoor," and the second image might have similar tags. These tags are turned into numbers (a vector) representing how confident the system is that each tag applies to the image (for example, 0.9 for "forest" and 0.8 for "trees"). The Euclidean distance is then calculated between the two images' vectors.
When we compare two images this way, their Euclidean distance tells us how "similar" or "different" they are: the smaller the distance, the more alike the images.
For example, if the numbers for both images are very close (say one image has "forest" with 0.9 confidence and the other has "forest" with 0.85), the Euclidean distance will be small, meaning the images are quite similar.
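Here is a minimal Python sketch of that comparison. The tag list and confidence scores are made up for illustration, apart from the 0.9 and 0.85 "forest" values mentioned above. They are not output from the Azure service used in the other article. Each image becomes a vector with one number per tag, using the same tag order for every image:

```python
import math

# Hypothetical tag confidences, in this tag order:
#   forest, trees, outdoor, building
# A tag the system did not detect gets 0.0.
image_a = [0.90, 0.80, 0.95, 0.00]  # forest photo
image_b = [0.85, 0.75, 0.90, 0.00]  # another forest photo
image_c = [0.00, 0.10, 0.20, 0.95]  # hypothetical city photo

# math.dist computes the Euclidean distance between two points (Python 3.8+)
d_ab = math.dist(image_a, image_b)
d_ac = math.dist(image_a, image_c)

print(round(d_ab, 3))  # about 0.087: the two forest photos are close
print(round(d_ac, 3))  # about 1.663: the forest and city photos are far apart
```

The two forest images differ by only 0.05 on each shared tag, so their distance stays small. The city photo disagrees on almost every tag, so its distance is much larger.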
The Euclidean distance is intuitive because we think in straight-line terms.
There is, of course, a mathematical formula behind this, but we don't need it for this article. What matters is to remember that Euclidean distance is like using a tape measure to find the shortest path between two points.
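For readers who do want the formula, here it is. For two points p and q with n coordinates each, d(p, q) = √((p₁ − q₁)² + (p₂ − q₂)² + … + (pₙ − qₙ)²). In words, square the difference along every dimension, add the squares up, and take the square root. With n = 1 this reduces to the |8 - 2| from the road example, and with n = 2 it is the Pythagorean theorem from the city grid. The following is a minimal, dependency-free Python version. The name `euclidean_distance` is our own:

```python
import math
from typing import Sequence


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Straight-line distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Sum of squared differences along every dimension, then the square root
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


print(euclidean_distance([2], [8]))              # 6.0 (the road)
print(euclidean_distance([1, 1], [4, 4]))        # about 4.243 (the city grid)
print(euclidean_distance([1, 2, 3], [4, 6, 3]))  # 5.0 (three dimensions)
```

The same function works unchanged for 2, 3, or 300 dimensions. This is what lets machine learning systems apply it to data with many features.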
More Real-World Examples of Machine Learning
Euclidean distance plays a crucial role in helping machines make sense of data across many dimensions. Music recommendation systems, and even apps that suggest which restaurant to try, can use it to compare items described by many features at once.
For example, when recommending a movie, a system can compare your preferences (like genre, actors, and directors) to those of other users using Euclidean distance. The smaller the distance, the more likely it is that you'll enjoy the same movies.
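A simplified sketch of the "find the most similar user" step is shown below. The users, the four genre features, and their 0-to-1 preference scores are all hypothetical. Real recommenders work with far more data than this:

```python
import math

# Hypothetical preference scores (0 to 1) for: action, comedy, drama, sci-fi
you = [0.9, 0.2, 0.4, 0.8]
other_users = {
    "alex":  [0.8, 0.3, 0.5, 0.9],
    "sam":   [0.1, 0.9, 0.7, 0.0],
    "priya": [0.5, 0.5, 0.5, 0.5],
}

# Rank the other users from most to least similar (smallest distance first)
ranked = sorted(other_users, key=lambda name: math.dist(you, other_users[name]))
print(ranked)  # ['alex', 'priya', 'sam']
```

Movies that "alex" rated highly and you haven't seen yet would be natural candidates to recommend. "alex" differs from you by only 0.1 on every genre.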
In machine learning, data points like customers, products, or even words in a document can be represented as vectors. Euclidean distance helps group similar data points together. For example, in customer segmentation, it helps identify customers with similar spending patterns, age, or interests, so businesses can offer them targeted promotions.
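One common way to put this into practice is to assign each customer to the segment whose center is nearest. This is the assignment step that clustering algorithms such as k-means repeat. In the sketch below, the customers, the two features, and the segment centers are all hypothetical. The features are assumed to be pre-scaled to the 0-to-1 range. Euclidean distance is sensitive to units, so an unscaled feature like dollars spent would otherwise swamp age:

```python
import math

# Hypothetical customers: (annual spend, age), both already scaled to 0..1
customers = {
    "c1": (0.90, 0.30),
    "c2": (0.85, 0.25),
    "c3": (0.20, 0.70),
    "c4": (0.15, 0.80),
}

# Hypothetical segment centers (e.g. picked by an analyst or a clustering run)
segments = {
    "young big spenders": (0.90, 0.25),
    "older budget shoppers": (0.15, 0.75),
}


def nearest_segment(point):
    # The segment whose center is the smallest Euclidean distance away
    return min(segments, key=lambda name: math.dist(point, segments[name]))


for name, features in customers.items():
    print(name, "->", nearest_segment(features))
```

Customers c1 and c2 land in the first segment, while c3 and c4 land in the second. Each group could then receive its own targeted promotion.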
All in all, machines rely on Euclidean distance because it:
Quantifies Similarity: Machines need numbers to compare things, and Euclidean distance provides a clear metric.
Simplifies Complex Data: By reducing multiple features into a single distance value, machines can make faster decisions.
Applies Broadly: It is used in image recognition, natural language processing, and more.
Conclusion: A Simple Concept with Big Impact
Euclidean distance is one of the simplest ways to measure how close or far apart things are. This straightforward concept plays a huge role in powering the technology around us, from navigation apps to music recommendations. Understanding it gives us a glimpse into how machines make decisions that impact our daily lives.
Written by TJ Gokken