The Math Behind ML – Week 4 Vectors - Introduction


Linear Algebra is the study of vectors and certain rules to manipulate vectors [From our previous blog here]. In this post, we’ll dive deeper into what a vector really is and learn more about how it’s used in various domains like Physics, Mathematics, and Machine Learning.
Vector
A vector [abstract type] is a quantity that has both magnitude (value) and direction.
$$Vector = \vec{v}$$
Think of magnitude like fuel in a car: it determines how far the car can go. In the same way, magnitude determines the length of the vector. You can also view a vector as the difference between two points in a plane [as per the image above, it’s the difference between (0,0) and (2,3)].
$$\vec{v} = \begin{bmatrix} x \\ y \end{bmatrix} \quad \text{(the components together encode both magnitude and direction)}$$
A vector can be anywhere in a plane. In linear algebra, vectors usually start from the origin (0,0) for convenience and point in any direction from there, but a vector can start anywhere.
Scalar
When doing operations on a vector, we add, subtract, or multiply the vector by a number. When we do those kinds of operations, we scale the vector as shown in the image. That number is called a scalar, and we use the terms number and scalar interchangeably. Scalars don’t have a direction, and we use them all the time.
How do we represent vectors?
There are three ways to represent vectors:
Geometric representation : Arrows in space [Like the images above].
Algebraic representation : Arrays of numbers (like [1,2,3]).
Functional representation : Used in more advanced settings [We will learn more about it in the upcoming blogs].
Why do we need vectors and what’s the big deal about them?
Vectors are at our disposal
Now that we know why we need them, let’s look at the types of vectors to get an idea of when to use them. There are several types of vectors in mathematics and machine learning. Here’s the list:
Vectors In Mathematics
2D, 3D and n-D Vectors
A 2D vector lies in a flat, two-dimensional plane. It has two components: one along the X-axis and one along the Y-axis.
$$\vec{v} = \begin{bmatrix} x \\ y \end{bmatrix}$$
A 3D vector lives in 3D space; it adds a Z-component.
$$\vec{v} = \begin{bmatrix} x \\ y \\z \end{bmatrix}$$
An n-dimensional (n-D) vector generalizes this idea. Instead of 2 or 3 components, it has n components, where n can be any positive integer.
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}_{n \times 1}$$
We can’t directly visualize n-dimensional vectors beyond 3D, but we will use them a lot in machine learning.
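Here’s a minimal sketch of the algebraic representation in Python (assuming NumPy is installed); the component values below are made up for illustration:

```python
import numpy as np

v_2d = np.array([2, 3])                       # 2D vector: x and y components
v_3d = np.array([1, 2, 3])                    # 3D vector: adds a z component
v_nd = np.array([0.5, -1.2, 3.0, 4.1, 7.7])   # a 5-dimensional (n-D) vector

print(v_2d.shape, v_3d.shape, v_nd.shape)     # (2,) (3,) (5,)
```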
Zero vector
A Zero vector is a special vector where all components are zero. It has no direction and zero magnitude.
$$\vec{0} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}_{n\times1}$$
We use zero vectors in several places, such as null spaces, solutions to homogeneous systems, and conditions for linear independence, which we will look into in future blogs.
Unit vector
A unit vector has a magnitude of 1 [magnitude = 1] but keeps its direction. It is commonly used to represent pure direction.
$$\vec{v} = \begin{bmatrix} 0 \\ 1\end{bmatrix}$$
The most common places we use unit vectors are physics, computer graphics, and machine learning (normalizing features).
Tidbit
Why do we normalize a vector?
We normalize a vector to get a unit vector for the following reasons:
Simplifying calculations.
Representing direction.
Ensuring consistent scaling. [For features in ML preventing dominant features from overwhelming others during model training, especially in algorithms relying on distance or angle calculations.]
A vector can be normalized by using the formula:
$$\hat{v} = \frac{\vec{v}}{\|\vec{v}\|}$$
where $\hat{v}$ is the resulting unit vector and $\|\vec{v}\|$ is the norm (magnitude) of the vector. We use this formula to calculate the norm:
$$Norm : \|\vec{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$
where $v_1, v_2, \dots, v_n$ are the components of the vector.
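Here’s a minimal sketch of normalization in Python (assuming NumPy is installed); the example vector [3, 4] is made up for illustration:

```python
import numpy as np

v = np.array([3.0, 4.0])         # example vector (made up)
norm = np.linalg.norm(v)         # sqrt(3^2 + 4^2) = 5.0
v_hat = v / norm                 # unit vector pointing in the same direction

print(norm)                      # 5.0
print(v_hat)                     # [0.6 0.8]
print(np.linalg.norm(v_hat))     # 1.0 -> a unit vector, as expected
```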
Orthogonal vectors
Two vectors are said to be orthogonal if they are perpendicular to each other [i.e., mathematically, their dot product is zero].
$$\vec{a} \cdot \vec{b} = 0$$
$$\vec{a} = \begin{bmatrix} 1 \\ 2\end{bmatrix}, \vec{b} = \begin{bmatrix} 2 \\ -1\end{bmatrix}$$
Their dot product is:
$$\vec{a}\cdot\vec{b} = 1\cdot2+2\cdot(-1) = 2-2 = 0$$
Orthogonal (non-zero) vectors like these are also linearly independent.
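Here’s a quick sketch in Python (assuming NumPy is installed) that checks the dot product of the two vectors above:

```python
import numpy as np

a = np.array([1, 2])
b = np.array([2, -1])

# 1*2 + 2*(-1) = 0, so a and b are orthogonal (perpendicular)
print(np.dot(a, b))          # 0
print(np.dot(a, b) == 0)     # True
```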
Uses :
In ML, orthogonal features reduce correlation and improve model performance.
In geometry, they define axes and coordinate systems.
In linear algebra, they are the foundation of orthonormal bases, QR decomposition, etc…
Orthonormal vectors :
We can call a set of vectors orthonormal if they satisfy these two conditions:
Each vector has unit length (i.e., norm = 1).
Every pair of distinct vectors is orthogonal (i.e., dot product = 0).
$$\text{A set of vectors } \{ \vec{v}_1, \vec{v}_2, \dots, \vec{v}_n \} \text{ is orthonormal if:}$$
$$\|\vec{v}_i\| = 1 \quad \text{for all } i$$
$$\vec{v}_i \cdot \vec{v}_j = 0 \quad \text{for all } i \neq j$$
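As a small sketch in Python (assuming NumPy is installed), the standard 2D basis vectors form a familiar orthonormal set:

```python
import numpy as np

v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 1.0])

print(np.linalg.norm(v1), np.linalg.norm(v2))   # 1.0 1.0 -> each has unit length
print(np.dot(v1, v2))                           # 0.0     -> they are orthogonal
```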
Uses :
To simplify many linear algebra computations.
Used in QR decomposition, Fourier transforms, and PCA.
In machine learning, orthonormal feature spaces help in dimensionality reduction and decorrelation.
Parallel vectors
Two vectors are parallel if they point in the same direction or in exactly opposite directions. This means one vector is a scalar multiple of the other.
$$\vec{a} = k\,\vec{b}, \quad k \in \mathbb{R}$$
$$\vec{a} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \vec{b} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}; \quad \text{clearly, } \vec{b} = 2\vec{a}$$
How do these contribute :
In geometry, they define directional consistency.
In physics, they represent aligned forces or velocities.
In ML and linear algebra, if feature vectors are parallel, they are linearly dependent, which reduces model robustness.
Equal Vectors
Two vectors are equal if their magnitudes and directions are the same, i.e., the corresponding components of the two vectors are identical.
$$\vec{a} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} ;\space \vec{b} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} ⇒ \vec{a} = \vec{b}$$
Column vs Row vector
A column vector is written vertically; it is the most common form in linear algebra and matrix operations. A row vector is written horizontally and can be seen as the transpose of a column vector.
Whether you represent a vector as a column vector or a row vector is a matter of convention and doesn't fundamentally change the vector itself.
$$\vec{a} = \begin{bmatrix} x \\ y \end{bmatrix} _{Column\space vector}$$
$$\vec{v} = \begin{bmatrix} x & y & z \end{bmatrix}_{Row\space vector}$$
Position vector
A position vector helps us find the location of one object relative to another in a vector space. Position vectors start at the origin and terminate at some arbitrary point, so they are used to determine the position of a particular point with reference to the origin. Technically, then, a position vector always starts at the origin.
If P = (2, 4) and Q = (3, 5), then:
$$\vec{OP} = \begin{bmatrix} 2 \\ 4 \end{bmatrix} ; \space \vec{OQ} = \begin{bmatrix} 3 \\ 5 \end{bmatrix}$$
Displacement vector
A displacement vector denotes the change in position from one point to another: it tells you how far one point is from another, and in which direction. If you move from point A to point B, the displacement vector is:
$$\vec{d} = \vec{B} - \vec{A}$$
$$\text{If} \quad \vec{A} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad \vec{B} = \begin{bmatrix} 5 \\ 7 \end{bmatrix}, \quad \text{then} \quad \vec{d} = \vec{B} - \vec{A} = \begin{bmatrix} 5 - 2 \\ 7 - 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$$
It’s different from the position vector, which describes a point’s location relative to the origin. Displacement vector focuses on the relative change between two points.
Uses :
Physics: To analyze motion, velocity, and acceleration — all depend on how position changes over time, which is captured by displacement.
Robotics & Navigation: To figure out the shortest path or the direction to move from one location to another.
Computer Graphics & Animation: To move objects smoothly from one position to another.
Machine Learning & Data Science: Sometimes displacement vectors help quantify changes or differences between data points in feature space.
Basis Vector
A basis vector is one of a set of vectors that are linearly independent and span a vector space. In simple terms, basis vectors are the building blocks of a vector space: any vector in that space can be expressed as a unique linear combination of these basis vectors. In that sense, they form the basis for building any vector.
Why are basis vectors important?
They provide a coordinate system for the vector space.
Every vector in the space can be represented as a linear combination of basis vectors.
The number of basis vectors defines the dimension of the space.
$$\text{In 2D space, the basis vectors are usually:}$$
$$\vec{i} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \vec{j} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
$$\text{Any vector} \quad \vec{v} = \begin{bmatrix} x \\ y \end{bmatrix} \quad \text{can be written as:}$$
$$\vec{v} = x \vec{i} + y \vec{j}$$
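A short sketch in Python (assuming NumPy is installed) of writing a vector as a linear combination of the 2D basis vectors; the coordinates x = 3, y = 5 are made up for illustration:

```python
import numpy as np

i = np.array([1, 0])      # basis vector along the X-axis
j = np.array([0, 1])      # basis vector along the Y-axis

x, y = 3, 5               # example coordinates (made up)
v = x * i + y * j         # any 2D vector is a combination of i and j

print(v)                  # [3 5]
```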
Co-vectors : [Also known as dual vectors or linear functionals]
A co-vector is a mathematical object that maps a vector to a scalar. Co-vectors live in the dual space V*, while vectors live in the vector space V. The dual space is the set of functions that take a vector and produce a scalar. To sum it up, a co-vector extracts a number from a vector.
Use cases :
Making predictions : When you train an ML model, you’ll get predictions like “This house is worth $300,000“. That prediction is a scalar. How do we get it from X number of features? We multiply the data vector by a co-vector to extract a meaningful value.
Measure alignment [Weights] : Co-vectors measure how much your vector "aligns" with a specific direction.
Say you're hiring an ML Engineer and you want to know: "How much does this candidate’s experience align with what I value?"
You set your weights to reflect what matters to you (e.g., weight for coding, ML, NLP, etc.).
You dot them with the candidate’s skills (vector).
You get a score: a scalar that tells you how strongly aligned this candidate is (see the sketch after this list).
To check similarity in NLP and Computer Vision:
- When you query with a vector in a vector space, similarity is determined by the scalar output of its dot product with each stored vector: the higher the scalar, the more similar the vectors. This is how search engines, chatbots, and CLIP (image-language models) work.
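Here’s a hedged sketch in Python (assuming NumPy is installed) of the hiring example above: the weight co-vector and the candidate’s skill vector are made-up numbers, and the dot product collapses them into a single scalar score:

```python
import numpy as np

# Co-vector: what the interviewer values (weights are made up)
weights = np.array([0.5, 0.3, 0.2])      # coding, ML, NLP

# Vector: the candidate's skill levels (also made up)
candidate = np.array([8.0, 6.0, 4.0])

# Applying the co-vector to the vector yields a scalar score
score = np.dot(weights, candidate)       # 0.5*8 + 0.3*6 + 0.2*4 = 6.6
print(score)
```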
Eigenvectors
Eigenvectors are special vectors that don’t change direction when a linear transformation, like multiplying by a matrix, is applied to them. They only get stretched or compressed.
$$A\vec{v} = \lambda\vec{v}$$
Usually, when you rotate and stretch vectors using matrices, most vectors change direction and length, but a few special vectors don’t change direction. Those are eigenvectors.
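A small sketch in Python (assuming NumPy is installed) with an example matrix chosen purely for illustration; `np.linalg.eig` returns the eigenvalues and eigenvectors:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])                      # example matrix (made up)

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                              # [2. 3.]
print(eigenvectors)                             # columns are the eigenvectors

# Check that A v = lambda v holds for the first eigenvector
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))   # True
```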
Useful At :
Dimensionality reduction [PCA]
In neural networks
Graph Theory
Gradient vectors :
[One of the most important concepts in calculus, optimization and machine learning]
The gradient vector of a function tells us in which direction the function increases the fastest and how steep that increase is.
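As a rough sketch of the idea in Python (assuming NumPy is installed): for f(x, y) = x² + y², the gradient is (2x, 2y), and gradient descent repeatedly steps against it; the starting point and learning rate are made up for illustration:

```python
import numpy as np

def grad_f(p):
    # Gradient of f(x, y) = x^2 + y^2 is (2x, 2y): it points toward fastest increase
    return 2 * p

p = np.array([3.0, 4.0])      # starting point (made up)
lr = 0.1                      # learning rate (made up)

for _ in range(50):
    p = p - lr * grad_f(p)    # step opposite the gradient to minimize f

print(p)                      # very close to [0, 0], the minimum of f
```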
Uses :
- Used heavily in machine learning for optimization [Gradient descent]
Vectors In Machine Learning
Quick descriptions of the vectors used in machine learning; we will look at them in the following articles.
Feature vector : Vector representing input features of a single data point.
Target vector : Expected output values corresponding to the input features (also called a label vector).
Weight vector : Model parameters used to scale or transform feature vectors during training.
Gradient vector : Tells us in which direction a function increases the fastest and how steep that increase is.
Parameter vector : Collects all trainable parameters of a model into a single vector for efficient updates.
Prediction vector : Output vector produced by the model, representing predicted values or probabilities for each class.
One-hot and multi-hot vectors :
One-hot : Exactly one element is 1 and the rest are 0; used to encode categorical data (see the sketch after this list).
Multi-hot : Multiple elements can be 1, representing multi-label categories.
Sparse and dense vectors : Most elements are zero in a sparse vector, while most elements are non-zero in a dense vector.
Word embedding vector : Numeric representation of words in a continuous vector space, capturing semantic meaning [Ex: word2vec, GloVe].
TF-IDF vector : Represents text documents as vectors weighted by term frequency and inverse document frequency, highlighting important words.
Probability vector : Represents probabilities assigned to different classes by a model; all elements sum to 1.
Latent vector : Hidden or compressed representation of data in a lower-dimensional space, learned by models like autoencoders or VAEs.
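Here’s a tiny sketch (in Python, assuming NumPy is installed) of the one-hot and multi-hot idea from the list above; the class count and indices are made up for illustration:

```python
import numpy as np

# One-hot: exactly one element is 1 (here, class index 2 out of 4 made-up classes)
one_hot = np.zeros(4)
one_hot[2] = 1
print(one_hot)        # [0. 0. 1. 0.]

# Multi-hot: several elements can be 1 (here, labels 0 and 3 are both present)
multi_hot = np.zeros(4)
multi_hot[[0, 3]] = 1
print(multi_hot)      # [1. 0. 0. 1.]
```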
Phew! These are all the vectors we have at our disposal, and we can do a lot of fun stuff with them in the upcoming chapters. Don’t worry about memorizing all of these for now. Just skim through the list; we’ll encounter and explain them in more detail in upcoming posts where they naturally fit into the discussion.
Now that we’ve jotted down how many kinds of vectors there are, let’s look at how to use them in the next blog.
Next Up
In the next post, we will learn more about vector spaces [where vectors reside] and the operations and properties of vectors, with some intuitive GIFs and code examples.