Introduction to Linear Algebra


We all know what linear algebra is. I'm not going to teach you algebra here. Rather, I will show you how it is used from the perspective of data science. Enjoy!
Intro
Why is linear algebra important? Think about the common example we use to build a model in machine learning: house pricing. You have multiple pieces of information like area, location, and number of rooms. These inputs help determine the price of a house. Each input is a number, or a list of numbers (as location names will eventually be converted into numbers), and these numbers form a vector. If there are 2 features, it's 2-dimensional. What if you have 500 dimensions? This is where linear algebra becomes crucial: you can apply dimensionality reduction (PCA) to convert them into just 2 dimensions. At the end of the day, everything is going to be turned into vectors. To summarize where linear algebra shows up: (1) linear algebra is effective with high-dimensional data. (2) In machine learning, linear algebra is heavily used to train a model, because model training involves a lot of matrix operations. (3) In dimensionality reduction, PCA is built on linear algebra (eigenvalues). (4) Forward propagation and backward propagation in neural networks, where multiple interconnected layers transform the input features, require extensive matrix operations.
Scalars and Vectors
Scalar = a single numerical value; it represents a magnitude/quantity.
Vector = a numerical object that has both magnitude and direction.
In data science, a vector doesn't necessarily mean an object with a physical direction like in physics. Instead, it represents a collection of values. It can have different dimensions, like [10, 20, 30], which is 3-dimensional. Everything else works as you'd expect, such as plotting points on a diagram. The gaming industry relies heavily on vectors. If you're into gaming, imagine I have a Lamborghini driving at 200 km/hr in GTA 5, and a police car is coming towards me at 200 km/hr. If they collide, there will be effects: the car might explode, catch fire, or spin 100 times. These effects are computed from the vectors involved; for example, if a car collides at 200 km/hr, it will spin and then be crushed. In GTA, if you ride a boat, a vector represents your boat, and the boat will jump if the waves are high. These effects are all derived from vector quantities. These are just some examples, and I hope you understand why vectors are important.
Addition of Vectors
pass (will complete later)
Vector Multiplication - Dot Product (Inner Product)
Dot product (inner product): The dot product of 2 vectors results in a scalar and is calculated as the sum of the products of their corresponding components.
Application of the dot product in data science: Gen AI apps => RAG
Cosine similarity: It is a measure used to determine how similar 2 vectors are. It calculates the cosine of the angle between the 2 vectors, producing a similarity score that ranges from -1 (completely dissimilar) to 1 (completely similar).
Heavily used in recommendation systems (like Netflix). Suppose the movie Avengers has a 5-dimensional representation, Avengers = [1, 2, 0, 3, 1]. These numbers are based on action, drama, story, duration, actor, and so on. Another movie, let's say B, has a representation of [2, 0, 1, 1, 1]. Now, how do we decide if we should recommend movie B to someone who watched Avengers? We use cosine similarity for this. If the cosine similarity is high, we recommend the movie. If it is low or negative, it means the movie is very different from Avengers, so we won't recommend it. Who knows, it might even be a romantic movie, haha!
Let’s find out:
Step 1: A•B = 1•2 + 2•0 + 0•1 + 3•1 + 1•1 = 6
Step 2: The magnitudes of A and B, ||A|| and ||B||. We will compute them using the Euclidean norm (please remember the name; I will teach it later somewhere in this series). ||A|| = √(1²+2²+0²+3²+1²) ≈ 3.873. Similarly, ||B|| = √(2²+0²+1²+1²+1²) ≈ 2.646.
Now for the angle: cos θ = 6 / (3.873 × 2.646) ≈ 0.586, which is nothing but 58.6% positively similar. So there is a 58.6% similarity, meaning that if a person watches Avengers, movie B will also be recommended.
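The arithmetic above is easy to reproduce in code. Here is a small Python sketch (standard library only; the movie vectors are the toy values from the example):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: (A·B) / (||A|| * ||B||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy feature vectors: action, drama, story, duration, actor
avengers = [1, 2, 0, 3, 1]
movie_b  = [2, 0, 1, 1, 1]

print(round(cosine_similarity(avengers, movie_b), 3))  # 0.586
```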
Vector Database - Example of Cosine Similarity
When we talk about vector databases, we are basically designing a RAG (retrieval-augmented generation) system.
I'm going to explain it with a scenario. Suppose I have a book of 10,000 pages (I mean, if it's a history book, that explains it). I want to create a chatbot built on this book, so that whenever I ask it a question, it will, without any delay, present me the answer from the book. So now I will turn this book into vectors. There are multiple techniques for doing this, e.g., bag of words, TF-IDF, word2vec, and so on. Once we convert this book into vectors, we save them inside a vector database. Now, from the user's side, whenever I write a question, the query is converted into a vector and then queried against the database. Once we query this vector database, a cosine similarity search occurs internally. We then compare the results, and if there is, let's say, a 90% similarity with the query, the matching vector is converted back into text, returned, and displayed.
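The retrieval step can be sketched in a few lines of Python. This is only a toy model of a vector database: in a real RAG system the vectors would come from an embedding model (word2vec, TF-IDF, etc.), so the 3-dimensional vectors and the `retrieve` helper below are made up for illustration:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector database": text chunks mapped to invented embedding vectors.
vector_db = {
    "Chapter 1: The French Revolution began in 1789.": [0.9, 0.1, 0.0],
    "Chapter 2: The Industrial Revolution changed manufacturing.": [0.2, 0.8, 0.1],
}

def retrieve(query_vector, db, threshold=0.9):
    """Return the stored text whose vector is most similar to the query,
    if the similarity clears the threshold (else None)."""
    best_text, best_score = None, -1.0
    for text, vec in db.items():
        score = cosine_similarity(query_vector, vec)
        if score > best_score:
            best_text, best_score = text, score
    return best_text if best_score >= threshold else None

query = [0.88, 0.15, 0.02]  # pretend this is the embedded user question
print(retrieve(query, vector_db))
```

Here the query vector is closest to the Chapter 1 vector, so that chunk is returned as the answer text.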
Vector Multiplication - Element-Wise Multiplication
Def: In element-wise multiplication, corresponding elements of two vectors are multiplied to form a new vector of the same dimension.
pass (will complete later)
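Until that section arrives, here is a minimal plain-Python sketch of the definition above, with made-up vectors:

```python
a = [1, 2, 3]
b = [4, 5, 6]

# Multiply corresponding elements; the result has the same dimension as the inputs.
elementwise = [x * y for x, y in zip(a, b)]
print(elementwise)  # [4, 10, 18]
```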
Vector Multiplication - Scalar Multiplication
Def: It involves multiplying a vector by a scalar, resulting in a vector where each component is scaled by that scalar.
pass (will complete later)
Matrices and Application
A matrix is a rectangular array of numbers, symbols, or expressions arranged in rows and columns.
Examples of matrices in Data Science:
Data representation:
In a 3×3 matrix, the columns specify 3 features. To describe the shape of the whole matrix we use i×j, rows × columns. We may have 10,000 rows, and we can easily organize them through their features.
Image in Computer Vision:
Images are made up of pixels. Each pixel has its own code for colors, and images can be represented through a matrix.
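To make this concrete, here is a tiny sketch in NumPy. The 3×3 "image" below is invented: each entry stands for a grayscale pixel intensity (0 = black, 255 = white), and brightening the image is just an operation on the matrix:

```python
import numpy as np

# A tiny 3x3 "grayscale image": each entry is a pixel intensity (0-255).
image = np.array([
    [  0, 128, 255],
    [ 64, 192,  32],
    [255,   0, 128],
])

# Because the image is just a matrix, ordinary matrix operations apply.
# Brightening every pixel by 50, clipped to the valid range:
brighter = np.clip(image + 50, 0, 255)
print(image.shape)     # (3, 3)
print(brighter[0, 2])  # 255 (already at the maximum)
```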
Confusion matrix:
This matrix is heavily used in machine learning. It summarizes a classifier's predictions against the actual labels, and from it we can calculate the accuracy of a model (along with related metrics).
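As a quick illustration, here is a confusion matrix built by hand in plain Python; the binary labels and predictions below are made up:

```python
# Made-up binary labels: 1 = positive, 0 = negative.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# 2x2 confusion matrix: rows = actual class, columns = predicted class.
matrix = [[0, 0], [0, 0]]
for a, p in zip(actual, predicted):
    matrix[a][p] += 1

# Accuracy = (true positives + true negatives) / total predictions.
tp, tn = matrix[1][1], matrix[0][0]
accuracy = (tp + tn) / len(actual)
print(matrix)    # [[3, 1], [1, 3]]
print(accuracy)  # 0.75
```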
Neural network: (Linear regression)
Matrices are heavily used in neural networks because a neural network is all about hidden layers, forward propagation, and backward propagation. In forward propagation (if you have a basic idea of neural networks), we take the transpose of the weights of each layer, multiply it with the inputs, and then add a bias for each neuron. All of this is basically matrix multiplication. So matrices are extensively used in neural networks. Also, in regression,
y = ∑ mᵢxᵢ + c => y = m1x1 + m2x2 + c => y = mᵀx + c
where m = [m1, m2] and x = [x1, x2]. So you can see how extensively matrices are used in neural networks. These are just basic examples. If you don't know about neural networks yet, that's fine. I just want to show where matrices are used in data science.
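The regression formula above, and one forward step of a linear layer, can be sketched in NumPy. All the weights, inputs, and biases below are invented values purely for illustration:

```python
import numpy as np

# y = m1*x1 + m2*x2 + c, written as a dot product m·x + c
m = np.array([2.0, 3.0])   # weights (made-up)
x = np.array([1.0, 4.0])   # input features (made-up)
c = 0.5                    # bias
y = m @ x + c
print(y)  # 2*1 + 3*4 + 0.5 = 14.5

# The same pattern for a whole layer: one forward step is a
# matrix-vector product plus a bias vector.
W = np.array([[2.0,  3.0],
              [1.0, -1.0]])   # 2 neurons x 2 inputs (made-up weights)
b = np.array([0.5, 0.0])
out = W @ x + b
print(out)  # [14.5 -3. ]
```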
NLP:
Let's say I have a dataset with two features: reviews and a positive/negative label. In NLP, we can convert the text into vectors. For example, "the food is good" could become [0 2 4 1] after encoding, and we set the label to 0 for a negative review and 1 for a positive one. After combining all the reviews, our collection will look like a matrix. So yes, matrices are everywhere!
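One simple way to get such a matrix is a bag-of-words encoding. This is a toy sketch with a made-up vocabulary and two made-up reviews; each review becomes a row of word counts:

```python
# Toy bag-of-words: each review becomes a count vector over a shared vocabulary.
vocab = ["the", "food", "is", "good", "bad"]
reviews = ["the food is good", "the food is bad"]

# One row per review, one column per vocabulary word.
matrix = [[review.split().count(word) for word in vocab] for review in reviews]
print(matrix)  # [[1, 1, 1, 1, 0], [1, 1, 1, 0, 1]]
```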
pass (will add the visualization later)
Matrices Operations
Matrix operations are fundamental to data science because they provide the mechanism to manipulate and transform data.
Matrix Addition: add or subtract corresponding elements of 2 matrices of the same dimension.
Scalar Matrix Multiplication: scalar multiplication involves multiplying every element of a matrix by a scalar value.
Example/Scenario 01: We have a matrix representing product prices in dollars, and we want to adjust these prices for inflation by a factor of 1.05. To adjust them, we perform scalar matrix multiplication: 1.05 gets multiplied into every element, and we get the adjusted prices based on inflation.
Example/Scenario 02: Now I will talk about a real-life example. I have a matrix with 3 columns representing the base salaries of a Receptionist, an Accountant, and HR. In 2025, we are facing an inflation of 8%. Now how will I adjust all the salaries? That's where scalar matrix multiplication is used for efficiency: I will multiply the whole matrix by 1.08 (the original 100% plus the 8% increase) and get the adjusted values.
Matrix Multiplication: It involves the dot product of the rows of the first matrix with the columns of the second matrix. For example, A(m×n) and B(n×p) => Result(m×p). [The first matrix's column count must equal the second matrix's row count.]
It is probably the most extensively used operation in the world of machine learning.
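The salary scenario and the shape rule above can be sketched in NumPy. The salary figures and the matrices A and B are made up for illustration:

```python
import numpy as np

# Scenario 02: base salaries for Receptionist, Accountant, HR (made-up figures).
salaries = np.array([[30000, 50000, 45000]])

# 8% inflation => multiply the whole matrix by the scalar 1.08.
adjusted = salaries * 1.08
print(adjusted.round(2))  # [[32400. 54000. 48600.]]

# Matrix multiplication: A is (2x3), B is (3x2), so the result is (2x2).
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])
print(A @ B)  # [[ 4  5]
              #  [10 11]]
```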
pass (will add the visualization later)
Written by Fatima Jannet