Mathematics for Machine Learning #1

Kartavay
6 min read

In this post, my aim is to share my understanding of some topics in linear algebra I have learned recently.

MATRICES

Matrices are very cool. You can think of a matrix as a tiny object holding information about some system: the behavior of users on your website, a dataset of medical features for diabetic patients, or any other tabular data about an institution.

Mathematics allows us to do many useful things with matrices and gain insights from them. Below are some properties.

Addition and Multiplication

  • Addition: When adding two matrices there is no fancy rule. Matrix addition happens element-wise: you add the 1st element of one matrix to the 1st element of the other matrix, and so on. One key thing to notice is that both matrices must have the same dimensions to be added.

  • Multiplication: Multiplying two matrices is not as simple as addition; it does not happen element-wise. There is a rule for finding each element of the resulting matrix. An interesting thing to notice is the shape of the result: it has as many rows as the 1st matrix and as many columns as the 2nd. For two matrices to be multiplied, the first matrix must have the same number of columns as the second has rows.

    You can also notice that the summation index “l” sits on the column side of “a” and the row side of “b”; this follows from the rule mentioned above. We are not sure about the number of rows in A or the number of columns in B, but we are sure that the columns in A match the rows in B, and if that common size is n then l runs from 1 to n.
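Here is a tiny sketch of both rules in plain Python (no libraries needed), with the multiplication rule written exactly as the summation over l:

```python
# Element-wise addition (same dimensions required) and the
# multiplication rule c[i][j] = sum over l of a[i][l] * b[l][j].
A = [[1, 2, 3],
     [4, 5, 6]]            # 2x3
B = [[7, 8],
     [9, 10],
     [11, 12]]             # 3x2

# Addition: A + A works because the shapes match exactly.
A_plus_A = [[A[i][j] + A[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]
print(A_plus_A)            # [[2, 4, 6], [8, 10, 12]]

# Multiplication: columns of A (3) match rows of B (3),
# so l runs over 3 values and the result is 2x2.
C = [[sum(A[i][l] * B[l][j] for l in range(len(B)))
      for j in range(len(B[0]))]
     for i in range(len(A))]
print(C)                   # [[58, 64], [139, 154]]
```

Notice the result is 2x2: rows from A, columns from B, just like the rule says.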

Multiplication Properties: Matrix multiplication comes with its own properties, which are quite similar to ordinary multiplication in arithmetic but differ in some places (for example, it is not commutative: AB is generally not equal to BA). Below are the properties.

Associativity just means you can place the brackets anywhere and you will still get the same result, but we need to take care of the dimensions in matrix multiplication here.

Distributivity is quite simple, but it also requires us to follow the same dimension-compatibility rule.
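You can sanity-check both properties (and see non-commutativity) on small matrices; a quick sketch, not a proof:

```python
# Checking (AB)C == A(BC), A(B + C) == AB + AC, and AB != BA
# on small 2x2 integer matrices.
def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def madd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(len(X[0]))]
            for i in range(len(X))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]

print(matmul(matmul(A, B), C) == matmul(A, matmul(B, C)))        # True
print(matmul(A, madd(B, C)) == madd(matmul(A, B), matmul(A, C))) # True
print(matmul(A, B) == matmul(B, A))                              # False
```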

The identity matrix (I) is a square matrix with only 1’s on its diagonal and all other elements 0. If you understand transformations, you can think of the identity matrix as a matrix that does not transform any vector: pass a vector through it and you get the same vector back. It looks simple, but it’s very useful.
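A quick sketch of that “does nothing” behavior:

```python
# Multiplying the 3x3 identity matrix by a vector gives the vector back.
I = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
v = [2, -1, 5]
Iv = [sum(I[i][j] * v[j] for j in range(3)) for i in range(3)]
print(Iv)  # [2, -1, 5], unchanged
```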

Inverse, Transpose and Symmetric matrices

Inverse: The inverse of a matrix can be thought of as a matrix which reverses the effect of some transformation. Let’s say you apply a transformation A (a matrix and the linear transformation it represents are the same thing) to some vector v and the vector changes; to get the same vector back, you can apply the inverse of A.

Mathematically it is defined by: AA⁻¹ = A⁻¹A = I.

Remember: if B is the inverse of A, then BA should also equal I, not just AB (it’s obvious, but people sometimes forget it).

Matrices which have an inverse are called invertible or non-singular matrices, and those which don’t are called singular or non-invertible matrices. I like to think of invertible matrices as matrices which have a girlfriend; that’s why they are not singular.

The determinant also tells us which matrices have an inverse and which don’t: matrices with a non-zero determinant have an inverse. Since only square matrices have determinants, we can conclude that only square matrices can be singular or non-singular (a square matrix with a determinant of 0 is singular).
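For a 2x2 matrix there is a well-known closed formula for the inverse using the determinant; here is a sketch with integer entries (det = 1, so nothing needs dividing):

```python
# Inverse of a 2x2 matrix [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]],
# where det = a*d - b*c. It only exists when det != 0.
a, b, c, d = 2, 1, 1, 1
det = a * d - b * c          # 1, non-zero, so A is invertible
A = [[a, b], [c, d]]
Ainv = [[d, -b], [-c, a]]    # dividing by det = 1 changes nothing

def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(2)) for j in range(2)]
            for i in range(2)]

I = [[1, 0], [0, 1]]
print(matmul(A, Ainv) == I)  # True
print(matmul(Ainv, A) == I)  # True: BA = I as well, not just AB
```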

Transpose: The transpose of a matrix is just another matrix; there is nothing fancy about it. In the transpose, the indices of the elements get swapped. Also, if we have a non-square matrix A, the number of rows of the transpose equals the number of columns of A, and vice versa.
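A minimal sketch of the index swap, turning a 2x3 matrix into a 3x2 one:

```python
# Transpose: element (i, j) moves to position (j, i).
A = [[1, 2, 3],
     [4, 5, 6]]                       # 2 rows, 3 columns
At = [[A[i][j] for i in range(len(A))]
      for j in range(len(A[0]))]      # 3 rows, 2 columns
print(At)  # [[1, 4], [2, 5], [3, 6]]
```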

Below are some properties of inverse and transpose: (AB)⁻¹ = B⁻¹A⁻¹, (Aᵀ)ᵀ = A, (A + B)ᵀ = Aᵀ + Bᵀ, and (AB)ᵀ = BᵀAᵀ.

Symmetric Matrices: Symmetric matrices are the square matrices that are equal to their transpose. They have a cool property: the inverse of their transpose is equal to the transpose of their inverse. (In fact (Aᵀ)⁻¹ = (A⁻¹)ᵀ holds for any invertible matrix; for a symmetric one both sides are just A⁻¹, so the inverse is symmetric too.) But their products are in general not symmetric. Another thing to notice about them is that if you take an element at index (i, j) and swap those indices to (j, i), you get the same element that was at index (i, j).
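A quick check of both claims, that a symmetric matrix equals its transpose and that a product of two symmetric matrices can lose the symmetry:

```python
# S and T are both symmetric, but their product P is not.
def transpose(M):
    return [[M[i][j] for i in range(len(M))] for j in range(len(M[0]))]

def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

S = [[1, 2], [2, 3]]                  # S[i][j] == S[j][i]
T = [[0, 1], [1, 4]]
print(transpose(S) == S)              # True: S is symmetric
P = matmul(S, T)
print(P)                              # [[2, 9], [3, 14]]
print(P == transpose(P))              # False: the product lost symmetry
```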

Multiplication by a Scalar

On multiplying a matrix by a scalar (some real number), all the elements of the matrix get scaled, i.e. multiplied, by that same scalar. Properties like associativity and distributivity hold for this operation too.
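A one-liner sketch of scaling every element:

```python
# Scalar multiplication scales every element by the same number.
A = [[1, 2], [3, 4]]
k = 3
kA = [[k * A[i][j] for j in range(len(A[0]))] for i in range(len(A))]
print(kA)  # [[3, 6], [9, 12]]
```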

SOLVING A SYSTEM OF LINEAR EQUATIONS

Solving a system of linear equations just means finding all the solutions of this equation → Ax = b. Here A is the coefficient matrix, x is the vector containing all the variables as its elements, and b is the right-hand-side vector of the linear system.

We find a solution of the equation by using Gaussian elimination, which just involves reducing the matrix to row echelon form (you can learn it on YouTube!) and then using back substitution to get the values of our variables. A single solution of Ax = b obtained this way is called a particular solution.

This is not the complete solution. The complete solution is the general solution, which is the combination of the particular solution and the null space solution (i.e. the solutions of Ax = 0).

But why are we finding the solutions of some random equation Ax = 0? At first it does not seem to make any sense. Look: we know that multiplying the particular x by A gives b, and multiplying a null space vector by A gives 0, so A(x_p + x_n) = Ax_p + Ax_n = b + 0 = b. There is nothing wrong with adding vectors that give 0 on multiplying with A; in fact it gives us many more solutions than just the particular solution.

So, the reason behind finding the null space solution is to find all the possible solutions of Ax = b, not just the particular solution. It’s like finding more ways to run from school than just the back door.

So, the simple process to find all the solutions to a system of equations is:

  • Find particular solution (Ax=b)

  • Find null space solution (Ax=0)

  • Add both of them to form general solution
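The three steps above can be sketched on a tiny underdetermined system, one equation with two unknowns, x1 + x2 = 3 (so A = [[1, 1]] and b = [3]); the particular and null space solutions below are assumed for illustration:

```python
# Particular solution x_p = [3, 0] solves Ax = b.
# Null space solution x_n = [-1, 1] solves Ax = 0, since -1 + 1 = 0.
# General solution: x_p + t * x_n for any scalar t.
A = [[1, 1]]
b = [3]
x_p = [3, 0]
x_n = [-1, 1]

def apply(A, x):
    return [sum(A[r][i] * x[i] for i in range(len(x))) for r in range(len(A))]

for t in [0, 1, -2, 5]:
    x = [x_p[i] + t * x_n[i] for i in range(len(x_p))]
    print(t, x, apply(A, x))  # Ax stays [3] for every t
```

Every choice of t gives a different solution, which is exactly why the null space part matters.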

You might think that Gaussian elimination will always give you the solution of a linear system, but it becomes computationally impractical for a linear system with millions of variables (I mean, who needs that!!!).

Afterword

That was all for today about matrices. Personally, I found reading all this stuff very boring, so I think the article was boring too. We will discuss solving a system of linear equations in more depth in the next post.
