Principal Component Analysis in ML
In short, PCA (Principal Component Analysis) is a dimensionality reduction technique for high-dimensional datasets (i.e. datasets with many columns). It's basically like taking a snapshot of the data from the angle that captures the maximum variance of the data points: a projection of higher-dimensional data onto a lower dimension. Again, a dimension here means one feature (column) of the data.
Now let's dive deeper. If we want to reduce the dimensions of a dataset, we first need to understand the covariance matrix.
Covariance matrix: a square matrix that tells us how the data points are spread around the mean. In the matrix, the diagonal elements are the variances along each axis (feature), and the off-diagonal elements are the covariances between pairs of features.
| 7 3 |
| 3 9 |
Here 7 and 9 are the variances, and 3 is the covariance (it appears twice because the matrix is symmetric).
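To make this concrete, here is a minimal sketch of computing a covariance matrix with NumPy. The dataset `X` is a made-up toy example (not the data behind the matrix above), and the variable names are only for illustration.

```python
import numpy as np

# Hypothetical toy dataset: 5 samples (rows), 2 features (columns).
X = np.array([
    [2.0, 1.0],
    [4.0, 3.0],
    [6.0, 4.0],
    [8.0, 8.0],
    [10.0, 9.0],
])

# Center each feature by subtracting its mean.
X_centered = X - X.mean(axis=0)

# Sample covariance by hand: (Xc^T Xc) / (n - 1).
n_samples = X.shape[0]
cov_manual = X_centered.T @ X_centered / (n_samples - 1)

# Same result from NumPy; rowvar=False means columns are the features.
cov_numpy = np.cov(X, rowvar=False)

print(cov_manual)
print(cov_numpy)
```

The diagonal of the result holds the variance of each feature, and the two off-diagonal entries are equal, which is why a covariance matrix is always symmetric.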
Now, once we have the covariance matrix, we can proceed to the linear transformation.
To transform a data point linearly, we just multiply its (x, y) vector by the covariance matrix, i.e. (x, y) becomes (7x + 3y, 3x + 9y). Note that this gives a new point in the same 2D space; the reduction to a lower dimension comes later, at the projection step.
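As a quick check of what multiplying by the covariance matrix does, here is a small sketch using the example matrix from above. Multiplying a point (x, y) by the matrix gives (7x + 3y, 3x + 9y), which is still a 2D point. The sample `point` is an arbitrary choice for illustration.

```python
import numpy as np

# Example covariance matrix from the text.
C = np.array([
    [7.0, 3.0],
    [3.0, 9.0],
])

# An arbitrary data point (x, y) = (1, 2), chosen only for illustration.
point = np.array([1.0, 2.0])

# Matrix-vector product: (7x + 3y, 3x + 9y).
transformed = C @ point

print(transformed)  # [13. 21.]
```

The output has the same number of coordinates as the input, so this step only stretches and rotates the data.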
And thus we have the linear transformation of the dataset. Under this transformation most vectors change direction, but we'll find some vectors whose direction stays the same and which are only scaled, i.e. [covariance matrix × vector = lambda × vector].
Those vectors are known as eigenvectors, and the value they are scaled by (lambda) is known as the eigenvalue. Because a covariance matrix is always symmetric, its eigenvectors can be chosen to be perpendicular to each other. The eigenvector with the largest eigenvalue points in the direction along which the data varies most.
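Here is a minimal sketch of finding the eigenvectors and eigenvalues of the example matrix with NumPy. It uses `np.linalg.eigh`, which is meant for symmetric matrices and returns eigenvalues in ascending order; the variable names are illustrative.

```python
import numpy as np

# Example covariance matrix from the text.
C = np.array([
    [7.0, 3.0],
    [3.0, 9.0],
])

# eigh is for symmetric matrices; eigenvectors are the COLUMNS of the result.
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Check the defining relation C v = lambda v for each eigenvector.
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    assert np.allclose(C @ v, eigenvalues[i] * v)

# Eigenvectors of a symmetric matrix are perpendicular: dot product is ~0.
dot = eigenvectors[:, 0] @ eigenvectors[:, 1]

print(eigenvalues)
print(eigenvectors)
print(dot)
```

For this matrix the eigenvalues are 8 - sqrt(10) and 8 + sqrt(10), roughly 4.84 and 11.16, and the eigenvector paired with the larger one is the direction of greatest variance.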
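Now we choose the number of dimensions to keep, i.e. how many eigenvectors to use. We keep the eigenvectors with the largest eigenvalues, because an eigenvalue measures how much of the dataset's variance (information) lies along its eigenvector.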
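One common way to decide how many eigenvectors to keep is to look at the share of total variance each eigenvalue accounts for. The sketch below does this for the example matrix; the name `explained_ratio` is just a label used here.

```python
import numpy as np

# Example covariance matrix from the text.
C = np.array([
    [7.0, 3.0],
    [3.0, 9.0],
])

eigenvalues, _ = np.linalg.eigh(C)

# Sort from largest to smallest so the most informative direction comes first.
eigenvalues_sorted = np.sort(eigenvalues)[::-1]

# Fraction of the total variance captured by each eigenvector.
explained_ratio = eigenvalues_sorted / eigenvalues_sorted.sum()

# Running total: how much variance the first k eigenvectors capture together.
cumulative = np.cumsum(explained_ratio)

print(explained_ratio)
print(cumulative)
```

Here the first eigenvector alone captures about 70% of the variance, so keeping only it would turn the 2D data into 1D while losing roughly 30% of the spread.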
Finally, we create a plane spanned by these eigenvectors and project the data points onto it... voila!
We have our principal components (PCs)!
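Putting the steps together, here is a minimal end-to-end sketch in NumPy. The function `pca` and the random 3-feature dataset are hypothetical, written for this post to illustrate the idea; in practice a library implementation such as scikit-learn's `PCA` is usually the safer choice.

```python
import numpy as np

def pca(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    # 1. Center the data so each feature has zero mean.
    X_centered = X - X.mean(axis=0)

    # 2. Covariance matrix of the features.
    cov = np.cov(X_centered, rowvar=False)

    # 3. Eigen-decomposition (eigh: symmetric input, ascending eigenvalues).
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # 4. Reorder so the largest eigenvalue comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # 5. Keep the top eigenvectors and project the centered data onto them.
    components = eigenvectors[:, :n_components]
    projected = X_centered @ components

    explained_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    return projected, components, explained_ratio

# Hypothetical correlated 3-feature dataset, generated only for this demo.
rng = np.random.default_rng(0)
mixing = np.array([
    [3.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.5, 0.2, 0.1],
])
X = rng.normal(size=(200, 3)) @ mixing

projected, components, explained_ratio = pca(X, n_components=2)

print(projected.shape)    # (200, 2)
print(explained_ratio)
```

Each column of `projected` is one principal component: the original 3 features have been reduced to 2 new ones, ordered by how much variance they capture.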
Thanks for reading.
Written by
Md.Anisur Rahman
I am Tanim, a graduate from Bangladesh. My areas of interest are machine learning, artificial intelligence, data science, and web development.