What is PCA in Machine Learning?

Kishar Nath

PCA (Principal Component Analysis) is a dimensionality reduction technique used in data science. It is an unsupervised learning technique, meaning it does not rely on labeled data. It has several applications, such as image compression, data visualization, and exploratory data analysis.

To understand PCA, we have to understand projection.

Here, the projection of OX onto OZ is P, so P is given by

$$\frac{X^TZ}{||Z||^2}Z$$
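The projection formula above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `project` and the sample vectors `x` and `z` are assumptions made for this example, not part of the original post.

```python
def project(x, z):
    """Project vector x onto the line spanned by vector z."""
    dot_xz = sum(xi * zi for xi, zi in zip(x, z))  # X^T Z
    norm_sq = sum(zi * zi for zi in z)             # ||Z||^2
    if norm_sq == 0:
        raise ValueError("z must be a non-zero vector")
    scale = dot_xz / norm_sq                       # scalar in front of Z
    return [scale * zi for zi in z]


# Hypothetical sample vectors, chosen only for illustration.
x = [3.0, 1.0]
z = [1.0, 1.0]

p = project(x, z)
print(p)  # [2.0, 2.0]
```

Here X^T Z = 4 and ||Z||^2 = 2, so the scalar is 2 and P = 2 · Z.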

Why do we use projection?

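The shortest distance from X to the line is ||XP||, the length of the segment from X to its projection P. Every other point on the line is farther from X than P is.

We use this concept in PCA.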
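The shortest-distance property can be checked numerically. The sketch below assumes the same hypothetical vectors as before (`x = [3, 1]`, `z = [1, 1]`) and compares ||XP|| with the distance from X to a handful of other points on the line. The helper names `project` and `distance` are illustrative assumptions.

```python
import math


def project(x, z):
    """Project vector x onto the line spanned by vector z."""
    scale = sum(a * b for a, b in zip(x, z)) / sum(b * b for b in z)
    return [scale * b for b in z]


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Hypothetical sample vectors, chosen only for illustration.
x = [3.0, 1.0]
z = [1.0, 1.0]

p = project(x, z)
xp = distance(x, p)  # ||XP||

# A few other points t * z on the same line (t = 2 would give P itself).
other_points = [[t * zi for zi in z] for t in (0.0, 1.0, 1.5, 2.5, 3.0)]
other_distances = [distance(x, q) for q in other_points]

# The segment XP is perpendicular to the line, so (X - P) . Z is zero.
residual = [xi - pi for xi, pi in zip(x, p)]
dot_residual_z = sum(r * zi for r, zi in zip(residual, z))

print(xp)                             # distance from X to P
print(all(d > xp for d in other_distances))
print(dot_residual_z)
```

This only tests a few sample points rather than proving the claim, but it shows the idea: moving away from P along the line always increases the distance to X.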

Now let's see how we can reduce the dimensionality of data. Imagine we have a dataset of 5 data points in 2D. Each point has 2 coordinates, so we have 10 integers in total. If storing one integer takes 1 byte of memory, we need 10 bytes. With PCA, we can reduce this to 7 bytes. How?

If we want to store these 5 data points in a file, we can simply store the vector

$$\overrightarrow{AB} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

Then we store one constant for each of the points a, b, c, d and e. Why? Because we can recover a, b, c, d or e by multiplying its constant by the vector:

$$a/b/c/d/e = \text{constant} \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

So we have to store 5 constants and one vector that has 2 integers. In total, we store 7 bytes of data, whereas previously we had to store 10 bytes.
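The counting argument above can be sketched in code. The five points below are hypothetical values chosen for illustration, since the original coordinates are not given in the text. The sketch assumes that every point lies exactly on the line through the origin in the direction (1, 1). With real data the points are usually only close to such a line, so the reconstruction is approximate.

```python
# Hypothetical dataset: five 2D points that all lie on the line y = x.
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]

# The single direction vector we store (the vector AB from the text).
direction = (1, 1)

# Each constant is the projection coefficient X^T Z / ||Z||^2.
norm_sq = sum(d * d for d in direction)
constants = [
    sum(pi * di for pi, di in zip(point, direction)) / norm_sq
    for point in points
]

# Rebuild every point as constant * direction.
reconstructed = [tuple(c * d for d in direction) for c in constants]

original_count = 2 * len(points)                    # 5 points x 2 coordinates
compressed_count = len(direction) + len(constants)  # 2 + 5

print(constants)                 # [1.0, 2.0, 3.0, 4.0, 5.0]
print(reconstructed == points)   # True
print(original_count, compressed_count)  # 10 7
```

At 1 byte per number, that is 10 bytes before and 7 bytes after, matching the count in the text. The saving grows with the number of points, because each extra point costs one number instead of two.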
