Machine Learning Basics: Probability & Statistics

Hey everyone, Dhairya here.

After 4 days of linear algebra + calculus, today I took a break from slopes and gradients and stepped into the world of probability and statistics. These are the tools ML uses to reason about uncertainty and to describe how data is distributed.

🔢 What I Learned Today

Random Variables & Distributions – the difference between discrete and continuous random variables, and why distributions (like uniform and normal) matter in ML.
Mean, Median, Mode – the core measures of central tendency, and why a single outlier can drag the mean while barely moving the median.
Variance & Standard Deviation – how spread out the data is around its mean, which helps when judging how much a model’s errors or scores vary.
Probability Rules – the basics of conditional probability and Bayes’ theorem, the foundation of probabilistic models.
ML Connection – everything from model evaluation metrics (accuracy, precision, recall) to Bayesian ML is rooted in probability & stats.
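
To make the central tendency and spread points above concrete, here is a minimal sketch using Python's standard `statistics` module. The `summarize` helper and the numbers are made up for illustration and are not taken from the Day 5 notebook; NumPy's `np.mean`, `np.median`, `np.var`, and `np.std` would give the same population-level results.

```python
import statistics


def summarize(values):
    """Return measures of central tendency and spread for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Population variance / standard deviation (divide by n, not n - 1).
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


# Hypothetical data: five similar values, then the same data plus one outlier.
clean = [30, 32, 32, 35, 36]
with_outlier = clean + [500]

print(summarize(clean))
print(summarize(with_outlier))
```

With these numbers the mean jumps from 33 to roughly 110.8 once the outlier is added, while the median only moves from 32 to 33.5. That is the sense in which outliers can "break" the mean.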

🌱 Reflections

Today’s session drove home that ML isn’t just math: it’s statistics in action. When we say a model is “confident,” we mean it assigns a high probability to its prediction.
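
One way to see "confidence as probability" is Bayes' theorem, P(A | B) = P(B | A) · P(A) / P(B). The sketch below applies it to a hypothetical spam filter; the function name `posterior` and all three input rates are assumptions chosen for illustration, not measurements from any real model.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem for a binary event A given evidence B.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    Returns P(A | B).
    """
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of mail is spam, the filter flags 90% of spam
# and wrongly flags 5% of legitimate mail.
p_spam_given_flag = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p_spam_given_flag, 3))
```

Under these assumed rates, a flagged message is spam only about 15% of the time, even though the filter catches 90% of spam. This posterior is the same quantity that precision estimates from data: the share of positive predictions that are actually positive.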

I also found that visualizing distributions with Python helped me connect theory with practice.
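
As a rough idea of what that kind of visualization can look like, here is a small sketch with NumPy and Matplotlib. It is not the code from the notebook: the function names, sample size, seed, and bin count are all assumptions picked for the example.

```python
import numpy as np


def sample_distributions(n=10_000, seed=0):
    """Draw n samples each from a uniform and a standard normal distribution."""
    rng = np.random.default_rng(seed)  # seeded so the run is reproducible
    return {
        "uniform": rng.uniform(0.0, 1.0, size=n),
        "normal": rng.normal(loc=0.0, scale=1.0, size=n),
    }


def plot_histograms(samples, bins=40):
    """Plot one density histogram per distribution, side by side."""
    # Imported here so the sampling part still runs without Matplotlib installed.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(samples), figsize=(10, 4))
    for ax, (name, values) in zip(np.atleast_1d(axes), samples.items()):
        ax.hist(values, bins=bins, density=True)
        ax.set_title(name)
        ax.set_xlabel("value")
        ax.set_ylabel("density")
    fig.tight_layout()
    return fig


samples = sample_distributions()
for name, values in samples.items():
    print(name, "mean:", values.mean(), "std:", values.std())
```

Calling `plot_histograms(samples)` in a notebook should show a flat histogram for the uniform samples and a bell shape for the normal ones, which makes the difference between the two distributions easy to see at a glance.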

💻 Notebook

I’ve uploaded my Day 5 notebook (covering probability basics and statistical measures with NumPy/Matplotlib) here:
👉 GitHub Link – Day 5 Notebook

📚 Resources

🎥 YouTube

🌐 Websites

GeeksforGeeks – Probability for Machine Learning
Towards Data Science – Statistics for Data Science
Khan Academy – Intro to Statistics

🎯 What’s Next?

For Day 6, I’ll dive deeper into probability distributions (Normal, Binomial, Bernoulli, etc.) and how they’re used in ML models.

See you tomorrow 👋
— Dhairya
