Machine Learning Fundamentals

Trinita Roy

Once a shadowy realm within academic circles and research papers, Machine Learning (ML) has emerged as a driving force behind today's technological marvels. From the helpful voices of our smart assistants to the vigilant eyes of fraud detection systems and the personalized touch of recommendation engines, ML quietly powers some of the most intelligent systems we interact with daily.

What Exactly is Machine Learning?

At its heart, Machine Learning is the art and science of enabling computers to learn from data, discern patterns, and make predictions without relying on explicitly programmed rules. Unlike traditional programming, where every step is dictated by human-written code, ML flips this paradigm: data becomes the instructor, and the model becomes the student.

Consider teaching a computer to differentiate between cats and dogs by showing it hundreds of labeled images. Over time, the system begins to recognize subtle visual cues – perhaps the shape of the ears or the structure of the snout. Eventually, it can identify cats and dogs in entirely new images it has never encountered before. That, in essence, is machine learning at work.

The Two Fundamental Approaches: Supervised vs. Unsupervised Learning

Machine Learning is broadly categorized into two main approaches:

  • Supervised Learning: Think of this as learning with a "teacher." The model is trained on a dataset where each data point has a corresponding "correct answer" or label. For instance, the input might be the size of a house and the label its selling price. The model learns to map these inputs to the desired outputs.

    • Examples: Linear Regression, Logistic Regression, Decision Trees, Support Vector Machines.
  • Unsupervised Learning: Here, the model is left to explore on its own. It's given unlabeled data and tasked with finding inherent structure, such as grouping similar data points together or reducing the complexity of the data.

    • Examples: K-Means Clustering, Principal Component Analysis (PCA), DBSCAN.

These two fundamental paradigms underpin the vast majority of Machine Learning applications you see in action.
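To make the distinction concrete, here is a minimal sketch of both paradigms (the use of scikit-learn and synthetic data is my choice, not something from the original): the classifier is handed the labels, while the clustering algorithm must discover the groups on its own.

```python
# A minimal sketch contrasting supervised and unsupervised learning
# with scikit-learn (assumed installed); the data is synthetic.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic 2-D data: 200 points drawn from 3 groups.
X, y = make_blobs(n_samples=200, centers=3, random_state=42)

# Supervised: the labels y act as the "teacher".
clf = LogisticRegression().fit(X, y)
print("Supervised predictions:", clf.predict(X[:5]))

# Unsupervised: the same data, but the model never sees y.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("Discovered cluster ids:", km.labels_[:5])
```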

Training, Testing, and the Crucial Need for Evaluation

As we venture further into the world of building models, it's essential to discuss how we train them effectively and, even more importantly, how we assess their reliability.

Train/Test Split

To accurately gauge how well a model can generalize to new, unseen data, we typically divide our available dataset into two distinct parts:

  • Training Set: This is the portion of the data we use to "teach" our model.

  • Testing Set: This is a separate, held-out portion that we use exclusively to evaluate the model's performance on data it hasn't seen during training.

A common split might be an 80/20 or 70/30 ratio, depending on the overall size of the dataset.
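As a quick illustration, here is a minimal sketch of an 80/20 split; it assumes scikit-learn's train_test_split and uses a toy dataset:

```python
# A minimal sketch of an 80/20 train/test split.
from sklearn.model_selection import train_test_split
import numpy as np

X = np.arange(100).reshape(-1, 1)  # 100 samples, 1 feature
y = 3 * X.ravel() + 7              # a simple linear target

# test_size=0.2 holds out 20% of the data; random_state makes it reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 80 20
```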

Cross-Validation: Going Beyond a Single Split

Relying on a single train/test split can sometimes lead to a biased evaluation due to the random nature of the split. This is where k-Fold Cross-Validation comes in handy:

  1. The dataset is divided into k equal parts (or "folds").

  2. The model is trained on k−1 of these folds and validated on the remaining one.

  3. This process is repeated k times, with each fold taking a turn as the validation set.

  4. Finally, the performance metrics from all k validations are averaged to give a more robust estimate of the model's generalization ability.

Cross-validation provides a more reliable assessment of a model's performance, reducing the impact of a particularly "lucky" or "unlucky" data split.
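A minimal sketch of the procedure, assuming scikit-learn's cross_val_score and synthetic regression data:

```python
# A minimal sketch of 5-fold cross-validation.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=42)

# cv=5 trains on 4 folds and validates on the 5th, rotating 5 times.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("Per-fold R^2:", scores.round(3))
print("Mean R^2:", scores.mean().round(3))
```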

Loss Functions: Quantifying Model Mistakes

When our models make predictions, we need a way to measure how far off those predictions are. This is the role of loss functions – they assign a penalty for incorrect predictions.

For regression tasks (where we're predicting a continuous value), two common loss functions are:

  • Mean Squared Error (MSE):

    $$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

    Here, $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of data points.

    • It squares the errors, giving more weight to larger deviations.

    • It is sensitive to outliers.

    • Its smooth and differentiable nature makes it well-suited for optimization techniques like gradient descent.

  • Mean Absolute Error (MAE):

    $$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

    • It takes the absolute value of the errors.

    • It is less sensitive to outliers compared to MSE.

    • It doesn't disproportionately penalize large errors.

The choice between MSE and MAE often depends on the specific problem and the sensitivity to outliers.
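Both metrics are straightforward to compute by hand. The following sketch does so with NumPy on a tiny made-up example, matching the formulas above:

```python
# A minimal sketch computing MSE and MAE directly from their definitions.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred
mse = np.mean(errors ** 2)     # squares the errors: large misses dominate
mae = np.mean(np.abs(errors))  # absolute errors: more robust to outliers

print(f"MSE: {mse:.3f}")  # 0.875
print(f"MAE: {mae:.3f}")  # 0.750
```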

Descent into Gradients: The Magic of Optimization

Once we have a way to measure our model's errors (using a loss function), the next crucial step is to minimize this error. This is where Gradient Descent, a fundamental optimization algorithm in Machine Learning, comes into play.

How Gradient Descent Works:

  1. We start with initial, often random, values for our model's parameters (like the slope and intercept in linear regression).

  2. We calculate the gradient of the loss function with respect to each of these parameters. The gradient tells us the direction of the steepest increase in the loss.

  3. We then take small steps in the opposite direction of the gradient. This intuitively moves us towards lower values of the loss function.

  4. We repeat this process iteratively until the loss function converges to a minimum, meaning our model's predictions are as close to the actual values as possible.

Think of it like navigating down a foggy mountain. You can't see the entire landscape, but by feeling the slope of the ground beneath your feet (the gradient), you can take steps that lead you downwards towards the lowest point.

💡 In the context of Linear Regression, gradient descent helps us find the optimal values for the model's weights (coefficients) so that the resulting line best fits the data.
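The following sketch walks through those four steps to fit a line $y = wx + b$ by minimizing MSE. The learning rate and iteration count are illustrative choices of mine, and the data is synthetic:

```python
# A minimal sketch of gradient descent for simple linear regression.
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 100)  # true slope 2, intercept 1, plus noise

w, b = 0.0, 0.0  # step 1: start from arbitrary parameter values
lr = 0.01        # learning rate: the size of each downhill step

for _ in range(1000):
    y_hat = w * x + b
    # Step 2: gradients of the MSE loss with respect to w and b.
    grad_w = -2 * np.mean(x * (y - y_hat))
    grad_b = -2 * np.mean(y - y_hat)
    # Step 3: move against the gradient, towards lower loss.
    w -= lr * grad_w
    b -= lr * grad_b

# Step 4: after enough iterations, the loss settles near a minimum.
print(f"learned w={w:.2f}, b={b:.2f}")  # close to 2 and 1
```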

The Generalization Tradeoff: Underfitting vs. Overfitting

No discussion about building models is complete without understanding the challenges of underfitting and overfitting – two common pitfalls for ML practitioners.

  • Underfitting occurs when your model is too simplistic to capture the underlying patterns in the data. Imagine trying to fit a straight line to data that clearly follows a curve. The line won't be a good representation of the data.

  • Overfitting, on the other hand, happens when your model learns not just the actual signal in the data but also the random noise. While an overfit model might perform exceptionally well on the training data, it will likely perform poorly on new, unseen data because it has essentially memorized the training set, including its peculiarities.

The art of machine learning often lies in finding the right balance – building a model that is complex enough to learn the meaningful patterns without being so complex that it memorizes the noise.
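One common way to see this tradeoff in action (a sketch; the degree choices and synthetic data are purely illustrative) is to fit polynomials of increasing degree to noisy data and compare training and test error: the too-simple model does badly on both, while the too-flexible one excels on training data but degrades on the held-out set.

```python
# A minimal sketch contrasting underfitting and overfitting.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 60).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 1, 60)  # quadratic signal plus noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 2, 15):  # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr))
    te = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree {degree:>2}: train MSE {tr:.2f}, test MSE {te:.2f}")
```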
