One Brain, Three Games: How Neural Networks Solve Regression, Multiclass, and Multilabel Tasks


Neural networks are like pro gamers — they can play different games with the same brain.
The rules change depending on the game, but the process is always:
Guess → Check → Learn → Try Again.
Here’s the ultimate side-by-side so you can see exactly how the same network behaves when predicting numbers, picking one category, or handling multiple labels at once.
The Quick-Look Table
| Aspect | Regression | Multiclass | Multilabel |
| --- | --- | --- | --- |
| Objective | Predict a number | Pick one category | Predict multiple yes/no labels |
| Activation | None | Softmax | Sigmoid |
| Loss | MSE (mean squared error) | CCE (categorical cross-entropy) | BCE (binary cross-entropy) |
| Example | Predict salary | Predict dog breed | Predict fleas & allergies |
How the Process Flows
🧮 Regression (Linear + MSE)
We want to predict a continuous value: here, a salary based on years of experience (there's a code sketch after these steps).
Input: The model sees x = 2.0 (years of experience).
Multiply & Add: Weight (10,000) × input (2.0) + bias (5,000) = 25,000. This is its best guess for the salary.
Activation: None, because we’re predicting a number, not a probability.
Loss: Compare predicted (25,000) to actual (50,000) using MSE. The bigger the gap, the bigger the penalty.
Gradient: Calculate how much the weight was responsible for the mistake — this tells us which way to adjust.
Update: Subtract a fraction of that gradient (scaled by learning rate) from the weight and bias.
Repeat: The model keeps cycling until the gap between prediction and reality is as small as possible.
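Here's that full cycle as a minimal Python sketch, using the exact numbers from the steps above. The learning rate is an assumption (the article doesn't pin one down), and a real model would loop over many (x, y) pairs instead of one.

```python
# One regression training step: guess, check (MSE), learn, try again.
x, y = 2.0, 50_000.0        # years of experience, true salary
w, b = 10_000.0, 5_000.0    # current weight and bias
lr = 0.01                   # learning rate (assumed for illustration)

y_hat = w * x + b           # guess: 25,000
loss = (y_hat - y) ** 2     # MSE for one example: 625,000,000

# How much each parameter contributed to the mistake
dw = 2 * (y_hat - y) * x    # -100,000
db = 2 * (y_hat - y)        # -50,000

# Step against the gradient, scaled by the learning rate
w -= lr * dw                # 10,000 + 1,000 = 11,000
b -= lr * db                # 5,000 + 500 = 5,500
```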
🎯 Multiclass (Softmax + CCE)
We want the model to choose one correct category: here, the breed of a dog (code sketch below the steps).
Input: The model gets x = 2.0 (maybe a simplified feature, like ear length score).
Multiply & Add: For each class, weight × input + bias → logits like [1.0, 0.6, −0.4].
Activation: Softmax turns these scores into probabilities that sum to 1: [0.522, 0.350, 0.129].
Loss: If the correct breed is class 0, CCE = −log(0.522) ≈ 0.65. A low probability for the correct class means a higher penalty.
Gradient: For each class, find (Predicted − Actual) × Input to see how much to shift the weights.
Update: Adjust weights and biases for all classes based on gradients.
Repeat: Training continues until the probability for the correct class is consistently high.
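The same cycle in code for the multiclass game, a sketch built from the toy logits above (in a real network the logits would come from weight × input + bias per class):

```python
import numpy as np

logits = np.array([1.0, 0.6, -0.4])   # raw scores, one per breed
y = np.array([1.0, 0.0, 0.0])         # one-hot target: class 0 is correct

# Softmax: exponentiate, then normalize so the probabilities sum to 1
probs = np.exp(logits) / np.exp(logits).sum()   # ~[0.522, 0.350, 0.129]

# Categorical cross-entropy: -log of the true class's probability
loss = -np.log(probs[0])              # ~0.65

# Gradient w.r.t. each logit is simply (predicted - actual);
# multiply by the input to get each class's weight gradient
x = 2.0
grad_w = (probs - y) * x              # ~[-0.957, 0.699, 0.257]
```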
✅ Multilabel (Sigmoid + BCE)
We want to answer several yes/no questions for each input: here, whether a pet has fleas, allergies, or both (sketch after the steps).
Input: The model gets x = 2.0 (maybe a feature like parasite risk score).
Multiply & Add: For each label, weight × input + bias → logits like [0.8, −0.2].
Activation: Sigmoid turns each score into an independent probability: [0.690, 0.450].
Loss: Compare each probability to the actual answer (1 for fleas, 0 for allergies) using BCE, then average.
Gradient: For each label, find (Predicted − Actual) × Input.
Update: Adjust weights and biases for each label independently.
Repeat: Continue until the predictions for each label are accurate most of the time.
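And the multilabel version, a sketch using the logits and answers from the steps above. The key difference from multiclass: each label gets its own sigmoid and its own yes/no loss.

```python
import numpy as np

logits = np.array([0.8, -0.2])   # raw scores: [fleas, allergies]
y = np.array([1.0, 0.0])         # true answers: fleas yes, allergies no

# Sigmoid: an independent probability per label (they need not sum to 1)
probs = 1.0 / (1.0 + np.exp(-logits))   # ~[0.690, 0.450]

# Binary cross-entropy per label, then averaged
bce = -(y * np.log(probs) + (1 - y) * np.log(1 - probs))
loss = bce.mean()                # ~(0.371 + 0.598) / 2 ≈ 0.485

# Gradient w.r.t. each logit is again (predicted - actual)
x = 2.0
grad_w = (probs - y) * x         # ~[-0.620, 0.900]
```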
💡 Big Takeaway: The network’s learning engine stays the same — only the activation, loss, and update details change depending on the game.
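To make that concrete, here's a hedged sketch of the shared engine. One detail worth flagging: with the common convention of minimizing ½·MSE, all three losses give the same logit gradient, (predicted − actual), so only the activation actually swaps. The names here are illustrative, not from a library.

```python
import numpy as np

def train_step(w, b, x, y, activation, lr=0.01):
    z = w * x + b               # multiply & add: same for every game
    p = activation(z)           # the only game-specific piece
    grad = p - y                # (predicted - actual) works for all three
    return w - lr * grad * x, b - lr * grad   # update: same for every game

def identity(z): return z                            # regression
def softmax(z): return np.exp(z) / np.exp(z).sum()   # multiclass
def sigmoid(z): return 1 / (1 + np.exp(-z))          # multilabel
```

Pass identity, softmax, or sigmoid into train_step and nothing else changes: same guess, same check, same learn.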