How Neural Networks Tackle Regression, Multiclass, and Multilabel Problems

Neural networks are like pro gamers — they can play different games with the same brain.
The rules change depending on the game, but the process is always:
Guess → Check → Learn → Try Again.
Here’s the ultimate side-by-side so you can see exactly how the same network behaves when predicting numbers, picking one category, or handling multiple labels at once.
The Quick-Look Table
| Aspect | Regression | Multiclass | Multilabel |
| --- | --- | --- | --- |
| Objective | Predict a number | Pick one category | Predict multiple yes/no labels |
| Activation | None | Softmax | Sigmoid |
| Loss | MSE | CCE | BCE |
| Example | Predict salary | Predict dog breed | Predict fleas & allergies |
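If you prefer reading code to tables, here's the same mapping as a tiny NumPy sketch - the function names are mine, purely for illustration, not any particular library's API:

```python
import numpy as np

# A minimal sketch of the three output "heads" from the table above.

def regression_head(outputs, targets):
    preds = outputs                                # no activation: keep the raw numbers
    return preds, np.mean((preds - targets) ** 2)  # MSE

def multiclass_head(logits, true_class):
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax: probabilities sum to 1
    return probs, -np.log(probs[true_class])       # categorical cross-entropy

def multilabel_head(logits, targets):
    probs = 1 / (1 + np.exp(-logits))              # sigmoid: independent per label
    bce = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    return probs, bce.mean()                       # binary cross-entropy, averaged
```

Real frameworks usually fold the activation into the loss function and expect raw logits, but keeping them separate here makes the comparison easier to see.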
How the Process Flows
Regression (Linear + MSE)
Teaching a Model to Guess Numbers
Imagine you're trying to teach a model to predict someone's salary based on how many years they've worked.
Input: You show the model "This person has worked for 2 years. What do you think they earn?"
The Model's Math: The model has learned some "rules" (we call these weights and bias). It thinks: "Hmm, for every year of experience, people usually earn about $10,000 more, plus there's a base salary of $5,000." So it calculates: 2 years × $10,000 + $5,000 = $25,000.
Activation: None needed here! Unlike other problems where we need to convert numbers to percentages, we just want the raw number prediction.
Loss: Reality check time! The person actually earns $50,000, not $25,000. We calculate the loss using MSE (Mean Squared Error) - think of it as a "mistake detector" that gets really upset about big errors. The bigger the gap between our guess and reality, the bigger the penalty.
Gradient: Now the model figures out exactly how wrong each of its "rules" was. The gradient tells us: "Your $10,000-per-year rule was way too low, and here's exactly how much you need to change it."
Update: The model tweaks its rules based on the gradient - maybe now it thinks each year is worth $12,000 instead of $10,000, and adjusts the base salary too.
Repeat: The model keeps seeing more examples, calculating loss, finding gradients, and updating until it gets really good at guessing salaries.
It's like learning to throw a basketball - you miss at first, but you keep adjusting your aim until you can hit the basket consistently!
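Here's one pass through that loop with the salary example as a minimal sketch. The numbers match the walkthrough; the learning rate is an arbitrary pick on my part, chosen so the update lands near the "$12,000 per year" figure above.

```python
# One training step for the salary example: y = w * x + b, with MSE loss.
x, y_true = 2.0, 50_000.0      # 2 years of experience, actual salary
w, b = 10_000.0, 5_000.0       # the model's current "rules"
lr = 0.02                      # arbitrary learning rate, picked for illustration

# Guess (no activation - we want the raw number)
y_pred = w * x + b             # 25,000

# Check (MSE on a single example)
loss = (y_pred - y_true) ** 2  # 625,000,000 - big miss, big penalty

# Learn (gradient of the loss with respect to each rule)
d_pred = 2 * (y_pred - y_true)     # -50,000
d_w, d_b = d_pred * x, d_pred      # -100,000 and -50,000

# Try again (update the rules)
w -= lr * d_w                  # 10,000 -> 12,000 per year
b -= lr * d_b                  # 5,000  -> 6,000 base salary
print(y_pred, loss, w, b)
```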
Multiclass (Softmax + CCE)
Teaching a Model to Pick One Right Answer
Now let's teach a model to look at a dog and pick exactly one breed from several options - like choosing between Golden Retriever, Poodle, or Bulldog.
Input: You show the model a dog with an ear length score of 2.0 (just one feature, to keep things easy).
The Model's Math: The model has separate "rules" for each dog breed. It calculates a score for each: Golden Retriever gets 1.0, Poodle gets 0.6, and Bulldog gets -0.4. These raw scores are called logits.
Activation: Here's where Softmax comes in! It's like a translator that converts those raw scores into proper probabilities that add up to 100%.
So [1.0, 0.6, -0.4] becomes [52.2%, 35.0%, 12.9%]. The model is saying "I'm 52% sure it's a Golden Retriever."
Loss: Plot twist - it was actually a Golden Retriever (class 0)! We calculate the loss using CCE (Categorical Cross-Entropy). Since the model was only 52.1% confident about the right answer, it gets penalized. The less confident it was about the correct answer, the bigger the penalty.
Gradient: The model figures out how wrong each breed's "rules" were. The gradient says: push the Golden Retriever score up and nudge the Poodle and Bulldog scores down.
Update: The model adjusts all its breed-recognition rules based on the gradients - maybe it gets better at spotting Golden Retriever features.
Repeat: The model keeps seeing more dogs, calculating loss, finding gradients, and updating until it can confidently pick the right breed most of the time.
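Here's the dog-breed step as a small NumPy sketch. The logits and the "class 0 = Golden Retriever" label come straight from the walkthrough; everything else is just illustrative.

```python
import numpy as np

logits = np.array([1.0, 0.6, -0.4])   # Golden Retriever, Poodle, Bulldog
true_class = 0                        # it really was a Golden Retriever

# Softmax: turn raw scores into probabilities that sum to 1
probs = np.exp(logits) / np.exp(logits).sum()
print(probs)                          # ~[0.522, 0.350, 0.129]

# Categorical cross-entropy: penalize low confidence in the correct class
loss = -np.log(probs[true_class])
print(loss)                           # ~0.65

# Gradient w.r.t. the logits: predicted probabilities minus the one-hot target
one_hot = np.eye(3)[true_class]
grad = probs - one_hot
print(grad)                           # ~[-0.478, 0.350, 0.129]
```

Notice the gradient is negative only for the correct class, so the update pushes the Golden Retriever score up and the other two down.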
Multilabel (Sigmoid + BCE)
Teaching a Model to Answer Multiple Yes/No Questions
Imagine you're teaching a model to be like a pet doctor who needs to check for multiple health issues at once - like asking "Does this pet have fleas? Does it have allergies?" The pet could have both, neither, or just one!
Input: You show the model a pet with a parasite risk score of 2.0.
The Model's Math: The model has separate "rules" for each health condition. It calculates a score for each: Fleas gets 0.8, Allergies gets -0.2. These raw scores are called logits.
Activation: Now Sigmoid jumps in! Unlike Softmax (which made everything add up to 100%), Sigmoid treats each question independently. It converts [0.8, -0.2] into [69.0%, 45.0%]. The model is saying "69% chance of fleas, 45% chance of allergies" - and these don't need to add up to 100% because they're separate questions!
Loss: Reality check! This pet actually has fleas (yes = 1) but no allergies (no = 0). We use BCE (Binary Cross-Entropy) to calculate the loss for each condition separately, then average them. The model gets penalized for being wrong about either condition.
Gradient: The model figures out how to improve each health condition detector separately. The gradient shows: "You were pretty good at detecting fleas, but you were too confident about allergies when there weren't any."
Update: The model adjusts its rules for each condition independently - improving its flea-detection without messing up its allergy-detection.
Repeat: The model keeps seeing more pets, calculating loss for each condition, finding gradients, and updating until it gets really good at answering all the yes/no questions accurately.
It's like learning to be a multi-tasking friend who can tell if someone is hungry AND tired at the same time - each skill develops independently!
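And here's the pet-doctor step in the same style - the logits and yes/no labels are from the walkthrough, the rest is illustrative NumPy.

```python
import numpy as np

logits = np.array([0.8, -0.2])   # fleas, allergies
targets = np.array([1.0, 0.0])   # has fleas, no allergies

# Sigmoid: each label gets its own independent probability
probs = 1 / (1 + np.exp(-logits))
print(probs)                     # ~[0.690, 0.450]

# Binary cross-entropy per label, then averaged
per_label = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
loss = per_label.mean()
print(per_label, loss)           # ~[0.371, 0.598], mean ~0.485

# Gradient of each label's own BCE term w.r.t. its logit: probability minus the yes/no answer
grad = probs - targets
print(grad)                      # ~[-0.310, 0.450]
```

Each label's gradient depends only on that label's own prediction, which is exactly why the flea detector and the allergy detector can improve independently.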
💡 Big Takeaway: The network’s learning engine stays the same — only the activation, loss, and update details change depending on the game.