How Neural Networks Tackle Regression, Multiclass, and Multilabel Problems


Neural networks are like pro gamers — they can play different games with the same brain.
The rules change depending on the game, but the process is always:

Guess → Check → Learn → Try Again.

Here’s the ultimate side-by-side so you can see exactly how the same network behaves when predicting numbers, picking one category, or handling multiple labels at once.

The Quick-Look Table

| Step | Regression | Multiclass | Multilabel |
| --- | --- | --- | --- |
| Objective | Predict a number | Pick one category | Predict multiple yes/no labels |
| Activation | None | Softmax | Sigmoid |
| Loss | MSE | CCE | BCE |
| Example | Predict salary | Predict dog breed | Predict fleas & allergies |

How the Process Flows

Regression (Linear + MSE)

Teaching a Model to Guess Numbers

Imagine you're trying to teach a model to predict someone's salary based on how many years they've worked.

  1. Input: You show the model "This person has worked for 2 years. What do you think they earn?"

  2. The Model's Math: The model has learned some "rules" (we call these weights and bias). It thinks: "Hmm, for every year of experience, people usually earn about $10,000 more, plus there's a base salary of $5,000." So it calculates: 2 years × $10,000 + $5,000 = $25,000.

  3. Activation: None needed here! Unlike other problems where we need to convert numbers to percentages, we just want the raw number prediction.

  4. Loss: Reality check time! The person actually earns $50,000, not $25,000. We calculate the loss using MSE (Mean Squared Error) - think of it as a "mistake detector" that gets really upset about big errors. The bigger the gap between our guess and reality, the bigger the penalty.

  5. Gradient: Now the model figures out exactly how wrong each of its "rules" was. The gradient tells us: "Your $10,000-per-year rule was way too low, and here's exactly how much you need to change it."

  6. Update: The model tweaks its rules based on the gradient - maybe now it thinks each year is worth $12,000 instead of $10,000, and adjusts the base salary too.

  7. Repeat: The model keeps seeing more examples, calculating loss, finding gradients, and updating until it gets really good at guessing salaries.

It's like learning to throw a basketball - you miss at first, but you keep adjusting your aim until you can hit the basket consistently!
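
Here's that whole loop as a tiny Python sketch using the numbers from the walkthrough. The learning rate, the number of repeats, and the "salaries in thousands" scaling are illustrative choices made up to keep the numbers tame - not part of the example above.

```python
# A minimal sketch of the salary walkthrough, in plain Python.
# Salaries are in thousands so the gradient steps stay small.
x, y_true = 2.0, 50.0        # 1. input: 2 years of experience, real salary $50k

w, b = 10.0, 5.0             # 2. the starting "rules": $10k per year + $5k base
lr = 0.01                    # learning rate (an illustrative choice)

for step in range(200):      # 7. repeat
    y_pred = w * x + b                # 2. the model's math (3. no activation needed)
    loss = (y_pred - y_true) ** 2     # 4. MSE on this one example
    d_pred = 2 * (y_pred - y_true)    # 5. gradient of the loss w.r.t. the prediction
    w -= lr * d_pred * x              # 6. update each rule in proportion to its blame
    b -= lr * d_pred

print(f"learned rule: salary ≈ {w:.1f}k per year + {b:.1f}k base")
print(f"prediction for 2 years: {w * x + b:.1f}k (target was {y_true:.0f}k)")
```

With a single example the model just finds some combination of "per-year" and "base" that nails that one data point (here it lands on roughly $20k per year + $10k base). With a real dataset of many people, the exact same loop settles on rules that work well on average.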


Multiclass (Softmax + CCE)

Teaching a Model to Pick One Right Answer

Now let’s teach a model to look at a dog and pick exactly one breed from several options - like choosing between Golden Retriever, Poodle, or Bulldog.

  1. Input: You show the model a dog with an ear-length score of 2.0 (to keep things easy).

  2. The Model's Math: The model has separate "rules" for each dog breed. It calculates a score for each: Golden Retriever gets 1.0, Poodle gets 0.6, and Bulldog gets -0.4. These raw scores are called logits.

  3. Activation: Here's where Softmax comes in! It's like a translator that converts those raw scores into proper probabilities that add up to 100%.

    So [1.0, 0.6, -0.4] becomes [52.2%, 35.0%, 12.9%]. The model is saying "I'm 52% sure it's a Golden Retriever."

  4. Loss: Reality check - it really was a Golden Retriever (class 0)! We calculate the loss using CCE (Categorical Cross-Entropy). Since the model was only 52.2% confident about the right answer, it still gets penalized. The less confident it was about the correct answer, the bigger the penalty.

  5. Gradient: The model figures out how wrong each breed's "rules" were. The gradient says: "Push the Golden Retriever score up, and push the Poodle and Bulldog scores down a bit."

  6. Update: The model adjusts all its breed-recognition rules based on the gradients - maybe it gets better at spotting Golden Retriever features.

  7. Repeat: The model keeps seeing more dogs, calculating loss, finding gradients, and updating until it can confidently pick the right breed most of the time.
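
Here's the same walkthrough as a short NumPy sketch, using the logits from step 2. The values in the comments are rounded, and the only thing added beyond the walkthrough is the standard "subtract the max" trick that keeps softmax numerically stable.

```python
import numpy as np

# A minimal NumPy sketch of the dog-breed walkthrough.
logits = np.array([1.0, 0.6, -0.4])   # 2. raw scores: Golden Retriever, Poodle, Bulldog
true_class = 0                        # the truth: it really was a Golden Retriever

# 3. Softmax: turn raw scores into probabilities that sum to 100%.
exps = np.exp(logits - logits.max())  # subtracting the max keeps this numerically stable
probs = exps / exps.sum()
print(np.round(probs, 3))             # ~[0.522, 0.350, 0.129]

# 4. Categorical cross-entropy: only the probability of the true class matters.
loss = -np.log(probs[true_class])
print(f"CCE loss: {loss:.3f}")        # ~0.651 — smaller when the model is more confident

# 5. Gradient w.r.t. the logits has a famously tidy form: probs - one_hot(true_class).
one_hot = np.eye(len(logits))[true_class]
grad_logits = probs - one_hot
print(np.round(grad_logits, 3))       # ~[-0.478, 0.350, 0.129]
# 6. An optimizer would now nudge the Golden Retriever score up (negative gradient)
#    and the other two scores down.
```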


Multilabel (Sigmoid + BCE)

Teaching a Model to Answer Multiple Yes/No Questions

Imagine you're teaching a model to be like a pet doctor who needs to check for multiple health issues at once - like asking "Does this pet have fleas? Does it have allergies?" The pet could have both, neither, or just one!

  1. Input: You show the model a pet with a parasite risk score of 2.0.

  2. The Model's Math: The model has separate "rules" for each health condition. It calculates a score for each: Fleas gets 0.8, Allergies gets -0.2. These raw scores are called logits.

  3. Activation: Now Sigmoid jumps in! Unlike Softmax (which made everything add up to 100%), Sigmoid treats each question independently. It converts [0.8, -0.2] into [69.0%, 45.0%]. The model is saying "69% chance of fleas, 45% chance of allergies" - and these don't need to add up to 100% because they're separate questions!

  4. Loss: Reality check! This pet actually has fleas (yes = 1) but no allergies (no = 0). We use BCE (Binary Cross-Entropy) to calculate the loss for each condition separately, then average them. The model gets penalized for being wrong about either condition.

  5. Gradient: The model figures out how to improve each health condition detector separately. The gradient shows: "You were pretty good at detecting fleas, but you were too confident about allergies when there weren't any."

  6. Update: The model adjusts its rules for each condition independently - improving its flea-detection without messing up its allergy-detection.

  7. Repeat: The model keeps seeing more pets, calculating loss for each condition, finding gradients, and updating until it gets really good at answering all the yes/no questions accurately.

It's like learning to be a multi-tasking friend who can tell if someone is hungry AND tired at the same time - each skill develops independently!
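
And here's the pet-doctor version as a short NumPy sketch, again using the logits from step 2. The per-label loss printout is an extra added here so you can see which guess hurts more.

```python
import numpy as np

# A minimal NumPy sketch of the pet-doctor walkthrough.
logits = np.array([0.8, -0.2])    # 2. raw scores: fleas, allergies
targets = np.array([1.0, 0.0])    # the truth: has fleas, no allergies

# 3. Sigmoid: each label gets its own independent probability.
probs = 1.0 / (1.0 + np.exp(-logits))
print(np.round(probs, 2))          # ~[0.69, 0.45] — no need to sum to 100%

# 4. Binary cross-entropy per label, then averaged into one loss.
bce = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
print(np.round(bce, 3))            # ~[0.371, 0.598] — the allergy guess hurts more
loss = bce.mean()
print(f"BCE loss: {loss:.3f}")     # ~0.485

# 5. Per-label gradient of BCE w.r.t. each logit is simply probs - targets
#    (the averaging in step 4 just scales this by 1/2).
grad_logits = probs - targets
print(np.round(grad_logits, 2))    # ~[-0.31, 0.45]
# 6. The flea detector gets nudged up a little; the allergy detector gets nudged down more.
```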


💡 Big Takeaway: The network’s learning engine stays the same — only the activation, loss, and update details change depending on the game.
