One Brain, Three Games: How Neural Networks Solve Regression, Multiclass, and Multilabel Tasks

Krupa Sawant

Neural networks are like pro gamers — they can play different games with the same brain.
The rules change depending on the game, but the process is always:

Guess → Check → Learn → Try Again.

Here’s the ultimate side-by-side so you can see exactly how the same network behaves when predicting numbers, picking one category, or handling multiple labels at once.

The Quick-Look Table

| Step | Regression | Multiclass | Multilabel |
| --- | --- | --- | --- |
| Objective | Predict a number | Pick one category | Predict multiple yes/no labels |
| Activation | None | Softmax | Sigmoid |
| Loss | MSE (mean squared error) | CCE (categorical cross-entropy) | BCE (binary cross-entropy) |
| Example | Predict salary | Predict dog breed | Predict fleas & allergies |

How the Process Flows

🧮 Regression (Linear + MSE)

We want to predict a continuous value — here, a salary based on years of experience.

  1. Input: The model sees x = 2.0 (years of experience).

  2. Multiply & Add: Weight (10,000) × input (2.0) + bias (5,000) = 25,000. This is its best guess for the salary.

  3. Activation: None, because we’re predicting a number, not a probability.

  4. Loss: Compare predicted (25,000) to actual (50,000) using MSE. The bigger the gap, the bigger the penalty.

  5. Gradient: Calculate how much the weight was responsible for the mistake — this tells us which way to adjust.

  6. Update: Subtract a fraction of that gradient (scaled by learning rate) from the weight and bias.

  7. Repeat: The model keeps cycling until the gap between prediction and reality is as small as possible.
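
Here's the loop above as a minimal sketch in plain Python, using the illustrative numbers from the steps. The learning rate of 0.01 is an assumption, since the walkthrough doesn't specify one.

```python
# One-example regression loop: Guess -> Check -> Learn -> Try Again.
# Starting weight, bias, input, and target come from the walkthrough;
# the learning rate is an assumed value.
w, b = 10_000.0, 5_000.0     # weight and bias
x, y = 2.0, 50_000.0         # years of experience, actual salary
lr = 0.01                    # learning rate (assumption)

for step in range(5):
    y_hat = w * x + b                  # Multiply & Add (no activation)
    loss = (y_hat - y) ** 2            # MSE for a single example
    grad_w = 2 * (y_hat - y) * x       # how much the weight caused the miss
    grad_b = 2 * (y_hat - y)           # same for the bias
    w -= lr * grad_w                   # Update: step against the gradient
    b -= lr * grad_b
    print(f"step {step}: prediction = {y_hat:,.0f}")
```

The first prediction is 25,000, and with this learning rate each pass closes roughly 10% of the remaining gap to 50,000.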


🎯 Multiclass (Softmax + CCE)

We want the model to choose one correct category — here, the breed of a dog.

  1. Input: The model gets x = 2.0 (maybe a simplified feature, like ear length score).

  2. Multiply & Add: For each class, weight × input + bias → logits like [1.0, 0.6, −0.4].

  3. Activation: Softmax turns these scores into probabilities that sum to 1: [0.522, 0.350, 0.129].

  4. Loss: If the correct breed is class 0, CCE = −log(0.522) ≈ 0.65. A low probability for the correct class means a higher penalty.

  5. Gradient: For each class, find (Predicted − Actual) × Input to see how much to shift the weights.

  6. Update: Adjust weights and biases for all classes based on gradients.

  7. Repeat: Training continues until the probability for the correct class is consistently high.
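
In code, one forward-and-backward pass for the multiclass case looks roughly like this (a NumPy sketch; the weights that would produce these exact logits are assumed, so we start from the logits directly):

```python
import numpy as np

x = 2.0                                          # ear length score
logits = np.array([1.0, 0.6, -0.4])              # weight * input + bias, per class
probs = np.exp(logits) / np.exp(logits).sum()    # Softmax -> [0.522, 0.350, 0.129]

target = np.array([1.0, 0.0, 0.0])               # correct breed is class 0 (one-hot)
cce = -np.log(probs[0])                          # categorical cross-entropy ≈ 0.65

grad_logits = probs - target                     # (Predicted - Actual), per class
grad_w = grad_logits * x                         # shift for each class weight
grad_b = grad_logits                             # shift for each class bias
```

Note how the gradient for the correct class is negative (its weight gets pushed up by the update) while the gradients for the wrong classes are positive (theirs get pushed down).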


🏷️ Multilabel (Sigmoid + BCE)

We want to answer several yes/no questions for each input — here, whether a pet has fleas, allergies, or both.

  1. Input: The model gets x = 2.0 (maybe a feature like parasite risk score).

  2. Multiply & Add: For each label, weight × input + bias → logits like [0.8, −0.2].

  3. Activation: Sigmoid turns each score into an independent probability: [0.690, 0.450].

  4. Loss: Compare each probability to the actual answer (1 for fleas, 0 for allergies) using BCE, then average.

  5. Gradient: For each label, find (Predicted − Actual) × Input.

  6. Update: Adjust weights and biases for each label independently.

  7. Repeat: Continue until the predictions for each label are accurate most of the time.
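
And the multilabel pass, sketched the same way (the logits are the illustrative values from the steps; any weights producing them are assumed):

```python
import numpy as np

x = 2.0                                    # parasite risk score
logits = np.array([0.8, -0.2])             # weight * input + bias: fleas, allergies
probs = 1.0 / (1.0 + np.exp(-logits))      # Sigmoid -> [0.690, 0.450]

target = np.array([1.0, 0.0])              # actual answers: fleas yes, allergies no
bce = -(target * np.log(probs)
        + (1 - target) * np.log(1 - probs)).mean()   # binary cross-entropy ≈ 0.48

grad_logits = probs - target               # (Predicted - Actual), per label
grad_w = grad_logits * x                   # each label's weight adjusts independently
grad_b = grad_logits
```

Unlike softmax, the two probabilities don't compete: improving the fleas prediction never steals probability mass from allergies.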


💡 Big Takeaway: The network’s learning engine stays the same — only the activation, loss, and update details change depending on the game.
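
To make that concrete, here's a sketch where only the head swaps: the activation and loss change per task, and everything else stays put. The `heads` table and `train_step` are assumed toy interfaces for illustration, not a real framework.

```python
import numpy as np

# Three "heads" for one engine: each task picks an (activation, loss) pair.
heads = {
    "regression": (lambda z: z,                                  # identity
                   lambda p, y: np.mean((p - y) ** 2)),          # MSE
    "multiclass": (lambda z: np.exp(z) / np.exp(z).sum(),        # softmax
                   lambda p, y: -np.log(p[np.argmax(y)])),       # CCE
    "multilabel": (lambda z: 1 / (1 + np.exp(-z)),               # sigmoid
                   lambda p, y: -np.mean(y * np.log(p)
                                         + (1 - y) * np.log(1 - p))),  # BCE
}

def train_step(task, logits, target):
    activation, loss_fn = heads[task]
    probs = activation(logits)
    # For all three pairings, the gradient w.r.t. the logits reduces to
    # (predicted - actual) up to a constant factor; the engine never changes.
    return loss_fn(probs, target), probs - target
```

Calling `train_step("multiclass", np.array([1.0, 0.6, -0.4]), np.array([1.0, 0.0, 0.0]))` reproduces the dog-breed step above.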
