The instability of a softmax function

Kashif

The softmax, as we know, is numerically unstable when applied to vectors containing very small or very large numbers because of the exponential function involved in its computation.

The softmax formula is:

\(\text{softmax}(x_{i}) = \frac{e^{x_{i}}}{\sum_{j} e^{x_{j}}}\)

What are the reasons for numerical instability?

  1. Overflow of very large numbers
    • If \(x_{i}\) contains a very large number, \(e^{x_{i}}\) can become astronomically large, exceeding the maximum representable floating-point number, causing overflow.
    • Example: If \(x_{i} = 1{,}000\), then \(e^{1000}\) is an astronomically large number that cannot be stored in standard floating-point precision (float64 overflows near \(1.8 \times 10^{308}\)).
  2. Similarly, underflow of very small numbers
    • If \(x_{i}\) contains a very small (negative) number, \(e^{x_{i}}\) can become extremely small, possibly rounding down to zero, causing underflow.
    • Example: If \(x_{i} = -1,000\), then \(e^{-1000}\) is so close to zero that it may be treated as zero in floating-point arithmetic.
  3. Loss of precision during division
    • When the normalization sum \(\sum_{j} e^{x_{j}}\) is dominated by the largest exponentials, the smaller ones become negligible and are effectively lost. This can result in a loss of accuracy when computing the probabilities.
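All three failure modes are easy to reproduce. A minimal NumPy sketch (the function name `naive_softmax` is mine) showing the formula implemented directly, exactly as written above:

```python
import numpy as np

def naive_softmax(x):
    """Direct implementation of the softmax formula (numerically unstable)."""
    exps = np.exp(x)
    return exps / exps.sum()

# Overflow: e^1000 exceeds the float64 maximum (~1.8e308), so
# np.exp returns inf and inf / inf evaluates to nan.
with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(np.array([1000.0, 1000.0])))    # [nan nan]

# Underflow: e^-1000 rounds down to exactly 0.0, so the
# denominator is 0 and the division yields nan as well.
with np.errstate(invalid="ignore"):
    print(naive_softmax(np.array([-1000.0, -1000.0])))  # [nan nan]
```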

What is the fix?

A common technique is to subtract the maximum value of the vector from every element before computing the softmax:

\(\tilde{x}_{i} = x_{i} - \max_{j} x_{j}\)

\(\text{softmax}(x_{i}) = \frac{e^{\tilde{x}_{i}}}{\sum_{j} e^{\tilde{x}_{j}}}\)

Because multiplying both the numerator and the denominator by \(e^{-\max_{j} x_{j}}\) leaves the ratio unchanged, the output is mathematically identical to the original softmax. After the shift, the largest exponent is \(e^{0} = 1\), so overflow cannot occur, and the denominator is at least 1, so the division is always well defined, even for very large or very small numbers in the input vector.
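The max-subtraction trick is a small change in code. A minimal NumPy sketch (the function name `stable_softmax` is mine) that handles the same inputs that break the naive version:

```python
import numpy as np

def stable_softmax(x):
    """Softmax with the max-subtraction trick: the largest shifted
    exponent is e^0 = 1, so np.exp can no longer overflow."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Stable even for extreme inputs:
print(stable_softmax(np.array([1000.0, 1000.0])))    # [0.5 0.5]
print(stable_softmax(np.array([-1000.0, -1000.0])))  # [0.5 0.5]
```

In practice, libraries such as SciPy apply this same shift internally (e.g. `scipy.special.softmax`).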


Written by

Kashif

🥼 Data Scientist @ Fiery Digital India (an Epson Company) 🖨