Understanding the Sigmoid Function and Its Applications in Machine Learning

Let’s start with the basics—what exactly is the sigmoid function?

The sigmoid function is a mathematical function commonly used in machine learning, especially in binary classification problems and neural networks. It’s defined by the formula:

σ(x) = 1 / (1 + e^(−x))

When plotted on a graph, the sigmoid function creates a smooth, S-shaped curve that asymptotically approaches 0 and 1 at the extremes.
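A minimal sketch in Python makes this asymptotic behaviour easy to see:

```python
import math

def sigmoid(x: float) -> float:
    """The sigmoid: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# The curve approaches 0 for large negative inputs and 1 for large positive ones
print(sigmoid(-10.0))  # ~0.000045
print(sigmoid(0.0))    # exactly 0.5
print(sigmoid(10.0))   # ~0.999955
```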

In the realm of machine learning, the sigmoid function proves to be incredibly useful—particularly in classification tasks where outcomes are binary, with one class represented as 0 and the other as 1.

A prominent example of such an application is Logistic Regression, a fundamental algorithm used for binary classification. In this context, the sigmoid function is employed to convert the model's linear output into a probability between 0 and 1, allowing us to interpret the result as the likelihood of belonging to a particular class.

To understand this more concretely, let’s walk through a simple example using an arbitrary dataset with one independent (input) variable and one dependent (output) variable. This will help illustrate the role sigmoid plays in logistic regression.

Let’s plot the data to gain a visual understanding.
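As a concrete sketch (the dataset below is made up purely for illustration), we can build a small one-feature dataset and scatter-plot it with matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical dataset: one input variable x and a binary output y
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

plt.scatter(x, y)
plt.xlabel("independent variable (x)")
plt.ylabel("dependent variable (y)")
plt.show()
```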

We clearly can't fit a straight line through the data points, as there is no apparent linear relationship.
However, for illustrative purposes, let’s assume a weight and intercept to fit an arbitrary linear model.
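A sketch of that step, with an arbitrary (not fitted) weight and intercept chosen only to draw a line through the same made-up data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Same hypothetical dataset as before
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Arbitrary weight and intercept, purely for illustration
w, b = 0.1, -0.05
z = w * x + b  # linear model output

plt.scatter(x, y, label="data")
plt.plot(x, z, color="red", label="arbitrary linear model")
plt.legend()
plt.show()
```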

It’s evident that the linear model doesn’t help much in this case.
To address this, we apply the sigmoid function to transform the linear equation: we replace x in the sigmoid expression with the linear combination of features (as defined in our earlier equation).

Now, let’s plot the transformed values against the independent variables to visualize the relationship.
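A sketch of this transformation, again using a made-up dataset and an arbitrary weight and intercept:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical dataset
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

w, b = 1.2, -6.0               # arbitrary weight and intercept for illustration
z = w * x + b                  # linear combination of features
p = 1.0 / (1.0 + np.exp(-z))   # sigmoid of the linear output

plt.scatter(x, y, label="data")
plt.plot(x, p, color="green", label="sigmoid(w*x + b)")
plt.legend()
plt.show()
```

Because the sigmoid is applied element-wise to the linear output, the red straight line becomes the green S-curve bounded by 0 and 1.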

From the graph above, we observe that the straight line has been transformed into an S-shaped sigmoid curve, with values ranging between 0 and 1.
Now, each data point can be projected onto the sigmoid curve, and we can define a threshold above which the predicted class is 1.
For this example, let’s set the threshold at 0.23.

For any new data, after applying the sigmoid transformation, if the resulting value exceeds the threshold (0.23), we classify it as belonging to class 1.
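A sketch of that decision rule, with a hypothetical weight and intercept standing in for the fitted curve:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted weight and intercept
w, b = 1.2, -6.0
threshold = 0.23  # the cut-off chosen in this example

x_new = np.array([2.0, 5.0, 8.0])        # new, unseen inputs
probs = sigmoid(w * x_new + b)           # sigmoid outputs in (0, 1)
preds = (probs >= threshold).astype(int) # class 1 if above the threshold
print(preds)  # [0 1 1]
```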
Note that, in the Logistic Regression model from the scikit-learn library, the default threshold is 0.5. Any data point with a sigmoid output greater than 0.5 is classified as class 1.
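For comparison, here is the scikit-learn version on a small made-up dataset; `predict()` applies the default 0.5 cut-off to the class-1 column of `predict_proba()`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature dataset (scikit-learn expects a 2-D X)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

X_new = np.array([[3.0], [7.0]])
probs = model.predict_proba(X_new)[:, 1]  # sigmoid outputs for class 1
preds = model.predict(X_new)              # thresholded at 0.5
print(probs, preds)
```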

To fit the best sigmoid curve, we need to choose the optimal values for the weights and intercept. This process is similar to linear regression, where gradient descent is used to minimize the loss function. For logistic regression, the loss function is referred to as Log Loss (binary cross-entropy), which is defined by the following expression:

Log Loss = −(1/N) Σᵢ [ yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ) ]

where N is the number of samples, yᵢ is the true label (0 or 1), and ŷᵢ is the predicted probability for sample i.
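The whole fitting loop can be sketched in a few lines of plain gradient descent; the dataset is again made up for illustration, and the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

# Hypothetical one-feature dataset
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

def log_loss(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy: -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Plain gradient descent on the weight w and intercept b
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))  # sigmoid of the linear output
    w -= lr * np.mean((p - y) * x)          # gradient of Log Loss w.r.t. w
    b -= lr * np.mean(p - y)                # gradient of Log Loss w.r.t. b

final_loss = log_loss(y, 1.0 / (1.0 + np.exp(-(w * x + b))))
print(w, b, final_loss)
```

A convenient property of Log Loss with the sigmoid is that the gradients reduce to the simple (prediction − label) form used above.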

💡 For more insights, projects, and articles, visit my portfolio at www.tuhindutta.com.

Written by

Tuhin Kumar Dutta

I decode data, craft AI solutions, and write about everything from algorithms to analytics. Here to share what I learn and learn from what I share. 🚀 Data Scientist | AI Enthusiast | Building intelligent systems & simplifying complexity through code and curiosity. Sharing insights, projects, and deep dives in ML, data, and innovation.