Understanding the Sigmoid Function and Its Applications in Machine Learning

Let’s start with the basics—what exactly is the sigmoid function?

The sigmoid function is a mathematical function commonly used in machine learning, especially in binary classification problems and neural networks. It’s defined by the formula:

σ(x) = 1 / (1 + e^(−x))

When plotted on a graph, the sigmoid function creates a smooth, S-shaped curve that asymptotically approaches 0 and 1 at the extremes.
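A minimal sketch in Python makes this asymptotic behaviour easy to see:

```python
import math

def sigmoid(x: float) -> float:
    """The sigmoid: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# The curve approaches 0 for large negative inputs and 1 for large positive ones
print(sigmoid(-10.0))  # ~0.000045
print(sigmoid(0.0))    # exactly 0.5
print(sigmoid(10.0))   # ~0.999955
```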

In the realm of machine learning, the sigmoid function proves to be incredibly useful—particularly in classification tasks where outcomes are binary, with one class represented as 0 and the other as 1.

A prominent example of such an application is Logistic Regression, a fundamental algorithm used for binary classification. In this context, the sigmoid function is employed to convert the model's linear output into a probability between 0 and 1, allowing us to interpret the result as the likelihood of belonging to a particular class.

To understand this more concretely, let’s walk through a simple example using an arbitrary dataset with one independent (input) variable and one dependent (output) variable. This will help illustrate the role sigmoid plays in logistic regression.

Let’s plot the data to gain a visual understanding.
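As a concrete sketch (the dataset below is made up purely for illustration), we can build a small one-feature dataset and scatter-plot it with matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical dataset: one input variable x and a binary output y
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

plt.scatter(x, y)
plt.xlabel("independent variable (x)")
plt.ylabel("dependent variable (y)")
plt.show()
```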

We clearly can't fit a straight line through the data points, as there is no apparent linear relationship.
However, for illustrative purposes, let’s assume a weight and intercept to fit an arbitrary linear model.
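A sketch of that step, with an arbitrary (not fitted) weight and intercept chosen only to draw a line through the same made-up data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Same hypothetical dataset as before
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Arbitrary weight and intercept, purely for illustration
w, b = 0.1, -0.05
z = w * x + b  # linear model output

plt.scatter(x, y, label="data")
plt.plot(x, z, color="red", label="arbitrary linear model")
plt.legend()
plt.show()
```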

It’s evident that the linear model doesn’t help much in this case.
To address this, we apply the sigmoid function to transform the linear equation: we replace x in the sigmoid expression with the linear combination of features (as defined in our earlier equation).

Now, let’s plot the transformed values against the independent variables to visualize the relationship.
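A sketch of this transformation, again using a made-up dataset and an arbitrary weight and intercept:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical dataset
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

w, b = 1.2, -6.0               # arbitrary weight and intercept for illustration
z = w * x + b                  # linear combination of features
p = 1.0 / (1.0 + np.exp(-z))   # sigmoid of the linear output

plt.scatter(x, y, label="data")
plt.plot(x, p, color="green", label="sigmoid(w*x + b)")
plt.legend()
plt.show()
```

Because the sigmoid is applied element-wise to the linear output, the red straight line becomes the green S-curve bounded by 0 and 1.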

From the graph above, we observe that the straight line has been transformed into an S-shaped sigmoid curve, with values ranging between 0 and 1.
Now, each data point can be projected onto the sigmoid curve, and we can define a threshold above which the predicted class is 1.
For this example, let’s set the threshold at 0.23.

For any new data, after applying the sigmoid transformation, if the resulting value exceeds the threshold (0.23), we classify it as belonging to class 1.
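A sketch of that decision rule, with a hypothetical weight and intercept standing in for the fitted curve:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted weight and intercept
w, b = 1.2, -6.0
threshold = 0.23  # the cut-off chosen in this example

x_new = np.array([2.0, 5.0, 8.0])        # new, unseen inputs
probs = sigmoid(w * x_new + b)           # sigmoid outputs in (0, 1)
preds = (probs >= threshold).astype(int) # class 1 if above the threshold
print(preds)  # [0 1 1]
```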
Note that, in the Logistic Regression model from the scikit-learn library, the default threshold is 0.5. Any data point with a sigmoid output greater than 0.5 is classified as class 1.
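For comparison, here is the scikit-learn version on a small made-up dataset; `predict()` applies the default 0.5 cut-off to the class-1 column of `predict_proba()`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature dataset (scikit-learn expects a 2-D X)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

X_new = np.array([[3.0], [7.0]])
probs = model.predict_proba(X_new)[:, 1]  # sigmoid outputs for class 1
preds = model.predict(X_new)              # thresholded at 0.5
print(probs, preds)
```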

To fit the best sigmoid curve, we need to choose the optimal values for the weights and intercept. This process is similar to linear regression, where gradient descent is used to minimize the loss function. For logistic regression, the loss function is referred to as Log Loss (binary cross-entropy), which is defined by the following expression:

Log Loss = −(1/N) Σᵢ [ yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ) ]

where N is the number of samples, yᵢ is the true label (0 or 1), and ŷᵢ is the predicted probability for sample i.
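The whole fitting loop can be sketched in a few lines of plain gradient descent; the dataset is again made up for illustration, and the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

# Hypothetical one-feature dataset
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

def log_loss(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy: -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Plain gradient descent on the weight w and intercept b
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))  # sigmoid of the linear output
    w -= lr * np.mean((p - y) * x)          # gradient of Log Loss w.r.t. w
    b -= lr * np.mean(p - y)                # gradient of Log Loss w.r.t. b

final_loss = log_loss(y, 1.0 / (1.0 + np.exp(-(w * x + b))))
print(w, b, final_loss)
```

A convenient property of Log Loss with the sigmoid is that the gradients reduce to the simple (prediction − label) form used above.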

💡 For more insights, projects, and articles, visit my portfolio at www.tuhindutta.com.

Written by

Tuhin Kumar Dutta

I decode data, craft AI solutions, and write about everything from algorithms to analytics. Here to share what I learn and learn from what I share. 🚀 Data Scientist | AI Enthusiast | Building intelligent systems & simplifying complexity through code and curiosity. Sharing insights, projects, and deep dives in ML, data, and innovation.