Exploring Neural Networks with One Hidden Layer: A Beginner's Guide, Part 2


Hi there! 👋
I’m Dhyuthidhar, and if you’re new here, welcome! I love exploring computer science topics, especially machine learning, and breaking them down into easy-to-understand concepts. Today, let’s continue to talk about neural networks.
In the previous blog, I delved into the mathematical foundation of neural network code, starting with the basics and moving on to forward propagation.
Previous blog: Link
In today’s blog, I will continue from there and walk you through the rest of the implementation.
Cost Function Implementation
def compute_cost(A2, Y):
    m = Y.shape[1]  # Number of examples

    # Compute the cross-entropy cost
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), 1 - Y)
    cost = -1 / m * np.sum(logprobs)
    cost = float(np.squeeze(cost))  # Make sure cost is a plain scalar, e.g. [[0.69]] -> 0.69

    return cost

# Example usage of compute_cost
cost = compute_cost(A2, Y)
print("cost = " + str(cost))
We compute the cost with this function. Its arguments are A2, the activation output of the second (output) layer, and Y, the true labels.
The variable m denotes the total number of training examples, which in this case is 400.
Recall the cross-entropy cost function:
$$J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ Y^{(i)} \log\left(A^{[2](i)}\right) + \left(1 - Y^{(i)}\right) \log\left(1 - A^{[2](i)}\right) \right]$$
The function takes A2 and Y and returns the cost as a single scalar.
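If you want to sanity-check the function, here is a quick toy example with made-up probabilities and labels (A2_toy and Y_toy are illustrative values, not outputs of our model):
# Toy sanity check for compute_cost (assumes numpy is imported as np)
A2_toy = np.array([[0.8, 0.4, 0.1]])  # predicted probabilities
Y_toy = np.array([[1, 0, 0]])         # true labels
print(compute_cost(A2_toy, Y_toy))    # prints roughly 0.28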
Backward Propagation
def backward_propagation(parameters, cache, X, Y):
    m = X.shape[1]  # m -> number of training examples -> 400

    # Retrieve parameters
    W1 = parameters["W1"]  # W1 -> shape (n_h, n_x) -> (4, 2)
    W2 = parameters["W2"]  # W2 -> shape (n_y, n_h) -> (1, 4)

    # Retrieve cached values
    A1 = cache["A1"]  # A1 -> shape (n_h, m) -> (4, 400)
    A2 = cache["A2"]  # A2 -> shape (n_y, m) -> (1, 400)

    # Backward propagation: calculate dW1, db1, dW2, db2
    # These derivatives can be worked out by hand with pen and paper
    dZ2 = A2 - Y
    dW2 = 1 / m * np.dot(dZ2, A1.T)
    db2 = 1 / m * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))  # 1 - A1^2 is the tanh derivative
    dW1 = 1 / m * np.dot(dZ1, X.T)
    db1 = 1 / m * np.sum(dZ1, axis=1, keepdims=True)

    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}

    return grads

# Example usage of backward_propagation
grads = backward_propagation(parameters, cache, X, Y)
print("dW1 = " + str(grads["dW1"]))
print("db1 = " + str(grads["db1"]))
print("dW2 = " + str(grads["dW2"]))
print("db2 = " + str(grads["db2"]))
This function performs the backpropagation step: it computes the gradient of the cost with respect to every parameter.
m represents the number of training examples in the data.
The arguments are parameters (W1, b1, W2, b2), cache (Z1, A1, Z2, A2), and the data X and Y.
- Using these values, we calculate the derivatives, scale them by 1/m to average over the training examples, and return the gradients in a dictionary.
Here is an image that illustrates this process:
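For reference, these are the gradient equations that the code above implements for our architecture (a tanh hidden layer followed by a sigmoid output unit):
$$\begin{aligned} dZ^{[2]} &= A^{[2]} - Y \\ dW^{[2]} &= \frac{1}{m}\, dZ^{[2]} A^{[1]T} \\ db^{[2]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)} \\ dZ^{[1]} &= W^{[2]T} dZ^{[2]} \ast \left(1 - \left(A^{[1]}\right)^{2}\right) \\ dW^{[1]} &= \frac{1}{m}\, dZ^{[1]} X^{T} \\ db^{[1]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)} \end{aligned}$$
The term $1 - \left(A^{[1]}\right)^{2}$ is the derivative of the tanh activation used in the hidden layer.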
Updating Parameters
def update_parameters(parameters, grads, learning_rate=1.2):
    # Retrieve a copy of each parameter from the dictionary "parameters"
    W1 = copy.deepcopy(parameters["W1"])
    b1 = copy.deepcopy(parameters["b1"])
    W2 = copy.deepcopy(parameters["W2"])
    b2 = copy.deepcopy(parameters["b2"])

    # Retrieve each gradient from the dictionary "grads"
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    # Update parameters using the gradient descent rule
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

# Example usage of update_parameters
parameters = update_parameters(parameters, grads)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
The update_parameters function applies the gradient descent update rule to the parameters, using the gradients computed during backpropagation, and returns the updated parameters.
We use copy.deepcopy() to copy the weight and bias values so that the original parameter dictionary is not modified in place.
The learning rate hyperparameter controls how large a step the parameters take toward a minimum of the cost on each gradient descent update.
Commonly used learning rates fall between 0.001 and 0.01; here the default of 1.2 works well because our dataset and network are small.
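To make the update rule concrete, here is a tiny standalone example with made-up numbers (W and dW below are illustrative, not values from our model):
# One gradient descent update on a single weight matrix (assumes numpy as np)
W = np.array([[0.5, -0.2]])    # current weights
dW = np.array([[0.1, -0.05]])  # gradient from backpropagation
learning_rate = 1.2

W = W - learning_rate * dW
print(W)                       # [[ 0.38 -0.14]]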
Build the neural network model
def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False):
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]

    # Initialize parameters
    parameters = initialize_parameters(n_x, n_h, n_y)

    # Initialize lists to track the learning curve
    costs = []
    iterations = []

    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation
        A2, cache = forward_propagation(X, parameters)

        # Cost function
        cost = compute_cost(A2, Y)

        # Store the cost for plotting every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
            iterations.append(i)

        # Backpropagation
        grads = backward_propagation(parameters, cache, X, Y)

        # Update parameters
        parameters = update_parameters(parameters, grads)

        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    # Plot the learning curve
    if print_cost:
        plt.figure(figsize=(8, 5))
        plt.plot(iterations, costs, color='blue')
        plt.ylabel('Cost')
        plt.xlabel('Iterations')
        plt.title('Learning curve')
        plt.grid(True)
        plt.show()

    return parameters, costs
To get consistent parameters (and therefore consistent output) on every run, we call np.random.seed().
The layer_sizes() function gives the sizes of the input, hidden, and output layers, and initialize_parameters() sets up W1, b1, W2, and b2.
We then run a for loop for a fixed number of iterations (epochs), a hyperparameter, performing forward propagation, cost computation, backpropagation, and a parameter update on each pass.
The function returns the optimized parameters, along with the recorded costs for the learning curve.
Predict function
def predict(parameters, X):
    # Forward propagation
    # Why run forward propagation again? Because X here can be new data,
    # so we need a fresh forward pass to get A2 for these inputs.
    A2, cache = forward_propagation(X, parameters)

    # Convert probabilities to 0/1 predictions
    predictions = (A2 > 0.5)  # Prediction is 1 when A2 > 0.5, otherwise 0

    return predictions

# Train the neural network with 4 hidden units
parameters, costs = nn_model(X, Y, n_h=4, num_iterations=10000, print_cost=True)

# Plot the decision boundary
def nn_predict_function(X):
    """Wrapper around predict() so it can be used with plot_decision_boundary."""
    return predict(parameters, X)

plot_decision_boundary(nn_predict_function, X, Y)
If the value of A2 exceeds 0.5, the prediction will be 1; otherwise, it will be 0. This results in a vector of predictions (either 0 or 1).
The function will return this prediction vector as the model's output.
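Here is a tiny standalone look at that thresholding step, using made-up probabilities rather than the real A2:
# Thresholding probabilities at 0.5 (assumes numpy as np)
probs = np.array([[0.9, 0.3, 0.6]])
print(probs > 0.5)  # [[ True False  True]] -> classes 1, 0, 1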
The wrapper function lets plot_decision_boundary() call our neural network in the same way it called the scikit-learn logistic regression model in part 1.
We plot the decision boundary using plot_decision_boundary(). We can also see the learning curve in the diagram below:
- We can see that the learning curve shows how the cost decreases over training iterations. The steeper initial drop indicates rapid learning early on, followed by a more gradual improvement as the model fine-tunes its parameters. The eventual flattening suggests the model is approaching convergence.
Accuracy of the model
# Calculate accuracy for neural network
predictions = predict(parameters, X)
accuracy = 100 * np.mean(predictions == Y)
print("Accuracy for neural network: {:0.2f}%".format(accuracy))
# Accuracy for neural network: 97.50%
We print the accuracy of the neural network model on the training data: 97.50%.
Understanding Hyperparameters
Our neural network implementation relies on several hyperparameters that significantly impact performance:
- Hidden Layer Size (n_h): Controls the model's capacity to learn complex patterns.
  - Too small (e.g., n_h=1 or 2): May underfit, behaving similarly to logistic regression.
  - Balanced (e.g., n_h=4): Captures the data patterns without excessive complexity.
  - Too large (e.g., n_h=50): May overfit the training data or take longer to train.
- Learning Rate: Controls how quickly parameters update during training.
  - Too small (e.g., 0.0001): Very slow convergence; training takes too long.
  - Balanced (e.g., 0.01): Steady progress toward a minimum.
  - Too large (e.g., 10): May overshoot minima, causing unstable training.
- Number of Iterations: Determines how long the model trains.
  - Too few: Underfitting; not enough time to learn the patterns.
  - Too many: Potential overfitting and wasted computational resources.
The optimal values depend on your specific dataset. I encourage you to experiment with these hyperparameters using the provided code and observe how they affect model performance and training dynamics.
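If you want a starting point for that experimentation, here is a minimal sketch of a hidden-layer-size sweep; it assumes X, Y, and the nn_model and predict functions defined above are already in your session:
# Sweep over hidden layer sizes and compare training accuracy
for n_h in [1, 2, 4, 10, 50]:
    params, _ = nn_model(X, Y, n_h=n_h, num_iterations=5000)
    preds = predict(params, X)
    acc = 100 * np.mean(preds == Y)
    print(f"Hidden units: {n_h:2d} -> training accuracy: {acc:.2f}%")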
Computational Complexity Comparison
While neural networks offer more powerful modeling capabilities, they come with increased computational demands:
Time Complexity:
- Logistic Regression: O(n_features × n_examples × n_iterations)
- Neural Network: O((n_features × n_hidden + n_hidden × n_output) × n_examples × n_iterations)
Space Complexity:
- Logistic Regression: O(n_features)
- Neural Network: O(n_features × n_hidden + n_hidden × n_output)
The difference is minimal for our moon dataset with just 2 features. However, as we scale to problems with thousands of features and multiple hidden layers, these differences become significant. This illustrates the classic trade-off between model complexity and computational requirements.
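To make that concrete, here is a quick parameter count for our setup (2 input features, 4 hidden units, 1 output unit); the numbers follow directly from the shapes used earlier:
# Parameter counts for the moon-dataset setup
n_x, n_h, n_y = 2, 4, 1
logistic_params = n_x + 1                          # weights + bias = 3
nn_params = (n_x * n_h + n_h) + (n_h * n_y + n_y)  # W1, b1, W2, b2 = 17
print(logistic_params, nn_params)                  # 3 17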
Conclusion
Here is the comparison between logistic regression and neural network decision boundary and their accuracy:
- In the logistic regression graph, numerous blue points appear within the red zone, whereas in the neural network graph, only a few blue points are present in the red zone.
In this second part of our neural network journey, we've completed the full implementation of a neural network with one hidden layer. We've moved beyond just understanding the structure and forward propagation to explore how neural networks actually learn from data.
Key takeaways from this implementation include:
The Learning Process: We've seen how the cost function quantifies the model's performance, providing a clear metric to optimize. By implementing backpropagation, we've unlocked the neural network's ability to learn from its mistakes and improve over time.
Parameter Optimization: Through gradient descent and careful parameter updates, we've witnessed how a neural network gradually refines its understanding of the data, converging toward more accurate predictions with each iteration.
Impressive Results: Our simple neural network achieved 97.50% accuracy on the moon dataset, significantly outperforming the logistic regression model from part 1. This demonstrates the power of adding even just one hidden layer with non-linear activation functions.
Practical Implementation: We've built a complete neural network from scratch, without relying on high-level deep learning libraries. This hands-on approach provides deeper insights into how neural networks actually work under the hood.
The curved decision boundary our model learned perfectly captures the moon-shaped data pattern that logistic regression struggled with. This visually confirms what we've learned theoretically: neural networks excel at learning complex, non-linear relationships in data.
This implementation serves as a foundation for understanding more complex neural network architectures. The principles we've covered – forward propagation, cost calculation, backpropagation, and parameter optimization – are the same ones used in state-of-the-art deep learning models, just applied to larger networks with more layers.
I encourage you to experiment with the hyperparameters of this model, such as changing the number of hidden units, modifying the learning rate, or trying different activation functions. Each adjustment offers a new learning opportunity and helps build intuition about neural network behavior.
In our future discussions, we'll explore how to extend these concepts to deeper networks, handle different types of problems beyond binary classification, and implement more advanced optimization techniques. But for now, congratulations on building your first complete neural network from scratch!
References
These references will help you learn more about this concept:
Here is the code: One-Hidden-Layer code
The Deep Learning Specialization by Andrew Ng on Coursera (Course 1, Week 3).
The Hundred-Page Machine Learning Book by Andriy Burkov.
Both are wonderful resources for learning these concepts.
Action Step:
Let's create a tiny dataset with just 3 examples, 2 features, and binary labels:
X (2×3 matrix, features as rows):
X = [
[1.0, 0.5, -0.5], # Feature 1
[0.3, 1.0, -0.1] # Feature 2
]
Y (1×3 matrix, labels):
Y = [[1, 0, 0]]
Here’s a simple dataset for you to work with. Try implementing a neural network model using this data.
You can also practice the concept by working it out on paper.
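If you then want to check your work in code, here is a minimal sketch of how you might feed this toy dataset to the model; it assumes the functions from this post and the helpers from part 1 (layer_sizes, initialize_parameters, forward_propagation) are already defined:
# Toy dataset: 2 features (rows), 3 examples (columns)
X = np.array([[1.0, 0.5, -0.5],   # Feature 1
              [0.3, 1.0, -0.1]])  # Feature 2
Y = np.array([[1, 0, 0]])         # Labels, shape (1, 3)

parameters, costs = nn_model(X, Y, n_h=4, num_iterations=1000)
predictions = predict(parameters, X)
print(predictions)  # should recover the labels on such a tiny dataset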
Why I Share This
Simon Squibb believes that the best way to share knowledge is to make it simple and accessible. That’s exactly what I do—I break down complex technology into something easy and exciting.
Tech should inspire you, not intimidate you.
Imagine a world without machine learning—every company would need to manually analyze massive datasets just to extract insights. Deep learning changed that game. It enables anyone with data to uncover patterns and build intelligent systems without relying solely on traditional methods.
I share knowledge this way because I want you to feel that excitement too.
If this post made you think differently about tech, check out my other blogs. Let’s make tech easy and exciting—together! 🚀