Calculus for AI


Key Math Python Libs for AI
NumPy
A library for numerical computing in Python. It provides support for arrays, matrices, and mathematical functions.
NumPy handles large datasets and mathematical operations efficiently.
Example usage of NumPy:
import numpy as np
# Create a NumPy array
array = np.array([1, 2, 3, 4, 5])
print("NumPy Array:", array)
Pandas
A library for data manipulation and analysis. It provides data structures like DataFrames and Series.
Pandas simplifies data manipulation (ETL), exploration, and analysis.
Example usage of Pandas:
import pandas as pd
# Create a Pandas DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print("Pandas DataFrame:\n", df)
Calculus
Calculus is the backbone of optimization in AI algorithms. The key concepts of calculus for AI include:
Derivatives
Gradients
Optimization
Derivatives measure how a function changes as its inputs change.
Gradients generalize derivatives to multivariable functions.
Optimization is the process of finding the minimum or maximum of a function.
Derivative
The derivative of a function f(x) measures “the rate of change” of f with respect to x. In simpler terms, it tells us how quickly the function is changing at that point.
Geometric Interpretation: The slope of the tangent line to the curve at a point.
Real-life example
Imagine driving on a highway: your speedometer shows how many kilometres you travel per hour. Speed is the rate of change of your position, so if your position is a function of time, your speed is the derivative of that function.
Example of calculating derivatives in Python
import numpy as np
import matplotlib.pyplot as plt
# Define a function
def f(x):
    return x**2

# Define its derivative
def df(x):
    return 2*x
# Plot the function and its derivative
x = np.linspace(-10, 10, 100)
plt.plot(x, f(x), label="f(x) = x^2")
plt.plot(x, df(x), label="f'(x) = 2x")
plt.legend()
plt.title("Function and Its Derivative")
plt.show()
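The derivative above was supplied by hand (f'(x) = 2x). Where a closed form is not available, a central finite difference approximates the same slope numerically; here is a minimal sketch reusing the f and df defined above:
# Central difference: (f(x+h) - f(x-h)) / (2h) approximates f'(x)
def numerical_derivative(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 3.0
print("Numerical f'(3):", numerical_derivative(f, x0))  # ~6.0
print("Analytic f'(3):", df(x0))                        # exactly 6.0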
Gradient
A gradient is a vector that represents the direction and rate of the maximum increase of a scalar function: it tells us both the direction of the fastest change and how quickly the function changes in that direction. Geometric Interpretation: The gradient points in the direction of the steepest ascent of the function.
Real-life example: Temperature Distribution in a Room
Imagine you’re in a room where the temperature varies from place to place. The temperature at any point in the room can be described by a function T(x, y, z), where x, y, and z are the coordinates in the room. The gradient of the temperature function, ∇T, at any point tells you:
Direction: The direction in which the temperature is increasing the fastest. If you follow this direction, you’ll get to the warmest spot the quickest.
Rate of Increase: The magnitude of the gradient tells you how quickly the temperature changes in that direction. A large gradient means the temperature changes rapidly over a short distance, while a small gradient means the temperature changes slowly.
For example, if you’re feeling cold and want to move to a warmer spot, you should move in the direction of the gradient of the temperature function. If you want to stay in a place where the temperature is relatively constant, you should move perpendicular to the gradient.
Python code for this example
import numpy as np
import matplotlib.pyplot as plt
# Define the temperature function
def T(x, y):
    return 20 + 5 * np.exp(-0.1 * (x**2 + y**2))
# Create a grid of points
x = np.linspace(-5, 5, 20)
y = np.linspace(-5, 5, 20)
X, Y = np.meshgrid(x, y)
# Compute the temperature values
Z = T(X, Y)
# Compute the gradient of the temperature function
# np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x)
grad_T_y, grad_T_x = np.gradient(Z, y, x)
# Create a contour plot
plt.contourf(X, Y, Z, levels=20, cmap='coolwarm')
plt.colorbar(label='Temperature (°C)')
# Add the gradient vectors as a quiver plot
plt.quiver(X, Y, grad_T_x, grad_T_y, color='white')
# Add labels and title
plt.xlabel('x')
plt.ylabel('y')
plt.title('Temperature Distribution and Gradient Vectors')
plt.show()
Relationship between the derivative and the gradient
The derivative and the gradient are closely related but used in different contexts.
The derivative is a scalar that represents the rate of change of a function of a single variable. The gradient, on the other hand, is a vector that collects the rates of change (partial derivatives) of a function of multiple variables.
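A minimal sketch of the distinction (the two-variable function g below is my own toy example): for f(x) = x² the derivative at a point is a single number, while for g(x, y) the gradient stacks the partial derivatives into a vector.
import numpy as np

def g(x, y):
    return x**2 + 3 * y**2

# Partial derivatives by hand: dg/dx = 2x, dg/dy = 6y
def grad_g(x, y):
    return np.array([2 * x, 6 * y])

print("f'(1) =", 2 * 1)                    # scalar: 2
print("grad g(1, 2) =", grad_g(1, 2))      # vector: [ 2 12]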
Role of Gradients in AI Optimizers
In machine learning and deep learning, the gradient is the core concept behind optimization algorithms, especially when training models.
As shown above, the gradient here is the derivative of the loss function with respect to the model parameters. It indicates the direction in the parameter space where the loss function changes most rapidly. By calculating the gradients, optimization algorithms can adjust the model’s parameters to gradually reduce the value of the loss function, thereby improving the model’s performance.
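As a concrete sketch (assuming a toy linear model y = w·x + b and a mean-squared-error loss, both my own choices for illustration), the gradient of the loss with respect to the parameters looks like this:
import numpy as np

# Toy data generated by y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Gradient of the MSE loss with respect to the parameters (w, b):
# dL/dw = mean(2 * (y_pred - y) * x), dL/db = mean(2 * (y_pred - y))
def grad_loss(w, b):
    y_pred = w * x + b
    return float(np.mean(2 * (y_pred - y) * x)), float(np.mean(2 * (y_pred - y)))

print("Gradient at (w=0, b=0):", grad_loss(0.0, 0.0))  # (-35.0, -12.0)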
Gradient Descent (GD)
Gradient Descent is an optimization algorithm used to find the local minimum of a differentiable function. In machine learning, it’s widely used to train models by iteratively adjusting the model parameters to minimize the loss function.
How GD Works
Initialize the Parameters: Start with a random set of model parameters.
Compute the Gradient: Calculate the gradient of the loss function with respect to the current parameters. The gradient is a vector that points in the direction where the loss function increases most rapidly.
Update Parameters (Gradient Descent Step): Update the parameters in the opposite direction of the gradient (i.e., the direction where the loss function decreases most rapidly). The step size of the update is controlled by the Learning Rate (LR).
Iterate: Repeat the process until the loss function converges to a local minimum.
Gradient Descent Formula
The Gradient Descent algorithm is used to minimize a function (the loss function) by iteratively moving in the direction of the negative gradient. The formula for updating the parameters is:

θnew = θold − η∇J(θold)

Where:
θold: The current value of the parameter (e.g., weights in a model).
θnew: The updated value of the parameter after one iteration.
η: The learning rate, a hyperparameter that controls the step size of the update.
∇J(θold): The gradient of the cost function J with respect to θ.
Code Example: Implementing GD with Python
import numpy as np
# Define a function and its gradient
def f(x):
    return x**2

def df(x):
    return 2*x

# Gradient Descent
def gradient_descent(starting_point, learning_rate, num_iterations):
    x = starting_point
    for i in range(num_iterations):
        grad = df(x)
        x = x - learning_rate * grad
        print(f"Iteration {i+1}: x = {x}, f(x) = {f(x)}")
    return x
# Run gradient descent
starting_point = 10
learning_rate = 0.1
num_iterations = 20
optimized_x = gradient_descent(starting_point, learning_rate, num_iterations)
print("Optimized x:", optimized_x)
Example output (final iteration, values rounded):
Iteration 20: x ≈ 0.1153, f(x) ≈ 0.0133
Observation
The goal is to find the value of x that minimizes the cost function f(x). In the example, the cost function is f(x) = x², and its minimum occurs at x = 0. After 20 iterations, the parameter x has been optimized to a much smaller value, about 0.1153. At this point, f(x) = (0.1153)² ≈ 0.0133, which is very close to the minimum value of 0.
Why Is GD Important in AI?
In machine learning, the cost function/loss function J(θ) represents the error of the model (e.g., mean squared error for regression).
The goal of training a model is to find the parameters θ that minimize the loss/cost function J(θ).
Gradient Descent is used to iteratively update the parameters θ to achieve this goal.
The "optimized" parameters are the ones that result in the best performance of the model.
Summary
In this article on calculus for AI, I explained the derivative, the gradient, and Gradient Descent, why these mathematical concepts are important in AI, and illustrated each concept with Python code. I hope this article is easy to understand and that you are enjoying the AI learning journey so far. Next time, I will lift the veil on Machine Learning with some hands-on exercises on data processing in Machine Learning :) See you next time!