Numerical Computing for AI

Introduction
Numerical computing in computer science involves using computers to perform calculations on real numbers, which are often approximated due to the limitations of computer representation. This field is crucial for various scientific and engineering applications where analytical solutions are difficult or impossible to obtain.
In this article, we mainly focus on what numerical computing actually is and why it's called "numerical computing."
Analytical vs. Numerical Mathematics
The math we have done so far (like calculus and multivariable calculus) is analytical math: we manipulate equations symbolically to arrive at an exact, closed-form solution. Analytical math works well when we have only a few variables or objects and need an exact answer.
Take Newton's Second Law:
F = ma
This works well when we know two of the three quantities and solve for the third. But add a third interacting body, as in the classic three-body problem, and there is no general closed-form solution. That's where numerical methods come in. These methods use successive approximations to solve problems that are too complex for traditional analytical solutions, and the approximations can be made very close to the actual answers.
So why do we need approximations? Because analytical methods are only feasible for a small number of objects. For large-scale problems, mathematicians turn to numerical methods, and when these methods are implemented on computers, we call it numerical computing.
Scalars and Vectors
To begin understanding numerical computing, we start with the concepts of scalars and vectors.
Scalars: A single value, with no direction. In an equation like `area = length * width`, the result (`area`) is a scalar.
Vectors: An ordered collection of scalars, often having direction. In machine learning, `length` and `width` used together are treated as a single feature vector.
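A minimal NumPy sketch of the distinction, reusing the `length` and `width` example above:

```python
import numpy as np

length, width = 3.0, 4.0
area = length * width                  # a scalar: one value, no direction
features = np.array([length, width])   # a vector: an ordered collection of scalars
print(area, features, features.shape)  # 12.0 [3. 4.] (2,)
```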
Matrices
A matrix is not just rows and columns; it acts as a transformer of vectors.
When a matrix is multiplied by a vector, the result is a new vector that represents a transformation — involving direction, magnitude, or both. Entire fields like ML and computer vision are built upon these transformations.
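As a small illustration of this idea, a 90-degree rotation matrix changes a vector's direction while preserving its length:

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])  # rotates vectors 90 degrees counterclockwise
v = np.array([1.0, 0.0])
print(R @ v)                 # [0. 1.]: new direction, same magnitude
```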
Linearity
What is linearity? Suppose we draw a grid of evenly spaced parallel lines. If, after applying a matrix transformation, the grid lines remain parallel and evenly spaced and the origin stays fixed, the transformation is linear.
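Algebraically, a transformation T is linear when T(u + v) = T(u) + T(v) and T(c·u) = c·T(u). A quick numerical check of both properties for an arbitrary matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
c = 2.5

print(np.allclose(A @ (u + v), A @ u + A @ v))  # True (additivity)
print(np.allclose(A @ (c * u), c * (A @ u)))    # True (homogeneity)
```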
Eigenvectors and Eigenvalues
Eigenvector: A special vector that doesn't change direction when a transformation (matrix) is applied; only its magnitude changes (a negative eigenvalue flips it to the exact opposite direction).
Eigenvalue: Tells how much the eigenvector is stretched or shrunk during the transformation.
Applications:
Used in graph algorithms like PageRank.
Google Search's original ranking was built around it.
Principal Component Analysis (PCA) uses eigenvectors to find the direction of maximum variance.
Summary:
Eigenvectors = direction of patterns
Eigenvalues = strength of those patterns
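In NumPy, `np.linalg.eig` computes both at once; a minimal sketch:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # [2. 3.]: how strongly each direction is stretched
print(eigenvectors)  # columns are the eigenvectors (here, the coordinate axes)
```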
Scalars, Vectors, Matrices, and Tensors
Scalars are single numbers.
Vectors are collections of scalars.
Matrices are collections of vectors.
Tensors are higher-dimensional collections of matrices.
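In NumPy terms, each level simply adds one dimension:

```python
import numpy as np

scalar = np.array(5.0)                        # 0-D
vector = np.array([1.0, 2.0])                 # 1-D
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2-D
tensor = np.zeros((2, 2, 2))                  # 3-D (and higher)
print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)  # 0 1 2 3
```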
Floating-Point Representation
How does a computer store floating-point numbers like 10.665?
IEEE 754 Standard
Stored as:
(-1)^sign × 1.mantissa × 2^(exponent - bias)
Parts:
Sign bit: 0 = positive, 1 = negative
Mantissa: Stores the significant digits (the leading 1 is implicit)
Exponent: Tells where the binary point goes; stored with a bias (127 for 32-bit floats)
Example:
10.665 in binary: 1010.1010101...
becomes 1.0101010101... × 2^3
Dynamic Decimal Point: Allows storing both very large and very small numbers using exponents.
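You can inspect the actual stored bits of a 32-bit float with Python's standard `struct` module; a small sketch:

```python
import struct

# Reinterpret the 4 bytes of a float32 as one unsigned 32-bit integer
bits = struct.unpack('>I', struct.pack('>f', 10.665))[0]
sign = bits >> 31               # 1 sign bit
exponent = (bits >> 23) & 0xFF  # 8 exponent bits, stored with a bias of 127
mantissa = bits & 0x7FFFFF      # 23 fraction bits (the leading 1 is implicit)
print(sign, exponent - 127, bin(mantissa))  # 0 3 0b...
```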
Hardware Perspective
FPU (Floating-Point Unit): Special circuit in the CPU for float operations.
Registers: Store mantissa and exponent.
Instruction Set: Includes operations like FADD, FMUL.
Precision Modes: FP16, FP32, FP64 (used in AI)
Software Perspective
Handled by programming languages and libraries.
Data Types
`float16`: Half precision
`float32`: Common in ML
`float64`: Scientific computing
Python Example:
```python
import numpy as np

x = np.float32(10.665)   # single precision (~7 significant decimal digits)
y = np.float64(10.665)   # double precision (~15-16 significant decimal digits)
print(f"{x:.10f}")       # 10.6649999619: the rounding error is already visible
print(f"{y:.10f}")       # 10.6650000000
```
Libraries
NumPy, SciPy, TensorFlow handle precision, rounding, and overflow/underflow automatically.
Errors and warnings can be managed using `np.seterr()`.
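For example, silent overflow can be turned into an explicit error:

```python
import numpy as np

np.seterr(over='raise')                 # raise instead of silently producing inf
x = np.array([1e38], dtype=np.float32)
try:
    y = x * 10                          # exceeds float32's maximum (~3.4e38)
except FloatingPointError as e:
    print("overflow caught:", e)
```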
HDF5 Format
Used when working with Keras/TensorFlow.
Stores: model architecture, weights, optimizer state.
More scalable than NumPy arrays (which reside entirely in memory).
Can load parts of the dataset dynamically from disk (like SSD), improving memory efficiency.
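A minimal sketch with the `h5py` library (the file name `data.h5` and dataset name `features` are placeholders):

```python
import h5py

# Only the requested slice is read from disk, not the whole dataset
with h5py.File('data.h5', 'r') as f:
    first_batch = f['features'][:128]
    print(first_batch.shape)
```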
Distance Metrics
Euclidean Distance
Straight-line distance
Formula: √((x2 - x1)^2 + (y2 - y1)^2)
Used in: KNN, K-Means, recommendation systems
Weakness: sensitive to differing feature scales; becomes less meaningful in high dimensions, where distances concentrate
Manhattan Distance
Grid-based distance (L1 norm)
Formula: |x2 - x1| + |y2 - y1|
Used in: sparse data (text, images), Lasso Regression
Weakness: ignores angles and direction
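Both metrics in a few lines of NumPy:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))  # straight line: sqrt(9 + 16) = 5.0
manhattan = np.sum(np.abs(a - b))          # grid path: 3 + 4 = 7.0
print(euclidean, manhattan)
```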
Matrix Decomposition Methods
LU Decomposition (Doolittle)
A = LU where:
L = Lower triangular (diagonal = 1)
U = Upper triangular
Used for solving Ax = b
Crout Method
A = LU where:
U = Upper triangular (diagonal = 1)
L = Lower triangular
Cholesky Decomposition
A = LLᵀ
For symmetric positive-definite matrices
Used in Gaussian processes, Kalman filters
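Both decompositions are available off the shelf; a small sketch on a symmetric positive-definite matrix:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])  # symmetric positive-definite

P, L, U = lu(A)             # A = P @ L @ U (LU with partial pivoting)
C = np.linalg.cholesky(A)   # A = C @ C.T, with C lower triangular
print(np.allclose(P @ L @ U, A), np.allclose(C @ C.T, A))  # True True
```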
Gauss-Seidel Method
Iterative method to solve Ax = b
Improves guess step-by-step using latest calculated values
Use in AI:
Optimization problems
Sparse systems like recommendation engines
Reinforcement learning with constraints
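A minimal sketch of the iteration (the example system is diagonally dominant, which guarantees convergence):

```python
import numpy as np

def gauss_seidel(A, b, iters=50):
    # Solve Ax = b iteratively, reusing each updated value immediately
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        for i in range(len(b)):
            s = A[i] @ x - A[i, i] * x[i]  # contribution of all j != i
            x[i] = (b[i] - s) / A[i, i]
    return x

A = np.array([[4.0, 1.0],
              [2.0, 5.0]])
b = np.array([9.0, 12.0])
print(gauss_seidel(A, b))  # ~[1.8333 1.6667]
```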
Root Finding
Find x such that f(x) = 0
Methods:
Bisection Method: split interval in half
Newton-Raphson Method: uses derivatives
Secant Method: approximates without derivatives
Intermediate Value Theorem
If a continuous function changes sign between two points a and b, then it must cross zero somewhere between them.
Foundation of Bisection Method
Guarantees solution in a given interval
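A minimal bisection sketch built directly on that guarantee:

```python
def bisection(f, a, b, tol=1e-8):
    # Assumes f is continuous and f(a), f(b) have opposite signs
    while b - a > tol:
        mid = (a + b) / 2
        if f(a) * f(mid) <= 0:
            b = mid  # the sign change (and a root) lies in [a, mid]
        else:
            a = mid  # otherwise it lies in [mid, b]
    return (a + b) / 2

print(bisection(lambda x: x**2 - 2, 0.0, 2.0))  # ~1.41421356 (sqrt 2)
```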
Newton’s Method (Root Finding)
Uses:
x1 = x0 - f(x0)/f'(x0)
Fast convergence, but needs derivative
Inspired gradient descent in ML
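The same update rule in a few lines:

```python
def newton(f, df, x0, iters=20):
    # Newton-Raphson: repeatedly apply x1 = x0 - f(x0)/f'(x0)
    x = x0
    for _ in range(iters):
        x = x - f(x) / df(x)
    return x

# Root of x^2 - 2, starting from x0 = 1
print(newton(lambda x: x**2 - 2, lambda x: 2 * x, 1.0))  # ~1.41421356
```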
Interpolation
Estimate value between known data points
Used in:
Missing data filling
Signal smoothing
Graphics, animations
Newton’s Interpolation
Builds a polynomial that fits multiple points
Flexible and used to smooth curves
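A compact sketch using divided differences (the sample points below lie on y = x^2 + x + 1, so the result is easy to verify):

```python
import numpy as np

def newton_interp(xs, ys, x):
    # Evaluate Newton's divided-difference polynomial through (xs, ys) at x
    n = len(xs)
    coef = np.array(ys, dtype=float)
    for j in range(1, n):              # build divided differences in place
        coef[j:] = (coef[j:] - coef[j-1:-1]) / (xs[j:] - xs[:-j])
    result = coef[-1]
    for j in range(n - 2, -1, -1):     # evaluate in nested (Horner) form
        result = result * (x - xs[j]) + coef[j]
    return result

xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 3.0, 7.0])         # samples of y = x^2 + x + 1
print(newton_interp(xs, ys, 1.5))      # 4.75 = 1.5^2 + 1.5 + 1
```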
Taylor Series
Approximates complex functions using polynomials
Used in:
Newton’s method
Approximating sin(x), e^x
Solving differential equations
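For instance, e^x can be approximated by summing the first few terms of its series:

```python
import math

def exp_taylor(x, terms=10):
    # Sum the first `terms` terms of e^x = 1 + x + x^2/2! + x^3/3! + ...
    return sum(x**n / math.factorial(n) for n in range(terms))

print(exp_taylor(1.0))   # 2.7182815... (math.exp(1.0) gives 2.7182818...)
```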
Numerical Differentiation
Estimate derivatives using data points
Formula: f'(x) ≈ (f(x+h) - f(x)) / h
Used in:
Optimization
Training ML models
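A direct translation of the formula:

```python
def forward_diff(f, x, h=1e-5):
    # Forward-difference approximation of f'(x)
    return (f(x + h) - f(x)) / h

print(forward_diff(lambda x: x**2, 3.0))  # ~6.00001 (exact derivative is 6)
```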
Gradient Descent
Iterative method for minimizing a loss (error) function
Steps:
Start with a guess
Compute the gradient
Update weights
Repeat until convergence
Used in training neural networks
Libraries handle this internally (e.g., TensorFlow, PyTorch)
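A minimal sketch of those steps on a one-variable loss, f(w) = (w - 3)^2, whose minimum sits at w = 3:

```python
w = 0.0                 # step 1: start with a guess
lr = 0.1                # learning rate
for _ in range(100):    # step 4: repeat until convergence
    grad = 2 * (w - 3)  # step 2: gradient of (w - 3)^2
    w -= lr * grad      # step 3: update the weight
print(w)                # ~3.0
```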
Sanity Check
Quick test to verify if results make basic sense
Prevents obvious errors
Used in data validation, debugging, and before/after training
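For example, a couple of cheap assertions on model outputs (the values here are hypothetical):

```python
import numpy as np

probs = np.array([0.2, 0.7, 0.1])  # hypothetical class probabilities
assert np.all((probs >= 0) & (probs <= 1)), "probabilities must lie in [0, 1]"
assert abs(probs.sum() - 1.0) < 1e-6, "class probabilities should sum to 1"
```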
P.S.: If you spot any mistakes, feel free to point them out — we're all here to learn together! 😊
Haris
FAST-NUCES
BS Computer Science | Class of 2027
🔗 Portfolio: zenvila.github.io
🔗 GitHub: github.com/Zenvila
🔗 LinkedIn: linkedin.com/in/haris-shahzad-7b8746291
🔬 Member: COLAB (Research Lab)