Numerical Computing for AI

Introduction
Numerical computing in computer science involves using computers to perform calculations on real numbers, which are often approximated due to the limitations of computer representation. This field is crucial for various scientific and engineering applications where analytical solutions are difficult or impossible to obtain.
In this article, we mainly focus on what numerical computing actually is and why it's called "numerical computing."
Analytical vs. Numerical Mathematics
The math we have done so far (like calculus and multivariable calculus) is analytical math: we manipulate equations symbolically to arrive at an exact, closed-form solution. Analytical math works well when we have only a few variables or objects and need an exact answer.
Take Newton's Second Law:
F = ma
This works well when we know two of the three quantities and solve for the third. But add a third interacting body, as in the classic three-body problem, and there is no general closed-form solution. That's where numerical methods come in. These methods use successive approximations to solve problems that are too complex for traditional analytical solutions, and the approximations can be made very close to the actual answers.
So why do we need approximations? Because analytical methods are only feasible for a small number of objects. For large-scale problems, mathematicians turn to numerical methods, and when these methods are implemented on computers, we call it numerical computing.
Scalars and Vectors
To begin understanding numerical computing, we start with the concepts of scalars and vectors.
Scalars: A single value, with no direction. In an equation like `area = length * width`, the result (`area`) is a scalar.
Vectors: An ordered collection of scalars, often having direction. In machine learning, `length` and `width` used together are treated as a single feature vector.
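A minimal NumPy sketch of the distinction, reusing the `length` and `width` example above:

```python
import numpy as np

length, width = 3.0, 4.0
area = length * width                  # a scalar: one value, no direction
features = np.array([length, width])   # a vector: an ordered collection of scalars
print(area, features, features.shape)  # 12.0 [3. 4.] (2,)
```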
Matrices
A matrix is not just rows and columns; it acts as a transformer of vectors.
When a matrix is multiplied by a vector, the result is a new vector that represents a transformation — involving direction, magnitude, or both. Entire fields like ML and computer vision are built upon these transformations.
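As a small illustration of this idea, a 90-degree rotation matrix changes a vector's direction while preserving its length:

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])  # rotates vectors 90 degrees counterclockwise
v = np.array([1.0, 0.0])
print(R @ v)                 # [0. 1.]: new direction, same magnitude
```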
Linearity
What is linearity? Suppose we draw a grid of evenly spaced parallel lines. If, after applying a matrix transformation, the grid lines remain parallel and evenly spaced and the origin stays fixed, the transformation is linear.
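Algebraically, a transformation T is linear when T(u + v) = T(u) + T(v) and T(c·u) = c·T(u). A quick numerical check of both properties for an arbitrary matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
c = 2.5

print(np.allclose(A @ (u + v), A @ u + A @ v))  # True (additivity)
print(np.allclose(A @ (c * u), c * (A @ u)))    # True (homogeneity)
```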
Eigenvectors and Eigenvalues
Eigenvector: A special vector that doesn't change direction when a transformation (matrix) is applied; only its magnitude changes (a negative eigenvalue flips it to the exact opposite direction).
Eigenvalue: Tells how much the eigenvector is stretched or shrunk during the transformation.
Applications:
Used in graph algorithms like PageRank.
Google Search's original ranking was built around it.
Principal Component Analysis (PCA) uses eigenvectors to find the direction of maximum variance.
Summary:
Eigenvectors = direction of patterns
Eigenvalues = strength of those patterns
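In NumPy, `np.linalg.eig` computes both at once; a minimal sketch:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # [2. 3.]: how strongly each direction is stretched
print(eigenvectors)  # columns are the eigenvectors (here, the coordinate axes)
```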
Scalars, Vectors, Matrices, and Tensors
Scalars are single numbers.
Vectors are collections of scalars.
Matrices are collections of vectors.
Tensors are higher-dimensional collections of matrices.
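In NumPy terms, each level simply adds one dimension:

```python
import numpy as np

scalar = np.array(5.0)                        # 0-D
vector = np.array([1.0, 2.0])                 # 1-D
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2-D
tensor = np.zeros((2, 2, 2))                  # 3-D (and higher)
print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)  # 0 1 2 3
```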
Floating-Point Representation
How does a computer store floating-point numbers like 10.665?
IEEE 754 Standard
Stored as:
(-1)^sign × 1.mantissa × 2^(exponent - bias)
Parts:
Sign bit: 0 = positive, 1 = negative
Mantissa: Stores the significant digits (the leading 1 is implicit)
Exponent: Tells where the binary point goes; stored with a bias (127 for 32-bit floats)
Example:
10.665 in binary: 1010.1010101...
becomes 1.0101010101... × 2^3
Dynamic Decimal Point: Allows storing both very large and very small numbers using exponents.
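You can inspect the actual stored bits of a 32-bit float with Python's standard `struct` module; a small sketch:

```python
import struct

# Reinterpret the 4 bytes of a float32 as one unsigned 32-bit integer
bits = struct.unpack('>I', struct.pack('>f', 10.665))[0]
sign = bits >> 31               # 1 sign bit
exponent = (bits >> 23) & 0xFF  # 8 exponent bits, stored with a bias of 127
mantissa = bits & 0x7FFFFF      # 23 fraction bits (the leading 1 is implicit)
print(sign, exponent - 127, bin(mantissa))  # 0 3 0b...
```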
Hardware Perspective
FPU (Floating-Point Unit): Special circuit in the CPU for float operations.
Registers: Store mantissa and exponent.
Instruction Set: Includes operations like FADD, FMUL.
Precision Modes: FP16, FP32, FP64 (used in AI)
Software Perspective
Handled by programming languages and libraries.
Data Types
`float16`: Half precision
`float32`: Common in ML
`float64`: Scientific computing
Python Example:
```python
import numpy as np

x = np.float32(10.665)   # single precision (~7 significant decimal digits)
y = np.float64(10.665)   # double precision (~15-16 significant decimal digits)
print(f"{x:.10f}")       # 10.6649999619: the rounding error is already visible
print(f"{y:.10f}")       # 10.6650000000
```
Libraries
NumPy, SciPy, TensorFlow handle precision, rounding, and overflow/underflow automatically.
Errors and warnings can be managed using `np.seterr()`.
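For example, silent overflow can be turned into an explicit error:

```python
import numpy as np

np.seterr(over='raise')                 # raise instead of silently producing inf
x = np.array([1e38], dtype=np.float32)
try:
    y = x * 10                          # exceeds float32's maximum (~3.4e38)
except FloatingPointError as e:
    print("overflow caught:", e)
```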
HDF5 Format
Used when working with Keras/TensorFlow.
Stores: model architecture, weights, optimizer state.
More scalable than NumPy arrays (which reside entirely in memory).
Can load parts of the dataset dynamically from disk (like SSD), improving memory efficiency.
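A minimal sketch with the `h5py` library (the file name `data.h5` and dataset name `features` are placeholders):

```python
import h5py

# Only the requested slice is read from disk, not the whole dataset
with h5py.File('data.h5', 'r') as f:
    first_batch = f['features'][:128]
    print(first_batch.shape)
```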
Distance Metrics
Euclidean Distance
Straight-line distance
Formula: √((x2 - x1)^2 + (y2 - y1)^2)
Used in: KNN, K-Means, recommendation systems
Weakness: sensitive to differing feature scales; becomes less meaningful in high dimensions, where distances concentrate
Manhattan Distance
Grid-based distance (L1 norm)
Formula: |x2 - x1| + |y2 - y1|
Used in: sparse data (text, images), Lasso Regression
Weakness: ignores angles and direction
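Both metrics in a few lines of NumPy:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))  # straight line: sqrt(9 + 16) = 5.0
manhattan = np.sum(np.abs(a - b))          # grid path: 3 + 4 = 7.0
print(euclidean, manhattan)
```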
Matrix Decomposition Methods
LU Decomposition (Doolittle)
A = LU where:
L = Lower triangular (diagonal = 1)
U = Upper triangular
Used for solving Ax = b
Crout Method
A = LU where:
U = Upper triangular (diagonal = 1)
L = Lower triangular
Cholesky Decomposition
A = LLᵀ
For symmetric positive-definite matrices
Used in Gaussian processes, Kalman filters
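Both decompositions are available off the shelf; a small sketch on a symmetric positive-definite matrix:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])  # symmetric positive-definite

P, L, U = lu(A)             # A = P @ L @ U (LU with partial pivoting)
C = np.linalg.cholesky(A)   # A = C @ C.T, with C lower triangular
print(np.allclose(P @ L @ U, A), np.allclose(C @ C.T, A))  # True True
```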
Gauss-Seidel Method
Iterative method to solve Ax = b
Improves guess step-by-step using latest calculated values
Use in AI:
Optimization problems
Sparse systems like recommendation engines
Reinforcement learning with constraints
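A minimal sketch of the iteration (the example system is diagonally dominant, which guarantees convergence):

```python
import numpy as np

def gauss_seidel(A, b, iters=50):
    # Solve Ax = b iteratively, reusing each updated value immediately
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        for i in range(len(b)):
            s = A[i] @ x - A[i, i] * x[i]  # contribution of all j != i
            x[i] = (b[i] - s) / A[i, i]
    return x

A = np.array([[4.0, 1.0],
              [2.0, 5.0]])
b = np.array([9.0, 12.0])
print(gauss_seidel(A, b))  # ~[1.8333 1.6667]
```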
Root Finding
Find x such that f(x) = 0
Methods:
Bisection Method: split interval in half
Newton-Raphson Method: uses derivatives
Secant Method: approximates without derivatives
Intermediate Value Theorem
If a continuous function changes sign between two points a and b, then it must cross zero somewhere between them.
Foundation of Bisection Method
Guarantees solution in a given interval
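A minimal bisection sketch built directly on that guarantee:

```python
def bisection(f, a, b, tol=1e-8):
    # Assumes f is continuous and f(a), f(b) have opposite signs
    while b - a > tol:
        mid = (a + b) / 2
        if f(a) * f(mid) <= 0:
            b = mid  # the sign change (and a root) lies in [a, mid]
        else:
            a = mid  # otherwise it lies in [mid, b]
    return (a + b) / 2

print(bisection(lambda x: x**2 - 2, 0.0, 2.0))  # ~1.41421356 (sqrt 2)
```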
Newton’s Method (Root Finding)
Uses:
x1 = x0 - f(x0)/f'(x0)
Fast convergence, but needs derivative
Inspired gradient descent in ML
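The same update rule in a few lines:

```python
def newton(f, df, x0, iters=20):
    # Newton-Raphson: repeatedly apply x1 = x0 - f(x0)/f'(x0)
    x = x0
    for _ in range(iters):
        x = x - f(x) / df(x)
    return x

# Root of x^2 - 2, starting from x0 = 1
print(newton(lambda x: x**2 - 2, lambda x: 2 * x, 1.0))  # ~1.41421356
```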
Interpolation
Estimate value between known data points
Used in:
Missing data filling
Signal smoothing
Graphics, animations
Newton’s Interpolation
Builds a polynomial that fits multiple points
Flexible and used to smooth curves
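A compact sketch using divided differences (the sample points below lie on y = x^2 + x + 1, so the result is easy to verify):

```python
import numpy as np

def newton_interp(xs, ys, x):
    # Evaluate Newton's divided-difference polynomial through (xs, ys) at x
    n = len(xs)
    coef = np.array(ys, dtype=float)
    for j in range(1, n):              # build divided differences in place
        coef[j:] = (coef[j:] - coef[j-1:-1]) / (xs[j:] - xs[:-j])
    result = coef[-1]
    for j in range(n - 2, -1, -1):     # evaluate in nested (Horner) form
        result = result * (x - xs[j]) + coef[j]
    return result

xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 3.0, 7.0])         # samples of y = x^2 + x + 1
print(newton_interp(xs, ys, 1.5))      # 4.75 = 1.5^2 + 1.5 + 1
```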
Taylor Series
Approximates complex functions using polynomials
Used in:
Newton’s method
Approximating sin(x), e^x
Solving differential equations
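For instance, e^x can be approximated by summing the first few terms of its series:

```python
import math

def exp_taylor(x, terms=10):
    # Sum the first `terms` terms of e^x = 1 + x + x^2/2! + x^3/3! + ...
    return sum(x**n / math.factorial(n) for n in range(terms))

print(exp_taylor(1.0))   # 2.7182815... (math.exp(1.0) gives 2.7182818...)
```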
Numerical Differentiation
Estimate derivatives using data points
Formula: f'(x) ≈ (f(x+h) - f(x)) / h
Used in:
Optimization
Training ML models
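A direct translation of the formula:

```python
def forward_diff(f, x, h=1e-5):
    # Forward-difference approximation of f'(x)
    return (f(x + h) - f(x)) / h

print(forward_diff(lambda x: x**2, 3.0))  # ~6.00001 (exact derivative is 6)
```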
Gradient Descent
Iterative method for minimizing a loss (error) function
Steps:
Start with a guess
Compute the gradient
Update weights
Repeat until convergence
Used in training neural networks
Libraries handle this internally (e.g., TensorFlow, PyTorch)
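A minimal sketch of those steps on a one-variable loss, f(w) = (w - 3)^2, whose minimum sits at w = 3:

```python
w = 0.0                 # step 1: start with a guess
lr = 0.1                # learning rate
for _ in range(100):    # step 4: repeat until convergence
    grad = 2 * (w - 3)  # step 2: gradient of (w - 3)^2
    w -= lr * grad      # step 3: update the weight
print(w)                # ~3.0
```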
Sanity Check
Quick test to verify if results make basic sense
Prevents obvious errors
Used in data validation, debugging, and before/after training
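For example, a couple of cheap assertions on model outputs (the values here are hypothetical):

```python
import numpy as np

probs = np.array([0.2, 0.7, 0.1])  # hypothetical class probabilities
assert np.all((probs >= 0) & (probs <= 1)), "probabilities must lie in [0, 1]"
assert abs(probs.sum() - 1.0) < 1e-6, "class probabilities should sum to 1"
```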
P.S.: If you spot any mistakes, feel free to point them out — we're all here to learn together! 😊
Haris
FAST-NUCES
BS Computer Science | Class of 2027
🔗 Portfolio: zenvila.github.io
🔗 GitHub: github.com/Zenvila
🔗 LinkedIn: linkedin.com/in/haris-shahzad-7b8746291
🔬 Member: COLAB (Research Lab)