Beginner's Guide to NumPy: From Zero to Hero


Introduction
What is NumPy?
NumPy (Numerical Python) is a Python library for working with arrays, mathematical functions, and numerical data.
Why use NumPy?
For numerical work it's faster and more memory-efficient than regular Python lists. It's also the backbone of many data science and machine learning libraries, such as Pandas, TensorFlow, and Scikit-learn.
What will you learn?
In this post, you'll learn:
How to install and import NumPy
How to create and use NumPy arrays
Basic operations like indexing, reshaping, and math with arrays
NumPy Array Creation: A Beginner-Friendly Guide
NumPy is a core Python library used in data science and machine learning. If you're working with numbers, arrays, or matrices, NumPy is your go-to.
In this post, we'll learn how to create different types of arrays in NumPy: from lists, with default values, sequences, and identity matrices.
1. Creating Arrays from Python Lists
Use np.array() to convert a Python list into a NumPy array.
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)
#output
[1 2 3 4]
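The same call also handles nested lists and mixed number types. Here is a small sketch; the variable names (grid, mixed, as_float) are only illustrative.

```python
import numpy as np

# A nested list becomes a 2D array: one inner list per row.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)   # (2, 3)

# Mixing ints and floats promotes every element to float.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# dtype= requests a specific element type up front.
as_float = np.array([1, 2, 3], dtype=float)
print(as_float)     # [1. 2. 3.]
```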
2. Creating Arrays with Default Values
NumPy offers shortcuts to create arrays filled with default values like zeros, ones, or any number you choose.
np.zeros(shape)
Creates an array filled with 0s.
zeroes_array = np.zeros(3)
print(zeroes_array)
#output
[0. 0. 0.]
np.ones(shape)
Creates an array filled with 1s.
ones_array = np.ones((2, 3))
print(ones_array)
#output
[[1. 1. 1.]
 [1. 1. 1.]]
np.full(shape, value)
Creates an array filled with the value you pass.
filled_array = np.full((2, 2), 7)
print(filled_array)
#output
[[7 7]
 [7 7]]
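Notice that the zeros and ones above print as floats, while the array of 7s prints as integers. A short sketch of why, and of how the dtype argument changes it (the variable names are illustrative):

```python
import numpy as np

# zeros() and ones() default to float64; dtype= overrides that.
int_zeros = np.zeros(3, dtype=int)
print(int_zeros)              # [0 0 0]

# full() infers its dtype from the fill value.
full_int = np.full(2, 7)
full_float = np.full(2, 7.0)
print(full_int.dtype.kind)    # i  (an integer type)
print(full_float.dtype.kind)  # f  (a float type)
```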
3. Creating Sequences with np.arange()
np.arange(start, stop, step) generates evenly spaced values from start up to, but not including, stop.
arr = np.arange(1, 10, 2)
arr2 = np.arange(2, 10, 2)
print(arr) # [1 3 5 7 9]
print(arr2) # [2 4 6 8]
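A few more variations, as a quick sketch: start and step are optional, and the step can be negative or fractional. The variable names are illustrative.

```python
import numpy as np

counting = np.arange(5)            # start defaults to 0, step to 1
print(counting)                    # [0 1 2 3 4]

countdown = np.arange(10, 0, -3)   # a negative step counts down
print(countdown)                   # [10  7  4  1]

quarters = np.arange(0, 1, 0.25)   # the step can be a float
print(quarters)                    # [0.   0.25 0.5  0.75]
```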
4. Creating Identity Matrices with np.eye()
An identity matrix is a square matrix with 1s on the diagonal and 0s elsewhere. Use np.eye() to create one.
identity_matrix = np.eye(3)
print(identity_matrix)
#output
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
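To see why it is called an identity matrix, here is a minimal sketch that multiplies a small matrix by it using the @ (matrix multiplication) operator, which this post does not otherwise cover. It also shows the optional k argument of np.eye(), which shifts the diagonal.

```python
import numpy as np

m = np.array([[2, 3], [4, 5]])
identity = np.eye(2)

# Matrix-multiplying by the identity returns the same values (as floats).
product = m @ identity
print(product)
# [[2. 3.]
#  [4. 5.]]

# k shifts the diagonal: k=1 puts the 1s one step above the main diagonal.
shifted = np.eye(3, k=1)
print(shifted)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
```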
Understanding NumPy Array Properties and Operations
After creating arrays, it's important to explore what they contain and how they behave. Here are the most commonly used array attributes and operations in NumPy.
1. .shape: Array Shape (Rows, Columns)
import numpy as np
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(arr_2d.shape)
# Output: (2, 3)
For a 2D array, .shape tells you the number of rows and columns.
2. .size: Total Number of Elements
arr = np.array([[10, 20, 30], [40, 50, 60]])
print(arr.size)
# Output: 6
Tells you how many elements are in the entire array.
3. .ndim: Number of Dimensions
arr_1d = np.array([1, 2, 3])
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
arr_3d = np.array([[[1, 2], [3, 4], [5, 6]]])
print(arr_1d.ndim, arr_2d.ndim, arr_3d.ndim)
# Output: 1 2 3
Shows how many dimensions (axes) the array has:
1D → Vector
2D → Matrix
3D → Tensor
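The three attributes covered so far are closely related. A small sketch, using an illustrative 3D array of zeros, shows how .ndim and .size follow from .shape:

```python
import numpy as np

arr_3d = np.zeros((2, 3, 4))

print(arr_3d.shape)  # (2, 3, 4)
print(arr_3d.ndim)   # 3, the length of the shape tuple
print(arr_3d.size)   # 24, the product 2 * 3 * 4

# A 1D array has a one-element shape tuple, not (rows, columns).
vec = np.array([1, 2, 3])
print(vec.shape)     # (3,)
```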
4. .dtype: Data Type of Elements
arr = np.array([10, 20, 30.5, 40])
print(arr.dtype)
# Output: float64
Every element of a NumPy array has the same data type, and .dtype tells you which one (int64, float64, and so on). Here the single float, 30.5, makes NumPy store all four values as floats.
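5. Type Conversion (astype())
Convert an array's data type using .astype().
arr = np.array([1.2, 2.5, 3.8])
print(arr.dtype)
int_arr = arr.astype(int)
print(int_arr)
print(int_arr.dtype)
# Output:
float64
[1 2 3]
int64
Converts each element to an integer by truncating the decimal part (no rounding). The exact integer type depends on the platform and NumPy version: it is int64 on most systems, but can be int32 on Windows with NumPy versions before 2.0.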
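Truncation is easiest to see with negative numbers. The sketch below uses made-up values and compares plain astype(int) with rounding first via np.round(); which one you want depends on your data.

```python
import numpy as np

arr = np.array([-1.7, -0.2, 0.9, 2.4])

# astype(int) drops the fractional part, so values move toward zero.
truncated = arr.astype(int)
print(truncated)   # [-1  0  0  2]

# To round to the nearest integer instead, round first, then convert.
rounded = np.round(arr).astype(int)
print(rounded)     # [-2  0  1  2]

# astype() returns a new array; the original keeps its dtype.
print(arr.dtype)   # float64
```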
NumPy Arithmetic Operations
NumPy supports element-wise math operations directly on arrays, with no loops needed!
arr = np.array([10, 20, 30])
print(arr + 2) # Add 2 to every element
print(arr - 2) # Subtract 2
print(arr * 2) # Multiply by 2
print(arr ** 2) # Square each element
# Output:
[12 22 32]
[ 8 18 28]
[20 40 60]
[100 400 900]
NumPy Statistical Functions
Useful for data analysis and scientific work.
arr = np.array([10, 20, 30, 40, 50])
print(np.sum(arr)) # Total
print(np.mean(arr)) # Average
print(np.min(arr)) # Minimum
print(np.max(arr)) # Maximum
print(np.std(arr)) # Standard deviation (population: divides by N)
print(np.var(arr)) # Variance (population: divides by N)
# Output:
150
30.0
10
50
14.14...
200.0
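These functions also accept an axis argument for 2D data, and np.std() / np.var() accept ddof to switch between the population and sample formulas. A minimal sketch with an illustrative scores array:

```python
import numpy as np

scores = np.array([[10, 20, 30],
                   [40, 50, 60]])

total = np.sum(scores)
col_totals = np.sum(scores, axis=0)
row_means = np.mean(scores, axis=1)
print(total)        # 210, every element
print(col_totals)   # [50 70 90], one total per column
print(row_means)    # [20. 50.], one average per row

# np.std() and np.var() divide by N by default; ddof=1 divides by N - 1,
# which gives the sample variance.
arr = np.array([10, 20, 30, 40, 50])
print(np.var(arr))          # 200.0
print(np.var(arr, ddof=1))  # 250.0
```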
Indexing and Slicing in NumPy
Accessing and manipulating elements in an array is straightforward with NumPy. Here's how you can index, slice, and filter arrays effectively.
1. Basic Indexing
Access individual elements using square brackets.
Note: Negative indices count from the end of the array.
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[0]) # First element → 10
print(arr[2]) # Third element → 30
print(arr[-1]) # Last element → 50
2. Indexing in 2D Arrays
Use [row, column] format.
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(arr_2d[0, 1]) # First row, second column → 2
print(arr_2d[1, 2]) # Second row, third column → 6
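The same bracket syntax can pull out a whole row or column. A quick sketch on the same array (the names first_row, second_col, and last are illustrative):

```python
import numpy as np

arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])

first_row = arr_2d[0]       # a single index selects a whole row
second_col = arr_2d[:, 1]   # ":" means every row; 1 picks the second column
last = arr_2d[-1, -1]       # negative indices work on each axis

print(first_row)    # [1 2 3]
print(second_col)   # [2 5]
print(last)         # 6
```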
3. Slicing Arrays
Use [start:stop:step] to get a range of values.
arr = np.array([10, 20, 30, 40, 50, 60])
print(arr[1:5]) # Elements from index 1 to 4 → [20 30 40 50]
print(arr[:4]) # First 4 elements → [10 20 30 40]
print(arr[::2]) # Every second element → [10 30 50]
print(arr[::-1]) # Reverse the array → [60 50 40 30 20 10]
Remember:
start:stop goes from start to stop - 1
step=-1 reverses the array
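One thing that often surprises beginners: a basic slice is a view onto the original array, not a separate copy. A minimal sketch (window and safe are illustrative names):

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

window = arr[1:4]   # a view onto arr, not a copy
window[0] = 99      # writes through to arr
print(arr)          # [10 99 30 40 50 60]

# Call .copy() when you need an independent array.
safe = arr[1:4].copy()
safe[0] = -1
print(arr[1])       # 99, unchanged by the edit to safe
```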
4. Fancy Indexing
You can pass a list of indices to extract multiple elements.
arr = np.array([10, 20, 30, 40, 50, 60])
print(arr[[0, 2, 4]]) # → [10 30 50]
5. Boolean Indexing (Filtering)
Select elements based on a condition:
arr = np.array([10, 20, 30, 40, 50])
print(arr[arr > 25]) # → [30 40 50]
This is very useful in data cleaning, filtering, and masking operations.
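Conditions can be combined, and a boolean mask can also be used on the left side of an assignment. A small sketch; the thresholds and the capped name are made up for illustration:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Combine conditions with & (and) and | (or); each condition needs parentheses.
middle = arr[(arr > 15) & (arr < 45)]
edges = arr[(arr < 20) | (arr > 40)]
print(middle)   # [20 30 40]
print(edges)    # [10 50]

# A mask can also pick the elements to overwrite.
capped = arr.copy()
capped[capped > 30] = 30
print(capped)   # [10 20 30 30 30]
```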
Summary
Use arr[index] for basic access
Use arr[start:stop:step] for slicing
Use arr[[i1, i2]] for fancy indexing
Use arr[condition] for filtering
These operations help you quickly select and manipulate array data without writing loops.
Reshaping and Flattening Arrays in NumPy
Sometimes you need to change the shape of your array, from 1D to 2D or vice versa. NumPy makes this easy using reshape(), flatten(), and ravel().
1. Reshaping Arrays with .reshape()
You can reshape a 1D array into a 2D matrix (or any other shape) if the total number of elements remains the same.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape(2, 3)
print(reshaped_arr)
# Output:
[[1 2 3]
 [4 5 6]]
.reshape(rows, columns) rearranges the elements into the new shape without changing the data.
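Two details worth trying yourself: passing -1 lets NumPy work out one dimension for you, and a shape that does not match the element count raises an error. A minimal sketch with a 12-element array:

```python
import numpy as np

arr = np.arange(1, 13)   # 12 elements: 1..12

# -1 asks NumPy to work out that dimension from the element count.
three_rows = arr.reshape(3, -1)
six_cols = arr.reshape(-1, 6)
print(three_rows.shape)  # (3, 4)
print(six_cols.shape)    # (2, 6)

# A shape whose sizes do not multiply to 12 is rejected.
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```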
2. Flattening Arrays
NumPy offers two main ways to flatten a multi-dimensional array into 1D:
flatten(): Returns a copy
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d.flatten()) # [1 2 3 4 5 6]
ravel(): Returns a view when possible
print(arr_2d.ravel()) # [1 2 3 4 5 6]
Key Difference:
flatten() always returns a new array (a copy of the data)
ravel() returns a view whenever the memory layout allows it, so modifying the result may change the original
Summary
Function | What It Does | Notes |
reshape() | Changes shape of array | Must match element count |
flatten() | Flattens to 1D (copy) | Safe to modify |
ravel() | Flattens to 1D (view when possible) | Faster, but edits may change the original |
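The copy-versus-view difference can be checked directly with np.shares_memory(). This is a small sketch on a freshly created (contiguous) array, which is the case where ravel() can return a view:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = arr_2d.flatten()
flat_view = arr_2d.ravel()

print(np.shares_memory(arr_2d, flat_copy))  # False
print(np.shares_memory(arr_2d, flat_view))  # True

flat_copy[0] = 100   # arr_2d is untouched
flat_view[1] = 200   # arr_2d[0, 1] changes too
print(arr_2d)
# [[  1 200   3]
#  [  4   5   6]]
```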
Modifying NumPy Arrays
NumPy provides several functions to insert, append, delete, stack, and split arrays. These are useful for data preprocessing, matrix transformation, and more. Each one returns a new array and leaves the original unchanged.
np.insert(): Insert Elements
arr = np.array([10, 20, 30, 40])
new_arr = np.insert(arr, 2, 100)
print(new_arr) # [ 10  20 100  30  40]
For 2D arrays, use the axis parameter:
arr_2d = np.array([[1, 2], [3, 4]])
print(np.insert(arr_2d, 1, [5, 6], axis=0)) # Insert row
print(np.insert(arr_2d, 1, [5, 6], axis=1)) # Insert column
axis=0 → row-wise, axis=1 → column-wise, axis=None (the default) flattens the array first.
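Here is what those 2D inserts produce, plus the no-axis case, as a short sketch (with_row, with_col, and flat are illustrative names):

```python
import numpy as np

arr_2d = np.array([[1, 2], [3, 4]])

with_row = np.insert(arr_2d, 1, [5, 6], axis=0)
print(with_row)
# [[1 2]
#  [5 6]
#  [3 4]]

with_col = np.insert(arr_2d, 1, [5, 6], axis=1)
print(with_col)
# [[1 5 2]
#  [3 6 4]]

# Without axis, the array is flattened before inserting.
flat = np.insert(arr_2d, 1, 9)
print(flat)   # [1 9 2 3 4]
```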
np.append(): Append Elements
arr = np.array([10, 20, 30])
new_arr = np.append(arr, [40, 50])
print(new_arr) # [10 20 30 40 50]
np.concatenate(): Join Arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(np.concatenate((arr1, arr2))) # [1 2 3 4 5 6]
You can also join 2D arrays along axis=0 (rows) or axis=1 (columns).
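A minimal sketch of the 2D case, using two illustrative 2x2 arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

below = np.concatenate((a, b), axis=0)   # rows of b go below a
print(below)
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

beside = np.concatenate((a, b), axis=1)  # columns of b go to the right of a
print(beside)
# [[1 2 5 6]
#  [3 4 7 8]]
```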
np.delete(): Remove Elements
arr = np.array([10, 20, 30, 40])
new_arr = np.delete(arr, 0)
print(new_arr) # [20 30 40]
2D Example:
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(np.delete(arr_2d, 0, axis=0)) # Delete first row → [[4 5 6]]
Stacking Arrays
np.vstack(): Vertical (row-wise) stack
np.hstack(): Horizontal (column-wise) stack
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(np.vstack((arr1, arr2)))
print(np.hstack((arr1, arr2)))
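For reference, this sketch shows what the two stacking calls return for 1D inputs, including the resulting shapes:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

stacked_rows = np.vstack((arr1, arr2))
print(stacked_rows.shape)   # (2, 3): each input became a row
print(stacked_rows)
# [[1 2 3]
#  [4 5 6]]

joined = np.hstack((arr1, arr2))
print(joined.shape)         # (6,): 1D inputs are joined end to end
print(joined)               # [1 2 3 4 5 6]
```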
Splitting Arrays
Split one array into multiple sub-arrays:
arr = np.array([10, 20, 30, 40, 50, 60])
print(np.split(arr, 2)) # Split into 2 equal parts
Also works with 2D arrays:
np.hsplit(): split horizontally (columns)
np.vsplit(): split vertically (rows)
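A short sketch of what splitting returns. It also uses np.array_split(), a related function not covered above, which allows pieces of unequal size; the grid array is illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

first, second = np.split(arr, 2)
print(first, second)   # [10 20 30] [40 50 60]

# np.split() raises ValueError when the pieces cannot be equal;
# np.array_split() allows uneven pieces instead.
parts = np.array_split(arr, 4)
print([p.tolist() for p in parts])   # [[10, 20], [30, 40], [50], [60]]

grid = np.arange(1, 9).reshape(2, 4)
left, right = np.hsplit(grid, 2)     # split the columns in half
print(left)
# [[1 2]
#  [5 6]]
```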
Summary Table
Function | Purpose |
np.insert() | Insert elements into array |
np.append() | Append elements |
np.concatenate() | Combine arrays |
np.delete() | Delete elements |
np.vstack() | Stack arrays vertically |
np.hstack() | Stack arrays horizontally |
np.split() | Split array equally |
NumPy Broadcasting: Fast Array Operations
The Problem with Loops
In vanilla Python:
prices = [100, 200, 300]
discount = 10
final_prices = []
for price in prices:
    final_price = price - (price * discount / 100)
    final_prices.append(final_price)
print(final_prices) # [90.0, 180.0, 270.0]
It works, but looping element by element in Python becomes slow for large data.
NumPy Solution: Broadcasting
import numpy as np
prices = np.array([100, 200, 300])
discount = 10
final_prices = prices - (prices * discount / 100)
print(final_prices) # [ 90. 180. 270.]
Broadcasting allows NumPy to apply operations between arrays of different shapes without writing loops. Here the scalar discount is applied to every price.
More Broadcasting Examples
Multiply all elements:
arr = np.array([100, 200, 300])
print(arr * 2) # [200 400 600]
Add vector to matrix (row-wise broadcasting):
matrix = np.array([[1, 2, 3], [4, 5, 6]])
vector = np.array([10, 20, 30])
result = matrix + vector
print(result)
NumPy automatically stretches the vector across each row to match the matrix shape.
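Here is the result of that addition, together with a way to ask NumPy what shape a broadcast would produce. This sketch assumes NumPy 1.20 or later, where np.broadcast_shapes() is available.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
vector = np.array([10, 20, 30])             # shape (3,)

result = matrix + vector
print(result)
# [[11 22 33]
#  [14 25 36]]

# np.broadcast_shapes reports the result shape, or raises ValueError
# when the shapes are incompatible.
print(np.broadcast_shapes((2, 3), (3,)))     # (2, 3)
print(np.broadcast_shapes((2, 1), (1, 3)))   # (2, 3)
```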
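Example: Incompatible Shapes
arr1 = np.array([[1, 2, 3], [4, 5, 6]]) # shape (2, 3)
arr2 = np.array([1, 2]) # shape (2,)
result = arr1 + arr2 # ValueError: operands could not be broadcast together with shapes (2,3) (2,)
To fix it, make sure the shapes are broadcast-compatible. Shapes are compared from the last dimension backwards, and each pair of sizes must either be equal or contain a 1. Here you could reshape arr2 to (2, 1) so that it adds one value per row, or use a length-3 vector instead to add one value per column, depending on intent.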
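A minimal sketch of the failure and of the (2, 1) fix. Since arr2 holds only two values, turning it into a column is the reshape that fits this data:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
arr2 = np.array([1, 2])                   # shape (2,)

try:
    arr1 + arr2
except ValueError as err:
    print("broadcast failed:", err)

# Reshaping arr2 to a column of shape (2, 1) adds one value per row.
fixed = arr1 + arr2.reshape(2, 1)
print(fixed)
# [[2 3 4]
#  [6 7 8]]
```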
Element-wise Operations: Python Lists vs NumPy Arrays
Native Python (with zip + list comprehension)
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = [x + y for x, y in zip(list1, list2)]
print(result) # [5, 7, 9]
It works, but it's not as readable and doesn't scale well for large data.
NumPy Makes It Easier
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2
print(result) # [5 7 9]
No zip
No loops
Clean, efficient, and fast
Scalar Operations
arr = np.array([10, 20, 30])
print(arr * 3) # [30 60 90]
NumPy applies the operation to each element automatically, thanks to broadcasting.
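It's worth seeing why the list version needs zip at all: the + and * operators mean something different for plain lists. A quick side-by-side sketch:

```python
import numpy as np

list1 = [1, 2, 3]
list2 = [4, 5, 6]

print(list1 + list2)   # [1, 2, 3, 4, 5, 6]  lists concatenate
print(list1 * 2)       # [1, 2, 3, 1, 2, 3]  lists repeat

arr1 = np.array(list1)
arr2 = np.array(list2)

print(arr1 + arr2)     # [5 7 9]  arrays add element by element
print(arr1 * 2)        # [2 4 6]  arrays scale each element
```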
Cleaning Data with NaN and Inf in NumPy
Detecting NaN (Not a Number)
import numpy as np
arr = np.array([1, 2, np.nan, 4, np.nan, 6])
print(np.isnan(arr)) # [False False  True False  True False]
np.isnan() returns a boolean array marking NaN values.
Note:
print(np.nan == np.nan) # False
You can't compare NaN with ==. Always use np.isnan().
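Because np.isnan() returns a boolean array, it combines naturally with boolean indexing. The sketch below counts and drops NaN values, and also uses np.nanmean(), a NaN-aware variant of np.mean() that is not covered above.

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, np.nan, 6])
mask = np.isnan(arr)

print(mask.sum())       # 2, because True counts as 1
valid = arr[~mask]      # keep only the non-NaN values
print(valid)            # [1. 2. 4. 6.]

# Ordinary reductions propagate NaN; the nan* variants skip it.
print(np.mean(arr))     # nan
print(np.nanmean(arr))  # 3.25
```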
Replacing NaN with a Value
cleaned_arr1 = np.nan_to_num(arr) # By default, replaces NaN with 0
cleaned_arr2 = np.nan_to_num(arr, nan=100)
print(cleaned_arr1) # [1. 2. 0. 4. 0. 6.]
print(cleaned_arr2) # [  1.   2. 100.   4. 100.   6.]
Detecting and Replacing Infinite Values
arr = np.array([1, 2, np.inf, 4, -np.inf, 6])
print(np.isinf(arr))
# [False False  True False  True False]
You can also replace them using np.nan_to_num:
cleaned_arr = np.nan_to_num(arr, posinf=1000, neginf=-1000)
print(cleaned_arr) # [    1.     2.  1000.     4. -1000.     6.]
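A fixed replacement value is not always what you want. As one possible alternative, this sketch fills NaN values with the mean of the valid entries using np.where() and np.nanmean(), and uses np.isfinite() to flag NaN and infinite values in a single check. None of these three functions is covered above, and the choice of the mean as the fill value is only an assumption for illustration.

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, np.nan, 6])

# Fill each NaN with the mean of the valid values (3.25 here).
filled = np.where(np.isnan(arr), np.nanmean(arr), arr)
print(filled)   # [1.   2.   3.25 4.   3.25 6.  ]

# np.isfinite() is False for NaN, inf and -inf alike.
messy = np.array([1.0, np.nan, np.inf, -np.inf])
finite = np.isfinite(messy)
print(finite)   # [ True False False False]
```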
Summary
np.isnan(): Check for NaN
np.isinf(): Check for inf / -inf
np.nan_to_num(): Replace all problematic values in one go
This is critical for data cleaning in machine learning pipelines or large-scale numerical computation.
Written by Harsh Gohil
