20 SciPy Concepts with Before-and-After Examples
Table of contents
- 1. Optimization (scipy.optimize.minimize)
- 2. Root Finding (scipy.optimize.root)
- 3. Linear Algebra (scipy.linalg.solve)
- 4. Integration (scipy.integrate.quad)
  - Why do we need to calculate the area under the curve?
- 5. Solving Differential Equations (scipy.integrate.odeint)
- 6. Statistics (scipy.stats.norm)
  - Analogy
  - Why use statistical distributions?
- 7. Signal Processing (scipy.signal.find_peaks)
- 8. Sparse Matrices (scipy.sparse.csr_matrix)
  - Why do we need sparse matrices?
- 9. Fourier Transforms (scipy.fftpack.fft)
- 10. Interpolation (scipy.interpolate.interp1d)
- 11. Image Processing (scipy.ndimage)
- 12. Distance Calculations (scipy.spatial.distance)
  - Summary
- 13. Clustering (scipy.cluster.hierarchy)
- 14. Statistics Test (scipy.stats.ttest_ind)
  - Why is this useful?
- 15. Cubic Spline Interpolation (scipy.interpolate.CubicSpline)
- 16. Signal Convolution (scipy.signal.convolve)
  - Why use convolution?
- 17. Gaussian KDE (Kernel Density Estimation)
- 18. Matrix Decompositions (scipy.linalg.svd)
- 19. Signal Filtering (scipy.signal.butter and lfilter)
- 20. Principal Component Analysis (PCA with scipy.linalg)
1. Optimization (scipy.optimize.minimize)
Going to the lowest point while hiking, in the context of optimization, is like finding the most comfortable and efficient path to reach your goal. Imagine you're carrying a heavy backpack, and the lowest point in the valley represents the place with the least effort needed to continue (just like minimizing energy, cost, or other objectives in real life).
In optimization, the lowest point represents the most efficient solution, where the model is "happy" with the least amount of error or cost. This is like saving energy in the following ways:
Energy in computations: The model doesn't have to keep adjusting itself over and over again if it finds the optimal solution quickly. This minimizes the computational "energy" (resources) needed.
Better performance: Just like conserving energy during a hike makes it easier, finding the minimum of a function allows the model to work more efficiently and make better predictions without extra strain (like overfitting or unnecessary calculations).
In real-life optimization:
The lowest point represents the optimal solutionโwhether it's using the least energy, saving the most money, or reducing risk.
Boilerplate Code:
from scipy.optimize import minimize
Use Case: Perform optimization to find the minimum value of a function.
Goal: Find the best solution by minimizing an objective function.
Sample Code:
# Define an objective function
def objective(x):
    return x**2 + 5*x + 4

# Minimize the objective function
result = minimize(objective, x0=0)
print(result.x)
Before Example: We need to find the minimum of a function but don't know how to proceed.
Objective Function: x**2 + 5x + 4
After Example: With scipy.optimize.minimize(), we find the minimum efficiently!
Minimum: at x = -2.5 (where the function value is -2.25)
Challenge: Try optimizing a multi-dimensional function with constraints.
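If you want to attempt the challenge, here is a minimal sketch (the two-variable objective and the constraint are made up for illustration): minimize() accepts a list of constraint dictionaries and switches to a constraint-aware solver such as SLSQP when they are present.
import numpy as np
from scipy.optimize import minimize

# A hypothetical 2-D objective: f(x, y) = (x - 1)^2 + (y - 2)^2
def objective(v):
    x, y = v
    return (x - 1)**2 + (y - 2)**2

# Inequality constraint x + y <= 2, written as 2 - x - y >= 0
constraints = [{"type": "ineq", "fun": lambda v: 2 - v[0] - v[1]}]

result = minimize(objective, x0=np.array([0.0, 0.0]), constraints=constraints)
print(result.x)  # should land near the constraint boundary, around [0.5, 1.5]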
2. Root Finding (scipy.optimize.root)
Why do we need to find the root of a function?
In real-world problems, finding the root of a function helps solve important questions, such as:
Solving equations: If you have an equation and want to know when something equals zero (like balance in an account, or when a projectile hits the ground), finding the root gives you the answer.
Intersection points: If you want to know when two things are equal (like supply vs. demand in economics), finding the root tells you where they meet.
Physics and engineering: Roots are used to calculate when forces, velocities, or movements result in equilibrium (zero force, zero velocity, etc.).
Boilerplate Code:
from scipy.optimize import root
Use Case: Find the roots of a function, i.e., where the function equals zero.
Goal: Solve equations by finding the values that make the function output zero.
Sample Code:
# Define a function
def func(x):
    return x**2 - 4

# Find a root (start away from x = 0, where the derivative vanishes and the solver can stall)
result = root(func, x0=1)
print(result.x)
Before Example: We need to solve an equation but don't know where the function crosses zero.
Function: x**2 - 4
After Example: With scipy.optimize.root(), we find a root of the function!
Root: 2 (starting from x0=-1 instead converges to the other root, -2)
Challenge: Try finding roots for more complex nonlinear systems of equations.
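As a starting point for the challenge, here is a small sketch with a made-up two-equation system; root() accepts a function that returns one residual per equation and a starting guess with one entry per unknown.
from scipy.optimize import root

# A made-up nonlinear system:
#   x + y = 3
#   x * y = 2
def system(v):
    x, y = v
    return [x + y - 3, x * y - 2]

solution = root(system, x0=[0.5, 2.5])
print(solution.x)  # converges to one of the two solutions, (1, 2) or (2, 1), depending on the start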
3. Linear Algebra (scipy.linalg.solve)
Boilerplate Code:
from scipy.linalg import solve
Use Case: Solve linear equations of the form Ax = b, where A is a matrix and x is a vector of unknowns.
Goal: Find the solution to systems of linear equations.
Sample Code:
# Define a matrix A and vector b
A = [[3, 2], [1, 2]]
b = [5, 5]
# Solve the system of equations
x = solve(A, b)
print(x)
Before Example: We have a system of linear equations but no efficient way to solve it.
Equations: 3x + 2y = 5, x + 2y = 5
After Example: With scipy.linalg.solve(), the system is solved efficiently!
Solution: x = 0, y = 2.5
Challenge: Try solving larger systems with more equations and variables.
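For the challenge, here is a sketch with a randomly generated (and deliberately well-conditioned) 5x5 system; the matrix and right-hand side are arbitrary.
import numpy as np
from scipy.linalg import solve

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)) + 5 * np.eye(5)  # boosted diagonal keeps the matrix invertible
b = rng.normal(size=5)

x = solve(A, b)
print(np.allclose(A @ x, b))  # True: the solution satisfies Ax = b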
4. Integration (scipy.integrate.quad)
Calculating the area under a curve is like measuring how much water fills a pool. Imagine the curve is the shape of the pool's floor, and you want to know how much water is needed to fill it up. The area under the curve tells you the total amount of "water" (or space) contained between the curve and the ground (the x-axis).
Why do we need to calculate the area under the curve?
In real-world applications, the area under the curve is important for:
Physics: It can represent total energy, distance traveled, or accumulated quantity (like charge or mass).
Economics: You might calculate the total profit or cost over time (for example, area under a demand curve).
Probability: In statistics, the area under a probability density function represents the likelihood of an event occurring within certain limits.
So, calculating the area under the curve helps us understand the total effect or accumulation of something across a range of values, whether it's distance, profit, probability, or energy!
Boilerplate Code:
from scipy.integrate import quad
Use Case: Perform numerical integration to calculate the area under a curve.
Goal: Integrate a function between two limits.
Sample Code:
# Define a function to integrate
def integrand(x):
    return x**2

# Perform the integration from 0 to 1
result, error = quad(integrand, 0, 1)
print(result)
Before Example: We have a function but don't know how to calculate the area under the curve.
Function: x**2, limits from 0 to 1
After Example: With scipy.integrate.quad(), we get the integral and an error estimate!
Integral: 0.333 (the area under the curve; the exact value is 1/3)
Challenge: Try integrating functions with more complex integrands or boundaries (including infinite limits).
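One way into the challenge: quad() also accepts infinite limits. A minimal sketch, with an integrand chosen because its exact answer is known:
import numpy as np
from scipy.integrate import quad

# Integral of e^(-x) from 0 to infinity is exactly 1
result, error = quad(lambda x: np.exp(-x), 0, np.inf)
print(result)  # approximately 1.0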
5. Solving Differential Equations (scipy.integrate.odeint)
Boilerplate Code:
from scipy.integrate import odeint
Use Case: Solve ordinary differential equations (ODEs).
Why do we need to solve ODEs?
Physics: ODEs describe how systems change over time, like the motion of objects, electric circuits, or chemical reactions.
Biology: ODEs are used to model population growth, the spread of diseases, or biological processes.
Economics: ODEs can model financial systems, such as interest rates or investments over time.
In summary, solving ODEs helps us predict the future behavior of dynamic systems based on their current state and how they change.
Goal: Compute the solutions for ODEs with initial conditions.
Sample Code:
# Define an ODE system
def model(y, t):
    dydt = -0.5 * y
    return dydt

# Initial condition and time points
y0 = 5
t = [0, 1, 2, 3, 4, 5]
# Solve the ODE
result = odeint(model, y0, t)
print(result)
Before Example: We have a differential equation but no numerical way to solve it.
ODE: dy/dt = -0.5 * y
After Example: With scipy.integrate.odeint(), the solution to the ODE is found!
Solution: y at the requested time points
Challenge: Try solving a system of differential equations with multiple variables.
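Here is a sketch of the challenge with a made-up two-variable system (a toy predator-prey-style model); odeint handles systems by letting the state and the returned derivative be vectors.
import numpy as np
from scipy.integrate import odeint

# dx/dt = x - x*y,  dy/dt = -y + x*y
def model(state, t):
    x, y = state
    return [x - x * y, -y + x * y]

t = np.linspace(0, 10, 101)
result = odeint(model, [2.0, 1.0], t)  # columns hold x(t) and y(t)
print(result[:5])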
6. Statistics (scipy.stats.norm)
Working with a normal distribution is like baking batches of cookies. Imagine you're baking 100 cookies, and you want them to all be about the same size. Most of your cookies will come out close to the perfect size, but a few will be slightly bigger or smaller. However, it's very rare to have cookies that are way too big or way too small.
Analogy:
Random Samples: If you randomly pick cookies from the batch, most will be close to the ideal size (the average), but some will be a little larger or smaller.
PDF (Probability Density Function): The PDF tells you how likely it is for a cookie to be a certain size. Around the ideal size, the likelihood is high (most cookies will be that size), but as you look at much larger or smaller sizes, the likelihood drops.
CDF (Cumulative Distribution Function): The CDF is like counting how many cookies are smaller than a certain size. It helps you see the overall spread of cookie sizes.
Why use statistical distributions?
Modeling real-world data: The normal distribution can model data like cookie sizes, where most values (cookie sizes) are close to the average, but some are slightly different.
Probabilities: You can calculate the likelihood of an event, like how likely a cookie will be within a certain size range.
Random sampling: Useful for simulations, like mimicking different batches of cookies with varying sizes.
So, using scipy.stats.norm is like understanding the typical size of your cookies: most are close to average, with a few that are bigger or smaller!
Boilerplate Code:
from scipy.stats import norm
Use Case: Work with statistical distributions such as the normal distribution.
Goal: Generate random samples, calculate probabilities, and more.
Sample Code:
# Generate random samples from a normal distribution
samples = norm.rvs(loc=0, scale=1, size=100)
# Calculate the probability density function (PDF)
pdf = norm.pdf(0)
# Calculate the cumulative density function (CDF)
cdf = norm.cdf(0)
Before Example: We need to generate samples and work with probability distributions but lack the tools.
Need: Normal distribution samples and calculations.
After Example: With scipy.stats.norm, we can work with normal distributions easily!
Samples, PDF, CDF: all calculated from the normal distribution.
Challenge: Try working with other distributions like scipy.stats.binom or scipy.stats.poisson.
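A quick sketch of the challenge using two discrete distributions (the parameters are arbitrary example values):
from scipy.stats import binom, poisson

# Probability of exactly 3 successes in 10 trials with success probability 0.5
print(binom.pmf(3, n=10, p=0.5))   # about 0.117

# Probability of exactly 2 events when the average rate is 4 per interval
print(poisson.pmf(2, mu=4))        # about 0.147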
7. Signal Processing (scipy.signal.find_peaks)
Why do we need to find peaks?
Identifying key events: In many real-world scenarios, peaks represent important events, like a heart rate spike in medical data, price surges in financial markets, or signal bursts in communication systems.
Pattern recognition: Peaks can help reveal the underlying pattern or rhythm in data, such as periodic signals or cyclical trends.
Anomaly detection: Peaks often indicate unusual or significant behavior, like finding the highest points in a dataset where something noteworthy occurs.
Boilerplate Code:
from scipy.signal import find_peaks
Use Case: Detect peaks in a signal or dataset, often used in signal processing.
Goal: Identify points where the signal reaches a local maximum.
Sample Code:
# Create a simple signal
signal = [0, 1, 0, 2, 0, 3, 0]
# Find the peaks
peaks, _ = find_peaks(signal)
print(peaks)
Before Example: We have a signal but struggle to find the peaks.
Signal: [0, 1, 0, 2, 0, 3, 0]
After Example: With find_peaks(), we easily identify the peak locations!
Peaks: at indices [1, 3, 5]
Challenge: Try working with noisy signals and using different parameters to fine-tune peak detection.
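For the challenge, here is a minimal sketch with a made-up noisy signal; the height and distance values are just example settings that suppress small, closely spaced noise bumps.
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
noisy = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=t.size)

peaks, properties = find_peaks(noisy, height=0.5, distance=30)
print(peaks)  # indices of the detected peaks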
8. Sparse Matrices (scipy.sparse.csr_matrix)
Using sparse matrices is like keeping a list of just the highlighted pages in a long book. Imagine you have a massive book with 1,000 pages, but only a few pages have important information highlighted (non-zero values). Instead of carrying around the whole book, you keep a list of just the page numbers that have highlights. This way, you can quickly refer to the important parts without carrying or flipping through the entire book.
Why do we need sparse matrices?
Memory savings: Instead of storing every page (including the unimportant ones), you store only the relevant page numbers (non-zero values), saving lots of space.
Efficient operations: If you only need to reference or process the highlighted pages, it's much faster than going through every single page, especially if most of them don't matter (are zeros).
Real-world data: In many fields like machine learning, text analysis, or network analysis, the data we work with often has lots of "empty" spots (zeros), so focusing on just the important parts (non-zero entries) makes things much more efficient.
This analogy helps illustrate how sparse matrices let you efficiently manage and process large datasets by focusing only on the important data!
Boilerplate Code:
from scipy.sparse import csr_matrix
Use Case: Efficiently store and work with sparse matrices where most elements are zero.
Goal: Save memory and computational power by using sparse matrices.
Sample Code:
# Create a sparse matrix
matrix = csr_matrix([[0, 0, 1], [1, 0, 0], [0, 1, 0]])
# Convert back to a dense matrix if needed
dense_matrix = matrix.toarray()
Before Example: We work with large matrices filled mostly with zeros, wasting memory.
Dense matrix: inefficient memory usage.
After Example: With scipy.sparse.csr_matrix(), the data is stored more efficiently!
Sparse Matrix: memory-efficient storage.
Challenge: Try performing matrix operations like addition or multiplication on sparse matrices.
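A small sketch of the challenge (the second matrix is an arbitrary example); addition and matrix multiplication on CSR matrices stay sparse, and toarray() is only used here to inspect the result.
from scipy.sparse import csr_matrix

A = csr_matrix([[0, 0, 1], [1, 0, 0], [0, 1, 0]])
B = csr_matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

print((A + B).toarray())   # element-wise sum
print((A @ B).toarray())   # matrix product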
9. Fourier Transforms (scipy.fftpack.fft)
The point of using Fourier Transforms is to understand the "hidden" frequency patterns in a signal. Imagine you're listening to a piece of music. In the time domain, all you hear is a mix of different sounds playing at once (similar to how the raw signal looks). But by converting it to the frequency domain (using a Fourier Transform), you can break down the music into its individual notes or instruments, revealing which frequencies are present and how strong they are.
Why use Fourier Transforms?
Find underlying patterns: Some signals, like sound waves or stock market data, might look random in the time domain, but in the frequency domain, you can detect repeating patterns or frequencies.
Filter out noise: In audio, communications, and image processing, Fourier transforms help you identify and remove unwanted frequencies (noise).
Analyze vibrations: Engineers use Fourier Transforms to study vibrations in machines, identifying problematic frequencies that might indicate wear or failure.
In essence, Fourier Transforms help you see the signal's "ingredients" by analyzing its frequency components, giving you insights into hidden patterns or characteristics that aren't obvious in the time domain.
Boilerplate Code:
from scipy.fftpack import fft
Use Case: Compute the Fourier transform of a signal to convert it from the time domain to the frequency domain.
Goal: Analyze the frequency components of a signal.
Sample Code:
# Create a simple signal
signal = [0, 1, 0, 2, 0, 3, 0]
# Compute the Fourier Transform
transformed_signal = fft(signal)
print(transformed_signal)
Before Example: We have a time-domain signal but need to analyze its frequency components.
Signal: in the time domain.
After Example: With fft(), the signal is transformed into the frequency domain!
Frequency Components: transformed signal.
Challenge: Try computing the inverse Fourier transform with ifft() to return to the time domain.
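A minimal sketch of the challenge: transform the same toy signal and invert it; the round trip should reproduce the original up to floating-point noise.
import numpy as np
from scipy.fftpack import fft, ifft

signal = [0, 1, 0, 2, 0, 3, 0]
transformed = fft(signal)
recovered = ifft(transformed)

print(np.allclose(recovered.real, signal))  # True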
10. Interpolation (scipy.interpolate.interp1d)
In short, interpolation is a process of determining the unknown values that lie in between the known data points.
Real-world use case in machine learning:
In machine learning, interpolation is useful when:
Missing data: You might have a dataset with gaps in it (missing values). Interpolation helps fill in those gaps by estimating what those values should be based on surrounding data.
Resampling: When you have time series data (e.g., stock prices, sensor readings) recorded at irregular intervals, interpolation helps you create regular intervals by estimating values between the recorded data points.
Data smoothing: When you're trying to create smoother curves or trends in your data, interpolation helps generate intermediate values that smooth out the visual or computational representation.
In summary, interpolation helps fill in the missing pieces between known data points, ensuring smooth transitions and estimations for better predictions.
Boilerplate Code:
from scipy.interpolate import interp1d
Use Case: Perform interpolation to estimate unknown values between known data points.
Goal: Create a function that smoothly interpolates between data points.
Sample Code:
# Known data points
x = [0, 1, 2, 3]
y = [0, 1, 0, 1]
# Create the interpolation function
f = interp1d(x, y, kind='linear')
# Interpolate new values
new_values = f([0.5, 1.5, 2.5])
print(new_values)
Before Example: We have data but need to estimate values between known points.
Known Points: [0, 1, 2, 3]
After Example: With interp1d(), we can now estimate values between points!
Interpolated Values: smooth estimates between known points.
Challenge: Try using different interpolation methods such as cubic or nearest.
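A short sketch of the challenge on the same toy data; 'cubic' fits a smooth curve (it needs at least four points), while 'nearest' snaps to the closest known value.
from scipy.interpolate import interp1d

x = [0, 1, 2, 3]
y = [0, 1, 0, 1]

f_cubic = interp1d(x, y, kind='cubic')
f_nearest = interp1d(x, y, kind='nearest')

print(f_cubic([0.5, 1.5, 2.5]))    # smooth estimates
print(f_nearest([0.5, 1.5, 2.5]))  # snaps to the nearest known y value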
11. Image Processing (scipy.ndimage)
Boilerplate Code:
from scipy import ndimage
Use Case: Perform basic image processing tasks like filtering, transformations, or morphological operations.
Goal: Manipulate and process images for analysis.
Sample Code:
import numpy as np

# Sample image (2D array); floats keep the smoothed values from being truncated to integers
image = np.array([[0, 1, 1], [1, 0, 1], [0, 0, 1]], dtype=float)
# Apply a Gaussian filter
filtered_image = ndimage.gaussian_filter(image, sigma=1)
print(filtered_image)
Before Example: We need to apply transformations and filters to an image but don't have the tools.
Image: unprocessed, noisy, or blurred.
After Example: With scipy.ndimage, we can apply filters and transformations to improve the image!
Filtered Image: Gaussian-smoothed image.
Challenge: Try experimenting with different filters like median_filter() or transformations like rotate().
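A sketch of the challenge on the same tiny image (the filter size and rotation angle are arbitrary example values):
import numpy as np
from scipy import ndimage

image = np.array([[0, 1, 1], [1, 0, 1], [0, 0, 1]], dtype=float)

# Median filter: good at removing salt-and-pepper noise
median_filtered = ndimage.median_filter(image, size=2)

# Rotate by 45 degrees; reshape=False keeps the original shape
rotated = ndimage.rotate(image, angle=45, reshape=False)
print(median_filtered)
print(rotated)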
12. Distance Calculations (scipy.spatial.distance)
Think of distance calculations as figuring out how "different" two things are, without diving into complicated math. Let's break it down:
Euclidean Distance: This is like measuring the straight-line distance between two houses on a map.
Cosine distance is like comparing the way two people are walking. Imagine two people walking:
Small Cosine Distance: If they are walking in the same direction, theyโre very similar (low distance).
Large Cosine Distance: If theyโre walking in opposite directions, theyโre very different (high distance).
Cosine distance doesn't care how far apart they are, just whether they are going in the same or opposite directions. This is useful for comparing things like text or behaviors, where the "direction" (similarity) matters more than actual distance.
Why do we care about distances in real life?
In machine learning, distances are important because:
Comparing data: For example, when recommending movies, you might want to know how "close" your preferences are to someone else's. The smaller the distance, the more similar the preferences, and you might get the same movie recommendation.
Clustering: You want to group similar things together, like putting similar customers in the same group based on their behavior. Distance tells you how similar or different they are.
In these two cases, the choice between Cosine and Euclidean distance depends on the nature of the data and what you're comparing:
Movie Recommendations (Comparing Data):
- Cosine Distance is typically used here because it focuses on the "direction" or similarity between preferences rather than their magnitude. For example, two users might give similar ratings to different sets of movies, even if the actual values are different (e.g., one user rates on a scale of 1-5 and another on 2-10). Cosine distance measures how aligned their preferences are regardless of how "far apart" the ratings are. It's useful when you want to compare the overall pattern of preferences rather than exact numbers.
Clustering (Grouping Similar Customers):
- Euclidean Distance is often used for clustering when you want to group customers based on exact numerical values like age, income, or spending habits. This distance measures the "straight-line" distance between points, so it's effective when you care about the actual magnitude of the difference between customer behaviors or characteristics.
Summary:
Cosine Distance: Use when comparing patterns or relationships (e.g., movie preferences, text similarities).
Euclidean Distance: Use when comparing numerical values and magnitudes (e.g., customer behavior, clustering based on measurable features).
Boilerplate Code:
from scipy.spatial.distance import euclidean, cosine
Use Case: Compute various distance metrics between points or vectors, such as Euclidean, Manhattan, or Cosine distances.
Goal: Measure how far apart two points or vectors are.
Sample Code:
# Define two points
point1 = [1, 2]
point2 = [4, 6]
# Compute Euclidean and Cosine distance
euclid_dist = euclidean(point1, point2)
cosine_dist = cosine(point1, point2)
print(euclid_dist, cosine_dist)
Before Example: We need to calculate the distance between data points but aren't sure how.
Points: [1, 2], [4, 6]
After Example: With scipy.spatial.distance, distances between points are calculated!
Distances: Euclidean = 5, Cosine about 0.008
Challenge: Try calculating different distances (e.g., Manhattan, Minkowski) for various datasets.
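A quick sketch of the challenge on the same two points (cityblock is SciPy's name for the Manhattan distance; p=3 is an arbitrary Minkowski order):
from scipy.spatial.distance import cityblock, minkowski

point1 = [1, 2]
point2 = [4, 6]

print(cityblock(point1, point2))       # Manhattan distance: |1-4| + |2-6| = 7
print(minkowski(point1, point2, p=3))  # Minkowski distance with p = 3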
13. Clustering (scipy.cluster.hierarchy)
Hierarchical clustering helps you figure out which people naturally belong in the same team by checking how similar they are.
Linkage: Think of it like measuring how close two people are in terms of their interests. You start by linking the two most similar people, then slowly add more people to the teams based on how similar they are to the existing members.
Dendrogram: This is like a family tree that shows how the groups were formed. It starts with individuals and branches out, showing how smaller groups join to form bigger teams.
Boilerplate Code:
from scipy.cluster.hierarchy import linkage, dendrogram
Use Case: Perform hierarchical clustering to group similar data points together.
Goal: Visualize and analyze clusters in data.
Sample Code:
# Sample data
data = [[1, 2], [3, 4], [5, 6], [8, 8]]
# Perform hierarchical clustering
linked = linkage(data, method='ward')
# Create dendrogram
dendrogram(linked)
Before Example: We have data but can't identify meaningful clusters.
Data: [1, 2], [3, 4], [5, 6], [8, 8]
After Example: With linkage() and dendrogram(), the data is grouped into meaningful clusters!
Clusters: hierarchical visualization of groups.
Challenge: Try using different clustering methods like single, complete, or average. Try grouping customers: a business might want to cluster customers who have similar buying habits. Try organizing data: scientists might use clustering to group similar species or chemicals.
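A sketch of the challenge on the same toy data; fcluster() cuts the hierarchy into flat cluster labels, and 'complete' is just one of the alternative linkage methods mentioned above.
from scipy.cluster.hierarchy import linkage, fcluster

data = [[1, 2], [3, 4], [5, 6], [8, 8]]

linked = linkage(data, method='complete')
labels = fcluster(linked, t=2, criterion='maxclust')  # ask for at most 2 flat clusters
print(labels)  # cluster label for each data point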
14. Statistics Test (scipy.stats.ttest_ind)
Think of a t-test like comparing the average scores of two teams after a game. You want to know if one team really played better or if the difference in scores was just by chance.
Group1 vs. Group2: Imagine you have two teams, and youโre comparing their average scores after a match.
T-test: This is like asking, "Did one team consistently score higher, or is the difference just random?"
The t-test checks if the difference in averages between two groups is big enough to say, "Yes, this team really performed better!" rather than just getting lucky in a few rounds.
- P-value: If the p-value is small (like less than 0.05), it means the difference is likely real. If it's bigger, the difference could just be by chance.
Why is this useful?
Compare treatments: In medicine, you might compare two treatments to see if one works better.
Test results: In education, you might compare test scores from two different classes to see if one teaching method is better.
So, a t-test helps you figure out if two groups are truly different or if it's just random chance!
Boilerplate Code:
from scipy.stats import ttest_ind
Use Case: Perform a t-test to check if the means of two samples are significantly different.
Goal: Test whether the difference between two groups is statistically significant.
Sample Code:
# Sample data
group1 = [2.1, 2.5, 2.8, 3.2]
group2 = [3.1, 3.3, 3.6, 3.8]
# Perform the t-test
stat, p_value = ttest_ind(group1, group2)
print(p_value)
Before Example: We need to compare two groups but don't know if the difference is significant.
Data: Group1 = [2.1, 2.5, 2.8, 3.2], Group2 = [3.1, 3.3, 3.6, 3.8]
After Example: With ttest_ind(), we can determine if the difference is statistically significant!
P-Value: about 0.03 (a significant difference at the 0.05 level)
Challenge: Try running other tests like paired t-tests (ttest_rel()) or non-parametric tests (mannwhitneyu()).
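A brief sketch of the challenge, reusing the same numbers as if they were before/after measurements on the same subjects (only for illustration):
from scipy.stats import ttest_rel, mannwhitneyu

before = [2.1, 2.5, 2.8, 3.2]
after = [3.1, 3.3, 3.6, 3.8]

# Paired t-test: the same subjects measured twice
stat_paired, p_paired = ttest_rel(before, after)

# Mann-Whitney U: a non-parametric alternative to the independent t-test
stat_mw, p_mw = mannwhitneyu(before, after)
print(p_paired, p_mw)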
15. Cubic Spline Interpolation (scipy.interpolate.CubicSpline)
Boilerplate Code:
from scipy.interpolate import CubicSpline
Use Case: Perform cubic spline interpolation to create a smooth curve through data points.
Goal: Fit a smooth curve between data points with cubic splines.
Sample Code:
# Known data points
x = [0, 1, 2, 3]
y = [0, 2, 1, 3]
# Create the cubic spline interpolation function
cs = CubicSpline(x, y)
# Interpolate new values
new_values = cs([0.5, 1.5, 2.5])
print(new_values)
Before Example: We have data but need a smooth curve through the points.
Known Points: x = [0, 1, 2, 3], y = [0, 2, 1, 3]
After Example: With CubicSpline(), we create a smooth interpolating curve!
Interpolated Values: smooth estimates between points.
Challenge: Try plotting the spline function along with the original data points for visualization.
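A minimal plotting sketch for the challenge (assumes matplotlib is installed):
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import CubicSpline

x = [0, 1, 2, 3]
y = [0, 2, 1, 3]
cs = CubicSpline(x, y)

xs = np.linspace(0, 3, 100)
plt.plot(xs, cs(xs), label='cubic spline')
plt.plot(x, y, 'o', label='data points')
plt.legend()
plt.show()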
16. Signal Convolution (scipy.signal.convolve)
Convolution means combining/blending two things to create a new result.
In image processing, convolution is used to apply filters to images, like sharpening or blurring a photo. It's like "mixing" the image data with a filter to get a new result.
Why use convolution?
Signal processing: Combine two signals to apply filters, smooth out noise, or modify data.
Data filtering: Helps clean up or modify data, like removing noise from audio or improving the clarity of an image.
Boilerplate Code:
from scipy.signal import convolve
Use Case: Perform convolution of two signals to combine them.
Goal: Convolve signals to filter or modify them for various purposes.
Sample Code:
# Sample signals
signal1 = [1, 2, 3]
signal2 = [0, 1, 0.5]
# Perform convolution
convolved_signal = convolve(signal1, signal2)
print(convolved_signal)
Before Example: We have two signals but don't know how to combine them through convolution.
Signals: [1, 2, 3], [0, 1, 0.5]
After Example: With convolve(), the signals are combined through convolution!
Convolved Signal: [0, 1, 2.5, 4, 1.5]
Challenge: Try convolving different signals and analyzing the result, or apply it to image filtering.
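One way to try the challenge: smooth a made-up noisy signal with a 3-point moving-average kernel (the kernel and noise level are arbitrary choices):
import numpy as np
from scipy.signal import convolve

rng = np.random.default_rng(0)
noisy = np.sin(np.linspace(0, 10, 50)) + 0.3 * rng.normal(size=50)
kernel = np.ones(3) / 3  # 3-point moving average

smoothed = convolve(noisy, kernel, mode='same')  # 'same' keeps the input length
print(smoothed[:5])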
17. Gaussian KDE (Kernel Density Estimation)
When to Use KDE:
Small datasets: If you have limited data, histograms might be too rough or misleading because each bar might fluctuate too much. KDE gives a better sense of the true data distribution.
Avoiding bin choice issues: Histograms depend heavily on how you choose the bins (the bar widths). A wrong bin size can either hide or exaggerate patterns in the data. KDE avoids this issue by smoothing things out automatically.
Smooth trends: In fields like biology, economics, or finance, you often want to see gradual trends in data, rather than sharp jumps. KDE is great for spotting these trends.
Probability Density Estimation: When you want to estimate the likelihood of data points within certain ranges, KDE gives you a smoother probability curve, which can be useful in statistics and machine learning.
Example:
Stock market data: If you're analyzing stock prices over time, a smooth KDE curve can help you identify overall trends (like when prices cluster around certain values), rather than seeing jagged, short-term fluctuations.
Income distribution: In economics, you might want to know how incomes are spread across a population. KDE provides a smoother estimate of how common different income levels are, without the rigid cutoff of histogram bins.
In short, you use KDE when you need a clearer picture of underlying patterns and want to avoid the rigid, blocky nature of histograms.
Boilerplate Code:
from scipy.stats import gaussian_kde
Use Case: Perform Kernel Density Estimation (KDE) to estimate the probability density function of a dataset.
Goal: Smoothly estimate the distribution of your data.
Sample Code:
# Sample data
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
# Perform KDE
kde = gaussian_kde(data)
density = kde.evaluate([2, 3, 4])
print(density)
Before Example: We need to estimate the underlying probability distribution of the data.
Data: [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
After Example: With gaussian_kde(), we can estimate the probability density function!
Density: estimated at the points [2, 3, 4]
Challenge: Try plotting the KDE along with a histogram of the original data.
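A minimal plotting sketch for the challenge (assumes matplotlib is installed; the bin count is arbitrary):
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
kde = gaussian_kde(data)

xs = np.linspace(0, 5, 200)
plt.hist(data, bins=4, density=True, alpha=0.4, label='histogram')
plt.plot(xs, kde(xs), label='KDE')
plt.legend()
plt.show()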
18. Matrix Decompositions (scipy.linalg.svd)
Matrix decomposition, like Singular Value Decomposition (SVD), is kind of like taking apart a LEGO structure so you can understand its individual pieces.
In real life, SVD helps you:
Simplify data: SVD can reduce the size of big data without losing much important information (like simplifying the LEGO structure but keeping the main parts).
Image compression: It can break down an image into parts and help reduce file size by focusing on the most important details.
Recommendation systems: In movie recommendation systems (like Netflix), SVD helps find patterns between users and movies by simplifying big data matrices into manageable pieces.
In short, SVD breaks down a complex matrix into smaller, understandable pieces, just like breaking down a LEGO model to see how it's built!
Boilerplate Code:
from scipy.linalg import svd
Use Case: Perform Singular Value Decomposition (SVD) to decompose a matrix into its components.
Goal: Break down a matrix into singular values and vectors.
Sample Code:
# Define a matrix
matrix = [[1, 2], [3, 4]]
# Perform SVD
U, s, Vh = svd(matrix)
print(U, s, Vh)
Before Example: We have a matrix but need to decompose it for analysis.
Matrix: [[1, 2], [3, 4]]
After Example: With SVD, the matrix is decomposed into its singular values and vectors!
Decomposed: U, s, Vh matrices
Challenge: Try reconstructing the original matrix from the SVD components.
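A short sketch of the challenge: multiplying the factors back together should reproduce the original matrix.
import numpy as np
from scipy.linalg import svd

matrix = np.array([[1, 2], [3, 4]])
U, s, Vh = svd(matrix)

# Rebuild the matrix as U @ diag(s) @ Vh
reconstructed = U @ np.diag(s) @ Vh
print(np.allclose(reconstructed, matrix))  # True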
19. Signal Filtering (scipy.signal.butter and lfilter)
In real life, signal filtering helps:
Clean up audio: Remove unwanted noise from sound recordings.
Data smoothing: Make your data easier to analyze by removing random fluctuations (noise).
Medical signals: Filter out noise in heart rate or brain wave signals, so doctors can see clean data.
Boilerplate Code:
from scipy.signal import butter, lfilter
Use Case: Design a Butterworth filter and apply it to a signal for filtering.
Goal: Remove noise or specific frequencies from a signal.
Sample Code:
# Design a low-pass Butterworth filter
b, a = butter(N=2, Wn=0.2, btype='low')
# Apply the filter to a signal
filtered_signal = lfilter(b, a, [1, 2, 3, 4, 5])
print(filtered_signal)
Before Example: We have a noisy signal but need to filter out unwanted frequencies.
Signal: noisy, unfiltered data.
After Example: With butter() and lfilter(), the signal is filtered for smoother analysis!
Filtered Signal: noise reduced.
Challenge: Try designing high-pass, band-pass, or band-stop filters for different signal types.
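A tiny sketch of the high-pass variant of the challenge (same order and cutoff as the example above; only btype changes):
from scipy.signal import butter, lfilter

# High-pass Butterworth filter: keeps frequencies above the cutoff
b, a = butter(N=2, Wn=0.2, btype='high')
filtered = lfilter(b, a, [1, 2, 3, 4, 5])
print(filtered)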
20. Principal Component Analysis (PCA with scipy.linalg)
Boilerplate Code:
from scipy.linalg import eigh
Use Case: Perform Principal Component Analysis (PCA) to reduce the dimensionality of a dataset.
Goal: Extract the most important components of your data.
Sample Code:
# Sample covariance matrix
cov_matrix = [[2.9, 0.8], [0.8, 0.6]]
# Perform PCA using eigen decomposition
eigenvalues, eigenvectors = eigh(cov_matrix)
print(eigenvalues, eigenvectors)
Before Example: We have high-dimensional data and need to reduce it for simpler analysis.
Data: high-dimensional, complex
After Example: With PCA, the data is reduced to its most important components!
Principal Components: eigenvectors extracted.
Challenge: Try applying PCA to a real-world dataset and plot the resulting principal components.
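A sketch of the full PCA workflow on a tiny made-up dataset: center the data, build the covariance matrix, eigendecompose it, and project onto the leading eigenvector.
import numpy as np
from scipy.linalg import eigh

# Made-up dataset: rows are samples, columns are features
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = eigh(cov)

# eigh returns eigenvalues in ascending order, so the last column is the first principal component
pc1 = eigenvectors[:, -1]
projected = centered @ pc1  # 1-D coordinate of each sample along PC1
print(projected)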