Day 13: Mastering Support Vector Machines (SVM)

Saket Khopkar

Let’s say you’re a bouncer at a nightclub. People arrive, and based on their outfit and age, you decide if they belong in the “VIP” area or the “General” crowd.

Now, you want to draw a line (mentally or on paper) that clearly separates these two groups.

SVM helps you find the best possible boundary, not just any line but the optimal one, such that:

  • It’s as far away as possible from both groups.

  • It’s robust, meaning small shifts or noise in data won’t easily mess it up.

This “best boundary” is called the maximum margin hyperplane.

Support Vector Machine (SVM) is a powerful supervised learning algorithm used for classification and sometimes regression problems.


Things you should know

Well, before learning this algorithm in much detail, there are a few key terms you should be well aware of.

  • Hyperplane: A line (2D), plane (3D), or n-dimensional surface that divides the space into classes.

  • Support Vectors: Data points that lie closest to the decision boundary (they “support” the boundary).

  • Margin: The distance from the hyperplane to the nearest data point on either side.

  • Maximal Margin: The widest possible distance between the boundary and the closest data points from each class.

[Figure: SVM hyperplane, margin, and support vectors (source: Tpoint Tech)]

A hyperplane is just a fancy word for a line (in 2D), a plane (in 3D), or a surface that separates two different classes.

Support Vectors are the closest data points to the hyperplane from each class. The SVM algorithm only cares about these points when deciding where the boundary should be drawn.

The margin is the distance between the hyperplane and the closest support vectors. SVM tries to maximize this margin, because a wider margin means better generalization (more confident predictions).

Keep in mind that the margin is quite important here, just as k was important for the KNN algorithm.

  • A large margin = better generalization to unseen data.

  • If the boundary is too close to data points, the model becomes sensitive (overfits).

So, SVM is all about finding the maximum margin classifier.
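
For the mathematically inclined, here is the usual textbook way to write this down (this formulation is my addition, not part of the original prose): the hyperplane is w·x + b = 0, the margin works out to 2/||w||, and the hard-margin SVM solves

\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad y_i \left( w^{\top} x_i + b \right) \ge 1 \quad \text{for all } i

Minimizing ||w|| is the same as maximizing the margin, which is exactly the “maximum margin classifier” idea above.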


When your data is not separable in 2D:

SVM transforms it into a higher dimension (like 3D) where it becomes separable, and then finds a boundary there.

Analogy: You can't separate intertwined spaghetti on a plate (2D), but lift them up with a fork (3D) and they’re easily separable.

Wondering what I am talking about here? This is called the kernel trick, which can also be described as SVM’s secret weapon. But why is this secret weapon even needed?

Sometimes, data is not separable in its current dimensions, like this:

Example: Imagine a circular dataset:

  • Inside the circle = Class A

  • Outside the circle = Class B

A straight line won’t work in this case.

So, we transform the data to a higher dimension where a linear separator does exist.
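
To make this lifting idea concrete, here is a minimal sketch of my own (not from the original code in this post) using scikit-learn’s make_circles, which also appears later in this article. Adding a third feature z = x1² + x2² (the squared distance from the origin) makes the circular data linearly separable:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Circular toy data: one class inside the circle, one outside
X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)

# Manually "lift" to 3D by adding z = x1^2 + x2^2
z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)
X_lifted = np.hstack([X, z])

# A plain linear SVM struggles in 2D but separates the lifted data easily
print("2D linear SVM:", SVC(kernel='linear').fit(X, y).score(X, y))
print("3D (lifted) linear SVM:", SVC(kernel='linear').fit(X_lifted, y).score(X_lifted, y))

Hand-crafting the right extra feature was easy for this toy example, but in general you won’t know which transformation to apply.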

This is where the kernel trick comes in.

  • Polynomial Kernel: Adds polynomial terms to data

  • RBF Kernel (Gaussian): Projects into infinite dimensions (very flexible)
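
In scikit-learn, choosing between these kernels is just a constructor argument. A quick sketch (the degree, gamma, and C values below are illustrative, not tuned choices):

from sklearn.svm import SVC

poly_svm = SVC(kernel='poly', degree=3, C=1.0)     # polynomial kernel of degree 3
rbf_svm = SVC(kernel='rbf', gamma='scale', C=1.0)  # Gaussian (RBF) kernel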

We will see these in action during the code-level practicals ahead.


Coders Assemble!

Enough theoretical knowledge; time for practicals.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load iris dataset (using only 2 classes for binary classification)
iris = datasets.load_iris()
X = iris.data[:100, :2]  # Take first 2 features for 2D plot
y = iris.target[:100]    # 0 or 1

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train the model
model = SVC(kernel='linear', C=1.0)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Plotting
def plot_decision_boundary(X, y, model):
    plt.figure(figsize=(8,6))
    sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, palette='Set1')

    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()

    # Create grid
    xx = np.linspace(xlim[0], xlim[1], 30)
    yy = np.linspace(ylim[0], ylim[1], 30)
    YY, XX = np.meshgrid(yy, xx)
    xy = np.vstack([XX.ravel(), YY.ravel()]).T

    Z = model.decision_function(xy).reshape(XX.shape)

    # Plot decision boundary and margins
    ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1],
               alpha=0.5, linestyles=['--', '-', '--'])

    # Support vectors
    ax.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
               s=100, linewidth=1, facecolors='none', edgecolors='k')
    plt.title("SVM Decision Boundary with Support Vectors")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()

plot_decision_boundary(X, y, model)

This model achieved 100% accuracy, with the decision boundary placed at a comfortable distance from both classes. The circled dots represent the support vectors. These are the points closest to the decision boundary, and they are the only ones that influence where the boundary is drawn. If you remove other points far from the boundary, the decision line won’t change. But if you remove a support vector, it might change drastically.

The wider the margin between the two dashed lines in the picture above, the better our model can generalize to unseen data. Think of this as a neutral zone. The neutral zone is kept as wide as possible to reduce future conflicts (errors on new data).

The thick black line in the middle is what separates the two classes; it is called the hyperplane. As the plot shows, any point falling on one side of the line is classified as Class 0, and any point on the other side as Class 1.
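
Because this first model uses a linear kernel, you can even read the hyperplane directly off the fitted object. A small sketch of my own, reusing the model and X_test variables from the snippet above:

# For a linear kernel, the learned hyperplane is w . x + b = 0
w = model.coef_[0]         # weight vector, one entry per feature
b = model.intercept_[0]    # bias term
print(f"Hyperplane: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")

# The sign of the decision function tells you which side of the line a point is on
print(model.decision_function(X_test[:5]))  # positive -> class 1 side, negative -> class 0 side
print(model.predict(X_test[:5]))            # matches the signs above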

But things aren’t always simple, are they? 😈

Sometimes, data is not linearly separable in its current form (i.e., you can't draw a straight line to separate classes). Instead of manually transforming the features to higher dimensions, SVM uses kernels to do this mathematically and implicitly.

What I am trying to convey above will become clear in the code examples below.

We will generate a dataset in which points are plotted as 2 concentric circles.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import seaborn as sns

# Generate synthetic non-linear dataset (circle within circle)
X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)

# Split into training/testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Plot data to visualize
plt.figure(figsize=(6,6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr', edgecolor='k')
plt.title("Circular Non-Linear Data")
plt.xlabel("X1")
plt.ylabel("X2")
plt.show()

Now, give me a moment to prove my point.

We cannot simply draw a straight decision line between them. If we try to do so, the accuracy will be poor. Let’s have a look at the code example below.

# Linear SVM Fails here
# Train Linear SVM
linear_model = SVC(kernel='linear')
linear_model.fit(X_train, y_train)

# Accuracy (you’ll see it’s low)
y_pred_linear = linear_model.predict(X_test)
print("Linear SVM Accuracy:", accuracy_score(y_test, y_pred_linear)

I got this result: Linear SVM Accuracy: 0.5555555555555556

Additionally, if we plot the straight-line boundary the linear SVM learns, we get this:

# Plotting decision boundary
def plot_decision_boundary(X, y, model, title):
    h = .02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.figure(figsize=(6,6))
    plt.contourf(xx, yy, Z, cmap='bwr', alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr', edgecolor='k')
    plt.title(title)
    plt.xlabel("X1")
    plt.ylabel("X2")
    plt.show()

plot_decision_boundary(X, y, linear_model, "Decision Boundary (Linear SVM)")

Now let’s employ the RBF kernel. The key idea behind it is:

💡
“Don’t reshape the data manually. Let the math behind kernels do it for you.”

How? Let’s see.

# RBF Kernel
# Train SVM with RBF Kernel
rbf_model = SVC(kernel='rbf', gamma='auto')
rbf_model.fit(X_train, y_train)

# Accuracy (should be much better!)
y_pred_rbf = rbf_model.predict(X_test)
print("RBF Kernel SVM Accuracy:", accuracy_score(y_test, y_pred_rbf))

Guess the result I got: RBF Kernel SVM Accuracy: 1.0

100% accuracy! Wondering what the plot looks like?

plot_decision_boundary(X, y, rbf_model, "Decision Boundary (RBF Kernel SVM)")

Now you’ll see a circular decision boundary, which cleanly separates the inner and outer circle; this is the power of the kernel trick in action!
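
If you want to poke at this yourself, here is a small sketch (with a few gamma values of my own choosing, purely for illustration) that shows how the shape of the RBF boundary changes; it reuses X_train, y_train, X, y, and the plot_decision_boundary helper defined above:

# Retrain with different gamma values and compare the boundaries
for g in [0.1, 1, 10]:
    m = SVC(kernel='rbf', gamma=g)
    m.fit(X_train, y_train)
    plot_decision_boundary(X, y, m, f"Decision Boundary (RBF, gamma={g})")

A small gamma gives a smoother, broader boundary; a large gamma hugs individual points more tightly and can overfit.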

Summarizing the code above:

  • Linear SVM: cannot handle circular data; low accuracy; straight-line boundary (fails).

  • RBF Kernel SVM: handles circular data; high accuracy; curved boundary (succeeds).

The RBF kernel lifted the circular data into a higher dimension, where the SVM could draw a straight boundary; but that boundary, when mapped back to 2D, becomes a circle.
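
Under the hood, the RBF kernel is nothing more than a similarity function between pairs of points, K(x, x') = exp(-gamma * ||x - x'||²). A quick sanity check of my own (with an arbitrary gamma of 0.5) comparing a hand-computed value against scikit-learn’s rbf_kernel:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

a = np.array([[0.0, 1.0]])
b = np.array([[1.0, 0.5]])
gamma = 0.5  # arbitrary value, purely for illustration

manual = np.exp(-gamma * np.sum((a - b) ** 2))   # exp(-0.5 * 1.25), roughly 0.535
library = rbf_kernel(a, b, gamma=gamma)[0, 0]
print(manual, library)  # the two values agree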


Conclusion for the day

Take notes peeps!!

  • What SVM is: A powerful supervised machine learning algorithm for classification and regression, especially useful for high-dimensional spaces. Think of SVM like a tightrope walker trying to balance between two buildings (classes). The support ropes (support vectors) hold the rope taut; if you move them, the rope (decision boundary) shifts. And kernels? They build a staircase if you can't balance in flat terrain!

  • Hyperplane & Decision Boundary: SVM finds the best line (or hyperplane) that separates classes with the maximum margin.

  • Support Vectors: These are the data points closest to the decision boundary, and they are crucial — they “support” the margin.

  • Kernel Trick: Transforms data into higher dimensions where it becomes linearly separable (like turning a circle into a line!)

  • Code Implementation: You saw how to create SVM models, visualize decision boundaries, and even explore non-linear classification with kernels.


You may try this algorithm on real-life datasets to explore various possibilities. Try tweaking the values a bit; the more you play, the more acquainted you get with the concepts. Let’s close the day here.

Happy Coding!! Ciao!
