t-Distributed Stochastic Neighbor Embedding (t-SNE) – Visualizing High-Dimensional Data

Tushar Pant
4 min read

Introduction

High-dimensional datasets are common in machine learning and data science, but they cannot be inspected visually beyond three dimensions. t-Distributed Stochastic Neighbor Embedding (t-SNE) is a powerful non-linear dimensionality reduction technique that is particularly useful for visualizing high-dimensional data in 2D or 3D space. It preserves local structure, making it ideal for exploring clusters and patterns.

Why Use t-SNE?

  • Visualization: Projects high-dimensional data into 2D or 3D for intuitive visualization.

  • Non-linear Relationships: Captures complex, non-linear patterns.

  • Clustering: Reveals hidden clusters and groupings in data.

  • Data Exploration: Ideal for exploratory data analysis (EDA).


1. What is t-SNE?

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique developed by Laurens van der Maaten and Geoffrey Hinton. It is widely used for visualizing high-dimensional data by converting similarities between data points into probabilities and minimizing the Kullback-Leibler divergence between joint probabilities in high-dimensional and low-dimensional spaces.

1.1 Key Characteristics of t-SNE:

  • Non-linear Transformation: Captures complex, non-linear patterns in data.

  • Local Structure Preservation: Retains local similarities and neighborhood relationships.

  • Global Structure Compromise: Global distances are not preserved accurately.

  • Visualization-Friendly: Projects data into 2D or 3D for intuitive visualization.

1.2 When to Use t-SNE?

  • When you need to visualize high-dimensional data in 2D or 3D.

  • When exploring cluster structures or groupings in data.

  • For exploratory data analysis (EDA) to gain insights into hidden patterns.

  • When linear methods like PCA fail to capture complex, non-linear relationships.


2. How t-SNE Works

t-SNE consists of the following steps:

Step 1: Compute Pairwise Similarities in High-Dimensional Space

  • Compute pairwise similarities between points in high-dimensional space.

  • Use a Gaussian distribution to measure similarity:

$$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}$$

Where:

  • p_{j|i} = Conditional probability that point x_i would pick x_j as its neighbor.

  • σ_i = Variance of the Gaussian centered at x_i, set by the chosen perplexity.

The conditional probabilities are then symmetrized into joint probabilities p_{ij} = (p_{j|i} + p_{i|j}) / 2n.

Step 2: Compute Pairwise Similarities in Low-Dimensional Space

  • Initialize the low-dimensional counterparts y_i randomly.

  • Compute similarities using a Student-t distribution with 1 degree of freedom (heavy tails):

$$q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}}$$

Where:

  • q_{ij} = Joint probability that points y_i and y_j are neighbors in the low-dimensional space.

  • The heavy tails keep moderately distant points from crowding into the center of the map.

Step 3: Minimize Kullback-Leibler (KL) Divergence

  • Minimize the KL divergence between the high-dimensional and low-dimensional distributions:

$$C = KL(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

  • This cost is minimized using gradient descent.

Step 4: Update Low-Dimensional Embeddings

  • Iteratively update the low-dimensional points using the gradient of the KL divergence; a toy implementation of the full loop is sketched below.
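The gradient of the KL divergence has a simple closed form:

$$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \|y_i - y_j\|^2\right)^{-1}$$

To make Steps 1–4 concrete, here is a minimal NumPy sketch. It is illustrative, not production code: it assumes a single fixed σ for all points instead of the per-point perplexity search, and it omits tricks such as early exaggeration and momentum.

import numpy as np

def tsne_toy(X, n_components=2, sigma=1.0, learning_rate=100.0, n_iter=500, seed=42):
    # Toy t-SNE: fixed sigma (no perplexity search), plain gradient descent
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Step 1: Gaussian affinities in the high-dimensional space
    D = np.square(X[:, None, :] - X[None, :, :]).sum(-1)   # pairwise squared distances
    P = np.exp(-D / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)                      # conditional p_{j|i}
    P = (P + P.T) / (2 * n)                                # symmetrize to joint p_{ij}
    # Step 2: random initialization of the low-dimensional map
    Y = rng.normal(scale=1e-2, size=(n, n_components))
    for _ in range(n_iter):
        Dy = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
        W = 1.0 / (1.0 + Dy)                               # Student-t kernel
        np.fill_diagonal(W, 0.0)
        Q = W / W.sum()                                    # joint q_{ij}
        # Steps 3-4: gradient of KL(P || Q) and gradient descent update
        G = 4 * ((P - Q) * W)[:, :, None] * (Y[:, None, :] - Y[None, :, :])
        Y -= learning_rate * G.sum(axis=1)
    return Y

Production implementations such as scikit-learn's TSNE additionally binary-search each σ_i to match the requested perplexity and use early exaggeration, momentum, and (for speed) the Barnes-Hut approximation.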

3. Mathematical Concepts Behind t-SNE

3.1 Perplexity and σ Selection

  • Perplexity controls the balance between local and global aspects of data.

  • It determines the variance σ_i of the Gaussian centered at each point, as formalized below.
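Concretely, each σ_i is found by binary search so that the conditional distribution P_i it induces has the user-specified perplexity:

$$\mathrm{Perp}(P_i) = 2^{H(P_i)}, \qquad H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}$$

where H(P_i) is the Shannon entropy of P_i. Perplexity can therefore be read as a smooth measure of the effective number of neighbors each point considers.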

3.2 Heavy-Tailed Student-t Distribution

  • The Student-t distribution with 1 degree of freedom is used in the low-dimensional space to:

    • Prevent the crowding problem, where moderately distant points would otherwise collapse together in the center of the map.

    • Maintain separation between distant points.
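The difference between the two kernels is easy to see by plotting them; a small sketch (σ = 1 chosen for illustration):

import numpy as np
import matplotlib.pyplot as plt

d = np.linspace(0, 5, 200)            # pairwise distance
gauss = np.exp(-d**2 / 2)             # Gaussian kernel (sigma = 1)
student = 1 / (1 + d**2)              # Student-t kernel, 1 degree of freedom

# The Student-t curve decays much more slowly, so moderately distant
# points keep non-negligible similarity and get pushed apart in the map.
plt.plot(d, gauss, label='Gaussian')
plt.plot(d, student, label='Student-t (1 dof)')
plt.xlabel('distance')
plt.ylabel('similarity')
plt.legend()
plt.show()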

3.3 KL Divergence as Cost Function

  • Measures the difference between high-dimensional and low-dimensional distributions.

  • Asymmetric: Representing nearby points (large p_ij) with distant map points (small q_ij) incurs a heavy penalty, while the reverse is cheap, so the optimization prioritizes local structure.
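A small numeric comparison of the per-pair cost term p_ij log(p_ij / q_ij) makes the asymmetry concrete:

$$0.8\,\ln\frac{0.8}{0.1} \approx 1.66 \qquad \text{vs.} \qquad 0.1\,\ln\frac{0.1}{0.8} \approx -0.21$$

Mapping close neighbors far apart contributes a large positive cost, while mapping distant points close together barely changes the total.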


4. Key Parameters in t-SNE

  • Perplexity: Controls the effective number of neighbors. Typical values lie between 5 and 50; scikit-learn's default is 30.

  • Learning Rate: Affects convergence. Too low = slow; Too high = divergence.

  • Iterations: Number of gradient descent iterations.

  • Number of Components: Usually 2 or 3 for visualization.
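To build intuition for these parameters, it helps to sweep perplexity on a familiar dataset and compare the resulting maps; a minimal sketch with scikit-learn (the dataset and perplexity values are illustrative choices):

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# Low perplexity emphasizes very local structure; high perplexity
# yields a more diffuse layout reflecting broader relationships.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, perp in zip(axes, [5, 30, 100]):
    emb = TSNE(n_components=2, perplexity=perp, random_state=42).fit_transform(X)
    ax.scatter(emb[:, 0], emb[:, 1], c=y, cmap='tab10', s=5)
    ax.set_title(f'perplexity = {perp}')
plt.show()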


5. Advantages and Disadvantages

5.1 Advantages:

  • Visualizes Complex Data: Captures non-linear relationships.

  • Local Structure Preservation: Maintains neighborhood relationships.

  • Cluster Discovery: Excellent for identifying hidden clusters.

5.2 Disadvantages:

  • Computationally Expensive: The exact algorithm scales quadratically with the number of samples, making it slow for large datasets.

  • Non-deterministic Results: Different runs may yield different outputs.

  • No Interpretability: Output dimensions have no direct interpretation.

  • Global Structure Loss: Does not preserve global distances.


6. t-SNE vs PCA

Feature          | t-SNE                     | PCA
-----------------|---------------------------|----------------------------
Type             | Non-linear                | Linear
Local structure  | Preserved                 | Not preserved
Global structure | Compromised               | Preserved
Scalability      | Low (slow for large data) | High (fast for large data)
Interpretability | Low                       | High (linear combinations)
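A quick way to see these trade-offs is to embed the same dataset with both methods side by side; a minimal sketch along the lines of the implementation in the next section:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_iris(return_X_y=True)

# PCA: linear projection, fast, interpretable axes.
# t-SNE: non-linear map that emphasizes local neighborhoods.
embeddings = {
    'PCA': PCA(n_components=2).fit_transform(X),
    't-SNE': TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X),
}
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, (name, emb) in zip(axes, embeddings.items()):
    ax.scatter(emb[:, 0], emb[:, 1], c=y, cmap='viridis', s=30)
    ax.set_title(name)
plt.show()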

7. Implementation of t-SNE in Python

# Import Libraries
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Load Dataset
data = load_iris()
X = data.data
y = data.target

# Standardize the Data
X_std = StandardScaler().fit_transform(X)

# Apply t-SNE (perplexity must be smaller than the number of samples)
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X_std)

# Plot the Results
plt.figure(figsize=(8, 6))
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap='viridis', edgecolor='k', s=100)
plt.title('t-SNE - Iris Dataset')
plt.xlabel('t-SNE Component 1')
plt.ylabel('t-SNE Component 2')
plt.grid(True)
plt.show()


8. Real-World Applications

  • Image Recognition: Visualizing high-dimensional image features.

  • Natural Language Processing (NLP): Word embeddings visualization.

  • Genomics: Identifying gene expression clusters.

  • Anomaly Detection: Revealing outliers in complex datasets.


9. Conclusion

t-SNE is a powerful non-linear dimensionality reduction technique that effectively visualizes high-dimensional data by preserving local neighborhood structures. It is widely used for data exploration and cluster discovery. However, t-SNE is computationally expensive and lacks interpretability, making it more suitable for visualization rather than downstream machine learning tasks.
