Data Visualization in Python: A Beginner's Guide

📌 Introduction

Data visualization is one of the most essential skills in data science, helping us to analyze and interpret complex datasets effectively. Python provides several powerful libraries for creating stunning and informative visualizations.

In this blog, you'll learn:

✅ Why data visualization is important
✅ Top Python libraries for visualization
✅ Hands-on examples with Matplotlib, Seaborn, and Plotly

Let's get started! 🚀


1. Univariate Plots

A univariate plot is a type of graph that helps us understand one variable at a time. "Uni" means one, so we only analyze a single feature.

Examples of univariate plots:

  • Histogram - Shows how data is distributed.

  • Box Plot - Shows the median, range, and outliers.

  • Line Chart - Displays how a single variable changes over time.

For example, if we analyze students' heights, we can use a histogram to see how many students fall into different height ranges.
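
As a rough sketch of what a box plot summarizes, the snippet below computes the quartiles, the median, and the outliers for a small list of heights. The heights are made-up sample values, box_plot_summary is a hypothetical helper name, and the 1.5 x IQR cutoff is a common convention for flagging outliers rather than something this guide prescribes.

```python
import statistics

# Hypothetical sample: student heights in centimetres
heights = [150, 152, 155, 158, 160, 162, 165, 168, 170, 195]

def box_plot_summary(values):
    """Return the numbers a box plot is typically drawn from."""
    # Quartiles using linear interpolation between the sorted data points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_cut = q1 - 1.5 * iqr    # values below this are flagged as outliers
    high_cut = q3 + 1.5 * iqr   # values above this are flagged as outliers
    outliers = [v for v in values if v < low_cut or v > high_cut]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

summary = box_plot_summary(heights)
print(summary)
```

With this sample, only the 195 cm value falls outside the cutoffs, so it is the one point a box plot would draw separately from the box and whiskers.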

2. Multivariate Plots

A multivariate plot is used when we want to study relationships between two or more variables.

Examples of multivariate plots:

  • Scatter Plot - Shows the relationship between two variables.

  • Heatmap - Shows correlations using color intensity.

  • Pair Plot - Displays relationships between multiple variables in one view.

For example, if we analyze the relationship between students' height and weight, a scatter plot can show whether taller students tend to be heavier.
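
A scatter plot shows this relationship visually; the Pearson correlation coefficient puts a number on it. The sketch below is a minimal pure-Python version, assuming made-up height and weight values; pearson_r is an illustrative helper, not a library function.

```python
import math

# Hypothetical sample: heights (cm) and weights (kg) for six students
heights = [150, 155, 160, 165, 170, 175]
weights = [45, 50, 54, 61, 66, 72]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

A value close to +1 means taller students in the sample tend to be heavier, a value close to -1 means the opposite, and a value near 0 means there is no clear linear relationship.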

3. Training Data & Test Data

When we build a machine learning model, we divide the data into two parts:

  • Training Data - This is used to train the model so it can learn patterns.

  • Test Data - This is used to check how well the model performs on unseen data.

For example, if we train a model to predict student grades based on their study hours, we use training data for learning and test data to check accuracy.
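
The split itself can be sketched in a few lines of plain Python. This is a minimal illustration, assuming made-up (study_hours, grade) records and an 80/20 split; the helper below is hand-written for the example.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records: (study_hours, grade)
records = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70),
           (6, 73), (7, 78), (8, 84), (9, 88), (10, 93)]

train, test = train_test_split(records, test_fraction=0.2)
print(len(train), "training rows,", len(test), "test rows")
```

Shuffling before splitting matters: if the records are sorted (as they are here), taking the last rows as test data would test the model only on the highest study hours. In practice, scikit-learn's train_test_split function in sklearn.model_selection is commonly used for this step.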

4. Performance Measures

After training a model, we need to measure how good it is. Some common performance measures are:

  • Accuracy - The percentage of correct predictions.

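  • Precision & Recall - Used in classification problems. Precision is the share of predicted positives that are actually positive; recall is the share of actual positives that the model found.

  • Mean Squared Error (MSE) - Used in regression. It is the average of the squared differences between predicted and actual values.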

For example, if our model predicts student grades as numbers, we can compare its predictions with the actual grades and compute the MSE. If it predicts only pass or fail, we can measure accuracy instead.
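
These measures are simple enough to compute by hand. The sketch below assumes made-up labels (1 = pass, 0 = fail) and made-up grade predictions; the function names are illustrative.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def precision_recall(actual, predicted, positive=1):
    """Precision and recall for the class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical classification labels: 1 = pass, 0 = fail
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc = accuracy(y_true, y_pred)
prec, rec = precision_recall(y_true, y_pred)
print(acc, prec, rec)

# Hypothetical regression targets: actual vs. predicted grades
mse = mean_squared_error([70, 80, 90], [72, 78, 93])
print(mse)
```

Here the model gets 6 of 8 labels right (accuracy 0.75), with one false positive and one false negative, so precision and recall are both 0.75.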


1️⃣ Why is Data Visualization Important?

🔹 Helps in understanding trends and patterns in data.
🔹 Makes it easier to communicate insights.
🔹 Helps in decision-making based on data-driven analysis.
🔹 Useful in machine learning for feature selection and analysis.

2️⃣ Top Python Libraries for Data Visualization

Python offers multiple libraries for visualization, each serving different purposes:

Library | Best For
Matplotlib | Basic plots and customization
Seaborn | Statistical data visualization
Plotly | Interactive and dynamic plots
Pandas Visualization | Quick plotting from DataFrames
Bokeh | High-performance interactive plots
ggplot (Plotnine) | Grammar of graphics-style plotting

3️⃣ Getting Started with Matplotlib

Matplotlib is the most fundamental visualization library in Python. It allows you to create simple static plots.

📌 Install Matplotlib

pip install matplotlib

📌 Example: Creating a Basic Line Chart

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 40]

plt.plot(x, y, marker='o', linestyle='-', color='b', label='Data')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Basic Line Chart")
plt.legend()
plt.show()

To see the chart, save the code to a file and run it: python file_name.py

4️⃣ Data Visualization with Seaborn

Seaborn is built on top of Matplotlib and provides a more aesthetically pleasing interface for statistical graphics.

📌 Install Seaborn

pip install seaborn

📌 Example: Creating a Histogram

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Sample data
data = pd.DataFrame({"values": [12, 15, 20, 25, 30, 30, 35, 40, 42, 50]})

sns.histplot(data["values"], bins=5, kde=True, color='g')
plt.title("Histogram Example by techGyan")
plt.show()

To see the histogram, save the code to a file and run it: python file_name.py

5️⃣ Interactive Visualization with Plotly

If you want interactive charts that readers can hover over, zoom, and pan, Plotly is a strong choice.

📌 Install Plotly

pip install plotly

📌 Example: Creating an Interactive Bar Chart

import plotly.express as px
import pandas as pd

# Sample Data
data = pd.DataFrame({
    "Category": ["A", "B", "C", "D"],
    "Values": [10, 25, 40, 30]
})

fig = px.bar(data, x='Category', y='Values', title="Interactive Bar Chart by techGyan", color='Category')
fig.show()

6️⃣ Best Practices for Effective Data Visualization

✔️ Choose the right chart type (bar, line, scatter, histogram, etc.).
✔️ Use labels, legends, and titles for clarity.
✔️ Keep the design simple and clean (avoid too much clutter).
✔️ Use color contrast effectively for better readability.
✔️ Make use of interactive elements when necessary.


Types of Data Visualization Charts: From Basic to Advanced

In this guide, we'll explore the different types of data visualizations and how to create them using Python libraries like Matplotlib, Seaborn, and Plotly.

Simple Charts for Data Visualization
These are the basic charts you'll use when starting with data visualization. They are easy to create, simple to understand, and help you quickly analyze your data. We use Python libraries like Matplotlib and Seaborn to make these charts.

1️⃣ Bar Charts – Comparing Categories

A bar chart is used to compare different categories using rectangular bars.

📌 When to Use

✔ Comparing sales across different products
✔ Showing population distribution by country
✔ Comparing monthly revenue trends

📌 Example Code (Matplotlib)

import matplotlib.pyplot as plt  

categories = ["A", "B", "C", "D"]  
values = [10, 25, 40, 30]  

plt.bar(categories, values, color=['blue', 'orange', 'green', 'red'])  
plt.xlabel("Categories")  
plt.ylabel("Values")  
plt.title("Bar Chart")  
plt.show()
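
In practice, the values for a bar chart often come from raw records that have to be totaled per category first. The sketch below shows one way to do that, assuming hypothetical (product, amount) sales records; totals_by_category is an illustrative helper, not a library function.

```python
from collections import defaultdict

# Hypothetical raw sales records: (product, amount)
sales = [("A", 4), ("B", 10), ("A", 6), ("C", 40), ("B", 15), ("D", 30)]

def totals_by_category(records):
    """Sum the amounts for each category, keeping first-seen order."""
    totals = defaultdict(int)
    for category, amount in records:
        totals[category] += amount
    return dict(totals)

totals = totals_by_category(sales)
categories = list(totals)        # bar labels
values = list(totals.values())   # bar heights
print(categories, values)
```

The resulting categories and values lists can be passed straight to plt.bar(categories, values) as in the example above.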

2️⃣ Line Charts – Showing Trends Over Time

A line chart is useful for displaying how values change over time.

📌 When to Use

✔ Analyzing stock market trends
✔ Tracking website traffic over months
✔ Showing temperature changes over time

📌 Example Code (Matplotlib)

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]  

plt.plot(x, y, marker='o', linestyle='-', color='blue')  
plt.xlabel("Time")  
plt.ylabel("Values")  
plt.title("Line Chart")  
plt.show()

3️⃣ Pie Charts – Displaying Proportions

A pie chart is best for representing percentage distributions.

📌 When to Use

✔ Market share of different companies
✔ Percentage of expenses in a budget
✔ Customer segmentation data

📌 Example Code (Matplotlib)

import matplotlib.pyplot as plt

# Data for the pie chart
categories = ['Category A', 'Category B', 'Category C', 'Category D']
values = [30, 25, 20, 25]

# Create the pie chart
plt.figure(figsize=(6,6))
plt.pie(values, labels=categories, autopct='%1.1f%%', colors=['blue', 'orange', 'green', 'red'])
plt.title("Pie Chart by TechGyan")

# Save as a JPG file
plt.savefig("pie_chart_techgyan.jpg", format='jpg')
plt.show()
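
The autopct='%1.1f%%' argument in the example above is an ordinary Python format string that Matplotlib applies to each slice's percentage of the total. A small sketch of the same arithmetic in plain Python, using the values from the example:

```python
values = [30, 25, 20, 25]
categories = ['Category A', 'Category B', 'Category C', 'Category D']

total = sum(values)
# Same format string as autopct: one decimal place followed by a literal % sign
labels = ["%1.1f%%" % (100 * v / total) for v in values]

for category, label in zip(categories, labels):
    print(category, label)
```

Because the percentages are computed from the total, the input values do not need to add up to 100.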

4️⃣ Scatter Plots – Showing Relationships Between Variables

A scatter plot helps visualize relationships between two numerical variables.

📌 When to Use

✔ Examining correlation between age and income
✔ Identifying trends in customer purchases
✔ Analyzing height vs. weight distribution

📌 Example Code (Matplotlib)

import matplotlib.pyplot as plt

# Data for scatter plot
x = [1, 2, 3, 4]  
y = [10, 20, 25, 30]  

# Create scatter plot
plt.scatter(x, y, color='purple')  
plt.xlabel("X Values")  
plt.ylabel("Y Values")  
plt.title("Scatter Plot")  

# Save as a JPG file
plt.savefig("scatter_plot.jpg", format='jpg')

# Show the plot
plt.show()

5️⃣ Histograms – Understanding Data Distributions

A histogram represents the distribution of numerical data by dividing it into bins.

📌 When to Use

✔ Analyzing exam scores distribution
✔ Checking income distribution in a city
✔ Visualizing age groups in a population

📌 Example Code (Matplotlib)

import matplotlib.pyplot as plt

# Data for histogram
data = [10, 20, 20, 30, 40, 40, 40, 50]  

# Create histogram
plt.figure(figsize=(6,6))
plt.hist(data, bins=4, color='gray', edgecolor='black')  
plt.xlabel("Value Ranges")  
plt.ylabel("Frequency")  
plt.title("Histogram by techgyan")   

# Save as a JPG file
plt.savefig("histogram.jpg", format='jpg')

# Show the plot
plt.show()
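
To make the idea of bins concrete, here is a minimal pure-Python sketch that counts how many values land in each of four equal-width bins, using the same data as the example above. The helper name histogram_counts is hypothetical, and the sketch assumes the data contain at least two distinct values. Each bin includes its left edge and excludes its right edge, except the last bin, which also includes the maximum; NumPy's histogram function documents the same convention.

```python
def histogram_counts(values, bins):
    """Count values in equal-width bins; the last bin includes its right edge."""
    low, high = min(values), max(values)
    width = (high - low) / bins  # assumes high > low
    counts = [0] * bins
    for v in values:
        index = int((v - low) / width)
        index = min(index, bins - 1)  # the maximum value goes in the last bin
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts

data = [10, 20, 20, 30, 40, 40, 40, 50]
edges, counts = histogram_counts(data, bins=4)
print(edges)
print(counts)
```

The counts are the bar heights of the histogram: one value in 10 to 20, two in 20 to 30, one in 30 to 40, and four in 40 to 50.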


Advanced Charts for Data Visualization

After learning basic charts, it's time to explore advanced charts! These charts help you:
✅ Dive deeper into your data
✅ Find detailed insights
✅ Visualize multiple variables
✅ Uncover hidden patterns and relationships

Advanced charts provide a more comprehensive analysis, making it easier to spot trends, correlations, and anomalies in your data. 🚀

6️⃣ Heatmaps – Visualizing Correlations

A heatmap uses color intensity to display the values of a matrix, which makes it useful for showing relationships between multiple variables.

📌 When to Use

✔ Showing correlation between stock prices
✔ Analyzing website user activity by hour
✔ Visualizing temperature variations by region

📌 Example Code (Seaborn)

import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Sample Data (4x4 Matrix)
data = np.array([[10, 20, 30, 40], 
                 [20, 30, 40, 50], 
                 [30, 40, 50, 60], 
                 [40, 50, 60, 70]])

# Create Heatmap
plt.figure(figsize=(6, 5))
sns.heatmap(data, annot=True, cmap="coolwarm", linewidths=0.5, fmt="d")

# Labels & Title
plt.title("Heatmap Example")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")

# Save as a JPG file in the current working directory
plt.savefig("heatmap_example.jpg", format="jpg", dpi=300)
plt.show()
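
The matrix behind a heatmap usually has to be built from raw records first. As a sketch of the "website user activity by hour" case, the snippet below counts page views per day and hour in plain Python. The visit log is made up, and activity_matrix is an illustrative helper, not part of Seaborn.

```python
from collections import Counter

# Hypothetical visit log: one (day_of_week, hour_of_day) pair per page view
visits = [("Mon", 9), ("Mon", 9), ("Mon", 14), ("Tue", 9),
          ("Tue", 20), ("Wed", 14), ("Wed", 14), ("Wed", 20)]

days = ["Mon", "Tue", "Wed"]   # heatmap rows
hours = [9, 14, 20]            # heatmap columns

def activity_matrix(events, row_labels, col_labels):
    """Build a rows x columns grid of event counts, suitable for a heatmap."""
    counts = Counter(events)  # missing (row, column) pairs count as 0
    return [[counts[(r, c)] for c in col_labels] for r in row_labels]

matrix = activity_matrix(visits, days, hours)
for day, row in zip(days, matrix):
    print(day, row)
```

A nested list like this can be passed to sns.heatmap in place of the NumPy array in the example above, with xticklabels=hours and yticklabels=days to label the axes.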


🎯 Conclusion

Different types of data visualizations serve different purposes. Here's a quick summary:

Visualization Type | Best For
Bar Chart | Comparing categories
Line Chart | Showing trends over time
Pie Chart | Displaying proportions
Scatter Plot | Finding relationships between variables
Histogram | Understanding data distribution
Heatmap | Analyzing correlations visually

By choosing the right visualization type, you can make your data more insightful and meaningful! 🚀

Written by

techGyan : smart tech study