Visualising Data with Seaborn - A Tutorial

sambit choudhury
20 min read

Seaborn is a powerful Python library built on Matplotlib that helps you create beautiful and informative statistical graphics. This guide will walk you through various Seaborn plot types using the California Housing dataset.

Since it's not always easy to decide how to best tell the story behind your data, we've broken the chart types into three broad categories to help with this:

  • Trends: A pattern of change over time or a continuous variable.

    • sns.lineplot: Line charts show trends over time, allowing multiple lines for group comparisons.
  • Relationship: Understanding connections between variables.

    • sns.barplot: Bar charts compare quantities across different groups.

    • sns.heatmap: Heatmaps reveal colour-coded patterns in numerical tables.

    • sns.scatterplot: Scatter plots show relationships between two continuous variables, with optional color-coding for a third categorical variable.

    • sns.regplot: Adds a regression line to scatter plots, making linear relationships clearer.

    • sns.lmplot: Useful for drawing multiple regression lines when groups are colour-coded in a scatter plot.

    • sns.swarmplot: Categorical scatter plots show the relationship between a continuous variable and a categorical variable by preventing point overlap.

  • Distribution: Visualising the possible values a variable can take and their likelihood, often across categories.

    • sns.histplot: Histograms show the distribution of a single numerical variable.

    • sns.kdeplot: KDE plots (or 2D KDE plots) display a smooth, estimated distribution for one or two numerical variables.

    • sns.jointplot: Combines a bivariate plot (scatter, hexbin, or 2D KDE) with marginal plots showing the distribution of each variable.

    • sns.boxplot: Summarises the distribution of a numerical variable, showing median, quartiles, and outliers.

    • sns.violinplot: Shows the density distribution of a numerical variable, along with its median and quartiles.

    • sns.boxenplot: Provides more detailed quantile information than a box plot, especially for large datasets.

    • sns.countplot: Displays the count of observations for each category in a categorical variable, with optional sub-categorization using hue.

    • sns.FacetGrid: Allows creating multiple plots (e.g., histograms or KDEs) across different subsets of your data to compare distributions.

1. Initial Setup and Data Loading

First, let's get everything set up by loading the necessary libraries and the California Housing dataset.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Fetch the California Housing dataset
housing_data = fetch_california_housing()

# Create a Pandas DataFrame
housing_df = pd.DataFrame(housing_data.data, columns=housing_data.feature_names)
housing_df['target'] = housing_data.target

# Display the first few rows of the DataFrame
print(housing_df.head())

# Set the Seaborn style for better aesthetics
sns.set(style='ticks', palette='Set2')

Cheat Sheet: Initial Setup

Code | Description
import pandas as pd | Imports Pandas for data manipulation.
import seaborn as sns | Imports Seaborn for plotting.
import matplotlib.pyplot as plt | Imports Matplotlib for plot customization.
from sklearn.datasets import fetch_california_housing | Imports the California Housing dataset.
housing_df = pd.DataFrame(...) | Converts the dataset into a Pandas DataFrame.
sns.set(style='ticks', palette='Set2') | Sets the visual style of Seaborn plots. style can be 'white', 'dark', 'whitegrid', 'darkgrid', 'ticks'. palette offers various color schemes.

Here's a look at the first few rows of the housing_df DataFrame:

MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  target
8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88    -122.23   4.526
8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86    -122.22   3.585
7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85    -122.24   3.521
5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85    -122.25   3.413
3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85    -122.25   3.422

2. Visualising Relationships

These plots are designed to show how two or more variables interact with each other, helping to identify correlations, patterns, and dependencies.

2.1 Scatter Plot (sns.lmplot): Visualising Linear Relationships

Scatter plots are great for seeing how two numerical variables relate to each other. Seaborn's lmplot can even add a regression line to show the trend and a confidence interval around it.

def scatterPlot():
    # Create the scatter plot with a regression line
    sns.lmplot(x="AveRooms", y="AveBedrms", data=housing_df)
    # Remove excess chart lines and ticks for a cleaner look
    sns.despine()
    plt.title('Average Rooms vs. Average Bedrooms')
    plt.xlabel('Average Rooms per Household')
    plt.ylabel('Average Bedrooms per Household')
    plt.show()

# To run this plot, uncomment the line below:
# scatterPlot()

What it means: This plot shows the connection between the average number of rooms and bedrooms in a household. You'd expect to see a strong positive correlation, meaning more rooms generally lead to more bedrooms. The line helps highlight this trend, and the shaded area represents the confidence interval for the regression estimate.

Here's an example of what such a scatter plot might look like:

Cheat Sheet: Scatter Plot (lmplot)

Code | Description
sns.lmplot(x="col1", y="col2", data=df) | Creates a scatter plot with an optional regression line. x and y specify the columns for the axes.
sns.despine() | Removes the top and right borders from the plot, making it look cleaner.
plt.title('Title') | Sets the title of your plot.
plt.xlabel('Label') | Sets the label for the x-axis.
plt.ylabel('Label') | Sets the label for the y-axis.
plt.show() | Displays the plot.

2.2 Regression Plot (sns.regplot): More Flexible Linear Regression

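Like lmplot, regplot plots the data together with a linear regression fit. The difference is that regplot is an axes-level function: it draws onto a single Matplotlib Axes (optionally supplied through its ax argument), which makes it easier to combine with other plots. lmplot, by contrast, is figure-level and creates its own figure.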
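To illustrate what axes-level means in practice, here is a minimal sketch that places two regression plots side by side by passing ax to regplot. The data is synthetic and the column names (x, y1, y2) are made up for the example; they are not part of the housing dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in data; column names are illustrative only
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({"x": rng.normal(size=200)})
demo_df["y1"] = 2.0 * demo_df["x"] + rng.normal(size=200)
demo_df["y2"] = -1.0 * demo_df["x"] + rng.normal(size=200)

# Two side-by-side Axes, each receiving its own regression plot
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
sns.regplot(x="x", y="y1", data=demo_df, ax=ax_left, scatter_kws={"alpha": 0.3})
sns.regplot(x="x", y="y2", data=demo_df, ax=ax_right, scatter_kws={"alpha": 0.3})
ax_left.set_title("Positive slope")
ax_right.set_title("Negative slope")
sns.despine(fig=fig)
```

Call plt.show() afterwards to display the figure. The same layout is not possible with lmplot, since it always builds a figure of its own.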

def regPlot():
    sns.regplot(x="MedInc", y="target", data=housing_df, scatter_kws={'alpha':0.3})
    sns.despine()
    plt.title('Median Income vs. House Value (Regression Plot)')
    plt.xlabel('Median Income')
    plt.ylabel('Median House Value')
    plt.show()

# To run this plot, uncomment the line below:
# regPlot()

What it means: This plot directly shows the linear relationship between median income (MedInc) and median house value (target). The line indicates the best-fit linear regression, and the shaded area is the confidence interval. We can clearly see a positive correlation, meaning higher incomes are generally associated with higher house values. The scatter_kws argument makes the points slightly transparent (alpha=0.3), which helps visualise areas of high data density.
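By default the line drawn by regplot is an ordinary least-squares fit, so its slope and intercept can be recovered numerically as well. The sketch below does this with np.polyfit on synthetic data; the variable names and the trend (slope 0.4, intercept 0.5) are assumptions chosen for illustration, not values from the housing dataset.

```python
import numpy as np

# Synthetic data with a known linear trend (illustrative values only)
rng = np.random.default_rng(42)
income = rng.uniform(1.0, 10.0, size=500)
value = 0.4 * income + 0.5 + rng.normal(scale=0.3, size=500)

# Degree-1 least-squares fit: returns the slope first, then the intercept
slope, intercept = np.polyfit(income, value, deg=1)

# Pearson correlation coefficient between the two variables
r = np.corrcoef(income, value)[0, 1]

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

Swapping in housing_df['MedInc'] and housing_df['target'] would give the numbers behind the plotted line.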

Here's an example of what such a regression plot might look like:

Cheat Sheet: Regression Plot (regplot)

Code | Description
sns.regplot(x="col1", y="col2", data=df, ...) | Plots data and a linear regression model fit. scatter_kws can customize scatter points.
sns.despine() | Removes the top and right borders from the plot, making it look cleaner.
plt.title('Title') | Sets the title of your plot.
plt.xlabel('Label') | Sets the label for the x-axis.
plt.ylabel('Label') | Sets the label for the y-axis.
plt.show() | Displays the plot.

2.3 Joint Plot (sns.jointplot): Bivariate Distribution

Joint plots are fantastic for seeing the relationship between two variables, and they also show their individual distributions along the edges.

def jointplot_distribution():
    # Hexbin joint plot with marginal histograms on the top and right edges
    sns.jointplot(data=housing_df, x='MedInc', y='Population', kind="hex")
    plt.suptitle('Median Income vs. Population (Hexbin)', y=1.02) # Adjust suptitle position
    sns.despine()
    plt.show()

# To run this plot, uncomment the line below:
# jointplot_distribution()

What it means: This joint plot visualises the relationship between median income and population. The kind="hex" option creates hexagonal bins, where the colour intensity tells you how many data points fall into each hexagon. This is particularly useful for datasets with many overlapping points. The marginal histograms on the sides show the individual distributions of median income and population, providing a comprehensive view of both variables and their interaction.

Here's an example of what such a joint plot might look like:

Cheat Sheet: Joint Plot

Code | Description
sns.jointplot(data=df, x='col1', y='col2', kind="type") | Creates a joint plot. kind can be "scatter", "kde", "hist", "reg", "resid", "hex".
plt.suptitle('Title', y=pos) | Sets a main title for the entire figure, which is useful for joint plots since plt.title applies to just one part.

2.4 Heatmap (sns.heatmap): Visualising Correlations and Matrices

Heatmaps are excellent for displaying matrices of data where colour intensity represents values. They're often used for correlation matrices or to show relationships between two categorical variables and a continuous one.

def heatmap():
    # Bin income and house age into equal-width categories.
    # assign() returns a copy, so housing_df itself is left unchanged and the
    # later sections can still create their own 'AgeCategory' and 'IncomeCategory'.
    binned_df = housing_df.assign(
        IncomeCategory=pd.cut(housing_df['MedInc'], bins=8, labels=[
            'Very Low', 'Low', 'Low-Med', 'Medium', 'Med-High', 'High', 'Very High', 'Wealthy']),
        AgeCategory=pd.cut(housing_df['HouseAge'], bins=10, labels=[
            '0-5', '6-10', '11-15', '16-20', '21-25', '26-30', '31-35', '36-40', '41-45', '46-50']),
    )

    # Create a pivot table to aggregate the 'target' (house value) by categories
    heatmap_data = binned_df.pivot_table(
        values='target',
        index='IncomeCategory',
        columns='AgeCategory',
        aggfunc='mean'
    )

    plt.figure(figsize=(10, 6))
    sns.heatmap(heatmap_data, annot=True, cmap='YlOrRd', fmt='.1f')
    plt.title('Average House Value by Income and House Age Category')
    plt.xlabel('House Age Category')
    plt.ylabel('Income Category')
    sns.despine()
    plt.show()

# To run this plot, uncomment the line below:
# heatmap()

What it means: This heatmap shows the average house value (target) based on categories of median income and house age. The color intensity (and annotations) indicate the average house value for each combination. This helps identify which combinations of income and house age tend to have higher or lower property values, making it easy to spot trends like "wealthy households in older homes" potentially correlating with high property values.
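One detail worth knowing about the binning step: when pd.cut receives an integer for bins, it splits the observed range into equal-width intervals, so hand-written labels such as '0-5' are only approximate descriptions of the real edges. The sketch below uses a handful of made-up ages to show the difference between integer bins and explicit bin edges.

```python
import pandas as pd

# Illustrative ages spanning the same 1-52 range as HouseAge
ages = pd.Series([1, 5, 12, 20, 27, 33, 41, 47, 52], name="HouseAge")

# An integer `bins` value means equal-width intervals over the observed range
equal_width = pd.cut(ages, bins=4)
print(equal_width.cat.categories)

# Explicit edges give intervals that match their labels exactly
explicit = pd.cut(
    ages,
    bins=[0, 13, 26, 39, 52],
    labels=["0-13", "14-26", "27-39", "40-52"],
)
print(explicit.value_counts(sort=False))
```

Printing the categories of the equal-width result is a quick way to check which ranges a label really covers before putting it on an axis.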

Here's an example of what such a heatmap might look like:

Cheat Sheet: Heatmap

Code | Description
pd.pivot_table(df, values, index, columns, aggfunc) | Creates a pivot table. values is the column to aggregate, index and columns define the rows and columns of the new table, aggfunc is the aggregation function (e.g., 'mean').
sns.heatmap(data, annot, cmap, fmt) | Creates a heatmap. annot=True displays the values on the heatmap, cmap sets the color map, fmt formats the annotation text.
plt.figure(figsize=(width, height)) | Creates a new figure with a specified size.
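Section 2.4 notes that heatmaps are often used for correlation matrices. A minimal sketch of that use follows, on synthetic data with made-up column names (a, b, c); the diverging colour map and the fixed -1 to 1 range are design choices for this example, not requirements.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data: 'b' is built from 'a', 'c' is independent (illustrative only)
rng = np.random.default_rng(7)
base = rng.normal(size=300)
demo_df = pd.DataFrame({
    "a": base,
    "b": base * 0.8 + rng.normal(scale=0.5, size=300),
    "c": rng.normal(size=300),
})

# Pairwise Pearson correlations between all numeric columns
corr = demo_df.corr()

fig, ax = plt.subplots(figsize=(5, 4))
# Fixing vmin/vmax keeps the colour scale symmetric around zero
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix (synthetic data)")
```

For the housing data, housing_df.corr() could be passed to sns.heatmap in the same way.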

2.5 Pair Grid (sns.PairGrid): Visualising Pairwise Relationships

Pair grids (and the simpler pairplot) are fantastic for seeing relationships between multiple variables in your dataset. They create a grid of scatter plots for every pair of variables and show histograms or KDEs for individual variables.

def pairgrid():
    g = sns.PairGrid(housing_df[['AveRooms', 'AveBedrms', 'Population', 'MedInc']])
    g.map_upper(sns.scatterplot) # Scatter plots on the upper part of the grid
    g.map_lower(sns.scatterplot) # Scatter plots on the lower part of the grid
    g.map_diag(sns.histplot, kde=True) # Histograms with KDE on the diagonal
    plt.suptitle('Pairwise Relationships of Key Housing Variables', y=1.02) # Add a main title
    sns.despine()
    plt.show()

# To run this plot, uncomment the line below:
# pairgrid()

What it means: This pair grid shows the pairwise relationships between 'AveRooms', 'AveBedrms', 'Population', and 'MedInc'. The upper and lower triangles both show scatter plots, while the diagonal displays a histogram (with KDE) for each variable. This gives a comprehensive overview of how these features relate and are distributed, helping to quickly identify potential linear or non-linear relationships and common value ranges for each variable.

Here's an example of what such a pair grid might look like:

Cheat Sheet: Pair Grid

Code | Description
sns.PairGrid(df[columns]) | Sets up a PairGrid with a specific subset of columns.
g.map(plot_function) | Applies a plot function to all cells in the grid.
g.map_upper(plot_function) | Applies a plot function to the cells in the upper triangle of the grid.
g.map_lower(plot_function) | Applies a plot function to the cells in the lower triangle of the grid.
g.map_diag(plot_function) | Applies a plot function to the cells on the diagonal (where a variable is plotted against itself).

2.6 Swarm Plot (sns.swarmplot): Showing Individual Observations

Swarm plots are similar to strip plots, but they adjust the points along the categorical axis so that they do not overlap. This gives a better representation of the distribution of values, especially when there are many data points.

def swarmplot_distribution():
    # First, ensure 'AgeCategory' is created if not already
    if 'AgeCategory' not in housing_df.columns:
        housing_df['AgeCategory'] = pd.cut(housing_df['HouseAge'],
                                          bins=[0, 10, 20, 30, 40, 50, 100],
                                          labels=['0-10', '11-20', '21-30', '31-40', '41-50', '50+'])

    # A smaller marker size reduces (but may not eliminate) placement warnings for dense data
    sns.swarmplot(x="AgeCategory", y="MedInc", data=housing_df, size=3) # 'size' controls marker size
    plt.title('Median Income Distribution by House Age Category')
    plt.xlabel('House Age Category')
    plt.ylabel('Median Income')
    sns.despine()
    plt.show()

# To run this plot, uncomment the line below:
# swarmplot_distribution()

What it means: This swarm plot displays the individual median income values for each house age category. Unlike a box plot or violin plot which summarize the distribution, the swarm plot shows every single data point, with points "swarming" around the areas of higher density. This allows you to see the precise spread of income values within each age group and identify any clusters or gaps. It's particularly useful when you want to show the exact values of each observation in relation to a categorical variable. Note: For very large datasets, swarmplot might still struggle with point placement, leading to warnings. In such cases, consider using sns.stripplot (which allows overlap) or sns.violinplot for a density-based summary.
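Another option for large datasets is to draw the swarm plot from a random subsample. The helper below is a hypothetical convenience function (sample_per_category is not part of Seaborn or Pandas) that keeps at most a fixed number of rows per category; the demo data is synthetic.

```python
import numpy as np
import pandas as pd

def sample_per_category(df, category_col, n_per_group, seed=0):
    """Return at most n_per_group randomly chosen rows from each category."""
    shuffled = df.sample(frac=1, random_state=seed)  # shuffle all rows once
    # head() then keeps the first n rows of each group in shuffled order
    return shuffled.groupby(category_col, observed=True).head(n_per_group)

# Synthetic stand-in for the housing data (illustrative only)
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "AgeCategory": pd.Categorical(rng.choice(["0-10", "11-20", "21-30"], size=5000)),
    "MedInc": rng.lognormal(mean=1.2, sigma=0.4, size=5000),
})

subset = sample_per_category(demo_df, "AgeCategory", n_per_group=200)
print(subset["AgeCategory"].value_counts())
```

The resulting subset can be passed as data to sns.swarmplot in place of the full DataFrame. The trade-off is that the plot then shows a sample rather than every observation.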

Here's an example of what such a swarm plot might look like:

Cheat Sheet: Swarm Plot

Code | Description
sns.swarmplot(x="categorical_col", y="numerical_col", data=df) | Creates a swarm plot showing individual data points, adjusted to avoid overlap. x is the categorical variable, y is the numerical variable.
plt.title('Title') | Sets the title of your plot.
plt.xlabel('Label') | Sets the label for the x-axis.
plt.ylabel('Label') | Sets the label for the y-axis.
sns.despine() | Removes the top and right borders from the plot.
plt.show() | Displays the plot.

3. Visualising Trends

These plots are best suited for showing changes or patterns over a continuous variable, typically time, but can also represent other sequential or ordered variables.

3.1 Line Plot (sns.lineplot): Showing Trends

Line plots are ideal for showing how something changes or trends over a continuous variable. Here, we'll use it to see how median income changes with house age.

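def lineplot():
    # Average the median income for each distinct house age
    age_values = housing_df.groupby('HouseAge')['MedInc'].mean().reset_index()
    # Store the returned Axes in its own variable so the plt module is not shadowed
    ax = sns.lineplot(data=age_values, x='HouseAge', y='MedInc')
    ax.set_title('Average Median Income by House Age')
    ax.set_xlabel('House Age')
    ax.set_ylabel('Average Median Income')
    sns.despine()
    plt.show()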

# To run this plot, uncomment the line below:
# lineplot()

What it means: This line plot illustrates the average median income as house age increases. It can reveal if there's a particular age range of houses associated with higher or lower average incomes, or if the income generally increases or decreases with house age. This helps identify trends in the economic profiles of different age segments of housing.
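The aggregation step behind this plot is easy to check on a tiny example. The sketch below applies the same groupby, mean, and reset_index chain to six made-up rows, so the result can be verified by hand.

```python
import pandas as pd

# Six made-up rows: two at age 10, three at age 20, one at age 30
demo_df = pd.DataFrame({
    "HouseAge": [10, 10, 20, 20, 20, 30],
    "MedInc": [2.0, 4.0, 3.0, 5.0, 7.0, 6.0],
})

# One row per distinct HouseAge, holding the mean MedInc for that age
age_values = demo_df.groupby("HouseAge")["MedInc"].mean().reset_index()
print(age_values)
#    HouseAge  MedInc
# 0        10     3.0
# 1        20     5.0
# 2        30     6.0
```

reset_index turns HouseAge back into a regular column, which is what lets it be referenced by name in sns.lineplot.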

Here's an example of what such a line plot might look like:

Cheat Sheet: Line Plot

Code | Description
df.groupby('col1')['col2'].mean().reset_index() | Calculates the average of col2 for each unique value in col1 and resets the index to make col1 a regular column again.
sns.lineplot(data=df, x='col1', y='col2') | Creates a line plot.

4. Visualising Distributions

These plots help to understand the spread, central tendency, and shape of a single variable, or the distribution of a numerical variable across different categories.

4.1 Histogram (sns.histplot): Understanding Data Distribution

Histograms show how a single numerical variable is distributed. Seaborn's histplot can also include a Kernel Density Estimate (KDE), which is a smoothed curve showing the probability density.

def histogram_distribution():
    sns.histplot(housing_df.MedInc, bins=100, kde=True) # Using histplot for newer seaborn
    plt.title('Distribution of Median Income')
    plt.xlabel('Median Income')
    plt.ylabel('Count')
    sns.despine()
    plt.show()

# To run this plot, uncomment the line below:
# histogram_distribution()

What it means: This histogram shows how median income is spread out in the California Housing dataset. The shape of the bars and the KDE curve can tell you if incomes are mostly clustered, skewed (leaning to one side), or have multiple peaks. For MedInc, it typically reveals a right-skewed distribution, indicating that most areas have lower to moderate median incomes, with a long tail extending to higher incomes.
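Right skew can also be confirmed numerically: in a right-skewed distribution the mean sits above the median and the skewness statistic is positive. The sketch below checks this on synthetic log-normal data, which is only an assumed stand-in for MedInc.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed values (illustrative stand-in for MedInc)
rng = np.random.default_rng(5)
income = pd.Series(rng.lognormal(mean=1.2, sigma=0.5, size=10_000))

mean_income = income.mean()
median_income = income.median()
skewness = income.skew()  # sample skewness; positive means a longer right tail

print(f"mean={mean_income:.2f}, median={median_income:.2f}, skewness={skewness:.2f}")
```

Running the same three calls on housing_df['MedInc'] gives a quick check of what the histogram suggests visually.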

Here's an example of what such a histogram might look like:

Cheat Sheet: Histogram

Code | Description
sns.histplot(data, bins, kde) | Creates a histogram. bins controls the number of bars, and kde=True adds the smoothed density curve.

4.2 Box Plot (sns.boxplot): Summarizing Data Distribution

Box plots are great for summarizing the distribution of numerical data. They clearly show the median, quartiles, and potential outliers.

def boxplot_distribution():
    sns.boxplot(x=housing_df.MedInc) # Use x= for a horizontal box plot
    plt.title('Box Plot of Median Income')
    plt.xlabel('Median Income')
    sns.despine()
    plt.show()

# To run this plot, uncomment the line below:
# boxplot_distribution()

What it means: The box plot for median income shows you the middle 50% of the data (the box), the median (the line inside the box), and the spread of the rest of the data (the "whiskers"). Any points outside the whiskers are considered outliers. For MedInc, this plot typically highlights the range where most median incomes fall and points out any exceptionally high (or low) income districts.
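By default the whiskers reach the most extreme points within 1.5 times the interquartile range (IQR) of the box, and anything beyond is drawn as an outlier. The sketch below reproduces that rule by hand on nine made-up values.

```python
import numpy as np

# Nine made-up values with one obvious extreme
values = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 15.0])

# Quartiles and interquartile range
q1, q3 = np.percentile(values, [25, 75])   # 2.5 and 4.5
iqr = q3 - q1                              # 2.0

# Points outside these fences are flagged as outliers
lower_fence = q1 - 1.5 * iqr               # -0.5
upper_fence = q3 + 1.5 * iqr               # 7.5
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(outliers)  # [15.]
```

The multiplier can be changed through the whis argument of sns.boxplot if 1.5 is too strict or too lenient for the data.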

Here's an example of what such a box plot might look like:

Cheat Sheet: Box Plot

Code | Description
sns.boxplot(x=data) | Creates a horizontal box plot. Use y=data for a vertical one.

4.3 Violin Plot (sns.violinplot): Detailed Distribution with Density

Violin plots combine the best parts of box plots and kernel density plots. They show the overall shape (density) of the data's distribution, plus its median and quartiles.

def violinplot_distribution():
    sns.violinplot(y=housing_df.HouseAge, orient="v") # Ensure orient="v" for vertical
    plt.title('Violin Plot of House Age')
    plt.ylabel('House Age')
    sns.despine()
    plt.show()

# To run this plot, uncomment the line below:
# violinplot_distribution()

What it means: This violin plot of house age gives you a more detailed look at its distribution than a simple box plot. Wider parts of the "violin" mean more data points are clustered there, while narrower parts mean fewer. The small box inside shows the median and quartiles. This helps to visualise the density of house ages and identify if there are multiple peaks or a smooth distribution.

Here's an example of what such a violin plot might look like:

Cheat Sheet: Violin Plot

Code | Description
sns.violinplot(y=data) | Creates a vertical violin plot. Use x=data for a horizontal one. orient="v" explicitly sets vertical orientation.

4.4 Boxen Plot (Letter-value Plot - sns.boxenplot): Enhanced Distribution Detail

Boxen plots, also known as letter-value plots, are an enhancement of the traditional box plot. They are designed to provide more detailed information about the shape of a distribution, especially for larger datasets, by showing more quantiles (the "letter values") in the tails. This gives a richer view of the distribution's extremities than a standard box plot, without having to draw every individual data point.

def boxenplot_distribution():
    # First, ensure 'AgeCategory' is created if not already
    if 'AgeCategory' not in housing_df.columns:
        housing_df['AgeCategory'] = pd.cut(housing_df['HouseAge'],
                                          bins=[0, 10, 20, 30, 40, 50, 100],
                                          labels=['0-10', '11-20', '21-30', '31-40', '41-50', '50+'])

    sns.boxenplot(x="AgeCategory", y="MedInc", data=housing_df)
    plt.title('Median Income Distribution by House Age Category (Boxen Plot)')
    plt.xlabel('House Age Category')
    plt.ylabel('Median Income')
    sns.despine()
    plt.show()

# To run this plot, uncomment the line below:
# boxenplot_distribution()

What it means: This boxen plot provides a refined view of the median income distribution across different house age categories. While similar to a box plot, its nested boxes represent a larger number of quantiles, giving a more granular insight into the density and spread of data, particularly in the tails of the distribution. This helps in understanding the subtle differences in income spread for various house ages and identifying potential skewness more clearly.
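The nested boxes correspond to successively deeper quantile pairs: the median, then the fourths, the eighths, and so on. The sketch below computes the first five of these pairs on synthetic data to illustrate the idea; Seaborn decides how many levels to draw using its own rule, which this example does not attempt to reproduce.

```python
import numpy as np

# Synthetic right-skewed values (illustrative only)
rng = np.random.default_rng(11)
values = rng.lognormal(mean=1.2, sigma=0.5, size=20_000)

# Letter values: median (1/2), fourths (1/4), eighths (1/8), ...
letter_values = {}
for k in range(1, 6):
    tail = 0.5 ** k                      # tail probability at this depth
    lower, upper = np.quantile(values, [tail, 1 - tail])
    letter_values[f"1/{2 ** k}"] = (lower, upper)

for depth, (lower, upper) in letter_values.items():
    print(f"{depth:>5}: lower={lower:.2f}, upper={upper:.2f}")
```

Each deeper level covers a smaller share of the data, which is why a boxen plot only becomes informative once the dataset is reasonably large.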

Here's an example of what such a boxen plot might look like:

Cheat Sheet: Boxen Plot

Code | Description
sns.boxenplot(x="categorical_col", y="numerical_col", data=df) | Creates a boxen plot showing enhanced quantile information for a numerical variable across categories. x is the categorical variable, y is the numerical variable.
plt.title('Title') | Sets the title of your plot.
plt.xlabel('Label') | Sets the label for the x-axis.
plt.ylabel('Label') | Sets the label for the y-axis.
sns.despine() | Removes the top and right borders from the plot.
plt.show() | Displays the plot.

4.5 Facet Grid (sns.FacetGrid): Creating Multi-Panel Plots for Distribution

Facet grids allow you to visualise the distribution of a variable or the relationship between multiple variables across different subsets of your dataset. While also useful for relationships, they are powerful for breaking down distributions.

def facetgrid():
    # Create age groups for faceting
    housing_df['AgeGroup'] = pd.cut(housing_df['HouseAge'], bins=4, labels=[
    'New (0-13)', 'Moderate (13-26)', 'Old (26-39)', 'Very Old (39-52)'])

    # Create a FacetGrid with rows based on 'AgeGroup'
    g = sns.FacetGrid(housing_df, row="AgeGroup", height=3, aspect=2, margin_titles=True)
    g.map(plt.hist, "MedInc", bins=50, alpha=0.6, color="teal") # Using hist to show distribution across facets
    g.set_axis_labels("Median Income", "Count")
    g.set_titles(row_template='House Age: {row_name}') # Set individual row titles
    plt.suptitle('Median Income Distribution by House Age Group (Facet Grid)', y=1.02)
    plt.tight_layout() # Adjust layout to prevent labels overlapping
    plt.show()

# To run this plot, uncomment the line below:
# facetgrid()

What it means: This facet grid displays histograms of 'Median Income' for different house age groups. Each row represents a different age group, allowing for a quick comparison of the income distribution across these groups. This helps to see if the income spread changes significantly based on the age of the property, revealing distinct demographic or economic characteristics for each age bracket.

Here's an example of what such a facet grid might look like:

Cheat Sheet: Facet Grid (for Distribution)

Code | Description
sns.FacetGrid(data, row, col, hue, height, aspect) | Initializes a FacetGrid. row and col define the variables to create rows and columns of subplots. hue can color plot elements. height and aspect control subplot size.
g.map(plot_function, *args, **kwargs) | Applies a plotting function (e.g., plt.hist, sns.kdeplot) to each facet to show distributions.
g.set_axis_labels("X Label", "Y Label") | Sets labels for the x and y axes of all subplots.
g.set_titles(row_template='{row_name}') | Sets titles for the individual facets.
plt.tight_layout() | Adjusts subplot params for a tight layout.

4.6 Count Plot (sns.countplot): Visualising Categorical Counts

Count plots display the number of observations in each category using bars. It is essentially a histogram for a categorical variable. This is useful for quickly seeing the frequency of each unique value in a categorical column.

def countplot_distribution():
    # First, ensure 'AgeCategory' is created if not already
    if 'AgeCategory' not in housing_df.columns:
        housing_df['AgeCategory'] = pd.cut(housing_df['HouseAge'],
                                          bins=[0, 10, 20, 30, 40, 50, 100],
                                          labels=['0-10', '11-20', '21-30', '31-40', '41-50', '50+'])

    sns.countplot(x="AgeCategory", data=housing_df)
    plt.title('Count of Houses by Age Category')
    plt.xlabel('House Age Category')
    plt.ylabel('Count')
    sns.despine()
    plt.show()

# To run this plot, uncomment the line below:
# countplot_distribution()

What it means: This count plot visualises the number of rows falling into each defined age category. Each row of the California Housing dataset describes a census block group rather than a single house, and HouseAge is the median house age in that block group, so the bars count block groups by age band. For instance, you can quickly see which age ranges are most (or least) common in the California Housing data.

Here's an example of what such a count plot might look like:

Cheat Sheet: Count Plot

Code | Description
sns.countplot(x="categorical_col", data=df) | Creates a count plot showing the number of observations in each category. x is the categorical variable.
plt.title('Title') | Sets the title of your plot.
plt.xlabel('Label') | Sets the label for the x-axis.
plt.ylabel('Label') | Sets the label for the y-axis.
sns.despine() | Removes the top and right borders from the plot.
plt.show() | Displays the plot.

4.7 Count Plot with Hue (sns.countplot with hue): Categorical Distribution with Sub-Categories

The Count Plot with Hue extends the basic count plot by adding another categorical variable to further subdivide the bars. This allows you to compare the counts of sub-categories within each main category, providing a richer understanding of the data's composition.

def countplot_with_hue():
    # Ensure 'AgeCategory' and 'IncomeCategory' are created if not already
    if 'AgeCategory' not in housing_df.columns:
        housing_df['AgeCategory'] = pd.cut(housing_df['HouseAge'],
                                          bins=[0, 10, 20, 30, 40, 50, 100],
                                          labels=['0-10', '11-20', '21-30', '31-40', '41-50', '50+'])
    if 'IncomeCategory' not in housing_df.columns:
        housing_df['IncomeCategory'] = pd.cut(housing_df['MedInc'], bins=3, labels=['Low Income', 'Medium Income', 'High Income']) # Simplified for example

    sns.countplot(x="AgeCategory", hue="IncomeCategory", data=housing_df, palette="viridis")
    plt.title('Count of Houses by Age & Income Category')
    plt.xlabel('House Age Category')
    plt.ylabel('Count')
    plt.legend(title='Income Level')
    sns.despine()
    plt.show()

# To run this plot, uncomment the line below:
# countplot_with_hue()

What it means: This count plot with hue visualises the number of rows (census block groups) within each AgeCategory, further broken down by their IncomeCategory. You can see, for example, how many "Low Income" block groups fall into the "0-10" age category compared with "Medium Income" ones. This helps in understanding the joint distribution of two categorical variables and identifying which income levels are more prevalent in different age groups of housing.
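The numbers behind a count plot with hue are a two-way frequency table, which pd.crosstab produces directly. The sketch below builds one from six made-up rows so the counts can be checked by hand.

```python
import pandas as pd

# Six made-up rows with two categorical columns
demo_df = pd.DataFrame({
    "AgeCategory": ["0-10", "0-10", "11-20", "11-20", "11-20", "21-30"],
    "IncomeCategory": ["Low", "High", "Low", "Low", "High", "Low"],
})

# Rows are age categories, columns are income categories, cells are counts
counts = pd.crosstab(demo_df["AgeCategory"], demo_df["IncomeCategory"])
print(counts)
# IncomeCategory  High  Low
# AgeCategory
# 0-10               1    1
# 11-20              1    2
# 21-30              0    1
```

Each cell corresponds to the height of one bar in the plot, which makes the table a handy cross-check when bars are hard to read.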

Here's an example of what such a count plot with hue might look like:

Cheat Sheet: Count Plot with Hue

Code | Description
sns.countplot(x="categorical_col1", hue="categorical_col2", data=df, palette="color_map") | Creates a count plot with bars split by a second categorical variable (hue). palette sets the color scheme for the hue categories.
plt.title('Title') | Sets the title of your plot.
plt.xlabel('Label') | Sets the label for the x-axis.
plt.ylabel('Label') | Sets the label for the y-axis.
plt.legend(title='Legend Title') | Displays and titles the legend, which is crucial when hue is used.
sns.despine() | Removes the top and right borders from the plot.
plt.show() | Displays the plot.

This tutorial provides a comprehensive overview of how to use Seaborn for data visualisation, categorised by the type of insights you want to gain from your data.
