Exploring Pandas DataFrame Views: A Comprehensive Guide
When working with data in Python, the Pandas library provides a powerful DataFrame structure to manage and analyze data effectively. To make the most out of Pandas, understanding how to view and manipulate DataFrame content is crucial. Here’s a detailed guide on various methods and properties to view and inspect data in a Pandas DataFrame.
Let's create a sample dataset containing information about cars and use it to demonstrate the output of various Pandas DataFrame view methods. Here's the dataset and how each function would be applied to it:
Sample Car Dataset
import pandas as pd
# Creating a sample dataset
data = {
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True]
}
# Creating DataFrame
df = pd.DataFrame(data)
print(df)
Applying Various Pandas DataFrame View Methods
Display the First Few Rows:
print(df.head())
Output:
      Car    Model  Year  Price  Mileage  Electric
0  Toyota    Camry  2020  24000    30000     False
1   Honda    Civic  2019  22000    40000     False
2     BMW       X5  2021  60000    20000     False
3    Audi       A4  2018  35000    50000     False
4    Ford  Mustang  2020  26000    15000     False
Display the Last Few Rows:
print(df.tail())
Output:
         Car    Model  Year  Price  Mileage  Electric
2        BMW       X5  2021  60000    20000     False
3       Audi       A4  2018  35000    50000     False
4       Ford  Mustang  2020  26000    15000     False
5  Chevrolet   Impala  2017  29000    45000     False
6      Tesla  Model S  2021  80000    10000      True
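Both head() and tail() return five rows by default but accept a row count n, and a negative n counts from the other end. A minimal sketch, rebuilding the same car data so the snippet runs on its own:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

first_two = df.head(2)           # rows 0 and 1
last_two = df.tail(2)            # rows 5 and 6; original index labels are kept
all_but_last_two = df.head(-2)   # negative n: every row except the last two
all_but_first_two = df.tail(-2)  # negative n: every row except the first two

print(first_two['Car'].tolist())                      # ['Toyota', 'Honda']
print(last_two.index.tolist())                        # [5, 6]
print(len(all_but_last_two), len(all_but_first_two))  # 5 5
```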
View Column Names:
print(df.columns)
Output:
Index(['Car', 'Model', 'Year', 'Price', 'Mileage', 'Electric'], dtype='object')
View Data Types of Each Column:
print(df.dtypes)
Output:
Car         object
Model       object
Year         int64
Price        int64
Mileage      int64
Electric      bool
dtype: object
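The dtypes are also useful for picking columns by kind. A short sketch using select_dtypes() on the same data. Note that a Boolean column such as Electric is not treated as numeric here:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

numeric = df.select_dtypes(include='number')      # integer and float columns only
non_numeric = df.select_dtypes(exclude='number')  # text and Boolean columns

print(numeric.columns.tolist())      # ['Year', 'Price', 'Mileage']
print(non_numeric.columns.tolist())  # ['Car', 'Model', 'Electric']
```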
Calculate Summary Statistics:
print(df.describe())
Output:
              Year         Price       Mileage
count     7.000000      7.000000      7.000000
mean   2019.428571  39428.571429  30000.000000
std       1.511858  22059.443502  15545.631755
min    2017.000000  22000.000000  10000.000000
25%    2018.500000  25000.000000  17500.000000
50%    2020.000000  29000.000000  30000.000000
75%    2020.500000  47500.000000  42500.000000
max    2021.000000  80000.000000  50000.000000
By default, describe() summarizes only the numeric columns, so the text columns and the Boolean Electric column are left out.
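The 25%, 50% and 75% rows of describe() are quantiles computed with linear interpolation. As a quick cross-check, this sketch recomputes them for the Price column alone, both with Series.quantile() and by hand:

```python
import pandas as pd

prices = pd.Series([24000, 22000, 60000, 35000, 26000, 29000, 80000], name='Price')

# Series.quantile() uses the same default linear interpolation as describe()
quartiles = prices.quantile([0.25, 0.5, 0.75])
print(quartiles.tolist())  # [25000.0, 29000.0, 47500.0]

# Manual 25% check: position 0.25 * (n - 1) = 1.5 in the sorted values,
# i.e. halfway between the 2nd and 3rd smallest prices
ordered = sorted(prices)
pos = 0.25 * (len(ordered) - 1)
lower, frac = int(pos), pos - int(pos)
manual_q1 = ordered[lower] + frac * (ordered[lower + 1] - ordered[lower])
print(manual_q1)  # 25000.0

# Other percentiles can be requested explicitly (the median is always included)
print(prices.describe(percentiles=[0.1, 0.9]))
```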
Get Detailed Information:
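df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Car       7 non-null      object
 1   Model     7 non-null      object
 2   Year      7 non-null      int64
 3   Price     7 non-null      int64
 4   Mileage   7 non-null      int64
 5   Electric  7 non-null      bool
dtypes: bool(1), int64(3), object(2)
memory usage: 415.0+ bytes
info() prints its report itself and returns None, so call it directly rather than inside print(); otherwise an extra "None" line appears. The exact memory figure can differ slightly between Python and pandas versions.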
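The "+" in the memory line means info() did not measure the Python string objects in the text columns. A sketch of how to look at memory per column with memory_usage(). Only the fixed-width columns are checked below, because text-column sizes depend on the pandas version:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

# Bytes per column; the extra 'Index' entry is the row index
shallow = df.memory_usage()
# deep=True also counts the string objects referenced by text columns
deep = df.memory_usage(deep=True)

print(shallow['Price'], shallow['Electric'])  # 56 7  (7 x 8-byte ints, 7 x 1-byte bools)

# Same report as info(), but with the measured figure instead of "+"
df.info(memory_usage='deep')
```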
View Unique Values in a Column:
print(df['Car'].unique())
Output:
['Toyota' 'Honda' 'BMW' 'Audi' 'Ford' 'Chevrolet' 'Tesla']
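Every make in this dataset is unique, so the Year column is a better place to show the related helpers. unique() keeps the order of first appearance, nunique() counts distinct values, and value_counts() tallies each one. A minimal sketch:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

years = df['Year']
print(years.unique().tolist())    # [2020, 2019, 2021, 2018, 2017]
print(years.nunique())            # 5
counts = years.value_counts()     # occurrences of each distinct year
print(counts[2020], counts[2021]) # 2 2
print(df.nunique()['Electric'])   # 2  (False and True)
```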
Check for Null Values:
print(df.isnull())
Output:
     Car  Model   Year  Price  Mileage  Electric
0  False  False  False  False    False     False
1  False  False  False  False    False     False
2  False  False  False  False    False     False
3  False  False  False  False    False     False
4  False  False  False  False    False     False
5  False  False  False  False    False     False
6  False  False  False  False    False     False
Check for Non-Null Values:
print(df.notnull())
Output:
    Car  Model  Year  Price  Mileage  Electric
0  True   True  True   True     True      True
1  True   True  True   True     True      True
2  True   True  True   True     True      True
3  True   True  True   True     True      True
4  True   True  True   True     True      True
5  True   True  True   True     True      True
6  True   True  True   True     True      True
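On a complete dataset these masks are all False or all True, so they are more useful when summed. The sketch below uses a small hypothetical variant of the car data with two gaps, made up purely for illustration. Summing a Boolean mask counts the True values, and isna()/notna() are aliases of isnull()/notnull():

```python
import pandas as pd

# Hypothetical variant of the car data with two missing values (illustration only)
gappy = pd.DataFrame({
    'Car': ['Toyota', 'Honda', None],
    'Price': [24000, None, 60000],
})

missing_per_column = gappy.isnull().sum()   # True counts as 1 when summed
present_per_column = gappy.notnull().sum()
rows_with_gaps = gappy[gappy.isnull().any(axis=1)]  # rows with at least one gap

print(missing_per_column.tolist())    # [1, 1]
print(present_per_column.tolist())    # [2, 2]
print(rows_with_gaps.index.tolist())  # [1, 2]
```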
View Values in 2D Array:
print(df.values)
Output:
[['Toyota' 'Camry' 2020 24000 30000 False]
 ['Honda' 'Civic' 2019 22000 40000 False]
 ['BMW' 'X5' 2021 60000 20000 False]
 ['Audi' 'A4' 2018 35000 50000 False]
 ['Ford' 'Mustang' 2020 26000 15000 False]
 ['Chevrolet' 'Impala' 2017 29000 45000 False]
 ['Tesla' 'Model S' 2021 80000 10000 True]]
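The pandas documentation recommends to_numpy() over .values. In both cases, mixing text, integer and Boolean columns produces a generic object array. A sketch showing that selecting only the integer columns keeps an integer dtype:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

mixed = df.to_numpy()                                  # mixed column types -> dtype=object
numbers = df[['Year', 'Price', 'Mileage']].to_numpy()  # integer-only columns -> integer dtype

print(mixed.dtype, mixed.shape)            # object (7, 6)
print(numbers.dtype.kind, numbers.shape)   # i (7, 3)
print(numbers[:, 1].max())                 # 80000  (column 1 is Price)
```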
View Column and Row Names:
print(df.columns)
print(df.index)
Output:
Index(['Car', 'Model', 'Year', 'Price', 'Mileage', 'Electric'], dtype='object')
RangeIndex(start=0, stop=7, step=1)
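The default row labels are just positions 0 to 6. If a column uniquely identifies each row, it can serve as the index instead. A sketch with set_index() and reset_index(), using the Car column because every make appears once:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

# Use the car make as row labels instead of the default RangeIndex
by_car = df.set_index('Car')
print(by_car.index.tolist())         # ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla']
print(by_car.columns.tolist())       # ['Model', 'Year', 'Price', 'Mileage', 'Electric']
print(by_car.loc['Tesla', 'Price'])  # 80000

# reset_index() turns the labels back into an ordinary first column
restored = by_car.reset_index()
print(restored.columns.tolist() == df.columns.tolist())  # True
```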
Get Shape of the DataFrame:
print(df.shape)
Output:
(7, 6)
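Because shape is a plain (rows, columns) tuple, it can be unpacked directly. A short sketch showing how it relates to len(), size and empty:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

rows, cols = df.shape   # unpack the (rows, columns) tuple
print(rows, cols)       # 7 6
print(len(df) == rows)  # True: len() counts rows
print(df.size)          # 42: total number of cells, rows x columns
print(df.empty)         # False
```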
Print Specific Columns:
print(df[['Car', 'Model']])
Output:
         Car    Model
0     Toyota    Camry
1      Honda    Civic
2        BMW       X5
3       Audi       A4
4       Ford  Mustang
5  Chevrolet   Impala
6      Tesla  Model S
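Listing column names works well for a few columns. For a contiguous run of columns, a label slice with .loc is shorter, and filter() can match names by a regular expression. A sketch of both:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

# Label slice over columns: both end labels are included
middle = df.loc[:, 'Year':'Mileage']
print(middle.columns.tolist())  # ['Year', 'Price', 'Mileage']

# Columns whose names start with "M"
print(df.filter(regex='^M').columns.tolist())  # ['Model', 'Mileage']
```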
Print Specific Rows:
print(df.iloc[0:3])
Output:
      Car  Model  Year  Price  Mileage  Electric
0  Toyota  Camry  2020  24000    30000     False
1   Honda  Civic  2019  22000    40000     False
2     BMW     X5  2021  60000    20000     False
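iloc picks rows by position. Rows are often selected by content instead, using a Boolean mask or query(). A minimal sketch on the same data:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

# Boolean mask: keep rows where the condition is True
expensive = df[df['Price'] > 30000]
print(expensive['Car'].tolist())  # ['BMW', 'Audi', 'Tesla']

# query() expresses the same kind of filter as a string
recent_expensive = df.query('Price > 30000 and Year >= 2020')
print(recent_expensive['Car'].tolist())  # ['BMW', 'Tesla']
```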
Print a Column as a Series:
print(df['Car'])
Output:
0       Toyota
1        Honda
2          BMW
3         Audi
4         Ford
5    Chevrolet
6        Tesla
Name: Car, dtype: object
Print a Column as a DataFrame:
print(df[['Car']])
Output:
         Car
0     Toyota
1      Honda
2        BMW
3       Audi
4       Ford
5  Chevrolet
6      Tesla
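The two printouts above look alike, but single brackets return a one-dimensional Series and double brackets return a two-dimensional DataFrame. A sketch that makes the difference explicit:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

as_series = df['Car']   # single brackets -> Series
as_frame = df[['Car']]  # list of names -> DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (7,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (7, 1)

# to_frame() converts the Series into the equivalent one-column DataFrame
print(as_series.to_frame().equals(as_frame))      # True
```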
Select Specific Row and Column:
print(df.loc[0:2, ['Car', 'Model']])
Output:
      Car  Model
0  Toyota  Camry
1   Honda  Civic
2     BMW     X5
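Note that df.loc[0:2] returned three rows. loc slices by label and includes the end label, while iloc slices by position and excludes the end, as Python slicing does. A sketch comparing the two, plus at/iat for single values:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

by_label = df.loc[0:2, ['Car', 'Model']]  # labels 0, 1, 2: end label included
by_position = df.iloc[0:2, [0, 1]]        # positions 0, 1: end position excluded
print(len(by_label), len(by_position))    # 3 2

# at / iat fetch one value by label or by position
print(df.at[2, 'Car'], df.iat[2, 0])      # BMW BMW
```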
Shift Rows:
print(df.shift(1))
Output:
         Car    Model    Year    Price  Mileage  Electric
0        NaN      NaN     NaN      NaN      NaN       NaN
1     Toyota    Camry  2020.0  24000.0  30000.0     False
2      Honda    Civic  2019.0  22000.0  40000.0     False
3        BMW       X5  2021.0  60000.0  20000.0     False
4       Audi       A4  2018.0  35000.0  50000.0     False
5       Ford  Mustang  2020.0  26000.0  15000.0     False
6  Chevrolet   Impala  2017.0  29000.0  45000.0     False
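The integer columns became floats because the new first row holds NaN. A common use of shift() is comparing each row with the previous one. This sketch computes the row-to-row price change, checks it against diff(), and shows that fill_value avoids the NaN:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

prev_price = df['Price'].shift(1)   # previous row's price; first row becomes NaN
change = df['Price'] - prev_price   # float result because of the NaN
print(change.tolist()[:3])          # [nan, -2000.0, 38000.0]
print(change.equals(df['Price'].diff()))  # True: diff() is the shortcut

# fill_value replaces the gap and keeps the integer dtype
print(df['Price'].shift(1, fill_value=0).tolist())
# [0, 24000, 22000, 60000, 35000, 26000, 29000]
```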
Sort by Column Values:
print(df.sort_values(by=['Price']))
Output:
         Car    Model  Year  Price  Mileage  Electric
1      Honda    Civic  2019  22000    40000     False
0     Toyota    Camry  2020  24000    30000     False
4       Ford  Mustang  2020  26000    15000     False
5  Chevrolet   Impala  2017  29000    45000     False
3       Audi       A4  2018  35000    50000     False
2        BMW       X5  2021  60000    20000     False
6      Tesla  Model S  2021  80000    10000      True
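sort_values() also accepts several keys, each with its own direction. The sketch below sorts newest first, then cheapest first within the same year. It also shows nlargest() as a shortcut when only the top few rows matter:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

# Year descending, then Price ascending to break ties
newest_then_cheapest = df.sort_values(by=['Year', 'Price'], ascending=[False, True])
print(newest_then_cheapest['Car'].tolist())
# ['BMW', 'Tesla', 'Toyota', 'Ford', 'Honda', 'Audi', 'Chevrolet']

top_two = df.nlargest(2, 'Price')   # the two rows with the highest Price
print(top_two['Car'].tolist())      # ['Tesla', 'BMW']
```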
Sort by Index:
print(df.sort_index(ascending=False))
Output:
         Car    Model  Year  Price  Mileage  Electric
6      Tesla  Model S  2021  80000    10000      True
5  Chevrolet   Impala  2017  29000    45000     False
4       Ford  Mustang  2020  26000    15000     False
3       Audi       A4  2018  35000    50000     False
2        BMW       X5  2021  60000    20000     False
1      Honda    Civic  2019  22000    40000     False
0     Toyota    Camry  2020  24000    30000     False
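sort_index() can also sort the column labels with axis=1. After any sort, reset_index(drop=True) renumbers the rows from 0. A short sketch:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

alphabetical = df.sort_index(axis=1)  # sort column labels instead of row labels
print(alphabetical.columns.tolist())
# ['Car', 'Electric', 'Mileage', 'Model', 'Price', 'Year']

reversed_rows = df.sort_index(ascending=False)
renumbered = reversed_rows.reset_index(drop=True)  # drop=True discards the old labels
print(renumbered.loc[0, 'Car'])  # Tesla
```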
Adjust Display Options:
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
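set_option() changes the settings globally for the rest of the session. For a temporary change, option_context() restores the previous values when the with-block ends, and reset_option() returns a single option to its default. A sketch, assuming the options start at their defaults:

```python
import pandas as pd

default_rows = pd.get_option('display.max_rows')  # 60 unless changed earlier

# Scoped change: both options revert automatically after the block
with pd.option_context('display.max_rows', 10, 'display.max_columns', 4):
    inside = pd.get_option('display.max_rows')
after_block = pd.get_option('display.max_rows')

# Global change, then undo it with reset_option()
pd.set_option('display.max_rows', None)
unlimited = pd.get_option('display.max_rows')
pd.reset_option('display.max_rows')

print(inside, after_block == default_rows, unlimited)  # 10 True None
```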
One-Hot Encode Columns:
print(pd.get_dummies(df, columns=['Car'], prefix=['Car'], drop_first=True, dtype=int))
Output:
     Model  Year  Price  Mileage  Electric  Car_BMW  Car_Chevrolet  Car_Ford  Car_Honda  Car_Tesla  Car_Toyota
0    Camry  2020  24000    30000     False        0              0         0          0          0           1
1    Civic  2019  22000    40000     False        0              0         0          1          0           0
2       X5  2021  60000    20000     False        1              0         0          0          0           0
3       A4  2018  35000    50000     False        0              0         0          0          0           0
4  Mustang  2020  26000    15000     False        0              0         1          0          0           0
5   Impala  2017  29000    45000     False        0              1         0          0          0           0
6  Model S  2021  80000    10000      True        0              0         0          0          1           0
Passing dtype=int keeps the 0/1 output shown here. Since pandas 2.0, the dummy columns default to True/False.
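drop_first=True removes the first category in sorted order, Car_Audi, so an Audi row is the one where every remaining Car_ column is 0. This sketch compares the full and reduced encodings of the Car column:

```python
import pandas as pd

# Rebuild the sample car data so this snippet is self-contained
df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'BMW', 'Audi', 'Ford', 'Chevrolet', 'Tesla'],
    'Model': ['Camry', 'Civic', 'X5', 'A4', 'Mustang', 'Impala', 'Model S'],
    'Year': [2020, 2019, 2021, 2018, 2020, 2017, 2021],
    'Price': [24000, 22000, 60000, 35000, 26000, 29000, 80000],
    'Mileage': [30000, 40000, 20000, 50000, 15000, 45000, 10000],
    'Electric': [False, False, False, False, False, False, True],
})

full = pd.get_dummies(df['Car'], prefix='Car', dtype=int)                    # one column per make
reduced = pd.get_dummies(df['Car'], prefix='Car', drop_first=True, dtype=int)

print(full.shape[1], reduced.shape[1])  # 7 6
print(full.sum(axis=1).tolist())        # [1, 1, 1, 1, 1, 1, 1]: exactly one flag per row
print(sorted(set(full.columns) - set(reduced.columns)))  # ['Car_Audi']

# The dropped category is encoded as "all remaining flags are 0"
print(df.loc[reduced.sum(axis=1) == 0, 'Car'].tolist())  # ['Audi']
```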
By mastering these Pandas DataFrame view methods, you can effectively inspect, manipulate, and analyze your data, leading to better insights and decisions.
Written by
Emeron Marcelle
As a doctoral scholar in Information Technology, I am deeply immersed in the world of artificial intelligence, with a specific focus on advancing the field. Fueled by a strong passion for Machine Learning and Artificial Intelligence, I am dedicated to acquiring the skills necessary to drive growth and innovation in this dynamic field. With a commitment to continuous learning and a desire to contribute innovative ideas, I am on a path to make meaningful contributions to the ever-evolving landscape of Machine Learning.