Pandas Explained: Bear Species or Powerful Python Library?

Mehul Pardeshi
4 min read

Hello there, it's me again! I hope that from the last article, you learned a little about me.

Well, today we will be talking about pandas. Not the black and white animals, sorry to disappoint you all. But wait a second, pandas in Python is equally fascinating. So let's start with an introduction:

A powerful and flexible open-source data analysis and manipulation library for Python, providing data structures like DataFrames for handling and analyzing structured data efficiently.

Hmm...

I personally hate textbook definitions, because I feel they're lifeless. By lifeless I mean they don't convey their true feeling or what they're trying to actually say.

So, let's break it down.

Imagine pandas as your hardworking mom.

She cleans the dishes, makes your messy bedroom good to look at, and organizes your chaotic closet with a touch of elegance.

In the same way, pandas brings order to your data chaos. It takes your raw, unorganized information and helps you structure it, making it easy to understand and analyze. With pandas, you can clean, sort, and manipulate your data effortlessly, just like how she turns your cluttered space into an organized environment.

Whether you're tracking your expenses, sorting out your contacts list, or analyzing survey results, pandas can do it all in just a few lines of code. It provides a structure called a DataFrame, which is similar to an Excel spreadsheet, but better. You can load data into it, manipulate it, and analyze it, all with minimal effort.

Okay, enough talk. Let's see a code snippet showing how we use pandas. Don't worry, it's nothing we won't be able to understand. Here it goes...

import pandas as pd

# Load some data into a DataFrame
data = {
    'Item': ['Apple', 'Banana', 'Orange'],
    'Quantity': [10, 5, 8],
    'Price': [15, 10, 12]
}

df = pd.DataFrame(data)

# Calculate the total cost for each item
df['Total'] = df['Quantity'] * df['Price']

print(df)

I won't copy-paste an explanation of this code from ChatGPT. Instead, we will break it down and understand it together.

First, always remember that if we want to use a library, we have to use the import statement. We have done the same thing here: we imported the pandas library and gave it the alias pd. We could pick any alias we see fit, but by convention it's aliased as pd for simplicity.

Once we have imported pandas, we start by creating a dictionary. We can then create a DataFrame from this dictionary, or we could import data from a .csv or .xlsx file.

Usually, when working with a large dataset, we import the dataset in its original format. But for understanding purposes, we will create a small DataFrame here.
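If you are curious what loading from a CSV looks like, here is a minimal sketch. To keep it runnable without a real file, I am assuming a small made-up CSV held in a string (csv_text is a hypothetical stand-in for something like 'data.csv'); pd.read_csv accepts a file path or a file-like object such as io.StringIO.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would live in a file on disk
csv_text = """Item,Quantity,Price
Apple,10,15
Banana,5,10
Orange,8,12
"""

# Wrap the string in a file-like object and hand it to read_csv
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_csv)
```

With a real file, you would pass the path instead, for example pd.read_csv('data.csv').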

The data created here is in the form of a Python dictionary. This dictionary has three keys, 'Item', 'Quantity', and 'Price', and each key is associated with a list of values.

Now we convert and load this data into a DataFrame using the pd.DataFrame constructor from pandas.

After that, we used simple arithmetic to calculate the total cost for each item and printed it as the output.
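To take the same example one small step further, here is a sketch that sorts the items by their total and adds up the whole bill. The names sorted_df and grand_total are my own for illustration; sort_values and sum are standard pandas methods.

```python
import pandas as pd

data = {
    'Item': ['Apple', 'Banana', 'Orange'],
    'Quantity': [10, 5, 8],
    'Price': [15, 10, 12]
}
df = pd.DataFrame(data)
df['Total'] = df['Quantity'] * df['Price']

# Sort rows so the largest total comes first
sorted_df = df.sort_values('Total', ascending=False)

# Add up the Total column to get the cost of everything
grand_total = df['Total'].sum()

print(sorted_df)
print(grand_total)
```

The arithmetic works column by column: each Quantity is multiplied by the Price in the same row, so there is no need to write a loop.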

Commonly used functions

We won't be getting into more advanced pandas functions. Rather, I feel it's better to understand the more commonly used ones. So, let's start by defining a basic DataFrame:

import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
  1. pd.read_csv()

    Reads a CSV file into a DataFrame and is useful for loading datasets.

df = pd.read_csv('data.csv')
  2. df.head()

    Displays the first few rows of the DataFrame for quick inspection of data.

     df.head(2)
    
        Name  Age
     0  Alice   25
     1    Bob   30
    
  3. df.info()

    Prints a concise summary of the DataFrame: the index range, column names, non-null counts, data types, and memory usage. Useful for spotting missing values and unexpected types.

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    2 non-null      object
 1   Age     2 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 150.0+ bytes
  4. df.describe()

    Provides summary statistics of numerical columns.

     df.describe()
    
                  Age
     count   2.000000
     mean   27.500000
     std     3.535534
     min    25.000000
     25%    26.250000
     50%    27.500000
     75%    28.750000
     max    30.000000
    
  5. df.loc[] and df.iloc[]

    Accesses rows and columns by labels (loc) or integer positions (iloc).

df.loc[0]   # Using label-based indexing
df.iloc[1]  # Using integer-based indexing

For df.loc[0]:

Name    Alice
Age        25
Name: 0, dtype: object

For df.iloc[1]:

Name    Bob
Age      30
Name: 1, dtype: object
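In the example above, loc and iloc look almost identical, because the default row labels (0, 1) happen to match the row positions. Here is a small sketch where they differ. As an assumption for illustration, I set the Name column as the index; people, age_by_label, and age_by_position are names I made up.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}

# Assumption for illustration: use the names as row labels
people = pd.DataFrame(data).set_index('Name')

# loc looks up by row label and column label
age_by_label = people.loc['Bob', 'Age']

# iloc looks up by row position and column position
age_by_position = people.iloc[1, 0]

print(age_by_label, age_by_position)
```

Both lookups reach the same cell, but loc needs the label 'Bob' while iloc only cares that Bob is in the second row.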

Well, those are a few of the many functions used. Now that you have gone through all these, it will be easier for us to understand what's happening when we use the pandas library for any task we're assigned!

The main reason for writing this article was not to provide an in-depth understanding, but to explain what pandas is, how to use it, and how to get started with it. By now, I sincerely hope that everyone reading up to this point has understood this.

It's a start, and we will be talking about various other topics in this interesting field of AI. Thank you for joining me on this journey, and I look forward to exploring more exciting topics in artificial intelligence with you in the future.

Written by

Mehul Pardeshi

I am an AI and Data Science enthusiast with hands-on experience in machine learning, deep learning, and generative AI. I am an active member of the ML community under Google Developer Clubs, where I regularly lead workshops. I am also passionate about blogging, sharing insights on AI and ML to educate and inspire others. I am certified in generative AI, Python, and machine learning, and I continue to explore innovative applications of AI with my colleagues.