🐼 Pandas for Everyone: From Basic to Advanced

Nitin Kumar

1️⃣ What is Pandas & Why Use It?

Pandas is a Python library for data manipulation and analysis. It introduces two main data structures: Series (a one-dimensional labeled array) and DataFrame (a two-dimensional labeled table).

Why use Pandas?

  • Handles tabular data that fits in memory, from a few rows to millions

  • Performs statistics (mean, sum), grouping, merging

  • Built-in support for dates, missing values, CSV/Excel

  • Fast and intuitive – perfect for class projects


2️⃣ Installation & Import

Install with pip:

pip install pandas

Import in Python:

import pandas as pd
import numpy as np  # often used together

3️⃣ Creating Pandas Objects

Series:

s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)

Output:

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

Here, NaN ("Not a Number") marks missing data. Because NaN is a float, the whole Series is stored as float64.
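To see how the missing value affects everyday operations, here is a minimal sketch using the same Series. The variable names (missing_count, valid_count, mean_value) are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

missing_count = int(s.isna().sum())  # how many entries are NaN
valid_count = int(s.count())         # count() ignores NaN
mean_value = s.mean()                # reductions skip NaN by default

print(missing_count, valid_count, mean_value)
```

The mean is computed over the five valid values only, so the missing entry does not drag the result toward zero.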

DataFrame:

dates = pd.date_range("2023-01-01", periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list("ABCD"))
print(df)

Creates a 6×4 table of random numbers with dates as row labels.
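A DataFrame can also be built from a dictionary of columns, which is often closer to how class data looks. The sketch below uses made-up student records (the names and scores are hypothetical) to show that each column keeps its own data type.

```python
import pandas as pd

# Hypothetical student data, invented for illustration only.
students = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "score": [88, 92, 79],
    "passed": [True, True, False],
})

print(students)
print(students.dtypes)  # one dtype per column
```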


4️⃣ Key Attributes of DataFrame

Let's inspect a DataFrame’s attributes:

print(df.shape)     # (6, 4)
print(df.ndim)      # 2
print(df.size)      # 24
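print(df.index)       # row labels (a DatetimeIndex here)
print(df.columns)     # column labels
print(df.dtypes)      # data type of each column
print(df.values[:2])  # first two rows as a NumPy array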

5️⃣ Viewing Data: .head() & .tail()

print(df.head(3))
print(df.tail(2))

  • head(n): first n rows (n defaults to 5)

  • tail(n): last n rows (n defaults to 5)
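A short sketch of how head() and tail() behave when n is omitted or negative. The ten-row frame and the column name "n" are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"n": np.arange(10)})  # ten rows: 0..9

first_default = df.head()        # no argument: first 5 rows
last_default = df.tail()         # no argument: last 5 rows
all_but_last_two = df.head(-2)   # negative n: everything except the last 2 rows

print(first_default)
print(last_default)
print(all_but_last_two)
```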


6️⃣ Selection & Indexing

print(df['A'])          # column A (as Series)
print(df[['A','B']])    # DataFrame of A & B
print(df.loc[dates[0]]) # by label
print(df.iloc[2])       # by integer position
print(df.at[dates[1], 'B'])  # single value label-based
print(df.iat[2,1])           # single value position-based
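Beyond single rows and cells, the same accessors take slices and boolean masks. The sketch below rebuilds a frame like the one above with a fixed seed (an assumption, so the run is repeatable) and shows one difference worth remembering: label slices with .loc include the end label, while position slices with .iloc exclude the end position.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2023-01-01", periods=6)
rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

positive_a = df[df["A"] > 0]                                  # boolean mask: rows where A > 0
label_slice = df.loc["2023-01-02":"2023-01-04", ["A", "B"]]   # end label is included
pos_slice = df.iloc[1:4, 0:2]                                 # end position is excluded

print(positive_a)
print(label_slice)
print(pos_slice)
```

Both slices select the same three rows and two columns, which makes the inclusive/exclusive difference easy to verify.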

7️⃣ Basic Computations

print(df.mean())          # mean of each column
print(df.mean(axis=1))    # mean of each row
print(df['A'] + df['B'])  # element-wise sum of two columns

Reductions such as mean() skip NaN values by default, while element-wise arithmetic such as df['A'] + df['B'] yields NaN wherever either operand is NaN.
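To make the NaN behavior in computations concrete, here is a minimal sketch on a tiny hand-made frame (the values are invented for illustration). It contrasts the default skipping behavior with skipna=False, and plain + with Series.add and a fill_value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [10.0, 20.0, 30.0]})

col_mean = df["A"].mean()                        # NaN skipped: (1 + 3) / 2
strict_mean = df["A"].mean(skipna=False)         # NaN, because one value is missing
row_sum = df["A"] + df["B"]                      # NaN propagates into the middle row
filled_sum = df["A"].add(df["B"], fill_value=0)  # treat a missing operand as 0

print(col_mean, strict_mean)
print(row_sum)
print(filled_sum)
```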


8️⃣ Handling Missing Data

print(df.isna())
df2 = df.dropna()      # drop any row with NaN
df3 = df.fillna(0)     # fill NaN with 0

Methods like .isna(), .dropna(), and .fillna() help clean data.
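Dropping every row with a NaN, or filling everything with 0, is not always what you want. The sketch below shows a few gentler options on a small invented frame: dropping only fully empty rows, requiring a value in one column, filling with each column's mean, and carrying the last valid value forward.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan],
    "B": [np.nan, np.nan, 6.0, 8.0],
})

all_missing_dropped = df.dropna(how="all")   # drop rows where every column is NaN
a_required = df.dropna(subset=["A"])         # drop rows where A is NaN
mean_filled = df.fillna(df.mean())           # fill each column with its own mean
forward_filled = df.ffill()                  # carry the last valid value forward

print(all_missing_dropped)
print(a_required)
print(mean_filled)
print(forward_filled)
```

Each call returns a new DataFrame and leaves df unchanged.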


9️⃣ Adding, Renaming & Replacing Columns

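df['E'] = df['A'] + df['B']                   # new column built from existing ones
df.rename(columns=str.lower, inplace=True)    # lowercase every column name in place
df.columns = [c.upper() for c in df.columns]  # restore uppercase by assigning a new list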

Easy to manipulate column names and add new data.
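When only some columns need a new name, a dictionary passed to rename is usually clearer. The sketch below also shows assign and drop, which return new DataFrames instead of modifying the original. The frame and the names "alpha" and "total" are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

renamed = df.rename(columns={"A": "alpha"})      # rename one column, keep the rest
with_total = df.assign(total=df["A"] + df["B"])  # add a column without touching df
without_b = with_total.drop(columns=["B"])       # remove a column

print(renamed)
print(with_total)
print(without_b)
```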


🔟 Merging, Concatenating & Reshaping

Concatenating:

merged = pd.concat([df, df], axis=0)  # stack vertically

Reshaping (wide to long with melt):

df_long = df.reset_index().melt(id_vars='index', var_name='col', value_name='val')
print(df_long.head())

Learn more about merge, concat, pivot_table, etc. in the pandas documentation.
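For a first taste of merge and pivot_table, here is a minimal sketch. The two tables (students and scores) and their contents are invented for illustration: merge joins them on a shared key column, and pivot_table turns the long result into one row per student and one column per subject.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only.
students = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]})
scores = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "subject": ["math", "art", "math", "art"],
    "score": [90, 80, 70, 60],
})

# Left join: keep every student, even one without any score rows.
merged = pd.merge(students, scores, on="id", how="left")

# Long to wide: one row per name, one column per subject.
wide = merged.pivot_table(index="name", columns="subject", values="score", aggfunc="mean")

print(merged)
print(wide)
```

With how="left", the student without scores still appears in merged, with NaN in the subject and score columns.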


1️⃣1️⃣ Grouping & Aggregation

df['Group'] = ['X','X','Y','Y','X','Y']
grp = df.groupby('Group').agg({'A':'mean', 'B':'sum'})
print(grp)

groupby splits the rows by the values in Group, and agg then computes one statistic per group: here the mean of A and the sum of B.
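Named aggregation is a handy variant when you want to control the output column names. The sketch below uses fixed, invented numbers so the result can be checked by hand. The output names a_mean, b_sum, and n_rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["X", "X", "Y", "Y", "X", "Y"],
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [10, 20, 30, 40, 50, 60],
})

# Each keyword is an output column: (input column, aggregation).
summary = df.groupby("Group").agg(
    a_mean=("A", "mean"),
    b_sum=("B", "sum"),
    n_rows=("A", "size"),
)

print(summary)
```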


1️⃣2️⃣ Working with Time Series

df_ts = df.copy()
df_ts.index = pd.date_range(start='2023-01-01', periods=len(df))
print(df_ts.loc['2023-01-01':'2023-01-03'])          # label slice, end date included
print(df_ts.resample('D').mean(numeric_only=True))  # skip the text column 'Group'

Slice rows by date range, then resample to summarize per period: 'D' for daily, 'W' for weekly.
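Resampling to the same daily frequency as the data changes nothing, so a weekly example shows the idea better. The sketch below assumes two weeks of invented daily values starting on Monday, 2 January 2023. With the 'W' rule, each bin ends on a Sunday.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-02", periods=14, freq="D")  # two full Monday-to-Sunday weeks
ts = pd.DataFrame({"value": np.arange(14, dtype=float)}, index=idx)

first_three_days = ts.loc["2023-01-02":"2023-01-04"]  # date slice, end included
weekly = ts.resample("W").mean()                      # one row per week, labeled by its Sunday

print(first_three_days)
print(weekly)
```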


1️⃣3️⃣ IO: Reading & Writing Data

df.to_csv('data.csv')
df2 = pd.read_csv('data.csv', index_col=0, parse_dates=True)

Pandas also reads and writes Excel, JSON, SQL, and more through functions such as read_excel, read_json, and read_sql.
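To try the CSV round trip without creating a file, you can write to an in-memory text buffer instead. This is a sketch under that assumption. The small frame is invented, and io.StringIO stands in for 'data.csv'.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"A": [1.5, 2.5], "B": [3, 4]},
    index=pd.date_range("2023-01-01", periods=2),
)

buffer = io.StringIO()  # in-memory stand-in for a file on disk
df.to_csv(buffer)       # the index is written as the first column
buffer.seek(0)          # rewind before reading

# index_col=0 restores the index; parse_dates=True turns it back into dates.
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)

print(restored)
```

Without index_col=0 and parse_dates=True, the dates would come back as an ordinary text column.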


🎓 Tips for Students

  • Use .head() & .tail() often to peek at data

  • Check for missing values (df.isna().sum()) and handle them before analysis

  • Explore .describe() for summary stats

  • Combine methods (e.g., df.groupby('Group').sum()) for powerful pipelines
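As a small illustration of the last two tips, the sketch below runs describe() on one column and then chains groupby, mean, sort_values, and head into a single pipeline. The data and the names stats and top_group are hypothetical.

```python
import pandas as pd

# Hypothetical scores, invented for illustration only.
df = pd.DataFrame({
    "Group": ["X", "X", "Y", "Y"],
    "score": [80.0, 90.0, 60.0, 70.0],
})

stats = df["score"].describe()  # count, mean, std, min, quartiles, max

# Pipeline: average score per group, highest first, keep the top one.
top_group = (
    df.groupby("Group")["score"]
    .mean()
    .sort_values(ascending=False)
    .head(1)
)

print(stats)
print(top_group)
```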


✅ Wrap-Up

Pandas builds on the speed of NumPy and adds rich, labeled data handling, which makes it a good fit for ML, science, and school projects. With this guide, students can confidently explore data like pros 🏆.
