Mastering Basic Operations with Pandas

Srushti

Pandas, an open-source data analysis and manipulation library, provides high-performance, easy-to-use data structures and tools for working with structured data. Whether you're a data scientist, a researcher, or a business analyst, mastering Pandas can significantly enhance your capabilities in handling, cleaning, and analyzing data efficiently.

What is Pandas?

Pandas, created by Wes McKinney in 2008, builds on the strengths of NumPy. The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". It is a Python library for working with data sets: analyzing, cleaning, exploring, and manipulating data. Pandas lets us analyze large data sets and draw conclusions based on statistical theory. It can also clean messy data sets and make them readable and relevant.

Relevant data is very important in data science.

Pandas helps you answer questions about the data, such as:

  • Is there a correlation between two or more columns?

  • What is the average value?

  • Max value?

  • Min value?
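As a minimal sketch, each of the questions above maps to a built-in method. The column names and numbers below are made up purely for illustration:

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration only
df = pd.DataFrame({
    "duration": [60, 45, 30, 60],
    "calories": [400, 300, 200, 420],
})

# Correlation between two columns (Pearson by default)
correlation = df["duration"].corr(df["calories"])

average_calories = df["calories"].mean()  # average value
max_calories = df["calories"].max()       # max value
min_calories = df["calories"].min()       # min value

print(correlation)
print(average_calories, max_calories, min_calories)
```

Calling df.corr() on the whole DataFrame would instead return a table of correlations for every pair of numeric columns.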

There are two main data structures in pandas:

  • Series: 1D. (A Pandas Series is like a column in a table.)

  • DataFrame: 2D. (A table-like structure that contains rows and columns.)
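To see how the two structures relate, here is a small sketch (the variable names are chosen just for this example) showing that selecting one column of a DataFrame gives you a Series:

```python
import pandas as pd

# A small 2D table with two columns
df = pd.DataFrame({"cars": ["BMW", "Volvo", "Ford"], "passings": [3, 7, 2]})

# Selecting a single column returns a 1D Series
column = df["passings"]

print(type(df).__name__)      # DataFrame
print(type(column).__name__)  # Series
print(df.ndim, column.ndim)   # 2 1
```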

Installation of pandas

Install it from a terminal or command prompt using pip:

pip install pandas

Import Pandas

Once Pandas is installed, import it into your application with the import keyword:

import pandas as pd

Now the Pandas package can be referred to as pd instead of pandas.
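A quick way to confirm that the install and the import both worked is to print the library version through the pd alias. This is just a sanity check, and the version you see will depend on your environment:

```python
import pandas as pd

# The version string differs from machine to machine
print(pd.__version__)
```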

Create a DataFrame from a dictionary

import pandas as pd

mydataset = {
  'cars': ["BMW", "Volvo", "Ford"],
  'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
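As a rough sketch of what the dictionary turns into: the keys become column names, and because no index was given, the rows are labeled 0, 1, 2. You can check this on the same data:

```python
import pandas as pd

mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)

print(list(myvar.columns))  # ['cars', 'passings']
print(list(myvar.index))    # [0, 1, 2]
print(myvar.shape)          # (3, 2), i.e. 3 rows and 2 columns
```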

Create a simple Pandas Series from a list:

import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

Create Labels

With the index argument, you can name your own labels.

import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
# Output:
# x    1
# y    7
# z    2
# dtype: int64
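Once a Series has labels, you can use them to look up values. A minimal sketch, with variable names chosen only for this example, comparing the default labels with custom ones:

```python
import pandas as pd

# Without an index argument, the labels are 0, 1, 2
default_labels = pd.Series([1, 7, 2])

# With an index argument, the labels are the names you supply
custom_labels = pd.Series([1, 7, 2], index=["x", "y", "z"])

print(default_labels[0])       # 1
print(custom_labels["y"])      # 7
print(custom_labels.loc["y"])  # 7, the same lookup written with loc
```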

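Pandas uses the loc attribute to return one or more specified row(s)

import pandas as pd
df = pd.DataFrame({'cars': ["BMW", "Volvo", "Ford"], 'passings': [3, 7, 2]})
# use a list of index labels to return rows 0 and 1:
print(df.loc[[0, 1]])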
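One detail worth knowing about loc: a single label returns that row as a Series, while a list of labels returns a DataFrame. A short sketch, reusing the cars data from earlier:

```python
import pandas as pd

df = pd.DataFrame({"cars": ["BMW", "Volvo", "Ford"], "passings": [3, 7, 2]})

row = df.loc[0]        # single label: the row comes back as a Series
rows = df.loc[[0, 1]]  # list of labels: the rows come back as a DataFrame

print(type(row).__name__)   # Series
print(type(rows).__name__)  # DataFrame
print(row["cars"])          # BMW
```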

Load Files Into a DataFrame

1. CSV file

import pandas as pd

df = pd.read_csv('data.csv')
print(df)

Tip: for a large DataFrame, print(df) shows only the first and last rows. Use to_string() to print the entire DataFrame.

import pandas as pd

df = pd.read_csv('data.csv')
print(df.to_string())
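If you don't have a data.csv file at hand, you can try read_csv on an in-memory string instead. The sketch below assumes a tiny made-up CSV and wraps it in io.StringIO so that Pandas treats it like a file:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """cars,passings
BMW,3
Volvo,7
Ford,2
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))  # only the first two rows
print(df.shape)    # (3, 2)
```

The first line of the CSV is used as the column names by default.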

2. JSON file

JSON is plain text, but it is structured like an object, and Pandas can load it directly.

import pandas as pd

df = pd.read_json('data.json')
print(df.to_string())
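The same in-memory trick works for JSON. This sketch assumes the JSON is a list of records (one object per row), which is only one of several layouts read_json can handle. The data is made up for illustration:

```python
import io
import pandas as pd

# Hypothetical JSON content standing in for data.json
json_text = """[
  {"cars": "BMW", "passings": 3},
  {"cars": "Volvo", "passings": 7},
  {"cars": "Ford", "passings": 2}
]"""

# orient="records" tells Pandas that each object in the list is one row
df = pd.read_json(io.StringIO(json_text), orient="records")

print(df.to_string())
```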

Conclusion:

In the world of data analysis with Python, mastering Pandas is a must for anyone dealing with structured data. Its intuitive syntax, powerful functionalities, and seamless integration with other Python libraries make it the go-to choice for data manipulation and analysis tasks. So, dive into the world of Pandas and unlock the true potential of your data analysis skills.
