Mastering Basic Operations with Pandas
Pandas, an open-source data analysis and manipulation library, provides high-performance, easy-to-use data structures and tools for working with structured data. Whether you're a data scientist, a researcher, or a business analyst, mastering Pandas can significantly enhance your capabilities in handling, cleaning, and analyzing data efficiently.
What is Pandas?
Pandas is a Python library for working with data sets. Wes McKinney began developing it in 2008, and it builds upon the strengths of NumPy. The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". Pandas is used for analyzing, cleaning, exploring, and manipulating data. It allows us to analyze large data sets and draw conclusions using statistical methods. Pandas can also clean messy data sets and make them readable and relevant.
Relevant data is very important in data science.
Pandas helps you find answers about your data, such as:
Is there a correlation between two or more columns?
What is the average value?
Max value?
Min value?
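Questions like these can be answered in a few lines. The sketch below is only an illustration: the column names and values are hypothetical sample data, not taken from a real data set.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 70, 78],
})

# Is there a correlation between two columns? (Pearson correlation by default)
correlation = df["hours_studied"].corr(df["exam_score"])

# What is the average, max, and min value of a column?
average_score = df["exam_score"].mean()
highest_score = df["exam_score"].max()
lowest_score = df["exam_score"].min()

print("correlation:", round(correlation, 3))
print("average:", average_score)
print("max:", highest_score)
print("min:", lowest_score)
```

A correlation close to 1 means the two columns tend to rise together, which is the case for this made-up data.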
There are two main data structures in pandas:
Series: 1D (a Pandas Series is like a column in a table).
DataFrame: 2D (a table-like structure that contains rows and columns).
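To make the difference concrete, here is a minimal sketch that builds one of each and checks its dimensions. The city names and temperatures are hypothetical values chosen only for the example.

```python
import pandas as pd

# A Series holds a single column of values (1D)
temperatures = pd.Series([21.5, 19.0, 23.2])

# A DataFrame holds rows and columns (2D)
weather = pd.DataFrame({
    "city": ["Pune", "Oslo", "Lima"],
    "temperature": [21.5, 19.0, 23.2],
})

print(temperatures.ndim)  # number of dimensions of the Series
print(weather.ndim)       # number of dimensions of the DataFrame
print(weather.shape)      # (rows, columns)

# Selecting one column of a DataFrame gives back a Series
one_column = weather["temperature"]
print(type(one_column).__name__)
```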
Installation of pandas
Install it using this command:
C:\Users\Your Name\>pip install pandas
Import Pandas
Once Pandas is installed, import it in your applications by adding the import keyword:
import pandas as pd
Now the Pandas package can be referred to as pd instead of pandas.
Create a dataframe from a dictionary
import pandas as pd
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
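When a DataFrame is built from a dictionary, the keys become the column names and each list becomes a column. The short sketch below reuses the same dictionary and checks this; the variable names df, column_names, row_count, and cars are chosen just for this illustration.

```python
import pandas as pd

mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}
df = pd.DataFrame(mydataset)

# Dictionary keys become column names
column_names = list(df.columns)

# Each list item becomes one row
row_count = len(df)

# A single column can be pulled back out as a plain list
cars = df["cars"].tolist()

print(column_names)
print(row_count)
print(cars)
```

Note that all lists in the dictionary should have the same length, otherwise pandas raises an error when building the DataFrame.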
Create a simple Pandas Series from a list:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
Create Labels
With the index argument, you can name your own labels.
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar) # output
#x 1
#y 7
#z 2
#dtype: int64
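Once labels are set, you can use them to look up values. Here is a small sketch using the same Series; the names single_value and subset are only illustrative.

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])

# Access one item by its label
single_value = myvar["y"]

# Access several items by passing a list of labels
subset = myvar[["x", "z"]]

print(single_value)
print(subset.tolist())
```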
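Pandas uses the loc attribute to return one or more specified row(s)
import pandas as pd
df = pd.DataFrame({'cars': ["BMW", "Volvo", "Ford"], 'passings': [3, 7, 2]})
#use a list of indexes to return rows 0 and 1:
print(df.loc[[0, 1]])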
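The loc attribute also works with named labels, and the result type depends on what you pass in. The sketch below assumes a small hypothetical DataFrame with made-up calories and duration values and custom row labels.

```python
import pandas as pd

# Hypothetical data with custom row labels, invented for this example
data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# A single label returns one row as a Series
one_row = df.loc["day2"]

# A list of labels returns the matching rows as a DataFrame
two_rows = df.loc[["day1", "day3"]]

print(one_row)
print(two_rows)
```

So a single label gives back a Series, while a list of labels (even a list with one item) gives back a DataFrame.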
Load Files Into a DataFrame
1. CSV file
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Tip: by default, printing a large DataFrame shows only the first and last rows. Use to_string() to print the entire DataFrame.
import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
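If you don't have a data.csv file at hand, you can try read_csv on in-memory text instead. This is only a sketch: the CSV content below is hypothetical, and io.StringIO is used to make the text behave like a file.

```python
import io
import pandas as pd

# Hypothetical CSV content, standing in for a real file on disk
csv_text = """name,score
Ana,90
Ben,85
Cleo,77
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.to_string())
print(df.shape)  # (rows, columns)
```

The first line of the CSV text is used as the column names, and every following line becomes one row.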
2. JSON file
JSON is plain text, but it is structured in the format of an object, with keys and values.
import pandas as pd
df = pd.read_json('data.json')
print(df.to_string())
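The same idea works for JSON without needing a data.json file. In this sketch the JSON text is hypothetical and is written as a list of records (one object per row), which is why orient="records" is passed.

```python
import io
import pandas as pd

# Hypothetical JSON content: a list of objects, one per row
json_text = """[
  {"cars": "BMW", "passings": 3},
  {"cars": "Volvo", "passings": 7},
  {"cars": "Ford", "passings": 2}
]"""

# Wrap the text in StringIO so read_json treats it like a file
df = pd.read_json(io.StringIO(json_text), orient="records")

print(df.to_string())
```

Each key in the JSON objects becomes a column, just like the keys of a Python dictionary.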
Conclusion:
In the world of data analysis with Python, mastering Pandas is a must for anyone dealing with structured data. Its intuitive syntax, powerful functionalities, and seamless integration with other Python libraries make it the go-to choice for data manipulation and analysis tasks. So, dive into the world of Pandas and unlock the true potential of your data analysis skills.
Written by Srushti