Pandas cheatsheet


Basic Pandas Commands for Data Loading, Exploration, and Manipulation

  1. import pandas as pd # Import the pandas library into your notebook or script

  2. df = pd.read_csv('filename.csv') or pd.read_csv(r'location/folder/filename.csv') or pd.read_excel('filename.xlsx') # Load data from a CSV or Excel file into a DataFrame

  3. df.head() or df.head(n) # Display the first 5 rows (by default) or the first n rows of the DataFrame

  4. df.info() #Get a concise summary of the DataFrame

  5. df.describe() #Generate descriptive statistics of the DataFrame

  6. df['column_name'] or df.column_name or df['column_name'][index_number] # Access a specific column, or a single value in that column by row index

  7. df[['col1', 'col2']] #Access multiple columns

  8. df.loc[row_label] or df.loc[:, 'column_name'] or df.loc[0, 'column_name'] # Access a row by label, every row of a named column, or the single value where row label 0 meets 'column_name' (loc is label-based)

  9. df.iloc[row_index] or df.iloc[[0,1,2], 0] or df.iloc[-5:] # Access a row by integer position, the 1st-3rd rows of the 1st column, or the last 5 rows with all columns
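Putting the loading and selection commands above together, here is a minimal sketch of an exploration session. The file name 'sales.csv' and the columns 'price' and 'city' are illustrative assumptions, not part of the cheatsheet itself.

import pandas as pd

df = pd.read_csv('sales.csv')      # load a hypothetical CSV into a DataFrame

df.head()                          # first 5 rows
df.info()                          # column dtypes and non-null counts
df.describe()                      # descriptive statistics for numeric columns

df['price']                        # one column as a Series
df[['price', 'city']]              # several columns as a DataFrame
df.loc[0, 'price']                 # value at row label 0, column 'price'
df.iloc[-5:]                       # last 5 rows by integer position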

Choosing between loc and iloc

  • iloc uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. So 0:10 will select entries 0,...,9.

  • loc, meanwhile, indexes inclusively. So 0:10 will select entries 0,...,10. This is because loc indexes by label, and labels can be of any type, such as strings or integers.
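A small sketch of the difference, using a Series with an integer index purely for illustration:

import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd', 'e'], index=[0, 1, 2, 3, 4])

s.iloc[0:3]   # positions 0, 1, 2 -> 'a', 'b', 'c' (end excluded)
s.loc[0:3]    # labels 0 through 3 -> 'a', 'b', 'c', 'd' (end included)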

  1. df.isnull().sum() # Count missing values in each column of the DataFrame

  2. df.set_index('Title') # Useful when the dataset has a column that makes a better index than the current one

  3. df.dropna() #Remove rows with missing values

  4. df.fillna(value) #Fill missing values with a specific value

  5. df['column'] = df['column'].astype('int') # Convert column data type

  6. df.groupby('column').mean() # Group by column and calculate the mean

  7. df.sort_values('column') # Sort DataFrame by a specific column

  8. df.merge(other_df, on='key') # Merge DataFrames on a key column

  9. pd.concat([df1, df2]) # Concatenate DataFrames

  10. df['date'] = pd.to_datetime(df['date']) # Convert column to datetime

  11. df.resample('M').mean() # Resample time-series data to monthly frequency (requires a datetime index)

  12. df.shape # Returns a tuple: (number of rows, number of columns)

  13. df.duplicated() or df.duplicated().sum() # Flag duplicate rows, or count how many there are

  14. df.drop_duplicates() #Drops/deletes duplicate rows
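As a rough end-to-end sketch of the cleaning and aggregation commands above; 'sales.csv' and the columns 'city', 'price', and 'date' are assumed, illustrative names.

import pandas as pd

df = pd.read_csv('sales.csv')                 # hypothetical file

df.isnull().sum()                             # missing values per column
df = df.dropna(subset=['price'])              # drop rows missing 'price'
df['price'] = df['price'].astype('float')     # enforce a numeric dtype
df = df.drop_duplicates()                     # remove duplicate rows

df.groupby('city')['price'].mean()            # average price per city
df = df.sort_values('price')                  # sort by price

df['date'] = pd.to_datetime(df['date'])       # parse dates
monthly = df.set_index('date').resample('M').mean(numeric_only=True)   # monthly means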

CONDITIONAL SELECTION

  1. df.loc[df.Country == 'Italy'] # df.Country == 'Italy' produces a Series of True/False booleans based on the country of each record; passing that result to loc selects the matching rows

  2. df.loc[(df.Country == 'Italy') & (df.points >= 90)] # combine conditions with & (and) and | (or)
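A small sketch of boolean masking with loc; the tiny DataFrame below just mirrors the 'Country' and 'points' columns used in the examples.

import pandas as pd

df = pd.DataFrame({'Country': ['Italy', 'France', 'Italy'],
                   'points': [88, 92, 95]})

mask = df.Country == 'Italy'                           # Series of True/False
df.loc[mask]                                           # rows where the mask is True
df.loc[(df.Country == 'Italy') & (df.points >= 90)]    # combine conditions with & / |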

CONDITIONAL SELECTORS

  1. isin() # isin lets you select data whose value "is in" a list of values, e.g. df.loc[df.Country.isin(['Italy', 'France'])] # select wines only from Italy or France

  2. isnull(), notnull() # These methods let you highlight values which are (or are not) empty (NaN), e.g. df.loc[df.price.notnull()]
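A short sketch of isin and notnull on an illustrative frame (the None entry becomes NaN):

import pandas as pd

df = pd.DataFrame({'Country': ['Italy', 'France', 'Spain'],
                   'price': [20.0, None, 15.0]})

df.loc[df.Country.isin(['Italy', 'France'])]   # rows whose Country is in the list
df.loc[df.price.notnull()]                     # rows where price is not NaN
df.loc[df.price.isnull()]                      # rows where price is missing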

ASSIGNING DATA

  1. df['Country'] = 'India' # Every value in the column is changed to 'India'
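Besides a constant, you can assign any iterable of the right length; a small illustrative sketch (the 'rank' column is made up):

import pandas as pd

df = pd.DataFrame({'Country': ['Italy', 'France', 'Spain']})

df['Country'] = 'India'               # every row gets the constant 'India'
df['rank'] = range(len(df), 0, -1)    # or assign an iterable of matching length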

SUMMARY FUNCTIONS

  1. df.country.describe() # Generates a high-level summary of the given column: count, mean, std and quartiles for numerical data, or count, unique values and the most frequent value for string data

  2. df.column_name.mean() # Mean of a numerical column

  3. df.col_name.unique() #To see a list of unique values we can use the unique() function

  4. df.col_name.value_counts() # To see a list of unique values and how often they occur in the dataset, use the value_counts() method
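A quick sketch of the summary functions; the 'country' and 'points' columns are illustrative.

import pandas as pd

df = pd.DataFrame({'country': ['Italy', 'France', 'Italy', 'Spain'],
                   'points': [88, 92, 95, 90]})

df.points.describe()       # count, mean, std, min, quartiles, max (numeric column)
df.country.describe()      # count, unique, top, freq (string column)
df.points.mean()           # average of the column
df.country.unique()        # array of distinct values
df.country.value_counts()  # distinct values and how often each occurs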

