Kickstart Your Machine Learning Journey: A Summary of Kaggle's Intro Course


I recently completed the "Intro to Machine Learning" course on Kaggle. I believe it's an excellent starting point for anyone interested in machine learning, particularly with Python libraries such as pandas and scikit-learn. In this article and the ones that follow, I aim to summarize the most important things I learned during the course.
In general, our goal in machine learning is to create a model that can predict specific values based on data.
The first important step in data analysis is to understand the data you are working with.
import pandas as pd

titanic_file_path = './data/Titanic.csv'
# read the CSV file and store it in a DataFrame named titanic_data
titanic_data: pd.DataFrame = pd.read_csv(titanic_file_path)
# print summary statistics for the numeric columns of titanic_data
print(titanic_data.describe())
# print the first 20 rows of titanic_data
print(titanic_data.head(20))
In this example I am using Titanic passenger data downloaded from https://www.kaggle.com/datasets
Here is the script output of the describe method.
It's important to understand what each row of that output means.
count - the number of non-missing values in each column. Comparing this row with the total number of rows quickly shows how many values each column is missing
mean - the average value, e.g. for PassengerId it is about 500, since the values run up to a maximum of 1000
std - the standard deviation, which measures how spread out the values are
min - the smallest value in the column
25% - the 25th percentile, a value that is larger than 25% of the values and smaller than 75% of the values
50% - the 50th percentile (the median), a value that is larger than 50% of the values and smaller than 50% of the values
75% - the 75th percentile, a value that is larger than 75% of the values and smaller than 25% of the values
max - the largest value in the column
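To make these rows concrete, here is a minimal sketch on a tiny, made-up DataFrame. The values and the column names Age and Fare are assumptions chosen for illustration, they are not taken from the Titanic file. It shows how the count row reveals missing values and that the 50% row matches the median.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not from the Titanic file)
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, None],   # one missing value
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

summary = toy.describe()
print(summary)

# count only includes non-missing values, so the gap between the
# number of rows and count is the number of missing entries
missing_age = len(toy) - int(summary.loc["count", "Age"])
print("rows missing Age:", missing_age)

# the 50% row is the median of the column
print("median Fare:", summary.loc["50%", "Fare"], toy["Fare"].median())
```

Note that pandas computes std as the sample standard deviation (dividing by n - 1), so it can differ slightly from a population standard deviation calculated by hand.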
It's also beneficial to use the head method, which lets us specify how many rows from the top of the DataFrame we want to see in the output. This is handy for directly inspecting the data within our data frame, helping us spot null values or data that may require format changes, such as dates.
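As a rough sketch of that inspection step, the snippet below uses a small made-up DataFrame. The column names, including the Boarded date column, are hypothetical and only stand in for whatever your own file contains. It combines head with isnull and dtypes to find missing values and a date column stored as text.

```python
import pandas as pd

# Hypothetical toy rows for illustration only (not from the Titanic file)
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, None, 26.0, None],
    "Boarded": ["1912-04-10", "1912-04-10", "1912-04-11", "1912-04-11"],
})

# look at the first 3 rows only
first_rows = toy.head(3)
print(first_rows)

# count the missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# check column types: Boarded is stored as text, not as a date
print(toy.dtypes)

# convert the text column to a proper datetime column
toy["Boarded"] = pd.to_datetime(toy["Boarded"])
print(toy.dtypes)
```

Calling head without an argument returns the first 5 rows, which is often enough for a first look.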
Written by Jakub Sokolowski
