Getting Started With Machine Learning: What You Need to Know
What is Machine Learning?
Machine learning can be compared to a child learning from daily observations, from their own mistakes, and from the feedback they receive from their parents. Just as a child learns from experience, a computer is trained on data.
Machine learning is programming computers to optimize a performance criterion using example data or past experience. It can automatically detect patterns in data and then use those patterns to predict future data or other outcomes of interest.
Rooted in statistics, machine learning essentially tries to find the relationship between input and output variables.
For example, a real estate agent who wants to price a particular property will have:
Output variable: Price of property (Y)
Input variables: Area covered (X1), number of bedrooms (X2), proximity to a landmark (X3), proximity to a market (X4), recent sale price of a neighboring property (X5), and so on
The real estate agent wants to find Y = f(X1, X2, X3, X4, X5…), so that whenever they plug values of the input variables into this function, they get the price of the property.
Here f defines the relationship between the dependent variable (Y) and the independent variables (X1, X2, and so on).
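To make the idea of learning f from data concrete, here is a minimal sketch that fits a linear f with NumPy's least-squares solver. Everything in it is an assumption made for illustration: the five properties, the pricing rule used to generate their prices, and the choice of a linear model with only two inputs (area and bedrooms). Real pricing data would be noisy and would rarely be this simple.

```python
import numpy as np

# Hypothetical training data: each row is one sold property.
# Columns: area in square feet (X1), number of bedrooms (X2).
X = np.array([
    [1000.0, 2.0],
    [1500.0, 3.0],
    [2000.0, 3.0],
    [2500.0, 4.0],
    [3000.0, 5.0],
])

# Prices generated from a made-up rule so the result is easy to check:
# price = 100 * area + 20000 * bedrooms + 50000
y = 100.0 * X[:, 0] + 20000.0 * X[:, 1] + 50000.0

# Add a column of ones so the fit can also learn the constant term.
X_design = np.column_stack([X, np.ones(len(X))])

# Least-squares fit: find the weights that minimise the squared error
# between X_design @ weights and the observed prices y.
weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)


def f(area, bedrooms):
    """Estimated price of a property, using the fitted weights."""
    return weights[0] * area + weights[1] * bedrooms + weights[2]


estimate = f(1800.0, 3.0)
print("Estimated price:", round(estimate))
```

Because the prices here follow the made-up rule exactly, the fitted weights should land very close to 100, 20000 and 50000, and the estimate for an 1800 square foot, 3-bedroom property should come out at roughly 290000.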
Supervised vs Unsupervised learning
Supervised learning is how you learned the alphabet or the names of fruits in childhood: you have input variables (X = some fruit) and an output variable (Y = the name of that fruit), and you use an algorithm to learn the mapping function from the input to the output. In this kind of learning we provide labelled data to train our model.
Unsupervised learning is how you learned to differentiate humans from animals: you only have input data (X) and no corresponding output variables, so the data is not labelled. The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about it.
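The difference is easier to see side by side. The sketch below uses the same made-up fruit measurements twice: once with labels (a simple nearest-centroid classifier) and once without labels (a bare-bones k-means grouping). The data, the feature choice, and both algorithms are assumptions picked for brevity; they stand in for the many supervised and unsupervised methods that exist.

```python
import numpy as np

# Hypothetical measurements: [weight in grams, redness score from 0 to 1].
X = np.array([
    [150.0, 0.90],
    [160.0, 0.80],
    [155.0, 0.85],
    [120.0, 0.20],
    [115.0, 0.10],
    [125.0, 0.15],
])

# --- Supervised: every example comes with a label (the fruit name). ---
labels = np.array(["apple", "apple", "apple", "banana", "banana", "banana"])
class_names = np.unique(labels)
# One centroid (average feature vector) per labelled class.
centroids = np.array([X[labels == name].mean(axis=0) for name in class_names])


def predict_fruit(features):
    """Return the label whose class centroid is closest to the input."""
    distances = np.linalg.norm(centroids - features, axis=1)
    return str(class_names[np.argmin(distances)])


print(predict_fruit(np.array([158.0, 0.95])))

# --- Unsupervised: same inputs, no labels; just look for 2 groups. ---
centers = X[[0, -1]].copy()  # start from the first and last points
for _ in range(10):
    # Distance from every point to every center, then pick the nearest.
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    clusters = distances.argmin(axis=1)
    # Move each center to the mean of the points assigned to it.
    centers = np.array([X[clusters == k].mean(axis=0) for k in range(2)])

print(clusters)
```

The supervised part can name the fruit because it was shown names during training. The unsupervised part only returns group numbers; it discovers that there are two groups but has no way of knowing what they are called.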
We will look at different ML models for predicting an outcome from a given input in the next posts of this series. For now, let's cover some prerequisites for ML.
Prerequisites:
Statistics
Before diving into the world of machine learning, you should know some basic concepts of statistics. I recommend quickly going through the basics discussed below.
Statistics has two major branches: descriptive statistics and inferential statistics. Descriptive statistics consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures. It covers "measures of central tendency", which include the mean, median, and mode of sample data, and "measures of dispersion", i.e. range, variance, and standard deviation. Inferential statistics consists of methods that use sample results to help make decisions or predictions about a population. It includes hypothesis testing, ANOVA, the chi-squared test, and regression.
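The descriptive measures above can be computed in a few lines. This is a small sketch on a made-up sample; the numbers are chosen only so the results are easy to verify by hand.

```python
import statistics

import numpy as np

# A small made-up sample.
sample = np.array([2, 4, 4, 5, 7, 8])

# Measures of central tendency
mean = sample.mean()                     # sum of values / number of values
median = np.median(sample)               # middle value of the sorted sample
mode = statistics.mode(sample.tolist())  # most frequent value

# Measures of dispersion
value_range = sample.max() - sample.min()
variance = sample.var()                  # population variance (divides by n)
std_dev = sample.std()                   # square root of the variance
sample_variance = sample.var(ddof=1)     # sample variance (divides by n - 1)

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
print("sample variance:", sample_variance)
```

Note that NumPy's `var` and `std` divide by n by default; pass `ddof=1` when you want the sample versions that divide by n - 1.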
These concepts will help you understand ML models and the logic behind them. Before moving further, let's talk about the types of data you will see on your ML journey.
Types of data
When working with a dataset for training ML models, you will find two types of data: quantitative and qualitative (categorical).
Data that can be measured numerically is called quantitative data. There are two types of quantitative variables:
Discrete Variables - A variable whose values are countable is called a discrete variable. In other words, a discrete variable can assume only certain values with no intermediate values. Example: the number of heads in 10 coin tosses.
Continuous Variables - A variable that can assume any numerical value over a certain interval or intervals is called a continuous variable. Example: the height of a person.
A variable that cannot assume a numerical value but can be classified into two or more non-numeric categories is called a qualitative or categorical variable. There are two types of qualitative variables:
Nominal Variables - The values are not ordered. Example: Nationality, Gender etc.
Ordinal Variables - The values are ordered or ranked. Example: Satisfaction score (Not satisfied, Satisfied, Delighted), Spiciness of food (Less spicy, Mild, Hot)
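As a rough illustration of how these four kinds of variables can be represented in practice, the sketch below builds a small pandas DataFrame. The column names and values are invented for this example; the point is only that nominal data becomes an unordered category while ordinal data becomes an ordered one.

```python
import pandas as pd

# Made-up records covering the four kinds of variables.
df = pd.DataFrame({
    "heads_in_10_tosses": [4, 6, 5],                              # discrete
    "height_cm": [162.5, 175.0, 180.3],                           # continuous
    "nationality": ["Indian", "Japanese", "Brazilian"],           # nominal
    "satisfaction": ["Satisfied", "Not satisfied", "Delighted"],  # ordinal
})

# Nominal: categories with no inherent order.
df["nationality"] = df["nationality"].astype("category")

# Ordinal: categories with an explicit order, lowest to highest.
df["satisfaction"] = pd.Categorical(
    df["satisfaction"],
    categories=["Not satisfied", "Satisfied", "Delighted"],
    ordered=True,
)

print(df.dtypes)

# Ordered categories support comparisons and min/max.
satisfaction_codes = df["satisfaction"].cat.codes.tolist()
highest = df["satisfaction"].max()
print(satisfaction_codes, highest)
```

Telling pandas the order of an ordinal variable matters later on: sorting, comparisons, and encoding for a model can then respect the ranking instead of falling back to alphabetical order.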
This knowledge will help you understand and classify your data accordingly. Now let's take a look at the Python libraries you need to know for ML.
Python Libraries
For data processing, data visualization, and working with various ML models, you need a working knowledge of the following Python libraries.
NumPy: NumPy is a fundamental library for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on them efficiently.
Here's how we import NumPy and use it to create an array:
import numpy as np

# Create a NumPy array
arr = np.array([1, 2, 3, 4, 5])
print("NumPy Array:", arr)

# Output:
# NumPy Array: [1 2 3 4 5]
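The description of NumPy above also mentions multi-dimensional arrays and mathematical functions. A short sketch of what that looks like, using a small matrix invented for this example:

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

shape = matrix.shape              # (rows, columns)
doubled = matrix * 2              # element-wise arithmetic, no loop needed
column_sums = matrix.sum(axis=0)  # sum down each column
overall_mean = matrix.mean()      # mean of all six values
product = matrix @ matrix.T       # matrix multiplication with the transpose

print("shape:", shape)
print("doubled:\n", doubled)
print("column sums:", column_sums)
print("overall mean:", overall_mean)
print("matrix @ matrix.T:\n", product)
```

Operations like `matrix * 2` apply to every element at once, which is a large part of why NumPy code is both shorter and faster than the equivalent Python loops.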
Pandas: We mostly use pandas for data manipulation and analysis. It offers data structures like Series and DataFrame that make it easy to handle and analyze structured data, perform operations like merging, filtering, and grouping, and quickly gain insights.
import pandas as pd

# Create a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 28],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}
df = pd.DataFrame(data)
print(df)

# Read data from a CSV file
df = pd.read_csv('path/filename.csv')
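The filtering, grouping, and merging operations mentioned above could look like the following sketch. The extra person ("Dana") and the `city_info` table are assumptions added only so that each operation has something visible to do.

```python
import pandas as pd

# Small made-up table of people.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 28, 35],
    "City": ["New York", "Los Angeles", "Chicago", "New York"],
})

# Filtering: keep only the rows where Age is greater than 26.
older = people[people["Age"] > 26]

# Grouping: average age per city.
mean_age_by_city = people.groupby("City")["Age"].mean()

# Merging: attach extra columns from a second (hypothetical) table.
city_info = pd.DataFrame({
    "City": ["New York", "Los Angeles", "Chicago"],
    "State": ["New York", "California", "Illinois"],
})
merged = people.merge(city_info, on="City", how="left")

print(older)
print(mean_age_by_city)
print(merged)
```

A left merge keeps every row of `people` and fills in the matching `State`, which is usually what you want when enriching a main table with lookup data.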
Seaborn: Seaborn is a Python data visualization library built on top of Matplotlib, designed for creating attractive and informative statistical graphics. It simplifies complex visualizations such as heatmaps, violin plots, and categorical plots, which makes data analysis more intuitive.
import seaborn as sns
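As one example of what Seaborn can do, here is a minimal sketch of a correlation heatmap. The property data is made up for illustration, and the output file name `correlation_heatmap.png` is just a placeholder; in a notebook you would typically call `plt.show()` instead of saving.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up property data, for illustration only.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "bedrooms": [2, 3, 3, 4, 5],
    "price": [190000, 260000, 310000, 380000, 450000],
})

# Pairwise correlation between the numeric columns.
corr = df.corr()

# Draw the correlation matrix as an annotated heatmap.
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation between property features")
fig.tight_layout()

# Save the figure to a file (placeholder name).
fig.savefig("correlation_heatmap.png")
```

A heatmap like this is a quick way to spot which input variables move together with the output variable before you train any model.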
Understanding the basics of machine learning and its prerequisites, such as fundamental concepts and essential statistics, is crucial for building a solid foundation in this field. Tools like NumPy, pandas, and Seaborn will be invaluable as you dive deeper into data analysis and model building.
As we continue this journey into the world of machine learning, these skills and tools will become even more essential. Stay tuned for our next post, where we'll explore the core algorithms of machine learning and how to apply them.
The Python code for everything discussed in this series is available in the accompanying GitHub repository.
Happy learning!
Written by
Omkar Kasture
MERN Stack Developer, Machine learning & Deep Learning