Building My First Image Classifier with fast.ai

Keerthi K V

I built an image classifier in under 15 minutes without any knowledge of the internal architecture. Trust me, this is so cool.

I have been going through the fast.ai deep learning course for a few days now. I binge-watched all of the part 1 videos, ran a few notebooks from the chapters, and tried a few things along the way; I wanted to get the big picture first, so I kept going. Now I am deep diving into each chapter and learning the concepts in depth. In this article I am going to explain the things I learned from chapter 1, and also share a simple variation of the “is it a bird” classifier project.

What is a model?

Typically, we write computer programs by giving step-by-step instructions about what action to perform. For example, to add two numbers, we write code that takes two numbers as input, adds them, and outputs the result. The bigger the problem, the bigger the code.
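
To make that concrete, the addition example looks like this in Python; every step of the rule is spelled out by us:

```python
# Traditional programming: we write the rule ourselves, step by step.
def add(a, b):
    return a + b

print(add(2, 3))  # -> 5
```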

Now assume we want to write a program which takes an image as input and tells us whether the image has a bird in it. How can we do this? Programming such a thing using traditional methods would require a huge amount of work and consume a lot of time. Even then, I am not sure we would get good results.

What if we had a way to just show the computer some images of birds and some images that don’t have a bird, and let it figure out how to tell us whether there is a bird in an image? Isn’t this how humans learn to recognize things? We see things and remember them the next time; of course, humans don’t need to see 100 images of birds to say something is a bird. But could computers really learn this way? Turns out, yes. A computer scientist named Arthur Samuel came up with such an idea, and it is the basis of our modern machine learning advancements.

So unlike for a program, for a model we provide some labelled data and ask it to learn from the data. This is called training the model. It’s important that the data is labelled, so the computer knows what each example actually is. Once we have trained the model, it can be used just like a program: pass the input and get the output.
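
To make the contrast with ordinary programming concrete, here is a toy sketch in plain Python; the “training” here is entirely made up for illustration, but the workflow is the point: the rule is derived from labelled examples instead of written by hand, and the result is then called like any other function.

```python
# Toy illustration: "training" derives a rule from labelled data
# instead of us writing the rule by hand. (A real model is far more
# sophisticated; this only shows the workflow.)

def train(examples):
    """examples: list of (number, label) pairs, label is 'big' or 'small'."""
    bigs = [x for x, label in examples if label == "big"]
    smalls = [x for x, label in examples if label == "small"]
    # "Learn" a threshold halfway between the two groups.
    return (min(bigs) + max(smalls)) / 2

def model(threshold, x):
    # Once trained, the model is used just like a program: input in, output out.
    return "big" if x > threshold else "small"

labelled_data = [(1, "small"), (2, "small"), (10, "big"), (12, "big")]
threshold = train(labelled_data)
print(model(threshold, 7))  # -> 'big' (7 is above the learned threshold 6.0)
```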

Thought process behind models

Mathematically speaking, models are things that fit a function to data. What do I mean by fitting a function to data? If we look at the world around us, we can see that most of it has some kind of pattern in it: the stock market, the weather, human psychology, and even crime have patterns. So if we were to collect data about anything and plot it on a graph, some kind of pattern should be visible. Since real-world data has multiple dimensions, it may be difficult to actually plot and visualize it on a graph, but I hope you get the point. Now, a mathematical function can be thought of as a rule which establishes a relation between input and output. For example, you might have learnt about the function f(x) = x², which plots as a curve on a 2D graph. This function has only one variable, x, and the value of f(x) is the square of x. If there was some data which, upon plotting on a 2D graph, created a pattern resembling the graph of f(x) = x², then we could just use this function to predict future data points.
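
Here is a minimal sketch of that idea using NumPy (the data is toy data I generated, not from the course): we create points that follow an x² pattern plus noise, fit a quadratic function to them, and then use the fitted function to predict a new point.

```python
# A minimal sketch: if noisy data follows an x^2-like pattern, fitting a
# quadratic function lets us predict new points. (Toy data made up here.)
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 50)
y = x**2 + rng.normal(0, 1, size=x.shape)  # pattern + noise

coeffs = np.polyfit(x, y, deg=2)  # fit f(x) = a*x^2 + b*x + c
f = np.poly1d(coeffs)

print(f(6))  # predict a point outside the data we saw; should be near 36
```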

But real-world data has many more variables and complex patterns, so how do we find the perfect function whose graph follows the data’s pattern? To do this, we use a very flexible function which can have any number of parameters, meaning it can represent almost any kind of pattern, and we fit this flexible function to mimic the data pattern.
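
A rough sketch of what “fitting a very flexible function” looks like, using PyTorch (the library fast.ai is built on); the tiny network and the sine-wave toy data are my own choices for illustration:

```python
# A small neural network is a flexible function with many parameters;
# training nudges those parameters until the function mimics the data.
import torch
import torch.nn as nn

# Toy data: a pattern (sine wave) plus noise.
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)

# The flexible function: ~100 parameters that can bend into many shapes.
model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(2000):
    loss = nn.functional.mse_loss(model(x), y)  # how far off are we?
    opt.zero_grad()
    loss.backward()  # work out how to nudge each parameter
    opt.step()       # nudge the parameters to reduce the loss

print(loss.item())  # ends up close to the noise level
```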

Key terminologies

  1. Overfitting: When the model learns the training data too well, it almost memorizes the data; even the noise and fluctuations are memorized. This seems good during training, since the model knows the data, but when the model is used in a real-world scenario it fails. This is called overfitting, and we should be careful not to overfit the function.

  2. Loss: The quantified difference between the model’s predicted value and the actual value; we use this loss to make the function fit better. Ultimately our goal is to reduce the loss, which means the difference between predicted and actual values shrinks, indicating the model is getting better!

  3. Epoch: A number that we set which tells the model how many times each data point should be viewed. In the case of image models, n epochs means the model views each image n times.

  4. Validation set: While training the model, we keep a certain portion of the data hidden from it; we call this the validation set. After each epoch we validate the model’s performance using this data. Since this data is kept hidden, we get an idea of how the model reacts to new data.

  5. Metric: A measure of the quality of the model’s predictions, calculated by testing the model on the validation set at the end of each epoch. E.g. accuracy, error rate.

  6. Transfer learning: For tasks like image recognition, we can use a pretrained model and fine-tune it to work for our use case. This way of adapting an already existing model for our purpose is known as transfer learning. (All of these terms show up in the short code sketch below.)
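
Most of these terms map directly onto a few lines of fast.ai code. A minimal sketch, assuming fastai is installed and `path` is a placeholder pointing at a folder with one sub-folder of images per class:

```python
# A sketch mapping the terminology above onto fast.ai code.
# `path` is a placeholder: a folder with one sub-folder of images per class.
from fastai.vision.all import *

dls = ImageDataLoaders.from_folder(
    path,
    valid_pct=0.2,          # validation set: 20% of the data is held out
    item_tfms=Resize(224),
)

# Transfer learning: start from a pretrained resnet18 instead of from scratch.
learn = vision_learner(dls, resnet18, metrics=error_rate)  # metric: error rate

learn.fine_tune(3)  # 3 epochs: the model sees each training image three times
# After every epoch, the loss and error_rate on the validation set are printed.
```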

Hands-on work

fast.ai has a lot of very simple APIs which make building simple models as easy as pie. Here are links to notebooks which use fast.ai to build image classifiers (a short usage sketch follows the list).

  1. https://www.kaggle.com/code/jhoward/is-it-a-bird-creating-a-model-from-your-own-data: This is the original notebook from Jeremy Howard, which shows how to build an “is it a bird” model.

  2. https://www.kaggle.com/code/neuralkeerthi/what-is-it: This is my simple variation of the above notebook to classify multiple categories.
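
And once trained, the model really is used just like a program. A small sketch, assuming you exported the trained learner with `learn.export('model.pkl')` in one of those notebooks; the file names are placeholders:

```python
# Using a trained fast.ai model like an ordinary program: input in, output out.
# 'model.pkl' and 'some_photo.jpg' are placeholder names.
from fastai.vision.all import *

learn = load_learner("model.pkl")
pred, pred_idx, probs = learn.predict(PILImage.create("some_photo.jpg"))
print(f"Prediction: {pred} (probability {probs[pred_idx]:.4f})")
```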

References:

https://course.fast.ai/Lessons/lesson1.html

https://github.com/fastai/fastbook/blob/master/01_intro.ipynb
