#87 Machine Learning & Data Science Challenge 87

What do you understand by TF-IDF?

TF-IDF:

It stands for term frequency-inverse document frequency.

TF-IDF weight:

  • It is a statistical measure used to evaluate how important a word is to a document in a collection or corpus.

  • The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus.

  1. Term Frequency (TF):
  • It scores how often the word occurs in the current document.

  • Since documents differ in length, a term may appear many more times in a long document than in a short one. The term frequency is therefore often divided by the document length (the total number of terms in the document) to normalize it.
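
A minimal sketch of the length-normalized term frequency described above, assuming the document has already been split into tokens. The function name `term_frequency` and the sample sentence are illustrative, not part of the original post.

```python
from collections import Counter

def term_frequency(tokens):
    """Return {term: count / document_length} for one tokenized document."""
    total = len(tokens)
    if total == 0:
        return {}  # an empty document has no term frequencies
    counts = Counter(tokens)
    # Divide each raw count by the document length to normalize it
    return {term: count / total for term, count in counts.items()}

# Hypothetical example document, tokenized on whitespace
doc = "the cat sat on the mat".split()
tf = term_frequency(doc)
print(tf)
```

Here "the" occurs twice among six tokens, so its TF is 2/6, while every other word has a TF of 1/6.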

  2. Inverse Document Frequency (IDF):
  • It scores how rare the word is across the documents in the corpus: the rarer the term, the higher its IDF score.
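
One common way to compute this score, shown here as a sketch, is the logarithm of the total number of documents divided by the number of documents that contain the term. The post itself does not state a formula, so the logarithmic form, the function name `inverse_document_frequency`, and the toy corpus are all assumptions made for illustration.

```python
import math

def inverse_document_frequency(documents):
    """Return {term: log(N / df)}, where N is the number of documents and
    df is the number of documents that contain the term (assumed formula)."""
    n_docs = len(documents)
    doc_freq = {}
    for tokens in documents:
        # set() ensures each document counts a term at most once
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Hypothetical three-document corpus, tokenized on whitespace
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
idf = inverse_document_frequency(corpus)
print(idf)
```

With this corpus, "the" appears in every document and gets an IDF of 0, "cat" appears in two of three documents, and "bird" appears in only one, so it receives the highest score of the three.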

Thus, the TF-IDF weight of a term in a document is the product of the two scores: TF-IDF = TF × IDF.
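
As a rough sketch of the combined weight, the function below multiplies each term's length-normalized TF by its IDF for every document in a small corpus. It assumes the unsmoothed log(N / df) form of IDF, which is one of several variants in use. The name `tf_idf` and the sample corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: tf * idf} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # TF (count / length) times IDF (log of N / df, an assumed variant)
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical three-document corpus, tokenized on whitespace
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
weights = tf_idf(corpus)
print(weights[0])
```

In the first document, "the" has the highest term frequency but a weight of 0 because it occurs in every document, while "mat", which occurs in only one document, outweighs "cat", which occurs in two. Library implementations often apply smoothing or extra normalization, so their numbers will generally differ from this sketch.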


Written by

Bhagirath Deshani
