decision tree boosting and xgboost
Boosting is a way of making decision tree ensembles both more efficient and more accurate. The intuition comes from something you may already be familiar with: deliberate practice. When preparing for a test, it is often smarter not to revise the whole syllabus from start to finish over and over, but rather to focus on the handful of areas you are weakest in. Boosting applies the same idea to decision trees. Instead of building each new tree from all m examples sampled with equal probability (1/m), we give examples that were misclassified by the previously trained trees a higher probability of being picked, as in the sketch below.
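Here is a minimal sketch of that idea in Python. The reweighting rule (doubling the weight of misclassified examples) is a simplified illustration of the general principle, not the exact update used by any particular boosting algorithm:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
m = len(y)
weights = np.full(m, 1.0 / m)   # start with equal probability 1/m per example
trees = []

for _ in range(5):
    # sample a training set, favoring currently hard examples
    idx = np.random.choice(m, size=m, replace=True, p=weights)
    tree = DecisionTreeClassifier(max_depth=2).fit(X[idx], y[idx])
    trees.append(tree)

    # bump the weight of examples this tree still gets wrong (illustrative rule)
    wrong = tree.predict(X) != y
    weights[wrong] *= 2.0
    weights /= weights.sum()    # renormalize to a probability distribution
```

Each new tree is therefore trained on a sample that over-represents the examples the ensemble so far finds hardest.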
The most widely used boosting implementation is called XGBoost (eXtreme Gradient Boosting). It is an extremely fast and efficient implementation with well-chosen defaults for how to split and when to stop splitting, and it has built-in regularization to prevent overfitting. Rather than using random sampling with replacement, it assigns different weights to different training examples, so it doesn't need to generate many randomly chosen training sets, which makes it even more efficient.
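For reference, here is a quick sketch of using the XGBoost library for classification through its scikit-learn-style interface; the parameter values are illustrative, not recommendations:

```python
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of boosted trees; max_depth and learning_rate
# control how much each tree can contribute
model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on held-out data
```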