XGBoost
Published:
From the latest post about Machine Learning, I dealt with boosting and bagging. Here, in this post, I want to introduce XGBoost, one of boosting algorithms.
As the name of XGBoost comes from Extreme Gradient Boosting, it is a kind of boosting, which dominates winning algorithms from reent Kaggle competition.
Goodness of XGBoost
Using XGBoost is not only easy to implement but also is fast and convenient. Now, let’s look at what makes XGBoost to get better performance and usability.
Continuously evolving XGBoost fixes itself as it trains the data. Let \(F_{0}\) is initial model and it occurs error \(H_{0}\). Then next model \(F_{1}\) is derived from \(F_{0}\) and \(H_{0}\), so below formula can be adjusted as data is trained. \(F_{n} = F_{n-1} + H_{n-1}\) Whit this continuous step, it makes user worry about overfitting. There is second advantage of XGBoost.
Regularization While iteratively adding trees to XGBoost model, it keeps learning rate small from new trees, so overfitted trees don’t effect the model that much.
Pruning XGBoost grows the tree upto max depth and then prune the tree backward until the improvement in loss fuction is below a threshold. (※ For decision tree algorithms, it uses Pruning, which stops once a negative loss is encountered)
Deal with missing values Based on tree algorithm, in each branch XGBoost deals missing values with default direction which has learnt from the data. However, it would be better to preprocess the NA values before using model.
How to use XGBoost
Then how to use this XGBoost. We can easily use in R or Python by simply install and inlude library. And we can control several parameters when using this. Let me introduce how and which parameters we can control.
General Parameters
- Booster Booster type setting : You can choose gbtree, gblinear or dart as the problem
- nthread Activate parallel computation and if you do not change this, you can use maximum cores
Booster Parameters
- nrounds Maximum number of iterations, and if the problem is for classification, it is about the number of trees.
- eta Learning rate. So lower eta makes slower computation
- gamma About regularization. Higher the gamma, penalizes large coefficients which don’t improve the performance.
- max_depth Depth of tree
- min_child_weight Minimum number of instances required in each child, it blocks the potential feature interactions to prevent overfitting.
- lambda & alpha Controls L2 & L1 regularization for each.
Summary
This is what I learned and used with XGBoost. Python has similar parameters to adjust XGBoost. I refered from below paper and post. Paper : XGBoost: A Scalable Tree Boosting System - Tianqi Chen Post : XGBoost beginners tutorial