Gradient Boosting (GBM)

Per tree, they are in theory just as fast to train as random forests, but in practice you will have to try many more hyperparameter settings. They can overfit. They are also harder to parallelize: each tree is fit to the errors of the trees before it, so the trees must be built one after another rather than independently as in a random forest. In return, they are often slightly more accurate than random forests.
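To make the sequential nature of boosting concrete, here is a minimal pure-Python sketch for regression with squared error. It is an illustration under simplifying assumptions (one numeric feature with at least two distinct values, depth-1 trees as the weak learner), not how any particular library implements it. The names `fit_stump`, `fit_gbm`, `predict_gbm` and `mse` are hypothetical.

```python
# Minimal gradient boosting sketch: squared error, one numeric feature,
# depth-1 trees (stumps) as weak learners. Illustrative only.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    # Every distinct x except the largest is a candidate split point,
    # so both sides of the split are always non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def fit_gbm(xs, ys, n_trees=50, learning_rate=0.1):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)            # initial prediction: the mean
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # This step depends on all previous trees, hence no parallelism.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_gbm(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict_gbm(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [x * x for x in xs]

few = fit_gbm(xs, ys, n_trees=5)
many = fit_gbm(xs, ys, n_trees=200)
print("train MSE,   5 trees:", mse(few, xs, ys))
print("train MSE, 200 trees:", mse(many, xs, ys))
```

Training error keeps falling as trees are added, which is exactly why the number of trees and the learning rate need tuning: the error on unseen data will not keep falling with it.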

🔧 Hyperparameters

| Hyperparameter | Random Forest (scikit-learn) | Gradient Boosting (XGBoost) | Gradient Boosting (LightGBM) | Typical values |
|---|---|---|---|---|
| 🔷 Number of trees | n_estimators | num_round 💡 | num_iterations 💡 | [10,…,1000] |
| 🔷 Max depth of the tree | max_depth | max_depth | max_depth | 3,…,10 |
| 🔶 Min cases per final tree leaf | min_samples_leaf | min_child_weight | min_data_in_leaf | |
| 🔷 % of rows used to build the tree | max_samples | subsample | bagging_fraction | 0.8 |
| 🔷 % of feats used to build the tree | max_features | colsample_bytree | feature_fraction | |
| 🔷 Learning rate (speed of training) | not in forest | eta | learning_rate | |
| 🔶 L1 regularization | not in forest | alpha | lambda_l1 | |
| 🔶 L2 regularization | not in forest | lambda | lambda_l2 | |
| Random seed | random_state | seed | seed | |
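Because the same knob has a different name in each library, it can help to keep one neutral set of settings and translate it per library. The sketch below assumes the two gradient boosting columns correspond to XGBoost and LightGBM, whose parameter names they match, and it uses XGBoost's documented convention of `alpha` for L1 and `lambda` for L2. The neutral keys (`n_trees`, `min_leaf`, and so on) and the `translate` helper are hypothetical, and how each library actually accepts these values (constructor argument, parameter dict, or training-call argument) is left to its documentation.

```python
# Neutral setting name -> parameter name per library.
# None means the setting has no counterpart in that model.
PARAM_NAMES = {
    "n_trees":          {"random_forest": "n_estimators",     "xgboost": "num_round",        "lightgbm": "num_iterations"},
    "max_depth":        {"random_forest": "max_depth",        "xgboost": "max_depth",        "lightgbm": "max_depth"},
    "min_leaf":         {"random_forest": "min_samples_leaf", "xgboost": "min_child_weight", "lightgbm": "min_data_in_leaf"},
    "row_fraction":     {"random_forest": "max_samples",      "xgboost": "subsample",        "lightgbm": "bagging_fraction"},
    "feature_fraction": {"random_forest": "max_features",     "xgboost": "colsample_bytree", "lightgbm": "feature_fraction"},
    "learning_rate":    {"random_forest": None,               "xgboost": "eta",              "lightgbm": "learning_rate"},
    "l1":               {"random_forest": None,               "xgboost": "alpha",            "lightgbm": "lambda_l1"},
    "l2":               {"random_forest": None,               "xgboost": "lambda",           "lightgbm": "lambda_l2"},
    "seed":             {"random_forest": "random_state",     "xgboost": "seed",             "lightgbm": "seed"},
}


def translate(settings, library):
    """Rename neutral settings for one library, dropping unsupported ones."""
    out = {}
    for key, value in settings.items():
        name = PARAM_NAMES[key][library]
        if name is not None:
            out[name] = value
    return out


settings = {"n_trees": 100, "max_depth": 6, "row_fraction": 0.8,
            "learning_rate": 0.1, "l1": 1.0, "seed": 42}

for library in ("random_forest", "xgboost", "lightgbm"):
    print(library, translate(settings, library))
```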
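  • 🔷: Increasing the parameter pushes the model toward overfitting, so decrease it if the model overfits and increase it if the model underfits.
  • 🔶: Increasing the parameter pushes the model toward underfitting (regularization), so increase it if the model overfits and decrease it if the model underfits.
  • 💡: For gradient boosting, early stopping is often a better choice than setting a fixed number of trees.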
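The early stopping idea can be sketched without any library: track the validation loss after each boosting round and stop once it has failed to improve for a set number of rounds. The helper name `early_stopping_round`, the `patience` argument and the loss values are hypothetical, chosen only for illustration. Gradient boosting libraries typically ship their own early-stopping options, so check their documentation for the exact arguments.

```python
def early_stopping_round(val_losses, patience=10):
    """Return the 1-based round with the lowest validation loss seen
    before `patience` consecutive rounds fail to improve on it."""
    best_loss = float("inf")
    best_round = 0
    for round_no, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_round = loss, round_no
        elif round_no - best_round >= patience:
            break  # no improvement for `patience` rounds: stop training
    return best_round


# Made-up validation losses, one per boosting round.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]

print(early_stopping_round(losses, patience=3))   # -> 3
print(early_stopping_round(losses, patience=10))  # -> 7
```

The two calls show the trade-off: a small `patience` stops at round 3 and never sees the later improvement at round 7, while a large one trains longer before giving up.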