Most people who use XGBoost have run into model over-fitting. I wrote earlier about how cross-validation can be misleading and why prediction patterns matter: https://wordpress.com/view/tomatofox.wordpress.com . This time I just want to note down some very practical tips, the kind of thing that cross-validation (e.g. with grid search) can't tell us.
- Over-fitting is often caused by a learning rate that is too high. The default of 0.3 is usually too high; with a dataset of more than 300 observations, you can try something as low as 0.005.
- A very misleading statement in many publications and tutorials is that too many trees in XGBoost (or boosting in general) cause over-fitting. This is ambiguous. Many trees will NOT decrease your model's performance. Increasing the number of trees beyond the optimum only adds to your computational burden, but it is as safe as adding trees in a random forest. With as many trees as you can imagine, the model stays as general as with fewer trees, because the gradient simply gets stuck. The cross-validation result won't change, and neither will the prediction pattern.
- So the advice is: use a very low learning rate, say 0.001, and set as many trees as you like, say 3000, for the best fit. Then, to get results faster, reduce the number of trees. With a learning rate of 0.001, 1000 trees should be enough to find the global minimum, since 0.001 * 1000 = 1, the same total step size as doing gradient descent once with a learning rate of 1.