Why a very large number of trees won’t overfit Boosting?

We know that boosting fit residuals from each of the previous trees subsequently, then here comes with a question– is boosting then resembles a very large tree in some sense as it is growing vertically? And if yes, it would be affected by the number of trees, i.e. too many trees would cause overfitting. We can do a small experiments: I know 1000 trees give me an optimal model, then I grow 10,000 trees and found the results almost the same, just like random forest.

If you think about the problem as it origins — “a gradient descent solution”, then it seems quite straight forward: Boosting each time use residuals from all of the observations to build the next tree, if the gradient does not descend any more (get stuck in a minimum), then the predictions stay the same. This is the main difference of it from a very large tree, which do not descend the gradient but keep splitting at each nodes using a “local optimiser, i.e. find the split that lead to the least variance in each segments”. The segments are becoming smaller and smaller, until you completely overfit.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s