Did you know that as few as 40 trees can already bring a big leap in prediction quality on a complex dataset with complex predictor-response relationships? I recently tested this for random forest and XGBoost, and was surprised by how much the predictions improved with only a small increase in the number of trees.
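Here is a minimal sketch of that kind of test, not the exact code I used: it assumes a data frame `dat` with a numeric response `y` and predictor columns (all names are placeholders), refits both models with an increasing number of trees, and tracks hold-out RMSE.

```r
library(ranger)
library(xgboost)

set.seed(1)
idx    <- sample(nrow(dat), 0.8 * nrow(dat))
train  <- dat[idx, ]
test   <- dat[-idx, ]
preds  <- setdiff(names(dat), "y")
dtrain <- xgb.DMatrix(as.matrix(train[, preds]), label = train$y)

rmse <- function(obs, est) sqrt(mean((obs - est)^2))

# Hold-out RMSE for both models as the number of trees grows
for (n_trees in c(10, 20, 40, 80, 160)) {
  rf  <- ranger(y ~ ., data = train, num.trees = n_trees)
  xgb <- xgb.train(params  = list(objective = "reg:squarederror", eta = 0.1),
                   data    = dtrain,
                   nrounds = n_trees,
                   verbose = 0)
  cat(sprintf("trees = %3d | RF RMSE = %.3f | XGB RMSE = %.3f\n",
              n_trees,
              rmse(test$y, predict(rf, data = test)$predictions),
              rmse(test$y, predict(xgb, as.matrix(test[, preds])))))
}
```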
Another lesson I learned is about the learning rate for boosting. Even though the learning rate chosen by cross-validation can look slow enough, it can still be too fast and cause artefacts, i.e. strange patterns in the spatial prediction. I often spend a long time tuning XGBoost, using both grid-search cross-validation and manual tuning, but even when the final prediction accuracy looks perfectly reasonable, the prediction maps can show so many edges and inconsistencies that an expert can tell they are impossible: for example, very low values on a road but high values right next to it. I found plotting the spatial patterns extremely helpful during hyper-parameter tuning, as opposed to relying only on cross-validation accuracy metrics (RMSE, R2, IQR, MAE, etc.); a sketch of that workflow follows.
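A rough sketch of what I mean, under the same placeholder setup as above (a `train` data frame with response `y`, and a prediction grid `grid_df` with coordinates `x`/`y` plus the same predictors; these names are mine, not from the Shiny app): compare a few learning rates with `xgb.cv`, then refit with a slower `eta` and look at the map, not just the CV table.

```r
library(xgboost)
library(ggplot2)

preds  <- setdiff(names(train), "y")
dtrain <- xgb.DMatrix(as.matrix(train[, preds]), label = train$y)

# Cross-validated RMSE for a few candidate learning rates
for (eta in c(0.3, 0.1, 0.05, 0.01)) {
  cv <- xgb.cv(params = list(objective = "reg:squarederror",
                             eta = eta, max_depth = 6),
               data = dtrain, nrounds = 500, nfold = 5,
               early_stopping_rounds = 20, verbose = 0)
  cat(sprintf("eta = %.2f | best CV RMSE = %.3f (at %d rounds)\n",
              eta,
              min(cv$evaluation_log$test_rmse_mean),
              cv$best_iteration))
}

# Refit with a chosen (slower) learning rate and inspect the spatial pattern
fit <- xgb.train(params  = list(objective = "reg:squarederror",
                                eta = 0.05, max_depth = 6),
                 data    = dtrain,
                 nrounds = 300,
                 verbose = 0)
grid_df$pred <- predict(fit, as.matrix(grid_df[, preds]))

ggplot(grid_df, aes(x, y, fill = pred)) +
  geom_raster() +
  coord_equal() +
  scale_fill_viridis_c()
```

The map is where artefacts like sharp edges along roads show up, even when the CV numbers for two learning rates are nearly identical.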
For this, I made an R Shiny page for playing around with different hyper-parameters; check it out :-).
https://lumeng0312.shinyapps.io/xgboost/?_ga=2.229717724.1995623365.1592166857-2130394652.1592166857