Why won’t a very large number of trees overfit boosting?

We know that boosting fits each new tree to the residuals left by the previous trees. This raises a question: does boosting then resemble one very large tree in some sense, since it keeps growing vertically? If so, it would be sensitive to the number of trees, i.e. too many trees would cause overfitting. We can do a small experiment: I knew that 1,000 trees gave me an optimal model, then I grew 10,000 trees and found the results almost the same, just as with random forest.

If you think about the problem from its origin, “a gradient descent solution”, the answer seems quite straightforward. Each time, boosting uses the residuals of all observations to build the next tree; if the gradient no longer descends (it is stuck in a minimum), the predictions stay the same. This is the main difference from a very large tree, which does not descend a gradient but keeps splitting at each node using a “local optimiser”, i.e. it finds the split that leads to the least variance in each segment. The segments become smaller and smaller until you completely overfit.
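The mechanism described above can be sketched in a few lines of plain Python. This is only a minimal illustration under my own assumptions: squared-error loss (so the negative gradient is exactly the residual), depth-1 trees (stumps) on a single feature, a learning rate of 0.1, and synthetic sine data. The names `fit_stump`, `boost` and `make_data` are made up for this sketch and do not come from any library.

```python
import math
import random


def fit_stump(x, residuals):
    """Fit a depth-1 regression tree (a stump) to the residuals.

    Returns (threshold, left_value, right_value), chosen to minimise the
    squared error of the residuals.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    rs = [residuals[i] for i in order]
    n, total = len(xs), sum(rs)
    best, left_sum = None, 0.0
    for k in range(1, n):
        left_sum += rs[k - 1]
        if xs[k] == xs[k - 1]:
            continue  # cannot split between identical x values
        right_sum = total - left_sum
        # Minimising the squared error is equivalent to maximising this score.
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or score > best[0]:
            best = (score, (xs[k - 1] + xs[k]) / 2, left_sum / k, right_sum / (n - k))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def boost(x, y, n_trees, learning_rate=0.1):
    """Gradient boosting with squared-error loss (assumed for this sketch)."""
    pred = [sum(y) / len(y)] * len(y)
    step_sizes, train_mse = [], []
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        thr, left, right = fit_stump(x, residuals)
        update = [learning_rate * (left if xi <= thr else right) for xi in x]
        pred = [p + u for p, u in zip(pred, update)]
        step_sizes.append(max(abs(u) for u in update))
        train_mse.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))
    return step_sizes, train_mse


def make_data(n=60, seed=0):
    """Synthetic data (an assumption): a noisy sine curve."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 6) for _ in range(n)]
    y = [math.sin(xi) + rng.gauss(0, 0.2) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = make_data()
    steps, mse = boost(x, y, n_trees=300)
    for t in (1, 10, 100, 300):
        print(t, round(steps[t - 1], 4), round(mse[t - 1], 4))
```

Each update is a scaled mean of the current residuals, so once the residuals carry little structure the updates become small and the predictions barely move. The sketch only tracks training error; to repeat the 1,000 versus 10,000 trees comparison properly, one would look at the error on held-out data.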

Niche in Geoscience? No.

“Finding a niche” is a sort of “holy grail” that senior researchers pass on when mentoring young researchers. Many professors believe that finding their niche led to their success. But they forget that this probably held decades ago; in the modern information age, a “niche” doesn’t exist in Geoscience and should not exist. Anyone can and should be able to build on top of others’ work. Open science has shown that this trend leads to the fastest acceleration of science. Twitter (a pioneer in completely opening its development platform at the development stage) demonstrated it by becoming the giant it is today.

I was mentored by professors I truly trusted, respected, and appreciated, and they told me I should find my niche. I wrote this short blog because I have recently heard people telling others “you should find your niche” or “we should find our niche” a few times. I know they are sharing their precious experience with the utmost sincerity, but useful experience has an expiration date. In trying to find a niche, one may go to the extreme of doing things others won’t, invest in the opposite of open science, or simply become discouraged and lose motivation.

Better ways: I just draw from what I have seen and what I want to say to myself. Don’t be afraid to choose a very hot topic. Sit down but keep your head up, keep your eyes open and keep moving on, surpass the years-long hard work of others, and let others do the same in return.

Deep learning resources

There is no better era to self-teach deep learning! Besides well-known resource platforms such as Kaggle and the machine learning roadmap, I recommend several resources that I found really amazing, along with the course sequence to follow. The very first step is still the courses offered on Coursera, or Stanford CS231n on YouTube (see item 4 below): great accelerators!

  1. Dive into Deep Learning https://d2l.ai/: “Interactive deep learning book with code, math, and discussions.” This learning material is a classic! I got to know it late, but anyone can benefit from it at any stage. All the scripts can be run in Google Colab. The explanations are amazing. The Chinese version is the top seller in the Chinese bookstore market. It is great and reads as if it originates from Chinese authors, unlike many books with very rough translations. Many universities already use it for classes, and an AWS space can be applied for free for teaching purposes. This book, interactive as its description suggests, may be a better start than the two classical deep learning books, namely Deep Learning with Python and Deep Learning, as it is up to date, very practical with real-life scripts, and enables discussions.
  2. https://distill.pub/ A fantastic online journal with great visualisations.
  3. https://paperswithcode.com/: Papers and code, as the name suggests; this is the great trend being pushed by the field of machine learning. In the same vein is OpenReview.
  4. Courses on YouTube: The sequence I recommend watching is (1) Stanford CS231n (the winter or summer semester), which is the most detailed and classical course; (2) MIT 6.S191, which is quite a good introduction to the deep learning realm, though less detailed; (3) Deep Unsupervised Learning by Pieter Abbeel at UC Berkeley, for people who are interested in the topic or would like to dive deeper; (4) DeepMind x UCL | Deep Learning Lectures, which is more fast-paced and advanced, and gives the audience a glimpse into the newest developments up to 2020.
  5. For people who can read Chinese, CSDN for numerous insightful blogs and resources. CSDN has been around for ages, but I just got to know it, and the articles published there deepen my understanding greatly! I am inspired by the enthusiasm of the community.
  6. Maybe needless to mention: follow people on ResearchGate, GitHub, Twitter, and LinkedIn, and subscribe to YouTube channels so that you will always be up to date.

I will keep updating the list, enjoy learning!

Sharing a great explanation of PCA

PCA was the beginning of my spatiotemporal data analysis journey and stayed with me all the way through my PhD study. It can be understood simply as an orthogonal eigen-decomposition of the covariance matrix, with the variance of each component arranged in decreasing order. However, its links to linear regression, ANOVA, etc. were never imprinted in my mind, so I kept feeling that I did not understand it completely and kept trying to demystify it. Now I have found the best illustration, which resolves my confusion. Enjoy reading!

https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues
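To make the “eigen-decomposition of the covariance matrix” view concrete, here is a small sketch for 2-D data only, where the eigenvalues and eigenvectors of the symmetric 2×2 covariance matrix have a closed form. The function name `pca_2d` and the toy points are my own assumptions for illustration; for real data one would normally use a linear algebra library.

```python
import math


def pca_2d(points):
    """PCA of 2-D points via the closed-form eigen-decomposition of the
    2x2 sample covariance matrix [[a, b], [b, c]].

    Returns (eigenvalues, eigenvectors, scores) with the eigenvalues in
    decreasing order.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centred = [(p[0] - mean_x, p[1] - mean_y) for p in points]

    # Entries of the sample covariance matrix.
    a = sum(u * u for u, _ in centred) / (n - 1)
    c = sum(v * v for _, v in centred) / (n - 1)
    b = sum(u * v for u, v in centred) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix: half the trace plus/minus a radius.
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    eigenvalues = (half_trace + radius, half_trace - radius)

    # The first eigenvector points along the principal axis at angle theta;
    # the second is the same vector rotated by 90 degrees, so they are orthogonal.
    theta = 0.5 * math.atan2(2 * b, a - c)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))

    # Scores: the centred data projected onto each component.
    scores = [(u * v1[0] + v * v1[1], u * v2[0] + v * v2[1]) for u, v in centred]
    return eigenvalues, (v1, v2), scores


if __name__ == "__main__":
    toy_points = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
    values, vectors, _ = pca_2d(toy_points)
    print("variances in decreasing order:", values)
    print("components:", vectors)
```

The variance of the scores along each component equals the corresponding eigenvalue, and the scores of the two components are uncorrelated, which is the “orthogonal, decreasing variance” description in code form.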

Deep learning in remote sensing image segmentation problems 1 — boundary delineation

Deep learning has been used in building footprint delineation, road extraction, and coastline delineation, among others, where the focus is on accurate boundary delineation. Below are the mainstream directions I am aware of; several of them appear in the state of the art, some of them in daily experiments. Convincing suggestions of the optimal method, or of a combination of methods as an ultimate solution, may come soon.

Credit: figure from Su and Zhang 2017, ISPRS Journal of Photogrammetry and Remote Sensing.

1. Use Learning Attraction Field Representation.

The method was proposed for line segment detection in “Learning Attraction Field Representation for Robust Line Segment Detection”, which reformulates the problem as a “coupled region colouring problem” [1].

2. Use a more boundary-specific loss function.
Loss functions play an essential role in machine learning. Lots of loss functions have been proposed, but they still need to be evaluated comprehensively.

As a first attempt, and for binary segmentation, one can try RMSE on a distance metric. Boundary-specific loss functions are proposed in:

2.1 Boundary Loss for Remote Sensing Imagery Semantic

2.2 Boundary loss for highly unbalanced segmentation
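The “RMSE on a distance metric” idea mentioned under item 2 can be read in several ways. The sketch below shows one possible reading, which is my assumption and not taken from the papers listed: turn each binary mask into a map of Euclidean distances to its nearest foreground pixel, then take the RMSE between the two maps. The distance map is computed by brute force, so it is only meant for tiny examples, and as written it is an evaluation-style measure rather than a differentiable training loss. The names `distance_map` and `rmse_on_distance` are hypothetical.

```python
import math


def distance_map(mask):
    """Euclidean distance from every pixel to the nearest foreground pixel.

    Brute force, O(pixels * foreground pixels): fine for a toy example only.
    """
    foreground = [(r, c) for r, row in enumerate(mask)
                  for c, value in enumerate(row) if value]
    if not foreground:
        raise ValueError("mask has no foreground pixel")
    return [[min(math.hypot(r - fr, c - fc) for fr, fc in foreground)
             for c in range(len(row))]
            for r, row in enumerate(mask)]


def rmse_on_distance(pred_mask, true_mask):
    """RMSE between the distance maps of a predicted and a reference mask."""
    dist_pred = distance_map(pred_mask)
    dist_true = distance_map(true_mask)
    squared = [(p - t) ** 2
               for row_p, row_t in zip(dist_pred, dist_true)
               for p, t in zip(row_p, row_t)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    truth = [[0, 1, 0, 0, 0]]
    shifted_by_one = [[0, 0, 1, 0, 0]]
    shifted_by_two = [[0, 0, 0, 1, 0]]
    print(rmse_on_distance(truth, truth))
    print(rmse_on_distance(shifted_by_one, truth))
    print(rmse_on_distance(shifted_by_two, truth))
```

Unlike a per-pixel overlap measure, this value grows with how far the predicted object sits from the reference, which is the property one wants when the boundary position matters.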

3. Extract boundaries first with a conventional edge detection algorithm, then use them as a feature input for training.

This simple addition was proposed by a colleague, and he obtained an incredible improvement in IoU, from around 55% to 62%, in his case study of building detection. This really calls for a comparison with all the other, more complex methods: what are the REAL reasons behind the improvements? Many people are increasingly disappointed by current publications, as new methods are published with improvements that seem a matter of chance and without links to other possibilities.
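As a rough sketch of item 3, the snippet below computes an edge map and stacks it onto the image bands as one extra input channel. I do not know which edge detector my colleague used; the Sobel operator here is my own stand-in, and `sobel_magnitude` and `add_edge_channel` are hypothetical helper names written in plain Python for a tiny image.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(band):
    """Gradient magnitude of a 2-D band using the Sobel operator.

    Border pixels are handled by repeating the nearest edge pixel.
    """
    height, width = len(band), len(band[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), height - 1)
                    cc = min(max(c + dc, 0), width - 1)
                    gx += SOBEL_X[dr + 1][dc + 1] * band[rr][cc]
                    gy += SOBEL_Y[dr + 1][dc + 1] * band[rr][cc]
            out[r][c] = math.hypot(gx, gy)
    return out


def add_edge_channel(bands):
    """Append a Sobel edge map (computed on the band average) as a new channel."""
    height, width = len(bands[0]), len(bands[0][0])
    grey = [[sum(b[r][c] for b in bands) / len(bands) for c in range(width)]
            for r in range(height)]
    return bands + [sobel_magnitude(grey)]


if __name__ == "__main__":
    # A toy single-band image with a vertical step edge in the middle.
    image = [[[0, 0, 1, 1] for _ in range(4)]]
    stacked = add_edge_channel(image)
    print(len(stacked), "channels; edge map row:", stacked[-1][1])
```

The network then sees the hand-crafted edges next to the raw bands, so the training code itself does not need to change beyond the number of input channels.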

4. Binary segmentation as an edge detection problem

Current deep learning applications in remote sensing image classification mostly rely on image segmentation. Vector labels are commonly rasterised for training, but this does NOT have to be the case! For binary problems such as building footprint delineation, one can turn the problem back into an edge detection problem, which opens a new door of opportunities. For example, see the crisp edge detection below:

Credit: figure from Huan et al., Unmixing Convolutional Features for Crisp Edge Detection.
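One simple way to try the edge detection framing from item 4 with labels that have already been rasterised is to convert each binary mask into an edge label map and train on that instead. The sketch below is only my assumption of such a preprocessing step: a foreground pixel is marked as an edge when any of its four neighbours is background, and pixels outside the image count as background. The name `mask_to_edge_labels` is hypothetical.

```python
def mask_to_edge_labels(mask):
    """Turn a binary segmentation mask into a binary edge label map.

    A foreground pixel becomes an edge pixel if at least one of its four
    neighbours is background; outside the image counts as background
    (an assumption made for this sketch).
    """
    height, width = len(mask), len(mask[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < height and 0 <= cc < width
                if not inside or not mask[rr][cc]:
                    edges[r][c] = 1
                    break
    return edges


if __name__ == "__main__":
    building = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in mask_to_edge_labels(building):
        print(row)
```

When vector labels are available, the outlines could be used directly instead of going through a filled mask first; the sketch starts from the mask only because that is what existing training data usually looks like.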