R, Python, or both in Machine Learning?

As an R user for almost ten years, I'm gradually switching to Python for machine learning, and for pretty much everything else. Deep learning aside, where the community is almost exclusively in Python, the Python community for other data science methods is also growing faster than R's. There are no doubt lots of developments in R that are not yet available in Python, like distributional forests and empirical fluctuation process-based methods for detecting structural change in time series (party, efp, bfast, etc.), and ggplot is extremely powerful. But that list is getting shorter and shorter. Relatively recent methods, such as CatBoost or XGBoost, have better Python APIs. Classical geospatial analysis methods such as GWR (geographically weighted regression) and Gaussian processes also see lots of development in Python. The tidyr-style tools also come naturally in Python (i.e. pipes are not really needed, because the programming is already fully object-based and methods can simply be chained).
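
To make the point about pipes concrete, here is a minimal sketch of how a typical filter, mutate, group, summarise pipeline reads in pandas with method chaining. The data frame, its column names, and its values are made up purely for illustration, and the R equivalents in the comments are only rough analogies.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration only.
df = pd.DataFrame(
    {
        "city": ["Utrecht", "Utrecht", "Delft", "Delft"],
        "year": [2019, 2020, 2019, 2020],
        "no2": [28.0, 22.0, 25.0, 21.0],
    }
)

# Each method returns a new object, so the steps chain without a pipe operator.
summary = (
    df
    .query("year >= 2019")                                  # roughly filter()
    .assign(no2_rel=lambda d: d["no2"] / d["no2"].max())    # roughly mutate()
    .groupby("city", as_index=False)                        # roughly group_by()
    .agg(mean_no2=("no2", "mean"))                          # roughly summarise()
    .sort_values("mean_no2", ascending=False)               # roughly arrange(desc())
    .reset_index(drop=True)
)

print(summary)
```

Wrapping the chain in parentheses is just a formatting choice that lets each step sit on its own line, much like a piped R script.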

Python's array, data frame, and geodata handling are becoming more powerful every day, while R's are progressing more slowly. I quite often implement things in R first, as I'm most accustomed to it, but I always end up finding a neater and simpler solution in Python.

I won't say goodbye to R, though. There are still lots of complementary tools, it is sometimes more convenient for me, and several of my collaborations are still based in R. The R community is also still growing. Programming languages function as communication tools, and the ideas emerging from R are as exciting as those from Python. My point is simply that anyone who is serious about machine learning and still a faithful R user will benefit from being bilingual.

