Tree-based algorithms suffer from a severe limitation when applied to forecasting problems: they cannot predict values beyond the range observed in the training data. However, not all is lost. There are alternative approaches that improve the performance of tree-based algorithms in such scenarios, as the quick sketch below shows. [5 min read]
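A minimal sketch of the problem (synthetic data and `rpart`, both illustrative choices, not taken from the article): a regression tree fit to a linear trend goes flat as soon as you ask it for predictions outside the training range.

```r
library(rpart)

# Synthetic linear trend: y grows steadily with x
set.seed(42)
train <- data.frame(x = 1:100)
train$y <- 2 * train$x + rnorm(100, sd = 5)

fit <- rpart(y ~ x, data = train)

# Ask for predictions well beyond the observed range of x
test <- data.frame(x = c(110, 150, 200))
predict(fit, test)
# All three predictions are identical: the mean of the rightmost
# leaf (somewhere below 200), no matter how large x gets.
```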
We cannot keep treating our models as black boxes. Remember, nobody trusts a computer to make a very important decision (yet!). That's why the interpretation of Machine Learning models has become a major research topic. SHAP is a very robust approach for making any machine learning model interpretable. For multi-class classification problems, however, the documentation and examples are not very clear. [8 min read]
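The key point that trips people up in the multi-class case is that there is no single SHAP matrix: you get one set of SHAP values per class. A hedged sketch of one way to handle this in R, using the `fastshap` package on the built-in `iris` data (my choice of package and data for illustration, not necessarily what the article uses):

```r
library(randomForest)
library(fastshap)

data(iris)
rf <- randomForest(Species ~ ., data = iris)

# Explain the predicted probability of ONE class at a time;
# repeat with a different column of the probability matrix
# to get SHAP values for the other classes.
shap_setosa <- fastshap::explain(
  rf,
  X = iris[, 1:4],
  nsim = 50,
  pred_wrapper = function(object, newdata) {
    predict(object, newdata, type = "prob")[, "setosa"]
  }
)
head(shap_setosa)  # one row per observation, one column per feature
```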
Given a prediction on a particular example, how sure is a Random Forest about it? To answer this question, it is necessary to look beyond the usual performance metrics and dive into the swampy waters of confidence interval estimation for statistical learning algorithms 😖. [6 min read] (updated 11/21/22)
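One quick way to peek at a forest's uncertainty (a heuristic based on the spread across trees, not the article's full treatment): `randomForest` can return every individual tree's prediction via `predict.all = TRUE`. The toy regression data here is made up for the example.

```r
library(randomForest)

# Toy regression problem
set.seed(42)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- 3 * df$x1 + sin(2 * pi * df$x2) + rnorm(200, sd = 0.3)

rf <- randomForest(y ~ ., data = df)

# predict.all = TRUE returns the prediction of each individual tree,
# so their spread gives a rough interval around the forest average
# (a heuristic, not a formal confidence interval).
new_point <- data.frame(x1 = 0.5, x2 = 0.5)
preds <- predict(rf, new_point, predict.all = TRUE)
preds$aggregate                              # forest average
quantile(preds$individual, c(0.025, 0.975))  # spread across trees
```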
Sometimes notebooks are not enough and you will need to deploy your machine learning model into your company's infrastructure. The task involves a lot of Software Engineering knowledge, BUT with the Plumber package for R you can do the basics without too much pain 😉. [6 min read]
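To give a taste of how little code "the basics" can be, here is a minimal Plumber endpoint wrapping an already-fitted model (the file name `model.rds` and features `x1`, `x2` are made-up placeholders for this sketch):

```r
# plumber.R -- a minimal prediction API around a saved model
# ("model.rds", x1 and x2 are hypothetical names for this example)

model <- readRDS("model.rds")

#* Return a prediction for the supplied feature values
#* @param x1 first numeric feature
#* @param x2 second numeric feature
#* @get /predict
function(x1, x2) {
  newdata <- data.frame(x1 = as.numeric(x1), x2 = as.numeric(x2))
  list(prediction = predict(model, newdata))
}

# From another R session:
#   plumber::plumb("plumber.R")$run(port = 8000)
# then try:
#   curl "http://localhost:8000/predict?x1=0.5&x2=0.5"
```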
Feature selection is a topic every machine learning practitioner should master. There are plenty of strategies for performing feature selection: some more useful than others, some with more limitations than benefits. Here, I cover the most common approaches to feature selection, drawing on articles, books, and research papers. [5 min read]
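As a teaser, one of the most common filter-style approaches (my illustrative pick, not the article's full list) is dropping one feature from every highly correlated pair, which `caret::findCorrelation` does in a couple of lines. The data here is synthetic.

```r
library(caret)

# Synthetic data where column c is nearly a duplicate of column a
set.seed(42)
n <- 200
X <- data.frame(a = rnorm(n), b = rnorm(n))
X$c <- X$a + rnorm(n, sd = 0.05)

# Flag one feature from each pair whose correlation exceeds the cutoff
drop_idx <- findCorrelation(cor(X), cutoff = 0.9)
names(X)[drop_idx]        # the redundant column flagged for removal
X_reduced <- X[, -drop_idx]
```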
Beware of the Random Forest Gini index for feature importance. Plus some other resources related to feature selection, such as how to use PCA and the problems (or not) behind collinearity.
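For a concrete taste of why the warning matters: `randomForest` can report both importance measures side by side, and the Gini-based one is known to be biased toward features with many possible split points, so comparing it against permutation importance is a quick sanity check (sketch on the built-in `iris` data).

```r
library(randomForest)

data(iris)
# importance = TRUE also computes permutation importance
rf <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Gini importance (type = 2) is biased toward features with many
# split points; permutation importance (type = 1) is usually the
# safer default to compare against.
importance(rf, type = 2)  # MeanDecreaseGini
importance(rf, type = 1)  # MeanDecreaseAccuracy (permutation)
```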