hastie

03 Aug 2021

How confident is Random Forest about its predictions?

Given a prediction on a particular example, how sure is Random Forest about it? Answering this question requires looking beyond the usual performance metrics and diving into the swampy waters of confidence interval estimation for statistical learning algorithms 😖. [6 min read] (updated 11/21/22)

14 May 2021

Thoughts about differences in ML evaluation for Academia and Industry

The processes and methods used in Academia to evaluate a Machine Learning model differ from the approaches used in Industry. Why? [4 min read]

14 Dec 2020

Feature Selection Strategies

Feature selection is a topic any machine learning practitioner should master. There are plenty of strategies for performing feature selection, some more useful than others, and some with more limitations than benefits. Here, I mention the most common approaches for feature selection, using information collected from articles, books and research papers. [5 min read]

16 Nov 2020

Three Common Ways for Comparing Two Dataset Distributions

From time to time you will need to compare the distributions of two datasets. There is plenty of information about this topic in statistics books and all over the Internet. In this post I discuss three very practical approaches coming from different perspectives. [3.5 min read] (updated 04/01/2021)