algorithms

19 Mar 2023

Using CNN for a Domain name Generation Algorithm (2)

A DGA is a mechanism used by malware to establish contact with its C2 channel. This is the second post in the series on creating a simple DGA using text generation techniques, in particular a CNN built with Keras and TensorFlow for R. [6 min read]

12 Mar 2023

Don't be afraid of AI. Embrace it

The use of artificial intelligence (AI) algorithms in various fields is becoming an integral part of our lives. While some people are opposed to their use, others have embraced the technology and are using it. I am one of them. [6 min read]

27 Feb 2023

Using CNN for a Domain name Generation Algorithm (1)

A DGA is a mechanism used by malware to establish contact with its C2 channel. The idea behind this post is to show how to create a simple DGA using text generation techniques, in particular a CNN built with Keras and TensorFlow for R. This is the first part of a series of two. [6 min read]

31 Oct 2022

Clustering techniques for time series

The good old clustering analysis techniques present some differences when applied to time series. There are too many to discuss in one simple post. However, I will do my best to provide some examples of two basic approaches for doing time series analysis. [6 min read]

28 Sep 2022

Paninimania!!

Nobody doubts the importance for humankind of the PANINI sticker album for the FIFA World Cup. From a mathematical point of view, several interesting questions arise. How much money does a collector need to spend? How many other collectors do they need to interact with? What if a sticker pack had 6 stickers instead of 4? Rodralez, from LABSIN, developed an app for answering these and other questions. [3 min read]

31 Jul 2022

Art with Data

The idea of making art with code is not new, but what about data? Can data be a work of art? Well, the truth is that thanks to conceptualism, it is possible. Trust me! [4 min read]

18 Feb 2022

Tackling the limitations of tree-based algorithms

Tree-based algorithms suffer from severe limitations when applied to forecasting problems: they can't predict values beyond the range observed in the training data. However, not everything is lost. There are some alternative approaches to improve the performance of tree-based algorithms under such scenarios. [5 min read]

03 Aug 2021

How confident is Random Forest about its predictions?

Given a prediction on a particular example, how sure is Random Forest about it? To answer this question, it is necessary to look beyond the usual performance metrics and dive into the swampy waters of confidence interval estimation for statistical learning algorithms 😖. [6 min read] (updated 11/21/22)

06 Sep 2020

Are Boosting Algorithms the new baseline model for your Tabular data? Part 1

Neural networks rule the world of machine learning if, and only if, you have a lot of data, and only for a reduced set of problems. The fact is that for heterogeneous (numerical and categorical) tabular data, decision trees are still one of the best options. Also, they have the benefit of being (more) explainable to the customer. Boosted decision trees are among the most successful algorithms in data science competitions, but could they replace Random Forest, the absolute leader when you try a first model on your data? [updated]

26 Jul 2020

Tools of the Week.

UMAP and SHAP values, among other links to interesting stuff I ran into.