A DGA (domain generation algorithm) is a mechanism malware uses to establish contact with its C2 (command-and-control) channel. This is the second post in a series on creating a simple DGA with text-generation techniques; in particular, a CNN built with Keras and TensorFlow for R. [6 min read]
Artificial intelligence (AI) algorithms are becoming an integral part of our lives across many fields. While some people oppose their use, others have embraced the technology and are putting it to work. I am one of them. [6 min read]
A DGA (domain generation algorithm) is a mechanism malware uses to establish contact with its C2 (command-and-control) channel. This post shows how to create a simple DGA with text-generation techniques; in particular, a CNN built with Keras and TensorFlow for R. This is the first part of a two-part series. [6 min read]
Good old clustering techniques behave differently when applied to time series, and there are too many differences to cover in one simple post. Still, I will do my best to provide examples of two basic approaches to clustering time series. [6 min read]
Nobody doubts the importance to humankind of the PANINI sticker album for the FIFA World Cup. From a mathematical point of view, several interesting questions arise. How much money does a collector need to spend to complete it? How many other collectors do they need to trade with? What if a sticker pack had 6 stickers instead of 4? Rodralez, from LABSIN, developed an app to answer these and other questions. [3 min read]
The idea of making art with code is not new, but what about data? Can data be a work of art? Well, thanks to conceptualism, it can. Trust me! [4 min read]
Tree-based algorithms suffer from severe limitations when applied to forecasting problems: they can't predict values beyond the range of target values observed in the training data. However, not all is lost. There are some alternative approaches that improve the performance of tree-based algorithms in such scenarios. [5 min read]
Given a prediction for a particular example, how sure is a Random Forest about it? Answering this question requires looking beyond the usual performance metrics and diving into the swampy waters of confidence interval estimation for statistical learning algorithms 😖. [6 min read] (updated 11/21/22)
Neural networks rule the world of machine learning, but only if you have a lot of data, and only for a limited set of problems. For heterogeneous (numerical and categorical) tabular data, decision trees are still one of the best options, and they have the added benefit of being (more) explainable to the customer. Boosted decision trees are among the most successful algorithms in data science competitions, but could they replace Random Forest, the absolute leader when you try a first model on your data? [updated]