Resources for a Gentle Introduction to Machine Learning
(Updated 09/24/2021)
Thanks to LABSIN members Franco and Gabriel for their help building the list.
From time to time, people ask me to recommend resources for getting started with Machine Learning. The fact is that there are many good places offering plenty of help. So many, in fact, that you need some guidance to find the right one for you. Recently, I read a Twitter post about the reading that Jürgen Schmidhuber (known for his work on LSTM) recommends to new members of his lab. At LABSIN, we also have a modest reading list for new members and interns, so I decided to publish a portion of the material we recommend to them.
First Approach to Machine Learning
You have an MS Excel background and need to start with essential Machine Learning topics.
- Use WEKA, a well-known data mining/machine learning tool from the University of Waikato. It is old and its algorithms are somewhat dated, but it still gets the job done if you need something easy to start with.
- The book Data Mining: Practical Machine Learning Tools and Techniques. The people behind WEKA wrote a book about the framework, and they included an excellent introduction to Machine Learning topics. We recommend reading chapters 1 to 4. There, you will find a snapshot of the most common machine learning algorithms and techniques. The book also includes a series of chapters describing WEKA, so you can give them a try if you want to.
- RapidMiner Studio. If you feel comfortable with your WEKA skills, you can give RapidMiner Studio a try. RapidMiner Studio has some similarities with WEKA, but it has grown to become a potent tool for designing your machine learning workflow.
- The Hundred-Page Machine Learning Book was a best seller. Despite its size, the book provides a brief explanation of many of the fundamental machine learning algorithms. The book is code-agnostic but has an official repo with the code in Python.
- In 2017, I gave a talk about Data Science. Most of the concepts were borrowed from the book Data Science for Business. Videos of the talk are available on YouTube: Part 1 and Part 2 (in Spanish).
You Want to Be Proficient with Machine Learning
You have a decent level in some high-level programming language (MATLAB, R, Python) and want to learn Machine Learning.
- Andrew Ng’s Introduction to Machine Learning course. Today it is a mandatory introductory course if you want to become proficient with machine learning. We recommend the first 6 weeks. You can skip the topics about Linear Algebra and Octave (an open-source MATLAB alternative) if you already know them.
- The book An Introduction to Statistical Learning with Applications in R (2nd Edition) provides a very practical introduction to several machine learning topics. The book also includes examples in the R language. Chapters 2, 3, and 4 cover the basics of regression and classification problems. Chapter 5 is very important since it covers model evaluation using resampling techniques. Finally, chapter 8 is also mandatory, since it provides an introduction to tree-based algorithms, including bagging and boosting approaches. The second edition adds several important topics, including Deep Learning, Survival Analysis, and Bayesian additive regression trees, among others. The book has an associated MOOC hosted on edX for FREE. Highly recommended.
- The book R for Data Science is a very useful resource if you want to be proficient in R. Chapters 3, 4, 5, and 7 offer an excellent introduction to essential topics such as visualization and data wrangling.
- Pedro Domingos’s article A Few Useful Things to Know About Machine Learning is very easy to read and offers some handy tips for applying machine learning to different problems.
- The book Applied Predictive Modeling by Max Kuhn (the creator of the R caret package). This book is highly recommended. I would say that the whole book should be read. Sorry 😛
- If you feel more comfortable with Python, you should definitely read the book Hands-On Machine Learning with Scikit-Learn and TensorFlow.
- Maybe more related to the Data Science field, but still interesting: the book Practical Statistics for Data Scientists discusses, besides classical statistical tools, some fundamental topics required for Machine Learning experimental design. There is also a GitHub repo with the code.
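To give a flavor of the resampling-based model evaluation mentioned above for An Introduction to Statistical Learning, here is a minimal sketch of k-fold cross-validation in plain Python. It is not taken from any of the books: the function names (k_fold_indices, fit_mean, cross_validated_mse) and the toy model that always predicts the training mean are assumptions made purely for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def fit_mean(y_train):
    """Toy 'model' (hypothetical): always predict the mean of the training targets."""
    return mean(y_train)


def cross_validated_mse(y, k=5, seed=0):
    """Estimate out-of-sample mean squared error by averaging over k held-out folds."""
    folds = k_fold_indices(len(y), k, seed)
    fold_errors = []
    for held_out in folds:
        held_out_set = set(held_out)
        # Train on everything except the held-out fold.
        train = [y[i] for i in range(len(y)) if i not in held_out_set]
        prediction = fit_mean(train)
        # Evaluate only on the fold the model has not seen.
        squared_errors = [(y[i] - prediction) ** 2 for i in held_out]
        fold_errors.append(mean(squared_errors))
    return mean(fold_errors)


if __name__ == "__main__":
    targets = [3.1, 2.9, 3.4, 2.7, 3.0, 3.3, 2.8, 3.2, 3.5, 2.6]
    print("5-fold CV estimate of MSE:", cross_validated_mse(targets, k=5))
```

The key point is that every observation is used for testing exactly once and never by a model that saw it during training. In practice you would replace fit_mean with a real learner and let a library handle the splitting.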
You Want to Get Serious with Machine Learning
You are interested in understanding the statistical and mathematical roots of Machine Learning.
- Mathematics for Machine Learning is an excellent book for introducing yourself to machine learning concepts while understanding the math behind them. The book has two main sections: the first discusses mathematical foundations such as linear algebra, matrix decomposition, etc. The second section details some machine learning algorithms using the previously discussed math concepts.
- Pattern Recognition and Machine Learning is one of the most recognized books in the machine learning field. It provides detailed theoretical explanations behind all the concepts applied in machine learning algorithms. The first 4 chapters are mandatory for anybody wanting to get serious with Machine Learning. I would say that this is a book you should always have on your shelf for reference. Chapter 11 gives a very good introduction to sampling strategies.
- The Elements of Statistical Learning is another classic book on the topic, written from a statistical perspective. Trevor Hastie and Robert Tibshirani are among the authors of the book. Chapters 7, 8, 9, and 10 are a must-read, and you will certainly be revisiting them from time to time. In addition, chapter 5 discusses great improvements to the classical linear regressor/classifier such as Lasso, Ridge, and Elastic Net.
- Computer Age Statistical Inference: Algorithms, Evidence and Data Science is a relatively new book co-authored by Trevor Hastie. The idea of the book is to revisit several machine learning and statistical concepts adapted to the big data era.
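For readers curious about what Ridge and Lasso actually change in a linear regressor, the sketch below compares them with ordinary least squares in the simplest possible setting: one feature and no intercept. This is an illustration under assumptions, not code from The Elements of Statistical Learning. The helper names (soft_threshold, one_feature_fits) are hypothetical, and the penalties are assumed to be scaled as ½·Σ(y − b·x)² + ½·λ·b² for Ridge and ½·Σ(y − b·x)² + λ·|b| for Lasso. Elastic Net, which mixes both penalties, is left out to keep the example short.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty, returning exactly 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def one_feature_fits(x, y, penalty):
    """Closed-form slopes for OLS, Ridge, and Lasso with one feature and no intercept.

    Assumed objectives (illustrative scaling):
      ridge: 0.5 * sum((y - b*x)^2) + 0.5 * penalty * b^2
      lasso: 0.5 * sum((y - b*x)^2) + penalty * |b|
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return {
        "ols": sxy / sxx,                             # no shrinkage
        "ridge": sxy / (sxx + penalty),               # shrinks smoothly, never exactly zero
        "lasso": soft_threshold(sxy, penalty) / sxx,  # can set the slope exactly to zero
    }


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [2.0, 4.0, 6.0]
    for penalty in (0.0, 14.0, 30.0):
        print(penalty, one_feature_fits(x, y, penalty))
```

With a large enough penalty the Lasso slope becomes exactly zero while the Ridge slope only gets smaller, which is the intuition behind using Lasso for feature selection.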
Extra Bonus
Beyond Ng’s classic MOOC on Machine Learning, there are plenty of MOOCs available for improving your machine learning skills. Among them, we can recommend two well-known Python-oriented courses. Unlike Ng’s MOOC, these are NOT free.
- The University of Michigan has a recommended Applied Data Science course. It is not only a course but a complete specialization.
- Jose Portilla’s Python for Data Science and Machine Learning is an applied course where you can learn about tools from the Python ecosystem for Data Science, such as NumPy, Pandas, Seaborn, Matplotlib, Plotly, Scikit-Learn, and TensorFlow, among others.