Selected Papers for the Week 13/2021
This week, while browsing Twitter, I found three very interesting papers that I want to save for future reference, so I decided to write a short post about them.
Understanding Deep Learning (Still) Requires Rethinking Generalization. There is huge interest in the generalization capabilities of neural networks. We used to think that a neural network that predicts 100% of its training data correctly won't generalize well to unseen examples. However, this does not seem to be the case: in many settings, networks with more parameters generalize better to new examples. This behavior is related to the double descent phenomenon observed some time ago. The book Patterns, Predictions, and Actions: A Story About Machine Learning has a chapter discussing the overparameterization phenomenon. Quoting from it:
In this chapter, we discuss the interplay between representation, optimization, and generalization, again focusing on models with more parameters than seen data points. We examine the intriguing empirical phenomena related to overparameterization and generalization in today’s machine learning practice

Automatic Differentiation in Machine Learning: a Survey. If you are in the machine learning world, you have probably heard about automatic differentiation (AD). There is a lot of hype around it. This survey explains in great detail the key concepts behind AD and how it differs from manual and numerical differentiation.
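
To make the distinction concrete, here is a minimal sketch of my own (not code from the survey) that computes the same derivative in three ways: by hand, with a central finite difference, and with forward-mode AD built on dual numbers. The names `Dual`, `f`, `manual_derivative`, `numerical_derivative` and `forward_mode_derivative` are made up for this illustration, and the toy `Dual` class only supports addition, multiplication and `sin`.

```python
import math


class Dual:
    """A number value + deriv * eps with eps**2 == 0 (toy forward-mode AD)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    """sin that works on plain floats and on Dual numbers."""
    if isinstance(x, Dual):
        # Chain rule: sin(u)' = cos(u) * u'
        return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)
    return math.sin(x)


def f(x):
    # Example function, chosen arbitrarily for this sketch.
    return x * x * sin(x) + 3 * x


def manual_derivative(x):
    # Derived by hand: f'(x) = 2x sin(x) + x^2 cos(x) + 3
    return 2 * x * math.sin(x) + x * x * math.cos(x) + 3


def numerical_derivative(func, x, h=1e-5):
    # Central finite difference: approximate, sensitive to the choice of h.
    return (func(x + h) - func(x - h)) / (2 * h)


def forward_mode_derivative(func, x):
    # Seed the input with derivative 1 and read the derivative of the output.
    return func(Dual(x, 1.0)).deriv


x0 = 1.3
print("manual:   ", manual_derivative(x0))
print("numerical:", numerical_derivative(f, x0))
print("forward AD:", forward_mode_derivative(f, x0))
```

The AD result should agree with the hand-derived formula up to floating-point rounding, while the finite difference carries a small truncation error that depends on `h`. Real AD systems cover many more operations and also implement reverse mode, which is what backpropagation uses.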

Cross-validation: what does it estimate and how well does it do it? This week on Twitter, Rob Tibshirani published a draft of this very interesting paper, which discusses what cross-validation actually estimates. The article proves that cross-validation estimates the average prediction error of models fit on other unseen training sets drawn from the same population. This seems to be related to the concepts of expected and conditional generalization error discussed in the ESL book (The Elements of Statistical Learning).
The following is an extract from the article:
We prove a finite-sample conditional independence result (Theorem 1) with a supporting asymptotic result (Theorem 2) that together show that CV does not estimate the error of the specific model fit on the observed training set, but is instead estimating the average error over many training sets (Corollary 2 and Corollary 3).
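
To get a feel for the two quantities, here is a toy simulation of my own (not the paper's experiments). It assumes a one-dimensional linear model with standard normal inputs, a true slope `BETA = 2.0` and noise standard deviation `SIGMA = 1.0`; all of these values and all function names are choices made for this illustration. For each simulated training set it computes the 5-fold CV estimate and the exact error of the model fit on that particular training set, which has a closed form under these assumptions.

```python
import math
import random
import statistics

BETA = 2.0   # true slope (assumed for this toy example)
SIGMA = 1.0  # noise standard deviation (assumed)


def draw_training_set(n, rng):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [BETA * x + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cv_estimate(xs, ys, k=5):
    """K-fold cross-validation estimate of the mean squared prediction error."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        a, b = fit_line(train_x, train_y)
        squared_errors += [(ys[i] - a - b * xs[i]) ** 2 for i in test_idx]
    return statistics.fmean(squared_errors)


def conditional_error(xs, ys):
    """Exact error of the model fit on this specific training set.

    With x ~ N(0, 1) and fitted line (a, b), the expected squared error on a
    new point is SIGMA^2 + a^2 + (b - BETA)^2.
    """
    a, b = fit_line(xs, ys)
    return SIGMA ** 2 + a ** 2 + (b - BETA) ** 2


def pearson(u, v):
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)


def simulate(n=20, repetitions=500, seed=0):
    rng = random.Random(seed)
    cv_values, conditional_values = [], []
    for _ in range(repetitions):
        xs, ys = draw_training_set(n, rng)
        cv_values.append(cv_estimate(xs, ys))
        conditional_values.append(conditional_error(xs, ys))
    return cv_values, conditional_values


cv_values, conditional_values = simulate()
print("mean CV estimate:       ", statistics.fmean(cv_values))
print("mean conditional error: ", statistics.fmean(conditional_values))
print("correlation across sets:", pearson(cv_values, conditional_values))
```

Averaged over many training sets, the two means should land close to each other. The printed correlation shows how well the CV estimate tracks the error of the specific fitted model from one training set to the next, which is the quantity the paper argues CV does not really estimate.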