Regularization (Ridge, Lasso, ElasticNet)
Both linear and logistic regression have a tendency to overfit when there are a large number of features, so it is important to choose the features with the most predictive power. But how do we choose them? EDA helps to a certain extent, but it only goes so far.
This is where ridge and lasso regularization techniques come into play! Both techniques shrink coefficients, and in doing so help identify which features explain the most variance and should therefore be kept in the model, as shown in the sketch below.
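As a minimal sketch of this idea with scikit-learn: Lasso's L1 penalty drives uninformative coefficients exactly to zero, so the surviving non-zero coefficients point to the features worth keeping. The synthetic data and the `alpha` value here are illustrative assumptions, not a recommendation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 features, only 3 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)  # penalties assume comparable feature scales

lasso = Lasso(alpha=1.0).fit(X, y)     # alpha is an assumed, tunable strength
kept = np.flatnonzero(lasso.coef_)     # features whose coefficient survived the L1 penalty
print("features kept by Lasso:", kept)
```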
Visually, the coefficients (w) can only take values inside the blue constraint region, at the point on each axis closest to the loss minimum.
L1 (LASSO) | L2 (Ridge) | Elastic Net |
---|---|---|
Pushes coefficients to 0. Good for variable selection | Most widely used. Makes coefficients smaller | Trade-off between variable selection and small coefficients |
Penalizes the sum of absolute weights | Penalizes the sum of squared weights | Combination of the two penalties |
`loss + wd * weights.abs().sum()` | `loss + wd * weights.pow(2).sum()` | |
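The one-liners in the last row can be expanded into a runnable PyTorch training loop. This is a rough sketch under assumed values for `wd` (the regularization strength) and the learning rate, on a tiny synthetic problem:

```python
import torch

torch.manual_seed(0)
X = torch.randn(100, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(100, 1)

w = torch.zeros(10, 1, requires_grad=True)
wd = 0.01  # regularization strength (assumed value)
lr = 0.1   # learning rate (assumed value)

for _ in range(500):
    mse = ((X @ w - y) ** 2).mean()
    loss = mse + wd * w.abs().sum()                       # L1 (Lasso) penalty
    # loss = mse + wd * w.pow(2).sum()                    # L2 (Ridge) penalty
    # loss = mse + wd * (w.abs().sum() + w.pow(2).sum())  # Elastic Net: both combined
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad  # plain gradient descent step
        w.grad.zero_()

print(w.squeeze())  # with L1, many weights end up at or near 0
```

Swapping in the commented-out lines switches between Ridge and an Elastic Net-style combined penalty without changing anything else in the loop.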
More information:
- Complete tutorial on ridge and lasso regression in Python: A broad tutorial explaining why we use regularization techniques, touching on the mathematics behind the algorithms and giving a few examples in Python.
- An Introduction to Statistical Learning, Chapter 6.2: A comprehensive explanation of both Lasso and Ridge and their application in the context of statistical learning.