Both linear and logistic regression have a tendency to overfit when there are a large number of features. It is therefore important to choose the features with the most predictive power, but how do we choose them? Our EDA helps to a certain extent, but it only goes so far.

This is where ridge and lasso regularization come into play! Both techniques penalize large coefficients, and lasso in particular can be used to identify which features carry the most predictive power and should therefore be kept in the model.
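To make the difference concrete, here is a minimal sketch using scikit-learn (the toy data, `alpha` values, and feature count are illustrative, not from the original notes): lasso zeroes out uninformative coefficients, while ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data: only the first 2 of 10 features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso (L1) drives the irrelevant coefficients exactly to 0 -> variable selection.
print("lasso:", np.round(lasso.coef_, 2))
# Ridge (L2) shrinks all coefficients toward 0 but keeps them nonzero.
print("ridge:", np.round(ridge.coef_, 2))
```

Inspecting `lasso.coef_` after fitting is one simple way to do the feature selection described above: keep the features whose coefficients survive.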

Visually, the coefficients (w) can only take values inside the blue constraint region, settling at the point of that region closest to the loss minimum.
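That picture corresponds to the standard equivalence between the penalized form of the loss and a constrained form, where the constraint set is the blue region (a circle for ridge, a diamond for lasso); a sketch of the ridge case, with lasso analogous using $\sum_j |w_j|$:

```latex
\min_{w} \sum_{i=1}^{n} \bigl(y_i - x_i^\top w\bigr)^2 + \lambda \sum_{j=1}^{p} w_j^2
\quad\Longleftrightarrow\quad
\min_{w} \sum_{i=1}^{n} \bigl(y_i - x_i^\top w\bigr)^2
\;\;\text{subject to}\;\; \sum_{j=1}^{p} w_j^2 \le t
```

The diamond-shaped L1 region has corners on the axes, which is why the closest point to the minimum often has some coefficients exactly at zero.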

| L1 (LASSO) | L2 (Ridge) | Elastic Net |
| --- | --- | --- |
| Pushes coefficients toward 0. Good for variable selection | Most widely used. Makes coefficients smaller | Trade-off between variable selection and small coefficients |
| Penalizes the sum of absolute weights | Penalizes the sum of squared weights | Combination of the two |
| `loss + wd * weights.abs().sum()` | `loss + wd * weights.pow(2).sum()` | |
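The penalty terms in the table can be written out explicitly; a minimal NumPy sketch mirroring them (the weight-decay strength `wd`, the `l1_ratio` mixing parameter, and the example weights are illustrative):

```python
import numpy as np

def l1_penalty(weights, wd):
    # LASSO: wd * sum of absolute weights
    return wd * np.abs(weights).sum()

def l2_penalty(weights, wd):
    # Ridge: wd * sum of squared weights
    return wd * np.square(weights).sum()

def elastic_net_penalty(weights, wd, l1_ratio=0.5):
    # Elastic net: convex mix of the L1 and L2 penalties
    return l1_ratio * l1_penalty(weights, wd) + (1 - l1_ratio) * l2_penalty(weights, wd)

w = np.array([1.0, -2.0, 0.5])
print(l1_penalty(w, 0.1))           # 0.1 * (1 + 2 + 0.5)  = 0.35
print(l2_penalty(w, 0.1))           # 0.1 * (1 + 4 + 0.25) = 0.525
print(elastic_net_penalty(w, 0.1))  # 0.5 * 0.35 + 0.5 * 0.525 = 0.4375
```

In training, each penalty would simply be added to the base loss, exactly as in the `loss + wd * ...` expressions in the table.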

More information: