Choosing the validation set is one of the most important decisions you will make. Remember: NEVER USE THE TRAINING DATA TO MEASURE HOW GOOD YOUR MODEL IS.

  • Train test split (Holdout)
  • Cross validation (K-Fold)
    • Stratified K-Fold
    • Grouped K-Fold
    • Repeated K-Fold
  • Leave One Out (LOO)
  • Leave P Out (LPO)

Train test split

Split the features x and the labels y into a training set and a held-out validation set.

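from sklearn.model_selection import train_test_split

# Hold out 25% of the rows for validation (also the default when test_size is omitted).
# random_state is fixed only so the split is reproducible.
x_train, x_valid, y_train, y_valid = train_test_split(x, y, test_size=0.25, random_state=42)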
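
To see why the training data is a poor yardstick, here is a minimal sketch that scores one model on both halves of a holdout split. The synthetic dataset (make_classification with some label noise) and the DecisionTreeClassifier are assumptions chosen for illustration, not part of the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with noisy labels, used only for illustration
x, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)

x_train, x_valid, y_train, y_valid = train_test_split(
    x, y, test_size=0.25, random_state=0
)

# An unpruned decision tree can memorize its training set, label noise included
model = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)

train_acc = model.score(x_train, y_train)  # accuracy on rows the model has seen
valid_acc = model.score(x_valid, y_valid)  # accuracy on held-out rows
print(f"train accuracy: {train_acc:.3f}")
print(f"valid accuracy: {valid_acc:.3f}")
```

The training score here should look far better than the validation score, because the tree has memorized the rows it was fitted on. Only the validation score says anything about unseen data.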

Cross Validation

See the scikit-learn user guide on cross-validation: https://scikit-learn.org/stable/modules/cross_validation.html
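
As a minimal sketch of plain K-Fold, the snippet below lets cross_val_score handle the splitting, fitting and scoring. The synthetic data and the LogisticRegression estimator are placeholders assumed for the example; swap in your own x, y and model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data, used only for illustration
x, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 5 folds: each row lands in the validation fold exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# One score per fold (accuracy, the default scorer for classifiers)
scores = cross_val_score(LogisticRegression(max_iter=1000), x, y, cv=cv)

print(scores)
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the mean together with the spread across folds gives a steadier estimate than a single holdout split, at the cost of fitting the model once per fold.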

Stratified Cross Validation

from sklearn.model_selection import StratifiedKFold

cv = StratifiedKFold(n_splits=5)
for train_idx, valid_idx in cv.split(x, y):
    # x and y are assumed to be NumPy arrays (use .iloc[...] with pandas objects)
    x_train, y_train = x[train_idx], y[train_idx]
    x_valid, y_valid = x[valid_idx], y[valid_idx]
    ...
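
To illustrate what stratification does, the sketch below counts the minority-class rows that fall into each validation fold. The toy labels (90 negatives, 10 positives) and the name positives_per_fold are made up for this example.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced labels: 90 rows of class 0 and 10 rows of class 1
y = np.array([0] * 90 + [1] * 10)
x = np.arange(len(y)).reshape(-1, 1)  # dummy feature column

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

positives_per_fold = []
for train_idx, valid_idx in cv.split(x, y):
    # Count how many class-1 rows ended up in this validation fold
    positives_per_fold.append(int(y[valid_idx].sum()))

print(positives_per_fold)
```

Every fold should receive the same share of the rare class, so no validation fold ends up with no positive examples at all. A plain KFold does not guarantee this, which is why the stratified variant is usually preferred for imbalanced classification.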