Getting training and validation data
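Choosing the validation set is one of the most important steps in building a model. Remember: NEVER USE THE TRAINING DATA TO MEASURE HOW GOOD YOUR MODEL IS. A model can memorise the samples it was trained on, so a score computed on them says little about how it will perform on new data.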
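A minimal sketch of why this matters, assuming scikit-learn and NumPy and using synthetic data invented for illustration: the labels are pure noise, yet an unrestricted decision tree scores perfectly on its own training data, while its accuracy on held-out data stays near chance.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration: random features and *random* labels,
# so there is no real pattern for the model to learn.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

x_train, x_valid, y_train, y_valid = train_test_split(x, y, random_state=0)

# A tree with no depth limit keeps splitting until it memorises every training sample.
model = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)

train_acc = model.score(x_train, y_train)  # measured on data the model has already seen
valid_acc = model.score(x_valid, y_valid)  # measured on held-out data
print(f"train accuracy: {train_acc:.2f}")  # 1.00
print(f"valid accuracy: {valid_acc:.2f}")  # near chance level (about 0.5)
```

The gap between the two numbers is exactly what a training-set score hides.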
- Train test split (Holdout)
- Cross validation (K-Fold)
- Stratified K-Fold
- Grouped K-Fold
- Repeated K-Fold
- Leave One Out (LOO)
- Leave P Out (LPO)
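Each strategy in the list has a counterpart class in `sklearn.model_selection`. The sketch below assumes scikit-learn and NumPy and uses a hypothetical toy dataset (10 samples, binary labels, 5 groups such as 5 patients); it counts how many train/validation splits each strategy produces and checks the defining property of grouped K-Fold.

```python
import numpy as np
from sklearn.model_selection import (
    GroupKFold,
    KFold,
    LeaveOneOut,
    LeavePOut,
    RepeatedKFold,
    StratifiedKFold,
)

# Hypothetical toy data: 10 samples, binary labels, 5 groups of 2 samples each.
x = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)
groups = np.repeat([0, 1, 2, 3, 4], 2)

splitters = {
    "K-Fold": KFold(n_splits=5),
    "Stratified K-Fold": StratifiedKFold(n_splits=5),
    "Grouped K-Fold": GroupKFold(n_splits=5),
    "Repeated K-Fold": RepeatedKFold(n_splits=5, n_repeats=3, random_state=0),
    "Leave One Out": LeaveOneOut(),
    "Leave P Out (p=2)": LeavePOut(p=2),
}

n_splits = {}
for name, cv in splitters.items():
    # Only GroupKFold needs `groups`; the others use x (and y for stratification).
    kwargs = {"groups": groups} if isinstance(cv, GroupKFold) else {}
    # All splitters share one interface: split() yields (train_idx, valid_idx) pairs.
    folds = list(cv.split(x, y, **kwargs))
    n_splits[name] = len(folds)
    print(f"{name:<20} {len(folds)} splits")

# Grouped K-Fold never puts the same group on both sides of a split.
for train_idx, valid_idx in GroupKFold(n_splits=5).split(x, y, groups=groups):
    assert not set(groups[train_idx]) & set(groups[valid_idx])
```

Leave P Out produces one split per combination of p samples (here C(10, 2) = 45), which is why it becomes expensive very quickly on real datasets.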
Train test split
Split the features x and the target y into a training set and a validation set:
from sklearn.model_selection import train_test_split

# hold out part of the data for validation (25% of the rows by default)
x_train, x_valid, y_train, y_valid = train_test_split(x, y)
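`train_test_split` also accepts parameters that are often worth setting explicitly. A minimal sketch, assuming NumPy arrays and a hypothetical imbalanced dataset (80 samples of class 0, 20 of class 1): `test_size` controls the validation fraction, `random_state` makes the split reproducible, and `stratify` keeps the class proportions equal in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 3 features, imbalanced labels (80 zeros, 20 ones).
x = np.arange(300).reshape(100, 3)
y = np.array([0] * 80 + [1] * 20)

x_train, x_valid, y_train, y_valid = train_test_split(
    x,
    y,
    test_size=0.2,    # keep 20% for validation (the default is 25%)
    random_state=42,  # fixed seed so the split is reproducible
    stratify=y,       # same class proportions in train and validation
)

print(x_train.shape, x_valid.shape)  # (80, 3) (20, 3)
print(y_valid.mean())                # 0.2
```

Without `stratify=y`, a small validation set drawn from imbalanced data can end up with very few (or no) samples of the minority class.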
Cross Validation
See the scikit-learn user guide for details: https://scikit-learn.org/stable/modules/cross_validation.html
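In plain K-Fold the data is cut into k folds; each fold is used once for validation while the model trains on the other k - 1. A minimal sketch, assuming scikit-learn and a synthetic dataset from `make_classification` (the logistic regression model is an arbitrary choice for illustration): the manual loop and `cross_val_score` compute the same per-fold scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data, generated only for illustration.
x, y = make_classification(n_samples=200, n_features=10, random_state=0)

# shuffle=True with a fixed random_state gives the same folds on every call.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Manual loop: train on k-1 folds, score on the remaining one.
model = LogisticRegression(max_iter=1000)
fold_scores = []
valid_seen = []
for train_idx, valid_idx in cv.split(x):
    model.fit(x[train_idx], y[train_idx])
    fold_scores.append(model.score(x[valid_idx], y[valid_idx]))
    valid_seen.extend(valid_idx.tolist())

# Every sample is used for validation exactly once.
assert sorted(valid_seen) == list(range(len(x)))

# Shortcut: cross_val_score clones the model and repeats the loop above.
scores = cross_val_score(LogisticRegression(max_iter=1000), x, y, cv=cv)
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the mean together with the spread across folds gives a sense of how stable the estimate is.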
Stratified Cross Validation
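from sklearn.model_selection import StratifiedKFold

# every fold keeps roughly the same class proportions of y as the full dataset
cv = StratifiedKFold(n_splits=5)
for train_idx, valid_idx in cv.split(x, y):
    # indexing with position arrays assumes NumPy arrays (use .iloc for pandas)
    x_train, y_train = x[train_idx], y[train_idx]
    x_valid, y_valid = x[valid_idx], y[valid_idx]
    ...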
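The benefit of stratification shows up clearly when the labels are imbalanced and sorted. A minimal sketch, assuming NumPy and a hypothetical label vector with 80 samples of class 0 followed by 20 of class 1: plain `KFold` (without shuffling) produces validation folds containing only one class, while `StratifiedKFold` gives every fold the overall 20% share of class 1.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels, sorted by class: 80 zeros, then 20 ones.
x = np.zeros((100, 1))
y = np.array([0] * 80 + [1] * 20)

ratios = {}
for name, cv in [("KFold", KFold(n_splits=5)), ("StratifiedKFold", StratifiedKFold(n_splits=5))]:
    # Fraction of class 1 inside each validation fold.
    ratios[name] = [y[valid_idx].mean() for _, valid_idx in cv.split(x, y)]
    print(f"{name:<16} {np.round(ratios[name], 2)}")

# KFold            [0. 0. 0. 0. 1.]
# StratifiedKFold  [0.2 0.2 0.2 0.2 0.2]
```

Shuffling in `KFold` would reduce the problem but not remove it; stratification guarantees the proportions by construction.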