Data augmenttion (High impact)

RandAugment MixUp CutMix

Input data representation

Hyperparameter Impact Examples
Numerical encoding High Normalization: StandardScaler, QuantileTransformer, RankGauss
Categorical encoding High Onehot with embedding layer (squre root od number of categories)
Image enc. (from scratch) High StandardScaling: Compute manually (per RGB channel) the mean and std
Image enc. (transfer learning) High StandardScaling: Use the original mean and std !!!
Image data agumentation High See …
Image size (resolution) Medium 120x120, 224x224, 512x512 (the higher the better)
CoordConv Low Encode sequence order for CNNs or tranformers (useful for spectograms)
Positional Encoding Low Encode sequence order for CNNs or tranformers

Model hyperparameters

Hyperparameter Impact Examples
Layer size High  
Num of layers (depth) Medium  
Weight Initialization Medium Xavier (Dense+tanh), Kaiming He (Dense+ReLU)
Transfer Learning High Pretrained model frozen, new layers unfrozen
Nonlinearity (act fn) Low ReLU, GELU, Swish, Mish, GLU
Residual connections Low Needed if there are a lot of layers (>3)
- Stochastic Depth Low If there are residual cons, add this. paper
Regularization Medium  
- Dropout Medium Scale after dropout to maintain std=1
- Dropconect Low  
Inner normalization    
- Batch normalization    
- Layer normalization   Usually before each layer
Weght tiying   If same input & output: Langmodel, Autoencoder
Squeeze and Excitation   paper

Training hyperparameters

Hyperparameter Impact Examples
Epochs High 3..6 if transfer learning, 20..40 if from scratch
BS Medium 16,32,64
LR High 0.001 is usally good but use Learning Rate Finder
LR sched Medium Constant, linear warmup + cosine decay
Opimizer Low Adam or LAMB are good
Early stopping Low Not neccesary usaully
loss (Regression) High MAE, MSE, MAPE, MSLE, Huber
loss (Classification) High BinaryCrossentropy, CategCrossentropy, Hinge, Focal
loss + weight decay Low  
loss + label smoothing Low  

LR Finder

There are sevaral ways of computing the lr automatically:

  Developed by Task in mind
Steep (green) Leslie Smith (Fastai) All tasks
Minimun (orange) Leslie Smith (Fastai) All tasks
Valley (red) ESRI (defualt Fastai) Vision tasks
Slide (purple) Novetta NLP tasks
  • https://www.novetta.com/2021/03/learning-rate/
  • https://forums.fast.ai/t/automated-learning-rate-suggester/44199

References