Convolutional Neural Network (CNN, ResNet)

Variable image size -> use global pooling:

Option 1: GlobalPool2d -> Linear(num_features, num_classes) (less computation)
Option 2: Conv2d(num_features, num_classes, 3, padding=1) -> GlobalPool2d

Note: in PyTorch, global average pooling is done with AdaptiveAvgPool2d(1).
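A minimal sketch of the two options, to check that both heads accept feature maps of any spatial size. The sizes (64 features, 10 classes) and the names `head_pool_linear` and `head_conv_pool` are illustrative assumptions, not taken from any particular model.

```python
import torch
import torch.nn as nn

num_features, num_classes = 64, 10  # illustrative sizes (assumption)

# Option 1: pool to 1x1 first, then classify with a linear layer.
head_pool_linear = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # (N, C, H, W) -> (N, C, 1, 1)
    nn.Flatten(),              # (N, C, 1, 1) -> (N, C)
    nn.Linear(num_features, num_classes),
)

# Option 2: predict class scores at every position, then average them.
head_conv_pool = nn.Sequential(
    nn.Conv2d(num_features, num_classes, 3, padding=1),  # (N, C, H, W) -> (N, classes, H, W)
    nn.AdaptiveAvgPool2d(1),                             # -> (N, classes, 1, 1)
    nn.Flatten(),                                        # -> (N, classes)
)

# Feed feature maps of two different spatial sizes through both heads.
shapes = []
for h, w in [(7, 7), (12, 9)]:
    feats = torch.randn(2, num_features, h, w)
    shapes.append((tuple(head_pool_linear(feats).shape),
                   tuple(head_conv_pool(feats).shape)))
print(shapes)
```

Option 1 is cheaper because the linear layer runs once per image on the pooled vector, while the 3x3 conv in Option 2 runs at every spatial position before pooling.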

Separable convolution (less computation)

MobileNet style	Xception style
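To see where the savings come from, here is a rough sketch that compares a standard 3x3 convolution with a depthwise separable one (a per-channel 3x3 conv followed by a 1x1 pointwise conv). It leaves out normalization and activations, whose placement differs between variants, and the channel counts are arbitrary assumptions.

```python
import torch
import torch.nn as nn

def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

c_in, c_out, k = 32, 64, 3  # illustrative sizes (assumption)

# Standard convolution: every output channel mixes all input channels over a k x k window.
standard = nn.Conv2d(c_in, c_out, k, padding=1, bias=False)

# Depthwise separable convolution: spatial filtering and channel mixing are split.
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, k, padding=1, groups=c_in, bias=False),  # depthwise: one k x k filter per channel
    nn.Conv2d(c_in, c_out, 1, bias=False),                          # pointwise: 1x1 channel mixing
)

x = torch.randn(1, c_in, 16, 16)
standard_shape = tuple(standard(x).shape)
separable_shape = tuple(separable(x).shape)

standard_params = count_params(standard)    # c_in * c_out * k * k
separable_params = count_params(separable)  # c_in * k * k + c_in * c_out
print(standard_params, separable_params)
```

Both layers produce the same output shape, but with these sizes the separable version uses 2,336 weights instead of 18,432.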

SOTA CNNs

Model	Description	Paper date
Inception v3		Dec 2015
ResNet	After two 3x3 convs, add the block input (residual connection)	Dec 2015
SqueezeNet		Feb 2016
DenseNet	Concatenate the outputs of previous layers	Aug 2016
Xception	Depthwise Separable Convolutions	Oct 2016
ResNeXt		Nov 2016
DPN	Dual Path Network	Jul 2017
SENet	Squeeze and Excitation (per-channel weights)	Sep 2017
EfficientNet	Rethinking Model Scaling	May 2019
Noisy Student	Self-training	Nov 2019
NFNet	Normalization Free Convnets	Feb 2021
EfficientNetV2	Smaller Models and Faster Training	Apr 2021
ResNet strikes back	An improved training procedure in timm	Oct 2021
ConvNeXt	A ConvNet for the 2020s	Jan 2022
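The three building blocks described in the table (sum with the block input, concatenation of previous features, per-channel weights) can be sketched as small PyTorch modules. These are simplified illustrations under assumed sizes, not the exact blocks from the papers; the class names and the `growth` and `reduction` parameters are placeholders chosen here.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """ResNet idea: two 3x3 convs, then add the block input."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)  # skip connection: sum with the input

class DenseLayer(nn.Module):
    """DenseNet idea: append new features to everything computed so far."""
    def __init__(self, in_channels, growth):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, growth, 3, padding=1, bias=False)

    def forward(self, x):
        return torch.cat([x, self.conv(x)], dim=1)  # channels grow by `growth`

class SqueezeExcite(nn.Module):
    """SENet idea: pool each channel globally, predict a weight in (0, 1) per channel, rescale."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: (N, C, H, W) -> (N, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # excitation: per-channel weights
        )

    def forward(self, x):
        return x * self.gate(x)  # broadcast the weights over H and W

x = torch.randn(2, 16, 8, 8)
res_out = ResidualBlock(16)(x)
dense_out = DenseLayer(16, growth=12)(x)
se_out = SqueezeExcite(16)(x)
print(res_out.shape, dense_out.shape, se_out.shape)
```

The residual and squeeze-and-excitation blocks keep the input shape, while the dense layer grows the channel dimension from 16 to 28.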

Check the timm benchmark results

Code from the interactive Google Colab:

! pip install pandas duckdb plotly
! git clone https://github.com/rwightman/pytorch-image-models.git
%cd pytorch-image-models/results

import pandas as pd
import plotly.express as px
import duckdb

db = duckdb.connect()
data = db.execute("""
SELECT * 
FROM 'model_benchmark_amp_nhwc_rtx3090.csv' b 
  JOIN 'results-imagenet-real.csv' r
  ON b.model = r.model
WHERE b.infer_batch_size = 256;
""").fetch_df()

# Rough family label: the longest "_"-separated token of the model name
data['family'] = data.model.map(lambda name: sorted(name.split('_'), key=len)[-1])

px.scatter(
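    # Keep only models with infer_step_time < 250 so the plot stays readable;
    # pass `data` instead to show every model
    data[data.infer_step_time < 250], 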
    x='infer_step_time', 
    y='top1', 
    color='family',
    size='param_count',
    width=1200, 
    height=1000, 
    hover_name='model',
    hover_data=['infer_samples_per_sec', 'infer_img_size']
)
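The `family` column comes from a simple heuristic: split the model name on underscores and keep the longest token. A small standalone check on a few timm-style names shows what it produces and where it is only approximate. The helper name `family` and the list of example names are chosen here for illustration.

```python
def family(name):
    # Same rule as in the notebook code: split on "_" and keep the longest token.
    # sorted() is stable, so on a tie the later token wins.
    return sorted(name.split('_'), key=len)[-1]

examples = [
    'resnet50',
    'efficientnet_b0',
    'tf_efficientnetv2_s',
    'convnext_tiny',
    'vit_base_patch16_224',
]
families = {name: family(name) for name in examples}
for name, fam in families.items():
    print(f'{name:24s} -> {fam}')
```

The rule works well when the architecture name is the longest token, but `vit_base_patch16_224` maps to `patch16`, and names without underscores such as `resnet50` become their own family, so the colors in the plot are only an approximate grouping.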