🔮 Time Series

OPTION A: Generic models

  1. Feature engineering: generate many potentially useful features
    • Generating 500 or 1000 features is normal
    • tsfresh: automatically calculates time series features
    • Trane
  2. Feature selection: keep only the useful features
    • Recursive Feature Elimination (RFE)
    • Shapley feature importances
    • shap-hypetune
  3. Train any ML/DL model
    • Gradient Boosting (LightGBM)
    • Neural network (feedforward, TabNet)
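
The three steps of Option A can be sketched end to end on a toy series. This is only a minimal illustration under stated assumptions: the series is synthetic, the features are a handful of hand-rolled lags and rolling statistics rather than the hundreds that tsfresh would generate, and scikit-learn's GradientBoostingRegressor stands in for LightGBM to keep the dependencies small.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFE

# Hypothetical data: a noisy daily cycle sampled hourly
rng = np.random.default_rng(0)
t = np.arange(400)
series = pd.Series(np.sin(2 * np.pi * t / 24) + rng.normal(scale=0.1, size=t.size))

# 1. Feature engineering: lags and rolling statistics built from past values only
feats = pd.DataFrame(index=series.index)
for lag in range(1, 13):
    feats[f"lag_{lag}"] = series.shift(lag)
for window in (3, 6, 12):
    past = series.shift(1).rolling(window)  # shift(1) keeps the current value out
    feats[f"roll_mean_{window}"] = past.mean()
    feats[f"roll_std_{window}"] = past.std()

data = feats.assign(y=series).dropna()
X, y = data.drop(columns="y"), data["y"]

# Time-ordered split: train on the past, test on the future
split = int(len(data) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

# 2. Feature selection: Recursive Feature Elimination down to 5 features
selector = RFE(
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    n_features_to_select=5,
    step=2,
)
selector.fit(X_train, y_train)
selected = X.columns[selector.support_].tolist()

# 3. Train the final model on the selected features only
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train[selected], y_train)
mae = float(np.mean(np.abs(model.predict(X_test[selected]) - y_test)))
print(selected, round(mae, 3))
```

Fitting the selector on the training part only keeps information from the test period out of the feature choice.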

OPTION B: TimeSeries-specific models (see Darts)

🛠 Date and time feature engineering

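# Simple
import numpy as np

def featEng_date(df, varName):
    """Add calendar features derived from the datetime column `varName`."""
    df['year']         = df[varName].dt.year.astype(np.int16)
    df['month']        = df[varName].dt.month.astype(np.int8)
    # .dt.weekofyear was removed in pandas 2.0; use isocalendar().week instead
    df['week']         = df[varName].dt.isocalendar().week.astype(np.int8)
    df['day_of_year']  = df[varName].dt.dayofyear.astype(np.int16)
    df['day_of_month'] = df[varName].dt.day.astype(np.int8)
    df['day_of_week']  = df[varName].dt.dayofweek.astype(np.int8)
    df['hour']         = df[varName].dt.hour.astype(np.int8)
    df['minute']       = df[varName].dt.minute.astype(np.int8)
    # Assumes a Saturday/Sunday weekend (dayofweek: Monday=0 ... Sunday=6)
    df['is_weekend']   = (df[varName].dt.dayofweek >= 5).astype(np.int8)
    # To do: df['is_vacation'] (needs a holiday/vacation calendar)
    return df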

# Advanced: Aggregates
# Rolling-window sizes; the older aliases "15T", "1H", "3H" are deprecated since pandas 2.2
periods    = ["15min", "1h", "3h"]
aggregates = ["count", "mean", "std", "min", "max", "sum", "median"]
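
One way these two lists might be used is to compute every aggregate over every time-based rolling window. The sketch below assumes a series with a DatetimeIndex; the sample data, the `rolling_aggregates` helper, and the column naming scheme are all hypothetical.

```python
import numpy as np
import pandas as pd

periods = ["15min", "1h", "3h"]
aggregates = ["count", "mean", "std", "min", "max", "sum", "median"]

# Hypothetical data: one reading per minute for four hours
index = pd.date_range("2024-01-01", periods=240, freq="min")
rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=len(index)), index=index, name="value")

def rolling_aggregates(series, periods, aggregates):
    """Return one column per (period, aggregate) pair."""
    feats = {}
    for period in periods:
        # Time-based window that ends at (and includes) the current row
        window = series.rolling(period)
        for agg in aggregates:
            feats[f"{series.name}_{period}_{agg}"] = getattr(window, agg)()
    return pd.DataFrame(feats)

feats = rolling_aggregates(series, periods, aggregates)
print(feats.shape)
```

Each window includes the current observation, so when the series itself is the forecasting target the features should be shifted by one step before training to avoid leaking the value being predicted.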

Validation

Learn to extrapolate

Kaggle discussion

import numpy as np
import matplotlib.pyplot as plt

x = np.random.uniform(low=-10, high=10, size=1000) # np.arange(-10, 10, 0.1)
y = np.sin(x) + np.random.normal(scale=0.2, size=x.shape)
plt.scatter(x,y, s=5)

Read this article: Caution with Random Forest
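
The caution about Random Forest can be reproduced in a few lines. This is a minimal sketch, assuming a synthetic linear trend (not the sine data above) so that the effect is easy to see: a forest can only average target values it saw during training, so its predictions stay flat outside the training range, while a linear model keeps following the trend.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Hypothetical training data: y grows linearly with x, observed only on [0, 10]
rng = np.random.default_rng(0)
x_train = rng.uniform(low=0, high=10, size=500)
y_train = 2.0 * x_train + rng.normal(scale=0.5, size=x_train.shape)
X_train = x_train.reshape(-1, 1)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

# Points far outside the training range
X_new = np.array([[15.0], [20.0], [30.0]])
forest_pred = forest.predict(X_new)
linear_pred = linear.predict(X_new)

print("forest:", forest_pred)  # the same value for all three points
print("linear:", linear_pred)  # keeps increasing with x
```

For time series with a trend, this is why tree-based models are often trained on differenced or detrended targets instead of raw levels.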

References