🔮 Time Series
OPTION A: Generic models
- Feature engineering: generate many potentially useful features
- Feature selection: keep only the useful features
  - Recursive Feature Elimination (RFE)
  - Shapley feature importances
  - shap-hypetune
- Train any ML/DL model
  - Gradient Boosting (LightGBM)
  - Neural Network (feed-forward, TabNet)
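A minimal sketch of the Option A workflow (engineer features, select a subset, train a model). The synthetic series, the `make_features` helper and the choice of 12 lags are assumptions made for illustration, and scikit-learn's `GradientBoostingRegressor` stands in for LightGBM so the example needs only NumPy, pandas and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFE

# Synthetic hourly series with a daily cycle (illustrative data only)
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=500, freq="60min")
y = pd.Series(
    np.sin(np.arange(500) * 2 * np.pi / 24) + rng.normal(scale=0.1, size=500),
    index=idx,
)

def make_features(y, n_lags=12):
    """Feature engineering: lagged values of the series plus the hour of day."""
    feats = {f"lag_{k}": y.shift(k) for k in range(1, n_lags + 1)}
    feats["hour"] = pd.Series(y.index.hour, index=y.index)
    return pd.DataFrame(feats).dropna()  # the first n_lags rows have no full history

X = make_features(y)
target = y.loc[X.index]

# Time-ordered split: train on the past, evaluate on the future
split = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = target.iloc[:split], target.iloc[split:]

# Feature selection: RFE repeatedly drops the least important feature
selector = RFE(
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    n_features_to_select=5,
)
selector.fit(X_train, y_train)
selected = list(X.columns[selector.support_])

# Train the final model on the selected features only
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train[selected], y_train)
mae = float(np.mean(np.abs(model.predict(X_test[selected]) - y_test)))
print(selected, round(mae, 3))
```

The split is chronological rather than shuffled, so the reported error reflects forecasting unseen future values.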
OPTION B: Time-series-specific models (see Darts)
- Classic models (univariate)
  - ARIMA
  - ARIMAX
  - Fractal Analysis
  - Facebook Prophet
- Deep Learning
  - LSTM
  - N-BEATS
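To make the idea behind the classic univariate models concrete, here is a rough sketch of the autoregressive (AR) part of ARIMA, fitted by ordinary least squares with NumPy only. This is a simplified stand-in, not the estimator that Darts or any other library uses; `fit_ar` and `forecast_ar` are hypothetical helper names, and differencing and moving-average terms are left out.

```python
import numpy as np

def fit_ar(y, p):
    """Fit y[t] = c + a1*y[t-1] + ... + ap*y[t-p] by ordinary least squares."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Each row holds [1, y[t-1], ..., y[t-p]] for t = p .. n-1
    X = np.column_stack([np.ones(n - p)] + [y[p - k : n - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, a1, ..., ap]

def forecast_ar(y, coef, steps):
    """Recursive multi-step forecast: each prediction is fed back as a lag."""
    p = len(coef) - 1
    history = list(np.asarray(y, dtype=float)[-p:])
    out = []
    for _ in range(steps):
        lags = history[::-1][:p]  # most recent value first
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        out.append(nxt)
        history.append(nxt)
    return np.array(out)

# Simulated AR(1) process with coefficient 0.8 (illustrative data only)
rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.8 * y[t - 1] + rng.normal(scale=0.1)

coef = fit_ar(y, p=1)
forecast = forecast_ar(y, coef, steps=5)
print(coef, forecast)
```

The model only sees the series' own past values, which is the sense in which these models are univariate.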
🛠 Time feature engineering
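# Simple
import numpy as np

def featEng_date(df, varName):
    """Add calendar features derived from the datetime column `varName` (modifies df in place)."""
    df['year'] = df[varName].dt.year.astype(np.int16)
    df['month'] = df[varName].dt.month.astype(np.int8)
    # .dt.weekofyear was removed in pandas 2.0; isocalendar().week replaces it
    df['week'] = df[varName].dt.isocalendar().week.astype(np.int8)
    df['day_of_year'] = df[varName].dt.dayofyear.astype(np.int16)
    df['day_of_month'] = df[varName].dt.day.astype(np.int8)
    df['day_of_week'] = df[varName].dt.dayofweek.astype(np.int8)
    df['hour'] = df[varName].dt.hour.astype(np.int8)
    df['minute'] = df[varName].dt.minute.astype(np.int8)
    df['is_weekend'] = df[varName].dt.dayofweek >= 5  # Monday=0, so Saturday=5 and Sunday=6
    # df['is_vacation'] = ...  # To do: needs a holiday/vacation calendar
    return df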
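One possible way to fill in the `is_vacation` flag is to compare each timestamp's date against a calendar of days off. The `HOLIDAYS` list and the `add_vacation_flag` helper below are hypothetical placeholders; the real dates depend on the country and the dataset.

```python
import pandas as pd

# Hypothetical calendar: replace with the holidays/vacations relevant to your data
HOLIDAYS = pd.to_datetime(["2021-01-01", "2021-12-25"])

def add_vacation_flag(df, var_name):
    """Return a copy of df with a boolean is_vacation column."""
    out = df.copy()
    # normalize() drops the time of day so timestamps compare against whole dates
    out["is_vacation"] = out[var_name].dt.normalize().isin(HOLIDAYS)
    return out

df = pd.DataFrame(
    {"ts": pd.to_datetime(["2021-01-01 10:00", "2021-01-02 08:30", "2021-12-25 23:59"])}
)
flags = add_vacation_flag(df, "ts")
print(flags)
```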
# Advanced: Aggregates
periods = ["15min", "1h", "3h"]  # the "T" and "H" aliases are deprecated since pandas 2.2
aggregates = ["count", "mean", "std", "min", "max", "sum", "median"]
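The notes stop at the two lists, so the following is only a guess at how they might be combined: one trailing time window per period, one column per aggregate. The `add_rolling_aggregates` helper and the column naming scheme are assumptions, and the window strings use the current pandas spellings of the same 15-minute, 1-hour and 3-hour periods.

```python
import pandas as pd

periods = ["15min", "1h", "3h"]
aggregates = ["count", "mean", "std", "min", "max", "sum", "median"]

def add_rolling_aggregates(df, time_col, value_col):
    """Build one feature per (period, aggregate) pair over a trailing time window."""
    # Time-based rolling windows need a sorted datetime index
    series = df.sort_values(time_col).set_index(time_col)[value_col]
    out = {}
    for period in periods:
        rolled = series.rolling(period).agg(aggregates)
        for agg in aggregates:
            out[f"{value_col}_{period}_{agg}"] = rolled[agg]
    return pd.DataFrame(out)

# Four observations, ten minutes apart (illustrative data only)
df = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            ["2021-01-01 00:00", "2021-01-01 00:10", "2021-01-01 00:20", "2021-01-01 00:30"]
        ),
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)
feats = add_rolling_aggregates(df, "ts", "value")
print(feats[["value_15min_sum", "value_1h_sum"]])
```

Each window includes the current row. If `value` is the prediction target, passing `closed="left"` to `rolling` excludes the current row and avoids leaking the target into its own features.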
Validation
Learn to extrapolate
import numpy as np
import matplotlib.pyplot as plt

# Noisy sine wave sampled at random points in [-10, 10]
x = np.random.uniform(low=-10, high=10, size=1000)  # alternative: np.arange(-10, 10, 0.1)
y = np.sin(x) + np.random.normal(scale=0.2, size=x.shape)
plt.scatter(x, y, s=5)
plt.show()
Read this article: Caution with Random Forest
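The caution most likely concerns extrapolation: a tree ensemble predicts by averaging training targets inside leaves, so beyond the training range its output stays flat. The sketch below checks this on the same noisy sine data; the seed, the model settings and the query points are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Same setup as the notes: noisy sine sampled on [-10, 10]
rng = np.random.default_rng(0)
x_train = rng.uniform(low=-10, high=10, size=1000)
y_train = np.sin(x_train) + rng.normal(scale=0.2, size=x_train.shape)

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(x_train.reshape(-1, 1), y_train)

# Inside the training range the forest tracks the sine reasonably well
x_in = np.linspace(-9, 9, 200)
mae_in = float(np.mean(np.abs(forest.predict(x_in.reshape(-1, 1)) - np.sin(x_in))))

# Outside the range every query lands in the same right-most leaves,
# so the prediction is one constant however far out we go
x_out = np.array([11.0, 15.0, 50.0, 100.0])
pred_out = forest.predict(x_out.reshape(-1, 1))

print("in-range MAE:", round(mae_in, 3))
print("out-of-range predictions:", pred_out)
```

For time series this matters whenever the future leaves the range seen in training, for example a target with a trend, which is one reason to validate on data that comes after the training period.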
References
- Kaggle competition: Optiver Realized Volatility Prediction