Neural data-to-text (NLP)
Use neural models for data verbalization.
265 technical terms and definitions
Neural architecture encoders convert graph structures into fixed-dimensional vectors for predictor training.
Apple's specialized hardware for on-device ML.
Neural fabrics represent search spaces as trellis structures where NAS learns to select paths through pre-defined computational building blocks.
# Neural Hawkes Process and Time Series Models

## 1. Introduction

A **Hawkes process** is a self-exciting point process used to model events that occur randomly in continuous time, where the occurrence of past events increases the likelihood of future events.

**Key characteristics:**

- Events occur at random times $t_1, t_2, t_3, \ldots$
- Past events "excite" or increase the probability of future events
- The process has memory—history matters
- Widely used in finance, seismology, social networks, and neuroscience

## 2. Classical Hawkes Process

### 2.1 Intensity Function

The **conditional intensity function** $\lambda(t)$ represents the instantaneous rate of event occurrence:

$$
\lambda(t) = \mu + \sum_{t_i < t} \phi(t - t_i)
$$

**Where:**

- $\mu > 0$ — Base intensity (background rate)
- $\phi(\cdot)$ — Triggering kernel (excitation function)
- $t_i$ — Times of past events
- $\sum_{t_i < t}$ — Sum over all events before time $t$

### 2.2 Common Triggering Kernels

**Exponential kernel (most common):**

$$
\phi(\tau) = \alpha \cdot e^{-\beta \tau}
$$

- $\alpha > 0$ — Excitation magnitude
- $\beta > 0$ — Decay rate
- Constraint: $\frac{\alpha}{\beta} < 1$ for stationarity

**Power-law kernel:**

$$
\phi(\tau) = \frac{\alpha}{(\tau + c)^{(1+\omega)}}
$$

- Used in seismology (Omori's law)
- Heavier tails than exponential

### 2.3 Likelihood Function

For a sequence of events $\{t_1, t_2, \ldots, t_n\}$ in interval $[0, T]$:

$$
\mathcal{L} = \prod_{i=1}^{n} \lambda(t_i) \cdot \exp\left( -\int_0^T \lambda(s) \, ds \right)
$$

**Log-likelihood:**

$$
\log \mathcal{L} = \sum_{i=1}^{n} \log \lambda(t_i) - \int_0^T \lambda(s) \, ds
$$

### 2.4 Branching Structure

The Hawkes process has a **branching interpretation:**

- **Immigrants:** Events from the background rate $\mu$
- **Offspring:** Events triggered by previous events
- **Branching ratio:** $n^* = \int_0^\infty \phi(\tau) \, d\tau$
  - If $n^* < 1$: Process is subcritical (stationary)
  - If $n^* = 1$: Process is critical
  - If $n^* > 1$: Process is supercritical (explosive)
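To make Sections 2.1–2.3 concrete, here is a minimal NumPy sketch (illustrative helper names, not taken from any of the libraries cited later) that evaluates the exponential-kernel intensity and its closed-form log-likelihood:

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i))."""
    past = events[events < t]
    return mu + np.sum(alpha * np.exp(-beta * (t - past)))

def hawkes_log_likelihood(events, T, mu, alpha, beta):
    """Log-likelihood of an exponential-kernel Hawkes process on [0, T]."""
    events = np.sort(np.asarray(events, dtype=float))
    log_sum = 0.0
    for i, t_i in enumerate(events):
        lam = mu + np.sum(alpha * np.exp(-beta * (t_i - events[:i])))
        log_sum += np.log(lam)
    # Compensator: integral of lambda(s) ds over [0, T], closed form for this kernel
    compensator = mu * T + np.sum((alpha / beta) * (1.0 - np.exp(-beta * (T - events))))
    return log_sum - compensator

# Toy usage: three events on [0, 10] with a subcritical kernel (alpha / beta < 1)
events = np.array([1.0, 2.5, 6.0])
print(hawkes_intensity(7.0, events, mu=0.2, alpha=0.8, beta=1.5))
print(hawkes_log_likelihood(events, T=10.0, mu=0.2, alpha=0.8, beta=1.5))
```

For the exponential kernel the compensator $\int_0^T \lambda(s)\, ds$ has the closed form used above, so no numerical integration is needed.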
## 3. Neural Hawkes Process

### 3.1 Motivation

**Limitations of classical Hawkes processes:**

- Parametric kernels may not capture complex dynamics
- Difficult to model **inhibition** (events reducing future probability)
- Limited expressiveness for multi-type event interactions
- Manual feature engineering required

**Solution:** Replace parametric components with neural networks.

### 3.2 Continuous-Time LSTM (CT-LSTM)

The Neural Hawkes Process (Mei & Eisner, 2017) uses a **continuous-time LSTM** where the hidden state evolves between events.

**Standard LSTM update at event $t_i$:**

$$
\begin{aligned}
i_i &= \sigma(W_i x_i + U_i h_{i-1} + b_i) \\
f_i &= \sigma(W_f x_i + U_f h_{i-1} + b_f) \\
o_i &= \sigma(W_o x_i + U_o h_{i-1} + b_o) \\
\tilde{c}_i &= \tanh(W_c x_i + U_c h_{i-1} + b_c) \\
c_i &= f_i \odot c_{i-1} + i_i \odot \tilde{c}_i \\
h_i &= o_i \odot \tanh(c_i)
\end{aligned}
$$

**Where:**

- $i_i$ — Input gate
- $f_i$ — Forget gate
- $o_i$ — Output gate
- $c_i$ — Cell state
- $h_i$ — Hidden state
- $\sigma(\cdot)$ — Sigmoid function
- $\odot$ — Element-wise multiplication

### 3.3 Continuous-Time Dynamics

**Key innovation:** Cell state decays continuously between events.

**Cell state at time $t$ (between events $t_i$ and $t_{i+1}$):**

$$
c(t) = \bar{c}_i + (c_i - \bar{c}_i) \cdot e^{-\delta_i (t - t_i)}
$$

**Where:**

- $c_i$ — Cell state immediately after event $t_i$
- $\bar{c}_i$ — Target cell state (what $c(t)$ decays toward)
- $\delta_i > 0$ — Decay rate (learned)
- $t - t_i$ — Time elapsed since last event

**Target cell state:**

$$
\bar{c}_i = \bar{f}_i \odot \bar{c}_{i-1} + \bar{i}_i \odot \tilde{c}_i
$$

**Hidden state at time $t$:**

$$
h(t) = o_i \odot \tanh(c(t))
$$

### 3.4 Intensity Function

The intensity for event type $k$ at time $t$:

$$
\lambda_k(t) = f_k(h(t)) = \text{softplus}(w_k^\top h(t) + b_k)
$$

**Softplus function:**

$$
\text{softplus}(x) = \log(1 + e^x)
$$

**Properties:**

- Ensures $\lambda_k(t) > 0$ (intensity must be positive)
- Smooth approximation to ReLU
- Allows for both excitation and inhibition

### 3.5 Training Objective

**Negative log-likelihood:**

$$
\mathcal{L} = -\sum_{i=1}^{n} \log \lambda_{k_i}(t_i) + \sum_{k=1}^{K} \int_0^T \lambda_k(s) \, ds
$$

**Where:**

- $k_i$ — Type of the $i$-th event
- $K$ — Total number of event types
- The integral is computed via Monte Carlo sampling or numerical integration

### 3.6 Architecture Summary

```
Input: Event sequence {(t_1, k_1), (t_2, k_2), ..., (t_n, k_n)}
        │
        ▼
┌───────────────────┐
│  Event Embedding  │
│  x_i = embed(k_i) │
└───────────────────┘
        │
        ▼
┌───────────────────┐
│   CT-LSTM Cell    │
│    c(t), h(t)     │
└───────────────────┘
        │
        ▼
┌───────────────────┐
│  Intensity Layer  │
│ λ_k(t) = softplus │
└───────────────────┘
        │
        ▼
Output: λ(t) for prediction, NLL for training
```

## 4. Relationship to Time Series Models

### 4.1 Comparison Table

| Aspect | Traditional Time Series | Point Processes |
|:-------|:-----------------------|:----------------|
| **Data** | Regular samples $y_1, y_2, \ldots$ | Event times $t_1, t_2, \ldots$ |
| **Question** | What is $y$ at time $t$? | When does next event occur? |
| **Spacing** | Fixed intervals $\Delta t$ | Irregular, continuous |
| **Models** | ARIMA, GARCH, RNN | Poisson, Hawkes, Neural TPP |

### 4.2 Key Differences

**Time series models:**

- Observations at fixed time intervals: $y_t, y_{t+1}, y_{t+2}, \ldots$
- Model the value/magnitude of observations
- Examples: stock prices, temperature, sensor readings

**Point processes:**

- Events at irregular, continuous times: $t_1, t_2, t_3, \ldots$
- Model **when** events occur (and optionally what type)
- Examples: transactions, earthquakes, social media posts

### 4.3 Connections

**Converting between representations:**

- **Point process → Time series:** Count events in fixed bins
  $$N_t = \#\{t_i : t_i \in [t, t+\Delta t)\}$$
- **Time series → Point process:** Treat threshold crossings as events

**Shared neural architectures:**

- Both use RNNs, LSTMs, Transformers
- Attention mechanisms applicable to both
- Encoder-decoder frameworks common
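Before turning to extensions, here is a small NumPy sketch tying together Sections 3.3–3.5: cell-state decay, the softplus intensity, and a Monte Carlo estimate of the integral term in the NLL. The toy parameters are random stand-ins for quantities a trained CT-LSTM would produce, and all function names are illustrative.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def cell_state(t, t_i, c_i, c_bar_i, delta_i):
    """c(t) = c_bar + (c_i - c_bar) * exp(-delta * (t - t_i)) between events."""
    return c_bar_i + (c_i - c_bar_i) * np.exp(-delta_i * (t - t_i))

def intensity(t, t_i, c_i, c_bar_i, delta_i, o_i, w_k, b_k):
    """lambda_k(t) = softplus(w_k . h(t) + b_k), with h(t) = o_i * tanh(c(t))."""
    h_t = o_i * np.tanh(cell_state(t, t_i, c_i, c_bar_i, delta_i))
    return softplus(w_k @ h_t + b_k)

def mc_integral(t_i, t_next, n_samples, *args):
    """Monte Carlo estimate of the NLL integral term over one inter-event interval."""
    s = np.random.uniform(t_i, t_next, n_samples)
    vals = np.array([intensity(si, t_i, *args) for si in s])
    return (t_next - t_i) * vals.mean()

# Toy parameters: 4-dimensional hidden state, one event type
rng = np.random.default_rng(0)
d = 4
c_i, c_bar_i = rng.normal(size=d), rng.normal(size=d)
delta_i, o_i = np.abs(rng.normal(size=d)), rng.uniform(size=d)
w_k, b_k = rng.normal(size=d), 0.0
print(intensity(1.3, 1.0, c_i, c_bar_i, delta_i, o_i, w_k, b_k))
print(mc_integral(1.0, 2.0, 1000, c_i, c_bar_i, delta_i, o_i, w_k, b_k))
```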
## 5. Modern Extensions

### 5.1 Transformer Hawkes Process

**Reference:** Zuo et al., 2020

**Key idea:** Replace RNN with self-attention mechanism.

**Advantages:**

- Parallelizable training (no sequential dependency)
- Better long-range dependency modeling
- Scales to longer sequences

**Self-attention for events:**

$$
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right) V
$$

**Temporal encoding:**

$$
\text{PE}(t, 2i) = \sin\left(\frac{t}{10000^{2i/d}}\right)
$$

$$
\text{PE}(t, 2i+1) = \cos\left(\frac{t}{10000^{2i/d}}\right)
$$

### 5.2 Neural Jump SDEs

**Combines:**

- Continuous diffusion dynamics (SDEs)
- Discrete jumps (point processes)

**Formulation:**

$$
dX_t = f(X_t) \, dt + g(X_t) \, dW_t + h(X_t) \, dN_t
$$

**Where:**

- $f(X_t) \, dt$ — Drift term
- $g(X_t) \, dW_t$ — Diffusion (Brownian motion)
- $h(X_t) \, dN_t$ — Jump term (point process)

### 5.3 Variational Approaches

**Variational Autoencoder for Point Processes:**

$$
\mathcal{L}_{\text{ELBO}} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{\text{KL}}(q(z|x) \| p(z))
$$

**Benefits:**

- Uncertainty quantification
- Latent structure discovery
- Generative modeling

### 5.4 Marked Temporal Point Processes

Events carry additional information (**marks**):

$$
\{(t_1, m_1), (t_2, m_2), \ldots, (t_n, m_n)\}
$$

**Joint intensity:**

$$
\lambda(t, m) = \lambda_g(t) \cdot f(m | t, \mathcal{H}_t)
$$

- $\lambda_g(t)$ — Ground intensity (when)
- $f(m | t, \mathcal{H}_t)$ — Mark distribution (what)

## 6. Applications

### 6.1 When to Use Neural Hawkes

**Good fit:**

- Event data with self-exciting patterns
- Multiple interacting event types
- Complex, nonlinear dependencies
- Large datasets where neural networks can generalize

**Specific domains:**

- **Finance:** High-frequency trading, order book dynamics
- **Social networks:** Information cascades, retweets, viral content
- **Healthcare:** Patient events, hospital admissions, disease outbreaks
- **Criminology:** Crime prediction, recidivism modeling
- **Seismology:** Earthquake aftershock prediction
- **Neuroscience:** Neural spike train modeling

### 6.2 When to Consider Alternatives

| Scenario | Recommended Alternative |
|:---------|:-----------------------|
| Regularly sampled data | Standard time series (ARIMA, LSTM) |
| Need interpretability | Classical Hawkes with explicit kernels |
| Very sparse data | Simple parametric models |
| Real-time constraints | Lightweight models, online learning |

### 6.3 Implementation Resources

**Libraries:**

- `tick` (Python) — Classical point processes
- `PtPack` (Python) — Neural temporal point processes
- `pytorch-transformer-hawkes` — Transformer-based models

**Key papers:**

- Mei & Eisner (2017): "The Neural Hawkes Process"
- Zuo et al. (2020): "Transformer Hawkes Process"
- Du et al. (2016): "Recurrent Marked Temporal Point Processes"

## 7. Mathematical Reference

### Core Equations Reference

**Classical Hawkes intensity:**

$$
\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta(t - t_i)}
$$

**Neural Hawkes continuous-time cell:**

$$
c(t) = \bar{c}_i + (c_i - \bar{c}_i) e^{-\delta_i(t - t_i)}
$$

**Neural intensity function:**

$$
\lambda_k(t) = \text{softplus}(w_k^\top h(t) + b_k)
$$

**Log-likelihood:**

$$
\log \mathcal{L} = \sum_{i=1}^{n} \log \lambda_{k_i}(t_i) - \int_0^T \sum_{k=1}^{K} \lambda_k(s) \, ds
$$
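As a closing illustration of how these models are used for prediction and simulation, below is a sketch of Ogata's thinning algorithm for sampling the next event time of a classical exponential-kernel Hawkes process (illustrative code, not drawn from the libraries in Section 6.3). It relies on the fact that, with no new events, the intensity is non-increasing, so the current intensity bounds all later values until the next event.

```python
import numpy as np

def sample_next_event(events, t_now, mu, alpha, beta, rng):
    """Ogata thinning: sample the next event time after t_now, given history `events`,
    for an exponential-kernel Hawkes process with parameters mu, alpha, beta."""
    events = np.asarray(events, dtype=float)
    t = t_now
    while True:
        # Upper bound on the intensity beyond t (intensity decays until the next event)
        lam_bar = mu + np.sum(alpha * np.exp(-beta * (t - events[events <= t])))
        t = t + rng.exponential(1.0 / lam_bar)        # candidate time from the bound
        lam_t = mu + np.sum(alpha * np.exp(-beta * (t - events[events <= t])))
        if rng.uniform() <= lam_t / lam_bar:          # accept with probability lambda(t)/lam_bar
            return t

rng = np.random.default_rng(42)
history = np.array([0.5, 1.2, 1.3])
print(sample_next_event(history, t_now=1.3, mu=0.2, alpha=0.8, beta=1.5, rng=rng))
```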
Networks representing shapes.
Learn surfaces as zero level sets.
Encode mesh with networks.
Neural meshes combine explicit mesh topology with learned vertex features and deformations.
Dynamically compose modules for reasoning.
Compose neural modules for reasoning.
Learn system dynamics with neural networks.
Limiting behavior of infinitely wide networks.
ML models for potential energy surfaces.
Remove weights for edge deployment.
Modify trained networks.
Use NNs to optimize recipes.
Neural ODEs for graphs model continuous-time graph dynamics through differential equations.
Neural networks defined by differential equations.
Learn operators between function spaces.
Model dynamics as continuous-time ODEs learned by neural networks.
Graph neural network-based neural predictors encode architectures as computation graphs for performance estimation.
Neural predictors are meta-models trained to estimate architecture performance from encoded representations, enabling efficient search-space exploration.
Use neural networks to generate programs.
Neural radiance fields represent 3D scenes as neural networks mapping coordinates to color and density.
Represent scenes as neural fields.
Detailed NeRF techniques.
NeRF for video.
Use neural networks for rendering.
Mathematical relationships between loss and model size/data/compute.
Learn motion in neural representation.
Neural scene graphs represent scenes as compositional graphs of objects with neural appearance.
Encode 3D scenes in neural networks.
Learn stochastic dynamics with neural networks.
Use style transfer to understand representations.
Apply artistic style to content.
Neural tangent kernel theory informs NAS by predicting training dynamics from architecture initialization properties.
Analyze neural networks via kernel methods.
Neural networks for automated theorem proving.
Neural Transducer generalizes RNN-T with different encoder and predictor architectures for streaming ASR.
Networks with differentiable external memory and read/write heads.
Convert acoustic features back into audio waveforms.
Volumetric representation for video.
Elon Musk's brain-computer interface company.
# NeuralProphet and Time Series Models

A comprehensive guide to NeuralProphet and modern time series forecasting approaches.

## 1. Introduction to NeuralProphet

NeuralProphet is a **neural network-based time series forecasting library** built on PyTorch, designed as a successor to Facebook's Prophet.

### Key Characteristics

- **Hybrid Architecture**: Combines classical decomposition with deep learning
- **Interpretability**: Maintains component-wise explainability
- **Flexibility**: Supports auto-regression, lagged regressors, and future covariates
- **Scalability**: GPU-accelerated training via PyTorch backend

## 2. Mathematical Foundations

### 2.1 General Time Series Decomposition

A time series $y_t$ can be decomposed as:

$$
y_t = T_t + S_t + H_t + A_t + F_t + L_t + \epsilon_t
$$

Where:

- $T_t$ — Trend component
- $S_t$ — Seasonal component
- $H_t$ — Holiday/event effects
- $A_t$ — Auto-regressive component
- $F_t$ — Future regressors
- $L_t$ — Lagged regressors
- $\epsilon_t$ — Residual error (noise)

### 2.2 Trend Modeling

#### Linear Trend

$$
T_t = k + \delta^T \cdot \mathbf{a}(t) \cdot t
$$

Where:

- $k$ — Base growth rate
- $\delta$ — Vector of changepoint adjustments
- $\mathbf{a}(t)$ — Indicator function for changepoints

#### Logistic Growth (Saturating Trend)

$$
T_t = \frac{C(t)}{1 + \exp\left(-k(t - m)\right)}
$$

Where:

- $C(t)$ — Time-varying capacity (carrying capacity)
- $k$ — Growth rate
- $m$ — Offset parameter

### 2.3 Seasonality via Fourier Series

Seasonal patterns are modeled using Fourier terms:

$$
S_t = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right)
$$

Where:

- $P$ — Period (e.g., 365.25 for yearly, 7 for weekly)
- $N$ — Number of Fourier terms (controls smoothness)
- $a_n, b_n$ — Learned Fourier coefficients

**Common Configurations:**

| Seasonality | Period $P$ | Typical $N$ |
|-------------|------------|-------------|
| Yearly | 365.25 | 10 |
| Weekly | 7 | 3 |
| Daily | 1 | 4 |

### 2.4 Auto-Regressive Component (AR-Net)

The AR component uses a feed-forward neural network:

$$
A_t = f_{\theta}\left( y_{t-1}, y_{t-2}, \ldots, y_{t-p} \right)
$$

Where:

- $p$ — Number of lags (lookback window)
- $f_{\theta}$ — Neural network with parameters $\theta$

The network architecture:

$$
\mathbf{h}^{(1)} = \text{ReLU}\left( \mathbf{W}^{(1)} \mathbf{x} + \mathbf{b}^{(1)} \right)
$$

$$
\mathbf{h}^{(l)} = \text{ReLU}\left( \mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)} \right)
$$

$$
A_t = \mathbf{W}^{(\text{out})} \mathbf{h}^{(L)} + b^{(\text{out})}
$$

### 2.5 Holiday/Event Effects

$$
H_t = \sum_{i=1}^{M} \kappa_i \cdot \mathbf{1}_{[t \in D_i]}
$$

Where:

- $M$ — Number of distinct holidays/events
- $\kappa_i$ — Effect magnitude for holiday $i$
- $D_i$ — Set of dates for holiday $i$
- $\mathbf{1}_{[\cdot]}$ — Indicator function

With window effects (days before/after):

$$
H_t = \sum_{i=1}^{M} \sum_{w=-W^-}^{W^+} \kappa_{i,w} \cdot \mathbf{1}_{[t + w \in D_i]}
$$

### 2.6 Lagged Regressors

$$
L_t = \sum_{j=1}^{J} g_{\phi_j}\left( x_{j,t-1}, x_{j,t-2}, \ldots, x_{j,t-q_j} \right)
$$

Where:

- $J$ — Number of lagged regressor variables
- $q_j$ — Lag depth for regressor $j$
- $g_{\phi_j}$ — Neural network for regressor $j$

### 2.7 Future Regressors

$$
F_t = \sum_{k=1}^{K} \beta_k \cdot z_{k,t}
$$

Where:

- $z_{k,t}$ — Known future value of regressor $k$ at time $t$
- $\beta_k$ — Learned coefficient
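The seasonal, holiday, and regressor blocks above all turn timestamps and covariates into regression features. As a concrete example, here is a minimal NumPy/pandas sketch of the Fourier seasonality design matrix from Section 2.3 (an illustrative helper, not NeuralProphet's internal implementation); the model learns the coefficients $a_n, b_n$ on top of these columns.

```python
import numpy as np
import pandas as pd

def fourier_features(dates, period, order):
    """Build the 2 * order Fourier columns from Section 2.3:
    cos(2*pi*n*t/P) and sin(2*pi*n*t/P) for n = 1..order."""
    t = np.asarray((dates - dates.min()) / pd.Timedelta(days=1), dtype=float)
    cols = {}
    for n in range(1, order + 1):
        cols[f"cos_{n}"] = np.cos(2 * np.pi * n * t / period)
        cols[f"sin_{n}"] = np.sin(2 * np.pi * n * t / period)
    return pd.DataFrame(cols, index=dates)

# Weekly seasonality (P = 7) with 3 Fourier terms, as in the configuration table above
dates = pd.date_range("2024-01-01", periods=28, freq="D")
X = fourier_features(dates, period=7, order=3)
print(X.head())
```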
### 2.8 Loss Function

Training minimizes the loss:

$$
\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} \ell\left( y_t, \hat{y}_t \right) + \lambda \cdot \mathcal{R}(\theta)
$$

Common loss functions:

- **MSE (Mean Squared Error)**:
  $$\ell(y, \hat{y}) = (y - \hat{y})^2$$
- **Huber Loss**:
  $$\ell_{\delta}(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & \text{if } |y - \hat{y}| \leq \delta \\ \delta \left( |y - \hat{y}| - \frac{1}{2}\delta \right) & \text{otherwise} \end{cases}$$
- **MAE (Mean Absolute Error)**:
  $$\ell(y, \hat{y}) = |y - \hat{y}|$$

Regularization term:

$$
\mathcal{R}(\theta) = \|\theta\|_2^2 \quad \text{(L2 regularization)}
$$

## 3. Core Components

### 3.1 Component Overview

| Component | Symbol | Description | Learnable Parameters |
|-----------|--------|-------------|----------------------|
| Trend | $T_t$ | Long-term growth pattern | $k, \delta, m$ |
| Seasonality | $S_t$ | Periodic patterns | $a_n, b_n$ |
| Holidays | $H_t$ | Event-based effects | $\kappa_i$ |
| AR-Net | $A_t$ | Auto-regressive dependencies | $\mathbf{W}, \mathbf{b}$ |
| Future Regressors | $F_t$ | Known future covariates | $\beta_k$ |
| Lagged Regressors | $L_t$ | Past covariate effects | $\phi_j$ |

### 3.2 Additive vs Multiplicative Modes

**Additive Model:**

$$
y_t = T_t + S_t + H_t + \epsilon_t
$$

**Multiplicative Model:**

$$
y_t = T_t \cdot (1 + S_t) \cdot (1 + H_t) + \epsilon_t
$$

Or equivalently in log-space:

$$
\log(y_t) = \log(T_t) + \log(1 + S_t) + \log(1 + H_t) + \epsilon_t
$$

### 3.3 Uncertainty Quantification

NeuralProphet uses **conformal prediction** for prediction intervals:

$$
\hat{C}_{1-\alpha}(x) = \left[ \hat{y}(x) - q_{1-\alpha}, \; \hat{y}(x) + q_{1-\alpha} \right]
$$

Where:

- $q_{1-\alpha}$ — Quantile of residuals on calibration set
- $\alpha$ — Significance level (e.g., 0.05 for 95% intervals)

## 4. Model Comparison

### 4.1 Classical Statistical Models

#### ARIMA (AutoRegressive Integrated Moving Average)

$$
\phi(B)(1-B)^d y_t = \theta(B) \epsilon_t
$$

Where:

- $B$ — Backshift operator: $B y_t = y_{t-1}$
- $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$ — AR polynomial
- $\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$ — MA polynomial
- $d$ — Differencing order

#### Exponential Smoothing (ETS)

**Simple Exponential Smoothing:**

$$
\hat{y}_{t+1} = \alpha y_t + (1 - \alpha) \hat{y}_t
$$

**Holt-Winters (Additive):**

$$
\begin{aligned}
\ell_t &= \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta (\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1} \\
s_t &= \gamma (y_t - \ell_t) + (1 - \gamma) s_{t-m} \\
\hat{y}_{t+h} &= \ell_t + h b_t + s_{t+h-m}
\end{aligned}
$$

### 4.2 Deep Learning Models

#### LSTM (Long Short-Term Memory)

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

#### Transformer (Self-Attention)

$$
\text{Attention}(Q, K, V) = \text{softmax}\left( \frac{QK^T}{\sqrt{d_k}} \right) V
$$

Multi-head attention:

$$
\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h) W^O
$$

Where:

$$
\text{head}_i = \text{Attention}(Q W_i^Q, K W_i^K, V W_i^V)
$$

### 4.3 Comparison Matrix

| Model | Auto-Regressive | Exogenous Variables | Interpretability | Data Requirement |
|-------|-----------------|---------------------|------------------|------------------|
| ARIMA | $\checkmark$ | Limited | High | Low |
| Prophet | $\times$ | $\checkmark$ | High | Medium |
| NeuralProphet | $\checkmark$ | $\checkmark$ | High | Medium |
| LSTM | $\checkmark$ | $\checkmark$ | Low | High |
| Transformer | $\checkmark$ | $\checkmark$ | Low | Very High |
| N-BEATS | $\checkmark$ | $\times$ | Medium | High |
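Complementing the comparison, here is a minimal sketch of the split-conformal interval from Section 3.3, which can be layered on top of any of the forecasters above (an illustrative function, not a NeuralProphet API; NeuralProphet's own calibration may differ in detail).

```python
import numpy as np

def conformal_interval(y_cal, yhat_cal, yhat_test, alpha=0.05):
    """Split-conformal interval as in Section 3.3: take the (1 - alpha) quantile
    of absolute calibration residuals and widen each test prediction by it."""
    residuals = np.abs(np.asarray(y_cal) - np.asarray(yhat_cal))
    q = np.quantile(residuals, 1 - alpha)
    yhat_test = np.asarray(yhat_test)
    return yhat_test - q, yhat_test + q

# Toy usage with made-up calibration data and predictions
y_cal = np.array([10.0, 12.0, 11.5, 13.0])
yhat_cal = np.array([10.5, 11.0, 12.0, 12.5])
lower, upper = conformal_interval(y_cal, yhat_cal, yhat_test=np.array([12.2, 12.8]))
print(lower, upper)
```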
## 5. Implementation Guide

### 5.1 Installation

```bash
# Install via pip
pip install neuralprophet

# With additional dependencies
pip install neuralprophet[live]
```

### 5.2 Basic Usage

```python
from neuralprophet import NeuralProphet
import pandas as pd

# Load data (must have 'ds' and 'y' columns)
df = pd.read_csv('timeseries_data.csv')
df['ds'] = pd.to_datetime(df['ds'])

# Initialize model
model = NeuralProphet(
    growth='linear',                    # 'linear' or 'discontinuous'
    seasonality_mode='multiplicative',  # 'additive' or 'multiplicative'
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    n_lags=14,                          # AR lookback
    n_forecasts=7,                      # Multi-step forecast
    learning_rate=0.1,
    epochs=100,
    batch_size=64,
)

# Fit model
metrics = model.fit(df, freq='D')

# Create future dataframe
future = model.make_future_dataframe(df, periods=30)

# Predict
forecast = model.predict(future)

# Visualize
fig_forecast = model.plot(forecast)
fig_components = model.plot_components(forecast)
```

### 5.3 Adding Regressors

```python
# Future regressors (known ahead of time)
model = NeuralProphet()
model.add_future_regressor('temperature')
model.add_future_regressor('promotion', mode='multiplicative')

# Lagged regressors (only past values used)
model = NeuralProphet(n_lags=7)
model.add_lagged_regressor('competitor_sales', n_lags=7)

# Fit with regressor columns in dataframe
model.fit(df_with_regressors, freq='D')
```

### 5.4 Custom Seasonality

```python
model = NeuralProphet(
    yearly_seasonality=False,  # Disable default
    weekly_seasonality=False,
)

# Add custom seasonality
model.add_seasonality(
    name='monthly',
    period=30.5,
    fourier_order=5,
)
model.add_seasonality(
    name='quarterly',
    period=91.25,
    fourier_order=3,
)
```

### 5.5 Holidays and Events

```python
# Define holidays
playoffs = pd.DataFrame({
    'event': 'playoff',
    'ds': pd.to_datetime(['2024-01-13', '2024-01-14', '2024-01-20']),
})
superbowl = pd.DataFrame({
    'event': 'superbowl',
    'ds': pd.to_datetime(['2024-02-11']),
})
events_df = pd.concat([playoffs, superbowl])

# Add to model
model = NeuralProphet()
model.add_events(['playoff', 'superbowl'])

# Merge events with training data
df_with_events = model.create_df_with_events(df, events_df)

# Fit
model.fit(df_with_events, freq='D')
```

### 5.6 Hyperparameter Tuning

```python
from neuralprophet import NeuralProphet, set_random_seed

set_random_seed(42)

# Grid search example
param_grid = {
    'n_lags': [7, 14, 28],
    'learning_rate': [0.01, 0.1, 0.5],
    'seasonality_mode': ['additive', 'multiplicative'],
}

best_mae = float('inf')
best_params = None

for n_lags in param_grid['n_lags']:
    for lr in param_grid['learning_rate']:
        for mode in param_grid['seasonality_mode']:
            model = NeuralProphet(
                n_lags=n_lags,
                learning_rate=lr,
                seasonality_mode=mode,
                epochs=50,
            )
            # Split data
            df_train, df_test = model.split_df(df, valid_p=0.2)
            # Fit and evaluate
            metrics = model.fit(df_train, freq='D', validation_df=df_test)
            if metrics['MAE_val'].iloc[-1] < best_mae:
                best_mae = metrics['MAE_val'].iloc[-1]
                best_params = {'n_lags': n_lags, 'lr': lr, 'mode': mode}

print(f"Best MAE: {best_mae:.4f}")
print(f"Best params: {best_params}")
```
## 6. Advanced Topics

### 6.1 Multi-Step Forecasting

For $h$-step ahead forecasting:

$$
\hat{y}_{t+h|t} = f\left( y_t, y_{t-1}, \ldots, y_{t-p+1}; \mathbf{x}_t; \theta \right)
$$

**Strategies:**

- **Direct**: Train separate models for each horizon $h$
- **Recursive**: Use $\hat{y}_{t+1}$ to predict $\hat{y}_{t+2}$, etc.
- **Multi-output**: Single model outputs $[\hat{y}_{t+1}, \ldots, \hat{y}_{t+h}]$

NeuralProphet uses **multi-output** forecasting via the `n_forecasts` parameter.

### 6.2 Metrics for Evaluation

| Metric | Formula | Properties |
|--------|---------|------------|
| MAE | $\frac{1}{n}\sum_t \lvert y_t - \hat{y}_t \rvert$ | Robust to outliers |
| MSE | $\frac{1}{n}\sum_t (y_t - \hat{y}_t)^2$ | Penalizes large errors |
| RMSE | $\sqrt{\text{MSE}}$ | Same units as $y$ |
| MAPE | $\frac{100}{n}\sum_t \left\lvert \frac{y_t - \hat{y}_t}{y_t} \right\rvert$ | Scale-independent |
| sMAPE | $\frac{200}{n}\sum_t \frac{\lvert y_t - \hat{y}_t \rvert}{\lvert y_t \rvert + \lvert \hat{y}_t \rvert}$ | Symmetric |
| MASE | $\frac{\text{MAE}}{\text{MAE}_{\text{naive}}}$ | Scaled by naive forecast |

### 6.3 Cross-Validation for Time Series

**Time Series Split (Walk-Forward Validation):**

```
Fold 1: [Train: ████████]       [Test: ██]
Fold 2: [Train: ██████████]     [Test: ██]
Fold 3: [Train: ████████████]   [Test: ██]
Fold 4: [Train: ██████████████] [Test: ██]
```

```python
from neuralprophet import NeuralProphet

model = NeuralProphet()

# Perform cross-validation
cv_results = model.crossvalidation_split_df(
    df,
    freq='D',
    k=5,                    # Number of folds
    fold_pct=0.1,           # Percentage of data per fold
    fold_overlap_pct=0.5,
)
```

### 6.4 Handling Missing Data

$$
y_t^{\text{imputed}} =
\begin{cases}
y_t & \text{if observed} \\
\hat{y}_t^{\text{interpolated}} & \text{if missing}
\end{cases}
$$

Common strategies:

- **Forward fill**: $y_t^{\text{missing}} = y_{t-1}$
- **Linear interpolation**: $y_t^{\text{missing}} = \frac{y_{t-k} + y_{t+j}}{2}$
- **Seasonal interpolation**: $y_t^{\text{missing}} = y_{t-P}$

```python
# NeuralProphet handles missing values automatically,
# but you can also preprocess:
df['y'] = df['y'].interpolate(method='linear')
```

### 6.5 Trend Changepoints

Automatic detection of trend changes at times $s_j$:

$$
T_t = \left( k + \sum_{j: s_j < t} \delta_j \right) t + \left( m + \sum_{j: s_j < t} \gamma_j \right)
$$

```python
model = NeuralProphet(
    n_changepoints=10,       # Number of potential changepoints
    changepoints_range=0.8,  # Proportion of history for changepoints
    trend_reg=0.1,           # Regularization on trend changes
)
```
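To make the metric definitions in Section 6.2 concrete, here is a small NumPy helper (our own sketch, not part of NeuralProphet; MASE is scaled here by the in-sample one-step naive forecast):

```python
import numpy as np

def forecast_metrics(y_true, y_pred, y_train):
    """Evaluation metrics from Section 6.2, computed on a held-out forecast."""
    y_true, y_pred, y_train = map(np.asarray, (y_true, y_pred, y_train))
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = 100.0 * np.mean(np.abs(err / y_true))
    smape = 200.0 * np.mean(np.abs(err) / (np.abs(y_true) + np.abs(y_pred)))
    # Scale by the MAE of the one-step naive forecast on the training series
    mase = mae / np.mean(np.abs(np.diff(y_train)))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "sMAPE": smape, "MASE": mase}

# Toy usage with made-up values
print(forecast_metrics(
    y_true=[102.0, 98.0, 105.0],
    y_pred=[100.0, 101.0, 104.0],
    y_train=[90.0, 95.0, 93.0, 99.0, 100.0],
))
```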
## 7. Quick Reference

### Key Equations Summary

| Concept | Equation |
|---------|----------|
| Full Model | $y_t = T_t + S_t + H_t + A_t + F_t + L_t + \epsilon_t$ |
| Fourier Seasonality | $S_t = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right)$ |
| AR-Net | $A_t = f_{\theta}(y_{t-1}, \ldots, y_{t-p})$ |
| Trend (Linear) | $T_t = k + \delta^T \mathbf{a}(t) \cdot t$ |
| Trend (Logistic) | $T_t = \frac{C}{1 + e^{-k(t-m)}}$ |
| Loss | $\mathcal{L} = \frac{1}{T}\sum_t \ell(y_t, \hat{y}_t) + \lambda \mathcal{R}(\theta)$ |

### Recommended Hyperparameters

| Parameter | Default | Range | Notes |
|-----------|---------|-------|-------|
| `n_lags` | 0 | 0-60 | Higher for more AR dependency |
| `n_forecasts` | 1 | 1-30 | Multi-step output |
| `learning_rate` | auto | 0.001-1.0 | Start with 0.1 |
| `epochs` | auto | 50-500 | Monitor validation loss |
| `batch_size` | auto | 16-256 | Larger for more data |
| `yearly_seasonality` | auto | True/False/int | int = Fourier order |
| `weekly_seasonality` | auto | True/False/int | int = Fourier order |
Combine neural nets with symbolic reasoning.
Brain-inspired architectures.
Hardware inspired by biological neural networks.
Brain-inspired vision processing.
Neuromorphic chips mimic the brain with spiking neurons; they consume very little power and remain at the research stage for AI applications.