Neural meshes combine explicit mesh topology with learned vertex features and deformations.
Dynamically compose modules for reasoning.
Compose neural modules for reasoning.
Learn system dynamics with neural networks.
Limiting behavior of infinitely wide networks.
ML models for potential energy surfaces.
Remove weights for edge deployment.
Modify trained networks.
Use NNs to optimize recipes.
Neural ODEs for graphs model continuous-time graph dynamics through differential equations.
Neural networks defined by differential equations.
Learn operators between function spaces.
Model dynamics as continuous-time ODEs learned by neural networks.
Graph neural network-based neural predictors encode architectures as computation graphs for performance estimation.
Neural predictors are meta-models trained to estimate architecture performance from encoded representations enabling efficient search space exploration.
Use neural networks to generate programs.
Neural radiance fields represent 3D scenes as neural networks mapping coordinates to color and density.
Represent scenes as neural fields.
Detailed NeRF techniques.
NeRF for video.
Use neural networks for rendering.
Mathematical relationships between loss and model size/data/compute.
Learn motion in neural representation.
Neural scene graphs represent scenes as compositional graphs of objects with neural appearance.
Encode 3D scenes in neural networks.
Learn stochastic dynamics with neural networks.
Use style transfer to understand representations.
Apply artistic style to content.
Neural tangent kernel theory informs NAS by predicting training dynamics from architecture initialization properties.
Analyze neural networks via kernel methods.
Neural networks for automated theorem proving.
Neural Transducer generalizes RNN-T with different encoder and predictor architectures for streaming ASR.
Networks with differentiable external memory and read/write heads.
Convert acoustic features back into audio waveforms.
Volumetric representation for video.
Elon Musk's brain-computer interface company.
# NeuralProphet and Time Series Models

A comprehensive guide to NeuralProphet and modern time series forecasting approaches.

## 1. Introduction to NeuralProphet

NeuralProphet is a **neural network-based time series forecasting library** built on PyTorch, designed as a successor to Facebook's Prophet.

### Key Characteristics

- **Hybrid Architecture**: Combines classical decomposition with deep learning
- **Interpretability**: Maintains component-wise explainability
- **Flexibility**: Supports auto-regression, lagged regressors, and future covariates
- **Scalability**: GPU-accelerated training via PyTorch backend

## 2. Mathematical Foundations

### 2.1 General Time Series Decomposition

A time series $y_t$ can be decomposed as:

$$
y_t = T_t + S_t + H_t + A_t + F_t + L_t + \epsilon_t
$$

Where:

- $T_t$ — Trend component
- $S_t$ — Seasonal component
- $H_t$ — Holiday/event effects
- $A_t$ — Auto-regressive component
- $F_t$ — Future regressors
- $L_t$ — Lagged regressors
- $\epsilon_t$ — Residual error (noise)

### 2.2 Trend Modeling

#### Linear Trend

$$
T_t = k + \delta^T \cdot \mathbf{a}(t) \cdot t
$$

Where:

- $k$ — Base growth rate
- $\delta$ — Vector of changepoint adjustments
- $\mathbf{a}(t)$ — Indicator function for changepoints

#### Logistic Growth (Saturating Trend)

$$
T_t = \frac{C(t)}{1 + \exp\left(-k(t - m)\right)}
$$

Where:

- $C(t)$ — Time-varying capacity (carrying capacity)
- $k$ — Growth rate
- $m$ — Offset parameter

### 2.3 Seasonality via Fourier Series

Seasonal patterns are modeled using Fourier terms:

$$
S_t = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right)
$$

Where:

- $P$ — Period (e.g., 365.25 for yearly, 7 for weekly)
- $N$ — Number of Fourier terms (controls smoothness)
- $a_n, b_n$ — Learned Fourier coefficients

**Common Configurations:**

| Seasonality | Period $P$ | Typical $N$ |
|-------------|------------|-------------|
| Yearly | 365.25 | 10 |
| Weekly | 7 | 3 |
| Daily | 1 | 4 |

### 2.4 Auto-Regressive Component (AR-Net)

The AR component uses a feed-forward neural network:

$$
A_t = f_{\theta}\left( y_{t-1}, y_{t-2}, \ldots, y_{t-p} \right)
$$

Where:

- $p$ — Number of lags (lookback window)
- $f_{\theta}$ — Neural network with parameters $\theta$

The network architecture:

$$
\mathbf{h}^{(1)} = \text{ReLU}\left( \mathbf{W}^{(1)} \mathbf{x} + \mathbf{b}^{(1)} \right)
$$

$$
\mathbf{h}^{(l)} = \text{ReLU}\left( \mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)} \right)
$$

$$
A_t = \mathbf{W}^{(\text{out})} \mathbf{h}^{(L)} + b^{(\text{out})}
$$
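To make the AR-Net equations concrete, here is a minimal PyTorch sketch of a lag-to-forecast feed-forward block. The class name, layer sizes, synthetic series, and training loop are illustrative assumptions, not NeuralProphet internals.

```python
import torch
import torch.nn as nn

class ARNetSketch(nn.Module):
    """Illustrative feed-forward AR block: p lagged values -> h-step forecast."""

    def __init__(self, n_lags: int, n_forecasts: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_lags, hidden),        # h^(1) = ReLU(W^(1) x + b^(1))
            nn.ReLU(),
            nn.Linear(hidden, hidden),        # h^(l) = ReLU(W^(l) h^(l-1) + b^(l))
            nn.ReLU(),
            nn.Linear(hidden, n_forecasts),   # A_t = W^(out) h^(L) + b^(out)
        )

    def forward(self, lags: torch.Tensor) -> torch.Tensor:
        # lags: (batch, n_lags) = [y_{t-1}, ..., y_{t-p}]
        return self.net(lags)

# Toy usage: fit the block on a synthetic random-walk series.
torch.manual_seed(0)
y = torch.cumsum(torch.randn(500), dim=0)
p, h = 14, 7
X = torch.stack([y[i : i + p] for i in range(len(y) - p - h)])
Y = torch.stack([y[i + p : i + p + h] for i in range(len(y) - p - h)])

model = ARNetSketch(n_lags=p, n_forecasts=h)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), Y)
    loss.backward()
    opt.step()
```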
### 2.5 Holiday/Event Effects

$$
H_t = \sum_{i=1}^{M} \kappa_i \cdot \mathbf{1}_{[t \in D_i]}
$$

Where:

- $M$ — Number of distinct holidays/events
- $\kappa_i$ — Effect magnitude for holiday $i$
- $D_i$ — Set of dates for holiday $i$
- $\mathbf{1}_{[\cdot]}$ — Indicator function

With window effects (days before/after):

$$
H_t = \sum_{i=1}^{M} \sum_{w=-W^-}^{W^+} \kappa_{i,w} \cdot \mathbf{1}_{[t + w \in D_i]}
$$

### 2.6 Lagged Regressors

$$
L_t = \sum_{j=1}^{J} g_{\phi_j}\left( x_{j,t-1}, x_{j,t-2}, \ldots, x_{j,t-q_j} \right)
$$

Where:

- $J$ — Number of lagged regressor variables
- $q_j$ — Lag depth for regressor $j$
- $g_{\phi_j}$ — Neural network for regressor $j$

### 2.7 Future Regressors

$$
F_t = \sum_{k=1}^{K} \beta_k \cdot z_{k,t}
$$

Where:

- $z_{k,t}$ — Known future value of regressor $k$ at time $t$
- $\beta_k$ — Learned coefficient

### 2.8 Loss Function

Training minimizes the loss:

$$
\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} \ell\left( y_t, \hat{y}_t \right) + \lambda \cdot \mathcal{R}(\theta)
$$

Common loss functions:

- **MSE (Mean Squared Error)**: $\ell(y, \hat{y}) = (y - \hat{y})^2$
- **Huber Loss**:
  $$
  \ell_{\delta}(y, \hat{y}) =
  \begin{cases}
  \frac{1}{2}(y - \hat{y})^2 & \text{if } |y - \hat{y}| \leq \delta \\
  \delta \left( |y - \hat{y}| - \frac{1}{2}\delta \right) & \text{otherwise}
  \end{cases}
  $$
- **MAE (Mean Absolute Error)**: $\ell(y, \hat{y}) = |y - \hat{y}|$

Regularization term:

$$
\mathcal{R}(\theta) = \|\theta\|_2^2 \quad \text{(L2 regularization)}
$$

## 3. Core Components

### 3.1 Component Overview

| Component | Symbol | Description | Learnable Parameters |
|-----------|--------|-------------|----------------------|
| Trend | $T_t$ | Long-term growth pattern | $k, \delta, m$ |
| Seasonality | $S_t$ | Periodic patterns | $a_n, b_n$ |
| Holidays | $H_t$ | Event-based effects | $\kappa_i$ |
| AR-Net | $A_t$ | Auto-regressive dependencies | $\mathbf{W}, \mathbf{b}$ |
| Future Regressors | $F_t$ | Known future covariates | $\beta_k$ |
| Lagged Regressors | $L_t$ | Past covariate effects | $\phi_j$ |

### 3.2 Additive vs Multiplicative Modes

**Additive Model:**

$$
y_t = T_t + S_t + H_t + \epsilon_t
$$

**Multiplicative Model:**

$$
y_t = T_t \cdot (1 + S_t) \cdot (1 + H_t) + \epsilon_t
$$

Or, approximately, in log-space:

$$
\log(y_t) = \log(T_t) + \log(1 + S_t) + \log(1 + H_t) + \epsilon_t
$$

### 3.3 Uncertainty Quantification

NeuralProphet uses **conformal prediction** for prediction intervals:

$$
\hat{C}_{1-\alpha}(x) = \left[ \hat{y}(x) - q_{1-\alpha}, \; \hat{y}(x) + q_{1-\alpha} \right]
$$

Where:

- $q_{1-\alpha}$ — Quantile of residuals on calibration set
- $\alpha$ — Significance level (e.g., 0.05 for 95% intervals)

## 4. Model Comparison

### 4.1 Classical Statistical Models

#### ARIMA (AutoRegressive Integrated Moving Average)

$$
\phi(B)(1-B)^d y_t = \theta(B) \epsilon_t
$$

Where:

- $B$ — Backshift operator: $B y_t = y_{t-1}$
- $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$ — AR polynomial
- $\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$ — MA polynomial
- $d$ — Differencing order

#### Exponential Smoothing (ETS)

**Simple Exponential Smoothing:**

$$
\hat{y}_{t+1} = \alpha y_t + (1 - \alpha) \hat{y}_t
$$

**Holt-Winters (Additive):**

$$
\begin{aligned}
\ell_t &= \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta (\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1} \\
s_t &= \gamma (y_t - \ell_t) + (1 - \gamma) s_{t-m} \\
\hat{y}_{t+h} &= \ell_t + h b_t + s_{t+h-m}
\end{aligned}
$$

### 4.2 Deep Learning Models

#### LSTM (Long Short-Term Memory)

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

#### Transformer (Self-Attention)

$$
\text{Attention}(Q, K, V) = \text{softmax}\left( \frac{QK^T}{\sqrt{d_k}} \right) V
$$

Multi-head attention:

$$
\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h) W^O
$$

Where:

$$
\text{head}_i = \text{Attention}(Q W_i^Q, K W_i^K, V W_i^V)
$$
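To make the self-attention formula above concrete, here is a minimal single-head NumPy sketch; the array shapes and random inputs are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n_q, n_k) similarity matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted sum of value vectors

# Toy usage: 5 query positions attending over 8 key/value positions, d_k = 16.
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 16))
K = rng.normal(size=(8, 16))
V = rng.normal(size=(8, 16))
out = scaled_dot_product_attention(Q, K, V)              # shape (5, 16)
```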
### 4.3 Comparison Matrix

| Model | Auto-Regressive | Exogenous Variables | Interpretability | Data Requirement |
|-------|-----------------|---------------------|------------------|------------------|
| ARIMA | $\checkmark$ | Limited | High | Low |
| Prophet | $\times$ | $\checkmark$ | High | Medium |
| NeuralProphet | $\checkmark$ | $\checkmark$ | High | Medium |
| LSTM | $\checkmark$ | $\checkmark$ | Low | High |
| Transformer | $\checkmark$ | $\checkmark$ | Low | Very High |
| N-BEATS | $\checkmark$ | $\times$ | Medium | High |

## 5. Implementation Guide

### 5.1 Installation

```bash
# Install via pip
pip install neuralprophet

# With additional dependencies
pip install neuralprophet[live]
```

### 5.2 Basic Usage

```python
from neuralprophet import NeuralProphet
import pandas as pd

# Load data (must have 'ds' and 'y' columns)
df = pd.read_csv('timeseries_data.csv')
df['ds'] = pd.to_datetime(df['ds'])

# Initialize model
model = NeuralProphet(
    growth='linear',                      # 'linear' or 'discontinuous'
    seasonality_mode='multiplicative',    # 'additive' or 'multiplicative'
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    n_lags=14,                            # AR lookback
    n_forecasts=7,                        # Multi-step forecast
    learning_rate=0.1,
    epochs=100,
    batch_size=64,
)

# Fit model
metrics = model.fit(df, freq='D')

# Create future dataframe
future = model.make_future_dataframe(df, periods=30)

# Predict
forecast = model.predict(future)

# Visualize
fig_forecast = model.plot(forecast)
fig_components = model.plot_components(forecast)
```

### 5.3 Adding Regressors

```python
# Future regressors (known ahead of time)
model = NeuralProphet()
model.add_future_regressor('temperature')
model.add_future_regressor('promotion', mode='multiplicative')

# Lagged regressors (only past values used)
model = NeuralProphet(n_lags=7)
model.add_lagged_regressor('competitor_sales', n_lags=7)

# Fit with regressor columns in dataframe
model.fit(df_with_regressors, freq='D')
```

### 5.4 Custom Seasonality

```python
model = NeuralProphet(
    yearly_seasonality=False,   # Disable default
    weekly_seasonality=False,
)

# Add custom seasonality
model.add_seasonality(
    name='monthly',
    period=30.5,
    fourier_order=5,
)
model.add_seasonality(
    name='quarterly',
    period=91.25,
    fourier_order=3,
)
```

### 5.5 Holidays and Events

```python
# Define holidays
playoffs = pd.DataFrame({
    'event': 'playoff',
    'ds': pd.to_datetime(['2024-01-13', '2024-01-14', '2024-01-20']),
})
superbowl = pd.DataFrame({
    'event': 'superbowl',
    'ds': pd.to_datetime(['2024-02-11']),
})
events_df = pd.concat([playoffs, superbowl])

# Add to model
model = NeuralProphet()
model.add_events(['playoff', 'superbowl'])

# Merge events with training data
df_with_events = model.create_df_with_events(df, events_df)

# Fit
model.fit(df_with_events, freq='D')
```

### 5.6 Hyperparameter Tuning

```python
from neuralprophet import NeuralProphet, set_random_seed

set_random_seed(42)

# Grid search example
param_grid = {
    'n_lags': [7, 14, 28],
    'learning_rate': [0.01, 0.1, 0.5],
    'seasonality_mode': ['additive', 'multiplicative'],
}

best_mae = float('inf')
best_params = None

for n_lags in param_grid['n_lags']:
    for lr in param_grid['learning_rate']:
        for mode in param_grid['seasonality_mode']:
            model = NeuralProphet(
                n_lags=n_lags,
                learning_rate=lr,
                seasonality_mode=mode,
                epochs=50,
            )
            # Split data
            df_train, df_test = model.split_df(df, valid_p=0.2)
            # Fit and evaluate
            metrics = model.fit(df_train, freq='D', validation_df=df_test)
            if metrics['MAE_val'].iloc[-1] < best_mae:
                best_mae = metrics['MAE_val'].iloc[-1]
                best_params = {'n_lags': n_lags, 'lr': lr, 'mode': mode}

print(f"Best MAE: {best_mae:.4f}")
print(f"Best params: {best_params}")
```
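Beyond the library calls above, the conformal-interval formula from section 3.3 is simple enough to reproduce by hand. Here is a minimal NumPy sketch, assuming a held-out calibration split and adding the usual finite-sample quantile correction (the function name and toy data are illustrative, not library API).

```python
import numpy as np

def conformal_interval(y_cal, yhat_cal, yhat_test, alpha=0.05):
    """Symmetric split-conformal interval from absolute residuals on a calibration set."""
    resid = np.abs(y_cal - yhat_cal)
    n = len(resid)
    # Quantile level ceil((n+1)(1-alpha))/n, clipped to 1 (finite-sample correction).
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(resid, level)
    return yhat_test - q, yhat_test + q

# Toy usage: point forecasts with Gaussian errors on the calibration set.
rng = np.random.default_rng(1)
y_cal = rng.normal(size=200)
yhat_cal = y_cal + rng.normal(scale=0.5, size=200)
yhat_test = np.zeros(10)
lower, upper = conformal_interval(y_cal, yhat_cal, yhat_test, alpha=0.05)
```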
## 6. Advanced Topics

### 6.1 Multi-Step Forecasting

For $h$-step ahead forecasting:

$$
\hat{y}_{t+h|t} = f\left( y_t, y_{t-1}, \ldots, y_{t-p+1}; \mathbf{x}_t; \theta \right)
$$

**Strategies:**

- **Direct**: Train separate models for each horizon $h$
- **Recursive**: Use $\hat{y}_{t+1}$ to predict $\hat{y}_{t+2}$, etc.
- **Multi-output**: Single model outputs $[\hat{y}_{t+1}, \ldots, \hat{y}_{t+h}]$

NeuralProphet uses the **multi-output** strategy via the `n_forecasts` parameter.

### 6.2 Metrics for Evaluation

| Metric | Formula | Properties |
|--------|---------|------------|
| MAE | $\frac{1}{n}\sum_{t=1}^{n} \lvert y_t - \hat{y}_t \rvert$ | Robust to outliers |
| MSE | $\frac{1}{n}\sum_{t=1}^{n} (y_t - \hat{y}_t)^2$ | Penalizes large errors |
| RMSE | $\sqrt{\frac{1}{n}\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}$ | Same units as $y$ |
| MAPE | $\frac{100}{n}\sum_{t=1}^{n} \left\lvert \frac{y_t - \hat{y}_t}{y_t} \right\rvert$ | Scale-independent |
| sMAPE | $\frac{200}{n}\sum_{t=1}^{n} \frac{\lvert y_t - \hat{y}_t \rvert}{\lvert y_t \rvert + \lvert \hat{y}_t \rvert}$ | Symmetric |
| MASE | $\frac{\text{MAE}}{\text{MAE}_{\text{naive}}}$ | Scaled by naive forecast |

### 6.3 Cross-Validation for Time Series

**Time Series Split (Walk-Forward Validation):**

```
Fold 1: [Train: ████████]       [Test: ██]
Fold 2: [Train: ██████████]     [Test: ██]
Fold 3: [Train: ████████████]   [Test: ██]
Fold 4: [Train: ██████████████] [Test: ██]
```

```python
from neuralprophet import NeuralProphet

model = NeuralProphet()

# Perform cross-validation
cv_results = model.crossvalidation_split_df(
    df,
    freq='D',
    k=5,                    # Number of folds
    fold_pct=0.1,           # Percentage of data per fold
    fold_overlap_pct=0.5,
)
```

### 6.4 Handling Missing Data

$$
y_t^{\text{imputed}} =
\begin{cases}
y_t & \text{if observed} \\
\hat{y}_t^{\text{interpolated}} & \text{if missing}
\end{cases}
$$

Common strategies:

- **Forward fill**: $y_t^{\text{missing}} = y_{t-1}$
- **Linear interpolation** between the nearest observed neighbors: $y_t^{\text{missing}} = y_{t-k} + \frac{k}{k+j}\left( y_{t+j} - y_{t-k} \right)$
- **Seasonal interpolation**: $y_t^{\text{missing}} = y_{t-P}$

```python
# NeuralProphet handles missing values automatically,
# but you can also preprocess:
df['y'] = df['y'].interpolate(method='linear')
```

### 6.5 Trend Changepoints

Automatic detection of trend changes at times $s_j$:

$$
T_t = \left( k + \sum_{j: s_j < t} \delta_j \right) t + \left( m + \sum_{j: s_j < t} \gamma_j \right)
$$

```python
model = NeuralProphet(
    n_changepoints=10,          # Number of potential changepoints
    changepoints_range=0.8,     # Proportion of history for changepoints
    trend_reg=0.1,              # Regularization on trend changes
)
```
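A few of the metrics from section 6.2, implemented directly as a minimal NumPy sketch; the seasonal period `m` used for MASE's naive forecast and the toy hold-out data are assumptions for illustration.

```python
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def smape(y, yhat):
    return 200.0 * np.mean(np.abs(y - yhat) / (np.abs(y) + np.abs(yhat)))

def mase(y, yhat, y_train, m=1):
    # MAE scaled by the in-sample MAE of the seasonal naive forecast y_{t-m}
    naive_mae = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return mae(y, yhat) / naive_mae

# Toy usage on a short hold-out window
y_train = np.arange(1.0, 101.0)
y_true = np.array([101.0, 102.0, 103.0])
y_pred = np.array([100.5, 102.5, 102.0])
print(mae(y_true, y_pred), rmse(y_true, y_pred),
      smape(y_true, y_pred), mase(y_true, y_pred, y_train))
```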
## 7. Quick Reference

### Key Equations Summary

| Concept | Equation |
|---------|----------|
| Full Model | $y_t = T_t + S_t + H_t + A_t + F_t + L_t + \epsilon_t$ |
| Fourier Seasonality | $S_t = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right)$ |
| AR-Net | $A_t = f_{\theta}(y_{t-1}, \ldots, y_{t-p})$ |
| Trend (Linear) | $T_t = k + \delta^T \mathbf{a}(t) \cdot t$ |
| Trend (Logistic) | $T_t = \frac{C}{1 + e^{-k(t-m)}}$ |
| Loss | $\mathcal{L} = \frac{1}{T}\sum_t \ell(y_t, \hat{y}_t) + \lambda \mathcal{R}(\theta)$ |

### Recommended Hyperparameters

| Parameter | Default | Range | Notes |
|-----------|---------|-------|-------|
| `n_lags` | 0 | 0-60 | Higher for more AR dependency |
| `n_forecasts` | 1 | 1-30 | Multi-step output |
| `learning_rate` | auto | 0.001-1.0 | Start with 0.1 |
| `epochs` | auto | 50-500 | Monitor validation loss |
| `batch_size` | auto | 16-256 | Larger for more data |
| `yearly_seasonality` | auto | True/False/int | int = Fourier order |
| `weekly_seasonality` | auto | True/False/int | int = Fourier order |
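As a closing illustration, a small sketch that composes a linear trend, weekly Fourier seasonality, and noise from the Key Equations Summary into a synthetic series in the `ds`/`y` layout used in section 5.2; all coefficients are made up for illustration.

```python
import numpy as np
import pandas as pd

# Compose y_t = T_t + S_t + eps_t with made-up coefficients.
t = np.arange(365)
trend = 0.05 * t                      # T_t = k * t (no changepoints)
P, N = 7, 3                           # weekly seasonality, 3 Fourier terms
a = [1.0, 0.5, 0.2]
b = [0.8, 0.3, 0.1]
season = sum(a[n] * np.cos(2 * np.pi * (n + 1) * t / P)
             + b[n] * np.sin(2 * np.pi * (n + 1) * t / P) for n in range(N))
noise = np.random.default_rng(2).normal(scale=0.3, size=t.size)

df = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=t.size, freq="D"),
    "y": trend + season + noise,
})
# df now matches the 'ds'/'y' input expected by the Basic Usage example in section 5.2.
```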
Combine neural nets with symbolic reasoning.
Neuromorphic chips mimic the brain with spiking neurons. They consume very little power and remain at the research stage for AI applications.
Study individual neurons.
NeVAE is a variational autoencoder that generates molecular graph structures.
NHWC layout stores tensors in batch-height-width-channel order, optimizing certain operations.
Current generation of quantum computers.
Algorithms for noisy intermediate-scale quantum.
Incorporate nitrogen into oxide to improve reliability.
Table-based timing model.
No-repeat n-gram blocking prevents exact phrase repetition.
Learn node embeddings via random walks.
Train by contrasting data and noise distributions.
Learn by contrasting data with noise.