AI Factory Glossary

122 technical terms and definitions


n-beats, n-beats, time series models

N-BEATS (Neural Basis Expansion Analysis for Time Series) is a deep architecture for interpretable time series forecasting, built from stacks of fully connected blocks that each emit a backcast and a forecast through learned basis expansions.

naive bayes,probabilistic,simple

Naive Bayes is a simple probabilistic classifier that applies Bayes' theorem under the assumption that features are conditionally independent given the class. It trains quickly and is a common baseline.
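
A minimal sketch of using naive Bayes as a quick baseline with scikit-learn; the dataset and split here are just illustrative.

```python
# Minimal naive Bayes baseline with scikit-learn (illustrative toy data).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()                 # assumes conditional independence of features
clf.fit(X_train, y_train)          # training reduces to per-class mean/variance estimates
print("baseline accuracy:", clf.score(X_test, y_test))
```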

name substitution, fairness

Name substitution tests for bias by swapping personal names (for example across gender or ethnicity) in otherwise identical inputs and checking whether the model's output changes.
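
An illustrative name-swap check: score the same template with only the name changed and compare outputs. The `score_fn` below is a placeholder for whatever model is under test.

```python
# Hypothetical name-substitution bias check: only the name differs between inputs.
def name_substitution_gap(template, names, score_fn):
    """Return the spread of model scores across name substitutions."""
    scores = {name: score_fn(template.format(name=name)) for name in names}
    return max(scores.values()) - min(scores.values()), scores

if __name__ == "__main__":
    def score_fn(text):            # stand-in for a real model's positive-class probability
        return 0.5 + 0.01 * len(text)

    gap, scores = name_substitution_gap(
        "{name} applied for the loan and has a stable income.",
        ["Emily", "Lakisha", "Jamal", "Greg"],
        score_fn,
    )
    print(scores, "max gap:", round(gap, 3))
```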

nas cell search, nas, neural architecture search

Cell-based neural architecture search discovers repeatable computational blocks (cells) that are stacked to form full networks.

nas-bench, neural architecture search

NAS-Bench provides standardized benchmarks with pre-computed architecture performance metrics to enable reproducible and efficient NAS research.

nas-rl agent, nas-rl, neural architecture search

Reinforcement learning agents for NAS explore architecture spaces using policy gradients to maximize validation performance.

naswot, naswot, neural architecture search

Neural Architecture Search Without Training (NASWOT) scores untrained networks using statistics computed at initialization, such as the overlap of activation patterns across a mini-batch, to predict architecture performance without any training.

nbti modeling, nbti, reliability

NBTI modeling predicts negative-bias temperature instability, an aging mechanism that gradually shifts transistor threshold voltages, so its impact on circuit reliability can be accounted for during design.

nchw layout, nchw, model optimization

NCHW layout stores tensors in batch, channel, height, width order, a format preferred by many GPU libraries and accelerators; NHWC (channels-last) is the common alternative.
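
A small sketch of converting a channels-last tensor to NCHW with PyTorch; the tensor shape is arbitrary.

```python
# Converting a channels-last (NHWC) tensor to channels-first (NCHW) in PyTorch.
import torch

nhwc = torch.randn(8, 224, 224, 3)            # batch, height, width, channels
nchw = nhwc.permute(0, 3, 1, 2).contiguous()  # batch, channels, height, width
print(nchw.shape)                             # torch.Size([8, 3, 224, 224])
```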

ndcg (normalized discounted cumulative gain),ndcg,normalized discounted cumulative gain,evaluation

NDCG measures ranking quality by summing the graded relevance of results discounted by the log of their rank position, then normalizing by the score of the ideal ordering, giving a value between 0 and 1.
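
A worked sketch using one common DCG formulation (relevance discounted by log2 of rank); libraries differ slightly on the gain variant they use.

```python
# NDCG@k from graded relevance labels listed in the order the system ranked them.
import numpy as np

def dcg(relevance, k):
    rel = np.asarray(relevance, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))   # positions 1..k -> log2(2..k+1)
    return float(np.sum(rel / discounts))

def ndcg(relevance, k):
    best = dcg(sorted(relevance, reverse=True), k)    # DCG of the ideal ordering
    return dcg(relevance, k) / best if best > 0 else 0.0

print(ndcg([3, 2, 0, 1], k=4))   # ~0.985: near-ideal ordering
```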

negative binomial yield model,manufacturing

The negative binomial yield model predicts semiconductor yield while accounting for defect clustering, making it more realistic than the simple Poisson yield model.

negative prompting, generative models

Negative prompting specifies what the model should avoid, steering generation away from unwanted content, styles, or artifacts.
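
A hedged sketch with Hugging Face diffusers, whose Stable Diffusion pipelines accept a `negative_prompt` argument; the model ID, prompts, and GPU setup are illustrative.

```python
# Illustrative negative prompting with a Stable Diffusion pipeline (diffusers).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    negative_prompt="blurry, low quality, text, watermark",  # features to steer away from
).images[0]
image.save("lighthouse.png")
```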

negative prompting, multimodal ai

Negative prompting specifies undesired attributes, guiding generation away from certain features.

neighborhood sampling, graph neural networks

Neighborhood sampling limits aggregation to random subsets of neighbors, reducing computational cost in large graphs.
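
A library-agnostic sketch of sampling a fixed number of neighbors per node from an adjacency list, the core idea behind GraphSAGE-style mini-batch training; the toy graph and fanout are illustrative.

```python
# Fixed-size neighborhood sampling over an adjacency list (GraphSAGE-style idea).
import random

def sample_neighbors(adj, nodes, fanout, seed=0):
    """For each seed node, keep at most `fanout` randomly chosen neighbors."""
    rng = random.Random(seed)
    sampled = {}
    for node in nodes:
        neighbors = adj.get(node, [])
        if len(neighbors) > fanout:
            neighbors = rng.sample(neighbors, fanout)
        sampled[node] = neighbors
    return sampled

adj = {0: [1, 2, 3, 4, 5], 1: [0, 2], 2: [0, 1, 3]}
print(sample_neighbors(adj, nodes=[0, 2], fanout=2))
```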

nemo guardrails,programmable,nvidia

NeMo Guardrails is NVIDIA's open-source toolkit for adding a programmable safety layer to LLM applications, with conversational rails defined in the Colang language.

neptune.ai, mlops

Neptune.ai is a platform for tracking, organizing, and comparing machine learning experiments and model metadata.

nequip, chemistry ai

NequIP is an E(3)-equivariant neural network for learning interatomic potentials used in atomistic simulations.

nequip, graph neural networks

Neural Equivariant Interatomic Potentials combine E(3) equivariance with message passing for molecular dynamics.

nerf training process, 3d vision

NeRF training optimizes a network's weights so that volume-rendered rays reproduce a set of posed input images, typically by minimizing a photometric reconstruction loss.

nerf, multimodal ai

NeRF synthesizes novel views by volumetrically rendering scenes from learned implicit representations.

net zero emissions, environmental & sustainability

Net zero emissions balance residual greenhouse gas releases with equivalent removals or offsets.

network morphism,neural architecture

Network morphism transforms a network into a new architecture (for example deeper or wider) while preserving the function it computes, so training can continue without starting from scratch.

network pruning structured,model optimization

Structured pruning removes entire structures such as filters, channels, or attention heads, yielding smaller dense models that speed up inference on standard hardware.

network pruning unstructured,model optimization

Unstructured pruning removes individual weights, typically those with the smallest magnitudes, producing sparse models that usually need specialized kernels or hardware to realize speedups.
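
A NumPy sketch contrasting the two pruning styles from this entry and the one above: unstructured pruning zeroes individual small-magnitude weights, while structured pruning drops whole filters. The tensor shape and sparsity levels are arbitrary.

```python
# Unstructured vs. structured magnitude pruning on a conv weight of shape
# (out_filters, in_channels, kh, kw); thresholds and shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 4, 3, 3))

# Unstructured: zero the 50% of individual weights with smallest magnitude.
threshold = np.quantile(np.abs(weights), 0.5)
unstructured = np.where(np.abs(weights) >= threshold, weights, 0.0)

# Structured: drop the 2 filters whose L1 norm is smallest.
filter_norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
keep = np.sort(np.argsort(filter_norms)[2:])   # indices of filters to keep
structured = weights[keep]

print("unstructured sparsity:", float((unstructured == 0).mean()))
print("structured shape:", structured.shape)   # (6, 4, 3, 3)
```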

neural additive models, nam, explainable ai

Neural additive models (NAMs) learn one small neural network per input feature and sum the resulting shape-function outputs, keeping the interpretability of generalized additive models while gaining neural flexibility.
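
A compact PyTorch sketch of the idea: one tiny MLP shape function per feature, with the prediction formed as the sum of their outputs plus a bias. Layer sizes are arbitrary.

```python
# Minimal neural additive model: one small MLP per feature, outputs summed.
import torch
import torch.nn as nn

class NeuralAdditiveModel(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.shape_functions = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_features)
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):                      # x: (batch, n_features)
        contributions = [
            f(x[:, i : i + 1]) for i, f in enumerate(self.shape_functions)
        ]                                      # each (batch, 1), inspectable per feature
        return torch.stack(contributions, dim=-1).sum(dim=-1) + self.bias

model = NeuralAdditiveModel(n_features=5)
print(model(torch.randn(4, 5)).shape)          # torch.Size([4, 1])
```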

neural architecture distillation, model optimization

Neural architecture distillation transfers architectural knowledge, not just parameter values.

neural architecture generator,neural architecture

A neural architecture generator produces candidate network architectures automatically, typically by sampling from a generative model over a search space.

neural architecture search (nas),neural architecture search,nas,model architecture

Neural architecture search automates the discovery of high-performing model architectures instead of relying on manual design.

neural architecture search advanced, nas, neural architecture

Advanced neural architecture search extends basic NAS with techniques such as weight sharing, differentiable relaxations, and performance predictors to make architecture discovery more efficient.

neural architecture search for edge, edge ai

Neural architecture search for edge devices looks for architectures that satisfy on-device constraints such as latency, memory, and energy, often by adding hardware cost terms to the search objective.

neural architecture search,nas,automl

NAS automatically finds high-performing architectures by combining three components: a search space of candidate networks, a search algorithm, and an evaluation strategy. It is computationally expensive but effective.
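
A toy random-search sketch showing the three components named above; the search space is made up, and the evaluation step is a placeholder rather than real training.

```python
# Toy NAS loop: search space + random search + (placeholder) evaluation.
import random

SEARCH_SPACE = {
    "depth": [2, 4, 8],
    "width": [64, 128, 256],
    "kernel": [3, 5, 7],
    "activation": ["relu", "gelu"],
}

def sample_architecture(rng):
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}

def evaluate(arch):
    # Placeholder: in practice this trains the candidate (or a weight-sharing
    # proxy) and returns validation accuracy.
    return random.random()

rng = random.Random(0)
best = max((sample_architecture(rng) for _ in range(20)), key=evaluate)
print("best candidate:", best)
```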

neural architecture transfer, neural architecture

Neural architecture transfer reuses architectures discovered on one task or dataset for related tasks, avoiding a full search from scratch.

neural articulation, multimodal ai

Neural articulation represents articulated objects with learned kinematic structures.

neural beamforming, audio & speech

Neural beamforming learns spatial filters through deep learning rather than classical signal processing.

neural cache, model optimization

Neural caching stores and reuses intermediate computations for similar inputs, reducing redundancy.
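
An illustrative sketch of the caching idea: reuse a stored result when a new input's embedding is close enough to a cached one. The embedding source and similarity threshold are placeholders.

```python
# Illustrative similarity cache: reuse results for near-duplicate inputs.
import numpy as np

class NeuralCache:
    def __init__(self, threshold=0.95):
        self.keys, self.values, self.threshold = [], [], threshold

    def lookup(self, embedding):
        for key, value in zip(self.keys, self.values):
            sim = key @ embedding / (np.linalg.norm(key) * np.linalg.norm(embedding))
            if sim >= self.threshold:
                return value          # cache hit: skip recomputation
        return None                   # cache miss: caller computes and inserts

    def insert(self, embedding, result):
        self.keys.append(embedding)
        self.values.append(result)
```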

neural cf, recommendation systems

Neural Collaborative Filtering replaces inner products with multi-layer perceptrons to learn complex non-linear user-item interactions.
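
A minimal PyTorch sketch of the MLP branch of neural collaborative filtering: user and item embeddings are concatenated and passed through an MLP instead of taking their inner product. Dimensions and layer sizes are arbitrary.

```python
# Minimal neural collaborative filtering: MLP over concatenated embeddings.
import torch
import torch.nn as nn

class NeuralCF(nn.Module):
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, users, items):
        x = torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)   # predicted interaction score

model = NeuralCF(n_users=1000, n_items=500)
print(model(torch.tensor([1, 2]), torch.tensor([10, 42])))
```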

neural chat,intel neural chat,neural chat model

Neural Chat is a family of chat models released by Intel and optimized to run efficiently on Intel hardware.

neural circuit policies, ncp, reinforcement learning

Control policies implemented as interpretable neural circuits with liquid time constants.

neural circuit policies,reinforcement learning

Reinforcement learning policies implemented as compact, interpretable neural circuits.

neural codec, multimodal ai

Neural codecs compress audio or images using learned representations for efficient transmission.

neural constituency, structured prediction

Neural constituency parsing uses recursive or tree-structured neural networks to predict hierarchical phrase structure trees.

neural controlled differential equations, neural architecture

Neural controlled differential equations use neural networks to parameterize the vector field of a controlled differential equation driven by the input path, giving a continuous-time model well suited to irregularly sampled time series.

neural data-to-text,nlp

Neural data-to-text generation uses neural models to verbalize structured data, such as tables or knowledge-graph triples, as fluent natural language.

neural encoding, neural architecture search

Neural architecture encoders convert graph structures into fixed-dimensional vectors for predictor training.

neural engine,edge ai

The Neural Engine is Apple's specialized hardware accelerator for on-device machine learning, built into Apple silicon chips.

neural fabrics, neural architecture search

Neural fabrics represent search spaces as trellis structures where NAS learns to select paths through pre-defined computational building blocks.

neural hawkes process, time series models, point processes, event modeling, temporal dynamics, neural networks, hawkes process

# Neural Hawkes Process and Time Series Models

## Neural Hawkes Process

## 1. Introduction

A **Hawkes process** is a self-exciting point process used to model events that occur randomly in continuous time, where the occurrence of past events increases the likelihood of future events.

**Key characteristics:**

- Events occur at random times $t_1, t_2, t_3, \ldots$
- Past events "excite" or increase the probability of future events
- The process has memory: history matters
- Widely used in finance, seismology, social networks, and neuroscience

## 2. Classical Hawkes Process

### 2.1 Intensity Function

The **conditional intensity function** $\lambda(t)$ represents the instantaneous rate of event occurrence:

$$
\lambda(t) = \mu + \sum_{t_i < t} \phi(t - t_i)
$$

**Where:**

- $\mu > 0$ — Base intensity (background rate)
- $\phi(\cdot)$ — Triggering kernel (excitation function)
- $t_i$ — Times of past events
- $\sum_{t_i < t}$ — Sum over all events before time $t$

### 2.2 Common Triggering Kernels

**Exponential kernel (most common):**

$$
\phi(\tau) = \alpha \cdot e^{-\beta \tau}
$$

- $\alpha > 0$ — Excitation magnitude
- $\beta > 0$ — Decay rate
- Constraint: $\frac{\alpha}{\beta} < 1$ for stationarity

**Power-law kernel:**

$$
\phi(\tau) = \frac{\alpha}{(\tau + c)^{(1+\omega)}}
$$

- Used in seismology (Omori's law)
- Heavier tails than exponential

### 2.3 Likelihood Function

For a sequence of events $\{t_1, t_2, \ldots, t_n\}$ in interval $[0, T]$:

$$
\mathcal{L} = \prod_{i=1}^{n} \lambda(t_i) \cdot \exp\left( -\int_0^T \lambda(s) \, ds \right)
$$

**Log-likelihood:**

$$
\log \mathcal{L} = \sum_{i=1}^{n} \log \lambda(t_i) - \int_0^T \lambda(s) \, ds
$$

### 2.4 Branching Structure

The Hawkes process has a **branching interpretation:**

- **Immigrants:** Events from the background rate $\mu$
- **Offspring:** Events triggered by previous events
- **Branching ratio:** $n^* = \int_0^\infty \phi(\tau) \, d\tau$
  - If $n^* < 1$: Process is subcritical (stationary)
  - If $n^* = 1$: Process is critical
  - If $n^* > 1$: Process is supercritical (explosive)

## 3. Neural Hawkes Process

### 3.1 Motivation

**Limitations of classical Hawkes processes:**

- Parametric kernels may not capture complex dynamics
- Difficult to model **inhibition** (events reducing future probability)
- Limited expressiveness for multi-type event interactions
- Manual feature engineering required

**Solution:** Replace parametric components with neural networks.

### 3.2 Continuous-Time LSTM (CT-LSTM)

The Neural Hawkes Process (Mei & Eisner, 2017) uses a **continuous-time LSTM** where the hidden state evolves between events.

**Standard LSTM update at event $t_i$:**

$$
\begin{aligned}
i_i &= \sigma(W_i x_i + U_i h_{i-1} + b_i) \\
f_i &= \sigma(W_f x_i + U_f h_{i-1} + b_f) \\
o_i &= \sigma(W_o x_i + U_o h_{i-1} + b_o) \\
\tilde{c}_i &= \tanh(W_c x_i + U_c h_{i-1} + b_c) \\
c_i &= f_i \odot c_{i-1} + i_i \odot \tilde{c}_i \\
h_i &= o_i \odot \tanh(c_i)
\end{aligned}
$$

**Where:**

- $i_i$ — Input gate
- $f_i$ — Forget gate
- $o_i$ — Output gate
- $c_i$ — Cell state
- $h_i$ — Hidden state
- $\sigma(\cdot)$ — Sigmoid function
- $\odot$ — Element-wise multiplication

### 3.3 Continuous-Time Dynamics

**Key innovation:** Cell state decays continuously between events.

**Cell state at time $t$ (between events $t_i$ and $t_{i+1}$):**

$$
c(t) = \bar{c}_i + (c_i - \bar{c}_i) \cdot e^{-\delta_i (t - t_i)}
$$

**Where:**

- $c_i$ — Cell state immediately after event $t_i$
- $\bar{c}_i$ — Target cell state (what $c(t)$ decays toward)
- $\delta_i > 0$ — Decay rate (learned)
- $t - t_i$ — Time elapsed since last event

**Target cell state:**

$$
\bar{c}_i = \bar{f}_i \odot \bar{c}_{i-1} + \bar{i}_i \odot \tilde{c}_i
$$

**Hidden state at time $t$:**

$$
h(t) = o_i \odot \tanh(c(t))
$$

### 3.4 Intensity Function

The intensity for event type $k$ at time $t$:

$$
\lambda_k(t) = f_k(h(t)) = \text{softplus}(w_k^\top h(t) + b_k)
$$

**Softplus function:**

$$
\text{softplus}(x) = \log(1 + e^x)
$$

**Properties:**

- Ensures $\lambda_k(t) > 0$ (intensity must be positive)
- Smooth approximation to ReLU
- Allows for both excitation and inhibition

### 3.5 Training Objective

**Negative log-likelihood:**

$$
\mathcal{L} = -\sum_{i=1}^{n} \log \lambda_{k_i}(t_i) + \sum_{k=1}^{K} \int_0^T \lambda_k(s) \, ds
$$

**Where:**

- $k_i$ — Type of the $i$-th event
- $K$ — Total number of event types
- The integral is computed via Monte Carlo sampling or numerical integration

### 3.6 Architecture Summary

```
Input: Event sequence {(t_1, k_1), (t_2, k_2), ..., (t_n, k_n)}
        │
        ▼
┌────────────────────┐
│  Event Embedding   │
│  x_i = embed(k_i)  │
└────────────────────┘
        │
        ▼
┌────────────────────┐
│    CT-LSTM Cell    │
│     c(t), h(t)     │
└────────────────────┘
        │
        ▼
┌────────────────────┐
│  Intensity Layer   │
│ λ_k(t) = softplus  │
└────────────────────┘
        │
        ▼
Output: λ(t) for prediction, NLL for training
```

## 4. Relationship to Time Series Models

### 4.1 Comparison Table

| Aspect | Traditional Time Series | Point Processes |
|:-------|:------------------------|:----------------|
| **Data** | Regular samples $y_1, y_2, \ldots$ | Event times $t_1, t_2, \ldots$ |
| **Question** | What is $y$ at time $t$? | When does next event occur? |
| **Spacing** | Fixed intervals $\Delta t$ | Irregular, continuous |
| **Models** | ARIMA, GARCH, RNN | Poisson, Hawkes, Neural TPP |

### 4.2 Key Differences

**Time series models:**

- Observations at fixed time intervals: $y_t, y_{t+1}, y_{t+2}, \ldots$
- Model the value/magnitude of observations
- Examples: stock prices, temperature, sensor readings

**Point processes:**

- Events at irregular, continuous times: $t_1, t_2, t_3, \ldots$
- Model **when** events occur (and optionally what type)
- Examples: transactions, earthquakes, social media posts

### 4.3 Connections

**Converting between representations:**

- **Point process → Time series:** Count events in fixed bins:
  $$N_t = \#\{t_i : t_i \in [t, t+\Delta t)\}$$
- **Time series → Point process:** Treat threshold crossings as events

**Shared neural architectures:**

- Both use RNNs, LSTMs, Transformers
- Attention mechanisms applicable to both
- Encoder-decoder frameworks common

## 5. Modern Extensions

### 5.1 Transformer Hawkes Process

**Reference:** Zuo et al., 2020

**Key idea:** Replace RNN with self-attention mechanism.

**Advantages:**

- Parallelizable training (no sequential dependency)
- Better long-range dependency modeling
- Scales to longer sequences

**Self-attention for events:**

$$
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right) V
$$

**Temporal encoding:**

$$
\text{PE}(t, 2i) = \sin\left(\frac{t}{10000^{2i/d}}\right)
$$

$$
\text{PE}(t, 2i+1) = \cos\left(\frac{t}{10000^{2i/d}}\right)
$$

### 5.2 Neural Jump SDEs

**Combines:**

- Continuous diffusion dynamics (SDEs)
- Discrete jumps (point processes)

**Formulation:**

$$
dX_t = f(X_t) \, dt + g(X_t) \, dW_t + h(X_t) \, dN_t
$$

**Where:**

- $f(X_t) \, dt$ — Drift term
- $g(X_t) \, dW_t$ — Diffusion (Brownian motion)
- $h(X_t) \, dN_t$ — Jump term (point process)

### 5.3 Variational Approaches

**Variational Autoencoder for Point Processes:**

$$
\mathcal{L}_{\text{ELBO}} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{\text{KL}}(q(z|x) \| p(z))
$$

**Benefits:**

- Uncertainty quantification
- Latent structure discovery
- Generative modeling

### 5.4 Marked Temporal Point Processes

Events carry additional information (**marks**):

$$
\{(t_1, m_1), (t_2, m_2), \ldots, (t_n, m_n)\}
$$

**Joint intensity:**

$$
\lambda(t, m) = \lambda_g(t) \cdot f(m | t, \mathcal{H}_t)
$$

- $\lambda_g(t)$ — Ground intensity (when)
- $f(m | t, \mathcal{H}_t)$ — Mark distribution (what)

## 6. Applications

### 6.1 When to Use Neural Hawkes

**Good fit:**

- Event data with self-exciting patterns
- Multiple interacting event types
- Complex, nonlinear dependencies
- Large datasets where neural networks can generalize

**Specific domains:**

- **Finance:** High-frequency trading, order book dynamics
- **Social networks:** Information cascades, retweets, viral content
- **Healthcare:** Patient events, hospital admissions, disease outbreaks
- **Criminology:** Crime prediction, recidivism modeling
- **Seismology:** Earthquake aftershock prediction
- **Neuroscience:** Neural spike train modeling

### 6.2 When to Consider Alternatives

| Scenario | Recommended Alternative |
|:---------|:------------------------|
| Regularly sampled data | Standard time series (ARIMA, LSTM) |
| Need interpretability | Classical Hawkes with explicit kernels |
| Very sparse data | Simple parametric models |
| Real-time constraints | Lightweight models, online learning |

### 6.3 Implementation Resources

**Libraries:**

- `tick` (Python) — Classical point processes
- `PtPack` (Python) — Neural temporal point processes
- `pytorch-transformer-hawkes` — Transformer-based models

**Key papers:**

- Mei & Eisner (2017): "The Neural Hawkes Process"
- Zuo et al. (2020): "Transformer Hawkes Process"
- Du et al. (2016): "Recurrent Marked Temporal Point Processes"

## 7. Mathematical

### Core Equations Reference

**Classical Hawkes intensity:**

$$
\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta(t - t_i)}
$$

**Neural Hawkes continuous-time cell:**

$$
c(t) = \bar{c}_i + (c_i - \bar{c}_i) e^{-\delta_i(t - t_i)}
$$

**Neural intensity function:**

$$
\lambda_k(t) = \text{softplus}(w_k^\top h(t) + b_k)
$$

**Log-likelihood:**

$$
\log \mathcal{L} = \sum_{i=1}^{n} \log \lambda_{k_i}(t_i) - \int_0^T \sum_{k=1}^{K} \lambda_k(s) \, ds
$$
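
To complement the equations above, a small NumPy sketch evaluating the exponential-kernel Hawkes intensity and its exact log-likelihood for a given event sequence; the parameter values and event times are illustrative.

```python
# Exponential-kernel Hawkes process: intensity and exact log-likelihood.
import numpy as np

def intensity(t, events, mu, alpha, beta):
    past = events[events < t]
    return mu + np.sum(alpha * np.exp(-beta * (t - past)))

def log_likelihood(events, T, mu, alpha, beta):
    ll = 0.0
    for i, t_i in enumerate(events):
        ll += np.log(intensity(t_i, events[:i], mu, alpha, beta))
    # Compensator: integral of lambda(s) over [0, T] has a closed form here.
    ll -= mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - events)))
    return ll

events = np.array([0.5, 1.2, 1.3, 3.7, 4.0])
print(log_likelihood(events, T=5.0, mu=0.2, alpha=0.8, beta=1.5))
```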

neural implicit functions, 3d vision

Neural implicit functions are networks that represent shapes as continuous functions, for example mapping 3D coordinates to occupancy or signed distance values.

neural implicit surfaces,computer vision

Neural implicit surfaces represent geometry as the zero level set of a learned function, typically a signed distance field parameterized by a neural network.

neural mesh representation, 3d vision

Neural mesh representations encode mesh geometry, such as vertex positions or deformations of a template mesh, with neural networks.