streaming llm, llm architecture
Process infinite sequences.
GPU building blocks.
Streaming sends tokens as generated via SSE or WebSocket. User sees output immediately, feels faster.
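A minimal sketch of the server-sent-events flavor, assuming a FastAPI app; `generate_tokens` is a hypothetical stand-in for the model's token iterator.

```python
# Sketch: stream tokens to the client over SSE as soon as they are produced.
# generate_tokens() is a hypothetical placeholder for the model's token stream.
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def generate_tokens(prompt: str):
    for token in ["Hello", ",", " world", "!"]:   # placeholder tokens
        await asyncio.sleep(0.05)                 # simulate generation latency
        yield f"data: {token}\n\n"                # SSE event framing

@app.get("/stream")
async def stream(prompt: str = ""):
    # Each yielded chunk is flushed to the client immediately.
    return StreamingResponse(generate_tokens(prompt), media_type="text/event-stream")
```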
Streamlit builds Python web apps fast. Great for ML demos. Interactive widgets.
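A hypothetical minimal Streamlit app, just to show the widget-driven flow; run it with `streamlit run app.py`.

```python
# Sketch: a slider widget driving a line chart in Streamlit.
import numpy as np
import pandas as pd
import streamlit as st

st.title("Sine wave demo")
freq = st.slider("Frequency", 1, 10, 3)          # interactive widget
x = np.linspace(0, 2 * np.pi, 200)
st.line_chart(pd.DataFrame({"y": np.sin(freq * x)}, index=x))
```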
Stress engineering introduces mechanical strain in transistor channels to modulate carrier mobility and enhance drive current.
Deliberately apply stress to enhance performance.
Induce stress then remove stressor.
Copper movement from thermal stress.
Model thermal stress effects.
Metal movement due to thermal stress.
Reduce stress in thinned wafer.
Expose defects before shipment.
Compute mechanical stress from processing.
Test model under distribution shift.
Stress-induced voids form in interconnects when mechanical stress gradients exceed critical thresholds.
Relate Raman shift to stress.
Attend to every k-th position.
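A small sketch of the pattern: a boolean mask in which position i may attend to an earlier position j whenever (i - j) is a multiple of k (assuming a causal, strided layout).

```python
# Sketch: build a strided attention mask where position i attends to earlier
# positions j with (i - j) % k == 0, under a causal constraint.
import numpy as np

def strided_mask(seq_len: int, k: int) -> np.ndarray:
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & ((i - j) % k == 0)   # True where attention is allowed

print(strided_mask(8, 3).astype(int))
```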
Two types of hard-to-change factors.
Stripe provides payment processing. Easy integration.
# Structural Time Series Models

## STS

Structural time series (STS) models, also called **state space models** or **unobserved components models**, decompose a time series into interpretable components—each representing a distinct source of variation.

## 1. Core Components

A structural time series model decomposes an observed series $y_t$ into additive components:

$$
y_t = \mu_t + \gamma_t + \psi_t + X_t\beta + \varepsilon_t
$$

Where:

- $\mu_t$ — Trend component
- $\gamma_t$ — Seasonal component
- $\psi_t$ — Cyclical component
- $X_t\beta$ — Regression/explanatory effects
- $\varepsilon_t$ — Irregular (white noise) component

## 2. Component Specifications

### 2.1 Trend Component ($\mu_t$)

The trend captures the underlying level and growth pattern of the series.

#### Local Level Model (Random Walk)

$$
\mu_t = \mu_{t-1} + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)
$$

- Level evolves as a random walk
- No slope/growth rate component
- Suitable for series without systematic growth

#### Local Linear Trend Model

$$
\begin{aligned}
\mu_t &= \mu_{t-1} + \nu_{t-1} + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2) \\
\nu_t &= \nu_{t-1} + \zeta_t, \quad \zeta_t \sim N(0, \sigma_\zeta^2)
\end{aligned}
$$

- $\mu_t$ — Stochastic level
- $\nu_t$ — Stochastic slope (growth rate)
- Both level and slope evolve over time
- When $\sigma_\zeta^2 = 0$: slope is fixed (deterministic growth)
- When $\sigma_\eta^2 = 0$: smooth trend (integrated random walk)

#### Smooth Trend (Integrated Random Walk)

$$
\begin{aligned}
\mu_t &= \mu_{t-1} + \nu_{t-1} \\
\nu_t &= \nu_{t-1} + \zeta_t, \quad \zeta_t \sim N(0, \sigma_\zeta^2)
\end{aligned}
$$

- Level changes are smooth (no level disturbance)
- Only slope receives stochastic shocks

#### Deterministic Trend

$$
\mu_t = \alpha + \beta t
$$

- Fixed intercept $\alpha$ and slope $\beta$
- No stochastic evolution

### 2.2 Seasonal Component ($\gamma_t$)

Captures recurring patterns at fixed intervals.

#### Dummy Variable Form

$$
\gamma_t = -\sum_{j=1}^{s-1} \gamma_{t-j} + \omega_t, \quad \omega_t \sim N(0, \sigma_\omega^2)
$$

- $s$ — Number of seasons (e.g., $s=12$ for monthly data)
- Seasonal effects sum to zero over a complete cycle
- When $\sigma_\omega^2 = 0$: deterministic (fixed) seasonality

#### Trigonometric/Fourier Form

$$
\gamma_t = \sum_{j=1}^{[s/2]} \gamma_{j,t}
$$

Each harmonic $j$ follows:

$$
\begin{bmatrix} \gamma_{j,t} \\ \gamma_{j,t}^* \end{bmatrix} =
\begin{bmatrix} \cos \lambda_j & \sin \lambda_j \\ -\sin \lambda_j & \cos \lambda_j \end{bmatrix}
\begin{bmatrix} \gamma_{j,t-1} \\ \gamma_{j,t-1}^* \end{bmatrix} +
\begin{bmatrix} \omega_{j,t} \\ \omega_{j,t}^* \end{bmatrix}
$$

Where:

- $\lambda_j = \frac{2\pi j}{s}$ — Frequency of harmonic $j$
- $\omega_{j,t}, \omega_{j,t}^* \sim N(0, \sigma_\omega^2)$
- Allows different variances for different harmonics
- More parsimonious when few harmonics are needed
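To make the trend and seasonal specifications above concrete, here is a short simulation sketch (NumPy, with purely illustrative variance values) of a local linear trend plus a dummy-form seasonal component; it is an illustration, not tied to any particular package.

```python
# Sketch: simulate y_t = mu_t + gamma_t + eps_t with a local linear trend
# and a dummy-variable seasonal component (variances are illustrative).
import numpy as np

rng = np.random.default_rng(0)
n, s = 240, 12
sigma_eta, sigma_zeta, sigma_omega, sigma_eps = 0.5, 0.05, 0.3, 1.0

mu, nu = 10.0, 0.1                                # initial level and slope
gamma = list(rng.normal(0, 1, s - 1))             # last s-1 seasonal states
y = np.empty(n)
for t in range(n):
    gamma_t = -sum(gamma) + rng.normal(0, sigma_omega)   # dummy-form seasonal
    y[t] = mu + gamma_t + rng.normal(0, sigma_eps)       # observation equation
    mu = mu + nu + rng.normal(0, sigma_eta)              # level update
    nu = nu + rng.normal(0, sigma_zeta)                  # slope update
    gamma = [gamma_t] + gamma[:-1]                       # shift seasonal memory
```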
### 2.3 Cyclical Component ($\psi_t$)

Captures medium-term fluctuations not tied to fixed calendar periods.

$$
\begin{bmatrix} \psi_t \\ \psi_t^* \end{bmatrix} =
\rho \begin{bmatrix} \cos \lambda_c & \sin \lambda_c \\ -\sin \lambda_c & \cos \lambda_c \end{bmatrix}
\begin{bmatrix} \psi_{t-1} \\ \psi_{t-1}^* \end{bmatrix} +
\begin{bmatrix} \kappa_t \\ \kappa_t^* \end{bmatrix}
$$

Where:

- $\lambda_c \in (0, \pi)$ — Cycle frequency
- $\rho \in (0, 1)$ — Damping factor (ensures stationarity)
- $\kappa_t, \kappa_t^* \sim N(0, \sigma_\kappa^2)$
- Period of cycle: $\frac{2\pi}{\lambda_c}$ time units

### 2.4 Regression Component ($X_t\beta$)

Incorporates explanatory variables:

$$
\text{Regression effect} = \sum_{k=1}^{K} \beta_k x_{k,t}
$$

Common applications:

- **Intervention effects**: Step functions, pulse dummies, ramp effects
- **Calendar effects**: Trading days, holidays, leap years
- **Explanatory variables**: Economic indicators, weather, etc.

#### Time-Varying Coefficients (Optional)

$$
\beta_t = \beta_{t-1} + \xi_t, \quad \xi_t \sim N(0, \sigma_\xi^2)
$$

### 2.5 Irregular Component ($\varepsilon_t$)

$$
\varepsilon_t \sim N(0, \sigma_\varepsilon^2)
$$

- White noise (serially uncorrelated)
- Captures measurement error and short-term fluctuations
- Also called "observation noise"

## 3. State Space Representation

### 3.1 General Form

Any structural time series model can be written in state space form:

**Observation Equation:**

$$
y_t = Z_t \alpha_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, H_t)
$$

**State Equation:**

$$
\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \quad \eta_t \sim N(0, Q_t)
$$

Where:

- $y_t$ — Observed data (scalar or vector)
- $\alpha_t$ — State vector (unobserved components)
- $Z_t$ — Observation matrix (links states to observations)
- $T_t$ — Transition matrix (governs state evolution)
- $R_t$ — Selection matrix
- $H_t$ — Observation noise variance
- $Q_t$ — State noise covariance matrix

### 3.2 Example: Local Linear Trend + Seasonal

State vector:

$$
\alpha_t = \begin{bmatrix} \mu_t \\ \nu_t \\ \gamma_t \\ \gamma_{t-1} \\ \vdots \\ \gamma_{t-s+2} \end{bmatrix}
$$

## 4. Estimation via Kalman Filter

### 4.1 Kalman Filter Recursions

**Prediction Step:**

$$
\begin{aligned}
\alpha_{t|t-1} &= T_t \alpha_{t-1|t-1} \\
P_{t|t-1} &= T_t P_{t-1|t-1} T_t' + R_t Q_t R_t'
\end{aligned}
$$

**Update Step:**

$$
\begin{aligned}
v_t &= y_t - Z_t \alpha_{t|t-1} \quad \text{(prediction error)} \\
F_t &= Z_t P_{t|t-1} Z_t' + H_t \quad \text{(prediction error variance)} \\
K_t &= P_{t|t-1} Z_t' F_t^{-1} \quad \text{(Kalman gain)} \\
\alpha_{t|t} &= \alpha_{t|t-1} + K_t v_t \\
P_{t|t} &= (I - K_t Z_t) P_{t|t-1}
\end{aligned}
$$

Where:

- $\alpha_{t|t-1}$ — Predicted state (prior)
- $\alpha_{t|t}$ — Filtered state (posterior)
- $P_{t|t-1}$ — Predicted state covariance
- $P_{t|t}$ — Filtered state covariance

### 4.2 Kalman Smoother

Refines estimates using full sample (backward pass):

$$
\begin{aligned}
\alpha_{t|n} &= \alpha_{t|t} + P_{t|t} T_{t+1}' P_{t+1|t}^{-1} (\alpha_{t+1|n} - \alpha_{t+1|t}) \\
P_{t|n} &= P_{t|t} + P_{t|t} T_{t+1}' P_{t+1|t}^{-1} (P_{t+1|n} - P_{t+1|t}) P_{t+1|t}^{-1} T_{t+1} P_{t|t}
\end{aligned}
$$

Where $n$ is the total number of observations.
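A minimal sketch of these recursions for the scalar local level model, with assumed variance values; it follows the prediction and update equations above directly.

```python
# Sketch: Kalman filter for the local level model
#   y_t = mu_t + eps_t,  mu_t = mu_{t-1} + eta_t
# with Z = T = R = 1, H = sigma_eps^2, Q = sigma_eta^2.
import numpy as np

def local_level_filter(y, sigma_eps2=1.0, sigma_eta2=0.1, a0=0.0, p0=1e6):
    a, p = a0, p0                      # diffuse-like initial state and variance
    filtered, variances = [], []
    for yt in y:
        # Prediction step: a_{t|t-1} = a_{t-1|t-1}, P_{t|t-1} = P_{t-1|t-1} + Q
        a_pred, p_pred = a, p + sigma_eta2
        # Update step
        v = yt - a_pred                # prediction error
        f = p_pred + sigma_eps2        # prediction error variance
        k = p_pred / f                 # Kalman gain
        a = a_pred + k * v             # filtered state
        p = (1 - k) * p_pred           # filtered state variance
        filtered.append(a)
        variances.append(p)
    return np.array(filtered), np.array(variances)
```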
## 5. Hyperparameter Estimation

### 5.1 Maximum Likelihood

The log-likelihood is computed via prediction error decomposition:

$$
\log L(\theta) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \sum_{t=1}^{n} \left( \log |F_t| + v_t' F_t^{-1} v_t \right)
$$

Where:

- $\theta$ — Vector of hyperparameters (variance terms)
- $v_t$ — Prediction errors from Kalman filter
- $F_t$ — Prediction error variances

Optimization methods:

- Quasi-Newton (BFGS, L-BFGS)
- EM algorithm
- Scoring algorithms

### 5.2 Bayesian Estimation

$$
p(\theta | y_{1:n}) \propto p(y_{1:n} | \theta) \cdot p(\theta)
$$

Common approaches:

- **MCMC**: Gibbs sampling, Hamiltonian Monte Carlo
- **Variational inference**: Faster approximation
- **Integrated nested Laplace approximation (INLA)**

Common priors:

- Inverse-gamma for variance parameters
- Half-Cauchy or half-normal for scale parameters

## 6. Model Selection and Diagnostics

### 6.1 Information Criteria

$$
\begin{aligned}
\text{AIC} &= -2 \log L + 2k \\
\text{BIC} &= -2 \log L + k \log n \\
\text{AICc} &= \text{AIC} + \frac{2k(k+1)}{n-k-1}
\end{aligned}
$$

Where $k$ is the number of hyperparameters.

### 6.2 Diagnostic Checks

Standardized prediction errors should be:

- **Zero mean**: $E[v_t / \sqrt{F_t}] = 0$
- **Unit variance**: $\text{Var}[v_t / \sqrt{F_t}] = 1$
- **Serially uncorrelated**: Check with Ljung-Box test
- **Normally distributed**: Check with Jarque-Bera test

### 6.3 Auxiliary Residuals

- **Observation residuals**: Detect outliers
- **State residuals**: Detect structural breaks

$$
\begin{aligned}
e_t &= \frac{y_t - Z_t \alpha_{t|n}}{\sqrt{\text{Var}(y_t - Z_t \alpha_{t|n})}} \\
r_t &= \frac{\eta_t}{\sqrt{\text{Var}(\eta_t)}}
\end{aligned}
$$

## 7. Comparison

| Approach | Philosophy | Strengths | Limitations |
|:---------|:-----------|:----------|:------------|
| **ARIMA** | Reduced-form; models stationary transformations | Parsimonious, well-understood | Components not interpretable |
| **Exponential Smoothing** | Weighted averages with decay | Simple, effective | Less flexible seasonality |
| **Structural TS** | Explicit component decomposition | Interpretable, handles missing data | More parameters |
| **Prophet** | Additive trend + seasonality + holidays | User-friendly | Less rigorous uncertainty |
| **Deep Learning** | Learn patterns from data | Powerful with big data | Black box, data hungry |

## 8. Topics

### 8.1 Handling Missing Data

The Kalman filter naturally handles missing observations:

- When $y_t$ is missing, skip the update step
- Prediction step proceeds normally
- Smoother propagates information through gaps

### 8.2 Multivariate Extensions

For vector $y_t \in \mathbb{R}^p$:

$$
y_t = Z_t \alpha_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, H_t)
$$

Applications:

- Common trends across multiple series
- Factor models
- Dynamic factor analysis

### 8.3 Non-Gaussian Extensions

- **Student-t errors**: Heavy tails, robust to outliers
- **Mixture models**: Regime switching
- **Non-linear state space**: Extended Kalman filter, particle filters
## 9. Software Implementations

### R Packages

```r
# KFAS - Kalman Filter and Smoother
library(KFAS)
model <- SSModel(y ~ SSMtrend(2, Q = list(NA, NA)) +
                   SSMseasonal(12, Q = NA), H = NA)
fit <- fitSSM(model, inits = rep(0, 4))

# bsts - Bayesian Structural Time Series
library(bsts)
ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 12)
model <- bsts(y, state.specification = ss, niter = 1000)

# dlm - Dynamic Linear Models
library(dlm)
build <- function(theta) {
  dlmModPoly(2, dV = exp(theta[1]), dW = exp(theta[2:3])) +
    dlmModSeas(12, dV = 0, dW = exp(theta[4]))
}
fit <- dlmMLE(y, parm = rep(0, 4), build = build)
```

### Python

```python
# statsmodels
from statsmodels.tsa.statespace.structural import UnobservedComponents

model = UnobservedComponents(
    y,
    level='local linear trend',
    seasonal=12,
    stochastic_seasonal=True
)
results = model.fit()

# TensorFlow Probability
import tensorflow_probability as tfp

trend = tfp.sts.LocalLinearTrend(observed_time_series=y)
seasonal = tfp.sts.Seasonal(num_seasons=12, observed_time_series=y)
model = tfp.sts.Sum([trend, seasonal], observed_time_series=y)
```

## 10. Structural time series models

Structural time series models provide:

- **Interpretability**: Each component has clear economic/statistical meaning
- **Flexibility**: Add/remove components based on domain knowledge
- **Robustness**: Natural handling of missing data and irregular spacing
- **Uncertainty quantification**: Full probability distributions for components and forecasts
- **Intervention analysis**: Easy incorporation of known breaks and policy changes

The state space framework unifies estimation, filtering, smoothing, and forecasting within a coherent probabilistic structure, making structural time series models a powerful tool for understanding and predicting temporal phenomena.
Recover 3D structure from 2D images.
Reconstruct 3D from video.
Geometric and topological descriptors.
Predefined sparsity patterns.
Generate outputs in specific formats (JSON, XML, code) using grammar constraints.
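A sketch of the validation side, assuming the `jsonschema` package: the schema plays the role of the format constraint, though true grammar-constrained decoding enforces it during generation rather than after the fact.

```python
# Sketch: check that a model's output matches a JSON schema. Grammar-constrained
# decoding enforces this during generation; this is a post-hoc validation check.
import json
from jsonschema import validate

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
    },
    "required": ["name", "age"],
}

raw_output = '{"name": "Ada", "age": 36}'                 # hypothetical model output
validate(instance=json.loads(raw_output), schema=schema)  # raises if invalid
```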
Structured logs (JSON) are searchable and parseable. Include context, metrics, request IDs.
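A sketch using only the Python standard library: each log record is emitted as one JSON object carrying a level, a message, and a request ID.

```python
# Sketch: emit one JSON object per log line with context and a request ID.
import json, logging, time, uuid

class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "ts": time.time(),
            "level": record.levelname,
            "msg": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("checkout complete", extra={"request_id": str(uuid.uuid4())})
```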
Extract structured data from generation.
Structured output constrains generation to follow specified formats like JSON or schemas.
Structured perceptron extends the perceptron algorithm to structured prediction by updating on structured outputs rather than class labels.
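A schematic sketch of the update rule; `argmax_structure` (inference under the current weights) and `features` (the joint feature map) are hypothetical problem-specific hooks.

```python
# Sketch of one structured perceptron update: predict the highest-scoring
# structure, then move weights toward the gold structure's features and away
# from the prediction's. argmax_structure() and features() are placeholders.
import numpy as np

def structured_perceptron_step(w, x, y_gold, argmax_structure, features, lr=1.0):
    y_pred = argmax_structure(w, x)              # inference under current weights
    if y_pred != y_gold:
        w = w + lr * (features(x, y_gold) - features(x, y_pred))
    return w
```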
Structured pruning removes entire channels, layers, or blocks, enabling hardware-efficient acceleration.
Structured pruning removes entire channels or heads. Hardware-friendly. Less flexible than unstructured.
Remove entire channels, layers, or attention heads.
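A sketch of the channel-selection step, assuming PyTorch: rank convolutional output channels by the L1 norm of their filters and keep the top fraction; pruning then drops the rest.

```python
# Sketch: score conv output channels by L1 norm and keep the top fraction,
# the selection step of structured (channel) pruning.
import torch

def channels_to_keep(conv_weight: torch.Tensor, keep_ratio: float = 0.5):
    # conv_weight: (out_channels, in_channels, kH, kW)
    scores = conv_weight.abs().sum(dim=(1, 2, 3))        # L1 norm per channel
    n_keep = max(1, int(keep_ratio * scores.numel()))
    return torch.topk(scores, n_keep).indices            # indices of kept channels

w = torch.randn(64, 32, 3, 3)
print(channels_to_keep(w, 0.25).shape)   # 16 channels survive
```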
Represent data with explicit structure.
Structured Support Vector Machines extend the SVM framework to structured outputs using max-margin training with structured loss functions.
Stuck-at faults model defects where circuit nodes are permanently fixed at logic zero or one independent of circuit inputs.
Signal permanently stuck at 0 or 1.
Stuck-open faults represent transistors that fail to conduct; they are detected through elevated IDDQ or two-pattern testing.
The student is a smaller model learning from a larger teacher. The teacher provides a richer signal than hard labels.
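A common formulation, sketched in PyTorch: blend a temperature-softened KL term against the teacher's distribution with the usual cross-entropy on hard labels (the temperature and mixing weight are illustrative).

```python
# Sketch: distillation loss = KL(teacher || student) on temperature-softened
# logits, blended with cross-entropy on the hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```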
EMA teacher for student training.
AI tutoring explains concepts. Personalized pace, examples.
Self-Test Using MISR and Parallel SRSG (STUMPS) is a scan-based BIST architecture.
Match style statistics.
Combine styles from different images.
Style mixing combines latent codes at different scales creating hybrid generations.
Match style of reference image.
Style transfer in diffusion models adapts content to reference styles through conditioning.
Apply artistic style from one image to content of another.
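A sketch of the Gram-matrix style loss used in classical neural style transfer (PyTorch assumed): style is matched through channel-correlation statistics of feature maps.

```python
# Sketch: Gram-matrix style loss, matching channel correlations of a generated
# image's feature map to those of the style image's feature map.
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    # features: (channels, height, width)
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.t() / (c * h * w)

def style_loss(gen_features, style_features):
    return torch.mean((gram_matrix(gen_features) - gram_matrix(style_features)) ** 2)
```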
Generate content in specific artistic styles.
Style instructions control output: formal/casual, brief/detailed, bullet/prose. Customize communication.
Style-based generator.