
AI Factory Glossary

9,967 technical terms and definitions


{311} defects, process

Rod-like defects from implant damage.

1-bit sgd, distributed training

Quantizes gradients to one bit per value, typically with error feedback, to cut communication cost in distributed training.
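A minimal sketch of the idea behind 1-bit gradient quantization with error feedback; the names and the single shared scale are illustrative, not tied to any specific library.

```python
import numpy as np

def one_bit_quantize(grad, error_feedback):
    """Quantize a gradient tensor to 1 bit per value, carrying the
    quantization error into the next step (error feedback)."""
    compensated = grad + error_feedback          # add residual from the last step
    scale = np.mean(np.abs(compensated))         # one shared magnitude
    quantized = np.sign(compensated) * scale     # +scale / -scale per element
    new_error = compensated - quantized          # residual to carry forward
    return quantized, new_error

# Each worker keeps its own error buffer across iterations
err = np.zeros(4)
g = np.array([0.20, -0.05, 0.01, -0.30])
q, err = one_bit_quantize(g, err)   # q is what gets communicated
```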

2.5d extraction, 2.5d, signal & power integrity

2.5D field solvers approximate 3D effects with efficiency suitable for large interconnect extraction.

2.5d packaging, 2.5d, advanced packaging

Uses an interposer to connect multiple dies placed side by side.

2d sinusoidal position encoding, 2d, computer vision

Extends 1D sinusoidal position encodings to two dimensions, typically by encoding row and column positions separately and concatenating them.
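One common realization (e.g., in ViT-style models) builds 1D sin-cos encodings for the row and column indices and concatenates them. A minimal NumPy sketch with illustrative names:

```python
import numpy as np

def sincos_1d(positions, dim):
    """Standard 1D sinusoidal encoding: shape (len(positions), dim)."""
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    angles = np.outer(positions, freqs)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def sincos_2d(height, width, dim):
    """2D encoding: half the channels encode the row, half the column."""
    assert dim % 4 == 0
    rows = sincos_1d(np.arange(height), dim // 2)            # (H, dim/2)
    cols = sincos_1d(np.arange(width), dim // 2)             # (W, dim/2)
    row_part = np.repeat(rows[:, None, :], width, axis=1)    # (H, W, dim/2)
    col_part = np.repeat(cols[None, :, :], height, axis=0)   # (H, W, dim/2)
    return np.concatenate([row_part, col_part], axis=-1)     # (H, W, dim)

pe = sincos_2d(14, 14, 64)   # e.g., one encoding per cell of a 14x14 patch grid
```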

3-sigma yield,manufacturing

99.7% of parts within 3 standard deviations.
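As a quick check of the 99.7% figure, the fraction of a normal distribution within ±kσ follows from the error function; a small illustrative snippet:

```python
import math

def fraction_within(k_sigma: float) -> float:
    """P(|X - mu| <= k*sigma) for a normally distributed quantity."""
    return math.erf(k_sigma / math.sqrt(2))

print(f"{fraction_within(3):.4%}")  # ~99.73%
```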

3d afm, 3d, metrology

Three-dimensional atomic force microscopy.

3d cnns for video, 3d, video understanding

Convolutional networks with temporal dimension.

3d field solver, 3d, signal & power integrity

3D field solvers accurately compute electromagnetic fields in complex structures at higher computational cost.

3d gaussian primitives, 3d, 3d vision

Explicit 3D scene representation.

3d gaussian splatting,computer vision

Efficient 3D scene representation.

3d gaussian, 3d, multimodal ai

3D Gaussians are primitives with position, scale, rotation, and color, rendered through splatting.

3d generation,nerf,gaussian

AI-based 3D generation from images, using NeRFs and Gaussian Splatting to produce photorealistic 3D scenes.

3d integration,advanced packaging

Stack multiple dies vertically with interconnects.

3d scene reconstruction,computer vision

Build 3D models from images.

3d shape generation, 3d, 3d vision

Generate 3D objects.

3d stacking via bonding, 3d, advanced packaging

Build 3D structures by bonding multiple wafers.

3d-aware generation, 3d vision

Generate with 3D understanding.

3d,mesh,point cloud,nerf

3D generation from text/images is an emerging area covering NeRFs, point clouds, and mesh generation; early but advancing fast.

4d scene understanding, 4d, 3d vision

3D space plus time.

4d-stem, 4d-stem, metrology

Record full diffraction pattern at each position.

5 why analysis,quality

Root-cause analysis technique that asks "why" repeatedly to trace a problem to its source.

5 whys for equipment, production

Iterative questioning technique.

5s methodology, 5s, manufacturing operations

5S organizes workplaces through Sort, Set in order, Shine, Standardize, and Sustain, improving efficiency.

6-sigma yield,manufacturing

99.99966% yield target (about 3.4 defects per million opportunities).

8d problem solving, 8d, quality

Structured problem-solving method.

8d problem solving, 8d, quality & reliability

The 8D methodology provides a structured, team-based approach to problem solving and root cause analysis.

8d report (eight disciplines),8d report,eight disciplines,quality

Problem-solving methodology.

a-optimal design, doe

Minimizes the average variance of the parameter estimates (the trace of the inverse information matrix).
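For a linear model, the A-optimality criterion is the trace of (XᵀX)⁻¹, proportional to the average variance of the coefficient estimates. A minimal sketch comparing two hypothetical candidate designs:

```python
import numpy as np

def a_criterion(X):
    """A-optimality score for design matrix X: trace of (X'X)^-1 (lower is better)."""
    return np.trace(np.linalg.inv(X.T @ X))

# Two hypothetical 4-run designs for a model with intercept + one factor
X1 = np.array([[1, -1], [1, -1], [1, 1], [1, 1]], dtype=float)
X2 = np.array([[1, -1], [1, 0], [1, 0], [1, 1]], dtype=float)
print(a_criterion(X1), a_criterion(X2))  # X1 (levels at the extremes) scores lower
```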

a/b test generation,content creation

Create variants for testing.

a/b testing for models,mlops

Deploy multiple model versions and compare performance.

a/b testing,evaluation

Compare two model versions by showing different outputs to users.

a3 problem solving, a3, quality

One-page problem-solving report.

a3c, a3c, reinforcement learning

Parallel asynchronous RL.

a3c, asynchronous advantage actor critic, reinforcement learning advanced, actor critic, asynchronous rl, deepmind, advanced rl

# A3C: Asynchronous Advantage Actor-Critic

## Overview

A3C (Asynchronous Advantage Actor-Critic) was introduced by DeepMind in 2016 and represented a paradigm shift in deep reinforcement learning by solving several fundamental problems simultaneously:

- Sample inefficiency
- Training instability
- The need for expensive hardware (GPUs with large replay buffers)

## Core Architecture

### The Key Insight

Instead of using experience replay (as in DQN), A3C achieves decorrelated training data through **parallelism**:

- Multiple agents interact with separate environment instances simultaneously
- Each agent contributes gradients to a shared global network
- Temporal correlation is broken without requiring replay buffers

## The Three A's Explained

### 1. Asynchronous

Multiple worker threads run in parallel, each with:

- Its own copy of the environment
- A local copy of the policy network
- Independent exploration trajectories

Workers periodically sync with a global network:

- **Push**: Send computed gradients to the global network
- **Pull**: Receive updated parameters from the global network

### 2. Advantage

Rather than using raw returns or Q-values, A3C uses the **advantage function**:

$$
A(s, a) = Q(s, a) - V(s)
$$

In practice, the advantage is estimated using n-step returns:

$$
A_t = \sum_{i=0}^{k-1} \gamma^i r_{t+i} + \gamma^k V(s_{t+k}) - V(s_t)
$$

Where:

- $\gamma$ = discount factor
- $r_{t+i}$ = reward at timestep $t+i$
- $V(s)$ = value function estimate
- $k$ = number of steps (typically 5 or 20)

**Why use advantage?**

- Reduces variance significantly compared to REINFORCE
- Maintains unbiased gradients
- Tells us: "How much better was this action than expected on average?"

### 3. Actor-Critic

Two components share a neural network backbone:

| Component | Output | Role |
|-----------|--------|------|
| **Actor** ($\pi$) | Action probabilities | Policy improvement |
| **Critic** ($V$) | State value estimate | Variance reduction via baseline |

The shared representation allows feature learning to benefit both objectives.
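To make the n-step advantage estimate above concrete, here is a minimal Python sketch, assuming a list of rewards, per-state value estimates, and a bootstrap value for the final state; the names are illustrative, not taken from the A3C paper.

```python
import numpy as np

def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Compute n-step returns and advantages for one rollout segment.

    rewards:         list of r_t for t = t_start .. t_end - 1
    values:          list of V(s_t) for the same timesteps
    bootstrap_value: V(s_{t_end}) if the segment did not end in a terminal
                     state, else 0.0
    """
    returns = np.zeros(len(rewards))
    R = bootstrap_value
    # Walk backwards: R_t = r_t + gamma * R_{t+1}
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        returns[t] = R
    advantages = returns - np.asarray(values)   # A_t = R_t - V(s_t)
    return returns, advantages

# Example: 5-step segment with constant reward 1.0
rets, advs = n_step_advantages([1.0] * 5, [0.5] * 5, bootstrap_value=0.4)
```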
## Loss Function

The total loss combines three terms:

$$
L_{total} = L_{policy} + c_1 \cdot L_{value} + c_2 \cdot L_{entropy}
$$

### Policy Loss (Actor)

$$
L_{policy} = -\log \pi(a_t | s_t) \cdot A_t
$$

Where:

- $\pi(a_t | s_t)$ = probability of taking action $a_t$ in state $s_t$
- $A_t$ = advantage estimate at time $t$

### Value Loss (Critic)

$$
L_{value} = \frac{1}{2}(R_t - V(s_t))^2
$$

Where:

- $R_t$ = discounted return (target)
- $V(s_t)$ = predicted state value

### Entropy Bonus (Exploration)

$$
L_{entropy} = -\sum_{a} \pi(a | s) \log \pi(a | s)
$$

Purpose:

- Prevents premature convergence to deterministic policies
- Encourages exploration
- Typical coefficient: $c_2 = 0.01$

## N-Step Returns

A3C uses n-step bootstrapping for return estimation:

$$
R_t = \sum_{i=0}^{k-1} \gamma^i r_{t+i} + \gamma^k V(s_{t+k})
$$

This balances:

- **Bias**: From bootstrapping with imperfect $V$
- **Variance**: From Monte Carlo-style long rollouts

Common choices:

- $n = 5$ for faster updates
- $n = 20$ for more accurate returns

## Algorithm Pseudocode

```
Global shared parameters: θ (policy), θ_v (value)
Global shared counter: T = 0
Maximum timesteps: T_max

For each worker thread:
    Initialize thread step counter: t = 1
    Initialize local parameters: θ' = θ, θ'_v = θ_v
    Repeat:
        Reset gradients: dθ = 0, dθ_v = 0
        Synchronize: θ' = θ, θ'_v = θ_v
        t_start = t
        Get state: s_t
        Repeat:
            Perform a_t according to π(a_t | s_t; θ')
            Receive reward r_t and new state s_{t+1}
            t = t + 1
            T = T + 1
        Until terminal s_t OR t - t_start == t_max
        R = 0 if terminal else V(s_t; θ'_v)
        For i in {t-1, ..., t_start}:
            R = r_i + γ * R
            Accumulate gradients for π:
                dθ += ∇_{θ'} log π(a_i | s_i; θ') * (R - V(s_i; θ'_v))
            Accumulate gradients for V:
                dθ_v += ∂(R - V(s_i; θ'_v))² / ∂θ'_v
        Perform asynchronous update of θ using dθ
        Perform asynchronous update of θ_v using dθ_v
    Until T > T_max
```

## Network Architecture

### Shared Backbone

```
Input State s
    │
    ▼
Conv Layer 1: 32 filters 8x8, stride 4, ReLU   (if image input)
    │
    ▼
Conv Layer 2: 64 filters 4x4, stride 2, ReLU
    │
    ▼
Conv Layer 3: 64 filters 3x3, stride 1, ReLU
    │
    ▼
Fully Connected: 512 units, ReLU
    │
    ├──────────────────┐
    ▼                  ▼
 Actor π(s)        Critic V(s)
 (Softmax)         (Linear)
```

### Output Specifications

**Actor (Policy) Head:**

$$
\pi(a | s) = \text{softmax}(W_\pi \cdot h + b_\pi)
$$

**Critic (Value) Head:**

$$
V(s) = W_v \cdot h + b_v
$$

Where $h$ is the shared hidden representation.
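A minimal PyTorch sketch of this shared-backbone network; the layer sizes follow the diagram above, while the class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared conv backbone with separate policy (actor) and value (critic) heads."""

    def __init__(self, in_channels: int, num_actions: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(512), nn.ReLU(),   # infers the flattened size at first call
        )
        self.policy_head = nn.Linear(512, num_actions)  # actor: action logits
        self.value_head = nn.Linear(512, 1)             # critic: state value

    def forward(self, obs: torch.Tensor):
        h = self.backbone(obs)
        logits = self.policy_head(h)            # softmax applied by the sampler/loss
        value = self.value_head(h).squeeze(-1)
        return logits, value

# Example: batch of four observations of 4 stacked 84x84 frames (Atari-style)
net = ActorCritic(in_channels=4, num_actions=6)
logits, value = net(torch.zeros(4, 4, 84, 84))
```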
## Continuous Action Spaces

For continuous control, the actor outputs parameters of a Gaussian distribution:

$$
\pi(a | s) = \mathcal{N}(\mu(s), \sigma(s)^2)
$$

Where:

- $\mu(s)$ = mean action (network output)
- $\sigma(s)$ = standard deviation (can be learned or fixed)

**Sampling:**

$$
a = \mu(s) + \sigma(s) \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)
$$

**Log probability:**

$$
\log \pi(a | s) = -\frac{(a - \mu(s))^2}{2\sigma(s)^2} - \log(\sigma(s)) - \frac{1}{2}\log(2\pi)
$$

## Hyperparameters

| Parameter | Typical Value | Description |
|-----------|---------------|-------------|
| $\gamma$ | 0.99 | Discount factor |
| Learning rate | $10^{-4}$ to $7 \times 10^{-4}$ | Step size for optimization |
| $n$-step | 5 or 20 | Steps before bootstrapping |
| Entropy coef ($c_2$) | 0.01 | Exploration encouragement |
| Value coef ($c_1$) | 0.5 | Value loss weight |
| Max grad norm | 40 | Gradient clipping threshold |
| Workers | 16 | Number of parallel threads |

## Comparison: Before and After A3C

### Before A3C (DQN Era)

- Required massive replay buffers (millions of transitions)
- GPU-bound, single environment
- Off-policy complications
- Discrete actions only (without modifications)
- Memory intensive: $O(10^6)$ transitions stored

### A3C's Contributions

- **CPU-friendly**: Runs efficiently on multi-core CPUs
- **On-policy**: Simpler, more stable gradients
- **Continuous actions**: Natural extension via Gaussian policies
- **Faster wall-clock training**: Parallelism compensates for sample inefficiency
- **Memory efficient**: No replay buffer needed

## Limitations and Evolution

| Limitation | Successor | Solution |
|------------|-----------|----------|
| Noisy gradients from async updates | A2C | Synchronous updates |
| Sample inefficiency (on-policy) | IMPALA | V-trace off-policy correction |
| No trust region | PPO/TRPO | Clipped/constrained updates |
| Hyperparameter sensitivity | PPO | Robust clipped objective |

### A2C (Synchronous Version)

- Waits for all workers to finish before updating
- Enables GPU batching
- Often matches or beats A3C
- Simpler to implement and debug

### PPO (Proximal Policy Optimization)

Clipped surrogate objective:

$$
L^{CLIP}(\theta) = \mathbb{E}_t \left[ \min \left( r_t(\theta) \hat{A}_t, \text{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon) \hat{A}_t \right) \right]
$$

Where:

$$
r_t(\theta) = \frac{\pi_\theta(a_t | s_t)}{\pi_{\theta_{old}}(a_t | s_t)}
$$

## Generalized Advantage Estimation (GAE)

An improvement often used with A3C/A2C:

$$
\hat{A}_t^{GAE(\gamma, \lambda)} = \sum_{l=0}^{\infty} (\gamma \lambda)^l \delta_{t+l}
$$

Where the TD residual is:

$$
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)
$$

Properties:

- $\lambda = 0$: One-step TD (high bias, low variance)
- $\lambda = 1$: Monte Carlo (low bias, high variance)
- $\lambda = 0.95$: Common choice balancing bias-variance

## Implementation Considerations

### Gradient Clipping

Essential for stability:

```python
# Clip by global norm
grad_norm = torch.nn.utils.clip_grad_norm_(
    model.parameters(), max_norm=40.0
)
```

### Shared Optimizer State

Use optimizers that handle asynchronous updates:

- **RMSprop** (original paper)
- **Adam** with shared statistics

### Environment Normalization

Normalize observations and rewards:

$$
\hat{s} = \frac{s - \mu_s}{\sigma_s + \epsilon}
$$

$$
\hat{r} = \frac{r}{\sigma_r + \epsilon}
$$

## When to Use A3C Today

### Good Use Cases

- Many CPU cores but limited GPU
- Environment simulation is the bottleneck
- Need continuous control with minimal tuning
- Teaching/learning RL fundamentals
- Rapid prototyping

### Better Alternatives

- **PPO**: Most practical applications (robust, simple)
- **SAC**: Continuous control with sample efficiency
- **IMPALA**: Large-scale distributed training
- **DreamerV3**: Model-based with better sample efficiency

## Mathematical Summary

### Core Equations

**Policy Gradient Theorem:**

$$
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta} \left[ \nabla_\theta \log \pi_\theta(a|s) \cdot A^{\pi_\theta}(s, a) \right]
$$

**Advantage Function:**

$$
A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s)
$$

**Bellman Equation for V:**

$$
V^\pi(s) = \mathbb{E}_{a \sim \pi} \left[ r(s, a) + \gamma V^\pi(s') \right]
$$

**Total Loss:**

$$
L(\theta) = -\mathbb{E}_t \left[ \log \pi_\theta(a_t|s_t) A_t \right] + c_1 \mathbb{E}_t \left[ (V_\theta(s_t) - R_t)^2 \right] - c_2 H(\pi_\theta(\cdot|s_t))
$$

## Quick Reference Card

```
A3C QUICK REFERENCE
-------------------
Algorithm Type:   On-policy, Actor-Critic
Action Space:     Discrete or Continuous
Parallelization:  Asynchronous multi-threading
Memory:           No replay buffer
Hardware:         CPU-friendly

Key Hyperparameters:
  γ (discount)   = 0.99
  lr             = 1e-4 to 7e-4
  n-step         = 5 or 20
  entropy_coef   = 0.01
  value_coef     = 0.5
  max_grad_norm  = 40
  num_workers    = 16

Loss = -log(π) * A + 0.5 * (R - V)² - 0.01 * H(π)
```
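As a complement to the GAE section above, here is a minimal Python sketch of the finite-horizon GAE recursion; the names are illustrative, and it assumes value estimates for each state plus a bootstrap value for the final one.

```python
import numpy as np

def gae(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout segment.

    rewards: r_t for t = 0 .. T-1
    values:  V(s_t) for t = 0 .. T-1
    bootstrap_value: V(s_T), or 0.0 if the segment ended in a terminal state
    """
    values_ext = np.append(values, bootstrap_value)
    advantages = np.zeros(len(rewards))
    gae_t = 0.0
    # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t); accumulate backwards
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        gae_t = delta + gamma * lam * gae_t
        advantages[t] = gae_t
    returns = advantages + np.asarray(values)  # value-function targets
    return advantages, returns

advs, rets = gae([1.0, 0.0, 1.0], [0.4, 0.5, 0.6], bootstrap_value=0.3)
```

Setting `lam=0` recovers the one-step TD residual, while `lam=1` recovers the Monte Carlo return minus the value baseline (for a complete episode), mirroring the bias-variance trade-off listed above.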

ab initio simulation, simulation

First-principles quantum mechanical calculations.

abc analysis, abc, supply chain & logistics

ABC analysis categorizes inventory by value and usage, prioritizing management attention on high-value items contributing most to costs.

abductive reasoning,reasoning

Infer most likely explanation.

aberration-corrected tem, metrology

Ultra-high resolution TEM with correctors.

ablation cam, explainable ai

Use ablation to generate activation maps.

ablation study,analysis,what matters

Design ablations to see which components (data, features, modules) really drive performance.

ablation,experiment,study

Ablation studies isolate impact of each component. Remove or modify one thing, measure effect. Scientific rigor.

ablation,remove,contribution

Ablation removes or zeros components to measure contribution. Essential interpretability technique.
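A minimal sketch of the pattern: zero out one component at a time (here, a hypothetical feature column), re-evaluate, and report the metric drop as that component's contribution. The `evaluate` callable and names are illustrative.

```python
import numpy as np

def ablation_scores(evaluate, X, feature_names):
    """Zero out one feature at a time and measure the drop in a metric.

    evaluate: callable taking a feature matrix and returning a scalar metric
              (higher is better); X: (n_samples, n_features) array.
    """
    baseline = evaluate(X)
    scores = {}
    for j, name in enumerate(feature_names):
        X_ablate = X.copy()
        X_ablate[:, j] = 0.0                          # remove this component's signal
        scores[name] = baseline - evaluate(X_ablate)  # estimated contribution
    return baseline, scores

# Usage with a hypothetical scikit-learn-style model and validation set:
# base, contrib = ablation_scores(lambda X: model.score(X, y_val), X_val, names)
```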

absolute grading,evaluation

Score single output on scale.

absorbing state diffusion, generative models

Diffusion where tokens gradually become mask tokens.

abstention,ai safety

Refuse to answer when uncertain.

abstract interpretation for neural networks, ai safety

Sound over-approximation of network behavior.

abstract interpretation,software engineering

Soundly approximate program behavior.

abtest,online eval,rollout,canary

Use A/B tests and canary rollouts to compare models safely. Start with a small traffic slice, check metrics + human feedback, then scale up.
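A minimal sketch of deterministic traffic splitting for a canary rollout, assuming a stable user or request ID; the hashing scheme, names, and the 5% slice are illustrative.

```python
import hashlib

def assign_variant(request_id: str, canary_fraction: float = 0.05) -> str:
    """Deterministically route a small slice of traffic to the canary model."""
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # roughly uniform value in [0, 1]
    return "canary_model" if bucket < canary_fraction else "baseline_model"

# The same ID always gets the same variant, so users see consistent behavior;
# raise canary_fraction gradually as metrics and human feedback stay healthy.
print(assign_variant("user-12345"))
```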

ac parametric, ac, advanced test & probe

AC parametric tests characterize frequency-dependent behavior like capacitance, propagation delay, and switching characteristics.