
AI Factory Glossary

9,967 technical terms and definitions


agent orchestration,multi-agent

Framework to coordinate multiple specialized agents working on subtasks.

agent protocol, ai agents

Agent protocols standardize interfaces for agent interoperability.

agent stopping criteria, ai agents

Stopping criteria define conditions when agents should terminate execution.

agent-based modeling, digital manufacturing

Model the fab using autonomous interacting agents.

agent,tool,use tools,tool calling

An AI agent can call tools (APIs, databases, code) based on the conversation. The LLM plans, picks tools, reads the results, and responds with updated knowledge.
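
A minimal sketch of that loop, assuming hypothetical `llm_plan` and `llm_respond` helpers and a `tools` dict of callables (none of these come from a specific framework):

```python
# Hypothetical tool-calling loop: the LLM decides which tool to call,
# reads the result, and folds it back into the conversation.
def run_agent(user_message, tools, llm_plan, llm_respond):
    history = [{"role": "user", "content": user_message}]
    while True:
        step = llm_plan(history, tools)  # LLM picks a tool or decides to answer
        if step["action"] == "final_answer":
            return llm_respond(history)
        result = tools[step["tool"]](**step["arguments"])  # call the chosen API/DB/code
        history.append({"role": "tool", "name": step["tool"], "content": str(result)})
```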

agentbench, ai agents

AgentBench provides a comprehensive evaluation framework for LLM-based agents.

agentic rag,rag

RAG system where an agent decides when to retrieve, what queries to use, and how to synthesize the results.
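
A rough sketch of that decision loop, with hypothetical `decide`, `retrieve`, and `synthesize` callables standing in for the agent's components:

```python
# Hypothetical agentic-RAG loop: the agent decides whether to retrieve,
# rewrites the query, and synthesizes an answer from the gathered context.
def agentic_rag(question, decide, retrieve, synthesize, max_rounds=3):
    context = []
    for _ in range(max_rounds):
        decision = decide(question, context)  # e.g. {"retrieve": True, "query": "..."}
        if not decision["retrieve"]:
            break
        context.extend(retrieve(decision["query"]))
    return synthesize(question, context)
```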

aggregate functions, graph neural networks

Aggregate functions in GNNs combine neighbor information using operations like sum, mean, max, or attention.
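
A toy PyTorch illustration of sum, mean, and max aggregation over a node's neighbor features (not tied to any particular GNN library):

```python
import torch

# Toy example: aggregate the feature vectors of a node's three neighbors.
neighbor_feats = torch.tensor([[1.0, 0.0],
                               [2.0, 1.0],
                               [0.0, 3.0]])  # 3 neighbors, 2 features each

agg_sum = neighbor_feats.sum(dim=0)          # tensor([3., 4.])
agg_mean = neighbor_feats.mean(dim=0)        # tensor([1.0000, 1.3333])
agg_max = neighbor_feats.max(dim=0).values   # tensor([2., 3.])
```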

aggregation strategy, recommendation systems

Aggregation strategies combine individual preferences into group recommendations through averaging or consensus.
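
An illustrative averaging strategy over made-up per-user ratings; least-misery or consensus rules would replace the mean:

```python
# Toy group aggregation: average each item's score across group members.
user_scores = {
    "alice": {"item_a": 5, "item_b": 2},
    "bob":   {"item_a": 3, "item_b": 4},
}

items = {item for scores in user_scores.values() for item in scores}
group_scores = {
    item: sum(scores[item] for scores in user_scores.values()) / len(user_scores)
    for item in items
}
# group_scores == {"item_a": 4.0, "item_b": 3.0}
```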

aging monitor,reliability

Track degradation over time.

aging-aware timing analysis, design

Include degradation in timing.

agv (automated guided vehicle),agv,automated guided vehicle,automation

Mobile robot that transports wafers or materials on fab floor.

agv routing, agv, facility

Optimize AGV paths.

ai act,regulation,eu

EU AI Act regulates AI systems by risk level. High-risk systems face strict compliance requirements. Part of a global regulatory trend.

ai bill of rights,ethics

U.S. framework (the White House Blueprint for an AI Bill of Rights) for protecting people from algorithmic harm.

ai feedback, ai, training techniques

AI feedback uses model-generated evaluations to train or align other models.

ai supercomputers, ai, infrastructure

Purpose-built systems for AI training.

aider,pair,programming

Aider is an AI pair-programming tool that runs in the terminal. It edits files with an LLM.

aims, lithography

Tool for aerial image inspection.

air bearing table,metrology

Ultra-stable surface for metrology.

air changes per hour (ach),air changes per hour,ach,facility

Number of times cleanroom air is completely replaced per hour.
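
A worked example using the standard relationship ACH = airflow per hour / room volume, with hypothetical numbers:

```python
# ACH = volumetric airflow per hour / room volume (consistent units).
airflow_m3_per_hour = 36_000.0   # hypothetical cleanroom supply air
room_volume_m3 = 600.0           # hypothetical cleanroom volume

ach = airflow_m3_per_hour / room_volume_m3
print(ach)  # 60.0 air changes per hour
```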

air gap,beol

Use air (k=1) as insulator between metal lines for lowest capacitance.

air shower,facility

Enclosed space that blows high-velocity air to remove particles before cleanroom entry.

airborne molecular contamination, amc, contamination

Gaseous contaminants.

airflow,orchestration,dag

Apache Airflow orchestrates data pipelines. DAGs define dependencies. Standard for ETL.
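
A minimal DAG sketch using Airflow's Python API (the DAG id, schedule, and tasks are made up; assumes Airflow 2.4+ where the `schedule` argument is accepted):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...

def load():
    ...

# Hypothetical daily ETL DAG; ">>" declares the dependency between tasks.
with DAG(dag_id="example_etl", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```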

airgap, process integration

Airgaps introduce air (k = 1) between metal lines, providing the lowest possible dielectric constant and reducing capacitance and crosstalk.

airl, adversarial inverse reinforcement learning, inverse rl, imitation learning, reward recovery, expert demonstrations, adversarial training

# Adversarial Inverse Reinforcement Learning (AIRL)

**AIRL** (Adversarial Inverse Reinforcement Learning) is an advanced algorithm that combines inverse reinforcement learning with adversarial training to recover reward functions from expert demonstrations.

## The Core Problem AIRL Solves

Traditional **Inverse Reinforcement Learning (IRL)** aims to recover a reward function from expert demonstrations. The fundamental challenges include:

- **Reward ambiguity**: Many different reward functions can explain the same observed behavior
- **Computational expense**: Requires solving an RL problem in an inner loop
- **Poor scalability**: Struggles with high-dimensional problems
- **Dynamics dependence**: Learned rewards often don't transfer to new environments

## Mathematical Formulation

### Discriminator Architecture

The discriminator in AIRL has a specifically structured form:

$$
D_\theta(s, a, s') = \frac{\exp(f_\theta(s, a, s'))}{\exp(f_\theta(s, a, s')) + \pi(a|s)}
$$

Where:

- $s$ = current state
- $a$ = action taken
- $s'$ = next state
- $\pi(a|s)$ = policy probability
- $f_\theta$ = learned function (detailed below)

### Reward-Shaping Decomposition

The function $f_\theta$ is decomposed as:

$$
f_\theta(s, a, s') = g_\theta(s, a) + \gamma h_\phi(s') - h_\phi(s)
$$

| Component | Description | Role |
|-----------|-------------|------|
| $g_\theta(s, a)$ | Reward approximator | Transferable reward signal |
| $h_\phi(s)$ | Shaping potential | Captures dynamics-dependent info |
| $\gamma$ | Discount factor | Temporal discounting (typically 0.99) |

### State-Only Reward Variant

For better transfer, use state-only rewards:

$$
f_\theta(s, s') = g_\theta(s) + \gamma h_\phi(s') - h_\phi(s)
$$

## Training Algorithm

### Objective Functions

**Discriminator Loss** (minimize):

$$
\mathcal{L}_D = -\mathbb{E}_{\tau_E}\left[\log D_\theta(s, a, s')\right] - \mathbb{E}_{\tau_\pi}\left[\log(1 - D_\theta(s, a, s'))\right]
$$

Where:

- $\tau_E$ = expert trajectories
- $\tau_\pi$ = policy-generated trajectories

**Generator (Policy) Objective** (maximize):

$$
\mathcal{L}_\pi = \mathbb{E}_{\tau_\pi}\left[\sum_{t=0}^{T} \gamma^t \log D_\theta(s_t, a_t, s_{t+1})\right]
$$

### Training Loop Pseudocode

```python
# AIRL Training Loop
for iteration in range(max_iterations):
    # Step 1: Sample trajectories from current policy
    policy_trajectories = sample_trajectories(policy, env, n_samples)

    # Step 2: Update Discriminator
    for d_step in range(discriminator_steps):
        expert_batch = sample_batch(expert_demonstrations)
        policy_batch = sample_batch(policy_trajectories)

        # Discriminator predictions
        D_expert = discriminator(expert_batch)
        D_policy = discriminator(policy_batch)

        # Binary cross-entropy loss
        loss_D = -torch.mean(torch.log(D_expert)) \
                 - torch.mean(torch.log(1 - D_policy))

        optimizer_D.zero_grad()
        loss_D.backward()
        optimizer_D.step()

    # Step 3: Compute rewards for policy update
    rewards = torch.log(D_policy) - torch.log(1 - D_policy)

    # Step 4: Update Policy (using PPO, TRPO, etc.)
    policy.update(policy_trajectories, rewards)
```

## Theoretical Properties

### 1. Reward Recovery Guarantees

At optimality, under ergodicity and sufficient expressiveness:

$$
g_\theta(s, a) \rightarrow A^*(s, a) = Q^*(s, a) - V^*(s)
$$

Or for state-only rewards:

$$
g_\theta(s) \rightarrow r^*(s)
$$

This recovers the **ground-truth reward** up to a constant.

### 2. Disentanglement Theorem

The decomposition separates:

$$
\underbrace{f_\theta(s, a, s')}_{\text{Full signal}} = \underbrace{g_\theta(s, a)}_{\text{Reward (transferable)}} + \underbrace{\gamma h_\phi(s') - h_\phi(s)}_{\text{Shaping (dynamics-dependent)}}
$$

**Key insight**: Potential-based shaping ($\gamma h(s') - h(s)$) does not change the optimal policy, so $g_\theta$ captures the "true" reward.

### 3. Connection to Maximum Entropy IRL

AIRL approximates MaxEnt IRL:

$$
\max_\theta \mathbb{E}_{\tau_E}\left[\sum_t r_\theta(s_t, a_t)\right] + \mathcal{H}(\pi)
$$

Where $\mathcal{H}(\pi)$ is the policy entropy. AIRL achieves this without the expensive inner-loop policy optimization.

## Comparison

| Method | Recovers Reward | Dynamics-Invariant | Scalable | Sample Efficiency |
|--------|-----------------|--------------------|----------|-------------------|
| Behavioral Cloning | ❌ No | N/A | ✅ Yes | ✅ High |
| GAIL | ❌ No (policy only) | ❌ No | ✅ Yes | ⚠️ Medium |
| MaxEnt IRL | ✅ Yes | ⚠️ Partially | ❌ No | ❌ Low |
| **AIRL** | ✅ **Yes** | ✅ **Yes** | ✅ **Yes** | ⚠️ Medium |

### GAIL vs AIRL

**GAIL Discriminator**:

$$
D_\theta^{GAIL}(s, a) = \sigma(f_\theta(s, a))
$$

**AIRL Discriminator**:

$$
D_\theta^{AIRL}(s, a, s') = \frac{\exp(f_\theta(s, a, s'))}{\exp(f_\theta(s, a, s')) + \pi(a|s)}
$$

The key difference: AIRL's structure enables reward recovery; GAIL's does not.

## Implementation Details

### Network Architecture

```python
import torch
import torch.nn as nn


class AIRLDiscriminator(nn.Module):
    """
    AIRL Discriminator with reward-shaping decomposition.
    """

    def __init__(self, state_dim, action_dim, hidden_dim=256,
                 gamma=0.99, state_only=True):
        super().__init__()
        self.gamma = gamma
        self.state_only = state_only

        # Reward network g(s) or g(s,a)
        if state_only:
            self.g_net = nn.Sequential(
                nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1)
            )
        else:
            self.g_net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1)
            )

        # Shaping potential h(s)
        self.h_net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1)
        )

    def get_reward(self, states, actions=None):
        """Extract the learned reward g(s) or g(s,a)."""
        if self.state_only:
            return self.g_net(states)
        else:
            sa = torch.cat([states, actions], dim=-1)
            return self.g_net(sa)

    def forward(self, states, actions, next_states, log_pi, dones):
        """
        Compute f(s,a,s') = g(s,a) + gamma*h(s') - h(s)

        Args:
            states: Current states [batch, state_dim]
            actions: Actions taken [batch, action_dim]
            next_states: Next states [batch, state_dim]
            log_pi: Log probability of actions [batch, 1]
            dones: Episode termination flags [batch, 1]

        Returns:
            D(s,a,s'): Discriminator output [batch, 1]
        """
        # Reward component
        g = self.get_reward(states, actions)

        # Shaping component
        h_s = self.h_net(states)
        h_s_next = self.h_net(next_states)

        # f = g + gamma*h(s') - h(s), with masking for terminal states
        shaping = self.gamma * (1 - dones) * h_s_next - h_s
        f = g + shaping

        # D(s,a,s') = exp(f) / (exp(f) + pi(a|s))
        # In log space: D = sigmoid(f - log_pi)
        log_D = f - log_pi
        D = torch.sigmoid(log_D)

        return D, f, g
```

### Hyperparameters

```python
# Recommended hyperparameters for AIRL
config = {
    # Environment
    "gamma": 0.99,                    # Discount factor

    # Networks
    "hidden_dim": 256,                # Hidden layer size
    "n_hidden_layers": 2,             # Number of hidden layers
    "state_only_reward": True,        # Use g(s) instead of g(s,a)

    # Training
    "batch_size": 256,                # Batch size for updates
    "discriminator_lr": 3e-4,         # Discriminator learning rate
    "policy_lr": 3e-4,                # Policy learning rate
    "discriminator_steps": 1,         # D updates per policy update

    # Regularization
    "gradient_penalty_coef": 10.0,    # Gradient penalty (optional)
    "entropy_coef": 0.01,             # Policy entropy bonus

    # Data
    "n_expert_trajectories": 50,      # Number of expert demos
    "samples_per_iteration": 2048,    # Policy samples per iteration
}
```

## Practical Considerations

### Advantages

- **Reward transfer**: Learned $g_\theta$ transfers to new dynamics
- **Interpretability**: Explicit reward function for analysis
- **Data efficiency**: Better than BC with limited demonstrations
- **Theoretical grounding**: Provable reward recovery guarantees

### Challenges

- **Training instability**: GAN-like adversarial dynamics
- **Hyperparameter sensitivity**: Requires careful tuning
- **Discriminator overfitting**: Can memorize expert data
- **Absorbing states**: Terminal states need special handling

### Stability Tricks

```python
# 1. Gradient Penalty (from WGAN-GP)
def gradient_penalty(discriminator, expert_data, policy_data):
    alpha = torch.rand(expert_data.size(0), 1)
    interpolated = alpha * expert_data + (1 - alpha) * policy_data
    interpolated.requires_grad_(True)
    d_interpolated = discriminator(interpolated)
    gradients = torch.autograd.grad(
        outputs=d_interpolated,
        inputs=interpolated,
        grad_outputs=torch.ones_like(d_interpolated),
        create_graph=True
    )[0]
    gradient_norm = gradients.norm(2, dim=1)
    penalty = ((gradient_norm - 1) ** 2).mean()
    return penalty

# 2. Spectral Normalization
from torch.nn.utils import spectral_norm
layer = spectral_norm(nn.Linear(256, 256))

# 3. Label Smoothing
expert_labels = 0.9  # Instead of 1.0
policy_labels = 0.1  # Instead of 0.0
```

## Extensions and Variants

### 1. FAIRL (Forward Adversarial IRL)

Corrects for state distribution shift:

$$
r_{FAIRL}(s, a) = r_{AIRL}(s, a) - \log \pi(a|s)
$$

### 2. Off-Policy AIRL

Uses replay buffer for sample efficiency:

$$
\mathcal{L}_D = -\mathbb{E}_{\tau_E}[\log D] - \mathbb{E}_{\mathcal{B}}[\rho(s,a) \log(1-D)]
$$

Where $\rho(s,a)$ is an importance weight.

### 3. Multi-Task AIRL

Learns shared reward structure across tasks:

$$
g_\theta(s, a) = g_{shared}(s, a) + g_{task}(s, a)
$$

## When to Use AIRL

### Good Fit ✅

- Need the **reward function**, not just the policy
- Want to **transfer behavior** to different dynamics
- Have **limited but high-quality** demonstrations
- **Interpretability** of learned behavior matters

### Consider Alternatives

- Only need to **match behavior** → Use GAIL (simpler)
- Have **abundant demonstrations** → BC might suffice
- **Reward function is known** → Use standard RL
- Need **real-time performance** → BC is faster

## Summary

AIRL provides a principled approach to learning **transferable reward functions** from demonstrations by:

1. Using a **structured discriminator** that separates reward from dynamics
2. Leveraging **adversarial training** for scalability
3. Providing **theoretical guarantees** on reward recovery
4. Enabling **reward transfer** across different environments

The key equation to remember:

$$
\boxed{f_\theta(s, a, s') = g_\theta(s, a) + \gamma h_\phi(s') - h_\phi(s)}
$$

Where $g_\theta$ is your transferable reward signal.

alarm management,automation

Monitor and respond to tool alarms via automation system.

albert,foundation model

A lighter BERT variant using cross-layer parameter sharing and factorized embedding parameterization.

albumentations,fast,image

Albumentations is a fast image augmentation library with many transforms.
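
A small example pipeline using Albumentations' `Compose`; the specific transforms and the placeholder image are arbitrary:

```python
import numpy as np
import albumentations as A

# Arbitrary example pipeline; Albumentations operates on numpy arrays (HWC, uint8).
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

image = np.zeros((224, 224, 3), dtype=np.uint8)  # placeholder image
augmented = transform(image=image)["image"]
```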

ald (atomic layer deposition),ald,atomic layer deposition,cvd

Sequential self-limiting surface reactions for atomic-level thickness control.

ald cycle,cvd

One precursor pulse + purge + reactant pulse + purge.

aleatoric uncertainty, ai safety

Aleatoric uncertainty comes from inherent randomness irreducible even with perfect knowledge.

aleatoric uncertainty,ai safety

Inherent randomness in data.

alert configuration,monitoring

Set thresholds and notifications for issues.

alerting,pagerduty,oncall

Alerts notify the on-call engineer when issues occur. Common tools include PagerDuty and OpsGenie. Escalation policies route unacknowledged alerts to the next responder.

alias structure,doe

Describes which effects are confounded with one another in a fractional factorial design.

alias-free gan, multimodal ai

Alias-free GANs eliminate coordinate-dependent artifacts through continuous signal processing.

alibi (attention with linear biases),alibi,attention with linear biases,transformer

Simple relative position encoding that adds a linear, distance-proportional bias to attention scores.
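
A toy sketch of the idea, with an arbitrary head slope `m`: a penalty proportional to key-query distance is added to the attention logits before the softmax:

```python
import torch

# ALiBi-style bias: the score for key j given query i gets -m * (i - j) added.
seq_len, m = 5, 0.5                     # arbitrary length and head-specific slope
i = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (seq_len, 1)
j = torch.arange(seq_len).unsqueeze(0)  # key positions, shape (1, seq_len)
bias = -m * (i - j).float()             # larger penalty for more distant past keys

scores = torch.randn(seq_len, seq_len)                  # stand-in attention logits
scores = scores + bias                                  # add linear position penalty
scores = scores.masked_fill(j > i, float("-inf"))       # causal mask
attn = torch.softmax(scores, dim=-1)
```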

aligner, manufacturing operations

Aligners position wafers accurately using notch or flat detection.

aligner,automation

Mechanism to orient wafer notch or flat to a standard position.

alignment marks,lithography

Reference patterns on wafer used to align each layer.

alignment tax,capability tradeoff,tradeoff

Alignment tax: safety measures may reduce capability. Balance helpfulness and harmlessness.

alignment,rlhf,dpo,preferences

Alignment = making models follow human values and instructions. RLHF/DPO leverage human preference data to push the model toward desired behavior.
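
As one hedged illustration of how preference data is used, a sketch of the DPO loss over per-sequence log-probabilities (tensor names are placeholders, not any library's API):

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # DPO: push the policy to prefer the chosen response over the rejected one,
    # measured relative to a frozen reference model.
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```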

all-mlp architectures, computer vision

Vision models without convolution or attention.

all-reduce operation, distributed training

Collective operation that efficiently aggregates (e.g., sums) a tensor across all nodes so every node receives the combined result.
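
A minimal `torch.distributed` sketch of summing and averaging a tensor across ranks; it assumes the process group has already been initialized (e.g., via `torchrun`):

```python
import torch
import torch.distributed as dist

# Assumes dist.init_process_group(...) has already run on every rank.
grad = torch.ones(4)                         # stand-in for a local gradient shard
dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # every rank ends up with the global sum
grad /= dist.get_world_size()                # average, as in data-parallel training
```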

all-to-all communication, distributed training

Exchange data between all devices.

allegro, chemistry ai

Fast equivariant neural network.

allegro, graph neural networks

Allegro achieves fast equivariant message passing through strict locality and efficient tensor operations.

allocation,industry

Distribute limited supply among customers.