
AI Factory Glossary

513 technical terms and definitions


adversarial training, at, ai safety

Train on adversarial examples.

adversarial training,ai safety

Train on adversarial examples to make model more robust.

adversarial training,robust,defense

Adversarial training includes adversarial examples in training. Makes models more robust. Expensive.
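
A minimal sketch of one adversarial training step using an FGSM-style perturbation; `model`, `loss_fn`, `optimizer`, and the batch `(x, y)` are assumed placeholders, and `epsilon` is an illustrative perturbation budget rather than a recommended setting.

```python
import torch

def adversarial_training_step(model, loss_fn, optimizer, x, y, epsilon=0.03):
    """One training step on FGSM-perturbed inputs (illustrative sketch)."""
    # Craft adversarial examples: perturb inputs along the loss-gradient sign.
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

    # Train on the adversarial batch (often mixed with clean examples).
    optimizer.zero_grad()
    adv_loss = loss_fn(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```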

adversarial watermark removal,security

Attacks to remove watermarks.

adversarial weight perturbation, awp, ai safety

Perturb weights for robustness.

adverse event detection, healthcare ai

Find medication side effects in text.

aerial image inspection, lithography

Inspect simulated image from mask.

affective computing,emerging tech

AI that recognizes and responds to emotions.

affinity diagram, quality & reliability

Affinity diagrams group related ideas, revealing patterns and themes.

afm (atomic force microscopy),afm,atomic force microscopy,metrology

Scan surface with sharp tip to measure topography at nanometer scale.

afm, afm, recommendation systems

Attentional Factorization Machines use attention weights to learn importance of different feature interactions.

aft, aft, llm architecture

Attention Free Transformer uses element-wise operations instead of attention.

agent approval, ai agents

Agent approval requires human confirmation before executing high-stakes actions.

agent benchmarking, ai agents

Agent benchmarking evaluates performance across standardized task suites.

agent communication, ai agents

Agent communication protocols enable information exchange and coordination between agents.

agent debugging, ai agents

Agent debugging identifies and resolves issues in planning and execution logic.

agent feedback loop, ai agents

Feedback loops allow humans to correct and guide agent behavior iteratively.

agent handoff, ai agents

Agent handoff transfers responsibility for tasks between agents smoothly.

agent logging, ai agents

Agent logging records decisions, actions, and reasoning for debugging and auditing.

agent loop, ai agents

Agent loops repeatedly observe, plan, act, and update until objectives are achieved.
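
A minimal sketch of such a loop; `observe`, `plan`, `act`, and `goal_reached` are hypothetical callables standing in for a real agent framework.

```python
def agent_loop(observe, plan, act, goal_reached, max_steps=20):
    """Repeatedly observe, plan, act, and update state until the goal is met."""
    state = {"history": []}
    for step in range(max_steps):
        observation = observe(state)          # gather current context
        if goal_reached(state, observation):  # stopping criterion
            break
        action = plan(state, observation)     # decide the next action
        result = act(action)                  # execute it
        state["history"].append((observation, action, result))  # update memory
    return state
```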

agent memory, ai agents

Agent memory maintains conversation history, observations, and learned information across interactions.

agent negotiation, ai agents

Agent negotiation resolves conflicts through offers, counteroffers, and compromise.

agent orchestration,multi-agent

Framework to coordinate multiple specialized agents working on subtasks.

agent protocol, ai agents

Agent protocols standardize interfaces for agent interoperability.

agent stopping criteria, ai agents

Stopping criteria define conditions when agents should terminate execution.

agent-based modeling, digital manufacturing

Model fab using autonomous agents.

agent,tool,use tools,tool calling

An AI agent can call tools (APIs, databases, code) based on the conversation: the LLM plans, picks a tool, reads the result, and responds with updated knowledge.
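
A minimal sketch of a tool-calling loop, assuming a hypothetical `llm` callable that returns either a tool request or a final answer, and a `tools` dict mapping names to Python functions; this is not a specific vendor API.

```python
def tool_calling_agent(llm, tools, user_message, max_turns=5):
    """LLM plans, picks a tool, reads the result, and answers with updated knowledge."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        # Hypothetical contract: returns {"tool": name, "args": {...}} or {"answer": text}
        reply = llm(messages)
        if "answer" in reply:
            return reply["answer"]
        result = tools[reply["tool"]](**reply["args"])                  # call the chosen API/DB/code tool
        messages.append({"role": "tool", "content": str(result)})      # feed the result back to the LLM
    return "Stopped: turn limit reached"
```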

agentbench, ai agents

AgentBench provides comprehensive evaluation framework for LLM-based agents.

agentic rag,rag

RAG system where the agent decides when to retrieve, what queries to use, and how to synthesize the results.
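
A minimal sketch, assuming a hypothetical `llm` callable, a `retriever` function returning document snippets, and a simple text protocol for the retrieve/answer decision.

```python
def agentic_rag(llm, retriever, question, max_rounds=3):
    """Agent decides whether to retrieve, picks its own queries, then synthesizes an answer."""
    context = []
    for _ in range(max_rounds):
        decision = llm(
            f"Question: {question}\nContext so far: {context}\n"
            "Reply with 'RETRIEVE: <query>' or 'ANSWER'."
        )
        if not decision.startswith("RETRIEVE:"):
            break                                           # agent judges the context sufficient
        query = decision.removeprefix("RETRIEVE:").strip()  # agent-chosen search query
        context.extend(retriever(query))                    # fetch supporting documents
    return llm(f"Synthesize an answer to: {question}\nUsing: {context}")
```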

aggregate functions, graph neural networks

Aggregate functions in GNNs combine neighbor information using operations like sum, mean, max, or attention.
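
A minimal PyTorch sketch of sum/mean/max aggregation for a single node; the feature tensor and neighbor indices below are illustrative.

```python
import torch

def aggregate_neighbors(node_feats, neighbor_idx, mode="mean"):
    """Combine the features of a node's neighbors with sum, mean, or max."""
    neighbors = node_feats[neighbor_idx]          # [num_neighbors, feat_dim]
    if mode == "sum":
        return neighbors.sum(dim=0)
    if mode == "mean":
        return neighbors.mean(dim=0)
    if mode == "max":
        return neighbors.max(dim=0).values
    raise ValueError(f"unknown aggregation: {mode}")

# Example: aggregate the features of neighbors 1 and 3 for some target node.
feats = torch.randn(5, 8)                         # 5 nodes, 8-dim features
message = aggregate_neighbors(feats, torch.tensor([1, 3]), mode="mean")
```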

aggregation strategy, recommendation systems

Aggregation strategies combine individual preferences into group recommendations through averaging or consensus.

aging monitor,reliability

Track degradation over time.

aging-aware timing analysis, design

Include degradation in timing.

agv (automated guided vehicle),agv,automated guided vehicle,automation

Mobile robot that transports wafers or materials on fab floor.

agv routing, agv, facility

Optimize AGV paths.

ai act,regulation,eu

The EU AI Act regulates AI systems by risk level; high-risk systems must meet strict compliance requirements. Part of a global regulatory trend.

ai bill of rights,ethics

Framework for protecting people from algorithmic harm.

ai feedback, ai, training techniques

AI feedback uses model-generated evaluations to train or align other models.

ai supercomputers, ai, infrastructure

Purpose-built systems for AI training.

aider,pair,programming

Aider is an AI pair-programming tool that runs in the terminal; it edits files with an LLM.

aims, aims, lithography

Tool for aerial image inspection.

air bearing table,metrology

Ultra-stable surface for metrology.

air changes per hour (ach),air changes per hour,ach,facility

Number of times cleanroom air is completely replaced per hour.

air gap,beol

Use air (k=1) as insulator between metal lines for lowest capacitance.

air shower,facility

Enclosed space that blows high-velocity air to remove particles before cleanroom entry.

airborne molecular contamination, amc, contamination

Gaseous contaminants.

airflow,orchestration,dag

Apache Airflow orchestrates data pipelines. DAGs define dependencies. Standard for ETL.
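
A minimal sketch of an Airflow DAG with three dependent ETL tasks; `example_etl` and the task callables are placeholders, and parameter names (e.g. `schedule_interval`) can differ slightly between Airflow versions.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extract")

def transform():
    print("transform")

def load():
    print("load")

with DAG(
    dag_id="example_etl",             # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # DAG edges define the dependencies: extract -> transform -> load
    t_extract >> t_transform >> t_load
```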

airgap, process integration

Airgaps introduce air (k = 1) between metal lines, providing the lowest possible dielectric constant and reducing capacitance and crosstalk.

airl, adversarial inverse reinforcement learning, inverse rl, imitation learning, reward recovery, expert demonstrations, adversarial training

# Adversarial Inverse Reinforcement Learning (AIRL)

## Overview

**AIRL** (Adversarial Inverse Reinforcement Learning) is an advanced algorithm that combines inverse reinforcement learning with adversarial training to recover reward functions from expert demonstrations.

## The Core Problem AIRL Solves

Traditional **Inverse Reinforcement Learning (IRL)** aims to recover a reward function from expert demonstrations. The fundamental challenges include:

- **Reward ambiguity**: Many different reward functions can explain the same observed behavior
- **Computational expense**: Requires solving an RL problem in an inner loop
- **Poor scalability**: Struggles with high-dimensional problems
- **Dynamics dependence**: Learned rewards often don't transfer to new environments

## Mathematical Formulation

### Discriminator Architecture

The discriminator in AIRL has a specifically structured form:

$$
D_\theta(s, a, s') = \frac{\exp(f_\theta(s, a, s'))}{\exp(f_\theta(s, a, s')) + \pi(a|s)}
$$

Where:

- $s$ = current state
- $a$ = action taken
- $s'$ = next state
- $\pi(a|s)$ = policy probability
- $f_\theta$ = learned function (detailed below)

### Reward-Shaping Decomposition

The function $f_\theta$ is decomposed as:

$$
f_\theta(s, a, s') = g_\theta(s, a) + \gamma h_\phi(s') - h_\phi(s)
$$

| Component | Description | Role |
|-----------|-------------|------|
| $g_\theta(s, a)$ | Reward approximator | Transferable reward signal |
| $h_\phi(s)$ | Shaping potential | Captures dynamics-dependent info |
| $\gamma$ | Discount factor | Temporal discounting (typically 0.99) |

### State-Only Reward Variant

For better transfer, use state-only rewards:

$$
f_\theta(s, s') = g_\theta(s) + \gamma h_\phi(s') - h_\phi(s)
$$

## Training Algorithm

### Objective Functions

**Discriminator Loss** (minimize):

$$
\mathcal{L}_D = -\mathbb{E}_{\tau_E}\left[\log D_\theta(s, a, s')\right] - \mathbb{E}_{\tau_\pi}\left[\log(1 - D_\theta(s, a, s'))\right]
$$

Where:

- $\tau_E$ = expert trajectories
- $\tau_\pi$ = policy-generated trajectories

**Generator (Policy) Objective** (maximize):

$$
\mathcal{L}_\pi = \mathbb{E}_{\tau_\pi}\left[\sum_{t=0}^{T} \gamma^t \log D_\theta(s_t, a_t, s_{t+1})\right]
$$

### Training Loop Pseudocode

```python
# AIRL training loop
for iteration in range(max_iterations):
    # Step 1: Sample trajectories from the current policy
    policy_trajectories = sample_trajectories(policy, env, n_samples)

    # Step 2: Update the discriminator
    for d_step in range(discriminator_steps):
        expert_batch = sample_batch(expert_demonstrations)
        policy_batch = sample_batch(policy_trajectories)

        # Discriminator predictions
        D_expert = discriminator(expert_batch)
        D_policy = discriminator(policy_batch)

        # Binary cross-entropy loss
        loss_D = -torch.mean(torch.log(D_expert)) \
                 - torch.mean(torch.log(1 - D_policy))

        optimizer_D.zero_grad()
        loss_D.backward()
        optimizer_D.step()

    # Step 3: Compute rewards for the policy update
    rewards = torch.log(D_policy) - torch.log(1 - D_policy)

    # Step 4: Update the policy (using PPO, TRPO, etc.)
    policy.update(policy_trajectories, rewards)
```

## Theoretical Properties

### 1. Reward Recovery Guarantees

At optimality, under ergodicity and sufficient expressiveness:

$$
g_\theta(s, a) \rightarrow A^*(s, a) = Q^*(s, a) - V^*(s)
$$

Or for state-only rewards:

$$
g_\theta(s) \rightarrow r^*(s)
$$

This recovers the **ground-truth reward** up to a constant.

### 2. Disentanglement Theorem

The decomposition separates:

$$
\underbrace{f_\theta(s, a, s')}_{\text{Full signal}} = \underbrace{g_\theta(s, a)}_{\text{Reward (transferable)}} + \underbrace{\gamma h_\phi(s') - h_\phi(s)}_{\text{Shaping (dynamics-dependent)}}
$$

**Key insight**: Potential-based shaping ($\gamma h(s') - h(s)$) does not change the optimal policy, so $g_\theta$ captures the "true" reward.

### 3. Connection to Maximum Entropy IRL

AIRL approximates MaxEnt IRL:

$$
\max_\theta \mathbb{E}_{\tau_E}\left[\sum_t r_\theta(s_t, a_t)\right] + \mathcal{H}(\pi)
$$

Where $\mathcal{H}(\pi)$ is the policy entropy. AIRL achieves this without the expensive inner-loop policy optimization.

## Comparison

| Method | Recovers Reward | Dynamics-Invariant | Scalable | Sample Efficiency |
|--------|-----------------|--------------------|----------|-------------------|
| Behavioral Cloning | ❌ No | N/A | ✅ Yes | ✅ High |
| GAIL | ❌ No (policy only) | ❌ No | ✅ Yes | ⚠️ Medium |
| MaxEnt IRL | ✅ Yes | ⚠️ Partially | ❌ No | ❌ Low |
| **AIRL** | ✅ **Yes** | ✅ **Yes** | ✅ **Yes** | ⚠️ Medium |

### GAIL vs AIRL

**GAIL Discriminator**:

$$
D_\theta^{GAIL}(s, a) = \sigma(f_\theta(s, a))
$$

**AIRL Discriminator**:

$$
D_\theta^{AIRL}(s, a, s') = \frac{\exp(f_\theta(s, a, s'))}{\exp(f_\theta(s, a, s')) + \pi(a|s)}
$$

The key difference: AIRL's structure enables reward recovery; GAIL's does not.

## Implementation Details

### Network Architecture

```python
import torch
import torch.nn as nn

class AIRLDiscriminator(nn.Module):
    """AIRL discriminator with reward-shaping decomposition."""

    def __init__(self, state_dim, action_dim, hidden_dim=256,
                 gamma=0.99, state_only=True):
        super().__init__()
        self.gamma = gamma
        self.state_only = state_only

        # Reward network g(s) or g(s, a)
        if state_only:
            self.g_net = nn.Sequential(
                nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )
        else:
            self.g_net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )

        # Shaping potential h(s)
        self.h_net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def get_reward(self, states, actions=None):
        """Extract the learned reward g(s) or g(s, a)."""
        if self.state_only:
            return self.g_net(states)
        sa = torch.cat([states, actions], dim=-1)
        return self.g_net(sa)

    def forward(self, states, actions, next_states, log_pi, dones):
        """
        Compute f(s, a, s') = g(s, a) + gamma * h(s') - h(s).

        Args:
            states: Current states [batch, state_dim]
            actions: Actions taken [batch, action_dim]
            next_states: Next states [batch, state_dim]
            log_pi: Log probability of actions [batch, 1]
            dones: Episode termination flags [batch, 1]

        Returns:
            D(s, a, s'): Discriminator output [batch, 1]
        """
        # Reward component
        g = self.get_reward(states, actions)

        # Shaping component
        h_s = self.h_net(states)
        h_s_next = self.h_net(next_states)

        # f = g + gamma * h(s') - h(s), with masking for terminal states
        shaping = self.gamma * (1 - dones) * h_s_next - h_s
        f = g + shaping

        # D(s, a, s') = exp(f) / (exp(f) + pi(a|s)); in log space: D = sigmoid(f - log_pi)
        log_D = f - log_pi
        D = torch.sigmoid(log_D)
        return D, f, g
```

### Hyperparameters

```python
# Recommended hyperparameters for AIRL
config = {
    # Environment
    "gamma": 0.99,                  # Discount factor

    # Networks
    "hidden_dim": 256,              # Hidden layer size
    "n_hidden_layers": 2,           # Number of hidden layers
    "state_only_reward": True,      # Use g(s) instead of g(s, a)

    # Training
    "batch_size": 256,              # Batch size for updates
    "discriminator_lr": 3e-4,       # Discriminator learning rate
    "policy_lr": 3e-4,              # Policy learning rate
    "discriminator_steps": 1,       # D updates per policy update

    # Regularization
    "gradient_penalty_coef": 10.0,  # Gradient penalty (optional)
    "entropy_coef": 0.01,           # Policy entropy bonus

    # Data
    "n_expert_trajectories": 50,    # Number of expert demos
    "samples_per_iteration": 2048,  # Policy samples per iteration
}
```

## Practical Considerations

### Advantages

- **Reward transfer**: Learned $g_\theta$ transfers to new dynamics
- **Interpretability**: Explicit reward function for analysis
- **Data efficiency**: Better than BC with limited demonstrations
- **Theoretical grounding**: Provable reward recovery guarantees

### Challenges

- **Training instability**: GAN-like adversarial dynamics
- **Hyperparameter sensitivity**: Requires careful tuning
- **Discriminator overfitting**: Can memorize expert data
- **Absorbing states**: Terminal states need special handling

### Stability Tricks

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# 1. Gradient penalty (from WGAN-GP)
def gradient_penalty(discriminator, expert_data, policy_data):
    alpha = torch.rand(expert_data.size(0), 1)
    interpolated = alpha * expert_data + (1 - alpha) * policy_data
    interpolated.requires_grad_(True)
    d_interpolated = discriminator(interpolated)
    gradients = torch.autograd.grad(
        outputs=d_interpolated,
        inputs=interpolated,
        grad_outputs=torch.ones_like(d_interpolated),
        create_graph=True,
    )[0]
    gradient_norm = gradients.norm(2, dim=1)
    penalty = ((gradient_norm - 1) ** 2).mean()
    return penalty

# 2. Spectral normalization
layer = spectral_norm(nn.Linear(256, 256))

# 3. Label smoothing
expert_labels = 0.9  # Instead of 1.0
policy_labels = 0.1  # Instead of 0.0
```

## Extensions and Variants

### 1. FAIRL (Forward Adversarial IRL)

Corrects for state distribution shift:

$$
r_{FAIRL}(s, a) = r_{AIRL}(s, a) - \log \pi(a|s)
$$

### 2. Off-Policy AIRL

Uses a replay buffer for sample efficiency:

$$
\mathcal{L}_D = -\mathbb{E}_{\tau_E}[\log D] - \mathbb{E}_{\mathcal{B}}[\rho(s,a) \log(1-D)]
$$

Where $\rho(s,a)$ is an importance weight.

### 3. Multi-Task AIRL

Learns shared reward structure across tasks:

$$
g_\theta(s, a) = g_{shared}(s, a) + g_{task}(s, a)
$$

## When to Use AIRL

### Good Fit ✅

- Need the **reward function**, not just the policy
- Want to **transfer behavior** to different dynamics
- Have **limited but high-quality** demonstrations
- **Interpretability** of learned behavior matters

### Consider Alternatives

- Only need to **match behavior** → Use GAIL (simpler)
- Have **abundant demonstrations** → BC might suffice
- **Reward function is known** → Use standard RL
- Need **real-time performance** → BC is faster

## Summary

AIRL provides a principled approach to learning **transferable reward functions** from demonstrations by:

1. Using a **structured discriminator** that separates reward from dynamics
2. Leveraging **adversarial training** for scalability
3. Providing **theoretical guarantees** on reward recovery
4. Enabling **reward transfer** across different environments

The key equation to remember:

$$
\boxed{f_\theta(s, a, s') = g_\theta(s, a) + \gamma h_\phi(s') - h_\phi(s)}
$$

Where $g_\theta$ is your transferable reward signal.

alarm management,automation

Monitor and respond to tool alarms via automation system.