AI Factory Glossary

9,967 technical terms and definitions


tally sheet, quality & reliability

Tally sheets record event counts or observations for frequency analysis.

tan barrier, beol

Tantalum nitride (TaN) diffusion barrier layer used in BEOL copper interconnects to keep copper from diffusing into the surrounding dielectric.

tap controller, tap, advanced test & probe

Test Access Port controller is a state machine that interprets boundary scan instructions and controls test data register operations.

tape out, gdsii, foundry

Tape-out: the final design database sent to the foundry as a GDSII file. Effectively the point of no return; fabricated chips typically take months to come back.

tape width, packaging

Standard carrier tape widths: 8 mm, 12 mm, 16 mm, etc.

tapeout, business & strategy

Tapeout completes the design phase, sending the design data out for mask fabrication.

tapeout, design

Finalize design and send to foundry for mask making.

tarc (top arc), tarc, top arc, lithography

Anti-reflective coating (ARC) applied on top of the photoresist.

target encoding, mean, category

Target encoding replaces each category with the mean of the target variable for that category; without out-of-fold or smoothed estimates it can leak target information into the features.
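
A minimal pandas sketch of the idea; the DataFrame `df` and the column names "city" and "y" are hypothetical, and leave-one-out is shown as one common leakage mitigation, not the only one.

```python
import pandas as pd

# Toy data; "city" and "y" are hypothetical column names.
df = pd.DataFrame({"city": ["a", "a", "b", "b", "b"],
                   "y":    [1,   0,   1,   1,   0]})

# Naive target encoding: replace each category with the mean target value.
means = df.groupby("city")["y"].mean()
df["city_te"] = df["city"].map(means)

# Leakage: each row's own label contributed to its own encoding.
# Leave-one-out encoding removes the row's label from its category statistic.
stats = df.groupby("city")["y"].agg(["sum", "count"])
df["city_te_loo"] = (df["city"].map(stats["sum"]) - df["y"]) / \
                    (df["city"].map(stats["count"]) - 1)
```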

target impedance, signal & power integrity

Target impedance specifies maximum allowable power distribution network impedance versus frequency ensuring acceptable voltage ripple.
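
A back-of-the-envelope calculation using the usual definition, target impedance = (supply voltage × allowed ripple fraction) / worst-case transient current; the numbers below are illustrative only.

```python
# Illustrative values: 0.9 V rail, 3% allowed ripple, 10 A worst-case current step.
vdd = 0.9           # V
ripple = 0.03       # allowed ripple as a fraction of Vdd
i_transient = 10.0  # A

z_target = vdd * ripple / i_transient
print(f"Z_target = {z_target * 1e3:.1f} mohm")  # 2.7 mohm, to be met across frequency
```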

target speaker extraction, audio & speech

Target speaker extraction isolates specific speaker from mixture using enrollment utterance.

target thickness, process

Desired final wafer thickness.

target tracking, manufacturing operations

Target tracking adjusts process aims as equipment drifts, maintaining centered output.

target value engineering, quality

Designing to hit the nominal (target) value rather than merely staying within specification limits.

target value, spc

Desired nominal value.

target, pvd

Source material being sputtered (e.g., a metal or alloy).

task allocation, ai agents

Task allocation assigns responsibilities to agents based on capabilities and load.

task and motion planning (tamp), task and motion planning, tamp, robotics

Combines high-level symbolic task planning with low-level motion planning.

task arithmetic, model merging

Adds or subtracts task vectors (fine-tuned weights minus base weights) to compose, transfer, or remove capabilities when merging models.
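
A minimal sketch of the idea on PyTorch state dicts; the checkpoint names and the scaling factor are hypothetical, not a reference implementation.

```python
import torch

def task_vector(base_sd, finetuned_sd):
    # Task vector = fine-tuned weights minus base weights, per tensor.
    return {k: finetuned_sd[k] - base_sd[k] for k in base_sd}

def apply_task_vectors(base_sd, vectors, scale=1.0):
    # Add (or, with a negative scale, subtract) task vectors onto the base model.
    merged = {k: v.clone() for k, v in base_sd.items()}
    for vec in vectors:
        for k in merged:
            merged[k] += scale * vec[k]
    return merged

# Usage sketch (hypothetical checkpoints):
# base = torch.load("base.pt"); ft_a = torch.load("ft_task_a.pt")
# merged_sd = apply_task_vectors(base, [task_vector(base, ft_a)], scale=0.5)
# model.load_state_dict(merged_sd)
```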

task balancing, multi-task learning

Ensure fair task representation.

task decomposition, ai agents

Task decomposition breaks complex goals into manageable subtasks.

task diversity, training techniques

Task diversity in instruction tuning exposes models to varied problem types.

task grouping, multi-task learning

Cluster related tasks.

task instruction, prompting techniques

Task instructions specify desired actions clearly guiding model behavior.

task interference, multi-task learning

Tasks hurting each other's performance.

task prompting, multi-task learning

Use prompts to specify task.

task recognition in icl, theory

Model identifying task from examples.

task routing, multi-task learning

Direct inputs to task-specific modules.

task sampling strategies, multi-task learning

How to sample tasks during training.

task similarity, multi-task learning

Relatedness between tasks.

task tokens, multi-task learning

Prepend task identifiers.

task-incremental learning, continual learning

Learn tasks sequentially with task labels at test time.

task-oriented dialogue, dialogue

Goal-driven conversations.

task-specific heads, multi-task learning

Separate output layers per task.

task-specific parameters, multi-task learning

Separate parameters per task.

task-specific pre-training, transfer learning

Pre-train for specific downstream task.

taskfile, yaml, runner

A Taskfile defines tasks in YAML for Task, a Go-based task runner.

tasnet, audio & speech

TasNet performs end-to-end time-domain audio separation using temporal convolutional networks.

taylor expansion pruning, model optimization

Taylor expansion pruning approximates loss change from removing weights using Taylor series.
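
A common first-order variant scores each weight by |gradient × weight|, which approximates the loss change from zeroing it; this is a hedged sketch assuming a model and a computed loss are already available, not the only Taylor-based criterion.

```python
import torch

def first_order_taylor_scores(model, loss):
    # |g * w| approximates the loss change from setting w to zero
    # (first-order Taylor term; second-order variants also use curvature).
    loss.backward()
    scores = {}
    for name, p in model.named_parameters():
        if p.grad is not None:
            scores[name] = (p.grad * p.data).abs()
    return scores

# Usage sketch: prune the weights with the smallest scores, e.g. by masking
# entries below torch.quantile(scores[name].flatten(), 0.3) per tensor.
```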

tbats, tbats, time series models

TBATS combines Box-Cox transformation, Fourier-based seasonality, ARMA errors, and trend for complex seasonal time series.

tcad (technology cad), tcad, technology cad, design

Simulation tools for process and device physics.

tcad model parameters, tcad, simulation

Physical parameters used in device/process simulation.

tcn, tcn, time series models

Temporal Convolutional Networks use dilated causal convolutions for sequence modeling with long effective history.
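
A minimal sketch of one dilated causal convolution layer in PyTorch (not a full TCN residual block): left-padding by (kernel_size - 1) × dilation keeps the convolution causal, and stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """Dilated causal convolution: the output at time t sees only inputs <= t."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # pad the past only
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                         # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))   # no padding on the future side
        return self.conv(x)

# Dilations 1, 2, 4, 8 with kernel_size=3 give a receptive field of
# 1 + 2*(1+2+4+8) = 31 steps; a full TCN adds residual connections and norm.
layers = nn.Sequential(*[CausalConv1d(16, 3, d) for d in (1, 2, 4, 8)])
y = layers(torch.randn(2, 16, 100))               # -> (2, 16, 100)
```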

td3, td3, reinforcement learning

Twin Delayed DDPG, an improved actor-critic algorithm for continuous control that mitigates DDPG's value overestimation.

td3, twin delayed ddpg, reinforcement learning advanced, continuous control, actor critic, ddpg, advanced rl

# TD3: Twin Delayed Deep Deterministic Policy Gradient

**Advanced Reinforcement Learning Algorithm for Continuous Control**

## Overview

TD3, introduced by Fujimoto et al. (2018), addresses a fundamental problem in continuous control: **overestimation bias** in actor-critic methods. DDPG (Deep Deterministic Policy Gradient), while effective, suffers from significant value overestimation that compounds over training, leading to poor policies.

## Core Motivation: The Overestimation Problem

### Mathematical Formulation

In Q-learning, the Bellman target is:

$$
y = r + \gamma \max_{a'} Q(s', a')
$$

The problem arises because:

$$
\mathbb{E}\left[\max_{a'} \hat{Q}(s', a')\right] \geq \max_{a'} \mathbb{E}\left[\hat{Q}(s', a')\right]
$$

Where $\hat{Q}$ is the estimated Q-function with approximation error.

### Why This Matters

- Function approximation introduces noise: $\hat{Q}(s,a) = Q^*(s,a) + \epsilon$
- The $\max$ operator preferentially selects overestimated values
- Errors propagate and amplify through bootstrapping
- Policy exploits these overestimations, leading to divergence

## The Three Pillars of TD3

### 1. Clipped Double Q-Learning

TD3 maintains **two** critic networks $(Q_{\theta_1}, Q_{\theta_2})$ and uses the minimum for targets:

$$
y = r + \gamma \min_{i=1,2} Q_{\theta'_i}(s', \tilde{a}')
$$

Where:

- $Q_{\theta'_1}, Q_{\theta'_2}$ are target networks
- $\tilde{a}'$ is the smoothed target action

**Loss function for each critic:**

$$
\mathcal{L}(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \left[ \left( Q_{\theta_i}(s,a) - y \right)^2 \right]
$$

#### Key Insight

| Method | Approach | Effect |
|--------|----------|--------|
| Double DQN | Decouples selection and evaluation | Reduces overestimation |
| TD3 | Takes minimum of two estimates | More aggressive bias reduction |

### 2. Delayed Policy Updates

The actor is updated **less frequently** than the critics:

$$
\theta_{\pi} \leftarrow \theta_{\pi} + \alpha \nabla_{\theta_\pi} J(\theta_\pi) \quad \text{every } d \text{ steps}
$$

Where $d$ is typically 2.

**Policy gradient:**

$$
\nabla_{\theta_\pi} J(\theta_\pi) = \mathbb{E}_{s \sim \mathcal{D}} \left[ \nabla_a Q_{\theta_1}(s,a) \big|_{a=\pi_{\theta_\pi}(s)} \nabla_{\theta_\pi} \pi_{\theta_\pi}(s) \right]
$$

#### Rationale

- Policy updates depend on accurate value estimates
- High-variance critic estimates cause policy divergence
- Delayed updates allow the critic to stabilize
- Reduces the "moving target" problem

### 3. Target Policy Smoothing

Noise is added to target actions:

$$
\tilde{a}' = \pi_{\theta'_\pi}(s') + \epsilon, \quad \epsilon \sim \text{clip}(\mathcal{N}(0, \sigma), -c, c)
$$

**Purpose:**

$$
Q(s', a') \approx Q(s', a' + \epsilon) \quad \text{for small } \epsilon
$$

This regularizes the Q-function, preventing exploitation of narrow peaks.

## Complete TD3 Algorithm

### Pseudocode

```
Initialize:
  - Critic networks Q_θ₁, Q_θ₂
  - Actor network π_φ
  - Target networks θ'₁ ← θ₁, θ'₂ ← θ₂, φ' ← φ
  - Replay buffer D

For each timestep t:
  1. Select action with exploration: a ~ π_φ(s) + ε, ε ~ N(0, σ)
  2. Execute a, observe r, s'
  3. Store (s, a, r, s') in D
  4. Sample mini-batch from D
  5. Compute target:
       ã ← π_φ'(s') + clip(ε, -c, c), ε ~ N(0, σ̃)
       y ← r + γ min_{i=1,2} Q_θ'ᵢ(s', ã)
  6. Update critics: θᵢ ← θᵢ - α∇_θᵢ (Q_θᵢ(s,a) - y)²
  7. If t mod d = 0:
       - Update actor: φ ← φ + β∇_φ Q_θ₁(s, π_φ(s))
       - Update targets:
           θ'ᵢ ← τθᵢ + (1-τ)θ'ᵢ
           φ' ← τφ + (1-τ)φ'
```

### Hyperparameters

| Parameter | Symbol | Typical Value | Description |
|-----------|--------|---------------|-------------|
| Discount factor | $\gamma$ | 0.99 | Future reward weighting |
| Soft update rate | $\tau$ | 0.005 | Target network update rate |
| Policy delay | $d$ | 2 | Critic updates per actor update |
| Target noise | $\tilde{\sigma}$ | 0.2 | Smoothing noise std |
| Noise clip | $c$ | 0.5 | Smoothing noise bounds |
| Exploration noise | $\sigma$ | 0.1 | Action exploration std |
| Batch size | $N$ | 256 | Mini-batch size |
| Learning rate | $\alpha, \beta$ | 3e-4 | Network learning rates |

## Mathematical Deep Dive

### Overestimation Bias Analysis

Let the true Q-value be $Q^*(s,a)$ and the estimate be:

$$
\hat{Q}(s,a) = Q^*(s,a) + \epsilon(s,a)
$$

Where $\epsilon(s,a)$ is zero-mean noise with variance $\sigma^2$.

**Single estimator bias:**

$$
\mathbb{E}\left[\max_a \hat{Q}(s,a)\right] - \max_a Q^*(s,a) \approx \sigma\sqrt{\frac{2\log n}{\pi}}
$$

For $n$ actions sampled.

**Double estimator (TD3) bias:**

$$
\mathbb{E}\left[\min_{i=1,2} \hat{Q}_i(s,a)\right] \leq Q^*(s,a)
$$

TD3 trades overestimation for slight underestimation, which is empirically more stable.

### Deterministic Policy Gradient

The actor maximizes expected return:

$$
J(\theta_\pi) = \mathbb{E}_{s \sim \rho^\pi} \left[ Q^{\pi}(s, \pi_{\theta_\pi}(s)) \right]
$$

Gradient (Silver et al., 2014):

$$
\nabla_{\theta_\pi} J(\theta_\pi) = \mathbb{E}_{s \sim \rho^\pi} \left[ \nabla_{\theta_\pi} \pi_{\theta_\pi}(s) \nabla_a Q^{\pi}(s,a) \big|_{a=\pi_{\theta_\pi}(s)} \right]
$$

## Comparison with Related Algorithms

### TD3 vs DDPG vs SAC

| Aspect | DDPG | TD3 | SAC |
|--------|------|-----|-----|
| Number of critics | 1 | 2 (min) | 2 (min) |
| Policy type | Deterministic | Deterministic | Stochastic |
| Entropy regularization | ✗ | ✗ | ✓ |
| Exploration method | External noise | External noise | Policy entropy |
| Actor update frequency | Every step | Delayed | Every step |
| Target smoothing | ✗ | ✓ | ✗ |

### SAC Objective (for comparison)

$$
J(\pi) = \mathbb{E}_{\tau \sim \pi} \left[ \sum_{t=0}^{\infty} \gamma^t \left( r_t + \alpha \mathcal{H}(\pi(\cdot|s_t)) \right) \right]
$$

Where $\mathcal{H}$ is entropy and $\alpha$ is the temperature parameter.

## Practical Implementation Notes

### Network Architecture

**Critic Network:**

$$
Q_\theta(s,a) = f_\theta(\text{concat}(s, a))
$$

Typical architecture:

- Input: $\text{dim}(s) + \text{dim}(a)$
- Hidden layers: [256, 256] with ReLU
- Output: 1 (scalar Q-value)

**Actor Network:**

$$
\pi_\phi(s) = \tanh(f_\phi(s)) \cdot a_{\max}
$$

Typical architecture:

- Input: $\text{dim}(s)$
- Hidden layers: [256, 256] with ReLU
- Output: $\text{dim}(a)$ with tanh activation

### Common Failure Modes

1. **Insufficient exploration**
   - Symptom: Premature convergence to suboptimal policy
   - Solution: Increase exploration noise, use parameter noise
2. **Critic divergence**
   - Symptom: Q-values grow unboundedly
   - Solution: Reduce learning rate, gradient clipping
3. **Slow learning**
   - Symptom: Policy improves very slowly
   - Solution: Reduce policy delay, increase batch size

### Debugging Tips

- Monitor Q-value statistics: $\mathbb{E}[Q]$, $\text{Var}[Q]$, $\max Q$
- Track actor and critic losses separately
- Visualize learned policy periodically
- Compare predicted vs actual returns

## When to Use TD3

### Good Fit

- Continuous control with dense rewards
- Robotic manipulation and locomotion
- When sample efficiency matters
- Environments with smooth dynamics

### Consider Alternatives

- **Discrete actions** → DQN, Rainbow, C51
- **Maximum exploration needed** → SAC
- **Model available** → MBPO, Dreamer, MuZero
- **Multi-task/Meta-learning** → MAML, RL²

## Equations

### Target Computation

$$
y = r + \gamma \min_{i=1,2} Q_{\theta'_i}\left(s', \pi_{\theta'_\pi}(s') + \text{clip}(\epsilon, -c, c)\right)
$$

### Critic Loss

$$
\mathcal{L}_{\text{critic}} = \frac{1}{N} \sum_{j=1}^{N} \left( Q_{\theta_i}(s_j, a_j) - y_j \right)^2
$$

### Actor Loss

$$
\mathcal{L}_{\text{actor}} = -\frac{1}{N} \sum_{j=1}^{N} Q_{\theta_1}(s_j, \pi_{\theta_\pi}(s_j))
$$

### Soft Target Update

$$
\theta' \leftarrow \tau \theta + (1 - \tau) \theta'
$$
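
### Minimal Update-Step Sketch (PyTorch)

To complement the pseudocode above, this is a minimal PyTorch-style sketch of one TD3 update step (clipped double-Q target, critic regression, delayed actor and target updates). The networks, optimizers, and replay-batch tensors are assumed placeholders, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def td3_update(batch, actor, critic1, critic2,
               actor_targ, critic1_targ, critic2_targ,
               actor_opt, critic_opt, step,
               gamma=0.99, tau=0.005, policy_delay=2,
               noise_std=0.2, noise_clip=0.5, max_action=1.0):
    s, a, r, s2, done = batch  # tensors sampled from a replay buffer

    # Clipped double-Q target with target policy smoothing.
    with torch.no_grad():
        eps = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
        a2 = (actor_targ(s2) + eps).clamp(-max_action, max_action)
        q_targ = torch.min(critic1_targ(s2, a2), critic2_targ(s2, a2))
        y = r + gamma * (1.0 - done) * q_targ

    # Both critics regress toward the shared target.
    critic_loss = F.mse_loss(critic1(s, a), y) + F.mse_loss(critic2(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed actor update and Polyak-averaged target updates.
    if step % policy_delay == 0:
        actor_loss = -critic1(s, actor(s)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        for net, targ in [(actor, actor_targ), (critic1, critic1_targ),
                          (critic2, critic2_targ)]:
            for p, p_targ in zip(net.parameters(), targ.parameters()):
                p_targ.data.mul_(1 - tau).add_(tau * p.data)
```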

tddb testing, reliability

Time-dependent dielectric breakdown testing.

tdr, tdr, signal & power integrity

Time Domain Reflectometry measures impedance discontinuities along signal paths by analyzing reflected waveforms from incident step signals.
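
A small worked example of converting a TDR reflection into an impedance, using Z = Z0 (1 + ρ) / (1 - ρ); the voltage values are illustrative only.

```python
# rho = V_reflected / V_incident; Z = Z0 * (1 + rho) / (1 - rho).
z0 = 50.0           # ohm, reference/source impedance
v_incident = 0.25   # V, illustrative incident step amplitude
v_reflected = 0.05  # V, illustrative reflected amplitude

rho = v_reflected / v_incident       # 0.2
z = z0 * (1 + rho) / (1 - rho)       # 75 ohm -> discontinuity is higher impedance
print(f"rho = {rho:.2f}, Z = {z:.1f} ohm")
```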

te-nas, te-nas, neural architecture search

Training-free ensemble NAS combines multiple zero-cost proxies, improving architecture evaluation reliability.

teacher-student cl, advanced training

Teacher-student curriculum learning uses a teacher model to assess sample difficulty and guide curriculum design for student training.
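
A hedged sketch of one simple instantiation: score each sample's difficulty by the teacher's per-sample loss and present samples to the student from easy to hard. The easy-to-hard ordering and the helper names are assumptions for illustration, not the canonical method.

```python
import torch

@torch.no_grad()
def teacher_difficulty(teacher, inputs, labels, loss_fn):
    # Per-sample teacher loss as a difficulty proxy (higher = harder).
    # loss_fn is expected to use reduction="none".
    return loss_fn(teacher(inputs), labels)

def curriculum_order(teacher, inputs, labels, loss_fn):
    # Easy-to-hard sample indices to drive the student's batch schedule.
    difficulty = teacher_difficulty(teacher, inputs, labels, loss_fn)
    return torch.argsort(difficulty)
```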

teacher-student framework, model compression

General paradigm for distillation.
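
A minimal sketch of the standard distillation loss (soft teacher targets at temperature T blended with hard-label cross-entropy); the temperature and blend weight below are illustrative defaults, not prescribed values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: KL between temperature-softened distributions,
    # scaled by T^2 so its gradient magnitude stays comparable.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    # Hard-label term: ordinary cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```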