Reinforcement Learning for Routing applies RL algorithms to the NP-hard problem of connecting millions of nets on a chip while satisfying design rules, minimizing wirelength, avoiding congestion, and meeting timing constraints. Agents are trained to make sequential routing decisions, learning from trial-and-error experience across thousands of designs, with the goal of discovering routing strategies that outperform traditional maze routing and negotiation-based algorithms.
Routing Problem as MDP:
- State Space: current partial routing solution represented as multi-layer occupancy grids (which routing tracks are used), congestion maps (routing demand vs capacity), timing criticality maps (which nets require shorter paths), and design rule violation indicators; state dimensionality scales with die area and metal layer count
- Action Space: for each net segment, select routing path from source to target; actions include choosing metal layer, selecting wire track, inserting vias, and deciding detour routes to avoid congestion; hierarchical action decomposition breaks routing into coarse-grained (global routing) and fine-grained (detailed routing) decisions
- Reward Function: negative reward for wirelength (longer wires increase delay and power), congestion violations (routing overflow), design rule violations (spacing, width, via rules), and timing violations (nets missing slack targets); positive reward for successful net completion and overall routing quality metrics
- Episode Structure: each episode routes a complete design or a batch of nets; episodic return measures final routing quality; intermediate rewards provide learning signal during routing process; curriculum learning starts with simple designs and progressively increases complexity
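The MDP framing above can be made concrete with a toy single-net environment. This is a minimal sketch, not a production formulation: the class name `GridRoutingEnv`, the single-layer grid, and the weights `STEP_COST`, `OVERFLOW_COST`, and `DONE_BONUS` are all assumptions chosen for illustration. It shows only the wirelength, congestion, and completion terms of the reward; design rule and timing terms are omitted.

```python
from dataclasses import dataclass, field

# Hypothetical reward weights; real values would be tuned per design.
STEP_COST = 1.0        # wirelength penalty per grid step
OVERFLOW_COST = 10.0   # penalty for entering a cell beyond its capacity
DONE_BONUS = 50.0      # reward for reaching the target pin

MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


@dataclass
class GridRoutingEnv:
    width: int
    height: int
    capacity: int                               # tracks available per cell
    source: tuple
    target: tuple
    usage: dict = field(default_factory=dict)   # (x, y) -> tracks used

    def reset(self):
        """Place the wire head on the source pin and return the state."""
        self.pos = self.source
        self.usage[self.pos] = self.usage.get(self.pos, 0) + 1
        return self.pos

    def step(self, action):
        """Extend the wire by one cell; return (state, reward, done)."""
        dx, dy = MOVES[action]
        x, y = self.pos[0] + dx, self.pos[1] + dy
        if not (0 <= x < self.width and 0 <= y < self.height):
            # Off-grid move: stay in place but still pay the step cost.
            return self.pos, -STEP_COST, False
        self.pos = (x, y)
        self.usage[self.pos] = self.usage.get(self.pos, 0) + 1
        reward = -STEP_COST
        if self.usage[self.pos] > self.capacity:
            reward -= OVERFLOW_COST
        done = self.pos == self.target
        if done:
            reward += DONE_BONUS
        return self.pos, reward, done
```

Passing a pre-filled `usage` dictionary models cells already consumed by earlier nets, which is how congestion from previously routed nets enters the reward in this sketch.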
RL Routing Architectures:
- Policy Network: convolutional neural network processes routing grid as image; graph neural network encodes netlist connectivity; attention mechanism identifies critical nets requiring priority routing; policy outputs probability distribution over routing actions for current net segment
- Value Network: estimates expected future reward from current routing state; guides exploration by identifying promising routing regions; trained via temporal difference learning (TD(λ)) or Monte Carlo returns from completed routing episodes
- Actor-Critic Methods: policy gradient algorithms (PPO, A3C) balance exploration and exploitation; actor network proposes routing actions; critic network evaluates action quality; advantage estimation reduces variance in policy gradient updates
- Model-Based RL: learns transition dynamics (how routing actions affect congestion and timing); enables planning via tree search or trajectory optimization; reduces sample complexity by simulating routing outcomes before committing to actions
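The advantage estimation mentioned for actor-critic methods is commonly done with generalized advantage estimation (GAE), which is the estimator typically paired with PPO. The sketch below assumes per-step rewards from one routing episode and critic value estimates for each state; the function name and default hyperparameters are illustrative, not taken from any specific router.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one episode.

    rewards: list of per-step rewards, length T.
    values:  critic estimates for each state, length T + 1; the last
             entry is the bootstrap value (0.0 for a terminal state).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the discounted sum after it.
    for t in reversed(range(len(rewards))):
        # One-step TD error: how much better the step went than predicted.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` reduces this to the one-step TD error (low variance, more bias), while `lam=1` gives the Monte Carlo return minus the value baseline (unbiased, higher variance).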
Global Routing with RL:
- Coarse-Grid Routing: divides die into global routing cells (gcells); assigns nets to sequences of gcells; RL agent learns to route nets through gcell graph while balancing congestion across regions
- Congestion-Aware Routing: RL policy trained to predict and avoid congestion hotspots; learns that routing through congested regions early in the process creates problems for later nets; develops strategies like detour routing and layer assignment to distribute routing demand
- Multi-Net Optimization: traditional routers process nets sequentially and resolve conflicts afterwards by rip-up and reroute; RL can learn joint optimization strategies that consider interactions between nets; for example, a policy may learn that routing critical timing paths first and leaving flexibility for non-critical nets improves overall quality
- Layer Assignment: RL learns optimal metal layer usage patterns; lower layers for short local connections; upper layers for long global routes; via minimization to reduce resistance and manufacturing defects
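The gcell bookkeeping behind congestion-aware global routing can be sketched as follows. Note that this is a greedy heuristic, not an RL policy: it chooses between the two L-shaped paths by a congestion-weighted cost, which is the kind of demand-versus-capacity signal an RL agent's state and reward would be built from. The function names, the per-cell (rather than per-edge) capacity model, and `overflow_weight` are assumptions for illustration.

```python
from collections import Counter


def l_shapes(src, dst):
    """Return the two L-shaped gcell paths between src and dst."""
    def walk(a, b):
        # Axis-aligned run of gcells from a to b, inclusive.
        (ax, ay), (bx, by) = a, b
        if ax == bx:
            step = 1 if by >= ay else -1
            return [(ax, y) for y in range(ay, by + step, step)]
        step = 1 if bx >= ax else -1
        return [(x, ay) for x in range(ax, bx + step, step)]

    (x0, y0), (x1, y1) = src, dst
    corner_a, corner_b = (x1, y0), (x0, y1)
    path_a = walk(src, corner_a) + walk(corner_a, dst)[1:]
    path_b = walk(src, corner_b) + walk(corner_b, dst)[1:]
    return path_a, path_b


def path_cost(path, demand, capacity, overflow_weight=10.0):
    """Wirelength plus a penalty for each gcell this net would overflow."""
    cost = 0.0
    for cell in path:
        overflow = max(0, demand[cell] + 1 - capacity)
        cost += 1.0 + overflow_weight * overflow
    return cost


def route_net(src, dst, demand, capacity):
    """Pick the cheaper L-shape and record its demand on the gcell map."""
    best = min(l_shapes(src, dst),
               key=lambda p: path_cost(p, demand, capacity))
    demand.update(best)  # Counter.update adds one use per gcell on the path
    return best
```

For example, with `demand = Counter({(1, 0): 2})` and `capacity=2`, routing from `(0, 0)` to `(2, 2)` avoids the full gcell at `(1, 0)` by taking the vertical-first path.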
Detailed Routing with RL:
- Track Assignment: assigns nets to specific routing tracks within gcells; RL learns design-rule-aware track selection that minimizes spacing violations and maximizes routing density
- Via Optimization: RL policy learns when to insert vias (layer changes) vs continuing on current layer; balances via count (fewer is better for reliability) against wirelength and congestion
- Timing-Driven Routing: RL agent learns to identify timing-critical nets from slack distributions; routes critical nets on preferred layers with lower resistance; shields critical nets from crosstalk by maintaining spacing from noisy nets
- Incremental Routing: RL handles engineering change orders (ECOs) by learning to reroute modified nets while minimizing disruption to existing routing; faster than full re-routing and maintains design stability
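One common way to make a track-assignment policy design-rule-aware is action masking: illegal tracks are removed before the policy picks one. The sketch below assumes a deliberately simplified one-dimensional spacing rule and hypothetical names (`legal_tracks`, `masked_argmax`); real design rules are far richer, and the `scores` list stands in for the output of a trained policy network.

```python
def legal_tracks(segment, tracks, min_spacing=1):
    """Return indices of tracks that can accept `segment` legally.

    segment: (start, end) extent of the new wire along the track.
    tracks:  list of tracks, each a list of (start, end) wires already
             placed on that track.
    """
    start, end = segment
    legal = []
    for i, occupied in enumerate(tracks):
        # Legal if every existing wire is at least min_spacing away,
        # either entirely to the right or entirely to the left.
        ok = all(end + min_spacing <= s or e + min_spacing <= start
                 for s, e in occupied)
        if ok:
            legal.append(i)
    return legal


def masked_argmax(scores, legal):
    """Pick the highest-scoring legal track, or None if none is legal."""
    if not legal:
        return None
    return max(legal, key=lambda i: scores[i])
```

Masking guarantees the policy never proposes a spacing violation under this rule, so the reward does not have to teach legality from scratch; a `None` result signals that the segment needs a detour or a rip-up.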
Training and Deployment:
- Offline Training: RL agent trained on dataset of 1,000-10,000 previous designs; learns general routing strategies applicable across design families; training time 1-7 days on GPU cluster with distributed RL (hundreds of parallel environments)
- Online Fine-Tuning: agent fine-tuned on current design during routing iterations; adapts to design-specific characteristics (congestion patterns, timing bottlenecks); 10-50 iterations of online learning improve results by 5-10% over offline policy
- Hybrid Approaches: RL handles high-level routing decisions (net ordering, layer assignment, congestion avoidance); traditional algorithms handle low-level details (exact track assignment, DRC fixing); combines RL's strategic planning with proven algorithmic efficiency
- Commercial Integration: research prototypes have reported 10-20% improvements in routing quality metrics; commercial adoption remains limited by training data requirements, runtime overhead, and validation challenges; integration is proceeding gradually, with ML-enhanced subroutines embedded within traditional routers
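The hybrid split described above can be sketched with a learned component that only decides net ordering and a classical maze router that does the actual path search. Everything here is a simplified assumption: a single-layer grid, one wire per cell, and a `priority` callable standing in for a trained policy's net-ordering score. The path search is a Lee-style breadth-first search.

```python
from collections import deque


def maze_route(grid_w, grid_h, blocked, src, dst):
    """Lee-style BFS: shortest path from src to dst, or None if blocked."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            # Walk the predecessor chain back to the source.
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < grid_w and 0 <= ny < grid_h
                    and nxt not in blocked and nxt not in prev):
                prev[nxt] = cell
                queue.append(nxt)
    return None


def route_all(grid_w, grid_h, nets, priority):
    """Route nets in descending priority order.

    nets:     dict mapping net name -> (source, target) pin cells.
    priority: callable giving a score per net name; a stand-in for a
              learned ordering policy (hypothetical).
    """
    pins = {p for pair in nets.values() for p in pair}
    wires, routes = set(), {}
    for name in sorted(nets, key=priority, reverse=True):
        src, dst = nets[name]
        # Existing wires and other nets' pins are obstacles.
        blocked = wires | (pins - {src, dst})
        path = maze_route(grid_w, grid_h, blocked, src, dst)
        routes[name] = path
        if path:
            wires.update(path)
    return routes
```

Because each routed net blocks cells for the nets after it, the ordering supplied by `priority` can decide whether later nets route at all, which is why net ordering is a natural decision to hand to a learned policy while the path search stays algorithmic.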
Reinforcement learning for routing aims to move routing automation beyond fixed-priority negotiation-based algorithms toward adaptive policies that learn routing strategies from data, helping routers handle the increasing complexity of advanced-node designs with billions of routing segments and hundreds of design rule constraints.