Timing Closure

Timing closure is the iterative physical design process of achieving timing slack ≥ 0 on all paths across all operating corners and modes. It combines logic restructuring, gate sizing, buffer insertion, placement optimization, and clock skew scheduling to meet the target frequency while managing power and area trade-offs.

Timing Fundamentals:
- Setup Timing: data must arrive at the capture flip-flop at least one setup time before the capturing clock edge; setup slack = T_clk - (T_launch_clk + T_logic + T_setup - T_capture_clk); negative slack means the path is too slow and violates timing at the target frequency
- Hold Timing: data must remain stable for hold time after the clock edge; hold slack = (T_launch_clk + T_logic) - (T_capture_clk + T_hold); hold violations occur when data changes too quickly, typically at fast corners; cannot be fixed by frequency reduction
- Critical Path: the path with the worst (most negative) slack; determines maximum achievable frequency; timing closure focuses on fixing critical and near-critical paths (slack < 100ps) first
- Timing Corners: designs must meet timing across all PVT corners (slow-slow for setup, fast-fast for hold, typical for power); multi-corner optimization simultaneously considers all corners; advanced nodes require 10-20 corner analysis
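
The two slack formulas above can be written out as a short Python sketch. The function names and all numeric values are hypothetical and chosen only for illustration; times are in picoseconds. The example also shows why a hold violation cannot be fixed by lowering the frequency: T_clk does not appear in the hold equation.

```python
def setup_slack(t_clk, t_launch_clk, t_logic, t_setup, t_capture_clk):
    """Setup slack in ps, following the formula in the Setup Timing bullet."""
    return t_clk - (t_launch_clk + t_logic + t_setup - t_capture_clk)


def hold_slack(t_launch_clk, t_logic, t_capture_clk, t_hold):
    """Hold slack in ps; note that the clock period does not appear."""
    return (t_launch_clk + t_logic) - (t_capture_clk + t_hold)


# Hypothetical values in ps, chosen only for illustration.
# Setup is evaluated with slow-corner delays, hold with fast-corner delays.
setup_at_1ghz = setup_slack(t_clk=1000, t_launch_clk=100, t_logic=1000,
                            t_setup=50, t_capture_clk=120)
setup_relaxed = setup_slack(t_clk=1100, t_launch_clk=100, t_logic=1000,
                            t_setup=50, t_capture_clk=120)
hold_fast = hold_slack(t_launch_clk=100, t_logic=40, t_capture_clk=120, t_hold=30)

print(setup_at_1ghz)  # -30: the path misses setup at 1 GHz
print(setup_relaxed)  # 70: a longer clock period fixes setup
print(hold_fast)      # -10: hold fails regardless of clock period
```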

Gate-Level Optimization:
- Gate Sizing (Upsizing): replace cells with higher drive strength versions to reduce delay; increases area and power but improves timing; automated sizing tools (Synopsys Design Compiler, Cadence Genus) upsize gates on critical paths while downsizing non-critical paths to recover area
- Threshold Voltage (Vt) Swapping: use low-Vt cells (faster, higher leakage) on critical paths and high-Vt cells (slower, lower leakage) on non-critical paths; multi-Vt optimization balances performance and leakage power; typical mix: 10-20% LVT, 60-70% RVT, 10-20% HVT
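- Buffer Insertion: add buffers to split long wires and reduce RC delay, since unbuffered wire delay grows roughly quadratically with length; buffers also restore signal slew; optimal buffer insertion considers wire delay, buffer delay, and downstream load; the classic formulation (van Ginneken's algorithm) uses dynamic programming to choose buffer locations, and Synopsys and Cadence tools automate buffer insertion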
- Logic Restructuring: resynthesize critical paths with different logic structures; transform carry chains, rebalance trees, clone high-fanout gates; typically done early in physical synthesis before placement is locked
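
As a rough illustration of why buffer insertion helps, the sketch below estimates the Elmore delay of a long wire split into equal segments, each driven by an identical buffer, and sweeps the segment count. This is a simplified uniform-spacing model, not the dynamic-programming approach used for general buffer placement. The function names and the electrical values (kΩ, fF, ps) are assumptions made for the example.

```python
def buffered_wire_delay(n_segments, r_wire, c_wire, r_buf, c_buf, t_buf):
    """Elmore delay (ps) of a wire split into n equal buffered segments.

    Units: resistance in kOhm, capacitance in fF, so R * C is in ps.
    Each segment is a lumped RC wire driven by a buffer with output
    resistance r_buf and intrinsic delay t_buf, loaded by the next
    buffer's input capacitance c_buf.
    """
    r_seg = r_wire / n_segments
    c_seg = c_wire / n_segments
    stage = t_buf + r_buf * (c_seg + c_buf) + r_seg * (c_seg / 2 + c_buf)
    return n_segments * stage


def best_segment_count(r_wire, c_wire, r_buf, c_buf, t_buf, max_segments=20):
    """Brute-force sweep for the segment count with the lowest delay."""
    return min(
        range(1, max_segments + 1),
        key=lambda n: buffered_wire_delay(n, r_wire, c_wire, r_buf, c_buf, t_buf),
    )


# Hypothetical wire and buffer parameters, for illustration only.
params = dict(r_wire=2.0, c_wire=400.0, r_buf=0.5, c_buf=2.0, t_buf=10.0)

single = buffered_wire_delay(1, **params)
best_n = best_segment_count(**params)
best = buffered_wire_delay(best_n, **params)
print(f"1 segment: {single:.1f} ps, best: {best_n} segments at {best:.1f} ps")
```

With these assumed values the delay falls from about 615 ps for a single driver to about 337 ps with six segments; adding more buffers beyond that point costs more in buffer delay than it saves in wire delay.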

Physical Optimization:
- Placement Optimization: move cells closer to reduce wire length and delay; critical path cells placed in proximity with minimal detours; incremental placement refinement after each timing iteration; modern tools use analytical placement with timing-driven objectives
- Routing Optimization: minimize wire length on critical nets; use wider wires (lower resistance) or upper metal layers (lower RC) for critical paths; non-default routing rules (NDR) specify wider/spaced routing for specific nets
- Useful Skew: intentionally delay clock arrival at the capturing flip-flop relative to the launching flip-flop on a critical path; this borrows time from the following stage, so that stage needs spare setup slack, and the later capture edge also reduces hold slack on the critical path; can recover 5-15% frequency; Cadence Innovus and Synopsys ICC2 support automated useful skew optimization
- Pipelining: insert flip-flops to break long combinational paths into multiple shorter paths; increases latency but improves throughput and maximum frequency; requires RTL changes but provides the largest timing improvement (2-3× frequency possible)
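
The effect of useful skew follows from the setup and hold slack formulas given earlier: a later T_capture_clk raises setup slack on the incoming path, lowers setup slack on the outgoing path, and lowers hold slack on the incoming path. The sketch below works through that arithmetic for a flip-flop sitting between two stages. The function names and slack values are hypothetical, and real tools solve this across the whole clock network rather than one flip-flop at a time.

```python
def apply_capture_skew(setup_in, setup_out, hold_in, delta):
    """Delay the clock of the flip-flop between two stages by delta ps.

    setup_in / hold_in: slacks of the path ending at that flip-flop.
    setup_out: setup slack of the path starting at that flip-flop.
    """
    return {
        "setup_in": setup_in + delta,    # capture edge arrives later
        "setup_out": setup_out - delta,  # next stage launches later
        "hold_in": hold_in - delta,      # later capture edge tightens hold
    }


def balancing_skew(setup_in, setup_out):
    """Skew (ps) that equalizes the setup slacks of the two stages."""
    return (setup_out - setup_in) / 2


# Hypothetical slacks in ps: a failing stage followed by a relaxed one.
delta = balancing_skew(setup_in=-40, setup_out=100)
result = apply_capture_skew(setup_in=-40, setup_out=100, hold_in=90, delta=delta)
print(delta, result)
```

Here a 70 ps delay turns setup slacks of -40 ps and +100 ps into +30 ps on both stages, while the hold slack on the incoming path drops from 90 ps to 20 ps, which is why hold must be rechecked after skewing.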

Hold Fixing:
- Delay Cell Insertion: add delay buffers or delay cells on paths with hold violations; increases delay without affecting setup timing (if done carefully); hold fixing typically adds 2-5% area overhead
- Clock Skew Adjustment: reduce clock skew or reverse skew direction to fix hold violations; must ensure setup timing is not degraded; useful skew optimization considers both setup and hold simultaneously
- Minimum Delay Paths: hold violations often occur on short paths between nearby flip-flops; fixing requires adding delay without impacting critical setup paths; automated hold fixing tools insert minimum-sized delay cells
- Fast Corner Challenges: hold violations worsen at the fast-fast corner (fast process, high voltage, and usually low temperature); must ensure hold slack ≥ 0 at the fast corner while maintaining setup slack ≥ 0 at the slow corner; conflicting requirements make hold fixing challenging
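
The conflict between the two corners can be illustrated with a small calculation: the number of delay cells is sized from the cell's fast-corner delay (where it contributes least), but each cell costs its larger slow-corner delay against setup. The sketch below is a simplified model with hypothetical function names and delay values in picoseconds; it is not how any particular tool implements hold fixing.

```python
import math


def delay_cells_needed(hold_slack_fast, cell_delay_fast):
    """Delay cells required to bring fast-corner hold slack to >= 0."""
    if hold_slack_fast >= 0:
        return 0
    return math.ceil(-hold_slack_fast / cell_delay_fast)


def setup_slack_after_fix(setup_slack_slow, n_cells, cell_delay_slow):
    """Slow-corner setup slack left after inserting n_cells on the path."""
    return setup_slack_slow - n_cells * cell_delay_slow


# Hypothetical delay cell: 20 ps at the fast corner, 45 ps at the slow corner.
n = delay_cells_needed(hold_slack_fast=-35, cell_delay_fast=20)

# Path A has enough setup slack to absorb the fix; path B does not.
path_a = setup_slack_after_fix(setup_slack_slow=120, n_cells=n, cell_delay_slow=45)
path_b = setup_slack_after_fix(setup_slack_slow=60, n_cells=n, cell_delay_slow=45)
print(n, path_a, path_b)  # 2 cells; path A keeps +30 ps, path B drops to -30 ps
```

A path like B needs a different fix, for example placing the delay on a branch that is not shared with the setup-critical path or adjusting clock skew.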

Advanced Techniques:
- Concurrent Clock and Data (CCD) Optimization: co-optimizes clock tree and data paths simultaneously; adjusts clock arrival times and data path delays together for optimal timing; more effective than sequential CTS followed by timing optimization
- Path-Based Analysis (PBA): analyzes each path individually with path-specific slew and delay values rather than the worst-case values that graph-based analysis (GBA) propagates through each pin; can recover 50-200ps of timing margin by removing pessimism; essential for timing closure at advanced nodes
- Physically Aware Synthesis: performs logic synthesis with physical information (estimated placement and routing); reduces the gap between synthesis and physical implementation; Synopsys Fusion Compiler and Cadence Genus iSpatial provide physical synthesis
- Machine Learning Timing Prediction: ML models predict post-route timing from early design stages; enables faster design space exploration and reduces timing closure iterations; emerging capability in commercial EDA tools

Timing Closure Metrics:
- Worst Negative Slack (WNS): the most negative slack across all paths; primary metric for timing closure; target is WNS ≥ 0 with margin (typically +50ps to +100ps for design margin and variation)
- Total Negative Slack (TNS): sum of all negative slacks; indicates the overall timing health; TNS = 0 means all paths meet timing; large TNS indicates many failing paths requiring extensive optimization
- Number of Violating Paths: count of paths with negative slack; complements TNS by showing how widespread timing issues are; for example, 1000 paths at -10ps each and 10 paths at -1000ps each have the same TNS (-10,000ps), but the first case is usually easier to fix
- Timing Margin: positive slack beyond zero; provides margin for process variation, aging, and design changes; typical target margin is 5-10% of clock period (50-100ps at 1GHz)
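
The first three metrics can be computed directly from a list of slacks. The sketch below is a minimal illustration with a hypothetical function name; it takes WNS as the minimum slack, although some tools report it as 0 when no path fails. It reproduces the comparison of many small violations against a few large ones.

```python
def timing_summary(slacks_ps):
    """Return WNS, TNS, and violating-path count for slacks in ps."""
    negatives = [s for s in slacks_ps if s < 0]
    return {
        "wns": min(slacks_ps),         # worst (most negative) slack
        "tns": sum(negatives),         # sum of negative slacks only
        "violations": len(negatives),  # number of failing paths
    }


many_small = timing_summary([-10] * 1000)
few_large = timing_summary([-1000] * 10)

print(many_small)  # {'wns': -10, 'tns': -10000, 'violations': 1000}
print(few_large)   # {'wns': -1000, 'tns': -10000, 'violations': 10}
```

Both cases report the same TNS, so WNS and the violation count are needed to tell them apart.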

Timing closure is often the most time-consuming and iterative phase of physical design, consuming 40-60% of the implementation schedule at advanced nodes. It requires deep expertise in timing analysis, optimization techniques, and tool flows to achieve the target frequency while meeting power and area constraints.
