Parallel Neural Architecture Search (NAS)

Keywords: parallel neural architecture search, parallel NAS, neural architecture search parallel, distributed hyperparameter, NAS distributed, automated machine learning

Parallel Neural Architecture Search (NAS) is the automated machine learning methodology that searches a combinatorial design space for optimal neural network architectures using parallel evaluation across many processors or machines, automating a design process that traditionally required months of expert engineering intuition. By evaluating thousands of candidate architectures simultaneously on compute farms, NAS discovers architectures that outperform hand-designed networks on specific tasks and hardware targets, and modern one-shot and differentiable NAS methods reduce search cost from thousands of GPU-days to a few GPU-hours.

The NAS Problem

- Search space: Possible architectures defined by: layer types, connections, widths, depths, operations.
- Search strategy: How to select which architectures to evaluate.
- Performance estimation: How to evaluate each candidate architecture's quality.
- Objective: Find architecture maximizing accuracy subject to latency, memory, or FLOP constraints.
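
The interplay of these three components can be sketched as a simple search loop; the function names (propose, estimate, meets_constraints) are illustrative placeholders rather than any particular library's API:

```python
# Minimal sketch of the NAS loop: a search strategy proposes candidate
# architectures from the search space, a performance estimator scores them,
# and the best architecture satisfying the constraints is kept.
# All names here are illustrative placeholders, not a real library API.

def nas_search(search_space, propose, estimate, meets_constraints, budget):
    best_arch, best_score = None, float("-inf")
    history = []                      # (architecture, score) pairs seen so far
    for _ in range(budget):           # budget = number of evaluations allowed
        arch = propose(search_space, history)      # search strategy
        if not meets_constraints(arch):            # latency / memory / FLOP limits
            continue
        score = estimate(arch)                     # performance estimation
        history.append((arch, score))
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```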

NAS Search Spaces

| Search Space | Description | Size |
|-------------|------------|------|
| Cell-based | Optimize repeating cell, stack N times | ~10²⁰ cells |
| Chain-structured | Each layer can be any block type | ~10¹⁰ |
| Full DAG | Arbitrary connections between layers | Exponential |
| Hardware-aware | Constrained to meet latency budget | Smaller |
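
As a rough illustration of how quickly even a small cell-based space grows, the toy encoding below assigns one operation per cell edge; the operation names follow common NAS conventions, but the encoding itself is only a sketch:

```python
import itertools

# Hypothetical encoding of a tiny cell-based search space: each of the
# cell's edges picks one operation from a fixed candidate set.
CANDIDATE_OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool3x3", "skip", "zero"]
NUM_EDGES = 4   # a toy cell; real cells (e.g. NASNet, DARTS) have many more edges

# Every architecture is a tuple assigning one op to each edge.
all_cells = list(itertools.product(CANDIDATE_OPS, repeat=NUM_EDGES))
print(len(all_cells))   # 6^4 = 1296 cells even for this toy space
```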

NAS Strategies

1. Reinforcement Learning NAS (Original, Google 2017)
- Controller RNN generates architecture description as token sequence.
- Each sampled child network is trained on the training set and evaluated on held-out validation data → reward = validation accuracy.
- RL updates controller weights to generate better architectures.
- Cost: 500–2000 GPU-days → discovered the NASNet architecture.
- Parallel: Evaluate 450 child networks simultaneously on 450 GPUs.
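
A minimal sketch of this controller/worker pattern follows; sample_architecture, train_and_validate, and the policy.update interface are assumed placeholders, not the original Google implementation:

```python
from concurrent.futures import ProcessPoolExecutor

# Hedged sketch of RL-based NAS with parallel child-network evaluation.
# `sample_architecture(policy)` and `train_and_validate(arch)` stand in for the
# controller RNN and the child training job; both are assumptions here.

def rl_nas_step(policy, sample_architecture, train_and_validate,
                num_children=8, lr=0.1):
    # 1) Controller samples a batch of child architectures.
    children = [sample_architecture(policy) for _ in range(num_children)]
    # 2) Children are trained and evaluated in parallel (one worker per child).
    with ProcessPoolExecutor(max_workers=num_children) as pool:
        rewards = list(pool.map(train_and_validate, children))
    # 3) REINFORCE-style update: push the policy toward high-reward children.
    baseline = sum(rewards) / len(rewards)
    for arch, reward in zip(children, rewards):
        policy.update(arch, advantage=reward - baseline, lr=lr)  # assumed interface
    return max(rewards)
```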

2. Evolutionary NAS
- Population of architectures → mutate + crossover → select best → repeat.
- AmoebaNet: Evolutionary search → discovered a competitive image-classification architecture.
- Easily parallelized: Evaluate whole population simultaneously.
- Cost: Hundreds of GPU-days.
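
The aging-tournament loop used by regularized evolution (the strategy behind AmoebaNet) can be sketched as follows, assuming placeholder random_arch, mutate, and evaluate functions:

```python
import copy
import random
from concurrent.futures import ProcessPoolExecutor

# Sketch of aging/tournament evolution in the spirit of regularized evolution.
# `random_arch`, `mutate`, and `evaluate` are placeholders for a real encoding,
# mutation operator, and training job.

def evolve(random_arch, mutate, evaluate, pop_size=20, cycles=100, sample_size=5):
    population = [random_arch() for _ in range(pop_size)]
    with ProcessPoolExecutor() as pool:          # initial population evaluated in parallel
        fitness = list(pool.map(evaluate, population))
    for _ in range(cycles):
        # Tournament selection: mutate the best architecture from a random sample.
        idxs = random.sample(range(len(population)), sample_size)
        parent = population[max(idxs, key=lambda i: fitness[i])]
        child = mutate(copy.deepcopy(parent))
        population.append(child)
        fitness.append(evaluate(child))          # in practice also dispatched to a worker
        population.pop(0)                        # aging: discard the oldest individual
        fitness.pop(0)
    best = max(range(len(population)), key=lambda i: fitness[i])
    return population[best], fitness[best]
```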

3. One-Shot NAS (Weight Sharing)
- Train ONE supernetwork that contains all architectures as subgraphs.
- Sample a sub-network from the supernetwork → evaluate without training from scratch.
- Cost: Train the supernetwork once (1–2 GPU-days) → search is essentially free.
- Methods: SMASH (hypernetwork-generated weights), ENAS (weight sharing), Single-Path NAS, FBNet.
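
A toy single-path supernetwork in PyTorch illustrates the weight-sharing idea: sampling a sub-network is just choosing one candidate op per layer, so candidates are scored without training from scratch. This is an illustrative sketch, not the SMASH/ENAS/FBNet code:

```python
import random
import torch
import torch.nn as nn

# Toy single-path supernetwork: each "layer" holds several candidate ops that
# share the supernet's weights; sampling a sub-network picks one op per layer.

class SuperLayer(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),                       # skip-connection candidate
        ])

    def forward(self, x, choice):
        return self.ops[choice](x)               # only the chosen op runs

class SuperNet(nn.Module):
    def __init__(self, channels=16, depth=4):
        super().__init__()
        self.layers = nn.ModuleList([SuperLayer(channels) for _ in range(depth)])

    def forward(self, x, choices):
        for layer, choice in zip(self.layers, choices):
            x = layer(x, choice)
        return x

net = SuperNet()
choices = [random.randrange(3) for _ in net.layers]   # sample one sub-network
out = net(torch.randn(1, 16, 32, 32), choices)        # evaluate it with shared weights
```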

4. DARTS (Differentiable Architecture Search)
- Relax the discrete search space to a continuous one → each operation weighted by a softmax.
- Jointly optimize architecture parameters α and network weights W by gradient descent.
- After training: Discretize → keep highest-weight operations → final architecture.
- Cost: 4 GPU-days (vs. 2000 for RL-NAS).
- Variants: GDAS, PC-DARTS, iDARTS → improved efficiency and stability.
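
The core DARTS trick, a softmax-weighted mixture over candidate operations, can be sketched in PyTorch as a MixedOp module (a simplified illustration, not the official implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the DARTS continuous relaxation: every candidate op on an edge is
# weighted by a softmax over learnable architecture parameters alpha, so both
# alpha and the network weights can be trained by gradient descent.

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.AvgPool2d(3, stride=1, padding=1),
            nn.Identity(),
        ])
        # One architecture parameter per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)           # continuous relaxation
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def discretize(self):
        # After search: keep only the highest-weight operation on this edge.
        return self.ops[int(self.alpha.argmax())]
```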

5. Hardware-Aware NAS
- Include hardware metric (latency, energy, memory) in objective function.
- ProxylessNAS, MnasNet, Once-for-All: Minimize (accuracy penalty + λ × hardware cost).
- Once-for-All: Train one supernetwork → specialize for different devices by subnet selection → no retraining.
- Used in: Google MobileNetV3 and EfficientNet (both found with MnasNet-style hardware-aware search), and Once-for-All subnets deployed across diverse edge devices.
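
A hedged sketch of the kind of objective such methods optimize; the latency target, λ weight, and penalty form below are illustrative choices, not the exact ProxylessNAS or MnasNet formulation:

```python
# Sketch of a hardware-aware NAS objective: trade validation error against a
# measured (or predicted) latency, weighted by lambda. The latency values are
# placeholders; real systems use on-device measurements or a latency predictor.

def hardware_aware_score(val_error, latency_ms, target_ms=20.0, lam=0.1):
    latency_penalty = max(0.0, latency_ms - target_ms) / target_ms
    return val_error + lam * latency_penalty   # lower is better

# Example: two candidates with similar accuracy but different latency.
print(hardware_aware_score(0.24, 18.0))   # within budget -> 0.24
print(hardware_aware_score(0.23, 35.0))   # over budget   -> 0.23 + 0.1*0.75 = 0.305
```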

Parallel NAS Infrastructure

- Hundreds of GPU workers evaluate candidate architectures simultaneously.
- The controller (RL) or search algorithm runs on a separate CPU node → sends architecture specifications to workers.
- Workers: Train the child network for N epochs → return validation accuracy → controller updates.
- Frameworks: Ray Tune, Optuna, and BOHB (Bayesian optimization + HyperBand) support parallel hyperparameter and architecture search.
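
A minimal sketch of that controller/worker loop using only the Python standard library; propose_arch and train_for_n_epochs stand in for a real search strategy and training job:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

# Sketch of the controller/worker pattern: the controller proposes architecture
# specs, a pool of workers trains each candidate for a few epochs, and results
# stream back asynchronously. `propose_arch(results)` and
# `train_for_n_epochs(arch)` are placeholders; the latter is assumed to return
# an (arch_spec, val_accuracy) pair.

def parallel_search(propose_arch, train_for_n_epochs, num_workers=8, total=64):
    results = []
    submitted = 0
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        pending = set()
        for _ in range(min(num_workers, total)):       # fill every worker
            pending.add(pool.submit(train_for_n_epochs, propose_arch(results)))
            submitted += 1
        while pending:
            done = next(as_completed(pending))          # first finished candidate
            pending.remove(done)
            results.append(done.result())               # (arch_spec, val_accuracy)
            if submitted < total:                       # keep workers busy
                pending.add(pool.submit(train_for_n_epochs, propose_arch(results)))
                submitted += 1
    return max(results, key=lambda r: r[1])
```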

HyperBand and ASHA

- Early stopping: Don't fully train all candidates → allocate more resources to promising ones.
- Successive Halving: Train all candidates for r epochs → keep the top 1/η → train survivors for η×r epochs → repeat.
- ASHA (Asynchronous Successive Halving): No synchronization barrier → workers continuously generate and evaluate candidates → better GPU utilization.
- Result: Search quality comparable to full training at 10–100× lower GPU-hour cost.
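
The resource-allocation logic can be sketched as synchronous Successive Halving (ASHA removes the synchronization barrier but keeps the same promotion rule); train is a placeholder returning validation accuracy:

```python
# Sketch of synchronous Successive Halving: train every candidate briefly,
# keep the top 1/eta fraction, and give survivors eta times more epochs.
# ASHA promotes candidates asynchronously as results arrive, but uses the
# same resource-allocation rule. `train(arch, epochs)` is a placeholder.

def successive_halving(candidates, train, min_epochs=1, eta=3, rounds=3):
    epochs = min_epochs
    survivors = list(candidates)
    for _ in range(rounds):
        scored = [(train(arch, epochs), arch) for arch in survivors]
        scored.sort(reverse=True, key=lambda s: s[0])      # rank by validation accuracy
        survivors = [arch for _, arch in scored[:max(1, len(scored) // eta)]]
        epochs *= eta                                      # survivors get more budget
    return survivors[0]
```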

NAS-discovered Architectures

| Architecture | Method | Target | Improvement |
|-------------|--------|--------|-------------|
| NASNet | RL NAS | ImageNet accuracy | +1% vs. ResNet |
| EfficientNet | Compound scaling + NAS | Accuracy + FLOPs | 8.4× smaller (parameters) |
| MobileNetV3 | Hardware-aware NAS | Mobile latency | Best accuracy@latency |
| GPT architecture | Human + empirical search | Language modeling | Foundational |

Parallel neural architecture search is the automated engineering discipline that democratizes deep learning design: by letting compute substitute for expert architectural intuition at scale, NAS has discovered efficient architectures for mobile vision (EfficientNet, MobileNet), edge AI (MCUNet), and specialized hardware (chip-specific networks). Systematic parallel search across architectural design spaces can consistently match or exceed the best hand-crafted designs, making automated architecture discovery an increasingly central tool in the ML engineer's arsenal.
