Neural Radiance Fields (NeRF)

Neural Radiance Fields (NeRF) is a neural implicit representation that encodes a 3D scene as a continuous volumetric function mapping a spatial position and viewing direction to color and density. Optimized on a set of posed photographs, it enables photorealistic novel view synthesis and replaces explicit mesh or point-cloud representations with a compact neural network that captures complex geometry, materials, and view-dependent lighting effects.

Core Architecture and Rendering:
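- Input Representation: Each query to the field is a 5D coordinate: spatial position (x, y, z) and viewing direction (theta, phi); in practice the direction is usually passed as a 3D unit vector
- MLP Network: A multilayer perceptron maps the encoded input to volume density (sigma) and RGB color. Density depends only on position, while color also depends on viewing direction. The original NeRF uses 8 fully connected layers of 256 units for the position branch, followed by a smaller 128-unit layer that takes the viewing direction and outputs color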
- Positional Encoding: Raw coordinates are transformed using sinusoidal functions at multiple frequencies (gamma encoding) to enable the network to capture high-frequency geometric and appearance details
- Volume Rendering: Cast rays from the camera through each pixel, sample points along each ray, query the MLP for density and color at each sample, and composite using classical volume rendering (alpha compositing with transmittance weighting)
- Hierarchical Sampling: Use a coarse network to identify regions of high density, then concentrate fine samples in those regions for efficient rendering
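
The positional encoding, volume rendering, and hierarchical sampling steps above can be sketched in a few lines of NumPy. This is a minimal illustration for a single ray, not the reference implementation: the function names (positional_encoding, composite, sample_fine) and the small epsilon added to the weights are assumptions made for this example, and the encoded features are ordered differently from the original paper (all sines, then all cosines, per coordinate), which does not change what the network can represent.

```python
import numpy as np

def positional_encoding(x, num_freqs):
    """Map each coordinate p to sin(2^k * pi * p) and cos(2^k * pi * p), k = 0..num_freqs-1."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi                     # (L,)
    angles = x[..., None] * freqs                                     # (..., D, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)   # (..., D, 2L)
    return enc.reshape(*x.shape[:-1], -1)                             # (..., D * 2L)

def composite(sigma, rgb, t_vals):
    """Alpha-composite the samples of one ray.

    sigma: (S,) densities, rgb: (S, 3) colors, t_vals: (S,) sorted sample depths.
    Returns the rendered color and the per-sample weights.
    """
    # Distance from each sample to the next; the last interval is treated as very large.
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)
    alpha = 1.0 - np.exp(-sigma * deltas)                             # opacity of each interval
    # Transmittance: probability that the ray reaches sample i without being absorbed.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    color = (weights[:, None] * rgb).sum(axis=0)
    return color, weights

def sample_fine(t_vals, weights, num_fine, rng):
    """Draw extra depths in proportion to the coarse weights (inverse-transform sampling)."""
    edges = 0.5 * (t_vals[1:] + t_vals[:-1])                          # (S-1,) bin edges
    w = weights[1:-1] + 1e-5                                          # (S-2,) one weight per bin
    cdf = np.concatenate([[0.0], np.cumsum(w / w.sum())])             # (S-1,) increasing CDF
    u = rng.random(num_fine)                                          # uniform samples in [0, 1)
    return np.sort(np.interp(u, cdf, edges))                          # invert the CDF
```

In a full pipeline, the densities and colors passed to composite would come from the MLP evaluated at encoded sample positions (the original paper uses 10 frequencies for position and 4 for direction), and the depths returned by sample_fine would be merged with the coarse depths before querying the fine network.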

Training Process:
- Input Requirements: A set of photographs with known camera poses (obtained via structure-from-motion tools like COLMAP), typically 20–100 images for a single scene
- Photometric Loss: Minimize the mean squared error between rendered pixel colors and ground truth pixel colors across all training views
- Per-Scene Optimization: Each scene requires training a separate MLP from scratch, typically taking 1–2 days on a single GPU for the original NeRF formulation
- Regularization: Total variation, sparsity priors on density, and depth supervision (when available) improve geometry quality and reduce floater artifacts
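
As a rough illustration of the photometric objective described above, the loss for a batch of rays is just the mean squared error between rendered and ground-truth colors. The function names here are chosen for this sketch, and the PSNR helper is an addition: it is the standard way to report the same error in decibels, assuming colors are scaled to [0, 1].

```python
import numpy as np

def photometric_loss(rendered, target):
    """Mean squared error between rendered and ground-truth pixel colors, both (N, 3)."""
    return np.mean((rendered - target) ** 2)

def psnr(mse):
    """Peak signal-to-noise ratio in dB for colors in [0, 1]."""
    return -10.0 * np.log10(mse)

# Hypothetical batch: 1024 rays whose rendered color is off by 0.1 in every channel.
target = np.full((1024, 3), 0.5)
rendered = target + 0.1
loss = photometric_loss(rendered, target)
```

When hierarchical sampling is used, the original formulation sums this loss over the coarse and the fine renderings of each ray so that both networks are trained.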

Major Extensions and Variants:
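- Instant-NGP: Replaces the sinusoidal positional encoding and large MLP with a trainable multi-resolution hash grid of features feeding a much smaller MLP, reducing training time from hours or days to seconds or minutes while maintaining comparable quality
- Mip-NeRF: Casts a cone through each pixel and encodes the conical frustum around each sample (integrated positional encoding) rather than a single point on a ray, greatly reducing aliasing artifacts across scales
- 3D Gaussian Splatting: An explicit, rasterization-based alternative to NeRF's volumetric MLP that represents the scene as millions of anisotropic 3D Gaussians, enabling real-time rendering (often reported at 100+ FPS) with quality competitive with the strongest NeRF variants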
- TensoRF: Decomposes the radiance field into low-rank tensor components, achieving compact representations with fast training
- Zip-NeRF: Combines mip-NeRF 360's anti-aliasing with Instant-NGP's hash grid for state-of-the-art unbounded scene reconstruction
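
To make the multi-resolution hash encoding used by Instant-NGP more concrete, here is a simplified NumPy sketch. It assumes points normalized to [0, 1)^3 and uses the spatial hash from the Instant-NGP paper (XOR of each integer coordinate multiplied by a per-axis prime, modulo the table size). The function names, table sizes, and resolutions are illustrative choices for this example; the actual system is implemented in CUDA, trains the table entries by gradient descent, and feeds the concatenated features to a small MLP.

```python
import numpy as np

# Per-axis primes for the spatial hash (as given in the Instant-NGP paper).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_corners(corners, table_size):
    """Hash integer grid coordinates (..., 3) to indices in [0, table_size)."""
    c = corners.astype(np.uint64)
    h = (c[..., 0] * PRIMES[0]) ^ (c[..., 1] * PRIMES[1]) ^ (c[..., 2] * PRIMES[2])
    return (h % np.uint64(table_size)).astype(np.int64)

def encode_level(points, table, resolution):
    """Trilinearly interpolate hashed features at one grid resolution.

    points: (N, 3) in [0, 1), table: (T, F) feature table for this level.
    """
    scaled = points * resolution
    base = np.floor(scaled).astype(np.int64)      # lower corner of the enclosing cell
    frac = scaled - base                          # position inside the cell, in [0, 1)
    out = np.zeros((points.shape[0], table.shape[1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                offset = np.array([dx, dy, dz])
                idx = hash_corners(base + offset, table.shape[0])
                # Trilinear weight of this corner: product of per-axis linear weights.
                w = np.prod(np.where(offset == 1, frac, 1.0 - frac), axis=-1)
                out += w[:, None] * table[idx]
    return out

def hash_encode(points, tables, resolutions):
    """Concatenate interpolated features from every level: (N, levels * F)."""
    levels = [encode_level(points, t, r) for t, r in zip(tables, resolutions)]
    return np.concatenate(levels, axis=-1)
```

Hash collisions are not resolved explicitly in this scheme; the design relies on the downstream MLP and the multiple resolutions to disambiguate them.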

Dynamic and Generative Extensions:
- D-NeRF / Nerfies: Extend NeRF to dynamic scenes by learning a deformation field that warps points from observation time to a canonical frame
- PixelNeRF / MVSNeRF: Condition the radiance field on image features, enabling generalization to new scenes without per-scene training
- DreamFusion: Use a pretrained 2D diffusion model as a prior (Score Distillation Sampling) to generate 3D objects from text descriptions
- Block-NeRF: Scale neural radiance fields to city-scale environments by decomposing into independently trained blocks with learned appearance harmonization
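
The deformation-field idea behind D-NeRF and Nerfies can be illustrated with a toy example: a point observed at time t is first warped into the canonical frame, and only then is the static field queried. Everything below is hypothetical and analytic, chosen so the behavior is easy to check. In the actual methods the deformation is a learned network (Nerfies additionally conditions it on a per-frame latent code), and the canonical scene is a NeRF rather than a hard-coded ball.

```python
import numpy as np

def canonical_density(x):
    """Toy canonical scene: a solid ball of radius 0.5 centered at the origin."""
    return np.where(np.linalg.norm(x, axis=-1) < 0.5, 10.0, 0.0)

def toy_deformation(x, t, velocity=np.array([1.0, 0.0, 0.0])):
    """Stand-in for a learned deformation network.

    Returns the offset that maps a point observed at time t back to the
    canonical frame, here for a scene translating at constant velocity.
    """
    return -t * velocity * np.ones_like(x)

def dynamic_density(x, t):
    """Density at observation time t: warp to the canonical frame, then query."""
    return canonical_density(x + toy_deformation(x, t))
```

With this toy motion, the ball sits at the origin at t = 0 and is centered at (1, 0, 0) at t = 1, even though the canonical field itself never changes.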

Applications:
- Virtual Reality and Telepresence: Capture real environments as NeRFs for immersive free-viewpoint exploration
- E-Commerce: Create photorealistic 3D product visualizations from a few smartphone photos
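- Film and Visual Effects: Generate novel camera angles of captured scenes without physical reshooting; relighting is also possible with variants that separate materials from illumination, since the basic formulation bakes lighting into the radiance field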
- Autonomous Driving: Reconstruct and simulate realistic driving scenarios for testing self-driving systems
- Cultural Heritage: Digitally preserve archaeological sites and artifacts with photorealistic detail

NeRF and its successors have shifted much of 3D computer vision from explicit geometric reconstruction toward learned scene representations. They deliver highly photorealistic novel view synthesis and have inspired a new generation of real-time rendering techniques that bridge the gap between captured reality and interactive 3D content.
