Neural Radiance Fields (NeRF)

Neural Radiance Fields (NeRF) is a 3D scene representation that encodes a continuous volumetric scene as a neural network mapping a 3D position and viewing direction to color and density, enabling photorealistic novel view synthesis from a sparse set of input photographs through differentiable volume rendering.

NeRF Representation:
- Implicit Function: F(x,y,z,θ,φ) → (r,g,b,σ) maps spatial position (x,y,z) and viewing direction (θ,φ) to color (RGB) and volume density (σ); the neural network (typically an 8-layer MLP with 256 hidden units per layer) represents the entire scene as a continuous function
- View-Dependent Color: color depends on viewing direction to model specular reflections and view-dependent appearance; density depends only on position (geometry is view-independent); this separation is architecturally enforced by feeding direction only to later MLP layers
- Positional Encoding: raw coordinates are transformed via sinusoidal functions γ(x) = [sin(2⁰πx), cos(2⁰πx), ..., sin(2^(L-1)πx), cos(2^(L-1)πx)] with L=10 for position and L=4 for direction; without positional encoding, the MLP cannot learn high-frequency geometric and appearance details (a minimal sketch of γ follows this list)
- Scene Bounds: NeRF assumes a bounded scene; ray sampling is distributed within the scene bounds; unbounded scenes require specialized parameterization (mip-NeRF 360) that contracts distant regions into a bounded volume
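
As a concrete illustration, here is a minimal NumPy sketch of the encoding γ; the function name and array shapes are illustrative rather than taken from the original NeRF codebase.

    import numpy as np

    def positional_encoding(x, num_freqs):
        """NeRF frequency encoding γ: for each coordinate, concatenates
        sin(2^k·πx) and cos(2^k·πx) for k = 0..L-1 (L = num_freqs).
        Positions use L=10, directions L=4."""
        freqs = (2.0 ** np.arange(num_freqs)) * np.pi      # 2^k · π for k = 0..L-1
        angles = x[..., None] * freqs                      # (..., D, L)
        enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
        return enc.reshape(*x.shape[:-1], -1)              # (..., 2·L·D)

    # A 3D position with L=10 becomes a 60-dimensional feature vector.
    features = positional_encoding(np.array([0.1, 0.5, -0.3]), num_freqs=10)
    assert features.shape == (60,)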

Volume Rendering:
- Ray Marching: for each pixel, cast a ray from the camera through the image plane; sample points along the ray within the scene bounds (the original paper uses 64 coarse samples plus 128 additional fine samples); evaluate the MLP at each sample point to obtain (color, density)
- Alpha Compositing: pixel color C(r) = Σ_i T_i·α_i·c_i where α_i = 1-exp(-σ_i·δ_i), T_i = Π_{j<i}(1-α_j); T_i is transmittance (probability ray hasn't been absorbed yet); δ_i is distance between adjacent samples
- Hierarchical Sampling: coarse network samples uniformly; fine network samples more densely around surfaces detected by the coarse network (high-density regions); improves detail without wasting samples in empty space (see the sketch after this list)
- Differentiable Rendering: the entire rendering pipeline from ray → neural network → pixel color is differentiable; reconstruction loss |C_rendered - C_target|² provides supervision from 2D photographs
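
A minimal NumPy sketch of the compositing and hierarchical-sampling steps for a single ray, assuming per-sample densities, colors, and spacings are already available; the function names and random generator are illustrative, not from the original implementation.

    import numpy as np

    def composite(rgb, sigma, deltas):
        """Discrete volume rendering: C(r) = Σ_i T_i·α_i·c_i.
        rgb: (N, 3) sample colors, sigma: (N,) densities, deltas: (N,)
        distances between adjacent samples, ordered near to far."""
        alpha = 1.0 - np.exp(-sigma * deltas)                 # α_i = 1 - exp(-σ_i·δ_i)
        # T_i = Π_{j<i}(1 - α_j): probability the ray survives to sample i.
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
        weights = trans * alpha
        return weights @ rgb, weights                         # pixel color, per-sample weights

    def sample_fine(bin_edges, weights, n_fine, rng=np.random.default_rng(0)):
        """Hierarchical sampling: inverse-transform sample the piecewise-constant
        PDF defined by the coarse weights, concentrating samples near surfaces.
        bin_edges: (N+1,) depths bounding the coarse bins, weights: (N,)."""
        pdf = weights / (weights.sum() + 1e-8)
        cdf = np.concatenate([[0.0], np.cumsum(pdf)])
        u = rng.uniform(size=n_fine)                          # uniform samples in [0, 1)
        return np.interp(u, cdf, bin_edges)                   # map through the inverse CDF

Because every step here is differentiable with respect to sigma and rgb, the squared reconstruction error against the target pixel back-propagates directly into the MLP.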

Speed Improvements:
- Instant-NGP (Hash Encoding): replaces the large MLP and its frequency positional encoding with a multi-resolution hash grid; spatial positions index learned feature tables at multiple resolutions, and a tiny MLP decodes the interpolated features; training time drops from hours to seconds, and inference enables real-time rendering (~60 fps); see the sketch after this list
- TensoRF: decomposes the radiance field into low-rank tensor components (vector-matrix factorization); 100× faster than original NeRF with comparable quality; memory-efficient representation enabling larger scenes
- Plenoxels: replaces the MLP entirely with a sparse voxel grid storing spherical harmonic coefficients; training via direct optimization of voxel values; removes neural network overhead entirely but requires more memory
- Baked NeRF: pre-computes the neural field into a discrete representation (sparse voxel grid, textured mesh) for real-time rendering; trades storage for interactive rendering speed
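
A simplified PyTorch sketch of the multi-resolution hash encoding; the spatial-hash primes are the ones used in the Instant-NGP paper, but the class name, default hyperparameters, and the pure-Python trilinear interpolation loop are illustrative stand-ins for the fused CUDA kernels of the real implementation.

    import torch
    import torch.nn as nn

    class HashGridEncoding(nn.Module):
        def __init__(self, n_levels=16, n_features=2, log2_table_size=19,
                     base_res=16, max_res=512):
            super().__init__()
            self.table_size = 2 ** log2_table_size
            # Geometric progression of grid resolutions from base_res to max_res.
            growth = (max_res / base_res) ** (1.0 / (n_levels - 1))
            self.resolutions = [int(base_res * growth ** l) for l in range(n_levels)]
            # One learned feature table per level, initialized small.
            self.tables = nn.Parameter(
                torch.empty(n_levels, self.table_size, n_features).uniform_(-1e-4, 1e-4))

        def hash(self, idx):
            """Spatial hash of integer voxel corners (primes from Instant-NGP)."""
            h = idx[..., 0] ^ (idx[..., 1] * 2654435761) ^ (idx[..., 2] * 805459861)
            return h % self.table_size

        def forward(self, x):                      # x: (N, 3) points in [0, 1]^3
            feats = []
            for level, res in enumerate(self.resolutions):
                xs = x * res
                lo = torch.floor(xs).long()        # lower corner of enclosing voxel
                frac = xs - lo                     # fractional position inside voxel
                out = 0.0
                # Trilinear interpolation over the 8 hashed voxel corners.
                for dz in (0, 1):
                    for dy in (0, 1):
                        for dx in (0, 1):
                            corner = lo + torch.tensor([dx, dy, dz])
                            w = ((frac[..., 0] if dx else 1 - frac[..., 0]) *
                                 (frac[..., 1] if dy else 1 - frac[..., 1]) *
                                 (frac[..., 2] if dz else 1 - frac[..., 2]))
                            out = out + w[..., None] * self.tables[level][self.hash(corner)]
                feats.append(out)
            return torch.cat(feats, dim=-1)        # (N, n_levels * n_features)

Hash collisions are left unresolved; during training, gradients from visible surfaces dominate the shared table entries, which is how the paper keeps the tables small without an explicit collision scheme.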

3D Gaussian Splatting:
- Explicit 3D Representation: represents the scene as millions of 3D Gaussian primitives, each with position, covariance (shape/orientation), opacity, and spherical harmonic color coefficients; fundamentally different from NeRF's implicit MLP representation
- Rasterization Pipeline: projects 3D Gaussians to 2D screen space and alpha-blends them in depth order; differentiable rasterization enables end-to-end training from photographs; achieves real-time rendering (>100 fps at 1080p) through GPU-optimized splatting (see the sketch after this list)
- Adaptive Density: Gaussian density is adjusted during training where scene complexity demands it: small Gaussians in under-reconstructed regions are cloned, overly large Gaussians are split, and near-transparent Gaussians are pruned; optimization starts from an SfM point cloud and densifies to capture fine details
- Quality vs Speed: matches or exceeds NeRF quality for novel view synthesis with 100-1000× faster rendering; enables VR/AR applications, game engine integration, and real-time scene exploration
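
A minimal NumPy sketch of the two core per-pixel steps, covariance projection and front-to-back blending, assuming depth sorting, culling, and tile binning have already happened; the function names are illustrative, and the real pipeline fuses these steps into CUDA rasterization kernels.

    import numpy as np

    def project_covariance(cov3d, W, J):
        """EWA splatting projection: world-space 3×3 covariance → 2×2
        screen-space covariance, via the camera rotation W and the Jacobian J
        of the perspective projection (both assumed given)."""
        full = J @ W @ cov3d @ W.T @ J.T
        return full[:2, :2]

    def shade_pixel(pix, means2d, covs2d, opacities, colors):
        """Front-to-back alpha blending of Gaussians at one pixel.
        Inputs are assumed already sorted by increasing depth."""
        color = np.zeros(3)
        transmittance = 1.0
        for mu, cov, opa, rgb in zip(means2d, covs2d, opacities, colors):
            d = pix - mu
            # Gaussian falloff: α = opacity · exp(-0.5·dᵀΣ⁻¹d), clamped below 1.
            alpha = min(opa * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d), 0.999)
            color += transmittance * alpha * rgb
            transmittance *= 1.0 - alpha
            if transmittance < 1e-4:               # early exit once the pixel saturates
                break
        return color

Blending front to back with an early exit is what lets a rasterizer replace per-ray marching: each pixel touches only the few Gaussians that actually cover it.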

NeRF and 3D Gaussian Splatting have driven a revolution in neural 3D reconstruction, transforming sparse photographs into photorealistic, explorable 3D scenes and enabling applications from virtual reality to autonomous driving simulation to digital heritage preservation.
