MobileNet is a family of efficient convolutional neural networks designed for mobile and edge deployment — using depthwise separable convolutions and width multipliers to dramatically reduce parameters and computation while maintaining competitive accuracy for vision tasks.
What Is MobileNet?
- Definition: Lightweight CNN architecture for efficient inference.
- Key Innovation: Depthwise separable convolutions.
- Goal: Deploy vision models on mobile/edge devices.
- Versions: MobileNetV1, V2, V3 (progressive improvements).
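Why MobileNet?
- Size: MobileNetV1 has about 4.2M parameters, roughly 33× fewer than VGG-16 (138M) and 6× fewer than ResNet-50 (25M).
- Speed: Real-time inference on mobile CPUs.
- Accuracy: Close to much larger models on ImageNet (MobileNetV1 reaches 70.6% top-1 versus 71.5% for VGG-16).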
- Flexibility: Width/resolution multipliers for tuning.
Depthwise Separable Convolutions
Standard Convolution:
  Input:  H × W × C_in
  Kernel: K × K × C_in × C_out
  Output: H × W × C_out
  Computation (multiply-accumulates, stride 1, same padding): H × W × K² × C_in × C_out
Depthwise Separable (MobileNet):
  Step 1: Depthwise (spatial filtering per channel)
    Input:   H × W × C_in
    Kernels: K × K × 1 (one per channel)
    Output:  H × W × C_in
    Computation: H × W × K² × C_in
  Step 2: Pointwise (1×1 convolution)
    Input:  H × W × C_in
    Kernel: 1 × 1 × C_in × C_out
    Output: H × W × C_out
    Computation: H × W × C_in × C_out
  Total: H × W × (K² + C_out) × C_in
  Savings: the cost ratio to a standard convolution is 1/C_out + 1/K², so the reduction approaches K² when C_out is large (typically 8-9× for K = 3)
Visual:
Standard Conv:
┌──────────┐                           ┌───────────┐
│  Input   │ → K×K×C_in×C_out kernel → │  Output   │
│ H×W×C_in │                           │ H×W×C_out │
└──────────┘                           └───────────┘
Depthwise Separable:
┌──────────┐   K×K×1    ┌──────────┐   1×1    ┌───────────┐
│  Input   │ → per ch → │ H×W×C_in │ → conv → │ H×W×C_out │
│ H×W×C_in │            └──────────┘          └───────────┘
└──────────┘  Depthwise             Pointwise
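The cost formulas above can be checked numerically. The following is a minimal sketch assuming stride 1 and same padding; the function names and the example layer size are made up for illustration and do not come from any library.

```python
def standard_conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulates for a standard K×K convolution (stride 1, same padding)."""
    return h * w * k * k * c_in * c_out


def depthwise_separable_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulates for a K×K depthwise conv followed by a 1×1 pointwise conv."""
    depthwise = h * w * k * k * c_in   # one K×K filter per input channel
    pointwise = h * w * c_in * c_out   # 1×1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 56×56 feature map, 128 -> 256 channels, 3×3 kernel
std = standard_conv_macs(56, 56, 128, 256)
sep = depthwise_separable_macs(56, 56, 128, 256)
print(f"standard:  {std:,} MACs")
print(f"separable: {sep:,} MACs")
print(f"reduction: {std / sep:.2f}x")
```

For this layer the reduction is K² × C_out / (K² + C_out) = 2304 / 265, or about 8.7×, which sits inside the 8-9× range quoted for 3×3 kernels.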
MobileNet Versions
V1 (2017):
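import torch.nn as nn

# Core block: 3×3 depthwise conv followed by 1×1 pointwise conv
class MobileNetV1Block(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch applies one 3×3 filter per input channel (depthwise)
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # 1×1 conv mixes channels (pointwise); bias is omitted because BatchNorm follows
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        x = self.relu(self.bn2(self.pointwise(x)))
        return x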
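The block above depends on `groups=in_ch` to turn an ordinary `nn.Conv2d` into a depthwise convolution. The sketch below, which assumes PyTorch is installed and uses arbitrary example sizes, compares the weight counts and output shapes of a standard 3×3 convolution and its depthwise separable replacement. The helper `count_params` is defined here for illustration only.

```python
import torch
import torch.nn as nn

c_in, c_out = 32, 64  # example channel counts, chosen arbitrarily

standard = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False)
# groups=c_in gives each input channel its own 3×3 filter
depthwise = nn.Conv2d(c_in, c_in, kernel_size=3, padding=1, groups=c_in, bias=False)
pointwise = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)


def count_params(*modules):
    """Total number of learnable values across the given modules."""
    return sum(p.numel() for m in modules for p in m.parameters())


x = torch.randn(1, c_in, 56, 56)
y_standard = standard(x)
y_separable = pointwise(depthwise(x))

print("output shapes:", tuple(y_standard.shape), tuple(y_separable.shape))
print("standard weights: ", count_params(standard))
print("separable weights:", count_params(depthwise, pointwise))
```

Both paths produce a tensor of the same shape, while the separable pair stores K² × C_in + C_in × C_out weights instead of K² × C_in × C_out.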
V2 (2018) - Inverted Residuals:
import torch.nn as nn

# Inverted residual with linear bottleneck
class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride, expand_ratio):
        super().__init__()
        hidden_dim = int(round(in_ch * expand_ratio))
        # The skip connection is only valid when input and output shapes match
        self.use_residual = stride == 1 and in_ch == out_ch
        layers = []
        if expand_ratio != 1:
            # Expansion: 1×1 conv widens the narrow input
            layers.append(nn.Conv2d(in_ch, hidden_dim, 1, bias=False))
            layers.append(nn.BatchNorm2d(hidden_dim))
            layers.append(nn.ReLU6(inplace=True))
        # Depthwise: 3×3 spatial filtering at the expanded width
        layers.append(nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False))
        layers.append(nn.BatchNorm2d(hidden_dim))
        layers.append(nn.ReLU6(inplace=True))
        # Projection (linear, no activation)
        layers.append(nn.Conv2d(hidden_dim, out_ch, 1, bias=False))
        layers.append(nn.BatchNorm2d(out_ch))
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_residual:
            return x + self.conv(x)
        return self.conv(x)
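Counting weights per stage may help show where the cost of an inverted residual block goes. This is a rough sketch that ignores BatchNorm parameters and biases; the function name and the 24-channel, expand-by-6 configuration are illustrative assumptions.

```python
def inverted_residual_weights(in_ch, out_ch, expand_ratio, kernel=3):
    """Count convolution weights per stage of an inverted residual block."""
    hidden = in_ch * expand_ratio
    counts = {
        # 1×1 expansion is skipped when expand_ratio == 1
        "expand": in_ch * hidden if expand_ratio != 1 else 0,
        # one K×K filter per expanded channel
        "depthwise": kernel * kernel * hidden,
        # 1×1 linear projection back to the narrow width
        "project": hidden * out_ch,
    }
    counts["total"] = sum(counts.values())
    return counts


counts = inverted_residual_weights(in_ch=24, out_ch=24, expand_ratio=6)
# For comparison: a standard 3×3 conv working directly at the expanded width
wide_standard = 3 * 3 * (24 * 6) * (24 * 6)

print(counts)
print(f"standard 3x3 conv at 144 channels: {wide_standard:,} weights")
```

The spatial filtering runs at 144 channels, yet the block needs far fewer weights than a standard convolution at that width, and the residual connection only has to carry the narrow 24-channel tensor.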
V3 (2019) - Neural Architecture Search:
Improvements:
- NAS-discovered architecture
- Hard-swish activation
- Squeeze-and-excite attention
- Modified last layers
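Hard-swish is commonly defined as x × ReLU6(x + 3) / 6, a piecewise approximation of swish (x × sigmoid(x)) that avoids computing an exponential. The scalar sketch below uses only the standard library so the two can be compared directly; in PyTorch the same activation is available as `nn.Hardswish`.

```python
import math


def relu6(x):
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_sigmoid(x):
    """Piecewise-linear stand-in for the sigmoid function."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    """x * hard_sigmoid(x); no exponential required."""
    return x * hard_sigmoid(x)


def swish(x):
    """Reference swish: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:+.1f}  hard_swish={hard_swish(x):+.4f}  swish={swish(x):+.4f}")
```

Outside the interval [-3, 3] hard-swish is exactly 0 or exactly x, so only the middle region differs noticeably from swish.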
Width and Resolution Multipliers
Scaling Options:
  Width multiplier (α): Scale channels
    Channels = base_channels × α
    α ∈ {0.25, 0.5, 0.75, 1.0}
  Resolution multiplier (ρ): Scale input size
    Input = 224 × ρ
    ρ ≈ {0.57, 0.71, 0.86, 1.0} → {128, 160, 192, 224}
  Trade-off: a smaller α or ρ gives faster inference but lower accuracy; computation falls roughly with α² and with ρ²
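The effect of α and ρ on a single depthwise separable layer can be estimated with a few lines of arithmetic. This is a simplified sketch: the function name and the example layer are assumptions, and it uses plain rounding, whereas real implementations typically round channel counts to hardware-friendly multiples.

```python
def separable_layer_macs(h, w, c_in, c_out, k=3, alpha=1.0, rho=1.0):
    """MACs of one depthwise separable layer after applying both multipliers."""
    h_s, w_s = round(h * rho), round(w * rho)                    # scaled feature map
    c_in_s, c_out_s = round(c_in * alpha), round(c_out * alpha)  # scaled channels
    depthwise = h_s * w_s * k * k * c_in_s
    pointwise = h_s * w_s * c_in_s * c_out_s
    return depthwise + pointwise


# Hypothetical layer: 112×112 feature map at 224×224 input, 64 -> 128 channels
base = separable_layer_macs(112, 112, 64, 128)

for alpha in (1.0, 0.75, 0.5, 0.25):
    for resolution in (224, 192, 160, 128):
        rho = resolution / 224
        macs = separable_layer_macs(112, 112, 64, 128, alpha=alpha, rho=rho)
        print(f"alpha={alpha:<4} input={resolution}  {macs / base:6.1%} of baseline")
```

Halving ρ cuts the cost to exactly one quarter, while halving α cuts it to slightly more than one quarter because the depthwise term scales with α rather than α².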
Using MobileNet
PyTorch:
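import torch.nn as nn
from torchvision.models import mobilenet_v3_small, mobilenet_v3_large
from torchvision.models import MobileNet_V3_Small_Weights, MobileNet_V3_Large_Weights
# Small version (torchvision >= 0.13: `weights=` replaces the deprecated `pretrained=True`)
model = mobilenet_v3_small(weights=MobileNet_V3_Small_Weights.DEFAULT)
# Large version
model = mobilenet_v3_large(weights=MobileNet_V3_Large_Weights.DEFAULT)
# Modify for custom classes: read the head's input width instead of hard-coding it
# (1024 for the small variant, 1280 for the large one)
num_classes = 10  # example value; set to the number of classes in your dataset
in_features = model.classifier[-1].in_features
model.classifier[-1] = nn.Linear(in_features, num_classes)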
TensorFlow/Keras:
from tensorflow.keras.applications import MobileNetV3Small
model = MobileNetV3Small(
    input_shape=(224, 224, 3),
    include_top=True,
    weights="imagenet",
)
Performance Comparison
Model | Params | MACs | Top-1 Acc
-----------------|---------|--------|----------
VGG-16 | 138M | 15.5G | 71.5%
ResNet-50 | 25M | 4.1G | 76.1%
MobileNetV1 1.0 | 4.2M | 569M | 70.6%
MobileNetV2 1.0 | 3.4M | 300M | 72.0%
MobileNetV3-Large| 5.4M | 219M | 75.2%
MobileNetV3-Small| 2.5M | 66M | 67.4%
MobileNet is the foundational efficient architecture for mobile AI — its depthwise separable convolution innovation enabled practical on-device computer vision and inspired subsequent efficient architectures like EfficientNet and MobileViT.