MobileNet | ChipFoundryServices

MobileNet is a family of efficient convolutional neural networks designed for mobile and edge deployment — using depthwise separable convolutions and width multipliers to dramatically reduce parameters and computation while maintaining competitive accuracy for vision tasks.

What Is MobileNet?

Definition: Lightweight CNN architecture for efficient inference.
Key Innovation: Depthwise separable convolutions.
Goal: Deploy vision models on mobile/edge devices.
Versions: MobileNetV1, V2, V3 (progressive improvements).

Why MobileNet

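Size: MobileNetV1 has roughly 6× fewer parameters than ResNet-50 and over 30× fewer than VGG-16 (see Performance Comparison).
Speed: Real-time inference on mobile CPUs.
Accuracy: MobileNetV3-Large reaches 75.2% ImageNet top-1, within about 1 point of ResNet-50 (76.1%).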
Flexibility: Width/resolution multipliers for tuning.

Depthwise Separable Convolutions

Standard Convolution:

Input: H × W × C_in
Kernel: K × K × C_in × C_out
Output: H × W × C_out

Computation: H × W × K² × C_in × C_out

Depthwise Separable (MobileNet):

Step 1: Depthwise (spatial filtering per channel)
  Input: H × W × C_in
  Kernels: K × K × 1 (one per channel)
  Output: H × W × C_in
  Computation: H × W × K² × C_in

Step 2: Pointwise (1×1 convolution)
  Input: H × W × C_in
  Kernel: 1 × 1 × C_in × C_out
  Output: H × W × C_out
  Computation: H × W × C_in × C_out

Total: H × W × (K² + C_out) × C_in
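Savings: cost ratio vs. standard = 1/C_out + 1/K², so close to K²× fewer operations when C_out is large (typically 8-9× for K = 3)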
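The two cost formulas above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the layer dimensions (a 56×56 feature map with 128 input and output channels) are assumed values chosen for the example, not taken from any particular MobileNet layer, and the function names are hypothetical.

```python
def standard_conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates for a standard K x K convolution (output H x W x C_out)."""
    return h * w * k * k * c_in * c_out


def depthwise_separable_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates for depthwise (K x K per channel) plus pointwise (1 x 1)."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


# Assumed example layer: 56x56 feature map, 3x3 kernel, 128 -> 128 channels
h, w, k, c_in, c_out = 56, 56, 3, 128, 128

std = standard_conv_macs(h, w, k, c_in, c_out)
sep = depthwise_separable_macs(h, w, k, c_in, c_out)
reduction = std / sep

print(f"standard:  {std:,} MACs")
print(f"separable: {sep:,} MACs")
print(f"reduction: {reduction:.2f}x")
```

For these assumed dimensions the reduction works out to K² × C_out / (K² + C_out) = 1152/137, or about 8.4×, which falls inside the 8-9× range quoted above.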

Visual:

Standard Conv:
┌──────────┐                    ┌───────────┐
│  Input   │ → K×K×C_in×C_out → │  Output   │
│ H×W×C_in │                    │ H×W×C_out │
└──────────┘                    └───────────┘

Depthwise Separable:
┌──────────┐   K×K×1    ┌──────────┐   1×1    ┌───────────┐
│  Input   │ → per ch → │ H×W×C_in │ → conv → │ H×W×C_out │
│ H×W×C_in │            └──────────┘          └───────────┘
└──────────┘             Depthwise              Pointwise

MobileNet Versions

V1 (2017):

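import torch.nn as nn

# Core block: 3×3 depthwise conv followed by 1×1 pointwise conv
class MobileNetV1Block(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch applies one 3×3 filter per input channel (depthwise)
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # 1×1 conv mixes channels (pointwise); bias is redundant before BatchNorm
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        x = self.relu(self.bn2(self.pointwise(x)))
        return x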

V2 (2018) - Inverted Residuals:

import torch.nn as nn

# Inverted residual with linear bottleneck
class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride, expand_ratio):
        super().__init__()
        hidden_dim = in_ch * expand_ratio
        # The skip connection is only valid when input and output shapes match
        self.use_residual = stride == 1 and in_ch == out_ch

        layers = []
        if expand_ratio != 1:
            # Expansion: 1×1 conv widens the narrow bottleneck to hidden_dim channels
            layers.append(nn.Conv2d(in_ch, hidden_dim, 1, bias=False))
            layers.append(nn.BatchNorm2d(hidden_dim))
            layers.append(nn.ReLU6(inplace=True))

        # Depthwise: 3×3 spatial filtering, one filter per channel
        layers.append(nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False))
        layers.append(nn.BatchNorm2d(hidden_dim))
        layers.append(nn.ReLU6(inplace=True))

        # Projection (linear, no activation) back to a narrow output
        layers.append(nn.Conv2d(hidden_dim, out_ch, 1, bias=False))
        layers.append(nn.BatchNorm2d(out_ch))

        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_residual:
            return x + self.conv(x)
        return self.conv(x)
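To see why expanding to a wide hidden layer stays affordable, it can help to count the convolution weights in one inverted residual block. The sketch below is a rough illustration: it ignores BatchNorm parameters and biases, the function name is hypothetical, and the block configuration (24 channels in and out, expansion ratio 6) is an assumed example.

```python
def inverted_residual_weights(in_ch, out_ch, expand_ratio, k=3):
    """Count conv weights per stage (BatchNorm and bias terms are ignored)."""
    hidden = in_ch * expand_ratio
    expand = in_ch * hidden if expand_ratio != 1 else 0  # 1x1 expansion
    depthwise = k * k * hidden                           # one k x k filter per channel
    project = hidden * out_ch                            # 1x1 linear projection
    return {
        "expand": expand,
        "depthwise": depthwise,
        "project": project,
        "total": expand + depthwise + project,
    }


# Assumed example block: 24 -> 24 channels, expansion ratio 6 (hidden width 144)
counts = inverted_residual_weights(24, 24, 6)

# For comparison: a standard 3x3 convolution operating at the hidden width
dense_3x3_at_hidden_width = 3 * 3 * 144 * 144

print(counts)
print("dense 3x3 at hidden width:", dense_3x3_at_hidden_width)
```

In this example the whole block uses 8,208 convolution weights, while a single standard 3×3 convolution at the same hidden width of 144 channels would need 186,624. The depthwise stage is the cheapest of the three, which is what lets the block do its spatial filtering in the wide representation.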

V3 (2019) - Neural Architecture Search:

Improvements:
- NAS-discovered architecture
- Hard-swish activation
- Squeeze-and-excite attention
- Modified last layers
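Hard-swish is defined as x · ReLU6(x + 3) / 6, a piecewise-linear approximation of swish that avoids computing a sigmoid. The following is a minimal scalar sketch in plain Python, meant only to show the shape of the function; in PyTorch the equivalent layer is `nn.Hardswish`.

```python
def relu6(x):
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_sigmoid(x):
    """Piecewise-linear gate: 0 for x <= -3, 1 for x >= 3, linear in between."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    """x scaled by the hard-sigmoid gate."""
    return x * hard_sigmoid(x)


for value in (-4.0, -3.0, 0.0, 1.0, 3.0, 5.0):
    print(f"hard_swish({value:+.1f}) = {hard_swish(value):+.4f}")
```

The function is zero for inputs at or below -3, equal to the identity for inputs at or above 3, and a smooth-looking quadratic curve in between.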

Width and Resolution Multipliers

Scaling Options:

Width multiplier (α): Scale channels
  Channels = base_channels × α
  α ∈ {0.25, 0.5, 0.75, 1.0}

Resolution multiplier (ρ): Scale input size
  Input = 224 × ρ
  ρ ∈ {0.57, 0.71, 0.86, 1.0} → {128, 160, 192, 224}

Trade-off: Smaller = faster but less accurate
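As a rough illustration of the trade-off, the cost of a depthwise separable layer scales approximately with α² and ρ². The sketch below applies the cost formula from the earlier section to one hypothetical layer (64 → 128 channels); for simplicity it assumes the feature map is the same size as the input image, and truncates scaled channel counts with `int`, whereas real implementations typically round to a hardware-friendly multiple.

```python
BASE_RESOLUTION = 224


def separable_layer_macs(resolution, c_in, c_out, k=3, alpha=1.0):
    """MACs for one depthwise separable layer with width multiplier alpha."""
    c_in_scaled = int(c_in * alpha)
    c_out_scaled = int(c_out * alpha)
    return resolution * resolution * c_in_scaled * (k * k + c_out_scaled)


def relative_cost(alpha, resolution, c_in=64, c_out=128):
    """Cost relative to the unscaled layer (alpha = 1.0, 224 x 224)."""
    base = separable_layer_macs(BASE_RESOLUTION, c_in, c_out)
    scaled = separable_layer_macs(resolution, c_in, c_out, alpha=alpha)
    return scaled / base


for alpha in (0.25, 0.5, 0.75, 1.0):
    for resolution in (128, 160, 192, 224):
        rho = resolution / BASE_RESOLUTION
        print(f"alpha={alpha:<4} rho={rho:.2f} ({resolution}px) "
              f"relative cost={relative_cost(alpha, resolution):.3f}")
```

The resolution scaling is exactly ρ² here, while the width scaling is slightly above α² because the depthwise term shrinks only linearly with α.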

Using MobileNet

PyTorch:

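import torch.nn as nn
from torchvision.models import (
    mobilenet_v3_small, mobilenet_v3_large,
    MobileNet_V3_Small_Weights, MobileNet_V3_Large_Weights,
)

# Small version (torchvision >= 0.13 takes `weights=`; `pretrained=True` is deprecated)
model = mobilenet_v3_small(weights=MobileNet_V3_Small_Weights.DEFAULT)

# Large version
model = mobilenet_v3_large(weights=MobileNet_V3_Large_Weights.DEFAULT)

# Modify for custom classes: read the input width from the existing layer
# (1024 for the small model, 1280 for the large one) instead of hard-coding it
num_classes = 10  # example value; set to the number of classes in your dataset
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, num_classes)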

TensorFlow/Keras:

from tensorflow.keras.applications import MobileNetV3Small

model = MobileNetV3Small(
    input_shape=(224, 224, 3),
    include_top=True,
    weights="imagenet"
)

Performance Comparison

Model             | Params | MACs  | Top-1 Acc (ImageNet)
------------------|--------|-------|---------------------
VGG-16            | 138M   | 15.5G | 71.5%
ResNet-50         | 25M    | 4.1G  | 76.1%
MobileNetV1 1.0   | 4.2M   | 569M  | 70.6%
MobileNetV2 1.0   | 3.4M   | 300M  | 72.0%
MobileNetV3-Large | 5.4M   | 219M  | 75.2%
MobileNetV3-Small | 2.5M   | 66M   | 67.4%
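The size and compute gaps in the table are easier to compare as ratios. The short sketch below simply divides the VGG-16 and ResNet-50 figures by each MobileNet row; the numbers are copied from the table above (parameters and MACs both in millions), and the variable names are illustrative.

```python
# (params in millions, MACs in millions), copied from the comparison table
baselines = {
    "VGG-16": (138.0, 15500.0),
    "ResNet-50": (25.0, 4100.0),
}
mobilenets = {
    "MobileNetV1 1.0": (4.2, 569.0),
    "MobileNetV2 1.0": (3.4, 300.0),
    "MobileNetV3-Large": (5.4, 219.0),
    "MobileNetV3-Small": (2.5, 66.0),
}

ratios = {}
for base_name, (base_params, base_macs) in baselines.items():
    for name, (params, macs) in mobilenets.items():
        ratios[(base_name, name)] = (base_params / params, base_macs / macs)
        print(f"{base_name:>9} vs {name:<17} "
              f"params: {base_params / params:5.1f}x  MACs: {base_macs / macs:6.1f}x")
```

For example, MobileNetV1 has about 33× fewer parameters than VGG-16 and about 6× fewer than ResNet-50.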

MobileNet is the foundational efficient architecture for mobile AI — its depthwise separable convolution innovation enabled practical on-device computer vision and inspired subsequent efficient architectures like EfficientNet and MobileViT.
