Docker and Kubernetes for ML

Docker and Kubernetes for ML provide containerization and orchestration infrastructure for deploying machine learning models at scale — packaging models with dependencies into portable containers and managing clusters of GPU-enabled nodes for production serving, training jobs, and auto-scaling inference workloads.

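Why Containers for ML?

An ML service depends on exact versions of Python, framework libraries, CUDA, and system packages. A container image pins all of these together with the serving code. The same image therefore behaves the same way on a developer workstation, in CI, and on production GPU nodes, and each service's dependencies stay isolated from other services on the same host. The NVIDIA driver itself remains on the host.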

Docker Basics for ML

Basic Dockerfile:

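FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

# Install Python (skip recommended extras and clear apt lists to keep the image small)
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install dependencies first so this layer stays cached until requirements.txt changes
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Run inference server
EXPOSE 8000
CMD ["python3", "serve.py"]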
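The Dockerfile's CMD expects a serve.py in the build context, but the article does not show that file. Below is a minimal, hypothetical sketch that uses only the standard library. It assumes the server listens on port 8000 and answers GET /health, which matches the port and path the Kubernetes manifests later in this article probe. `load_model` is a placeholder. A real server would typically use a web framework and load an actual model.

```python
"""Hypothetical minimal serve.py: a stdlib-only sketch of an inference server."""
import json
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MODEL_READY = threading.Event()  # set once the placeholder model has "loaded"


def load_model(path: str) -> None:
    """Placeholder for real model loading (assumption: weights live at `path`)."""
    # A real server would deserialize the model here, e.g. with torch.load(path).
    MODEL_READY.set()


class Handler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, body: dict) -> None:
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self) -> None:
        if self.path == "/health":
            # Report 503 until the model is loaded so a readiness probe holds traffic back.
            if MODEL_READY.is_set():
                self._send_json(200, {"status": "ok"})
            else:
                self._send_json(503, {"status": "loading"})
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, format, *args):  # silence default per-request logging
        pass


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    # Bind to all interfaces so the published container port is reachable.
    return ThreadingHTTPServer(("0.0.0.0", port), Handler)


if __name__ == "__main__":
    server = make_server(8000)
    model_path = os.environ.get("MODEL_PATH", "/app/models/model.pt")
    # Load in the background so /health can answer "loading" in the meantime.
    threading.Thread(target=load_model, args=(model_path,), daemon=True).start()
    server.serve_forever()
```

Returning 503 while the model loads means an orchestrator can hold traffic back until the server is actually ready, instead of treating "process started" as "ready to serve".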

Optimized Multi-Stage Build:

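# Build stage: install dependencies into a virtualenv on the same base OS as the runtime
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-venv \
    && rm -rf /var/lib/apt/lists/*
RUN python3 -m venv /opt/venv
ENV PATH=/opt/venv/bin:$PATH
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Runtime stage: only the Python interpreter plus the prebuilt virtualenv
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/venv /opt/venv
ENV PATH=/opt/venv/bin:$PATH
WORKDIR /app
COPY . /app
CMD ["python", "serve.py"]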

GPU in Docker:

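# Install the NVIDIA Container Toolkit (Ubuntu/Debian; the NVIDIA driver must already be on the host)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker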

# Run with GPU access
docker run --gpus all -it my-ml-image

# Specific GPUs (the nested quotes are required because the device list contains a comma)
docker run --gpus '"device=0,1"' -it my-ml-image
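To check which GPUs a container actually sees, you can run a small script inside it. The sketch below is hypothetical and assumes `nvidia-smi` is available inside the container, which is typical for CUDA images started with `--gpus`. It calls `nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader` and parses the comma-separated output. The sample line in the docstring is illustrative.

```python
"""Hypothetical helper: list the GPUs visible inside a container via nvidia-smi."""
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"]


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse lines such as '0, NVIDIA A100-SXM4-40GB, 40960 MiB' into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, memory = (part.strip() for part in line.split(","))
        gpus.append({"index": int(index), "name": name, "memory": memory})
    return gpus


def visible_gpus() -> list[dict]:
    # A missing nvidia-smi usually means the container was started without --gpus.
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"{len(gpus)} GPU(s) visible")
    for gpu in gpus:
        print(gpu)
```

Running this as a container smoke test after `docker run --gpus ...` catches a missing toolkit or a wrong device list before the model server starts.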

Docker Compose for ML:

services:
  inference:
    build: .
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
    environment:
      - MODEL_PATH=/app/models/model.pt
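The compose file passes the model location through the `MODEL_PATH` environment variable and bind-mounts `./models` into the container. The sketch below is a hedged example of how the server might consume that variable. It fails fast with a clear message when the volume is missing. `resolve_model_path` is a hypothetical name, and the default path mirrors the compose file.

```python
"""Hypothetical config helper: resolve MODEL_PATH and fail fast if the file is missing."""
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_MODEL_PATH = "/app/models/model.pt"  # same value the compose file sets


def resolve_model_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the model file path from MODEL_PATH, raising if it does not exist."""
    env = os.environ if env is None else env
    path = Path(env.get("MODEL_PATH", DEFAULT_MODEL_PATH))
    if not path.is_file():
        # A common cause: the ./models bind mount is missing or the file name is wrong.
        raise FileNotFoundError(f"model file not found at {path}; is ./models mounted?")
    return path


if __name__ == "__main__":
    print(f"Loading model from {resolve_model_path()}")
```

Reading the path from the environment keeps the image generic. The same image can serve a different model by changing only the compose file or the Kubernetes manifest.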

Kubernetes for ML

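Why Kubernetes?

Kubernetes schedules containers across a cluster of nodes. It places GPU workloads only on nodes that advertise GPUs through a device plugin, restarts failed containers, and routes traffic only to healthy replicas. It can also scale replica counts with load. The manifests below show each of these pieces for an LLM inference service.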

Deployment Example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      containers:
      - name: inference
        image: my-registry/llm-server:v1
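        resources:
          requests:
            cpu: "2"       # example value; CPU-based HPA utilization is measured against this
            memory: 16Gi   # example value; size it to the model being served
          limits:
            nvidia.com/gpu: 1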
        ports:
        - containerPort: 8000
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
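A pod receives Service traffic only while its readiness probe is passing. This Deployment leaves the probe thresholds at their Kubernetes defaults: periodSeconds 10, successThreshold 1, and failureThreshold 3. A pod therefore starts as not Ready and becomes Ready after one successful check. It is marked not Ready after three consecutive failures, roughly 30 seconds of failed checks. The sketch below models that rule over a sequence of probe results. It is a simplified illustration, not the kubelet's code, and it ignores initialDelaySeconds and probe timeouts.

```python
"""Simplified model of readiness-probe thresholds (not the kubelet's implementation)."""
from dataclasses import dataclass, field


@dataclass
class ReadinessTracker:
    success_threshold: int = 1  # Kubernetes default successThreshold
    failure_threshold: int = 3  # Kubernetes default failureThreshold
    ready: bool = field(default=False, init=False)  # readiness starts as failed
    _successes: int = field(default=0, init=False, repr=False)
    _failures: int = field(default=0, init=False, repr=False)

    def observe(self, probe_ok: bool) -> bool:
        """Record one probe result and return the resulting Ready state."""
        if probe_ok:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.success_threshold:
                self.ready = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.failure_threshold:
                self.ready = False
        return self.ready


if __name__ == "__main__":
    tracker = ReadinessTracker()
    # Model loading (two 503s), healthy, a short blip, then a sustained outage
    results = [False, False, True, True, False, False, True, False, False, False]
    print([tracker.observe(ok) for ok in results])
    # prints [False, False, True, True, True, True, True, True, True, False]
```

The failure threshold lets a replica ride out a brief slowdown without being pulled from rotation. A sustained outage still removes it from the Service's endpoints.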

Service & Load Balancing:

apiVersion: v1
kind: Service
metadata:
  name: llm-service
spec:
  selector:
    app: llm-inference
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer
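The Service finds its backends by label. Every pod whose labels include `app: llm-inference` is a candidate, but only pods that are currently Ready become endpoints. The sketch below models that selection. The `Pod` record, names, and IP addresses are hypothetical. The random choice mirrors kube-proxy's iptables mode, which picks a backend at random per connection.

```python
"""Illustrative model of how a Service selector chooses backend pods."""
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pod:  # hypothetical, heavily simplified pod record
    name: str
    labels: dict = field(default_factory=dict)
    ready: bool = False
    ip: str = ""


def matches(selector: dict, labels: dict) -> bool:
    # Equality-based selector: every selector key/value must appear in the pod's labels.
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: dict, pods: list, target_port: int = 8000) -> list:
    """Return "ip:targetPort" for pods that match the selector and are Ready."""
    return [f"{p.ip}:{target_port}" for p in pods if matches(selector, p.labels) and p.ready]


def pick_backend(eps: list) -> Optional[str]:
    # Mirrors kube-proxy's iptables mode, which selects a backend at random.
    return random.choice(eps) if eps else None


if __name__ == "__main__":
    pods = [
        Pod("llm-inference-a", {"app": "llm-inference"}, ready=True, ip="10.0.0.11"),
        Pod("llm-inference-b", {"app": "llm-inference"}, ready=False, ip="10.0.0.12"),  # loading
        Pod("metrics", {"app": "metrics"}, ready=True, ip="10.0.0.13"),
    ]
    eps = endpoints({"app": "llm-inference"}, pods)
    print(eps)  # ['10.0.0.11:8000']
    print(pick_backend(eps))
```

The pod that is still loading its model is excluded even though its labels match. Readiness probes and the Service work together this way.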

Horizontal Pod Autoscaler:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-inference
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
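The HPA controller's core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). It skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default) and clamps the result to minReplicas and maxReplicas. For a `Utilization` target, the current value is average CPU usage as a percentage of the pods' CPU requests. The target Deployment's containers therefore need `resources.requests.cpu` set. The sketch reproduces only that formula. It omits the stabilization window, the readiness handling, and the missing-metric logic of the real controller. CPU can also be a weak load signal for GPU-bound inference.

```python
"""Core HPA replica formula (simplified: no stabilization window or readiness handling)."""
import math

TOLERANCE = 0.1  # Kubernetes default HPA tolerance


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """ceil(current * current_util / target_util), clamped to [min_replicas, max_replicas]."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= TOLERANCE:
        desired = current  # close enough to target: skip scaling
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 3 pods averaging 90% of their CPU request against the manifest's 70% target
    print(desired_replicas(3, 90, 70))  # 4
```

For example, 3 replicas at 90% against a 70% target give ceil(3 × 90 / 70) = ceil(3.86) = 4. At 75% the ratio is about 1.07, inside the tolerance, so the replica count stays at 3.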

ML Platforms on Kubernetes

Platform   | Purpose                              | Typical use case
-----------|--------------------------------------|------------------------------------------------
KServe     | Model serving                        | Standardized deployments via InferenceService
Kubeflow   | End-to-end ML platform               | Pipelines, distributed training, and serving
Ray        | Distributed compute                  | Large-scale training, tuning, and Ray Serve
Seldon     | ML deployment platform               | Enterprise serving with inference graphs
MLflow     | Experiment tracking, model registry  | Model versioning

Best Practices

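Container Best Practices:

- Pin the base image tag and dependency versions in requirements.txt so builds are reproducible.
- Use multi-stage builds, pip's --no-cache-dir, and cleared apt lists to keep images small.
- Mount frequently changing model weights as volumes instead of baking them into the image.
- Expose a health endpoint that reports ready only after the model has loaded.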

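K8s Best Practices:

- Set CPU and memory requests on every container; HPA utilization targets are computed against them.
- Request GPUs through resource limits (nvidia.com/gpu) so pods are scheduled only on GPU nodes.
- Configure readiness probes so a pod receives traffic only after its model is loaded.
- Keep minReplicas at 2 or more so the service stays available during rollouts and node failures.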

Docker and Kubernetes form the production backbone of much ML infrastructure. Container images make deployments reproducible, and Kubernetes adds horizontal scaling, health checking, and automatic recovery. Together they turn ML experiments into reliable production services.
