Docker and Kubernetes for ML

Keywords: docker ml, kubernetes, containers, gpu docker, kserve, kubeflow, model serving, deployment

Docker and Kubernetes for ML provide containerization and orchestration infrastructure for deploying machine learning models at scale — packaging models with dependencies into portable containers and managing clusters of GPU-enabled nodes for production serving, training jobs, and auto-scaling inference workloads.

Why Containers for ML?

- Reproducibility: Same environment everywhere (dev, test, prod).
- Dependency Isolation: No conflicts between project requirements.
- Portability: Run anywhere containers run.
- Scaling: Deploy multiple instances easily.
- GPU Support: NVIDIA Container Toolkit enables GPU access.

Docker Basics for ML

Basic Dockerfile:
```dockerfile
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

# Install Python
RUN apt-get update && apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Install dependencies
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy application code
COPY . /app
WORKDIR /app

# Run inference server
CMD ["python3", "serve.py"]
```
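
The CMD above assumes a serve.py entrypoint. A minimal sketch of one using FastAPI (fastapi, uvicorn, pydantic, and torch are assumed to be in requirements.txt; the TorchScript model and the MODEL_PATH default, which mirrors the Compose example below, are illustrative):
```python
# serve.py -- minimal inference server sketch.
# Assumes fastapi, uvicorn, pydantic, and torch are in requirements.txt;
# the TorchScript model and MODEL_PATH default are illustrative.
import os

import torch
from fastapi import FastAPI
from pydantic import BaseModel

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_PATH = os.environ.get("MODEL_PATH", "/app/models/model.pt")

app = FastAPI()
model = torch.jit.load(MODEL_PATH, map_location=DEVICE)
model.eval()

class PredictRequest(BaseModel):
    inputs: list[float]

@app.get("/health")
def health():
    # Answers the Kubernetes readinessProbe shown later in this article.
    return {"status": "ok"}

@app.post("/predict")
def predict(req: PredictRequest):
    x = torch.tensor([req.inputs], device=DEVICE)
    with torch.no_grad():
        out = model(x)
    return {"outputs": out.tolist()}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```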

Optimized Multi-Stage Build:
```dockerfile
# Build stage
FROM python:3.10-slim AS builder
COPY requirements.txt .
RUN pip install --user -r requirements.txt

# Runtime stage -- Ubuntu 22.04 ships Python 3.10, matching the builder,
# so the packages copied into /root/.local resolve correctly
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3 && \
    rm -rf /var/lib/apt/lists/*
COPY --from=builder /root/.local /root/.local
COPY . /app
WORKDIR /app
ENV PATH=/root/.local/bin:$PATH
CMD ["python3", "serve.py"]
```

GPU in Docker:
```bash
# Install NVIDIA Container Toolkit (current repository; the older
# nvidia-docker repository is deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Register the runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Run with GPU access
docker run --gpus all -it my-ml-image

# Specific GPUs (quote the device list so the shell passes it intact)
docker run --gpus '"device=0,1"' -it my-ml-image
```
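
A quick way to confirm the container actually sees its GPUs is a small check script run inside the image; this sketch assumes PyTorch is installed there:
```python
# gpu_check.py -- verify GPU visibility from inside the container.
# Assumes PyTorch is installed in the image; run with e.g.:
#   docker run --gpus all my-ml-image python3 gpu_check.py
import torch

print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}:", torch.cuda.get_device_name(i))
```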

Docker Compose for ML:
```yaml
version: "3.8"
services:
  inference:
    build: .
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
    environment:
      - MODEL_PATH=/app/models/model.pt
```

Kubernetes for ML

Why Kubernetes?
- Scale inference across many nodes.
- Manage GPU allocation automatically.
- Self-healing: restart failed pods.
- Load balancing across replicas.
- Rolling updates without downtime.

Deployment Example:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      containers:
        - name: inference
          image: my-registry/llm-server:v1
          resources:
            limits:
              nvidia.com/gpu: 1
          ports:
            - containerPort: 8000
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
```

Service & Load Balancing:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: llm-service
spec:
  selector:
    app: llm-inference
  ports:
    - port: 80
      targetPort: 8000
  type: LoadBalancer
```

Horizontal Pod Autoscaler:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-inference
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Note that CPU utilization is only a rough proxy for GPU-bound inference load; custom metrics such as request queue depth or requests per second often track demand better.

ML Platforms on Kubernetes

| Platform | Purpose                 | Use Case             |
|----------|-------------------------|----------------------|
| KServe   | Model serving           | Deploy models easily |
| Kubeflow | Full ML pipeline        | Training + serving   |
| Ray      | Distributed compute     | Large-scale training |
| Seldon   | ML deployment platform  | Enterprise serving   |
| MLflow   | Experiment tracking     | Model versioning     |
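
To make the serving workflow concrete, here is a sketch of querying a model deployed behind KServe's V1 REST protocol; the hostname and model name are hypothetical placeholders:
```python
# Query a KServe-deployed model over the V1 inference protocol.
# Hostname and model name are hypothetical placeholders.
import requests

url = "http://llm-service/v1/models/my-model:predict"
payload = {"instances": [[1.0, 2.0, 3.0]]}  # V1 protocol request body

resp = requests.post(url, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # V1 responses arrive as {"predictions": [...]}
```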

Best Practices

Container Best Practices:
- Use specific version tags, not :latest.
- Multi-stage builds to reduce image size.
- Don't include training data in images.
- Use .dockerignore to exclude unnecessary files.
- Health checks for readiness/liveness.

K8s Best Practices:
- Set resource requests AND limits.
- Use NVIDIA device plugin for GPU scheduling.
- Implement graceful shutdown for model unloading (see the sketch after this list).
- Use PersistentVolumes for model storage.
- Monitor GPU memory usage.
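
Kubernetes stops a pod by sending SIGTERM and waiting terminationGracePeriodSeconds (30s by default) before killing it, so graceful shutdown means catching that signal, draining requests, and unloading the model. A minimal sketch, with unload_model as a hypothetical stand-in for real cleanup:
```python
# Handle the SIGTERM Kubernetes sends before force-killing a pod.
# unload_model() is a hypothetical stand-in for real cleanup work
# (draining requests, freeing GPU memory, closing connections).
import signal
import sys
import time

def unload_model():
    print("Unloading model and freeing GPU memory...")
    time.sleep(1)  # stand-in for actual cleanup

def handle_sigterm(signum, frame):
    unload_model()
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

while True:  # stand-in for the real serving loop
    time.sleep(1)
```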

Docker and Kubernetes are the production backbone of ML infrastructure — enabling reproducible deployments, horizontal scaling, and robust operations that transform ML experiments into reliable production systems.
