Feature stores provide centralized infrastructure for managing ML features: storing, versioning, and serving feature data consistently between training and inference. This addresses the common problem of training-serving skew and enables feature reuse across models and teams.
What Is a Feature Store?
- Definition: System for managing ML feature data lifecycle.
- Problem: Features computed differently in training vs. serving.
- Solution: Single source of truth for feature computation and storage.
- Components: Offline store (training) + online store (serving).
Why Feature Stores Matter
- Consistency: Same features in training and serving.
- Reusability: Compute once, use in many models.
- Efficiency: Avoid redundant feature computation.
- Governance: Track feature lineage and ownership.
- Speed: Pre-computed features for low-latency serving.
Core Concepts
Feature Store Architecture:
┌─────────────────────────────────────────────────────────┐
│                      Feature Store                      │
├─────────────────────────────────────────────────────────┤
│  Feature Registry                                       │
│  - Feature definitions                                  │
│  - Metadata, owners                                     │
├────────────────────────────┬────────────────────────────┤
│  Offline Store             │  Online Store              │
│  (Historical data)         │  (Low-latency serving)     │
│  - Training data           │  - Real-time features      │
│  - Batch features          │  - Key-value store         │
│  - Point-in-time lookups   │  - <10ms latency           │
└────────────────────────────┴────────────────────────────┘
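The split between the two stores can be illustrated with a toy in-memory model. This is a minimal sketch, not how any real feature store is implemented: the class `ToyFeatureStore` and its method names are hypothetical, and a production system would back the offline side with a data warehouse or data lake and the online side with a key-value database.

```python
from collections import defaultdict
from datetime import datetime


class ToyFeatureStore:
    """Hypothetical in-memory model of the offline/online split."""

    def __init__(self):
        # Offline store: full history of (timestamp, value) per (entity, feature).
        self.offline = defaultdict(list)
        # Online store: only the latest (timestamp, value) per (entity, feature).
        self.online = {}

    def write(self, entity_id, feature, value, ts):
        key = (entity_id, feature)
        self.offline[key].append((ts, value))
        # Update the online copy only if this write is the newest one seen.
        current = self.online.get(key)
        if current is None or ts >= current[0]:
            self.online[key] = (ts, value)

    def get_online(self, entity_id, feature):
        """Key-value lookup of the latest value (serving path)."""
        entry = self.online.get((entity_id, feature))
        return None if entry is None else entry[1]

    def get_history(self, entity_id, feature):
        """All recorded values in timestamp order (training path)."""
        return sorted(self.offline[(entity_id, feature)])


store = ToyFeatureStore()
store.write(1234, "purchase_count_30d", 3, datetime(2024, 1, 1))
store.write(1234, "purchase_count_30d", 5, datetime(2024, 1, 2))
latest = store.get_online(1234, "purchase_count_30d")
history = store.get_history(1234, "purchase_count_30d")
```

The serving path reads a single precomputed value per key, while the training path keeps every historical value so that past states can be reconstructed.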
Feature Definition:
# Schema describing a feature
feature = Feature(
    name="user_purchase_count_30d",
    dtype=Int64,
    description="Number of purchases in last 30 days",
    owner="[email protected]",
    tags=["user", "commerce"],
)
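To make the registry idea concrete, here is a small sketch of how such definitions could be cataloged and discovered by tag. The names `FeatureDefinition` and `FeatureRegistry` are hypothetical and do not correspond to any library API, and the owner address is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical metadata record for one feature."""
    name: str
    dtype: str
    description: str
    owner: str
    tags: tuple = ()


class FeatureRegistry:
    """Hypothetical catalog that maps feature names to their definitions."""

    def __init__(self):
        self._features = {}

    def register(self, definition):
        # Reject duplicates so that a name always refers to a single definition.
        if definition.name in self._features:
            raise ValueError(f"feature already registered: {definition.name}")
        self._features[definition.name] = definition

    def find_by_tag(self, tag):
        """Return the names of all features carrying the given tag, sorted."""
        return sorted(d.name for d in self._features.values() if tag in d.tags)


registry = FeatureRegistry()
registry.register(
    FeatureDefinition(
        name="user_purchase_count_30d",
        dtype="int64",
        description="Number of purchases in last 30 days",
        owner="ml-team@example.com",  # placeholder address
        tags=("user", "commerce"),
    )
)
commerce_features = registry.find_by_tag("commerce")
```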
Feast (Open Source Feature Store)
Define Features:
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Define entity
user = Entity(
    name="user_id",
    join_keys=["user_id"],
    description="User identifier",
)

# Define data source
user_features_source = FileSource(
    path="s3://bucket/user_features.parquet",
    timestamp_field="event_timestamp",
)

# Define feature view (schema entries are Field objects in current Feast releases)
user_features = FeatureView(
    name="user_features",
    entities=[user],
    schema=[
        Field(name="purchase_count_30d", dtype=Int64),
        Field(name="avg_order_value", dtype=Float32),
        Field(name="days_since_last_purchase", dtype=Int64),
    ],
    source=user_features_source,
    ttl=timedelta(days=1),
)
Use Features for Training:
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# entity_df is assumed to be a pandas DataFrame with one row per training
# example: a user_id column plus an event_timestamp column.
# Get training data (point-in-time correct)
training_df = store.get_historical_features(
    entity_df=entity_df,  # user_ids + timestamps
    features=[
        "user_features:purchase_count_30d",
        "user_features:avg_order_value",
    ],
).to_df()
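"Point-in-time correct" means that each training row only sees feature values that were already known at that row's timestamp. The sketch below illustrates the idea in plain Python; it is a simplification and not Feast's actual join logic (it ignores TTL, for example). The function name `point_in_time_lookup` and the sample history are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(history, as_of):
    """Return the latest value whose timestamp is <= as_of, else None.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx > 0 else None


# Hypothetical feature history for one user.
purchase_count_history = [
    (datetime(2024, 1, 1), 2),
    (datetime(2024, 1, 10), 4),
    (datetime(2024, 1, 20), 7),
]

# Each training row asks for the value as it was known at its own timestamp.
label_times = [
    datetime(2023, 12, 31),  # before any feature value exists
    datetime(2024, 1, 10),   # exactly at an update
    datetime(2024, 1, 15),   # between two updates
]
values = [point_in_time_lookup(purchase_count_history, t) for t in label_times]
# values == [None, 4, 4]
```

The value 7 recorded on 2024-01-20 is never returned for earlier timestamps, which is what prevents future information from leaking into training data.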
Use Features for Inference:
# Get features for real-time serving (reuses the `store` created above)
online_features = store.get_online_features(
    features=[
        "user_features:purchase_count_30d",
        "user_features:avg_order_value",
    ],
    entity_rows=[{"user_id": 1234}],
).to_dict()
Training-Serving Skew Problem
Without Feature Store:
Training: SQL query computes features → model trains
Serving: Python code re-computes features → model predicts
Problem: Different implementations = different values
Result: Model performs worse in production than training
With Feature Store:
Training: Feature store provides historical features
Serving: Feature store provides online features
Same computation, same values → consistent performance
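A small example shows how easily two implementations of the "same" feature drift apart. In this sketch the function names and purchase dates are hypothetical; the only difference between the two versions is whether the 30-day window boundary is inclusive.

```python
from datetime import datetime, timedelta

# Hypothetical purchase timestamps for one user.
purchases = [datetime(2024, 1, 1), datetime(2024, 1, 15), datetime(2024, 1, 30)]
now = datetime(2024, 1, 31)


def training_count_30d(events, as_of):
    # Training pipeline version: window includes the boundary.
    start = as_of - timedelta(days=30)
    return sum(1 for e in events if start <= e <= as_of)


def serving_count_30d(events, as_of):
    # Re-implemented serving version: boundary is accidentally exclusive.
    start = as_of - timedelta(days=30)
    return sum(1 for e in events if start < e <= as_of)


train_value = training_count_30d(purchases, now)  # 3
serve_value = serving_count_30d(purchases, now)   # 2: the purchase exactly 30 days ago is dropped


def purchase_count_30d(events, as_of):
    # Single shared definition used by both the training and serving paths.
    start = as_of - timedelta(days=30)
    return sum(1 for e in events if start <= e <= as_of)


shared_train = purchase_count_30d(purchases, now)
shared_serve = purchase_count_30d(purchases, now)
```

With one shared definition there is nothing to keep in sync, which is the property a feature store provides by materializing the same computed values for both paths.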
Feature Store Options
Tool        | Type         | Best For
------------|--------------|----------------------------
Feast       | Open source  | Self-managed, flexibility
Tecton      | Managed      | Enterprise, real-time
Databricks  | Managed      | Delta Lake users
SageMaker   | Managed      | AWS ecosystem
Vertex AI   | Managed      | GCP ecosystem
Hopsworks   | Open/Managed | Python-native
Best Practices
Feature Design:
- Name descriptively (user_purchase_count_30d)
- Document units and meaning
- Version features when logic changes
- Avoid leaking future information
Organization:
- Group features by entity
- Assign clear ownership
- Define data freshness SLAs
- Catalog features for discovery
Monitoring:
- Track feature freshness
- Alert on data quality issues
- Monitor online store latency
- Detect feature drift
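Two of the monitoring checks above, freshness and drift, can be sketched with the standard library. This is only one simple approach: the function names and sample values are hypothetical, any alerting threshold is an assumption to tune per feature, and production systems often use richer drift statistics than a mean shift.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_stale(last_updated, now, max_age):
    """True if the feature was last refreshed longer ago than its freshness SLA."""
    return now - last_updated > max_age


def mean_shift_score(training_values, serving_values):
    """Absolute difference in means, in units of the training standard deviation."""
    spread = stdev(training_values)
    shift = abs(mean(serving_values) - mean(training_values))
    if spread == 0:
        # Constant training data: any change at all counts as unbounded drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


# Hypothetical values of one feature at training time and in production.
training_sample = [1, 2, 3, 4, 5]
serving_sample = [4, 5, 6, 7, 8]

stale = is_stale(
    last_updated=datetime(2024, 1, 1),
    now=datetime(2024, 1, 3),
    max_age=timedelta(days=1),
)
drift = mean_shift_score(training_sample, serving_sample)
```

Here the feature is two days old against a one-day SLA, so `stale` is true, and the serving mean has moved by nearly two training standard deviations.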
Feature stores are critical infrastructure for production ML. They address training-serving skew, which silently degrades model performance, and they enable feature reuse that accelerates model development across an organization.