Feature Stores

Feature stores provide centralized infrastructure for managing ML features: storing, versioning, and serving feature data consistently between training and inference. This addresses the common problem of training-serving skew and enables feature reuse across models and teams.

What Is a Feature Store?

Why Feature Stores Matter

Core Concepts

Feature Store Architecture:

┌─────────────────────────────────────────────────────────┐
│                   Feature Store                         │
├─────────────────────────────────────────────────────────┤
│  Feature Registry                                       │
│  - Feature definitions                                  │
│  - Metadata, owners                                     │
├─────────────────────────────────────────────────────────┤
│  Offline Store              │  Online Store            │
│  (Historical data)          │  (Low-latency serving)   │
│  - Training data            │  - Real-time features    │
│  - Batch features           │  - Key-value store       │
│  - Point-in-time lookups    │  - <10ms latency         │
└─────────────────────────────────────────────────────────┘

Feature Definition:

# Schema describing a feature (illustrative pseudocode, not tied to a specific library)
feature = Feature(
    name="user_purchase_count_30d",
    dtype=Int64,
    description="Number of purchases in last 30 days",
    owner="[email protected]",
    tags=["user", "commerce"]
)
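
The registry idea above can be sketched in plain Python. This is a minimal illustration, not any particular product's API: the FeatureDefinition and FeatureRegistry classes, the second feature, and the owner address are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical metadata record for one feature."""
    name: str
    dtype: str
    description: str
    owner: str
    tags: tuple = ()


class FeatureRegistry:
    """Toy in-memory registry: register definitions, then look them up."""

    def __init__(self):
        self._features = {}

    def register(self, feature):
        # Refuse duplicate names so one name always maps to one definition.
        if feature.name in self._features:
            raise ValueError(f"feature already registered: {feature.name}")
        self._features[feature.name] = feature

    def get(self, name):
        return self._features[name]

    def find_by_tag(self, tag):
        # Discovery: return the sorted names of features carrying this tag.
        return sorted(f.name for f in self._features.values() if tag in f.tags)


registry = FeatureRegistry()
registry.register(FeatureDefinition(
    name="user_purchase_count_30d",
    dtype="int64",
    description="Number of purchases in last 30 days",
    owner="ml-team@example.com",  # placeholder owner, an assumption
    tags=("user", "commerce"),
))
registry.register(FeatureDefinition(
    name="user_avg_order_value",  # hypothetical second feature
    dtype="float32",
    description="Mean order value, in account currency",
    owner="ml-team@example.com",
    tags=("user",),
))

commerce_features = registry.find_by_tag("commerce")
```

Keeping definitions in one place is what makes discovery and ownership checks possible; a real registry would also persist them and track versions.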

Feast (Open Source Feature Store)

Define Features:

from datetime import timedelta

# Recent Feast releases declare schema columns with Field (plus feast.types)
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Define entity
user = Entity(
    name="user_id",
    join_keys=["user_id"],
    description="User identifier"
)

# Define data source
user_features_source = FileSource(
    path="s3://bucket/user_features.parquet",
    timestamp_field="event_timestamp"
)

# Define feature view
user_features = FeatureView(
    name="user_features",
    entities=[user],
    schema=[
        Field(name="purchase_count_30d", dtype=Int64),
        Field(name="avg_order_value", dtype=Float32),
        Field(name="days_since_last_purchase", dtype=Int64),
    ],
    source=user_features_source,
    ttl=timedelta(days=1),  # how long a feature value stays valid
)

Use Features for Training:

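from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Entity dataframe: one row per training example (user_id + timestamp).
# The IDs and dates below are placeholder values.
entity_df = pd.DataFrame({
    "user_id": [1234, 5678],
    "event_timestamp": [datetime(2024, 1, 15), datetime(2024, 1, 20)],
})

# Get training data (point-in-time correct)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_features:purchase_count_30d",
        "user_features:avg_order_value",
    ]
).to_df()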
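
To make "point-in-time correct" concrete, here is a minimal standard-library sketch of the lookup rule: for each training row, use the latest feature value recorded at or before that row's timestamp, never a later one. The history data and the point_in_time_value helper are hypothetical, and the sketch ignores TTL handling, so it should not be read as how Feast implements the join.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history per user: (event_timestamp, purchase_count_30d),
# sorted by timestamp.
history = {
    1234: [
        (datetime(2024, 1, 1), 2),
        (datetime(2024, 1, 10), 5),
        (datetime(2024, 1, 20), 7),
    ],
}


def point_in_time_value(entity_id, as_of):
    """Return the latest value observed at or before `as_of`, else None."""
    rows = history.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, as_of)  # count of rows with ts <= as_of
    return rows[idx - 1][1] if idx > 0 else None


# Each training example is joined to the value known at its own timestamp.
entity_rows = [
    (1234, datetime(2024, 1, 5)),    # sees the Jan 1 value
    (1234, datetime(2024, 1, 15)),   # sees the Jan 10 value, not Jan 20
    (1234, datetime(2023, 12, 31)),  # nothing recorded yet
]
training_values = [point_in_time_value(uid, ts) for uid, ts in entity_rows]
```

A naive join on user_id alone would attach the Jan 20 value to every row, which leaks future information into the training set.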

Use Features for Inference:

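# Get features for real-time serving (reuses `store` from the training example;
# the feature values must already be materialized to the online store)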
online_features = store.get_online_features(
    features=[
        "user_features:purchase_count_30d",
        "user_features:avg_order_value",
    ],
    entity_rows=[{"user_id": 1234}]
).to_dict()
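
The online path depends on the newest feature values having been copied from the offline store into a key-value store. The sketch below illustrates that idea together with a TTL check, using only the standard library. The materialize and get_online_feature helpers and the sample rows are hypothetical, and the TTL rule shown is a simplification rather than Feast's exact semantics.

```python
from datetime import datetime, timedelta

TTL = timedelta(days=1)  # mirrors ttl=timedelta(days=1) in the feature view

# Hypothetical offline rows: (user_id, event_timestamp, purchase_count_30d)
offline_rows = [
    (1234, datetime(2024, 1, 1, 8), 2),
    (1234, datetime(2024, 1, 2, 8), 3),
    (5678, datetime(2023, 12, 25, 8), 9),
]


def materialize(rows):
    """Build a key-value online store holding only the newest row per user."""
    online = {}
    for user_id, ts, value in rows:
        if user_id not in online or ts > online[user_id][0]:
            online[user_id] = (ts, value)
    return online


def get_online_feature(online, user_id, now):
    """Return the stored value, or None if it is missing or older than TTL."""
    entry = online.get(user_id)
    if entry is None:
        return None
    ts, value = entry
    return value if now - ts <= TTL else None


online_store = materialize(offline_rows)
now = datetime(2024, 1, 2, 12)
fresh = get_online_feature(online_store, 1234, now)  # written 4 hours ago
stale = get_online_feature(online_store, 5678, now)  # written 8 days ago
```

Serving then becomes a single dictionary lookup per entity, which is what makes low-latency reads feasible.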

Training-Serving Skew Problem

Without Feature Store:

Training: SQL query computes features → model trains
Serving:  Python code re-computes features → model predicts

Problem: Different implementations = different values
Result:  Model performs worse in production than in training

With Feature Store:

Training: Feature store provides historical features
Serving:  Feature store provides online features

Same computation, same values → consistent performance
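
A small, hypothetical example of how skew arises: two implementations of the same "purchases in the last 30 days" feature that differ only in whether the window's start boundary is inclusive. The function names and purchase timestamps are invented for illustration.

```python
from datetime import datetime, timedelta

purchases = [
    datetime(2024, 1, 1, 0, 0),    # exactly 30 days before `now`
    datetime(2024, 1, 15, 9, 30),
    datetime(2024, 1, 30, 18, 0),
]
now = datetime(2024, 1, 31, 0, 0)


def training_count_30d(events, as_of):
    """Training-side logic (think SQL BETWEEN): start boundary is inclusive."""
    start = as_of - timedelta(days=30)
    return sum(1 for ts in events if start <= ts <= as_of)


def serving_count_30d(events, as_of):
    """Serving-side reimplementation: start boundary is exclusive by accident."""
    start = as_of - timedelta(days=30)
    return sum(1 for ts in events if start < ts <= as_of)


training_value = training_count_30d(purchases, now)
serving_value = serving_count_30d(purchases, now)
skewed = training_value != serving_value
```

Both functions look correct in isolation, and the mismatch only appears on boundary cases. Computing the feature once and serving the stored value to both paths removes the second implementation entirely.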

Feature Store Options

Tool        | Type           | Best For
------------|----------------|----------------------------
Feast       | Open source    | Self-managed, flexibility
Tecton      | Managed        | Enterprise, real-time
Databricks  | Managed        | Delta Lake users
SageMaker   | Managed        | AWS ecosystem
Vertex AI   | Managed        | GCP ecosystem
Hopsworks   | Open / Managed | Python-native

Best Practices

Feature Design:

- Name descriptively (user_purchase_count_30d)
- Document units and meaning
- Version features when logic changes
- Avoid leaking future information

Organization:

- Group features by entity
- Assign clear ownership
- Define data freshness SLAs
- Catalog features for discovery

Monitoring:

- Track feature freshness
- Alert on data quality issues
- Monitor online store latency
- Detect feature drift
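
Two of the monitoring checks above, freshness and drift, can be sketched with the standard library. The helper names, the one-hour freshness limit, the sample values, and the threshold of 3 standard deviations are all assumptions chosen for illustration. A mean-shift score is only one simple drift signal; production systems often use richer distribution tests.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def is_stale(last_updated, now, max_age):
    """Freshness check: True if the feature has not updated within max_age."""
    return now - last_updated > max_age


def mean_shift_score(baseline, current):
    """How many baseline standard deviations the current mean has moved."""
    sigma = pstdev(baseline)
    shift = abs(mean(current) - mean(baseline))
    if sigma == 0:
        return 0.0 if shift == 0 else float("inf")
    return shift / sigma


# Hypothetical values of purchase_count_30d at training time vs. today.
baseline_values = [10, 12, 11, 9, 13, 10, 11, 12]
current_values = [15, 16, 14, 17]

DRIFT_THRESHOLD = 3.0  # assumed alerting threshold
drifted = mean_shift_score(baseline_values, current_values) > DRIFT_THRESHOLD

stale = is_stale(
    last_updated=datetime(2024, 1, 1, 0, 0),
    now=datetime(2024, 1, 1, 3, 0),
    max_age=timedelta(hours=1),  # assumed freshness SLA
)
```

In practice these checks would run on a schedule per feature and feed an alerting system.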

Feature stores are critical infrastructure for production ML. They address training-serving skew, which can silently degrade model performance, and they enable feature reuse that speeds up model development across an organization.
