Prodigy

Prodigy is a scriptable annotation tool from Explosion AI (the creators of spaCy) that combines active learning, rapid micro-task annotation, and programmatic customization. A machine learning model selects the most informative examples for human review, so NLP engineers spend annotation time where it improves the model most. The resulting custom datasets can cover NER, text classification, dependency parsing, and computer vision tasks.
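The idea of letting a model pick the most valuable examples can be illustrated with uncertainty sampling. The sketch below is library-free and is not Prodigy's internal implementation: `score_fn` is a hypothetical stand-in for any model that returns a probability, and `toy_score` with its keyword rules is invented purely for the example.

```python
from typing import Callable, Iterable, List, Tuple


def most_uncertain(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs whose scores lie closest to 0.5."""
    scored = [(score_fn(text), text) for text in texts]
    # A probability near 0.5 means the model is least sure about the example.
    scored.sort(key=lambda pair: abs(pair[0] - 0.5))
    return scored[:k]


def toy_score(text: str) -> float:
    """Hypothetical scorer: pretend probability that a text is urgent."""
    if "outage" in text:
        return 0.95
    if "slow" in text:
        return 0.55
    if "maybe" in text:
        return 0.40
    return 0.05


texts = [
    "Total outage in production",
    "The dashboard feels slow today",
    "Thanks for the update",
    "This is maybe a billing problem",
]

# Only the two least certain examples are queued for human review.
selected = most_uncertain(texts, toy_score, k=2)
for score, text in selected:
    print(f"{score:.2f}  {text}")
```

Confident predictions (scores near 0 or 1) are skipped, which is why a model in the loop can reduce the number of examples a human needs to review.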

What Is Prodigy?

Why Prodigy Matters

Core Prodigy Recipes

Named Entity Recognition (from scratch):

python -m prodigy ner.manual my_dataset blank:en data.jsonl --label PERSON,ORG,GPE
# Annotate spans — click to highlight, select label, press Enter to accept
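The `data.jsonl` source in the command above is a newline-delimited JSON file in which each record carries the raw text under a `"text"` key. A minimal sketch of producing such a file follows; the example sentences and the `write_jsonl` helper are hypothetical, and the temporary directory is used only to keep the sketch free of side effects.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable


def write_jsonl(path: Path, texts: Iterable[str]) -> int:
    """Write one {"text": ...} record per line and return the record count."""
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


# Hypothetical raw texts; in practice these come from your own corpus.
raw_texts = [
    "Apple opened a new office in Berlin.",
    "Ada Lovelace worked with Charles Babbage.",
]

# In a real project, write to the data.jsonl path you pass to the recipe.
out_path = Path(tempfile.mkdtemp()) / "data.jsonl"
n_written = write_jsonl(out_path, raw_texts)
print(f"Wrote {n_written} records to {out_path}")
```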

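NER with a Model in the Loop (pre-annotation and correction):

python -m prodigy ner.correct my_dataset en_core_web_md data.jsonl --label ORG,PRODUCT
# Model pre-annotates, human corrects errors; usually much faster than annotating from scratch
# Note: ner.correct does not use active learning; ner.teach is the recipe that selects uncertain examples for binary accept/reject decisions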
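Span annotations are stored as character offsets into the text. The sketch below assumes the usual shape of an annotated NER task (a `"text"` string, a `"spans"` list with `"start"`, `"end"`, and `"label"`, plus an `"answer"` field); the example record and the `extract_entities` helper are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical annotated task, shaped like a record saved by an NER recipe.
task = {
    "text": "Acme Corp released the RoadRunner 3000 last week.",
    "spans": [
        {"start": 0, "end": 9, "label": "ORG"},
        {"start": 23, "end": 38, "label": "PRODUCT"},
    ],
    "answer": "accept",
}


def extract_entities(task: Dict) -> List[Tuple[str, str]]:
    """Return (surface text, label) pairs for an accepted task."""
    if task.get("answer") != "accept":
        # Rejected or ignored tasks contribute no entities.
        return []
    text = task["text"]
    return [
        (text[span["start"]:span["end"]], span["label"])
        for span in task.get("spans", [])
    ]


entities = extract_entities(task)
print(entities)
```

Because spans are offsets rather than copied strings, checking them against the text in this way is a quick sanity test before training.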

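Text Classification (binary):

python -m prodigy textcat.manual my_dataset data.jsonl --label POSITIVE
# With a single label the interface is binary: press A (accept, label applies), X (reject, label does not apply), Space (skip); 1000+ decisions per hour
# Passing several labels (e.g. --label POSITIVE,NEGATIVE) switches to a multiple-choice interface instead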
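Binary decisions translate directly into training labels. The sketch below assumes each saved record has `"text"`, `"label"`, and `"answer"` fields, with `"answer"` being one of `accept`, `reject`, or `ignore`; the records and the `to_training_pairs` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical records, shaped like the output of a binary textcat session.
records = [
    {"text": "Great battery life", "label": "POSITIVE", "answer": "accept"},
    {"text": "Screen cracked in a week", "label": "POSITIVE", "answer": "reject"},
    {"text": "asdf qwer", "label": "POSITIVE", "answer": "ignore"},
]


def to_training_pairs(records: List[Dict]) -> List[Tuple[str, Dict[str, bool]]]:
    """Map accept/reject answers to boolean labels and drop ignored examples."""
    pairs = []
    for rec in records:
        if rec["answer"] == "ignore":
            continue  # skipped examples carry no label information
        pairs.append((rec["text"], {rec["label"]: rec["answer"] == "accept"}))
    return pairs


pairs = to_training_pairs(records)
for text, cats in pairs:
    print(text, cats)
```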

Prodigy Annotation UI Philosophy

Custom Recipe Example

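import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe(
    "custom-classify",
    # Argument annotations: (description, kind, shortcut, type)
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("Path to a JSONL file with a 'text' field", "positional", None, str),
)
def custom_recipe(dataset, source):
    def get_stream():
        # Attach the same three priority options to every incoming example
        for eg in JSONL(source):
            eg["options"] = [
                {"id": "urgent", "text": "Urgent"},
                {"id": "normal", "text": "Normal"},
                {"id": "low", "text": "Low Priority"}
            ]
            yield eg

    return {
        "dataset": dataset,       # where annotations are stored
        "stream": get_stream(),   # generator of annotation tasks
        "view_id": "choice",      # multiple-choice interface
    }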

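Run (assuming the recipe is saved as recipe.py; the -F flag tells Prodigy which file to load the custom recipe from): python -m prodigy custom-classify my_tickets data.jsonl -F recipe.py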
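Once tickets have been annotated, the selected option ids can be tallied. The sketch below assumes that records from the choice interface store the chosen ids in an `"accept"` list alongside an `"answer"` field, for example after exporting the dataset with `prodigy db-out`; the sample records and the `count_choices` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical annotated records from the choice interface.
annotated = [
    {"text": "Server is down", "accept": ["urgent"], "answer": "accept"},
    {"text": "Typo on pricing page", "accept": ["low"], "answer": "accept"},
    {"text": "Password reset fails", "accept": ["urgent"], "answer": "accept"},
    {"text": "???", "accept": [], "answer": "ignore"},
]


def count_choices(records: List[Dict]) -> Counter:
    """Count how often each option id was selected in accepted records."""
    counts: Counter = Counter()
    for rec in records:
        if rec.get("answer") != "accept":
            continue  # ignored or rejected tasks are not counted
        counts.update(rec.get("accept", []))
    return counts


priority_counts = count_choices(annotated)
print(priority_counts.most_common())
```

A tally like this gives a quick view of class balance before the annotations are used for training.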

Prodigy vs Alternatives

| Feature | Prodigy | Label Studio | Scale AI | Labelbox |
|---|---|---|---|---|
| Active learning | Built-in | Plugin | No | Limited |
| Developer-oriented | Excellent | Good | Limited | Limited |
| Pricing | One-time ~$490 | Free (open source) | Usage-based | Subscription |
| Data ownership | Full (local) | Full (self-hosted) | Shared | Cloud |
| spaCy integration | Native | Good | No | Limited |
| Custom workflows | Python recipes | Templates | No | Limited |
| Annotation speed | Very high | High | High | High |

When to Choose Prodigy

Prodigy is the annotation tool of choice for NLP engineers who prioritize efficiency, data ownership, and programmatic control over labeling workflows. It combines the sample efficiency of active learning with a micro-task UI optimized for speed and a fully scriptable recipe system, so practitioners can collect the exact training data their models need in a fraction of the time that traditional annotation approaches require.
