Promptfoo

Promptfoo is an open-source command-line tool for systematically testing and evaluating LLM prompts across multiple models and providers. Developers define test cases in YAML, run them side by side against OpenAI, Anthropic, Ollama, and other supported providers, and get quantitative pass/fail results and scores, replacing "vibes-based" prompt engineering with data-driven iteration.

What Is Promptfoo?

Why Promptfoo Matters

Core Usage

Basic Configuration (promptfooconfig.yaml):

prompts:
  - "Summarize the following in one sentence: {{input}}"
  - "Provide a concise one-sentence summary of: {{input}}"

providers:
  - openai:gpt-4o
  - anthropic:claude-3-5-haiku-20241022
  - ollama:llama3

tests:
  - vars:
      input: "The quick brown fox jumps over the lazy dog near the riverbank."
    assert:
      - type: contains
        value: "fox"
      - type: llm-rubric
        value: "Is the summary accurate and under 20 words?"
  - vars:
      input: "Quarterly earnings exceeded analyst expectations by 15% on strong cloud revenue."
    assert:
      - type: regex
        value: "earnings|revenue|quarter"

Run with: npx promptfoo eval
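
The llm-rubric assertion in the configuration above is graded by a model, and settings shared by every test can be factored into a defaultTest block. A minimal sketch, assuming you want to pin the grader explicitly; the grader model and the banned phrase are illustrative choices, not recommendations from the original configuration:

```yaml
# Added to promptfooconfig.yaml alongside prompts, providers, and tests
defaultTest:
  options:
    # Model used to grade llm-rubric assertions (assumed choice)
    provider: openai:gpt-4o-mini
  assert:
    # Applied to every test case in addition to its own assertions
    - type: not-contains
      value: "As an AI"
```

With two prompts, three providers, and two test cases, a single run of the configuration above produces 2 x 3 x 2 = 12 graded outputs.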

Assertion Types
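
The basic configuration uses contains, regex, and llm-rubric. A few other assertion types can be sketched as follows; the test input is reused from the example above, while the thresholds and the reference sentence are assumptions chosen for illustration:

```yaml
tests:
  - vars:
      input: "Quarterly earnings exceeded analyst expectations by 15% on strong cloud revenue."
    assert:
      # Case-insensitive substring match
      - type: icontains
        value: "earnings"
      # Fails if any listed substring is missing
      - type: contains-all
        value: ["earnings", "15%"]
      # Custom check written as a JavaScript expression over `output`
      - type: javascript
        value: "output.split(' ').length <= 20"
      # Embedding similarity against a reference answer (assumed threshold)
      - type: similar
        value: "Earnings beat analyst forecasts by 15% thanks to cloud revenue."
        threshold: 0.8
      # Latency budget in milliseconds (assumed budget)
      - type: latency
        threshold: 3000
```

Deterministic checks such as icontains and javascript are cheap and reproducible, so it is usually worth reserving model-graded assertions for qualities that cannot be expressed as a string or code check.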

Red Teaming:

redteam:
  plugins:                # What to test for
    - harmful:hate        # Hate speech generation
    - pii:direct          # Direct PII leakage
  strategies:             # How the attacks are delivered
    - jailbreak           # Rewrites attacks to get past safeguards
    - prompt-injection    # Wraps attacks in injected instructions

CI/CD Integration:

# .github/workflows/eval.yml
- name: Run LLM Evals
  run: npx promptfoo eval
  # eval exits with a non-zero status when any assertion fails,
  # which fails the job and can block the PR merge
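
The step above is only a fragment of a workflow. A fuller sketch of the surrounding file might look like the following, assuming GitHub Actions, a config file at the repository root, and API keys stored as repository secrets; the trigger paths, secret names, and output filename are assumptions:

```yaml
# .github/workflows/eval.yml
name: LLM Evals
on:
  pull_request:
    paths:
      - "promptfooconfig.yaml"
      - "prompts/**"   # hypothetical prompt directory
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Run LLM Evals
        # Writes results to a file so they can be inspected after the run
        run: npx promptfoo@latest eval -c promptfooconfig.yaml -o results.json
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

A hosted runner has no local Ollama server, so a provider such as ollama:llama3 would need to be dropped from the CI configuration or served from a self-hosted runner.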

Promptfoo vs Alternatives

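| Feature | Promptfoo | Braintrust | DeepEval | Langfuse |
|---|---|---|---|---|
| Open source | Yes (MIT) | No | Yes | Yes |
| CLI-first | Yes | No | Yes (pytest) | No |
| Multi-provider | Excellent | Good | Good | Good |
| Red teaming | Built-in | No | Limited | No |
| CI/CD integration | Excellent | Good | Good | Good |
| Setup time | Minutes | Hours | Hours | Hours |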

Promptfoo brings test-driven development discipline to prompt engineering. By making it easy to define test cases, run them across multiple models, and wire evaluation into CI/CD, it lets developers replace subjective judgments of prompt quality with objective, reproducible, data-driven iteration.
