Qasper is a question answering dataset over full-text NLP research papers. Its questions were written by NLP researchers who had seen only the title and abstract of a paper, and its answers are grounded in the complete paper text, including body paragraphs, figures, and tables. This makes it a direct benchmark for AI research assistants working on technical scientific literature.

What Is Qasper?

Answer Types

Qasper classifies each answer into one of four types:

Type 1 — Extractive: The answer is a direct verbatim span from the paper.

Type 2 — Abstractive: The answer is written in free form rather than copied verbatim, often synthesizing information from several passages.

Type 3 — Boolean: Yes/No question with supporting evidence.

Type 4 — Unanswerable: The paper does not contain sufficient information to answer.
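
The four types above can be sketched as a small classifier over one annotated answer record. This is a minimal illustration, not the dataset's official loader: the key names ("unanswerable", "extractive_spans", "free_form_answer", "yes_no") and the order of the checks are assumptions made for this example.

```python
from enum import Enum
from typing import Any, Mapping


class AnswerType(Enum):
    EXTRACTIVE = "extractive"
    ABSTRACTIVE = "abstractive"
    BOOLEAN = "boolean"
    UNANSWERABLE = "unanswerable"


def classify_answer(answer: Mapping[str, Any]) -> AnswerType:
    """Map one answer record to one of the four answer types.

    Assumed (hypothetical) keys:
      "unanswerable"     -> bool
      "extractive_spans" -> list of verbatim spans from the paper
      "free_form_answer" -> str written by the annotator
      "yes_no"           -> True, False, or None when not a yes/no question
    """
    if answer.get("unanswerable"):
        return AnswerType.UNANSWERABLE
    if answer.get("extractive_spans"):
        return AnswerType.EXTRACTIVE
    if answer.get("free_form_answer"):
        return AnswerType.ABSTRACTIVE
    # A "No" answer is stored as False, so test against None, not truthiness.
    if answer.get("yes_no") is not None:
        return AnswerType.BOOLEAN
    raise ValueError("answer record matches none of the four types")


record = {"unanswerable": False, "extractive_spans": [], "free_form_answer": "", "yes_no": False}
print(classify_answer(record).value)
```

Checking "yes_no" against None rather than truthiness matters here: a "No" answer would otherwise fall through and raise an error.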

Why Qasper Is Challenging

Performance Results

Model                  F1 (Overall)   Extractive F1   Boolean Acc   Abstractive F1
Longformer baseline    28.8%          35.2%           72.4%         14.6%
LED (Allenai)          32.1%          38.4%           75.1%         18.9%
GPT-3.5 (RAG)          42.6%          49.3%           81.2%         28.4%
GPT-4 (full paper)     58.3%          64.7%           87.9%         42.1%
Human annotator        82.4%          86.1%           91.3%         72.8%
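
The F1 columns above are token-overlap scores between a predicted answer and a reference answer. The sketch below shows one common way to compute such a score, following the SQuAD-style recipe (lowercase, strip punctuation and articles, then compare token multisets). The function names and normalization details are assumptions for illustration and may differ from the scorer that produced the table.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(round(token_f1("BERT", "the BERT model"), 3))
```

When a question has several reference answers, a typical choice is to take the maximum score over the references and then average over questions.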

Why Qasper Matters

Applications This Enables

Qasper measures how well AI systems answer the specific technical questions that scientists ask about papers, with answers grounded in the complete paper text. That makes it a useful reference point for AI research assistant tools that aim to help people navigate and synthesize the scientific literature.
