Cloze Task

The cloze task is a psycholinguistic and reading-comprehension assessment in which participants fill in words deleted from a text. Formalized by Wilson Taylor in 1953, it is the direct intellectual ancestor of masked language modeling (MLM), the objective that BERT scaled into one of the most influential self-supervised pre-training objectives in modern NLP.

Historical Origins

Wilson L. Taylor introduced the Cloze Task in 1953 in "Cloze Procedure: A New Tool for Measuring Readability." The name derives from the Gestalt psychology concept of "closure" — the human tendency to mentally complete incomplete perceptual patterns. Taylor's insight was that a reader's ability to fill in deleted words from a text directly measures their comprehension of and familiarity with the language and content.

The original application was educational measurement: by deleting every N-th word from a passage (typically every 5th) and asking readers to fill in the blanks, readability researchers could quantify how accessible a text was to a given population without relying on subjective expert judgment.

Original Cloze Task Formats

Fixed-Ratio Deletion: Delete every 5th (or 7th, or 10th) word mechanically. This produces an objective, reproducible test. Example (every 5th word): "The quick brown fox [___] over the lazy dog. [___] was truly a beautiful [___]."

Rational Deletion: Select words for deletion based on semantic importance — delete nouns and verbs preferentially over function words. More targeted but requires human judgment in test construction.

Exact-Word Scoring: Only the original deleted word counts as correct. Strict, reliable, but penalizes synonyms that preserve meaning equally well.

Acceptable-Word Scoring: Any contextually appropriate word counts as correct. More generous and arguably measures comprehension more validly than exact matching, but requires human scoring.
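The deletion and scoring formats above can be sketched in a few lines of Python. This is a minimal illustration, not Taylor's procedure as published: the function names, the `[___]` blank marker, and the punctuation handling are assumptions made for the example, and the sets of acceptable words would in practice be compiled by human scorers.

```python
def make_cloze(text, n=5, blank="[___]"):
    """Fixed-ratio deletion: blank out every n-th word.

    Returns the gapped text and the list of deleted words.
    """
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        # Keep trailing punctuation outside the blank so the text stays readable.
        core = words[i].rstrip(".,;:!?")
        tail = words[i][len(core):]
        answers.append(core)
        words[i] = blank + tail
    return " ".join(words), answers


def score_exact(responses, answers):
    """Exact-word scoring: only the original word counts (case-insensitive)."""
    hits = sum(r.strip().lower() == a.lower() for r, a in zip(responses, answers))
    return hits / len(answers)


def score_acceptable(responses, acceptable):
    """Acceptable-word scoring: any word in the per-blank set counts.

    `acceptable` is a list of sets of lowercase words, one set per blank.
    """
    hits = sum(r.strip().lower() in ok for r, ok in zip(responses, acceptable))
    return hits / len(acceptable)


passage = "The quick brown fox jumps over the lazy dog. It was truly a beautiful day."
gapped, answers = make_cloze(passage, n=5)
print(gapped)
print(answers)
```

A reader who answers "leaps" for the first blank scores lower under `score_exact` than under `score_acceptable` with `{"jumps", "leaps"}` as the accepted set, which is the trade-off between the two scoring schemes.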

The Bridge to Machine Learning: Pre-BERT Applications

Cloze format appeared in ML contexts before BERT. Key milestones:

Children's Book Test (CBT, 2015): Built from Project Gutenberg children's books. Each question presents 20 consecutive sentences as context and asks the model to choose the correct word (from 10 candidates) to fill a blank in the sentence that follows. Separate evaluations for named entities, common nouns, verbs, and prepositions made it possible to dissect what types of context different model architectures could leverage.

CNN/Daily Mail Reading Comprehension (2015): Reformulated the bullet-point summaries of news articles as cloze items over anonymized entity mentions, replacing named entities with placeholder symbols (e.g., @entity123) to prevent simple lookup. Established reading comprehension as a tractable ML benchmark by constructing cloze items automatically from existing editorial structure.

LAMBADA (2016): Predict the final word of a passage where the correct prediction requires understanding the entire preceding narrative context, not just the immediately preceding sentence. Specifically curated to require document-level comprehension rather than local context.
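The automatic construction used for the CNN/Daily Mail items can be illustrated with a small sketch. Everything here is a simplification for illustration: `build_cloze_item` is a hypothetical helper, the entity list is supplied by hand (a real pipeline would rely on entity recognition and coreference resolution), and the `@entityN` and `@placeholder` markers simply imitate the style of that dataset.

```python
import random
import re


def build_cloze_item(article, bullet, entities, seed=0):
    """Anonymize entities and turn one summary bullet into a cloze query."""
    rng = random.Random(seed)
    ids = list(range(len(entities)))
    rng.shuffle(ids)  # random ids, so an id carries no information about the entity
    mapping = {ent: f"@entity{i}" for ent, i in zip(entities, ids)}

    def anonymize(text):
        # Replace longer names first so a full name wins over a substring of it.
        for ent in sorted(mapping, key=len, reverse=True):
            text = text.replace(ent, mapping[ent])
        return text

    in_bullet = [e for e in entities if e in bullet]
    if not in_bullet:
        raise ValueError("bullet mentions no known entity")
    answer = mapping[rng.choice(in_bullet)]
    # \b keeps "@entity1" from matching the prefix of "@entity10".
    query = re.sub(re.escape(answer) + r"\b", "@placeholder", anonymize(bullet), count=1)
    return anonymize(article), query, answer


article = "Alice Smith met Bob in Paris. Bob later thanked Alice Smith."
bullet = "Alice Smith met Bob in Paris"
context, query, answer = build_cloze_item(article, bullet, ["Alice Smith", "Bob", "Paris"])
print(context)
print(query, "->", answer)
```

The model sees only the anonymized context and the query, and must pick the entity marker that fills `@placeholder`.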

BERT and the Industrialization of Cloze

BERT (Devlin et al., 2018) transformed the cloze task from an evaluation tool into a training objective, scaling it to billions of examples.
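As a rough sketch of how cloze items are generated for MLM training, the snippet below follows the masking recipe described in the BERT paper: about 15% of tokens are selected as prediction targets, and of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The function name, the toy vocabulary, and the independent per-token sampling are assumptions made for this example. It is not BERT's actual data pipeline.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for masked language modeling.

    labels[i] holds the original token at selected positions and None
    elsewhere, so only the selected positions contribute to the loss.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected: input unchanged, no label
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK               # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, labels


tokens = ["the", "cat", "sat", "on", "the", "mat"] * 100
inputs, labels = mask_tokens(tokens, vocab=["the", "cat", "sat", "on", "mat", "dog"], seed=0)
```

Unlike a human cloze test, the blanks are resampled freely, so the same passage can yield many different training items.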

Human Cloze vs. MLM: Key Differences

| Aspect | Taylor's Cloze (1953) | BERT MLM |
| --- | --- | --- |
| Deletion method | Every N-th word | Random 15% |
| Target focus | Content words (semantic) | All tokens including function words |
| Context window | Full document | 512-token window |
| Scale | Hundreds of sentences | Billions of tokens |
| Evaluation | Human judgment | Cross-entropy loss |
| Purpose | Readability measurement | Representation learning |
| Directionality | Sequential reading | Fully bidirectional |
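To make the "Cross-entropy loss" row concrete, here is a minimal sketch of how an MLM loss is typically computed: the mean negative log-probability the model assigns to the original token, taken over the masked positions only. The data layout (one token-to-probability dictionary per position, `None` labels for unmasked positions) is an assumption chosen for readability, not any library's format.

```python
import math


def mlm_loss(predicted_probs, labels):
    """Mean cross-entropy over masked positions; unmasked positions are skipped."""
    losses = [
        -math.log(probs[label])
        for probs, label in zip(predicted_probs, labels)
        if label is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Toy predictions for three positions; only the last two were masked.
predicted = [
    {"cat": 0.5, "dog": 0.5},
    {"sat": 0.25, "ran": 0.75},
    {"mat": 1.0},
]
labels = [None, "sat", "mat"]
loss = mlm_loss(predicted, labels)
```

Where a human scorer decides whether a filled-in word is acceptable, the loss simply rewards probability mass placed on the original token.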

Zero-Shot Evaluation via Cloze Format

Cloze format enables zero-shot evaluation of the factual knowledge stored in language models.

The LAMA benchmark converts knowledge graph triples into cloze questions. For example, the triple (Dante, born-in, Florence) becomes "Dante was born in [MASK]."

By measuring the probability a language model assigns to the correct answer vs. competitors in cloze format, researchers assess how much factual world knowledge was encoded during pre-training — without any fine-tuning or in-context examples.
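The probing idea can be sketched as follows. The templates, the relation names, and `toy_model` are hypothetical stand-ins: a real probe would query a pre-trained masked language model for its distribution over the `[MASK]` position, and the benchmark's own templates and candidate vocabulary differ from this toy setup.

```python
TEMPLATES = {
    "born-in": "{subject} was born in [MASK].",
    "capital-of": "{subject} is the capital of [MASK].",
}


def triple_to_cloze(subject, relation, obj):
    """Turn a (subject, relation, object) triple into a cloze query and gold answer."""
    return TEMPLATES[relation].format(subject=subject), obj


def precision_at_1(triples, predict_probs):
    """Fraction of triples whose gold object is the model's top-ranked candidate."""
    hits = 0
    for subject, relation, obj in triples:
        query, gold = triple_to_cloze(subject, relation, obj)
        probs = predict_probs(query)  # mapping: candidate word -> probability
        hits += max(probs, key=probs.get) == gold
    return hits / len(triples)


def toy_model(query):
    """Stand-in for a language model: fixed candidate probabilities per query."""
    table = {
        "Dante was born in [MASK].": {"Florence": 0.6, "Rome": 0.4},
        "Paris is the capital of [MASK].": {"France": 0.3, "Italy": 0.7},
    }
    return table[query]


triples = [("Dante", "born-in", "Florence"), ("Paris", "capital-of", "France")]
score = precision_at_1(triples, toy_model)
```

Because the model is only queried, never updated, the score reflects what was already encoded during pre-training.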

Cloze in Major NLP Benchmarks

The cloze task is the 1950s fill-in-the-blank procedure that became the foundation of modern language model pre-training. Designed to measure human reading comprehension, it teaches neural networks the statistical and semantic structure of natural language when scaled to billions of examples with bidirectional context.
