ReClor (Reading Comprehension from Examinations for Logical Reasoning)

ReClor (Reading Comprehension from Examinations for Logical Reasoning) is a benchmark built from standardized graduate-admissions exam questions, primarily LSAT and GMAT critical reasoning problems. It is designed to test whether AI systems can analyze arguments, identify assumptions, and perform structured logical reasoning rather than simple pattern matching. Introduced by Yu et al. in 2020, ReClor became one of the clearest stress tests for the gap between language fluency and genuine reasoning, because its questions were written to challenge intelligent human test-takers, not to reward superficial lexical cues.

What ReClor Contains

Each ReClor example typically includes:

Typical question types:

This mirrors the structure of LSAT Logical Reasoning sections, where success depends on carefully modeling the argument rather than recalling facts.
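As a rough illustration of that structure, here is a minimal Python sketch of one multiple-choice item and how it might be scored. It assumes each item has a passage, a question, four answer options, and the index of the correct option. The field names (`context`, `question`, `answers`, `label`), the helper functions, and the sample item are hypothetical and written for this example, not taken from the dataset.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ReClorExample:
    """One multiple-choice item (field names are assumptions)."""
    context: str        # the argument or passage
    question: str       # e.g. a weaken / strengthen / assumption question
    answers: List[str]  # the answer options
    label: int          # index of the correct option


def format_prompt(example: ReClorExample) -> str:
    """Render an item as a plain-text multiple-choice prompt."""
    lines = [f"Passage: {example.context}", f"Question: {example.question}"]
    for letter, answer in zip("ABCD", example.answers):
        lines.append(f"{letter}. {answer}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(examples: Sequence[ReClorExample], predictions: Sequence[int]) -> float:
    """Fraction of items whose predicted option index matches the label."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    correct = sum(ex.label == pred for ex, pred in zip(examples, predictions))
    return correct / len(examples)


# Invented sample item, used only to exercise the helpers.
sample = ReClorExample(
    context="Every employee who took the training was promoted. "
            "So the training causes promotions.",
    question="Which one of the following, if true, most weakens the argument?",
    answers=[
        "The training lasted three weeks.",
        "Only employees already selected for promotion were allowed to take the training.",
        "Promotions are announced every spring.",
        "Some employees dislike training sessions.",
    ],
    label=1,
)

print(format_prompt(sample))
print(accuracy([sample], [1]))
```

Scoring by option index keeps the metric independent of how the prompt is rendered, so the same items can be reused across prompt formats.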

Why ReClor Is Hard

ReClor is difficult because the wrong choices are intentionally crafted to look reasonable. A model must separate:

For example, in a weaken question, a distractor answer may mention the same nouns and context as the passage but not actually undermine the causal or logical link in the argument. Models that rely on semantic similarity often pick these distractors.
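The failure mode described above can be sketched with a toy baseline that picks the option sharing the most words with the passage. The passage, options, and label below are invented for illustration, and Jaccard word overlap is only a stand-in for the kind of surface-similarity signal a model might lean on. It is not a method from the ReClor paper.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pick_by_overlap(passage: str, options: List[str]) -> int:
    """Return the index of the option with the highest word overlap."""
    scores = [jaccard(passage, option) for option in options]
    return max(range(len(options)), key=scores.__getitem__)


# Hypothetical weaken question (not from the dataset).
passage = (
    "Cities that installed bike lanes saw fewer traffic accidents. "
    "Therefore, bike lanes reduce traffic accidents."
)
options = [
    # Distractor: reuses the passage's nouns but does not touch the causal claim.
    "The cities that installed bike lanes also painted the bike lanes green.",
    # Correct: offers an alternative cause, with almost no shared vocabulary.
    "Speed limits were lowered in those places during the same period.",
    "Reducing accidents is a goal of most governments.",
    "Bike riders prefer quiet roads.",
]
label = 1

picked = pick_by_overlap(passage, options)
print(picked, picked == label)  # the overlap baseline picks option 0, the distractor
```

The correct option scores zero overlap here because it weakens the argument by introducing a new fact, which is exactly the case where similarity-based selection breaks down.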

What Skills ReClor Measures

Skill | Why It Matters
Argument structure tracking | Identify premises, conclusions, and hidden assumptions
Counterfactual reasoning | Test what happens if a new fact is introduced
Distractor resistance | Ignore plausible but irrelevant answer choices
Abstract reasoning | Generalize beyond surface wording
Careful reading | Small wording changes can reverse logical meaning

This makes ReClor different from ordinary reading comprehension. The challenge is not reading the passage, but reasoning about it correctly.

Historical Performance Trend

ReClor was especially notable because early transformer models that looked strong on many NLP benchmarks performed poorly:

Why the slow progress? Because ReClor penalizes shortcut learning. Many NLP benchmarks contain annotation artifacts or lexical regularities that models can exploit. ReClor, drawn from exam questions refined by humans to test reasoning, contains fewer such shortcuts.

Why ReClor Matters in the LLM Era

Modern LLMs are much better at ReClor than earlier models, especially when given:

But ReClor still matters because it probes a failure mode that remains important in production: a model can sound persuasive while following invalid reasoning. This matters in:

A fluent but logically weak model is dangerous in all of these domains.

Comparison With Related Benchmarks

Benchmark | Focus | Difference From ReClor
MMLU | Broad academic knowledge | More breadth, less concentrated logical trap design
HellaSwag | Commonsense completion | More world knowledge, less explicit argument structure
GSM8K | Arithmetic reasoning | Numeric reasoning rather than verbal logic
LogiQA | Logical reasoning from text | Similar family, but ReClor is closely tied to LSAT and GMAT quality
ARC | Science exam QA | Fact and reasoning mix, less adversarial logic structure

Main Limitations

Even with those limitations, ReClor remains one of the most respected benchmarks for verbal logical reasoning. It asks a sharper question than many general NLP tests: not whether a model can read, but whether it can follow an argument carefully enough to avoid being fooled by plausible nonsense.
