LogiQA

LogiQA is a logical reasoning benchmark sourced from the Chinese National Civil Service Examination (NCSE). It provides multiple-choice reading comprehension questions that require formal deductive and inductive reasoning, which makes it one of the most challenging standardized logic benchmarks for language models and a key test of whether models can approximate a logical inference engine.

What Is LogiQA?

The Five Logic Types Covered

Categorical Logic (Class Inclusion/Exclusion):

Conditional Logic (If-Then Chains):

Disjunctive Reasoning (Either-Or):

Causal Analysis:

Argument Evaluation:

Why LogiQA Is Hard for LLMs

Performance Results

| Model | LogiQA 1.0 Accuracy |
|---|---|
| Random baseline | 25.0% |
| Human (NCSE examinees) | ~86% |
| RoBERTa-large | 35.3% |
| DAGN (graph-augmented) | 39.9% |
| GPT-3.5 | ~58% |
| GPT-4 | ~72% |
| GPT-4 + CoT | ~80% |
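The accuracy figures above are plain multiple-choice accuracy, and the 25.0% random baseline implies four answer options per question. A minimal sketch of that scoring follows, assuming predictions and gold labels are given as option indices. The function names and the toy data are hypothetical, and the chance-adjusted score is an illustrative normalization, not part of the benchmark's official protocol.

```python
from typing import Sequence

# Assumption: four answer options per question, consistent with the
# 25% random baseline reported for LogiQA 1.0.
NUM_OPTIONS = 4


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of questions where the predicted option index matches the gold one."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def chance_adjusted(acc: float, num_options: int = NUM_OPTIONS) -> float:
    """Rescale accuracy so random guessing maps to 0.0 and a perfect score to 1.0.

    Illustrative only; not an official LogiQA metric.
    """
    chance = 1.0 / num_options
    return (acc - chance) / (1.0 - chance)


# Toy data (hypothetical): eight questions, option indices 0-3.
gold = [0, 2, 1, 3, 2, 0, 1, 3]
preds = [0, 2, 1, 0, 2, 0, 3, 3]

acc = accuracy(preds, gold)
print(f"accuracy: {acc:.3f}")
print(f"chance-adjusted: {chance_adjusted(acc):.3f}")
```

Rescaling against the chance level makes it easier to see how far above guessing a model sits, since a raw score of 35% on a four-option task is only modestly better than random.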

LogiQA 2.0 Improvements

LogiQA 2.0 (2023) addresses weaknesses of the original:

ReClor Comparison

LogiQA is often paired with ReClor (from LSAT Logical Reasoning) for logic evaluation:

| Benchmark | Source | Scale | Focus |
|---|---|---|---|
| LogiQA | Chinese NCSE | 8.7k | Formal deductive/inductive |
| ReClor | LSAT | 6.1k | Analytical argument evaluation |
| AR-LSAT | LSAT | 2.0k | Constraint satisfaction |

All three require multi-step logical reasoning but differ in reasoning style: LogiQA emphasizes categorical and conditional logic, ReClor focuses on argument analysis, and AR-LSAT targets constraint satisfaction.

Why LogiQA Matters

LogiQA is, in effect, civil service logic for AI. It adapts the rigorous deductive and inductive reasoning standards that governments use to select public administrators, giving language models a demanding test of whether they can actually follow chains of formal logical argumentation.
