COGS (Compositional Generalization Challenge based on Semantic Interpretation)

COGS (Compositional Generalization Challenge based on Semantic Interpretation) is a semantic parsing benchmark for testing systematic compositional generalization. It maps English sentences to logical form representations (lambda calculus notation) and uses controlled splits that hold out specific lexical and structural combinations, measuring whether models genuinely learn reusable syntactic and semantic rules or merely memorize training instances.

What Is COGS?

The Generalization Conditions

COGS tests 21 distinct generalization types:

Lexical Generalization (18 conditions):

Structural Generalization (3 conditions):

The Core Claim

Two types of language generalization are theoretically required for compositional competence:

1. Lexical Generalization: Understanding "dax" in "The dax was eaten" → dax(x) even though "dax" never appeared in that grammatical role in training.
2. Structural Generalization: Parsing "Emma said that Noah knew that the girl ate the cake", a structure nested more deeply than the training examples, by applying known rules recursively.
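The two kinds of held-out combination can be made concrete with a small check over a toy split. This is a minimal sketch, not the benchmark's own tooling: the `Example` record, its `obj` and `depth` annotations, and the sample sentences are all hypothetical and exist only for illustration.

```python
from typing import NamedTuple

class Example(NamedTuple):
    sentence: str
    obj: str     # noun in object position (hypothetical annotation)
    depth: int   # number of nested "that"-clauses (hypothetical annotation)

def lexical_gaps(train, test):
    """Nouns used as objects in test but never as objects in train."""
    seen_objects = {ex.obj for ex in train}
    return {ex.obj for ex in test} - seen_objects

def structural_gaps(train, test):
    """Test examples nested more deeply than anything in train."""
    max_train_depth = max(ex.depth for ex in train)
    return [ex for ex in test if ex.depth > max_train_depth]

# Toy data: "hedgehog" is only ever a subject in training,
# and training never nests more than one clause.
train = [
    Example("The hedgehog ate the cake", "cake", 0),
    Example("Emma said that the hedgehog ate the cake", "cake", 1),
]
test = [
    Example("The girl helped the hedgehog", "hedgehog", 0),
    Example("Emma said that Noah knew that the girl ate the cake", "cake", 2),
]

print(lexical_gaps(train, test))
print([ex.sentence for ex in structural_gaps(train, test)])
```

A model that has learned reusable rules should handle both flagged cases; a model that memorized training instances has no reason to.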

Why Models Fail COGS

Performance Results

| Model | Lexical Generalization | Structural Generalization | Overall |
|---|---|---|---|
| LSTM seq2seq | ~65% | ~18% | ~35% |
| Transformer | ~75% | ~26% | ~45% |
| Pretrained BART | ~82% | ~41% | ~59% |
| LEAR (specialized) | ~97% | ~78% | ~85% |
| GPT-4 + CoT | ~92% | ~70% | ~82% |
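Per-category figures of this kind are usually computed as exact-match accuracy, where a prediction counts only if the whole logical form matches the gold one. The following is a minimal sketch under that assumption: the function name, the `(prediction, gold, category)` record layout, the category labels, and the simplified logical forms are all hypothetical, not taken from any released evaluation script.

```python
from collections import defaultdict

def exact_match_by_type(records):
    """Exact-match accuracy per generalization category.

    records: iterable of (predicted_lf, gold_lf, category) string triples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, gold, category in records:
        total[category] += 1
        # Compare token sequences so that spacing differences are ignored.
        if pred.split() == gold.split():
            correct[category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

# Hypothetical predictions with simplified logical forms.
records = [
    ("eat . agent ( x _ 1 , Emma )", "eat . agent ( x _ 1 , Emma )", "lexical"),
    ("eat . agent ( x _ 1 , Emma )", "eat . theme ( x _ 1 , Emma )", "lexical"),
    ("say . ccomp ( x _ 1 , x _ 4 )", "say . ccomp ( x _ 1 , x _ 6 )", "structural"),
    ("say . ccomp ( x _ 1 , x _ 4 )", "say . ccomp  ( x _ 1 , x _ 4 )", "structural"),
]

per_type = exact_match_by_type(records)
# Micro-average: every example counts equally, regardless of category.
overall = sum(p.split() == g.split() for p, g, _ in records) / len(records)
print(per_type, overall)
```

Because the overall score here is a micro-average, it depends on how many test examples fall in each category, so it need not sit midway between the per-category numbers.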

Why COGS Matters

Connection to SCAN, CFQ, and gSCAN

| Benchmark | Modality | Output Type | Generalization Split Design |
|---|---|---|---|
| SCAN | Language | Action sequences | Lexical holdout (verb) |
| gSCAN | Language+Vision | Navigation actions | Concept combination |
| COGS | Language | Logical forms (λ-calculus) | Lexical + structural |
| CFQ | Language | SPARQL queries | Compound structure |

COGS stress-tests the syntax of meaning: it uses formal linguistic methods to determine whether AI models have internalized the syntactic rules that generate natural language structure or have merely learned statistical co-occurrence patterns that collapse when presented with novel but grammatically valid constructions.
