CLUTRR (Compositional Language Understanding and Text-based Relational Reasoning)

CLUTRR (Compositional Language Understanding and Text-based Relational Reasoning) is a diagnostic benchmark for inductive reasoning over kinship relations. It tests whether models can learn compositional rules from text (Mother of Father = Grandmother) and systematically generalize them to relationship chains longer than any seen during training, which makes it a direct probe of the length-generalization failures of transformer architectures.

What Is CLUTRR?

Example (2-hop training vs. 5-hop testing)

2-hop training story: "Sarah gives her son John a birthday card. John introduces Mary as his daughter." Question: "What is Sarah to Mary?" Answer: Grandmother. Derivation: Sarah → (mother of) → John → (father of) → Mary. Composing the two edges, the mother of Mary's father is Mary's grandmother, so Sarah is Mary's grandmother. ✓
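The derivation above is a single application of a composition rule. The following is a minimal sketch of how a rule-based solver might fold a chain of relations into one. It assumes a small hand-written rule table; the `COMPOSE` dictionary and the `compose_chain` function are hypothetical illustrations, not CLUTRR's own code or rule set.

```python
from typing import List, Optional

# Hypothetical rule table (illustrative only, far smaller than a real one).
# (r1, r2) -> r means: if X is r1 of Y and Y is r2 of Z, then X is r of Z.
COMPOSE = {
    ("mother", "father"): "grandmother",
    ("mother", "mother"): "grandmother",
    ("father", "father"): "grandfather",
    ("father", "mother"): "grandfather",
    ("grandmother", "father"): "great-grandmother",
    ("grandmother", "mother"): "great-grandmother",
}


def compose_chain(relations: List[str]) -> Optional[str]:
    """Fold a left-to-right chain of relations into a single relation.

    Returns None when the table has no rule for some adjacent pair.
    """
    if not relations:
        raise ValueError("empty relation chain")
    current = relations[0]
    for rel in relations[1:]:
        nxt = COMPOSE.get((current, rel))
        if nxt is None:
            return None  # no rule covers (current, rel)
        current = nxt
    return current


# Sarah -(mother of)-> John -(father of)-> Mary
print(compose_chain(["mother", "father"]))  # grandmother
```

Longer chains go through the same loop, which is one reason a symbolic solver's accuracy need not depend on chain length, provided its rule table covers every pair it meets.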

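5-hop test story: "Linda hugged her nephew Travis. Travis went to visit his son Robert. Robert's sister is Nina. Nina is married to Kevin. Kevin waved to his mother Carol." Question: "What is Linda to Carol?" Derivation: the chain has five relation edges, Linda → (aunt of) → Travis → (father of) → Robert → (brother of) → Nina → (wife of) → Kevin → (son of) → Carol. Composing them in order, Linda is Robert's great-aunt and therefore also Nina's (assuming Nina and Robert share Travis as father); Nina is Kevin's wife and Kevin is Carol's son, so Linda is the great-aunt of Carol's daughter-in-law. That composed relation has no single kinship term, so the story illustrates the depth of reasoning required rather than a one-word answer: each of the five steps must be applied systematically and in order.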
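Before any rule can be applied, the chain itself has to be recovered from the story. A minimal sketch, assuming each sentence has already been parsed into a directed `(X, relation, Y)` triple meaning "X is the relation of Y" (the parsing step, the triple format, and `relation_path` are all hypothetical): breadth-first search returns the sequence of relations linking two people.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# (X, relation, Y) means "X is <relation> of Y" (hypothetical format).
Edge = Tuple[str, str, str]


def relation_path(edges: List[Edge], start: str, goal: str) -> Optional[List[str]]:
    """Return the relations along a shortest path from start to goal, or None."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))

    queue = deque([(start, [])])  # (current person, relations so far)
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel]))
    return None  # goal not reachable via forward edges


# The 5-hop story as hand-parsed triples.
story = [
    ("Linda", "aunt", "Travis"),
    ("Travis", "father", "Robert"),
    ("Robert", "brother", "Nina"),
    ("Nina", "wife", "Kevin"),
    ("Kevin", "son", "Carol"),
]
path = relation_path(story, "Linda", "Carol")
print(path, len(path))  # ['aunt', 'father', 'brother', 'wife', 'son'] 5
```

Because only forward edges are stored, the search cannot walk a relation backwards. A fuller version would also add inverse edges (for example, "Kevin is the son of Carol" implies "Carol is the mother of Kevin").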

Why Length Generalization Fails

Transformers exhibit a well-documented failure mode: they can learn 2-3 hop compositions, but their accuracy drops sharply on chains of 5-7 hops. The reason:

Performance Results

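| Model | 2-hop | 3-hop | 5-hop | 10-hop |
|---|---|---|---|---|
| RoBERTa-large | ~98% | ~82% | ~48% | ~22% |
| Graph Neural Network | ~99% | ~95% | ~78% | ~45% |
| GPT-4 (few-shot CoT) | ~99% | ~97% | ~89% | ~68% |
| Symbolic solver | 100% | 100% | 100% | 100% |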
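Results like these are reported per chain length rather than as a single score. A minimal sketch of that bucketing, assuming predictions are available as `(hop_count, predicted_relation, gold_relation)` triples; the function name, input format, and toy data are hypothetical and are not the numbers in the table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_hops(results: Iterable[Tuple[int, str, str]]) -> Dict[int, float]:
    """Group (hop_count, predicted, gold) triples by hop count; return accuracy per group."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for hops, pred, gold in results:
        total[hops] += 1
        correct[hops] += int(pred == gold)
    # Sorted keys give the 2-hop, 3-hop, ... column order of a results table.
    return {h: correct[h] / total[h] for h in sorted(total)}


# Made-up toy predictions, for illustration only.
toy = [
    (2, "grandmother", "grandmother"),
    (2, "uncle", "uncle"),
    (5, "aunt", "great-aunt"),
    (5, "nephew", "nephew"),
]
print(accuracy_by_hops(toy))  # {2: 1.0, 5: 0.5}
```

Reporting accuracy per hop count is what exposes a length-generalization gap: a model can look strong on average while its accuracy on the longest chains collapses.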

Why CLUTRR Matters

CLUTRR uses automated genealogy as a reasoning stress test. Because family relationships are a universally understood domain, the benchmark can measure precisely whether a model has learned logical composition rules that generalize to arbitrarily long kinship chains, or has instead memorized training configurations and fails once a chain is longer than any it saw in training.

