Compositional Reasoning is the cognitive capability of solving complex problems by decomposing them into simpler sub-problems, solving each sub-problem independently, and combining the sub-solutions according to the compositional structure of the original problem. It is the fundamental reasoning ability that enables systematic generalization to novel combinations of known concepts, and the critical weakness of current language models, which can master individual skills yet fail when those skills must be composed in unseen ways.
What Is Compositional Reasoning?
- Definition: Breaking complex problems into hierarchically organized components, solving each component using known skills or knowledge, and assembling the solutions following the structural relationships between components — mirroring how compositional semantics builds sentence meaning from word meanings.
- Systematic Generalization: The ability to recombine known primitives in novel ways — having seen "red circle" and "blue square," correctly handling "blue circle" despite never encountering that specific combination.
- Recursive Structure: Compositionality enables unbounded complexity from finite primitives — just as finite words generate infinite sentences through recursive grammar, finite reasoning skills generate unlimited problem-solving capability through composition.
- Decompose-Solve-Recompose: The canonical three-phase pattern: (1) parse the complex problem into its compositional structure, (2) solve each leaf sub-problem, (3) combine results according to the structural relationships.
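The three-phase pattern above can be sketched as a recursive walk over the problem's structure. The expression-tree representation and the shopping example below are illustrative assumptions, a toy stand-in for a real decomposition:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One sub-problem: either a solved leaf or a combination of children."""
    op: str                      # "leaf", "sum", or "product"
    value: float = 0.0           # used only when op == "leaf"
    children: List["Node"] = field(default_factory=list)

def solve(node: Node) -> float:
    # Phase 2: leaf sub-problems are solved directly.
    if node.op == "leaf":
        return node.value
    # Phase 3: recombine sub-solutions per the structural relationship.
    parts = [solve(child) for child in node.children]
    if node.op == "sum":
        return sum(parts)
    if node.op == "product":
        result = 1.0
        for p in parts:
            result *= p
        return result
    raise ValueError(f"unknown op: {node.op}")

# Phase 1 (done by hand here): "3 apples at $2 each plus 4 pears at $1.50 each"
problem = Node("sum", children=[
    Node("product", children=[Node("leaf", 3), Node("leaf", 2)]),
    Node("product", children=[Node("leaf", 4), Node("leaf", 1.5)]),
])
print(solve(problem))  # 12.0
```

The same `solve` handles arbitrarily deep trees, which is the recursive-structure point above: finite node types, unbounded compositions.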
Why Compositional Reasoning Matters
- Generalization to Novel Problems: Compositional reasoners solve problems they've never seen before by recombining known skills — non-compositional systems fail on any novel combination, regardless of component mastery.
- Scalable Complexity: Composed solutions scale to arbitrary complexity — once you can compose 2 steps, you can compose 20 steps using the same mechanism.
- LLM Weakness: Current LLMs demonstrate strong individual capabilities (math, retrieval, logic) but degrade rapidly when these must be composed — the "compositionality gap" where models fail on composed tasks despite mastering components.
- Trustworthy AI: Compositional reasoning is verifiable step-by-step — each sub-problem solution can be independently checked, unlike end-to-end black-box reasoning.
- Human-Like Reasoning: Human intelligence is fundamentally compositional — our ability to understand novel sentences, solve new math problems, and navigate unfamiliar situations relies on composing known concepts.
Compositional Reasoning in LLMs
Chain-of-Thought (CoT):
- Decomposes reasoning into sequential steps — each step is a simpler sub-problem.
- Implicit composition: the output of each step feeds into the next.
- Effective for 2-4 step compositions; degrades for longer chains.
Least-to-Most Prompting:
- Explicitly decompose the problem into ordered sub-questions.
- Solve from simplest to most complex, each building on previous answers.
- Better at longer chains than standard CoT — explicit decomposition reduces error accumulation by isolating each sub-question.
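The two stages can be sketched as follows; the hand-written `decompose` and `answer` stubs are assumptions standing in for the two LLM calls a real implementation would make:

```python
def least_to_most(question, decompose, answer):
    """Stage 1: decompose into ordered sub-questions.
    Stage 2: answer them in order, feeding every prior
    (sub-question, answer) pair into the next call."""
    context = []
    for subq in decompose(question):
        context.append((subq, answer(subq, context)))
    return context[-1][1]  # answer to the final (hardest) sub-question

# Toy stand-ins for the model calls:
def decompose(question):
    return ["How many feet does Amy climb per minute?",
            "How long does it take Amy to climb 12 feet?"]

def answer(subq, context):
    if "per minute" in subq:
        return 4 / 2          # 4 feet in 2 minutes -> 2 ft/min
    rate = context[0][1]      # explicitly reuse the earlier sub-answer
    return 12 / rate

result = least_to_most("Amy climbs 4 ft in 2 min; how long for 12 ft?",
                       decompose, answer)
print(result)  # 6.0
```

The key difference from CoT is structural: sub-answers are passed forward as explicit context rather than buried in one long generation.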
Program-of-Thought:
- Decompose reasoning into executable code (Python) where each function is a sub-problem.
- Code execution combines sub-solutions deterministically — arithmetic and control flow are exact, though the generated code can still encode a wrong decomposition.
- Most reliable for mathematical composition — code prevents arithmetic error propagation.
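A minimal sketch of executing model-generated code; the `answer`-variable convention is an assumption, and real systems must sandbox untrusted generated code:

```python
def program_of_thought(generated_code: str, result_var: str = "answer"):
    """Run model-generated code and read off the designated result variable.
    No sandboxing here -- production use requires an isolated runtime."""
    namespace: dict = {}
    exec(generated_code, namespace)
    return namespace[result_var]

# What a model might emit for "12 apples at $0.50 each; what is the total?":
generated = """
price_per_apple = 0.50
apples = 12
answer = price_per_apple * apples
"""
print(program_of_thought(generated))  # 6.0
```

Each assignment is a solved sub-problem; the interpreter, not the model, performs the composition, which is where the arithmetic reliability comes from.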
Faithful Decomposition:
- Generate a decomposition plan before solving — make the compositional structure explicit.
- Verify that the decomposition faithfully captures the original problem's structure.
- Enables targeted error correction when a specific decomposition step fails.
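One crude, illustrative faithfulness check (an assumption for this sketch, not a standard algorithm) is that every quantity in the original problem is accounted for by some step of the plan:

```python
import re

def numbers_in(text: str) -> set:
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def covers_all_quantities(problem: str, subproblems: list) -> bool:
    """Flag decomposition plans that silently drop a quantity."""
    covered = set()
    for sp in subproblems:
        covered |= numbers_in(sp)
    return numbers_in(problem) <= covered

problem = "A shirt costs $20 and pants cost $35. What is the total with 8% tax?"
good_plan = ["Add 20 and 35 to get the subtotal.",
             "Apply 8% tax to the subtotal."]
bad_plan = ["Add 20 and 35 to get the subtotal."]  # tax step dropped

print(covers_all_quantities(problem, good_plan))  # True
print(covers_all_quantities(problem, bad_plan))   # False
```

Because the check names which quantity is missing in principle (the uncovered set), a failed plan can be regenerated or patched at that specific step rather than restarting from scratch.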
Compositional Reasoning Benchmarks
| Benchmark | Task | Composition Type | LLM Performance |
|-----------|------|-----------------|----------------|
| SCAN | Command → action sequence | Spatial + sequential | Poor (without augmentation) |
| COGS | Sentence → logical form | Syntactic composition | Moderate |
| CFQ (Freebase) | NL → SPARQL query | Relational composition | Poor on compositional (MCD) splits |
| GSM8K | Math word problems | Arithmetic + logic | Good (with CoT) |
| DROP | Reading comprehension | Extraction + comparison | Moderate |
Compositional Reasoning is the holy grail of artificial intelligence — the capability that would transform language models from impressive pattern matchers into genuine reasoning engines capable of systematic generalization, and the most important open problem in making AI systems that can reliably solve novel problems by composing the skills they have already mastered.