Lean integration

Lean integration involves connecting large language models with the Lean proof assistant — a modern formal verification system for mathematics and software — enabling AI systems to generate formal proofs, verify mathematical statements, and translate between natural language and Lean's formal language.

What Is Lean?

- Lean is a proof assistant and programming language based on dependent type theory — developed by Leonardo de Moura at Microsoft Research.
- It's designed for formalizing mathematics — expressing theorems and proofs in a machine-checkable format.
- Mathlib: Lean's extensive mathematical library containing formalized definitions, theorems, and proofs across many areas of mathematics.
- Lean 4: The latest version combines theorem proving with practical programming — a unified language for proofs and programs.
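
To make the "proofs and programs in one language" point concrete, here is a minimal Lean 4 sketch that uses only the core library. The names `sumList` and `sumList_append` are invented for this illustration and are not taken from Mathlib.

```lean
-- A program: an ordinary recursive function on lists.
def sumList : List Nat → Nat
  | [] => 0
  | x :: xs => x + sumList xs

-- A proof about that program, checked by the same system.
theorem sumList_append (xs ys : List Nat) :
    sumList (xs ++ ys) = sumList xs + sumList ys := by
  induction xs with
  | nil => simp [sumList]
  | cons x xs ih => simp [sumList, ih, Nat.add_assoc]

-- The function is also executable.
#eval sumList [1, 2, 3]  -- 6
```

The definition runs like code in any functional language, while the theorem states a property of it that Lean checks mechanically.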

Why Integrate LLMs with Lean?

- Accessibility: Lean's formal language is precise but difficult for non-experts — LLMs can provide a natural language interface.
- Proof Automation: LLMs can suggest tactics, complete proof steps, and find relevant lemmas — accelerating proof development.
- Autoformalization: LLMs can translate informal mathematical statements into Lean code — bridging informal and formal mathematics.
- Learning: LLMs trained on Lean proofs can learn proof strategies and mathematical reasoning patterns.

LLM + Lean Integration Approaches

- Tactic Suggestion: Given a proof state (current goal and hypotheses), the LLM suggests which Lean tactic to apply next.

```
Proof state: ⊢ n + 0 = n
LLM suggests: rw [add_zero]
Result: Goal proven ✓
```

- Proof Completion: Given a partial proof with holes, the LLM fills in the missing steps.
- Lemma Retrieval: The LLM searches Mathlib for relevant lemmas that could help prove the current goal.
- Natural Language to Lean: Translate informal mathematical statements into formal Lean code.

```
Input: "For all natural numbers n, n + 0 = n"
Output: theorem add_zero_right (n : ℕ) : n + 0 = n
```

- Lean to Natural Language: Explain Lean proofs in plain English for human understanding.
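
Proof completion and Lean-to-natural-language explanation can be illustrated with one small example. The sketch below is hand-written rather than model output: the first proof leaves a hole marked with `sorry`, and the second shows one possible completion, with comments standing in for the plain-English explanation a model might produce.

```lean
-- Partial proof: the second goal is left as a hole (`sorry`).
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  constructor
  · exact hp
  · sorry

-- One possible completion, annotated in plain English.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  constructor   -- to prove "p and q", prove each part separately
  · exact hp    -- p holds by the hypothesis hp
  · exact hq    -- q holds by the hypothesis hq
```

Lean accepts the first version only with a warning that it relies on `sorry`, so an incomplete proof cannot be mistaken for a finished one.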

Key Projects

- LeanDojo: A platform for training and evaluating LLMs on Lean theorem proving — provides datasets, tools, and benchmarks.
- Lean Copilot: An LLM-powered assistant for Lean — suggests tactics and completes proofs within the Lean environment.
- ReProver: A retrieval-augmented LLM for Lean theorem proving — retrieves relevant premises from Mathlib.
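- Draft, Sketch, and Prove (DSP): A method in which an LLM drafts an informal proof, turns it into a formal proof sketch, and leaves the remaining gaps to automated provers. The original work targeted Isabelle, but the approach carries over to Lean.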

How LLM-Lean Integration Works

1. Training: LLMs are trained on Lean code and proofs from Mathlib and other sources.
2. Proof State Encoding: The current proof state (goals, hypotheses, context) is encoded as text for the LLM.
3. Tactic Generation: The LLM generates candidate tactics or proof steps.
4. Execution: Tactics are executed in Lean to see if they make progress.
5. Iteration: The process repeats, with the LLM seeing the updated proof state after each tactic.
6. Verification: Lean verifies that the completed proof is correct.
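
The execute-and-check steps can be imitated inside Lean itself. The sketch below uses the built-in `first` combinator, which tries alternatives in order and keeps the first one that succeeds. The alternatives are hand-written stand-ins for model-generated candidates. A real system would typically drive Lean from an external process instead of hard-coding them.

```lean
-- Candidate tactics are tried in order. Lean rejects any that fail
-- and keeps the first one that succeeds.
example (n : Nat) : n + 0 = n := by
  first
    | exact Nat.zero_add n   -- rejected: this lemma proves `0 + n = n`
    | rw [Nat.add_zero]      -- accepted: rewrites `n + 0` to `n`

-- Iteration: each tactic changes the proof state that the next one sees.
example (a b : Nat) (h : a = b) : a + 0 = b := by
  rw [Nat.add_zero]   -- state is now: h : a = b ⊢ a = b
  exact h             -- closes the remaining goal
```

In both cases the final word belongs to Lean: a proof counts only if every step type-checks, regardless of how the candidate tactics were produced.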

Benefits

- Accelerated Formalization: LLMs can draft formal statements and proof steps, reducing the manual effort of formalizing mathematics.
- Proof Discovery: Combined with search, LLMs can explore many candidate proof paths and may find proofs that a human would not think to try.
- Education: LLM-Lean systems can teach formal mathematics — providing hints, explanations, and feedback.
- Bridging Informal and Formal: Makes formal mathematics more accessible to mathematicians who don't know Lean.

Challenges

- Correctness: LLM-generated tactics may be invalid — Lean catches errors, but failed attempts waste computation.
- Context Limits: Proof states can be large — fitting them into LLM context windows is challenging.
- Library Knowledge: Effective proving requires knowing what's in Mathlib, so LLMs must learn the library's structure and naming conventions.
- Novel Proofs: LLMs may struggle with proofs requiring genuinely new insights not seen in training data.
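
The library-knowledge challenge shows up even in tiny goals, because a proof succeeds only if the right lemma name is used. For instance, the fact `n + 0 = n` is `Nat.add_zero` in Lean's core library, while Mathlib also provides a more general `add_zero`. The sketch below uses core names only. It assumes a recent Lean 4 toolchain in which the `exact?` search tactic is available without extra imports.

```lean
-- Knowing the lemma name makes the proof a one-liner.
example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- Without the name, `exact?` asks Lean to search the library for a
-- lemma that closes the goal and reports a suggestion.
example (a b : Nat) : a + b = b + a := by
  exact?
```

Built-in search of this kind works for goals that a single lemma closes. Retrieval-augmented models aim to surface relevant premises for goals that need several steps.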

Applications

- Mathematics Research: Formalizing new theorems and proofs — making mathematical knowledge machine-verifiable.
- Software Verification: Proving properties of programs written in Lean.
- Education: Interactive tutoring systems for learning formal mathematics.
- Automated Formalization: Converting textbooks and papers into formal Lean code.

Lean integration represents the cutting edge of AI-assisted mathematics — combining the creativity of LLMs with the rigor of formal verification to advance both fields.
